As requested by @wookieejedi and investigated by Claude.
HUD Performance Investigation — Report
The investigation dispatched five parallel subagents covering batching/draw-call efficiency, algorithmic hot spots, radar and iteration cost, Lua/scripting overhead, and existing profiling instrumentation. Consolidated findings follow.
TL;DR — The Smoking Gun
The graphics layer already ships a 2D batching API designed exactly for this problem, and nothing in the codebase calls it. From code/graphics/render.h:195-211:
"Start buffering 2D rendering operations. This will defer rendering 2D interface elements until gr_2d_stop_buffer is called. This can improve performance when doing a lot of 2D operations since the actual drawing will only be done once."
A grep for gr_2d_start_buffer returns zero callers anywhere: HUD, UI, mission load, debug overlays. The HUD issues an estimated 75+ separate renderLine/renderRect/renderCircle calls per frame through this NanoVG path, and each one issues its own beginFrame/endFrame pair (see code/graphics/render.cpp:771,785). Wrapping the gauge loop in hud.cpp:2124-2147 with gr_2d_start_buffer()/gr_2d_stop_buffer() is a roughly two-line change that the engine was designed to accept, and the developers who suspected draw-call overhead were right to flag it.
The header's caveat, that buffering "might change the drawing order if incompatible rendering commands are executed", is worth heeding: HUD bitmaps go through a different path than the NanoVG primitives, so this needs validation rather than a blind enable. It is nonetheless the single highest-leverage change identified.
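
As a rough illustration of why buffering should cut per-primitive overhead, the sketch below models immediate versus buffered submission and counts beginFrame/endFrame pairs. Fake2DBackend, Scoped2DBuffer and frame_pairs_for are hypothetical names invented here. The sketch does not call the engine's gr_2d_start_buffer()/gr_2d_stop_buffer() and ignores the bitmap path entirely, so it says nothing about the draw-order caveat. The RAII guard is one possible way to ensure the stop call is never skipped, assuming the real pair is meant to bracket a single scope.

```cpp
#include <cstddef>

// Hypothetical stand-in for the NanoVG-backed 2D path. It draws nothing;
// it only counts beginFrame/endFrame pairs so the two modes can be compared.
struct Fake2DBackend {
    bool buffering = false;
    std::size_t frame_pairs = 0;  // beginFrame/endFrame pairs issued so far
    std::size_t queued = 0;       // primitives deferred while buffering

    void line() {
        if (buffering) {
            ++queued;             // deferred until stop_buffer()
        } else {
            ++frame_pairs;        // immediate mode: one pair per primitive
        }
    }
    void start_buffer() { buffering = true; }
    void stop_buffer() {
        if (queued > 0) {
            ++frame_pairs;        // a single pair flushes everything queued
            queued = 0;
        }
        buffering = false;
    }
};

// RAII guard: stop_buffer() runs on every exit path of the enclosing scope,
// including early returns from the gauge loop.
class Scoped2DBuffer {
public:
    explicit Scoped2DBuffer(Fake2DBackend& backend) : backend_(backend) {
        backend_.start_buffer();
    }
    ~Scoped2DBuffer() { backend_.stop_buffer(); }
    Scoped2DBuffer(const Scoped2DBuffer&) = delete;
    Scoped2DBuffer& operator=(const Scoped2DBuffer&) = delete;

private:
    Fake2DBackend& backend_;
};

// Draws `n` lines, optionally inside a buffered scope, and returns the
// number of beginFrame/endFrame pairs that were issued.
std::size_t frame_pairs_for(std::size_t n, bool buffered) {
    Fake2DBackend backend;
    if (buffered) {
        Scoped2DBuffer scope(backend);
        for (std::size_t i = 0; i < n; ++i) backend.line();
    } else {
        for (std::size_t i = 0; i < n; ++i) backend.line();
    }
    return backend.frame_pairs;
}
```

Under these assumptions, frame_pairs_for(75, false) returns 75 and frame_pairs_for(75, true) returns 1. The real saving depends on how much of each beginFrame/endFrame pair is fixed cost, and only measurement can answer that.
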
Architecture Summary
The HUD render pipeline per frame:
- hud_render_preprocess() (hud.cpp:1929): targeting, navigation, brackets, missile tracking.
- hud_render_all() (hud.cpp:2077) → hud_render_gauges() iterates every gauge in the ship's hud_gauges vector (typically 50+), calling preprocess() → onFrame() → setupRenderCanvas() → canRender() → render() on each.
  - Each gauge's render() calls primitives that route to gr_line, gr_rect, gr_string, gr_bitmap, etc.

Batching status by primitive:

| Primitive | Batched? | Notes |
|---|---|---|
| renderLine / renderRect / renderCircle / renderGradientLine | No: per-call beginFrame/endFrame | NanoVG path; would be batched if buffering_nanovg=true |
| renderBitmap / gr_aabitmap | No: immediate material submission per call | Separate path; not affected by gr_2d_start_buffer |
| renderString (VFNT fonts) | Partially: chars batched into a 300-vertex CPU buffer, GPU submission is immediate | One submission per string |
| renderString (TTF/NVG fonts) | No equivalent CPU buffering | One draw per glyph in the worst case |

Algorithmic Hot Spots (file:line)
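Multiple full-list walks per frame (none individually O(n²), but they stack):
- hud_process_homing_missiles walks every entry in Missile_obj_list, with a distance check per missile.
- hud_process_remote_detonate_missile makes a separate walk of the same Missile_obj_list, with g3_rotate_vertex + g3_project_vertex per missile.
- hud_show_hostile_triangle walks the full Ship_obj_list plus a nested subsystem loop (3742-3763) that computes turret distances every frame, with no temporal caching of the "current top threat".
- hud_create_complete_escort_list iterates the full Ship_obj_list, rebuilt every HUD frame.
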
Redundant per-frame recomputation in the targetbox (hudtargetbox.cpp:1700-1873): 20+ sprintf/snprintf calls and 4+ gr_get_string_size calls per frame for hull %, subsystem names, weapon names, and AI mode, with no caching even when the target hasn't changed. Pipe-delimited subsystem names are also tokenized with strtok every frame (line 1841).
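
A minimal sketch of the caching this suggests, assuming game code can supply a per-target state stamp that changes whenever any displayed value changes. TargetSnapshot, TargetboxTextCache and state_stamp are hypothetical names. In the real targetbox, the gr_get_string_size measurements and the strtok split would also move inside the rebuild branch.

```cpp
#include <cstdio>
#include <string>

// Hypothetical snapshot of the values the targetbox prints. `signature`
// plays the role of an object signature; `state_stamp` is an assumed counter
// that game code bumps whenever hull, subsystem, or AI state changes.
struct TargetSnapshot {
    int signature;
    unsigned state_stamp;
    int hull_percent;
    std::string subsystem_name;
};

class TargetboxTextCache {
public:
    // Re-formats only when the target or its state stamp changed;
    // otherwise the strings from the previous frame are reused.
    void refresh(const TargetSnapshot& t) {
        if (valid_ && t.signature == signature_ && t.state_stamp == stamp_) {
            return;
        }
        char buf[32];
        std::snprintf(buf, sizeof(buf), "Hull: %d%%", t.hull_percent);
        hull_text_ = buf;
        subsystem_text_ = t.subsystem_name;  // one-time tokenizing would go here
        signature_ = t.signature;
        stamp_ = t.state_stamp;
        valid_ = true;
        ++rebuilds_;
    }

    const std::string& hull_text() const { return hull_text_; }
    const std::string& subsystem_text() const { return subsystem_text_; }
    int rebuilds() const { return rebuilds_; }

private:
    bool valid_ = false;
    int signature_ = 0;
    unsigned stamp_ = 0;
    int rebuilds_ = 0;
    std::string hull_text_;
    std::string subsystem_text_;
};
```

Refreshing the same snapshot for 100 frames formats the strings once, and bumping state_stamp forces exactly one rebuild. If any code path changes a displayed value without bumping the stamp, the box shows stale text. This is the kind of invalidation risk the caveats warn about.
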
Math hot spots:
- hudtarget.cpp:3981, polish_predicted_target_pos: vm_vec_dist_quick inside an iterative loop (multiple sqrts per lead-indicator calculation).
- 14 distance-calculation sites in hudtarget.cpp alone, several inside ship/missile loops.
- radardradis.cpp:118: vm_vec_normalize per blip in the render path, although the blip position is already known at plot time (a single cache slot would fix it).
- hudshield.cpp:671: generated 3D shield icons run the full g3_start_frame / matrix / projection pipeline per quadrant, when most missions could use baked textures.

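
The DRADIS fix amounts to normalizing once at plot time and storing the result next to the blip. The sketch below assumes a simplified layout: Vec3, Blip, normalize(), plot_blip() and blip_render_dir() are hypothetical and do not match the engine's vec3d or blip structs. The call counter exists only to show how often normalization runs.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical blip record. `dir` is the extra cached slot
// (the "one float[3]" from the recommendation).
struct Blip {
    Vec3 position;  // blip position relative to the viewer
    Vec3 dir;       // unit direction, filled once at plot time
    float dist;     // length of `position`, free to keep at plot time
};

int g_normalize_calls = 0;  // instrumentation for this sketch only

// Normalizes v in place and returns its original length (0 for a zero vector).
float normalize(Vec3& v) {
    ++g_normalize_calls;
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len > 0.0f) {
        v.x /= len;
        v.y /= len;
        v.z /= len;
    }
    return len;
}

// Plot time: runs once per object, and the result is shared by all radar gauges.
Blip plot_blip(const Vec3& pos) {
    Blip b{pos, pos, 0.0f};
    b.dist = normalize(b.dir);
    return b;
}

// Render time: reads the cached direction instead of renormalizing.
Vec3 blip_render_dir(const Blip& b) { return b.dir; }
```

However many radar gauges read the blip, normalize() runs once per plot.
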
Wasted gauge work (hud.cpp:2124-2147): preprocess() and onFrame() are called on every gauge before canRender() is checked. Gauges that are off-screen, configured off, or popup-only but not currently popped up still pay the full preprocessing cost.
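
A sketch of the reordered loop follows. It uses a hypothetical minimal Gauge class that mirrors only the method names listed in the pipeline summary, and setupRenderCanvas() is omitted for brevity. The sketch assumes canRender() does not depend on state that preprocess() or onFrame() compute. That assumption needs confirming for each gauge class before the real loop in hud.cpp is reordered.

```cpp
#include <memory>
#include <vector>

// Hypothetical minimal gauge mirroring the call names in hud_render_gauges();
// the real HudGauge class has far more state and virtuals.
struct Gauge {
    bool enabled = true;
    int preprocess_calls = 0;
    int render_calls = 0;
    virtual ~Gauge() = default;
    virtual bool canRender() const { return enabled; }
    virtual void preprocess() { ++preprocess_calls; }
    virtual void onFrame(float /*frametime*/) {}
    virtual void render(float /*frametime*/) { ++render_calls; }
};

// Reordered loop: the cheap visibility test runs first, so disabled or
// off-screen gauges skip preprocess()/onFrame() entirely.
void render_gauges(std::vector<std::unique_ptr<Gauge>>& gauges, float frametime) {
    for (auto& g : gauges) {
        if (!g->canRender()) continue;
        g->preprocess();
        g->onFrame(frametime);
        g->render(frametime);
    }
}
```
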
Common coordinate transforms are recomputed rather than shared. The same target may be run through g3_rotate_vertex/g3_project_vertex by hud_show_targeting_gauges, hud_show_selection_set, individual targeting gauges, and bracket drawing, with no per-frame projection cache keyed by object signature.
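
One way to share the projections is a cache that is cleared once per HUD frame and filled lazily by whichever caller asks first. ProjectionCache and ScreenPoint are hypothetical names. The Projector callable stands in for the g3_rotate_vertex/g3_project_vertex pair, and keying by int assumes object signatures are unique within a frame.

```cpp
#include <unordered_map>

struct ScreenPoint { float x, y; bool on_screen; };

// Hypothetical per-frame cache: the first caller that needs an object's
// projection pays for it, and later callers in the same frame reuse it.
template <typename Projector>
class ProjectionCache {
public:
    explicit ProjectionCache(Projector project) : project_(project) {}

    // Call once at the start of each HUD frame.
    void begin_frame() { cache_.clear(); }

    const ScreenPoint& get(int object_signature) {
        auto it = cache_.find(object_signature);
        if (it == cache_.end()) {
            it = cache_.emplace(object_signature, project_(object_signature)).first;
        }
        return it->second;
    }

private:
    Projector project_;
    std::unordered_map<int, ScreenPoint> cache_;
};
```

std::unordered_map keeps references to its elements valid across later insertions, even when it rehashes. The reference returned by get() therefore stays usable until the next begin_frame().
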
What's Actually Fine
- Radar blip generation (radarsetup.cpp): blips are plotted once per object at post-move time and shared across all radar gauges via global blip lists, with no per-gauge re-iteration. This is the right pattern, and the rest of the HUD should learn from it.
- Scripting overhead is well optimized for the common unhooked case: the ActiveActions hash lookup is the only real cost. Heavily scripted HUDs would benefit from caching frame-constant Lua values (Player.Position, etc.), but vanilla missions pay essentially nothing.
- Mission parsing (hudparse.cpp's 5729 lines) is load-time only, not per-frame.
What's Already Instrumented
Only three TRACE_SCOPE points currently emit usable data: RenderHUDGauge (hud.cpp:2142,2162), RenderTargetingBracket (hudbrackets.cpp:396), RenderNavBracket (hudbrackets.cpp:509). The categories RenderMainFrame, RenderHUD, RenderHUDHook are declared in tracing/categories.h but never emit events — instrumenting them first would let you verify these recommendations against real numbers rather than estimates.
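
Until the real categories are wired up, the per-gauge-class timing being asked for looks roughly like the sketch below. It is a hypothetical stand-in built on std::chrono, not the engine's TRACE_SCOPE macro. FrameTimings and ScopedTimer are invented names, and only the category strings come from this report.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical accumulator of total time and hit count per category.
struct FrameTimings {
    std::map<std::string, std::chrono::nanoseconds> total;
    std::map<std::string, int> count;
};

// Times the enclosing scope and adds the result to `timings` on exit.
class ScopedTimer {
public:
    ScopedTimer(FrameTimings& timings, std::string category)
        : timings_(timings), category_(std::move(category)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        timings_.total[category_] +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
        ++timings_.count[category_];
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    FrameTimings& timings_;
    std::string category_;
    std::chrono::steady_clock::time_point start_;
};
```

Nesting a per-gauge scope inside a RenderHUD scope gives both the frame total and the per-gauge breakdown. With those numbers, each recommendation can be measured before and after.
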
Recommendations (Prioritized by Estimated Impact ÷ Effort)
1. Wire up the existing gr_2d_start_buffer/gr_2d_stop_buffer around the gauge loop in hud.cpp:2124-2147. Two lines, plus validation that draw order isn't disturbed by the mixed bitmap/NanoVG paths. Highest-leverage single change.
2. Reorder the gauge-render loop so canRender() is checked before preprocess() and onFrame(). Off-screen/disabled gauges should pay nothing. ~5-line change in hud.cpp:2124-2147.
3. Add proper tracing first: instrument the declared-but-unused RenderMainFrame and RenderHUD categories and add per-gauge-class scopes. This stops being a guessing game once you have numbers; without them, the rest of these recommendations are educated estimates.
4. Cache targetbox strings keyed on target signature + last-changed timestamp (hudtargetbox.cpp:1700-1873). The 20+ sprintfs and 4+ string-size measurements per frame collapse to ~0 when the target and its state haven't changed. Subsystem-name tokenization should be done once at target acquisition, not per frame.
5. Merge the two Missile_obj_list walks at hudtarget.cpp:3199 and 3369 into a single pass that handles both homing-missile tracking and remote-detonate brackets.
6. Add a temporal cache for hud_show_hostile_triangle (hudtarget.cpp:3694). The "current top threat" object rarely changes between frames, so recompute it only on a fixed interval or when invalidated by a death or IFF change.
7. Coalesce bracket lines (hudtarget.cpp:2842-2843, 6041-6114). Bracket corners are currently drawn as four separate renderLine calls. The existing line_draw_list machinery is already used by draw_brackets_square_quick; audit which call sites still bypass it.
8. Cache the vm_vec_normalize result at plot time for DRADIS blips (radardradis.cpp:118): one float[3] added to the blip struct.
9. Cache the escort list and invalidate it on ship birth/death events rather than rebuilding it from Ship_obj_list each frame (hudescort.cpp:608).
10. Lower priority: consider whether polish_predicted_target_pos (hudtarget.cpp:3953-3990) needs as many iterations as it does, and whether the generated 3D shield icons (hudshield.cpp:671) could be baked to textures at mission load for the common case.

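
The single-pass merge of the two Missile_obj_list walks could look roughly like this. Missile, MissileHudWork, warn_range and the two boolean flags are hypothetical simplifications; the real code derives them from the weapon and object records. The range test is an assumed stand-in for the existing per-missile distance check.

```cpp
#include <vector>

// Hypothetical missile record; the real loop walks Missile_obj_list and
// looks up the weapon and object for each entry.
struct Missile {
    float dist_to_player;
    bool homing_on_player;
    bool remote_detonate_owned;  // a player-fired remote-detonate missile
};

struct MissileHudWork {
    int homing_warnings = 0;
    int detonate_brackets = 0;
    int list_entries_visited = 0;
};

// One walk doing the jobs hud_process_homing_missiles and
// hud_process_remote_detonate_missile currently do in separate passes.
MissileHudWork process_missiles_single_pass(const std::vector<Missile>& missiles,
                                            float warn_range) {
    MissileHudWork work;
    for (const Missile& m : missiles) {
        ++work.list_entries_visited;
        if (m.homing_on_player && m.dist_to_player < warn_range) {
            ++work.homing_warnings;    // would feed the homing-missile gauge
        }
        if (m.remote_detonate_owned) {
            ++work.detonate_brackets;  // would rotate/project and bracket it
        }
    }
    return work;
}
```

Each list entry is visited once instead of twice. The saving grows with the number of live missiles, which is the case where the HUD is already under the most load.
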
Honest Caveats
- The "75+ render calls per frame" and "85-95% reduction" figures from the batching investigation are estimates from reading code, not measured. Before doing big work on item 1, instrument and measure (item 3) so you have a before/after.
- This investigation was read-only — no code was modified.
- Several recommendations (especially temporal caching of hostile triangle and escort list) need careful invalidation logic; the "save once on event" pattern is correct but bug-prone if any event source is missed.
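
The hostile-triangle recommendation names two triggers: a fixed interval, and invalidation on death or IFF change. Combining them, as in the hypothetical TopThreatCache below, bounds how long the cached threat can stay wrong if an event source is missed. The int result (standing in for an object signature) and the millisecond clock are assumptions of the sketch.

```cpp
#include <functional>
#include <utility>

// Hypothetical cache for the "current top threat". It recomputes when
// explicitly invalidated and also once `interval_ms` has elapsed, so a
// missed invalidation event cannot leave the answer stale indefinitely.
class TopThreatCache {
public:
    TopThreatCache(int interval_ms, std::function<int()> compute)
        : interval_ms_(interval_ms), compute_(std::move(compute)) {}

    // Call from death / IFF-change handlers.
    void invalidate() { dirty_ = true; }

    int get(int now_ms) {
        if (dirty_ || now_ms - last_ms_ >= interval_ms_) {
            threat_ = compute_();  // the expensive Ship_obj_list walk
            last_ms_ = now_ms;
            dirty_ = false;
        }
        return threat_;
    }

private:
    int interval_ms_;
    std::function<int()> compute_;
    bool dirty_ = true;  // first get() always computes
    int last_ms_ = 0;
    int threat_ = -1;
};
```

With a 250 ms interval, an invalidate() takes effect on the next call. Even without one, the first call made 250 ms or more after the last recompute refreshes the result.
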