HUD performance audit

As requested by @wookieejedi and investigated by Claude.

# HUD Performance Investigation — Report

Investigation dispatched five parallel subagents across batching/draw-call efficiency, algorithmic hot spots, radar & iteration cost, Lua/scripting overhead, and existing profiling instrumentation. Consolidated findings below.

## TL;DR — The Smoking Gun

The graphics layer already ships a 2D batching API designed exactly for this problem, and **nothing in the codebase calls it.** From [code/graphics/render.h:195-211](code/graphics/render.h#L195-L211):

> "Start buffering 2D rendering operations. This will defer rendering 2D interface elements until gr_2d_stop_buffer is called. **This can improve performance when doing a lot of 2D operations** since the actual drawing will only be done once."

A grep for `gr_2d_start_buffer` returns zero callers anywhere: not in the HUD, UI, mission load, or debug overlays. The HUD issues an estimated 75+ separate `renderLine`/`renderRect`/`renderCircle` calls per frame through the NanoVG path, and each one runs its own `beginFrame`/`endFrame` pair (see [code/graphics/render.cpp:771,785](code/graphics/render.cpp#L771)). Wrapping the gauge loop in [hud.cpp:2124-2147](code/hud/hud.cpp#L2124-L2147) with `gr_2d_start_buffer()`/`gr_2d_stop_buffer()` is a roughly two-line change that the engine was designed to accept, and the developers who suspected HUD draw-call overhead were right to flag it.

The caveat in the header — "might change the drawing order if incompatible rendering commands are executed" — is worth heeding (HUD bitmaps go through a different path than the NanoVG primitives), so this needs validation, not blind enable. But it's the single highest-leverage change identified.
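To illustrate the shape of that change, here is a minimal, self-contained sketch. The two `gr_2d_*` functions are stand-ins that only count submissions, and their no-argument signature is an assumption based on the header comment. `Hud2dBufferScope`, `draw_primitive`, and `render_gauges_demo` are hypothetical names that do not exist in the codebase. The RAII guard is one possible way to make sure buffering is always switched off again, even on an early return.

```cpp
#include <cassert>

// Stand-ins for the engine functions declared in render.h (assumed to take
// no arguments). They only count submissions so the effect is visible.
static int g_flushes = 0;        // number of actual submissions
static int g_pending = 0;        // primitives queued while buffering
static bool g_buffering = false;

void gr_2d_start_buffer() { g_buffering = true; }

void gr_2d_stop_buffer() {
    if (g_pending > 0) {         // one submission for everything queued
        ++g_flushes;
        g_pending = 0;
    }
    g_buffering = false;
}

// Stand-in for a NanoVG-backed primitive such as renderLine.
void draw_primitive() {
    if (g_buffering) {
        ++g_pending;             // deferred until gr_2d_stop_buffer
    } else {
        ++g_flushes;             // immediate: one submission per call
    }
}

// Hypothetical RAII guard: buffering stops when the scope ends.
struct Hud2dBufferScope {
    Hud2dBufferScope() { gr_2d_start_buffer(); }
    ~Hud2dBufferScope() { gr_2d_stop_buffer(); }
    Hud2dBufferScope(const Hud2dBufferScope&) = delete;
    Hud2dBufferScope& operator=(const Hud2dBufferScope&) = delete;
};

// Hypothetical stand-in for the gauge loop; returns submissions issued.
int render_gauges_demo(int primitive_count, bool buffered) {
    g_flushes = 0;
    if (buffered) {
        Hud2dBufferScope scope;  // wraps the whole loop
        for (int i = 0; i < primitive_count; ++i) draw_primitive();
    } else {
        for (int i = 0; i < primitive_count; ++i) draw_primitive();
    }
    return g_flushes;
}
```

In this toy model 75 unbuffered primitives produce 75 submissions and the buffered loop produces one. The model does not capture the draw-order caveat: bitmaps drawn between buffered primitives would still be submitted immediately, which is exactly what needs validating in the real engine.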

## Architecture Summary

**The HUD render pipeline per frame:**
1. `hud_render_preprocess()` ([hud.cpp:1929](code/hud/hud.cpp#L1929)) — targeting, navigation, brackets, missile tracking
2. `hud_render_all()` ([hud.cpp:2077](code/hud/hud.cpp#L2077)) → `hud_render_gauges()` iterates every gauge in the ship's `hud_gauges` vector (typically 50+), calling `preprocess()` → `onFrame()` → `setupRenderCanvas()` → `canRender()` → `render()` on each
3. Each gauge's `render()` calls primitives that route to `gr_line`, `gr_rect`, `gr_string`, `gr_bitmap`, etc.

**Batching status by primitive:**

| Primitive | Batched? | Notes |
|---|---|---|
| `renderLine` / `renderRect` / `renderCircle` / `renderGradientLine` | No — per-call `beginFrame`/`endFrame` | NanoVG path; would be batched if `buffering_nanovg=true` |
| `renderBitmap` / `gr_aabitmap` | No — immediate material submission per call | Separate path; not affected by `gr_2d_start_buffer` |
| `renderString` (VFNT fonts) | Partially — chars batched into 300-vertex CPU buffer, GPU submission is immediate | One submission per string |
| `renderString` (TTF/NVG fonts) | No equivalent CPU buffering | One draw per glyph in the worst case |

## Algorithmic Hot Spots (file:line)

**Multiple full-list walks per frame** (none individually O(n²), but they stack):

- [hudtarget.cpp:3199 `hud_process_homing_missiles`](code/hud/hudtarget.cpp#L3199) — walks every entry in `Missile_obj_list`, distance check per missile
- [hudtarget.cpp:3369 `hud_process_remote_detonate_missile`](code/hud/hudtarget.cpp#L3369) — **separate** walk of the same `Missile_obj_list` with `g3_rotate_vertex` + `g3_project_vertex` per missile
- [hudtarget.cpp:3694 `hud_show_hostile_triangle`](code/hud/hudtarget.cpp#L3694) — full `Ship_obj_list` walk + nested subsystem loop (3742-3763) computing turret distances every frame, with no temporal caching of "current top threat"
- [hudescort.cpp:608 `hud_create_complete_escort_list`](code/hud/hudescort.cpp#L608) — full `Ship_obj_list` iteration rebuilt every HUD frame
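As a rough illustration of how the two `Missile_obj_list` walks above could share one loop, the sketch below handles both concerns in a single pass. The `Missile` struct, its fields, and `process_missiles_single_pass` are simplified, hypothetical stand-ins. The real functions do more per missile (distance checks, vertex rotation and projection), so treat this as the loop structure only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified missile record; not the engine's object type.
struct Missile {
    float dist_to_player;
    bool homing_on_player;
    bool remote_detonate;
};

struct MissilePassResult {
    float nearest_homing_dist = -1.0f;     // -1 means no homing missile seen
    std::vector<int> remote_detonate_idx;  // indices that need a bracket
};

// One walk over the list that serves both consumers.
MissilePassResult process_missiles_single_pass(const std::vector<Missile>& missiles) {
    MissilePassResult out;
    for (int i = 0; i < static_cast<int>(missiles.size()); ++i) {
        const Missile& m = missiles[i];

        // Work that the homing-missile walk would do.
        if (m.homing_on_player &&
            (out.nearest_homing_dist < 0.0f ||
             m.dist_to_player < out.nearest_homing_dist)) {
            out.nearest_homing_dist = m.dist_to_player;
        }

        // Work that the remote-detonate walk would do.
        if (m.remote_detonate) {
            out.remote_detonate_idx.push_back(i);
        }
    }
    return out;
}
```

Whether the merge is safe depends on whether the two original functions run at different points in the frame for ordering reasons, which this sketch does not model.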

**Redundant per-frame recomputation in the targetbox** ([hudtargetbox.cpp:1700-1873](code/hud/hudtargetbox.cpp#L1700-L1873)): 20+ `sprintf`/`snprintf` calls and 4+ `gr_get_string_size` calls per frame for hull %, subsystem names, weapon names, AI mode — **with no caching even when the target hasn't changed.** Subsystem name pipe-tokenization via `strtok` happens per frame at line 1841.
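A minimal sketch of the kind of cache that would avoid this, assuming the displayed text depends only on the target's signature and a small set of values (here just hull percentage). `TargetTextCache` and its fields are hypothetical and the format string is illustrative. The real targetbox has many more inputs, each of which would need to be part of the cache key.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical cache for one formatted targetbox string.
struct TargetTextCache {
    int signature = -1;      // object signature the text was built for
    int hull_pct = -1;       // hull percentage the text was built for
    std::string hull_text;   // formatted once, reused while inputs match
    int rebuilds = 0;        // counts how often formatting actually ran

    const std::string& get(int sig, int pct) {
        if (sig != signature || pct != hull_pct) {
            char buf[32];
            std::snprintf(buf, sizeof(buf), "Hull: %d%%", pct);
            hull_text = buf;
            signature = sig;
            hull_pct = pct;
            ++rebuilds;
        }
        return hull_text;
    }
};
```

Calling `get(7, 85)` on consecutive frames formats the string once; a change in either the signature or the percentage triggers exactly one rebuild. The same pattern would apply to cached `gr_get_string_size` results, keyed on the string and the active font.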

**Math hot spots:**
- [hudtarget.cpp:3981 `polish_predicted_target_pos`](code/hud/hudtarget.cpp#L3981) — `vm_vec_dist_quick` inside an iterative loop (multiple sqrts per lead-indicator calculation)
- 14 distance-calculation sites in hudtarget.cpp alone, several inside ship/missile loops
- [radardradis.cpp:118](code/radar/radardradis.cpp#L118) — `vm_vec_normalize` per blip in the render path, when blip position is already known at plot time (single cache slot would fix it)
- [hudshield.cpp:671](code/hud/hudshield.cpp#L671) — generated 3D shield icons use the full `g3_start_frame` / matrix / projection pipeline per quadrant when most missions could use baked textures
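For the DRADIS item above, the idea of a single cache slot can be sketched as follows. `Vec3`, `Blip`, `plot_blip`, and `blip_render_dir` are simplified, hypothetical stand-ins rather than the engine's types, and returning a zero vector for a zero-length input is an assumption made here for safety, not the behavior of `vm_vec_normalize`.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical, simplified blip record.
struct Blip {
    Vec3 position;   // position relative to the viewer
    Vec3 unit_dir;   // proposed new field: direction cached at plot time
};

// Plain normalization; a zero-length input yields a zero vector here.
Vec3 normalized(const Vec3& v) {
    const float mag = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (mag <= 0.0f) return Vec3{0.0f, 0.0f, 0.0f};
    return Vec3{v.x / mag, v.y / mag, v.z / mag};
}

// Plot time: runs once per object per frame, so normalize here.
Blip plot_blip(const Vec3& rel_pos) {
    return Blip{rel_pos, normalized(rel_pos)};
}

// Render time: may run once per radar gauge; just reads the cached value.
const Vec3& blip_render_dir(const Blip& b) {
    return b.unit_dir;
}
```

The saving only materializes if the render path is reached more often than the plot path, or if the render path is the more latency-sensitive of the two; that is worth confirming with tracing before changing the struct.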

**Wasted gauge work** ([hud.cpp:2124-2147](code/hud/hud.cpp#L2124-L2147)): `preprocess()` and `onFrame()` are called on every gauge **before** `canRender()` is checked. Gauges that are off-screen, configured off, or popup-only-and-not-popped pay full preprocessing cost.
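The reordering can be sketched with a toy gauge type. `Gauge` here is a hypothetical minimal class that borrows the method names from the pipeline description, and `render_visible_gauges` is an invented name. The sketch assumes `canRender()` does not depend on state that `preprocess()` or `onFrame()` updates, and that no gauge relies on `onFrame()` for bookkeeping while hidden (popup timers, for example); both assumptions would need checking against the real gauge classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal gauge that counts how much work it was asked to do.
struct Gauge {
    explicit Gauge(bool v) : visible(v) {}

    bool canRender() const { return visible; }
    void preprocess() { ++preprocess_calls; }
    void onFrame() { ++on_frame_calls; }
    void render() { ++render_calls; }

    bool visible;
    int preprocess_calls = 0;
    int on_frame_calls = 0;
    int render_calls = 0;
};

// Check visibility first so hidden gauges pay no per-frame cost.
void render_visible_gauges(std::vector<Gauge>& gauges) {
    for (Gauge& g : gauges) {
        if (!g.canRender()) continue;
        g.preprocess();
        g.onFrame();
        g.render();
    }
}
```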

**Shared coordinate transforms are not shared.** The same target may be `g3_rotate_vertex`/`g3_project_vertex`-ed by `hud_show_targeting_gauges`, `hud_show_selection_set`, individual targeting gauges, and bracket drawing — no per-frame projection cache keyed by object signature.
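One possible shape for such a cache is sketched below. `ProjectionCache` and `ScreenPos` are hypothetical, and the projection itself is passed in as a callable so the sketch does not guess at the signatures of `g3_rotate_vertex` or `g3_project_vertex`. It assumes the view matrix does not change between HUD consumers within a frame.

```cpp
#include <cassert>
#include <unordered_map>

struct ScreenPos { float x, y; };

// Hypothetical per-frame cache keyed by object signature.
class ProjectionCache {
public:
    // Call once at the start of each HUD frame; stale entries are dropped.
    void begin_frame() { cache_.clear(); }

    // Returns the cached projection, computing it on first request only.
    template <typename ProjectFn>
    ScreenPos get(int signature, ProjectFn project) {
        auto it = cache_.find(signature);
        if (it == cache_.end()) {
            it = cache_.emplace(signature, project()).first;
        }
        return it->second;
    }

private:
    std::unordered_map<int, ScreenPos> cache_;
};
```

With this in place, the first consumer that asks for a given target in a frame pays for the projection and later consumers get the stored result. Clearing at frame start keeps invalidation trivial.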

## What's Actually Fine

- **Radar blip generation** (radarsetup.cpp). Plotted once per object at post-move time and shared across all radar gauges via global blip lists. No per-gauge re-iteration. This is the right pattern; the rest of the HUD should learn from it.
- **Scripting overhead** is well-optimized for the unhooked common case — `ActiveActions` hash lookup is the cost. Heavily-scripted HUDs would benefit from frame-constant Lua-value caching (Player.Position, etc.) but vanilla missions pay essentially nothing.
- **HUD table parsing** (hudparse.cpp's 5729 lines) is load-time only, not per-frame.

## What's Already Instrumented

Only three `TRACE_SCOPE` points currently emit usable data: `RenderHUDGauge` ([hud.cpp:2142,2162](code/hud/hud.cpp#L2142)), `RenderTargetingBracket` ([hudbrackets.cpp:396](code/hud/hudbrackets.cpp#L396)), `RenderNavBracket` ([hudbrackets.cpp:509](code/hud/hudbrackets.cpp#L509)). The categories `RenderMainFrame`, `RenderHUD`, `RenderHUDHook` are declared in [tracing/categories.h](code/tracing/categories.h) but never emit events — instrumenting them first would let you verify these recommendations against real numbers rather than estimates.
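For readers unfamiliar with scope-based instrumentation, the sketch below shows the general pattern in isolation. It is not the engine's `TRACE_SCOPE` implementation: `ScopedTimer`, `ScopeStats`, `scope_stats`, and `render_hud_frame` are hypothetical, and only the category names are taken from the text above.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

struct ScopeStats {
    long long total_ns = 0;  // accumulated time spent inside the scope
    int hits = 0;            // number of times the scope was entered
};

// Global table of per-category statistics.
std::map<std::string, ScopeStats>& scope_stats() {
    static std::map<std::string, ScopeStats> stats;
    return stats;
}

// RAII timer: records elapsed time for its category when it goes out of scope.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        ScopeStats& s = scope_stats()[name_];
        s.total_ns +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
        ++s.hits;
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical frame: one outer HUD scope, one inner scope per gauge.
void render_hud_frame(int gauge_count) {
    ScopedTimer hud_scope("RenderHUD");
    for (int i = 0; i < gauge_count; ++i) {
        ScopedTimer gauge_scope("RenderHUDGauge");
        // Gauge rendering would happen here.
    }
}
```

Nesting an outer `RenderHUD` scope around the per-gauge scopes is what makes it possible to tell how much of the HUD's frame time is gauge rendering and how much is preprocessing or hooks.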

## Recommendations (Prioritized by Estimated Impact ÷ Effort)

1. **Wire up the existing `gr_2d_start_buffer`/`gr_2d_stop_buffer` around the gauge loop** in [hud.cpp:2124-2147](code/hud/hud.cpp#L2124-L2147). Two lines plus validation that draw order isn't disturbed by the mixed bitmap/NanoVG paths. Highest-leverage single change.

2. **Reorder the gauge-render loop** so `canRender()` is checked *before* `preprocess()` and `onFrame()`. Off-screen/disabled gauges should pay nothing. ~5-line change in [hud.cpp:2124-2147](code/hud/hud.cpp#L2124-L2147).

3. **Add proper tracing, ideally before starting the other items.** Instrument the declared-but-unused `RenderMainFrame` and `RenderHUD` categories and add per-gauge-class scopes. Once real numbers exist, this stops being a guessing game; until then, the rest of these recommendations are educated estimates.

4. **Cache targetbox strings keyed on target signature + last-changed timestamp** ([hudtargetbox.cpp:1700-1873](code/hud/hudtargetbox.cpp#L1700-L1873)). 20+ sprintfs and 4+ string-size measurements per frame collapse to ~0 when target/state hasn't changed. Subsystem-name tokenization should be done once at target acquisition, not per frame.

5. **Merge the two `Missile_obj_list` walks** at [hudtarget.cpp:3199](code/hud/hudtarget.cpp#L3199) and [3369](code/hud/hudtarget.cpp#L3369) into a single pass that handles both homing-missile tracking and remote-detonate brackets.

6. **Temporal cache for `hud_show_hostile_triangle`** ([hudtarget.cpp:3694](code/hud/hudtarget.cpp#L3694)) — the "current top threat" object rarely changes between frames; recompute only on a fixed interval or when invalidated by death/IFF change.

7. **Coalesce bracket lines** ([hudtarget.cpp:2842-2843, 6041-6114](code/hud/hudtarget.cpp#L2842)). Bracket corners are currently drawn as four separate `renderLine` calls. The `line_draw_list` machinery already exists and is used by `draw_brackets_square_quick`, so audit which call sites still bypass it and route them through it.

8. **Cache `vm_vec_normalize` result at plot time for DRADIS blips** ([radardradis.cpp:118](code/radar/radardradis.cpp#L118)) — one float[3] added to the blip struct.

9. **Cache escort list and invalidate on ship birth/death events** rather than rebuilding from `Ship_obj_list` each frame ([hudescort.cpp:608](code/hud/hudescort.cpp#L608)).

10. **Lower-priority:** consider whether `polish_predicted_target_pos` ([hudtarget.cpp:3953-3990](code/hud/hudtarget.cpp#L3953)) needs as many iterations as it does, and whether the generated 3D shield icons ([hudshield.cpp:671](code/hud/hudshield.cpp#L671)) could be baked to textures at mission load for the common case.
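The temporal-caching items (6 and 9) share one pattern: recompute on a fixed interval or when an event invalidates the cached value. A minimal sketch of that pattern follows. `ThreatCache` is hypothetical, the expensive scan is passed in as a callable, and frame counts stand in for whatever timestamp the engine would actually use.

```cpp
#include <cassert>

// Hypothetical interval-plus-invalidation cache for an object index.
class ThreatCache {
public:
    explicit ThreatCache(int refresh_interval_frames)
        : interval_(refresh_interval_frames) {}

    // Call from death / IFF-change handlers to force a rescan.
    void invalidate() { valid_ = false; }

    // Returns the cached threat, rescanning only when stale or invalidated.
    template <typename ScanFn>
    int get(int frame, ScanFn scan) {
        if (!valid_ || frame - last_scan_frame_ >= interval_) {
            threat_ = scan();
            last_scan_frame_ = frame;
            valid_ = true;
        }
        return threat_;
    }

private:
    int interval_;
    int last_scan_frame_ = 0;
    int threat_ = -1;
    bool valid_ = false;
};
```

The interval acts as a safety net: if an invalidation event source is missed, the cached value is wrong for at most `interval_` frames instead of indefinitely.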

## Honest Caveats

- The "75+ render calls per frame" and "85-95% reduction" figures from the batching investigation are estimates from reading code, not measured. Before doing big work on item 1, instrument and measure (item 3) so you have a before/after.
- This investigation was read-only — no code was modified.
- Several recommendations (especially temporal caching of hostile triangle and escort list) need careful invalidation logic; the "save once on event" pattern is correct but bug-prone if any event source is missed.

