fix: exclude paused sections from instrumentation measurement#44
fix: exclude paused sections from instrumentation measurement#44not-matthias wants to merge 3 commits into
Conversation
Merging this PR will improve performance by ×32
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | WallTime | BM_memcpy[8192] |
472.7 ns | 540.1 ns | -12.49% |
| ❌ | WallTime | BM_FibonacciRecursive_Darwin[35] |
35.3 ms | 39.6 ms | -10.78% |
| ⚡ | Simulation | BM_large_setup |
13,179,310.3 ns | 882.8 ns | ×15,000 |
| ⚡ | Simulation | BM_large_setup_teardown |
13,179,436.1 ns | 989.2 ns | ×13,000 |
| ⚡ | Simulation | BM_large_setup |
10,561,210.6 ns | 913.9 ns | ×12,000 |
| ⚡ | Simulation | BM_large_setup_teardown |
10,561,302.5 ns | 933.6 ns | ×11,000 |
| ⚡ | WallTime | BM_Vector_PushBack[10000] |
39.7 µs | 34.1 µs | +16.49% |
| ⚡ | WallTime | BM_Vector_Reserve[10000] |
36.1 µs | 31.2 µs | +15.46% |
| ⚡ | WallTime | BM_Vector_Reserve[1000] |
3.6 µs | 3.2 µs | +14.54% |
| ⚡ | WallTime | BM_Vector_PushBack[1000] |
4.3 µs | 3.7 µs | +14.25% |
| ⚡ | WallTime | BM_Vector_Reserve[100] |
390.6 ns | 350.8 ns | +11.34% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing cod-2033-codspeed-cpp-includes-setupteardown-in-instrumentation-mode (5f8b3af) with main (f5a917f)
Congrats! CodSpeed is installed 🎉
You will start to see performance impacts in the reports once the benchmarks are run from your default branch.
|
646ed4c to
696eaa1
Compare
Greptile SummaryThis PR hooks
Confidence Score: 4/5Safe to merge with minor follow-up; the core toggle logic is sound and the boolean guard correctly handles all PauseTiming/ResumeTiming parity scenarios. The Callgrind toggle guard is designed correctly and handles double-pause, post-loop-pause, and skip paths. The main concern is core/include/measurement.hpp — the Important Files Changed
|
4dd226e to
8f254f0
Compare
PauseTiming/ResumeTiming now stop/start Callgrind instrumentation in CODSPEED_ANALYSIS mode, ensuring setup/teardown code is not included in benchmark measurements.
Replace the per-pause CALLGRIND_STOP/START_INSTRUMENTATION calls with CALLGRIND_TOGGLE_COLLECT. Toggling instrumentation flushes callgrind's simulated cache, so every benchmark measured an artificial cold-cache warmup phase and the cache-warming repetition in RunAnalysis was wasted, regressing 76 benchmarks. Toggling collection excludes paused sections without touching simulator state. Gate the toggles to explicit user PauseTiming()/ResumeTiming() calls: TOGGLE_COLLECT is parity-based and collection starts enabled, so the implicit StartKeepRunning()/FinishKeepRunning() bracket must not toggle or it would invert the collection state for the whole run. Bump instrument-hooks for the new callgrind_toggle_collect helper. Fixes COD-2033
…-state Toggle callgrind collection off once at process start and make the PauseTiming()/ResumeTiming() toggles unconditional. Collection is now only enabled inside the benchmark loop, so State setup, timer reads and instrument-hooks zero/dump requests no longer appear in the measurement, and the codspeed_in_benchmark_loop_ gating flag is no longer needed. The toggle is inlined via CALLGRIND_TOGGLE_COLLECT directly (instead of calling the instrument-hooks wrapper) so no toggle frame shows up in flamegraphs; the counted boundary shrinks to the ResumeTiming() epilogue (~6 instructions). SkipWithMessage/SkipWithError restore the toggle parity when a benchmark is skipped mid-loop without pausing. Refs COD-2033
8f254f0 to
5f8b3af
Compare
| // simulated cache, so it adds no artificial cold-cache cost to the | ||
| // measured region. | ||
| if (codspeed_in_benchmark_loop_) { | ||
| callgrind_toggle_collect(); |
There was a problem hiding this comment.
We are in an explicit "PauseTiming" function, using a toggle function rather than an explicit "stop_callgrind" function is most likely a source of bugs
| callgrind_start_instrumentation(); | ||
| // Re-enable callgrind cost collection after a paused section. | ||
| if (codspeed_in_benchmark_loop_) { | ||
| callgrind_toggle_collect(); |
There was a problem hiding this comment.
same remark here
| // Suspend callgrind cost collection for the paused section. Toggling | ||
| // collection (unlike stopping instrumentation) does not flush the | ||
| // simulated cache, so it adds no artificial cold-cache cost to the | ||
| // measured region. |
There was a problem hiding this comment.
Comments are a bit superfluous here
| // True between the implicit ResumeTiming() in StartKeepRunning() and the | ||
| // implicit PauseTiming() in FinishKeepRunning(). Only explicit user | ||
| // PauseTiming()/ResumeTiming() calls (inside the benchmark loop) may | ||
| // toggle callgrind collection: CALLGRIND_TOGGLE_COLLECT is parity-based, | ||
| // and collection starts enabled, so the implicit bracket calls must not | ||
| // toggle or they would invert the collection state for the whole run. |
There was a problem hiding this comment.
Can we trim down this comment and just focus on the sequence of events with the implicit pause/resume that explain why we need this?
Also, if we switch to explicit start and stop callgrind functions, is this still needed?
|
I had not noticed that at the valgrind level there was only toggle. I dont think we should expose explicit start stop then, lgtm |
PauseTiming/ResumeTiming now stop/start Callgrind instrumentation in
CODSPEED_ANALYSIS mode, ensuring setup/teardown code is not included
in benchmark measurements.
Depends on CodSpeedHQ/instrument-hooks#29