[efficiency-improver] perf: eliminate redundant ConcurrentDictionary updates in DiscoveryDataAggregator hot path#16213
Conversation
…taAggregator hot path
In MarkSourcesBasedOnDiscoveredTestCases, the original code called
MarkSourcesWithStatus(new[] { currentSource }, PartiallyDiscovered)
for every test case when the source had not changed. This resulted in:
- O(N) ConcurrentDictionary.AddOrUpdate calls per source (N = test count)
- O(N) single-element array allocations
After the first call, the status is already PartiallyDiscovered so all
subsequent calls with the same source are no-ops. The fix is to only
call MarkSourceWithStatus when previousSource is null (first encounter).
Also extracts a private MarkSourceWithStatus(string?, DiscoveryStatus)
helper to avoid the array allocation and IEnumerable overhead on the
three remaining single-source call sites.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR optimizes the CrossPlatEngine discovery hot path by avoiding redundant ConcurrentDictionary.AddOrUpdate calls and single-element array allocations while tracking per-source discovery status in DiscoveryDataAggregator.
Changes:
- Refactors source-status updates by extracting a single-source helper (
MarkSourceWithStatus) and havingMarkSourcesWithStatusdelegate to it. - Updates
MarkSourcesBasedOnDiscoveredTestCasesto skip redundant “same source” updates and call the single-source helper directly to avoidnew[] { ... }allocations.
| } | ||
| if (status != DiscoveryStatus.NotDiscovered) | ||
| { | ||
| EqtTrace.Warning($"DiscoveryDataAggregator.MarkSourcesWithStatus: Undiscovered {source} added with status: '{status}'."); |
| } | ||
| else | ||
| { | ||
| EqtTrace.Verbose($"DiscoveryDataAggregator.MarkSourcesWithStatus: Adding {source} with status: '{status}'."); |
| return status; | ||
| }, | ||
| (_, previousStatus) => | ||
| EqtTrace.Warning($"DiscoveryDataAggregator.MarkSourcesWithStatus: Downgrading source {source} status from '{previousStatus}' to '{status}'."); |
| return status; | ||
| }); | ||
| } | ||
| EqtTrace.Verbose($"DiscoveryDataAggregator.MarkSourcesWithStatus: Upgrading {source} status from '{previousStatus}' to '{status}'."); |
nohwnd
left a comment
There was a problem hiding this comment.
🧠 Expert Review — DiscoveryDataAggregator hot-path optimization
Dimensions activated: Parallel Execution & Scheduling Safety · Algorithmic Correctness · Performance & Allocations · Error Reporting & Diagnostic Clarity
Algorithmic Correctness ✅
The condition change from previousSource is null || previousSource == currentSource to previousSource is null is semantically equivalent for all reachable states.
The key invariant being relied on is: the value returned by MarkSourcesBasedOnDiscoveredTestCases is always a source already recorded as PartiallyDiscovered in the dictionary. This holds because:
- Branch 1 (
previousSource is null): addscurrentSourceasPartiallyDiscoveredbefore settingpreviousSource. - Branch 2 (
currentSource != previousSource): markscurrentSourceasPartiallyDiscoveredbefore it becomes the newpreviousSource. - Skip case (
currentSource == previousSource, non-null): the source was alreadyPartiallyDiscoveredfrom a prior iteration or batch, so no update needed.
ConcurrentDictionary has no Remove, so a recorded status can't be erased between batches — the invariant is stable across calls. ✅
Parallel Execution & Scheduling Safety ✅
Multiple proxies share this aggregator but each maintains its own previousSource locally. All shared state is ConcurrentDictionary.AddOrUpdate, which is thread-safe. The optimization skips AddOrUpdate only when the local previousSource equals currentSource, which is purely local state. No new race window is introduced. ✅
The non-volatile read of _isMessageSent is consistent with the pre-existing pattern and was not changed by this PR.
Performance & Allocations ✅
The claimed savings are real. For N tests in a single assembly, the old path allocated N string[1] arrays and called AddOrUpdate N times (all no-ops after the first). The new path does one call on first encounter then zero for the rest — O(1) per source rather than O(N).
The new[] allocation and the IEnumerable<string?> dispatch overhead in the former MarkSourcesWithStatus call are also eliminated at the three loop call sites.
Error Reporting & Diagnostic Clarity ⚠️ (minor, non-blocking)
The trace messages inside MarkSourceWithStatus still reference DiscoveryDataAggregator.MarkSourcesWithStatus: as the method identifier. With the refactoring, this log prefix now points to the public entry point rather than the actual call site. This is a mild diagnostic inconsistency if someone is triaging from traces alone — the message origin will appear to be MarkSourcesWithStatus when the actual frame is the private helper. Not a correctness issue; noting it for completeness.
PR Description Alignment ✅
Title and description accurately describe all three changes in the diff (condition narrowing, private helper extraction, MarkSourcesWithStatus delegation). The before/after table, invariant explanation, and complexity claims all match the code.
No blocking issues found. The optimization is correct, the refactoring is clean, and test coverage (45 existing unit tests) validates the unchanged observable semantics.
🧠 Reviewed by expert-reviewer
🧠 Reviewed by Expert Code Reviewer 🧠
Goal and Rationale
Eliminate O(N) redundant
ConcurrentDictionary.AddOrUpdatecalls and single-element array allocations on the discovery hot path inDiscoveryDataAggregator.MarkSourcesBasedOnDiscoveredTestCases.Focus area: Code-Level Efficiency
Problem
The original inner loop called:
This fires for every test case when consecutive tests share a source assembly (the common case). For a project with N tests in one assembly it means:
new[] { currentSource }array allocations (~32 bytes each on 64-bit)ConcurrentDictionary.AddOrUpdatecalls with lock acquisitionPartiallyDiscoveredunchangedAfter the first call, the status is already
PartiallyDiscovered; all subsequent calls with the same source are pure no-ops.Approach
Skip redundant updates: change the condition from
previousSource is null || previousSource == currentSourceto justpreviousSource is null. When the source matches the previous one, the status is already set — skip entirely.Extract
MarkSourceWithStatus(string?, DiscoveryStatus): private single-source helper that callsAddOrUpdatedirectly, eliminating thenew[]array, theIEnumerableoverhead, and the null guard onsourcesfor the three remaining single-source call sites in the loop.Refactor
MarkSourcesWithStatusto delegate to the new helper, removing the duplicatedAddOrUpdatelogic.Energy Efficiency Evidence
Proxy metric: Memory allocation (GC pressure) and CPU cycles (lock contention) on the discovery path
string[1]per test caseConcurrentDictionary.AddOrUpdatecallsFor a solution with 10,000 tests in a single assembly, this eliminates ~10,000 heap allocations and ~10,000 lock acquisitions from the discovery path per
dotnet testrun.Green Software Foundation context:
dotnet testinvocation, proportional to test countTrade-offs
None. The semantics are identical:
AddOrUpdatewith a value that already matches the stored value is a pure no-op. All 45DiscoveryDataAggregatorunit tests (including source-tracking regression tests) pass without modification.Reproducibility
Test Status
DiscoveryDataAggregatortests: 45/45 passed, 0 failuresmainand this branch)