feat(detect): surface windowed match count/proportion in evidence by Lioscro · Pull Request #247 · ArcInstitute/cyto

Lioscro · 2026-06-23T22:58:07Z

Summary

Adds windowed_match_count: usize and windowed_match_proportion: f64 to ComponentEvidence. They sum hits across [pos ± remap_window] on the same mate, surfacing what cyto map --remap-window N would score for each component.
Plumbs per-file PositionAccumulators into validate_and_aggregate so the aggregated path merges accumulators and re-walks at max_remap_window (centered on file 0's best position).
Surfaces the new values inline in both stderr log functions. No CLI changes, no mapper changes, no stdout-contract change.
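As a rough illustration of the summary above, the windowed values can be sketched as a range sum over per-position hit counts. This is a minimal sketch, not the code in this PR: `PositionCounts` is a hypothetical stand-in for whatever `PositionAccumulator` stores for one component on one mate, and dividing by the total number of sampled reads is an assumption inferred from the log output in this description.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one component's per-position hit counts on one mate.
/// Keys are read positions; values are the number of sampled reads matching there.
type PositionCounts = BTreeMap<usize, usize>;

/// Sum the hits in the inclusive range [best_pos - window, best_pos + window].
/// Saturating arithmetic clamps the bounds at 0 and usize::MAX instead of wrapping.
fn windowed_match_count(counts: &PositionCounts, best_pos: usize, window: usize) -> usize {
    let lo = best_pos.saturating_sub(window);
    let hi = best_pos.saturating_add(window);
    counts.range(lo..=hi).map(|(_, n)| *n).sum()
}

/// Windowed hits as a fraction of sampled reads (assumed denominator).
/// Returns 0.0 when nothing was sampled, to avoid dividing by zero.
fn windowed_match_proportion(windowed: usize, total_reads: usize) -> f64 {
    if total_reads == 0 {
        0.0
    } else {
        windowed as f64 / total_reads as f64
    }
}
```

With a window of 0 the range collapses to the single canonical position, so the windowed count equals the plain match count in this sketch.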

Why

Per-component evidence today reports counts at a single canonical position only. When a library has positional drift -- V2 GEX [:18] spacer drift, V1 [probe] jitter -- the headline proportion looks low even though cyto map --remap-window N would score most of those reads. The windowed values let users see the effective rate at the recommended window.

What this looks like

[2026-06-23T23:00:43.804Z INFO  cyto_map::detect] Detected geometry: `[barcode][umi:12][:13][probe] | [gex]`
[2026-06-23T23:00:43.804Z INFO  cyto_map::detect] Recommended --remap-window: 2
[2026-06-23T23:00:43.804Z INFO  cyto_map::detect] Detection sampled 400000 reads total (4 files)
[2026-06-23T23:00:43.804Z INFO  cyto_map::detect]   [barcode] R1 pos=0 count=371755 proportion=0.9294 windowed_count=372522 windowed_proportion=0.9313
[2026-06-23T23:00:43.804Z INFO  cyto_map::detect]     alt: R2 pos=17 count=19615
[2026-06-23T23:00:43.804Z INFO  cyto_map::detect]     alt: R2 pos=3 count=9423
[2026-06-23T23:00:43.804Z INFO  cyto_map::detect]     alt: R2 pos=31 count=6000
[2026-06-23T23:00:43.804Z INFO  cyto_map::detect]   [gex] R2 pos=0 count=371995 proportion=0.9300 windowed_count=371995 windowed_proportion=0.9300
[2026-06-23T23:00:43.804Z INFO  cyto_map::detect]   [probe] R1 pos=41 count=101140 proportion=0.2529 windowed_count=378998 windowed_proportion=0.9475
[2026-06-23T23:00:43.804Z INFO  cyto_map::detect]     alt: R1 pos=40 count=96612
[2026-06-23T23:00:43.804Z INFO  cyto_map::detect]     alt: R1 pos=42 count=90595
[2026-06-23T23:00:43.804Z INFO  cyto_map::detect]     alt: R1 pos=43 count=82843
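The numbers in the log above can be cross-checked with a little arithmetic. The sketch below assumes that proportions are count / total sampled reads (400000 here), that the window in effect is the recommended 2, so `[probe]` covers R1 positions 39 through 43, and that only the top three alternates are listed. Both helper names are hypothetical.

```rust
/// Format count / total with four decimal places, as the log lines appear to.
fn proportion_4dp(count: u64, total: u64) -> String {
    format!("{:.4}", count as f64 / total as f64)
}

/// Hits inside the window that are not explained by the best position or by
/// the listed alternates that fall inside the window.
fn unlisted_in_window(windowed: u64, best: u64, listed_alts: &[u64]) -> u64 {
    let listed: u64 = listed_alts.iter().sum();
    windowed.saturating_sub(best).saturating_sub(listed)
}
```

For `[probe]`, `proportion_4dp(378998, 400000)` gives `0.9475`, matching the logged `windowed_proportion`. The best position plus the three listed alternates (positions 40, 42, 43) sum to 371190, so `unlisted_in_window(378998, 101140, &[96612, 90595, 82843])` leaves 7808 hits. Under the assumptions above, those would sit at position 39, the only remaining position in the window.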

Test plan

cargo test -p cyto-map: 76 unit tests (69 existing + 7 new) and 4 integration tests, all passing.
cargo test --workspace: green.
cargo clippy -p cyto-map --all-targets --no-deps -- -D warnings: no new errors beyond the 10 lib + 1 lib-test errors already present on base.
Mutation experiment (two rounds): Mutation A (range predicate collapsed to *p == best_pos) is caught by tests 2, 3, 5, 6, 7. Mutation B (formula short-circuited to 0) is caught by tests 1, 2, 3 and the integration assertion. The union covers every new test; production code was restored after each round.
Manual fixture smoke (cyto detect gex and cyto detect crispr): windowed tokens emitted on every per-component line; probe shows windowed_count > match_count on V1 fixture; recommended-remap-window line still emitted; stdout contract unchanged.
Integration test test_detect_gex_geometry_from_binseq asserts probe positional drift via strict windowed_match_count > match_count.

🤖 Generated with Claude Code

Single-position `match_count`/`match_proportion` describe only the canonical position; they understate what `cyto map --remap-window N` would actually score when libraries have positional drift (V2 GEX `[:18]` spacer drift, V1 `[probe]` jitter). Add `windowed_match_count`/`windowed_match_proportion` that sum hits over `[pos ± remap_window]` on the same mate, so detect's stderr lets users see the effective match rate at the recommended window.

In `validate_and_aggregate`, per-file `PositionAccumulator`s are now plumbed through and merged for an aggregated re-walk at `max_remap_window` -- naive sum-of-per-file under-counts when per-file `W` differs from the aggregated `W`. Test `test_validate_and_aggregate_windowed_cross_file` exercises this divergence (merged 12500 vs sum-of-per-file 12000).

For short references like the 8bp `[probe]` multiplex barcode, a single read can contribute hits at multiple positions in the window, so `windowed_match_count` is an upper bound on what `cyto map` would score, not an exact count. The doc comment on `ComponentEvidence::windowed_match_count` calls this out explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
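The cross-file divergence described in this commit message can be illustrated with a small sketch. The per-file counts and per-file windows below are invented so that the totals match the 12500 vs 12000 figures quoted above; they are not the fixture used by `test_validate_and_aggregate_windowed_cross_file`, and `PositionCounts`, `windowed`, `merge`, and `example_files` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one file's per-position hit counts.
type PositionCounts = BTreeMap<usize, usize>;

/// Sum hits in the inclusive window [center - w, center + w].
fn windowed(counts: &PositionCounts, center: usize, w: usize) -> usize {
    let lo = center.saturating_sub(w);
    let hi = center.saturating_add(w);
    counts.range(lo..=hi).map(|(_, n)| *n).sum()
}

/// Merge per-file counts by adding the hits recorded at each position.
fn merge(files: &[PositionCounts]) -> PositionCounts {
    let mut merged = PositionCounts::new();
    for file in files {
        for (pos, n) in file {
            *merged.entry(*pos).or_insert(0) += *n;
        }
    }
    merged
}

/// Illustrative data only: file A is assumed to have a per-file window of 0,
/// file B a per-file window of 2; both share best position 10.
fn example_files() -> (PositionCounts, PositionCounts) {
    let a = BTreeMap::from([(10, 5000), (11, 500)]);
    let b = BTreeMap::from([(10, 3000), (12, 4000)]);
    (a, b)
}
```

Summing per-file results gives `windowed(&a, 10, 0) + windowed(&b, 10, 2)`, which is 5000 + 7000 = 12000, because file A's 500 hits at position 11 fall outside its own window. Merging first and walking once at the larger window, `windowed(&merge(&[a, b]), 10, 2)`, picks those hits up and yields 12500.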

gemini-code-assist

Code Review

This pull request introduces windowed match counts and proportions to the geometry detection module to better estimate read counts within a remap window, addressing positional drift in short references like probes. It updates ComponentEvidence and PositionAccumulator to calculate and aggregate these metrics across files, and adds comprehensive unit and integration tests. The feedback suggests using saturating_add when calculating the upper bound of the window to defensively prevent potential overflow panics.

Mirror the saturating_sub already used for the lower bound. Overflow is unreachable in practice (both best_pos and window are bounded by read length) but the symmetry makes the intent obvious and matches the doc-comment range expression.

Addresses Gemini PR #247 review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
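A minimal sketch of the symmetric bounds this commit describes, using only the standard-library `usize::saturating_sub` and `usize::saturating_add`. The function name `window_bounds` is hypothetical and not taken from `detect.rs`.

```rust
/// Inclusive window bounds around `best_pos`, clamped rather than wrapped.
/// The lower bound stops at 0; the upper bound stops at usize::MAX.
fn window_bounds(best_pos: usize, window: usize) -> (usize, usize) {
    (
        best_pos.saturating_sub(window),
        best_pos.saturating_add(window),
    )
}
```

A plain `best_pos + window` would panic on overflow in debug builds and wrap in release builds by default; the saturating form has the same result for every realistic read position and simply clamps in the unreachable case.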

gemini-code-assist Bot reviewed Jun 23, 2026

Comment thread crates/cyto-map/src/detect.rs Outdated

Lioscro wants to merge 2 commits into joseph.min/remap-window-probe-inclusion from joseph.min/windowed-match-proportion
