Skip to content

Phase 3: write-path correctness, CRAM-writer hardening, contract docs#22

Merged
mbreese merged 3 commits into
mainfrom
phase3-correctness
Jun 11, 2026
Merged

Phase 3: write-path correctness, CRAM-writer hardening, contract docs#22
mbreese merged 3 commits into
mainfrom
phase3-correctness

Conversation

@mbreese

@mbreese mbreese commented Jun 11, 2026

Copy link
Copy Markdown
Member

Summary

Third QC pass, covering valid-input correctness and cross-implementation consistency that Phases 1–2 (cosmetic cleanups; malformed-input hardening) didn't address. Theme continues the prior phases: turn silent corruption/loss into explicit errors, and make SAM/BAM/CRAM behave consistently.

Write-path error handling

  • BAM encodeRecord now checks the return of every variable-length write (name/cigar/seq/qual/aux) — previously only the fixed fields were checked, so a mid-stream I/O error was silently swallowed and the writer kept emitting onto a failed sink.
  • Propagate the final gzip flush error in CRAM encodeBlock, and the gzip Close error in the SAM text reader's Close.

CRAM writer correctness & contract

  • Close() is now idempotent (matches BAM): a double-close no longer writes a second EOF container or double-closes the file; Write after Close errors.
  • New htsio.CigarQueryLen / htsio.ValidateCigarSeq; the CRAM writer rejects a record whose CIGAR query length ≠ SEQ length instead of silently dropping inserted/soft-clipped bases (it reconstructs SEQ from the CIGAR). The BAM writer stores SEQ verbatim, so it intentionally stays lenient.
  • Copy WriterOpts at construction so caller mutation can't change writer behavior mid-stream.
  • Clarifying comment on the RR flag (investigated — confirmed correct).

Concurrency audit (verify-then-fix)

The sorted BAM writer and parallel BGZF writer were reviewed and confirmed race-free (go test -race clean) and leak-free — no code changes needed. Documented the Semaphore.Close() contract. (Reporting this explicitly so the "no change" is a deliberate, verified outcome, not an omission.)

API contracts

Aligned and documented Close() semantics (idempotent, flushes, writes trailer) across all three writers — the SAM writer is now idempotent too — and documented SamReader's single-pass / shared-header / freshly-allocated-record semantics on the interface.

Tests

CIGAR query-length + validation; CRAM double-Close, write-after-close, CIGAR/SEQ-mismatch rejection; BAM mismatch leniency + failing-sink error propagation; SAM double-Close idempotency.

Verification

go build ./..., go vet ./..., go test ./... all green (with samtools 1.21), plus go test -race clean on the bgzf/bam/ontcmd/align packages.

🤖 Generated with Claude Code

mbreese and others added 3 commits June 10, 2026 20:25
Audits valid-input correctness and behavioral consistency that Phases 1–2
(cosmetic cleanups, malformed-input hardening) did not cover.

Write-path error handling:
- BAM encodeRecord now checks the return of every variable-length write
  (name/cigar/seq/qual/aux), matching the already-checked fixed fields, so
  a mid-stream I/O error propagates instead of silently truncating output.
- Propagate the final gzip flush error in the CRAM encodeBlock path, and the
  gzip Close error on the SAM text reader's Close.

CRAM writer correctness & contract:
- Close() is now idempotent (closed flag), so a double-close no longer writes
  a second EOF container or double-closes the file; Write after Close errors.
- New htsio.CigarQueryLen / htsio.ValidateCigarSeq; the CRAM writer rejects a
  record whose CIGAR query length disagrees with its SEQ length rather than
  silently dropping inserted/soft-clipped bases (it reconstructs SEQ from the
  CIGAR). The BAM writer stores SEQ verbatim, so it stays lenient by design.
- Copy WriterOpts at construction so post-construction caller mutation cannot
  change writer behavior.
- Clarify the RR (reference-required) flag (confirmed correct, not a bug).

Concurrency audit (verify-then-fix): the sorted BAM writer and the parallel
BGZF writer were reviewed and confirmed race-free (go test -race clean) and
leak-free — no code changes needed. Documented the Semaphore.Close() contract.

API contracts: aligned and documented Close() semantics (idempotent, flushes,
writes trailer) across the SAM/BAM/CRAM writers — the SAM writer is now
idempotent too — and documented SamReader's single-pass / shared-header /
freshly-allocated-record semantics.

Tests: cigar query-length + validation; CRAM double-Close, write-after-close,
and CIGAR/SEQ-mismatch rejection; BAM mismatch leniency + failing-sink error
propagation; SAM double-Close idempotency.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase 3 added htsio.ValidateCigarSeq to the CRAM writer but left the BAM
writer lenient. A BAM record whose CIGAR query length disagrees with len(SEQ)
is malformed per the SAM spec, so the BAM writer now rejects it at Write time
too, consistent with CRAM.

Reworked the UMI-dedup tests to build well-formed records: the shared rec()
helper now sizes SEQ to the CIGAR query length (via htsio.CigarQueryLen), using
the caller's seq as a base pattern repeated/truncated to fit — a single helper
change, no call-site edits. Flipped the BAM writer's mismatch test from
"allows" to "rejects".

No production path synthesizes mismatched records (UMI-dedup only rewrites the
Flag on records read from a valid BAM), so this cannot reject legitimate output.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a static-analysis gate (Printf format checks, mutex-by-value copies,
struct tags, lost results, etc.) ahead of `go test`. Vet compiles the
packages but needs no samtools; it runs after the samtools install for
ordering simplicity.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mbreese mbreese merged commit 00ca937 into main Jun 11, 2026
1 check passed
@mbreese mbreese deleted the phase3-correctness branch June 11, 2026 01:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant