feat(fingerprint): switch to new fingerprint algorithm#243

Draft
dmcilvaney wants to merge 28 commits into
microsoft:mainfrom
dmcilvaney:damcilva/schema_version_parts/reset-cutover

Conversation

@dmcilvaney dmcilvaney commented Jun 17, 2026

Contributor

Full end-to-end (e2e) set of all changes, designed to be split into multiple PRs.

dmcilvaney added 26 commits June 4, 2026 16:37
…emission probe

The coverage-invariant paragraph said the one maximal golden config 'doubles as
the emission-probe input'. That holds only at the cutover: a golden vector freezes
a config's digest, so the maximal config cannot grow, while the emission probe
needs an every-measured-field-set input that grows with each additive field. Note
the two-config split (frozen maximalConfig + growing emissionProbeConfig) and the
additive-field workflow (set in the probe input, pin with a new <toml-key>-set
vector, never edit the frozen maximal digest).
…ctuation

Replace the stale 'extend the maximal vector' loop with the frozen-maximal /
growing-probe workflow, and state which steps are hard CI gates (mandatory tag,
projectV1 emission probe) versus the recommended-but-ungated golden vector.

Also replace horizontal-ellipsis characters with ASCII '...' and one stray
loanword accent; intentional diagram/math notation is left as-is.
…256 combiner

Phase 1 (PR A1) of the schema-version-parts cutover. Adds the pure projection-substrate primitives in internal/fingerprint, beside the existing hashstructure path: the canonicalBuf length-prefixed encoder with the split omit-predicate, the fingerprint version-set tag parser, and the sha256 combiner step. Nothing is wired into ComputeIdentity and hashstructure is untouched, so no lock byte or scenario snapshot changes. Includes in-package unit tests and the phase 1 report; updates plan status.
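
A length-prefixed encoder of the kind described above can be sketched as follows. This is a minimal illustration, not the repository's `canonicalBuf`: the type `canonicalSketch`, the helper `encodePairs`, the 8-byte big-endian length, and the "empty string means omit" rule are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// canonicalSketch is a hypothetical stand-in for the PR's canonicalBuf.
// The wire layout used here is an assumption, not the frozen v1 format.
type canonicalSketch struct {
	buf bytes.Buffer
}

// writeBytes appends an 8-byte big-endian length followed by the raw bytes,
// so adjacent values can never be confused with one another.
func (c *canonicalSketch) writeBytes(b []byte) {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(b)))
	c.buf.Write(n[:])
	c.buf.Write(b)
}

// emit writes one key/value pair unless the value is empty
// (a simplified omit predicate).
func (c *canonicalSketch) emit(key, value string) {
	if value == "" {
		return
	}
	c.emitAlways(key, value)
}

// emitAlways writes the pair even when the value is empty,
// bypassing the omit predicate.
func (c *canonicalSketch) emitAlways(key, value string) {
	c.writeBytes([]byte(key))
	c.writeBytes([]byte(value))
}

// encodePairs encodes alternating key, value arguments with emit.
func encodePairs(kv ...string) []byte {
	var c canonicalSketch
	for i := 0; i+1 < len(kv); i += 2 {
		c.emit(kv[i], kv[i+1])
	}
	return c.buf.Bytes()
}
```

With plain concatenation, `("ab", "c")` and `("a", "bc")` would both serialize to `abc`; with the length prefix they produce different bytes, which is the property a hash input needs.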
Add the v1 projection layer beside the existing hashstructure path, additive
and not yet wired into ComputeIdentity:

- projectV1 + frozen nested sub-projectors, emitting measured fields by literal
  Go path under their frozen TOML emit-key
- canonicalizeForFingerprint: reflective nil-or-empty scalar-slice normalizer at
  the hash boundary, pruning at fingerprint:"-" edges (cycle-safe, no field
  inventory)
- fingerprint:"v1..*" tags on every measured field across the 10 fingerprinted
  structs; hazard comments on each '-'-pruned composite; Packages kept measured
- golden vectors with an append-only -update-golden guard; emission probe;
  composite-'!' placeholder gate; mandatory-tag decision test
- frozen maximalConfig vs growing emissionProbeConfig split + documented
  additive-field workflow and naming convention

No lock byte or scenario snapshot moves; hashstructure untouched.
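
The mandatory-tag decision can be illustrated with a small reflection sketch. It is not the PR's `ValidateFieldTag`: `exampleConfig`, `missingDecisions`, `missingInExample`, and `measuredAtV1` are hypothetical names, and only the two tag forms quoted above (`v1..*` and `-`) are recognized here, whereas the real parser handles a richer grammar.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// exampleConfig is a hypothetical struct; its fields are illustrative only.
type exampleConfig struct {
	Name     string `fingerprint:"v1..*"`
	CacheDir string `fingerprint:"-"`
	Untagged string
}

// missingDecisions returns the names of fields that carry no explicit
// fingerprint tag, i.e. fields for which nobody has made a decision yet.
func missingDecisions(t reflect.Type) []string {
	var missing []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag, ok := f.Tag.Lookup("fingerprint")
		if !ok || strings.TrimSpace(tag) == "" {
			missing = append(missing, f.Name)
		}
	}
	return missing
}

// missingInExample runs the check against exampleConfig.
func missingInExample() []string {
	return missingDecisions(reflect.TypeOf(exampleConfig{}))
}

// measuredAtV1 reports whether a tag value measures the field at version 1.
// Assumption: only "-" and "v1..*" are handled in this sketch.
func measuredAtV1(tag string) (bool, error) {
	switch tag {
	case "-":
		return false, nil
	case "v1..*":
		return true, nil
	default:
		return false, fmt.Errorf("unrecognized fingerprint tag %q", tag)
	}
}
```

A test built on this idea fails as soon as a new field is added without a tag, which forces the author to decide whether the field is measured.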
…heck in phase 3

Phase 2 experimentation ran projectV1 over all 7417 azurelinux components and
confirmed it partitions the fleet identically to hashstructure (244==244 groups,
0 differences; errored 0; deterministic). Record this as a pre-cutover de-risking
gate in the Phase 3 plan so the irreversible freeze is validated against the real
fleet, not just synthetic golden vectors.
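
The "partitions the fleet identically" check compares groupings, not hash values: two components must share a fingerprint under the new scheme exactly when they shared one under the old scheme. A minimal sketch of such a comparison, assuming each scheme's result is available as a map from component name to hash string (the function names and the NUL separator are assumptions):

```go
package main

import (
	"sort"
	"strings"
)

// partition groups component names by the hash each one received and returns
// the groups in a canonical sorted form, discarding the hash values themselves.
// Assumption: component names never contain a NUL byte.
func partition(hashes map[string]string) []string {
	groups := map[string][]string{}
	for name, h := range hashes {
		groups[h] = append(groups[h], name)
	}
	out := make([]string, 0, len(groups))
	for _, members := range groups {
		sort.Strings(members)
		out = append(out, strings.Join(members, "\x00"))
	}
	sort.Strings(out)
	return out
}

// samePartition reports whether two hash assignments induce the same grouping
// of components, regardless of the concrete hash values.
func samePartition(a, b map[string]string) bool {
	pa, pb := partition(a), partition(b)
	if len(pa) != len(pb) {
		return false
	}
	for i := range pa {
		if pa[i] != pb[i] {
			return false
		}
	}
	return true
}
```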
Copilot AI review requested due to automatic review settings June 17, 2026 22:09

Copilot AI left a comment


Pull request overview

This PR switches the component fingerprinting substrate from hashstructure over live Go structs to a frozen, canonical v1 projection (projectV1) hashed with stdlib sha256, and stores fingerprints as an atomic content-version token (v1:sha256:<digest>). It also tightens/clarifies the fingerprint field-decision model by requiring explicit fingerprint tags on all fingerprinted fields and removes the hashstructure module dependency.

Changes:

  • Rewired ComputeIdentity to hash projectV1(canonicalizeForFingerprint(cfg)) and to emit the atomic v1:sha256: token; removed the old config-hash artifact and hashstructure dependency.
  • Introduced and guarded the v1 projection substrate (canonical encoder, version-set tag parser, golden vectors, emission probe) and enforced mandatory per-field fingerprint decisions.
  • Relaxed lockfile read gating to accept format versions in [1..currentVersion] while explicitly pinning format Version == 1 independent of token content version.
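
The relaxed read gate amounts to a range check on the lock format version. A minimal sketch, assuming an integer version and a hypothetical `checkFormatVersion` helper (the error wording is invented for the example):

```go
package main

import "fmt"

// checkFormatVersion accepts any lock format version in [1..current] and
// rejects everything else. The name and signature are assumptions.
func checkFormatVersion(v, current int) error {
	if v < 1 || v > current {
		return fmt.Errorf("unsupported lock format version %d (supported: 1..%d)", v, current)
	}
	return nil
}
```

The format version gates whether the file can be parsed at all; the content version lives inside the fingerprint token and is handled separately.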

Reviewed changes

Copilot reviewed 44 out of 46 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| report/schema-version-parts/report-phase1.md | Phase 1 implementation report (encoder + tag parser + combiner). |
| report/schema-version-parts/report-phase2.md | Phase 2 implementation report (projectV1 + golden vectors + tags). |
| report/schema-version-parts/report-phase3.md | Phase 3 implementation report (cutover wiring + atomic token + hashstructure removal). |
| report/schema-version-parts/.gitkeep | Documents expected phase report files for this workstream. |
| plan/schema-version-parts/phase1-encoder-tag-parser.md | Phase 1 plan/checklist updated to completed. |
| plan/schema-version-parts/phase2-projection-golden-vectors.md | Phase 2 plan/checklist updated to completed. |
| plan/schema-version-parts/phase3-reset-cutover.md | Phase 3 plan/checklist updated to completed. |
| plan/schema-version-parts/overview.md | Workstream overview updated to completed, including guardrails and phase status. |
| plan/schema-version-parts/handoff-prompt.md | Handoff prompt for phased execution of the workstream. |
| internal/projectconfig/component.go | Adds fingerprint:"v1..*"/"-" tags and hazard comments on pruned subtrees. |
| internal/projectconfig/build.go | Adds fingerprint tags and hazard comments for excluded composites. |
| internal/projectconfig/distro.go | Adds fingerprint tags to distro reference fields. |
| internal/projectconfig/overlay.go | Adds fingerprint tags to overlay fields that affect build inputs. |
| internal/projectconfig/package.go | Adds hazard comment for excluded publish subtree in package config. |
| internal/projectconfig/render.go | Adds fingerprint tag to render config field. |
| internal/projectconfig/specsource.go | Adds fingerprint tags to spec source fields; keeps path excluded. |
| internal/projectconfig/fingerprint_test.go | Enforces mandatory fingerprint-tag decisions via fingerprint.ValidateFieldTag and central type list. |
| internal/fingerprint/fingerprint.go | Switches ComputeIdentity to projection-based hashing and stamps v1:sha256: token. |
| internal/fingerprint/combine.go | Defines combineProjection for folding projection bytes + non-config inputs. |
| internal/fingerprint/combine_internal_test.go | Unit tests for combineProjection. |
| internal/fingerprint/canonical.go | Canonical length-prefixed encoder for projection bytes. |
| internal/fingerprint/canonical_internal_test.go | Unit tests for canonical encoder behaviors and edge cases. |
| internal/fingerprint/versiontag.go | Version-set fingerprint tag parser (vN..*, !, key=) and validation. |
| internal/fingerprint/versiontag_internal_test.go | Unit tests for tag parsing/validation and emit-key resolution. |
| internal/fingerprint/project.go | Implements projectV1, canonicalizer, tag validation, and the fingerprinted-type list. |
| internal/fingerprint/project_internal_test.go | Tests canonicalization, projection behaviors, emission probe, and composite-! placeholder. |
| internal/fingerprint/golden_internal_test.go | Golden-vector freeze + append-only guard + -update-golden support. |
| internal/fingerprint/testdata/golden_v1.json | Frozen v1 (config -> digest) golden vector table. |
| internal/fingerprint/fingerprint_test.go | Updates identity tests to assert v1:sha256: token shape. |
| internal/lockfile/lockfile.go | Relaxes format-version read gate to [1..currentVersion] with updated error message. |
| internal/lockfile/lockfile_test.go | Adds format-version pinning/round-trip test independent of token content version. |
| internal/app/azldev/core/components/resolver.go | Documents force-rehash behavior via string inequality for legacy tokens. |
| internal/app/azldev/cmds/component/update.go | Documents force-rehash behavior via string inequality at update restamp site. |
| internal/app/azldev/cmds/component/update_test.go | Adds test verifying legacy prefix-less tokens force-rehash to v1:sha256: on update. |
| docs/developer/reference/component-identity-and-locking.md | Updates developer reference to projection substrate + version-set tags + atomic token. |
| docs/developer/schema-migration/README.md | Adds executive summary for the RFC/workstream. |
| docs/developer/schema-migration/problem-and-motivation.md | Adds problem statement and motivation summary doc. |
| docs/developer/schema-migration/part-1-the-reset.md | Adds Part 1 (reset) summary doc. |
| docs/developer/schema-migration/part-2-lazy-migration.md | Adds Part 2 (deferred) summary doc. |
| docs/developer/schema-migration/delivery-plan.md | Adds delivery plan summary doc. |
| .github/instructions/projectconfig-fingerprint.instructions.md | Adds repo guidance for safe edits to config structs/fingerprint substrate. |
| .github/instructions/go.instructions.md | Updates Go instructions to point at the new fingerprint/config guidance. |
| .github/copilot-instructions.md | Adds a critical note to read the fingerprint/config instruction doc before such edits. |
| go.mod | Removes github.com/mitchellh/hashstructure/v2 dependency. |
| go.sum | Removes hashstructure checksums. |

Comment on lines +17 to +19
// maximalConfig returns a config with every measured scalar-leaf and map field
// maximalConfig is the frozen v1-cutover field set: every field measured at the
// cutover, each set to a distinct non-zero value, golden-vectored as "maximal".
…e scalar ! (R1+L1+S2)

R1: move the nil-or-empty scalar-slice rule into emit's omit predicate
(isScalarZero) and delete the ~75-line canonicalizeForFingerprint/Into/Slice
family plus its tests. Output-preserving - every golden digest is unchanged - and
it removes the self-inflicted SourceConfigFile cycle hazard the deep-copy needed.
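
The effect of folding the nil-or-empty rule into the omit predicate can be sketched with `reflect`. This is not the PR's `isScalarZero`; `isZeroForOmit` and `omitted` are hypothetical names, and the sketch treats every slice kind the same way rather than only scalar slices.

```go
package main

import "reflect"

// isZeroForOmit reports whether a value should be left out of the projection.
// A nil slice and an empty slice both have length 0, so they are treated
// identically; every other kind falls back to reflect's zero-value check.
func isZeroForOmit(v reflect.Value) bool {
	if !v.IsValid() {
		return true // an untyped nil carries no value to emit
	}
	if v.Kind() == reflect.Slice {
		return v.Len() == 0
	}
	return v.IsZero()
}

// omitted is a convenience wrapper for plain Go values.
func omitted(x interface{}) bool {
	return isZeroForOmit(reflect.ValueOf(x))
}
```

Because the decision is made at emit time, no normalized deep copy of the config is needed before hashing.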

the projector (emitAlways bypasses the omit predicate); without it a build-
meaningful zero would validate and pass the probe but be silently dropped.

S2: golden test now asserts every non-minimal vector differs from the empty
projection, catching a dropped/mis-wired emit (incl. a duplicate-key one or a
'!' field using emit instead of emitAlways).

NOTE: ComputeIdentity does not yet call projectV1 on this (Phase 2) branch, so the
canonicalize call there is a reset-cutover concern; see phase3-post-rebase-todos.
…a256 token (PR B)

The one-time substrate swap: ComputeIdentity hashes the canonical projectV1
projection via sha256 and stamps the atomic v1:sha256: content token; hashstructure
is removed; lock format Version stays 1 with force-rehash reconciliation of
pre-reset tokens.
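
A sketch of the token shape and of reconciliation by string inequality. The function names are hypothetical, and the lowercase hex encoding of the digest is an assumption; the PR text only fixes the `v1:sha256:` prefix.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// contentToken stamps a projection with an atomic "v1:sha256:<digest>" token.
// Assumption: the digest is rendered as lowercase hex.
func contentToken(projection []byte) string {
	sum := sha256.Sum256(projection)
	return "v1:sha256:" + hex.EncodeToString(sum[:])
}

// needsRestamp compares the locked token with the freshly computed one as
// opaque strings. A legacy prefix-less token can never equal a v1 token, so it
// is rehashed without any token parsing.
func needsRestamp(locked, computed string) bool {
	return locked != computed
}
```

Treating the token as an opaque string keeps the reconciliation path free of a token parser: any pre-reset value simply compares unequal and is replaced.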

Includes the review-driven consolidations and fixups (squashed):
- single FingerprintedStructTypes() source of truth for the decision test and
  emission probe, with a completeness/reachability guard
  (TestFingerprintedStructTypesIsComplete);
- dropped the dead ComponentIdentity.Inputs breakdown (componentInputs is now the
  internal combiner input only);
- SkipReason documented as deliberately unmeasured (render-only comment);
- v1: token test-hardening (HasPrefix(v1:sha256:), meaningful bump placeholder);
- config/fingerprinting guardrail instructions, the as-built RFC reconciliation
  (read-gate floor; parser-free reconciliation), and the component-identity doc
  pointer to the RFC.
@dmcilvaney dmcilvaney force-pushed the damcilva/schema_version_parts/reset-cutover branch from d33009a to 0c2a32a Compare June 18, 2026 01:27
Copilot AI review requested due to automatic review settings June 18, 2026 01:27

Copilot AI left a comment


Pull request overview

Copilot reviewed 46 out of 48 changed files in this pull request and generated 3 comments.

Comment on lines +21 to 26
// ComponentIdentity holds the computed fingerprint for a single component.
type ComponentIdentity struct {
// Fingerprint is the overall SHA256 hash combining all inputs.
// Fingerprint is the atomic "v<N>:sha256:..." content token combining the
// canonical config projection with the non-config inputs.
Fingerprint string `json:"fingerprint"`
// Inputs provides the individual input hashes that were combined.
Inputs ComponentInputs `json:"inputs"`
}
Comment on lines +15 to +19
1. **Config projection digest** - `sha256` of the canonical `projectV1` projection of the resolved `ComponentConfig` (after all merging). Only fields whose `fingerprint` tag measures them at v1 are emitted; `fingerprint:"-"` fields are excluded. A nil-or-empty scalar slice is treated as zero and omitted by the projection's omit predicate, so a merge-order nil-vs-`[]` difference never moves the digest.
2. **Source identity** - content hash for local specs (all files in the spec directory), commit hash for upstream.
3. **Overlay file hashes** - SHA256 of each file referenced by overlay `Source` fields.
4. **Distro name + version**
5. **Manual release bump counter** — increments with each manual release bump, ensuring a new fingerprint even if there are no config or source changes.
5. **Manual release bump counter** - increments with each manual release bump, ensuring a new fingerprint even if there are no config or source changes.
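
The five inputs listed above could be folded into one digest along these lines. This is a sketch under stated assumptions, not the repository's `combineProjection`: the struct `inputsSketch`, its field names, the field order, and the length-prefixed framing are all invented for the illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"sort"
	"strconv"
)

// inputsSketch is a hypothetical bundle of the five fingerprint inputs.
type inputsSketch struct {
	ProjectionDigest string            // sha256 of the config projection
	SourceIdentity   string            // content hash or upstream commit
	OverlayHashes    map[string]string // overlay file path -> SHA256
	DistroName       string
	DistroVersion    string
	ManualBump       int
}

// combineSketch hashes every input with a length prefix, visiting overlay
// files in sorted path order so map iteration order cannot change the result.
func combineSketch(in inputsSketch) string {
	h := sha256.New()
	write := func(s string) {
		var n [8]byte
		binary.BigEndian.PutUint64(n[:], uint64(len(s)))
		h.Write(n[:])
		h.Write([]byte(s))
	}
	write(in.ProjectionDigest)
	write(in.SourceIdentity)
	paths := make([]string, 0, len(in.OverlayHashes))
	for p := range in.OverlayHashes {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	write(strconv.Itoa(len(paths)))
	for _, p := range paths {
		write(p)
		write(in.OverlayHashes[p])
	}
	write(in.DistroName)
	write(in.DistroVersion)
	write(strconv.Itoa(in.ManualBump))
	return hex.EncodeToString(h.Sum(nil))
}
```

Incrementing `ManualBump` changes the result even when every other input is identical, which matches the purpose of the manual release bump counter.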
Comment on lines +17 to +25
from the module. Because Phases 1-2 already shipped `canonicalizeForFingerprint`,
`projectV1`, and `combineProjection` beside the live path, this phase is
**deletion-heavy rewiring** (net +139/-95 across source), not new machinery:

1. **3.1 substrate swap** - the `hashstructure.Hash` config-hash step is replaced
by `projectV1(canonicalizeForFingerprint(component))` invoked inside the hash
boundary, so every path into the hasher is canonicalized. The `uint64`
`ComponentInputs.ConfigHash` artifact and the old `combineInputs` fold are
deleted; everything is `sha256` now.
2 participants