Skip to content

feat(migration): statehistory migration#3658

Open
MaksymMalicki wants to merge 10 commits into
mainfrom
maksym/statehistory-migration
Open

feat(migration): statehistory migration#3658
MaksymMalicki wants to merge 10 commits into
mainfrom
maksym/statehistory-migration

Conversation

@MaksymMalicki

@MaksymMalicki MaksymMalicki commented May 19, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds the statehistory migration: rewrites the three deprecated contract history layouts (class-hash, nonce, per-slot storage) so each entry stores the post-update value at its block instead of the pre-update value. Gated behind the existing --new-state flag. Depends on the headstate migration (consolidated Contract record) shipped in the sibling PR — the new layout reads contract.{ClassHash, Nonce} and the head storage trie as the "last value" source.

How it works

block │ old (pre-update) │ new (post-update)
──────┼──────────────────┼───────────────────
 100  │ —                │ V₀  ← explicit deploy (class-hash only)
 200  │ V₀               │ V₁
 500  │ V₁               │ V₂
 head │ V₂ (Contract)    │ (read from history)

Runs three sequential phases — class-hash, nonce, storage — each iterating the Contract bucket. Four worker goroutines (ingestorCount) per phase walk one contract's deprecated entries at a time, shift them into the new layout in shared db.Batches, and DeleteRange the deprecated rows in the same batch. One committer drains batches to disk; a semaphore caps in-flight batches at ingestorCount + 1.

  • Class-hash: the deprecated layout never wrote a deploy entry — the deploy-time hash lives only in the first replace's "pre-value". The migration inserts an explicit deploy_h entry on top of the shifted history, growing the count by one per replaced contract.
  • Nonce / storage: the first change entry's pre-value is 0 (the deploy default). Shift only; entry count per contract / per slot is unchanged.
  • Storage: the "last value for a slot" comes from the head storage trie, not the Contract record. The ingestor walks the deprecated history and the head trie in lockstep (both sorted by raw slot bytes); slots with no head leaf (zeroed out) resolve to felt.Zero.

What changes

  • New migration/statehistory/ package (migrator, three per-phase ingestors, shared baseIngestor, committer, counter, parse helpers, tests).
  • Registered in node/migration.go as an optional migration gated by cfg.NewState, running after the headstate migration.

Resume safety

  • Per-contract writes + deprecated-row deletion happen in the same batch; pebble batches commit atomically.
  • A contract whose history is large may span more than one batch. Each new entry's value is a pure function of the deprecated source, so re-running over a partially-rewritten contract overwrites with identical values and then deletes the (still-present) deprecated rows.
  • Contracts whose deprecated entries are already gone short-circuit on an empty iterator.
  • Phases run sequentially; a later phase only starts after the earlier phase completes.
  • Ctx cancellation returns (shouldRerun, ctx.Err()).

Alternatives considered

Two earlier attempts were benchmarked and dropped:

  1. Wipe + rewrite from state updates. Drop the deprecated history entirely and rebuild the new layout by replaying state updates block-by-block. Conceptually clean but the resulting writes touch every history bucket in a near-random order — pebble compaction has to merge many small per-block updates across overlapping key ranges, so compact pressure dominated runtime.

  2. Per-address instead of per-phase. Loop over contracts once and run all three phases (class-hash, nonce, storage) inside the same per-contract worker, finishing each contract before moving on. Saves two passes over the Contract bucket but interleaves writes to three different history buckets per contract — again scattered, again heavy on compaction.

The current per-phase approach writes are tightly clustered: one phase writes only one history bucket, in contract-address order, with deprecated DeleteRanges landing in the same batch as the new rows that replace them. Sequential, large, mostly-sorted writes — pebble's happy path. The two extra Contract-bucket scans are negligible compared to the compaction savings.

@MaksymMalicki MaksymMalicki marked this pull request as draft May 19, 2026 11:20
@codecov

codecov Bot commented May 19, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 69.78610% with 113 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.13%. Comparing base (2edbd87) to head (4dda0e2).
⚠️ Report is 31 commits behind head on main.

Files with missing lines Patch % Lines
migration/state/history/storage_ingestor.go 64.21% 21 Missing and 13 partials ⚠️
migration/state/history/class_hash_ingestor.go 57.14% 15 Missing and 12 partials ⚠️
migration/state/history/migrator.go 68.85% 14 Missing and 5 partials ⚠️
migration/state/history/nonce_ingestor.go 64.10% 7 Missing and 7 partials ⚠️
migration/state/history/parse.go 42.85% 6 Missing and 2 partials ⚠️
migration/state/common/ingestor.go 75.00% 6 Missing ⚠️
migration/state/common/committer.go 91.66% 1 Missing and 1 partial ⚠️
node/migration.go 0.00% 2 Missing ⚠️
migration/state/headstate/ingestor.go 91.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3658      +/-   ##
==========================================
- Coverage   76.45%   75.13%   -1.32%     
==========================================
  Files         401      431      +30     
  Lines       36747    38629    +1882     
==========================================
+ Hits        28094    29023     +929     
- Misses       6673     7581     +908     
- Partials     1980     2025      +45     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MaksymMalicki MaksymMalicki force-pushed the maksym/statehistory-migration branch from 5c0928e to 5a09f6c Compare May 19, 2026 22:36
@MaksymMalicki MaksymMalicki marked this pull request as ready for review May 19, 2026 22:41
@MaksymMalicki MaksymMalicki force-pushed the maksym/headstate-migration branch from 0f8a352 to c2eacce Compare May 20, 2026 12:55
@MaksymMalicki MaksymMalicki force-pushed the maksym/statehistory-migration branch from fa47934 to 05dd000 Compare May 20, 2026 12:56
Comment thread migration/statehistory/counter.go Outdated
elapsed := now.Sub(c.start).Seconds()
if elapsed > c.timeLogRate.Seconds() {
mbs := float64(c.size) / float64(db.Megabyte)
c.logger.Info(

@brbrr brbrr May 26, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example log output:

statehistory/counter.go:39      write speed     {"phase": "class-hash", "MB": 96.00005149841309, "MB/s": 1.0337050014199167, "completedContracts": 1074100, "completedContracts/s": 11565.64527513286, "entries": 1378950, "entries/s": 14848.195281765626, "time": 92.86987232}

ideally, we should round to ~last 2 digits after .

@rodrodros rodrodros force-pushed the maksym/headstate-migration branch 2 times, most recently from b0f86f9 to c1fcadb Compare May 28, 2026 09:26
Base automatically changed from maksym/headstate-migration to main May 28, 2026 09:39
@rodrodros rodrodros force-pushed the maksym/statehistory-migration branch from 05dd000 to 5f59583 Compare May 28, 2026 09:44
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Claude finished @rodrodros's task in 4m 44s —— View job


Review: statehistory migration

Done — reviewed the full diff (migrator, three ingestors, shared common scaffolding, committer/counter, pipeline integration, and tests). This is a careful, well-documented PR and I found no correctness or concurrency blockers. Two nits posted inline.

What I verified

  • Concurrency / batching. The IngestorCount workers each touch a distinct Tasks[index] slot, Task is copied by value on the channel send (old batch goes to the committer, worker immediately rebinds to a fresh one), and Database is read-only — no shared mutable state, no data race. The semaphore capacity (IngestorCount+1) plus the unbuffered hand-off means a worker is down to 0 held batches before it GetBlockings a new one, and the committer always drains/Puts, so the worst case is 5 outstanding = capacity. No deadlock.
  • Resource cleanup. Every iterator (addressSeq, deprecated history, head storage trie) has a matching defer Close(). On error the committer keeps draining its input channel (the for range doesn't break on Run error), so all batches are Put back — no semaphore/goroutine leak.
  • Error handling. Source iterator errors are surfaced via errors.Join(sourceErr(), res.Err); malformed keys produce explicit errors; ctx cancellation returns (shouldRerun, ctx.Err()).
  • Storage lockstep. Walked the head-trie/deprecated-history lockstep for the tricky cases — missing leaf before/after a present one, fully-zeroed slots, interleaving. The head iterator only advances on the last entry of a slot that has a matching leaf, so it stays in sync regardless of zeroed-slot ordering. The prefix+251 leaf prefix correctly selects only leaves (Value always set ⇒ no nil-deref), and tests exercise all these paths.
  • Resume safety. New keys live in different buckets than the deprecated rows, so the same-batch DeleteRange never races the rewrites; re-running a partially-migrated contract overwrites with identical values then re-deletes. Cross-contract batch interleaving is handled by the empty-iterator short-circuit. Holds up.

Findings

nit

  • common/counter.goMB/MB/s are rounded to 2 dp but completedContracts/s, entries/s, and time are still full-precision (the bulk of the noise in @brbrr's earlier example log). Suggestion posted inline.
  • class_hash_ingestor.gowriteShiftedHistory re-counts EntryCount/CompletedAddrs on a resume pass (unlike the Has-guarded writeDeployOnly), so progress throughput can over-count after a restart. Metrics-only; correctness unaffected.

Both are non-blocking. Nice work on the per-phase design and the detailed doc comments.
· maksym/statehistory-migration

Comment on lines +52 to +60
fields = append(fields,
zap.Float64("MB", math.Round(mbs*cent)/cent),
zap.Float64("MB/s", math.Round(mbs/elapsed*cent)/cent),
zap.Uint64("completedContracts", c.completedAddrs),
zap.Float64("completedContracts/s", float64(c.completedAddrs)/elapsed),
zap.Uint64("entries", c.entryCount),
zap.Float64("entries/s", float64(c.entryCount)/elapsed),
zap.Float64("time", elapsed),
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (follow-up to @brbrr's earlier rounding request): MB and MB/s are now rounded to 2 decimals, but completedContracts/s, entries/s, and time are still emitted at full float precision — which is most of the noise in the example log line. Consider rounding them the same way:

Suggested change
fields = append(fields,
zap.Float64("MB", math.Round(mbs*cent)/cent),
zap.Float64("MB/s", math.Round(mbs/elapsed*cent)/cent),
zap.Uint64("completedContracts", c.completedAddrs),
zap.Float64("completedContracts/s", float64(c.completedAddrs)/elapsed),
zap.Uint64("entries", c.entryCount),
zap.Float64("entries/s", float64(c.entryCount)/elapsed),
zap.Float64("time", elapsed),
)
fields = append(fields,
zap.Float64("MB", math.Round(mbs*cent)/cent),
zap.Float64("MB/s", math.Round(mbs/elapsed*cent)/cent),
zap.Uint64("completedContracts", c.completedAddrs),
zap.Float64("completedContracts/s", math.Round(float64(c.completedAddrs)/elapsed*cent)/cent),
zap.Uint64("entries", c.entryCount),
zap.Float64("entries/s", math.Round(float64(c.entryCount)/elapsed*cent)/cent),
zap.Float64("time", math.Round(elapsed*cent)/cent),
)

Comment on lines +128 to +162
t.EntryCount++
if err := i.Flush(t, outputs); err != nil {
return err
}

for {
block, err := parseBlockKey(depIt.Key(), prefix)
if err != nil {
return fmt.Errorf("class-hash(%s): %w", addr, err)
}
hasNext := depIt.Next()
historyValue := *headClassHash
if hasNext {
rawValue, err := depIt.Value()
if err != nil {
return fmt.Errorf("class-hash(%s): %w", addr, err)
}
historyValue = felt.FromBytes[felt.Felt](rawValue)
}
if err := state.WriteClassHashHistory(t.Batch, addr, block, &historyValue); err != nil {
return err
}
t.EntryCount++
if err := i.Flush(t, outputs); err != nil {
return err
}
if !hasNext {
break
}
}

if err := t.Batch.DeleteRange(prefix, dbutils.UpperBound(prefix)); err != nil {
return fmt.Errorf("class-hash: DeleteRange deprecated(%s): %w", addr, err)
}
t.CompletedAddrs++

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (metrics only): writeShiftedHistory unconditionally increments EntryCount/CompletedAddrs on every pass, whereas writeDeployOnly short-circuits via Has(deployKey) when the entry already exists. On a resume that re-runs an already-rewritten-but-not-yet-deleted contract, the shifted path re-counts every entry, so the reported entries/completedContracts throughput can over-count after a crash/restart. Correctness is unaffected (writes are idempotent); just noting the progress numbers aren't resume-exact here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants