feat(encryption): Stage 6E-2e-1 — Applier FSM-apply wrap installer hook#944
feat(encryption): Stage 6E-2e-1 — Applier FSM-apply wrap installer hook#944bootjp wants to merge 2 commits into
Conversation
Closes BLOCKER (b) at the apply layer (codex P1 round-3 on PR933).
Every replica's FSM-apply of the EnableRaftEnvelope cutover marker now
invokes a registered RaftCutoverWrapInstaller callback that publishes
the §4.2 raft envelope wrap closure on this node — so a follower that
becomes leader post-cutover already has wrap active without needing the
EnableRaftEnvelope handler to re-run.
Without this hook, the per-leader InstallWrap call in
adapter/encryption_admin.go's runRaftEnvelopeCutoverBarrier is the only
path that installs the wrap. A leader failover between cutover commit
and InstallWrap would let the new leader admit cleartext user writes at
indexes gt cutoverIdx; followers applying those entries would halt on
the §6.3 strict-gt unwrap hook cluster-wide.
What ships:
- RaftCutoverWrapInstaller func type (cutoverIdx, activeRaftDEKID) -> error.
- WithRaftCutoverWrapInstaller Applier option. nil = no-op (preserves
the pre-6E-2e-1 test surface and the production posture until
6E-2e-3 wires the installer in main.go).
- applyEnableRaftEnvelope invokes the installer on:
* Fresh-success branch (after WriteSidecar, after RefreshFromSidecar)
with (raftIdx, p.DEKID).
* Already-active branch (FSM replay safety: snapshot-excluded cutover
re-applies on restart) with (sc.RaftEnvelopeCutoverIndex, p.DEKID).
- Stale-DEKID no-op branch deliberately SKIPS the installer — no
cutover took effect, installing would publish a closure keyed to the
wrong DEK.
- Installer errors halt apply (fail-closed). Sidecar write precedes
installer call so crash recovery converges via the already-active
branch on restart.
5-lens self-review:
1. Data loss — none; sidecar write is durable before installer call.
Reverse ordering would lose install state on crash.
2. Concurrency / distributed failures — FSM apply is single-threaded
per replica; each replica installs independently against the same
sidecar values, so wrap closure key is consistent cluster-wide. No
cross-replica coordination needed.
3. Performance — one closure call per cutover apply. Cutover is rare.
4. Data consistency — installer key matches sidecar state on success.
On installer error, sidecar is durable + apply halts; operator
restart converges via already-active branch re-install.
5. Test coverage — four new tests: FreshSuccess invokes installer with
(raftIdx, p.DEKID); AlreadyActive invokes installer with
(firstIdx, p.DEKID) for replay safety; StaleDEKID skips installer;
InstallerError halts apply with sidecar still written.
Caller audit for the new semantic:
- applyEnableRaftEnvelope is only called from ApplyRotation's dispatch
in the same file. No external callers affected.
- Default nil installer = no behavior change for callers that don't
opt in to WithRaftCutoverWrapInstaller (test surface and current
main.go wiring).
Cyclop-driven refactor: extracted invokeRaftCutoverWrapInstaller helper
shared between fresh-success and already-active branches so
applyEnableRaftEnvelope stays under the per-function complexity cap.
Next slices (6E-2e-2: admin RPCs through wrap-aware proposer to close
BLOCKER (a); 6E-2e-3: main.go closure source + startup-time install;
6E-2f: gate flip) tracked in the partial design doc.
|
Warning Review limit reached
More reviews will be available in 17 minutes and 25 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@claude review |
|
Claude finished @bootjp's task in 3m 1s —— View job PR Review: feat(encryption): Stage 6E-2e-1 — Applier FSM-apply wrap installer hook
Overall AssessmentSolid, focused implementation. The crash-recovery ordering ( FindingsFinding 1 — Minor doc inaccuracy:
|
There was a problem hiding this comment.
Code Review
This pull request implements Stage 6E-2e-1 of the Raft envelope cutover design by introducing the WithRaftCutoverWrapInstaller hook in internal/encryption/applier.go. This hook is invoked during both fresh-success and already-active branches of applyEnableRaftEnvelope to ensure that every replica's FSM-apply of the cutover marker publishes the wrap closure, mitigating leader-failover hazards. Comprehensive unit tests have been added in internal/encryption/applier_test.go to validate the installer's invocation, replay safety, stale-DEK handling, and error propagation. The review feedback suggests a minor improvement to pass sc.Active.Raft directly instead of p.DEKID when invoking the installer to ensure robustness and consistency with the state cache.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| return errors.Wrap(err, "applier: write sidecar for already-active raft-cutover no-op") | ||
| } | ||
| return nil | ||
| return a.invokeRaftCutoverWrapInstaller(sc.RaftEnvelopeCutoverIndex, p.DEKID, "already-active replay") |
There was a problem hiding this comment.
In the already-active replay branch, the active Raft DEK ID is passed to the installer. While p.DEKID is guaranteed to be equal to sc.Active.Raft due to the preceding check on line 1214, passing sc.Active.Raft directly is more robust, self-documenting, and strictly adheres to the RaftCutoverWrapInstaller contract which specifies that the second argument should be the sidecar.Active.Raft value at apply time.
| return a.invokeRaftCutoverWrapInstaller(sc.RaftEnvelopeCutoverIndex, p.DEKID, "already-active replay") | |
| return a.invokeRaftCutoverWrapInstaller(sc.RaftEnvelopeCutoverIndex, sc.Active.Raft, "already-active replay") |
References
- Prioritize using the primary state cache directly rather than passing state variables through call chains to ensure consistency and correctness of critical state updates.
| // first) would leave the wrap published but the sidecar | ||
| // pre-cutover on crash, breaking the equality the §6.3 hook | ||
| // relies on cluster-wide. | ||
| return a.invokeRaftCutoverWrapInstaller(raftIdx, p.DEKID, "fresh-success apply") |
There was a problem hiding this comment.
In the fresh-success apply branch, the active Raft DEK ID is passed to the installer. While p.DEKID is guaranteed to be equal to sc.Active.Raft due to the preceding check on line 1214, passing sc.Active.Raft directly is more robust, self-documenting, and strictly adheres to the RaftCutoverWrapInstaller contract which specifies that the second argument should be the sidecar.Active.Raft value at apply time.
| return a.invokeRaftCutoverWrapInstaller(raftIdx, p.DEKID, "fresh-success apply") | |
| return a.invokeRaftCutoverWrapInstaller(raftIdx, sc.Active.Raft, "fresh-success apply") |
References
- Prioritize using the primary state cache directly rather than passing state variables through call chains to ensure consistency and correctness of critical state updates.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6a510abcab
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return errors.Wrap(err, "applier: write sidecar for already-active raft-cutover no-op") | ||
| } | ||
| return nil | ||
| return a.invokeRaftCutoverWrapInstaller(sc.RaftEnvelopeCutoverIndex, p.DEKID, "already-active replay") |
There was a problem hiding this comment.
Install wrap even when replayed cutover DEK is stale
This installer call is only reachable after the stale-DEK guard above, so a crash/replay after the cutover has already applied and a later raft-DEK rotation has advanced sidecar.Active.Raft will skip installation entirely because the original cutover payload's p.DEKID no longer matches the sidecar. That is exactly the replay path this hook is meant to repair: the sidecar already has RaftEnvelopeCutoverIndex != 0, but the in-process ShardGroup wrap pointer is nil, so a restarted node that becomes leader can admit cleartext proposals above the cutover. The already-active case should be handled before treating the payload DEK as stale, and should install using the current sc.Active.Raft DEK.
Useful? React with 👍 / 👎.
Round-1 review fixes for PR #944: Codex P1 round-1 + Gemini medium x2 — Install wrap even when replayed cutover DEK is stale. applyEnableRaftEnvelope tested the stale-DEK condition BEFORE the already-active condition. After a successful cutover at index N with DEK D1, a subsequent RotateDEK advances sidecar.Active.Raft to D2. On process restart, FSM replay re-applies the original cutover marker (p.DEKID=D1). The reverse order short-circuits at the stale-DEK check: p.DEKID != sc.Active.Raft → benign no-op → installer NEVER fires → a freshly-elected leader on this node admits cleartext proposals above the cutover index and bricks the §6.3 strict-gt unwrap hook on every replica. Fix: - Swap the check order: test RaftEnvelopeCutoverIndex != 0 FIRST so a replay of the original marker after RotateDEK lands on the already-active branch (which DOES invoke the installer). - Pass sc.Active.Raft to the installer (gemini medium x2): the wrap closure must key to the CURRENT active DEK on every replica so the §6.3 hook unwraps with the same key. p.DEKID was technically equivalent to sc.Active.Raft in the fresh-success branch (the stale-DEK guard above ensures it), but the already-active replay branch is precisely the case where they diverge. The storage variant applyEnableStorageEnvelope keeps its original check order because it has no installer hook — the stale-DEK and already-active branches both result in advanceRaftAppliedIndex-only behavior, so the order is observationally irrelevant. The raft path's order swap is documented in the function-level comment. New regression test: TestApplyRotation_EnableRaftEnvelope_AlreadyActive_StaleDEKReplay_InvokesInstaller pins the exact codex P1 scenario: 1. Original cutover at idx 500 with D1. 2. RotateDEK advances Active.Raft to D2. 3. FSM replay of the original marker (p.DEKID=D1, sc.Active.Raft=D2). 4. Asserts: installer fires; activeRaftDEKID == D2 (sc.Active.Raft, NOT replayed p.DEKID); cutoverIdx == 500 (original sidecar value). Updated existing assertions in TestApplyRotation_EnableRaftEnvelope_FreshSuccess_InvokesInstaller and TestApplyRotation_EnableRaftEnvelope_AlreadyActive_InvokesInstaller remain valid because Active.Raft == p.DEKID in those scenarios. Caller audit for the semantic change: - applyEnableRaftEnvelope is internal; only called from ApplyRotation dispatch. No external callers affected. - The behavior change affects ONLY the replay-after-rotate scenario; every other branch (fresh-success, already-active with matching DEK, stale-DEK race without cutover) is unchanged.
|
@claude review |
Summary
Stage 6E-2e-1 closes BLOCKER (b) from PR #933's codex P1 round-3:
the leader-failover hazard between cutover commit and the
EnableRaftEnvelope handler's
InstallWrapcall.Every replica's FSM-apply of the EnableRaftEnvelope cutover marker now
invokes a registered
RaftCutoverWrapInstallercallback that publishesthe §4.2 raft envelope wrap closure on this node. A follower that
becomes leader post-cutover already has wrap active without needing the
admin RPC to re-run.
This is the apply-layer half of BLOCKER (b). Production wiring of
the installer closure (the main.go side that constructs a wrap closure
from cipher + active raft DEK and calls
SetRaftPayloadWrapon everyparticipating
kv.ShardGroup) lands in 6E-2e-3 alongside theCutoverBarrierControllerfanout.The PR is production-inert until 6E-2e-3 wires the installer:
default-nil installer = no behavior change for any caller that does
not opt in via
WithRaftCutoverWrapInstaller.What ships
RaftCutoverWrapInstallerfunc type(cutoverIdx, activeRaftDEKID) error.WithRaftCutoverWrapInstallerApplier option.applyEnableRaftEnvelopeinvokes the installer on:WriteSidecar, afterRefreshFromSidecar) with(raftIdx, p.DEKID).excludes the cutover entry replays it on restart) with
(sc.RaftEnvelopeCutoverIndex, p.DEKID).cutover took effect, installing would publish a closure keyed to the
wrong DEK.
installer call so crash recovery converges via the already-active
branch on restart.
invokeRaftCutoverWrapInstallerhelper keeps
applyEnableRaftEnvelopeunder the per-functioncomplexity cap.
Behavior change, risk, test evidence
Default nil installer = the existing apply path runs unchanged.
6E-2e-3 with its own review cycle.
go test -race ./internal/encryption/...— all pass (existingTestApplyRotation_EnableRaftEnvelope_*tests confirmed unchangedgolangci-lint --config=.golangci.yaml run— 0 issues.FreshSuccess_InvokesInstaller,AlreadyActive_InvokesInstaller,StaleDEKID_SkipsInstaller,InstallerError_HaltsApply.Caller audit
applyEnableRaftEnvelope(semantic change: installer-call insertion):only callee path is
ApplyRotationdispatch in the same file. Noexternal callers affected.
unchanged; no existing test or production caller observes the new
branches.
5-lens self-review
call. The reverse ordering (install first, write second) would lose
the install state on crash; we chose write→install so a crash
between fsync and installer error converges on restart via the
already-active branch.
single-threaded per replica. Each replica installs independently
against the same sidecar values, so the wrap closure key is
consistent cluster-wide. No cross-replica coordination needed.
one-shot operation. Negligible cost.
success. On installer error, sidecar is durable + apply halts;
operator restart converges via the already-active branch
re-install.
the dispatch.
Test plan
go test -race ./internal/encryption/...golangci-lint runcleanmake testin CIuntil 6E-2e-3 wires the installer. Jepsen coverage will attach
to 6E-2f / Stage 9.
Next slices
routed through the wrap-aware proposer; closes BLOCKER (a).
OpenConfig.RaftCipher+RaftCutoverIndexwiring + concrete
RaftCutoverWrapInstallerclosure +CutoverBarrierControllerfanout + startup-time install onsidecar.RaftEnvelopeCutoverIndex != 0.raftEnvelopeWrapEnabled.Store(true)(single-line gateflip).