Skip to content

feat(redis): flip one-phase txn dedup default to on (closes #937)#943

Open
bootjp wants to merge 3 commits into
mainfrom
feat/redis-onephase-dedup-default-on
Open

feat(redis): flip one-phase txn dedup default to on (closes #937)#943
bootjp wants to merge 3 commits into
mainfrom
feat/redis-onephase-dedup-default-on

Conversation

@bootjp

@bootjp bootjp commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Summary

Flips adapter.RedisServer.onePhaseTxnDedup from default-off to default-on, closing the rollout of the option-2 idempotency design landed by docs/design/2026_05_21_proposed_txn_secondary_idempotency.md. No new mechanism — the FSM probe (dedupProbeOnePhase), the wire change (TxnMeta.PrevCommitTS, V2-only when non-zero), and every adapter-side reuse path already ship and are exercised every day by the dedup-mode Jepsen workflow.

Closes #937.

Authorization

The parent design's M4 7-day criterion (source) reads:

7 consecutive days without :duplicate-elements / :G-single-item-realtime in the dedup-mode workflow.

jepsen-test-scheduled-dedup.yml (Redis stress profile, ELASTICKV_REDIS_ONEPHASE_DEDUP=1) has produced 12 consecutive green runs over 10 calendar days, 2026-05-31 → 2026-06-10. Run IDs are listed in the new design doc; the most recent are:

Date (UTC) Run Conclusion
2026-06-10 04:50 27253991270 success
2026-06-09 04:44 27184402445 success
2026-06-08 04:57 27116904871 success
2026-06-07 04:54 27083142341 success
2026-06-06 04:40 27052820868 success
2026-06-05 04:49 26996058409 success
2026-06-04 04:57 26931749014 success
2026-06-04 04:16 26930308744 success
2026-06-03 04:58 26864667493 success
...(2 more) success

Exceeds the 7-day threshold. The parent's R5 (FSM determinism across a rolling upgrade) is discharged — the probe code has shipped on every production node for months.

Change

File Change
docs/design/2026_06_10_proposed_redis_onephase_dedup_default_on.md New design doc (commit 1) recording M4 evidence and the rollout.
adapter/redis.go Flips env-var sense from == "1" to != "0"; updates the comment, godoc, and field comment.
.github/workflows/jepsen-test-scheduled.yml Adds explicit ELASTICKV_REDIS_ONEPHASE_DEDUP: "0" to the job env so the control baseline keeps its meaning across the default change. Retirement of the control workflow is a 30-day follow-up.

Behaviour change

  • Default: dedup gate is now on for any process that does not opt out.
  • Opt-out: ELASTICKV_REDIS_ONEPHASE_DEDUP=0 (single-env-var rollback, no binary revert). The constructor option WithOnePhaseTxnDedup(false) still trumps the env.
  • Existing opt-in users: =1 (the dedup-mode workflow's existing setting) continues to resolve to true. No breakage.

Risk

Concern Disposition
Data loss None — the dedup path only short-circuits on a confirmed prior commit at the exact prev_commit_ts (the parent design's R1 + the existing kv/fsm.go probe). No write is dropped.
Concurrency / distributed failures None — every node already runs the probe code; no new fanout, no new round-trip. R5 discharged as above.
Performance Default-on adds the PrevCommitTS probe on each retryable attempt. Cost is one CommittedVersionAt MVCC lookup at the primary key + stale commit_ts; this is the same cost the dedup-mode workflow has been paying for 12 days without regression. No hot-path allocation change.
Data consistency Improved — the failure modes the parent design predicted (:duplicate-elements, :future-read, :G-single-item-realtime) become unreachable on the protected path. Issue #937 is the most recent control-baseline reproduction.
Test coverage No new branches added; existing adapter/redis_*_dedup_test.go suites cover both gate states. Tests that construct &RedisServer{...} directly (zero-value onePhaseTxnDedup) still observe the legacy path, so the off-path remains covered at unit level.

Verification

  • go vet ./... — clean
  • go build ./... — clean
  • go test -run 'Dedup|OnePhase|PrevCommit|Idempot' ./adapter/ — pass (1.2s)
  • go test -run 'TestRedis|TestList|TestSet|TestZSet|TestStream|TestExec|TestMulti' ./adapter/ — pass (169s)
  • golangci-lint run ./adapter/... — 0 issues

Follow-ups

  • 30 days post-merge — review whether jepsen-test-scheduled.yml (now explicit dedup-off) still adds signal. If not, retire it.
  • DynamoDB default-on — parallel work tracked by 2026_06_03_partial_dynamodb_onephase_dedup.md's M2. Not part of this PR.
  • S3, SQS dedup paths — not yet routed through one-phase reuse; out of scope for the parent design.

Summary by CodeRabbit

  • Documentation

    • Added a design doc describing a planned default-on change for Redis transaction dedup and rollout/verification plans.
  • Chores

    • Flipped adapter default behavior so transaction dedup is enabled by default, with an environment opt-out preserved.
    • Introduced a separate opt-in gate for standalone SET dedup to avoid unintended behavior changes.
  • Tests

    • Updated standalone SET tests and added a regression test to verify overwrite semantics remain correct under current defaults.

bootjp added 2 commits June 10, 2026 23:46
Closes the rollout of the option-2 idempotency design landed by
docs/design/2026_05_21_proposed_txn_secondary_idempotency.md by flipping
adapter.RedisServer.onePhaseTxnDedup from default-off to default-on. No
new mechanism — the FSM probe, the PrevCommitTS wire field, and every
adapter-side write-set reuse path already ship. Default-on is authorized
by the parent design's M4 7-day green criterion in the dedup-mode Jepsen
workflow; 12 consecutive green runs on the stress profile (2026-05-31 →
2026-06-10) exceed that threshold.

Triggered by Jepsen scheduled-stress run 27033231956 (2026-06-05) on the
control baseline workflow (gate off) which surfaced :duplicate-elements
and the related real-time cycle anomalies, exactly the shape the parent
design was written to prevent. Issue #937 tracks that failure; this
proposal closes it by retiring the unprotected default.

Doc commit only. Implementation commit follows in the same PR.
Implements the proposal in
docs/design/2026_06_10_proposed_redis_onephase_dedup_default_on.md.
The Redis adapter's one-phase idempotency dedup
(adapter.RedisServer.onePhaseTxnDedup) now defaults on. The
ELASTICKV_REDIS_ONEPHASE_DEDUP env var inverts from opt-in (=1
enabled, anything else disabled) to opt-out (=0 disabled, anything
else enabled). The WithOnePhaseTxnDedup constructor option still
trumps the env var.

Authorization for the flip is the parent design's M4 7-day green
criterion in the dedup-mode Jepsen workflow
(.github/workflows/jepsen-test-scheduled-dedup.yml), met by 12
consecutive green runs over 10 days (2026-05-31 → 2026-06-10) on the
stress profile that produced the parent design's trigger anomaly. The
parent's R5 (FSM determinism across a rolling upgrade) is discharged
— the probe code has shipped on every production node for months, so
the writer (emission) flipping on is uniformly understood.

To preserve the legacy-path coverage during the post-flip observation
window, the control workflow
(.github/workflows/jepsen-test-scheduled.yml) now sets
ELASTICKV_REDIS_ONEPHASE_DEDUP=0 explicitly at the job env level so
its semantics survive the default change. Retirement of that workflow
is a 30-day follow-up.

Closes #937 — the failure that triggered this work is the control
baseline's expected unprotected behaviour; with default-on it falls
back into the path the dedup-mode workflow has been exercising clean
for 12 consecutive days.

Verified:
- go vet ./... clean
- go build ./... clean
- go test -run 'Dedup|OnePhase|PrevCommit|Idempot' ./adapter/ — pass
- go test -run 'TestRedis|TestList|TestSet|TestZSet|TestStream|TestExec|TestMulti'
  ./adapter/ — pass (169s)
- golangci-lint run ./adapter/... — 0 issues
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Currently processing new changes in this PR. This may take a few minutes, please wait...

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 294ab44c-9393-4817-b5c6-4bc9a627e627

📥 Commits

Reviewing files that changed from the base of the PR and between a6af333 and a5909eb.

📒 Files selected for processing (4)
  • adapter/redis.go
  • adapter/redis_set_dedup_test.go
  • adapter/redis_set_overwrite_default_test.go
  • docs/design/2026_06_10_proposed_redis_onephase_dedup_default_on.md
 _____________________________________
< CORS error? I'll cross that bridge. >
 -------------------------------------
  \
   \   \
        \ /\
        ( )
      .( o ).
📝 Walkthrough

Walkthrough

This PR flips the default state of one-phase transaction dedup in the Redis adapter from off to on. The initialization logic in NewRedisServer now enables dedup unless explicitly disabled via ELASTICKV_REDIS_ONEPHASE_DEDUP=0. A design proposal justifies this change with Jepsen evidence, and the scheduled test workflow is updated to preserve a dedup-OFF baseline control.

Changes

Redis One-Phase Transaction Dedup Default-On Implementation

Layer / File(s) Summary
Design proposal and rollout plan
docs/design/2026_06_10_proposed_redis_onephase_dedup_default_on.md
Documents the default-on flip with M4 Jepsen evidence (green runs), environment-variable semantics change (opt-in =1 → opt-out =0), control-workflow baseline preservation, Issue #937 closure notes, and rollout/monitoring instructions.
Adapter initialization and documentation update
adapter/redis.go
NewRedisServer logic flips from ELASTICKV_REDIS_ONEPHASE_DEDUP == "1" to != "0" to enable dedup by default. Comments on onePhaseTxnDedup and WithOnePhaseTxnDedup updated to reflect default-on behavior with opt-out semantics and constructor override precedence.
Scheduled test workflow baseline preservation
.github/workflows/jepsen-test-scheduled.yml
Explicitly sets ELASTICKV_REDIS_ONEPHASE_DEDUP: "0" at job level with comments explaining this workflow as the legacy dedup-OFF baseline, ensuring two scheduled workflows exercise distinct dedup states after the default change.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • bootjp/elastickv#887: Added the runTransactionWithDedup EXEC reuse path that this PR now makes the default behavior by flipping the dedup gate.

Poem

🐰 A flip of the gate, from OFF to ON,
Dedup now defaults when the sunrise shines—
One less thing for ops to configure,
The scheduled test keeps the old way alive,
Safe baseline guarding the new default's stride. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: flipping the Redis one-phase transaction dedup default to on, and references the closing issue.
Linked Issues check ✅ Passed The PR directly addresses issue #937 by implementing the one-phase txn dedup default-on flip, backed by M4 Jepsen evidence (12 green runs), which resolves the duplicate-elements and future-read anomalies.
Out of Scope Changes check ✅ Passed All changes (design doc, redis.go env-var inversion, scheduled workflow baseline preservation) are directly scoped to implementing the dedup default flip and documenting the rollout.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/redis-onephase-dedup-default-on

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request flips the default state of the Redis one-phase transaction deduplication feature (onePhaseTxnDedup) to enabled, updating the configuration logic and adding a detailed design document. The review feedback highlights a broken reference link in the new design document and suggests improving the environment variable parsing in adapter/redis.go to also recognize 'false' as a disabling value, preventing potential operator misconfiguration.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.


The control workflow (`jepsen-test-scheduled.yml`, gate off) continues
to surface `:duplicate-elements` and the related real-time cycle
anomalies as expected; the 2026-06-05 18:37 failure ([#937]) is the

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The reference link [#937] on this line is broken because there is no corresponding link definition at the bottom of the document. Please use an inline link here to ensure it resolves correctly.

Suggested change
anomalies as expected; the 2026-06-05 18:37 failure ([#937]) is the
anomalies as expected; the 2026-06-05 18:37 failure ([#937](https://github.com/bootjp/elastickv/issues/937)) is the

Comment thread adapter/redis.go
// See docs/design/2026_06_10_proposed_redis_onephase_dedup_default_on.md.
// ELASTICKV_REDIS_ONEPHASE_DEDUP=0 opts out; the WithOnePhaseTxnDedup
// constructor option still trumps the env var.
onePhaseTxnDedup: os.Getenv("ELASTICKV_REDIS_ONEPHASE_DEDUP") != "0",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Checking only != "0" to enable the feature by default can be a usability footgun for operators. If an operator attempts to explicitly disable the feature using common falsey values like "false", the check will evaluate to true and keep the feature enabled. Consider also checking against "false" to prevent accidental misconfiguration.

Suggested change
onePhaseTxnDedup: os.Getenv("ELASTICKV_REDIS_ONEPHASE_DEDUP") != "0",
onePhaseTxnDedup: os.Getenv("ELASTICKV_REDIS_ONEPHASE_DEDUP") != "0" && strings.ToLower(os.Getenv("ELASTICKV_REDIS_ONEPHASE_DEDUP")) != "false",

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6af333f48

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/redis.go
// See docs/design/2026_06_10_proposed_redis_onephase_dedup_default_on.md.
// ELASTICKV_REDIS_ONEPHASE_DEDUP=0 opts out; the WithOnePhaseTxnDedup
// constructor option still trumps the env var.
onePhaseTxnDedup: os.Getenv("ELASTICKV_REDIS_ONEPHASE_DEDUP") != "0",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve SET overwrites before defaulting dedup on

With this default flip, every standalone SET now takes the set() dedup branch unless operators opt out; that branch calls runTransactionWithDedup, whose applySet path does not match the legacy standalone executeSet semantics for existing collection keys. For example, SET k v after RPUSH k x now returns WRONGTYPE (and for other collection encodings can write the string without deleting old collection rows), whereas the previous setLegacy path used replaceWithStringTxn/deleteLogicalKeyElems to overwrite the logical key. This breaks normal Redis overwrite behavior for any default-configured server that sets a key previously used as a list/hash/set/zset/stream.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/jepsen-test-scheduled.yml (1)

42-54: ⚡ Quick win

Add a fail-fast assertion that this workflow is truly dedup-OFF.

The explicit env var is good, but adding a launch-step assertion (== "0") will prevent silent control/baseline drift in future edits.

Suggested hardening
       - name: Launch demo cluster
         run: |
           set -euo pipefail
+          if [ "${ELASTICKV_REDIS_ONEPHASE_DEDUP:-}" != "0" ]; then
+            echo "FATAL: ELASTICKV_REDIS_ONEPHASE_DEDUP must be '0' in the control workflow"
+            exit 2
+          fi
           mkdir -p "$GOCACHE" /tmp/go-tmp
           export GOTMPDIR=/tmp/go-tmp
           nohup go run cmd/server/demo.go > /tmp/elastickv-demo.log 2>&1 &
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/jepsen-test-scheduled.yml around lines 42 - 54, Add a hard
fail-fast assertion that ELASTICKV_REDIS_ONEPHASE_DEDUP is exactly "0" at
workflow startup to prevent silent drift; implement this as an early job/step
(e.g., a "validate-dedup-flag" step) that reads the
ELASTICKV_REDIS_ONEPHASE_DEDUP env var and exits non‑zero if its value is not
the literal string "0", causing the workflow to fail immediately when the flag
is incorrect. Reference the ELASTICKV_REDIS_ONEPHASE_DEDUP env var in this
validation step and place it before any test/launch steps that depend on the
dedup-OFF baseline.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/2026_06_10_proposed_redis_onephase_dedup_default_on.md`:
- Around line 83-84: The markdown uses an unresolved reference token "[`#937`]" in
the sentence about the 2026-06-05 failure; fix by either adding a matching
reference definition for "[`#937`]" (e.g., add a reference entry at the end of the
document with the correct URL and optional title) or replace the token with an
inline link such as [`#937`](https://example.com/issue/937) so the link renders
correctly; update the reference to point to the actual issue/PR URL and ensure
the label matches the token used in the text.

---

Nitpick comments:
In @.github/workflows/jepsen-test-scheduled.yml:
- Around line 42-54: Add a hard fail-fast assertion that
ELASTICKV_REDIS_ONEPHASE_DEDUP is exactly "0" at workflow startup to prevent
silent drift; implement this as an early job/step (e.g., a "validate-dedup-flag"
step) that reads the ELASTICKV_REDIS_ONEPHASE_DEDUP env var and exits non‑zero
if its value is not the literal string "0", causing the workflow to fail
immediately when the flag is incorrect. Reference the
ELASTICKV_REDIS_ONEPHASE_DEDUP env var in this validation step and place it
before any test/launch steps that depend on the dedup-OFF baseline.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8efe9515-79ab-4fbd-9052-b117197c57a1

📥 Commits

Reviewing files that changed from the base of the PR and between ed82233 and a6af333.

📒 Files selected for processing (3)
  • .github/workflows/jepsen-test-scheduled.yml
  • adapter/redis.go
  • docs/design/2026_06_10_proposed_redis_onephase_dedup_default_on.md

Comment thread docs/design/2026_06_10_proposed_redis_onephase_dedup_default_on.md Outdated
@bootjp

bootjp commented Jun 10, 2026

Copy link
Copy Markdown
Owner Author

@claude review

PR #943 round-1 codex P1: flipping onePhaseTxnDedup to default-on
exposed a pre-existing semantic divergence on the standalone SET path.
txnContext.applySet (adapter/redis.go:2356) returns WRONGTYPE when the
target key already holds a list/hash/set/zset/stream, while the legacy
setLegacy → executeSet → replaceWithStringTxn path correctly deletes
the collection's logical elements and writes the string. Real Redis
allows SET to overwrite any prior type unconditionally; the default
flip would have silently broken normal overwrite semantics for any
default-configured server.

This commit narrows the default flip — the standalone SET path now
requires *two* gates to route through runTransactionWithDedup:
onePhaseTxnDedup (the umbrella, default on) AND a new
standaloneSetDedup sub-gate (default off, opt-in via
WithStandaloneSetDedup or ELASTICKV_REDIS_ONEPHASE_DEDUP_SET=1). With
the default config, the standalone SET branch falls through to
setLegacy, preserving legacy overwrite semantics. MULTI/EXEC and
list-push paths — the parent design's actual M4-validated workloads —
keep the dedup default-on win unchanged.

Bringing applySet to parity (collection-deletion + string write inside
the dedup txn buffer) is a larger surgery tracked as a follow-up;
until then the sub-gate stays default-off.

Per the CLAUDE.md "add failing test then fix" convention:
- TestRedis_SET_OverwritesList_UnderDefaultGate (new file
  adapter/redis_set_overwrite_default_test.go) seeds RPUSH k x, runs
  SET k v under default config, asserts OK + GET returns v.
  Fails on the pre-fix build (WRONGTYPE), passes on the fixed build.
- Existing TestStandaloneSetDedup_* tests now set standaloneSetDedup:
  true alongside onePhaseTxnDedup: true to keep pinning the dedup-on
  routing they were designed to exercise.

Also addresses gemini medium: "[#937]" broken reference-style link in
the design doc replaced with an inline link.

Verified: go vet ./... clean, go build ./... clean,
golangci-lint run ./adapter/... 0 issues,
TestRedis_SET_OverwritesList_UnderDefaultGate + TestStandaloneSetDedup_*
+ Dedup/OnePhase test family all pass.
@bootjp

bootjp commented Jun 10, 2026

Copy link
Copy Markdown
Owner Author

@claude review

Round-1 responses (commit a5909eb):

codex P1 — Preserve SET overwrites before defaulting dedup on ✅ FIXED

You're right — txnContext.applySet (adapter/redis.go:2356) returns WRONGTYPE for SET-over-collection, while the legacy setLegacy → executeSet → replaceWithStringTxn path correctly deletes the collection's logical elements and writes the string. Flipping onePhaseTxnDedup default-on without a separate gate on standalone SET would have broken normal Redis overwrite semantics.

Fix: introduced a per-path sub-gate standaloneSetDedup (default off). The standalone SET branch in set() now requires both onePhaseTxnDedup and standaloneSetDedup to be on. With the default config the SET path falls back to setLegacy, preserving legacy semantics. MULTI/EXEC and list-push paths (the parent design's actual M4-validated workloads) are unaffected.

Regression test TestRedis_SET_OverwritesList_UnderDefaultGate (new file adapter/redis_set_overwrite_default_test.go) seeds RPUSH k x, runs SET k v under default config, asserts OK + GET returns v. Fails on the pre-fix build (WRONGTYPE), passes on the fixed build.

Bringing applySet to parity with executeSet is a larger surgery; tracking as a follow-up before re-flipping the sub-gate.

gemini medium — broken [#937] reference link ✅ FIXED

Replaced with an inline link on line 83.

gemini medium — accept "false" as opt-out ❌ DECLINED

elastickv's env-var convention across the codebase is binary "0"/"1" (e.g. ELASTICKV_REDIS_TRACE == "1" two lines above the SET dedup site). Accepting "false" without also accepting "no" / "off" / "FALSE" would be inconsistent; accepting all of them would diverge from every other elastickv env-var read. The opt-out value "0" is documented in the godoc and in-file comment. Project-wide convention shift toward truthy-string parsing is a separate refactor — out of scope here.

CI Analyze (go) failure — workflow permissions, not code 🚧 NOT FIXED HERE

CodeQL ran fine (287/605 files, 7 diagnostics, SARIF exported) but the upload step failed with security-events: write missing from GITHUB_TOKEN scope. Workflow-configuration concern, not a code change for #943. Will track separately.

coderabbit minor — same [#937] link as gemini ✅ FIXED (subsumed)

Verification

  • go vet ./... clean
  • go build ./... clean
  • golangci-lint run ./adapter/... 0 issues
  • TestStandaloneSetDedup + TestRedis_SET_OverwritesList_UnderDefaultGate (5 tests) — all pass
  • Dedup/OnePhase/PrevCommit test family — all pass

@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown

Claude finished @bootjp's task in 3m 51s —— View job


PR #943 Review — Round 2

  • Read all changed files
  • Review adapter/redis.go changes
  • Review test files
  • Review design doc
  • Review CI workflow changes
  • Post findings

Summary

The round-1 fixes are solid. The core default-on flip is well-supported and the codex-P1 regression is correctly contained behind the standaloneSetDedup sub-gate. A few minor follow-up notes below — none are blockers.


adapter/redis.go

Gate logic — correct (adapter/redis.go:536–539)

onePhaseTxnDedup:   os.Getenv("ELASTICKV_REDIS_ONEPHASE_DEDUP") != "0",
standaloneSetDedup: os.Getenv("ELASTICKV_REDIS_ONEPHASE_DEDUP_SET") == "1",

The != "0" / == "1" split is consistent with the rest of the codebase (ELASTICKV_REDIS_TRACE == "1" two lines above), and the constructor options correctly trump the env vars because options are applied after the struct literal. The combination ELASTICKV_REDIS_ONEPHASE_DEDUP=0 + ELASTICKV_REDIS_ONEPHASE_DEDUP_SET=1 is also safe — the outer gate is checked first (r.onePhaseTxnDedup && r.standaloneSetDedupfalse), so setLegacy is used.

Field comment inconsistency — minor (adapter/redis.go:254–265)

The standaloneSetDedup field comment says the parity gap is "SET-over-collection: SET k v after RPUSH k x" and mentions only isListKeyAt. But applySet (line 2393) only guards the list case — for hash/set/zset/stream, it falls through to t.load() + raw write without deleting collection internal keys (silent orphan). The WithStandaloneSetDedup godoc (lines 290–292) correctly lists all five collection types; the field comment should match. This is a documentation gap, not a functional bug (the sub-gate defaults off and the follow-up issue should track all five types).

Fix this →

Pre-existing: applySet passes context.Background() instead of t.ctx (adapter/redis.go:2393)

if isList, err := t.server.isListKeyAt(context.Background(), cmd.Args[1], t.startTS); err != nil {

Timeout/cancellation from the parent handler (redisDispatchTimeout) won't propagate into this store read. Pre-existing (not introduced here), but will matter when standaloneSetDedup is eventually enabled for production traffic. Worth fixing in the applySet parity work.

Pre-existing: applySet uses store read, not staged state, for the list check (adapter/redis.go:2393)

isListKeyAt reads from the MVCC store at t.startTS — it doesn't consult t.listStates. A DEL-then-SET sequence within the same MULTI/EXEC batch would incorrectly produce WRONGTYPE on the SET even though the prior DEL logically removed the list. This is pre-existing behavior (both runTransaction and runTransactionWithDedup call txn.applyapplySet), and the existing Jepsen workloads don't hit this pattern, so it's not a regression here. Track with the applySet parity work.


adapter/redis_set_overwrite_default_test.go (new)

Good regression coverage. The test seeds a list, runs SET k v under default config (onePhaseTxnDedup=true, standaloneSetDedup=false), and asserts OK + correct GET. The comment chain from the codex P1 report through to the fix is preserved, which makes the failure mode obvious if the sub-gate is ever accidentally promoted.

Minor gap: Only the list case is covered. A table-driven extension adding hash/set/zset seeds would give more complete coverage for the invariant (all collection types must be safely overwritten by standalone SET under default config). The fix routes them all to setLegacyreplaceWithStringTxndeleteLogicalKeyElems, so they'd all pass — but having the tests makes the invariant explicit before the applySet parity follow-up lands.


adapter/redis_set_dedup_test.go

The three test structs (TestStandaloneSetDedup_*) now explicitly set standaloneSetDedup: true alongside onePhaseTxnDedup: true. The comment explaining the sub-gate is correct. No issues.


.github/workflows/jepsen-test-scheduled.yml

The block comment explaining ELASTICKV_REDIS_ONEPHASE_DEDUP: "0" is clear and references the companion workflow. The CodeRabbit suggestion (fail-fast assertion in the launch step) is a reasonable hardening but not a blocker — the env is set at the job level so it can't drift without a workflow file edit.


Design doc (2026_06_10_proposed_redis_onephase_dedup_default_on.md)

The inline link fix for #937 is correct. The M4 evidence table is thorough. The standaloneSetDedup sub-gate discussion in §M1 is accurate.

One nit: the design doc still says (§M1, around line 50) "standalone SET, etc. already ship and have been exercised by the dedup-mode Jepsen workflow." After round-1's fix, standalone SET is not on the dedup path by default (standaloneSetDedup=false). This sentence could be misread as "standalone SET dedup is validated by the Jepsen runs" when it isn't. Worth a one-line clarification.


Verdict

LGTM. The round-1 fixes correctly address the codex P1 regression. The default-on flip for MULTI/EXEC and list-push retries is cleanly backed by M4 evidence. The field-comment inconsistency and the design doc sentence are minor documentation cleanups that could land as a follow-on commit or be folded in before merge — neither is a correctness issue.
| Branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Jepsen Scheduled Stress (Redis): duplicate-elements + future-read + G-single/G2-item-realtime on main

1 participant