adapter: wrap DynamoDB query/scan/transactGetItems in LeaseRead#952
adapter: wrap DynamoDB query/scan/transactGetItems in LeaseRead#952bootjp wants to merge 1 commit into
Conversation
|
Warning Review limit reached
More reviews will be available in 38 minutes and 35 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@claude review |
|
Claude finished @bootjp's task in 3m 53s —— View job Code Review: adapter: wrap DynamoDB query/scan/transactGetItems in LeaseReadOverall: The implementation is correct, well-structured, and follows the Checklist
Findings1.
|
There was a problem hiding this comment.
Code Review
This pull request introduces quorum-freshness lease checks for multi-shard read operations (Query, Scan) and multi-item transactions (TransactGetItems) in the DynamoDB adapter, along with corresponding unit tests. The reviewer identified a critical performance bottleneck in leaseCheckTransactGetItems where lease checks are executed sequentially for each unique key. This sequential execution could easily exceed the timeout limit for large transactions, and the reviewer provided a concurrent implementation using goroutines to resolve this issue.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func (d *DynamoDBServer) leaseCheckTransactGetItems(w http.ResponseWriter, r *http.Request, in transactGetItemsInput) bool { | ||
| tentativeTS := snapshotTS(d.coordinator.Clock(), d.store) | ||
| schemaCache := make(map[string]*dynamoTableSchema) | ||
| seenKeys := make(map[string]struct{}, len(in.TransactItems)) | ||
| leaseCtx, leaseCancel := context.WithTimeout(r.Context(), dynamoLeaseReadTimeout) | ||
| defer leaseCancel() | ||
| for _, item := range in.TransactItems { | ||
| itemKey, ok := d.transactGetItemKey(r.Context(), item, schemaCache, tentativeTS) | ||
| if !ok { | ||
| continue | ||
| } | ||
| if _, dup := seenKeys[string(itemKey)]; dup { | ||
| continue | ||
| } | ||
| seenKeys[string(itemKey)] = struct{}{} | ||
| if _, err := kv.LeaseReadForKeyThrough(d.coordinator, leaseCtx, itemKey); err != nil { | ||
| writeDynamoError(w, http.StatusInternalServerError, dynamoErrInternal, err.Error()) | ||
| return false | ||
| } | ||
| } | ||
| return true | ||
| } |
There was a problem hiding this comment.
Executing kv.LeaseReadForKeyThrough sequentially for each unique key in TransactGetItems introduces a significant performance bottleneck. If a transaction contains many items (up to transactGetItemsMaxItems = 100) and a lease is missed or the coordinator falls back to LinearizableRead (e.g., if leases are unsupported or disabled), these reads will execute sequentially. This results in up to 100 sequential Raft round-trips, which will easily exceed the dynamoLeaseReadTimeout (5s) and cause client requests to fail with a 500 InternalServerError.
To prevent this sync.WaitGroup and a buffered error channel.
func (d *DynamoDBServer) leaseCheckTransactGetItems(w http.ResponseWriter, r *http.Request, in transactGetItemsInput) bool {
tentativeTS := snapshotTS(d.coordinator.Clock(), d.store)
schemaCache := make(map[string]*dynamoTableSchema)
seenKeys := make(map[string]struct{}, len(in.TransactItems))
var uniqueKeys [][]byte
for _, item := range in.TransactItems {
itemKey, ok := d.transactGetItemKey(r.Context(), item, schemaCache, tentativeTS)
if !ok {
continue
}
if _, dup := seenKeys[string(itemKey)]; dup {
continue
}
seenKeys[string(itemKey)] = struct{}{}
uniqueKeys = append(uniqueKeys, itemKey)
}
if len(uniqueKeys) == 0 {
return true
}
leaseCtx, leaseCancel := context.WithTimeout(r.Context(), dynamoLeaseReadTimeout)
defer leaseCancel()
var wg sync.WaitGroup
errChan := make(chan error, len(uniqueKeys))
for _, key := range uniqueKeys {
wg.Add(1)
go func(itemKey []byte) {
defer wg.Done()
if _, err := kv.LeaseReadForKeyThrough(d.coordinator, leaseCtx, itemKey); err != nil {
errChan <- err
}
}(key)
}
wg.Wait()
close(errChan)
if err := <-errChan; err != nil {
writeDynamoError(w, http.StatusInternalServerError, dynamoErrInternal, err.Error())
return false
}
return true
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 528d63148f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| func (d *DynamoDBServer) leaseReadKeyless(w http.ResponseWriter, r *http.Request) bool { | ||
| leaseCtx, leaseCancel := context.WithTimeout(r.Context(), dynamoLeaseReadTimeout) | ||
| defer leaseCancel() | ||
| if _, err := kv.LeaseReadThrough(d.coordinator, leaseCtx); err != nil { |
There was a problem hiding this comment.
Route Query lease checks by the queried key
When DynamoDB is backed by ShardedCoordinator, this keyless LeaseRead only checks the default shard (kv/sharded_coordinator.go:1517-1524), not the shard that owns the Query's partition-key prefix. For a Query whose hash key routes to a non-default shard, an unavailable default group now returns InternalServerError before queryItems can read from the healthy owning shard; conversely this check still does not establish freshness for the target shard. Resolve the query key/prefix first and use LeaseReadForKey for that shard (or otherwise check the actual range owners) instead of the default-group lease.
Useful? React with 👍 / 👎.
Summary
Part of #557. Wraps the remaining DynamoDB read handlers in a quorum-freshness
LeaseReadcheck, following the pattern PR #549 established forgetItem. Previously these handlers read localLastCommitTS/snapshot state with no quorum check at all, so a partitioned-but-not-yet-stepped-down stale leader could serve stale reads. Each handler now performs one lease check (cheap atomic-load fast path within the lease window; oneLinearizableReadon a lease miss, after which the lease is refreshed for the rest of the window) before resolving its read timestamp.The Redis slice of #557 is deliberately left for a follow-up — this PR touches only
adapter/dynamodb.go.Per-handler placement
query— keylessLeaseRead(ctx)(leaseReadKeylesshelper) beforequeryItemssamplesreadTS. A Query scans a key range that can span shards in a multi-group deployment, and the owning shard can't be resolved in the handler without duplicatingprepareReadSchema/resolveQueryCondition, so the keyless check (which anchors the cluster freshness bound) is the correct, cheaper call.scan— keylessLeaseRead(ctx)(same helper) beforescanItemssamplesreadTS. A Scan reads the whole table and spans every shard holding its items, so keyless is the right choice (matches the issue's guidance for keyless/multi-shard ops).transactGetItems— per-keyLeaseReadForKey, deduplicated by item key, run inleaseCheckTransactGetItemsbeforenextTxnReadTS()resolves the single snapshot timestamp. Item keys are resolved at a tentative schema timestamp purely to route the lease check; the single-snapshot-ts semantics (onereadTSshared by all items) are unchanged. Malformed items are skipped in the pre-pass so the existing validation/error mapping still surfaces downstream identically. Within the lease window, same-shard keys hit the cheap fast path, so per-key checks are effectively free.Timeout decision
Every wrapped handler runs the lease check under a bounded
context.WithTimeout(r.Context(), dynamoLeaseReadTimeout)withdefer cancel()— the exact constant (5s) and structuregetItemuses after PR #549 — so a stalled Raft cannot hang a handler indefinitely when the client never cancels.Note on
batchGetItemThe issue lists
batchGetItemin scope, but there is noBatchGetItemhandler in this codebase (nobatchGetItemTarget, no registration, no implementation). There was nothing to wrap; the multi-key precedent it asked about is implemented intransactGetItems(per-keyLeaseReadForKey, deduped). IfBatchGetItemis added later, it should reuse the sameleaseCheckTransactGetItems-style per-key dedup pattern.Behavior change
Adds a quorum-freshness check to
query/scan/transactGetItems. Read results and error mapping are otherwise identical; lease-read failure surfaces as the same500 InternalServerError(dynamoErrInternal) thatgetItemproduces.Risk
Low. The only new failure mode is a
500when the lease check fails (quorum loss), which is strictly more correct than serving a stale read. No write paths, FSM, or snapshot semantics touched. The added per-requestLeaseReadis amortized by the lease window.Test evidence
go test -race -run TestDynamoDB ./adapter/→ ok (15s)TestDynamoDB_Query_LeaseRead,TestDynamoDB_Scan_LeaseRead,TestDynamoDB_TransactGetItems_LeaseRead(each asserts (a) success with a healthy lease and (b) lease failure → 500InternalServerError, mirroringgetItem's error class) plus existingTestDynamoDB_TransactGetItems*→ ok with-race.golangci-lint --config=.golangci.yaml run ./adapter/...→ 0 issues.go vet ./adapter/,gofmt -l→ clean.go test -race ./adapter/exceeds the default 10-minute package timeout (the suite spans Redis/S3/SQS/gRPC/etc., unrelated to this change); it passes under an extended-timeout.New tests inject lease-read failures via a
failLeaseReadstoggle added to the existingtestCoordinatorWrapper, and drive the failure path over raw HTTP so the AWS SDK does not retry the500.Self-review
return nil-after-error introduced; lease failure returns an explicit500.dynamoLeaseReadTimeoutwithdefer cancel(), so a leader change / partition can't hang a handler.transactGetItemskeeps its single sharedreadTS; the lease pre-pass runs before the timestamp is sampled, so a leadership flip mid-pass only causes a lease miss (→LinearizableRead) or a500, never a torn snapshot. Rango test -raceon the dynamo suite.LeaseReadper request, amortized by the lease window (atomic-load fast path; oneLinearizableReadper lease miss, then refreshed).transactGetItemsdedups by item key and same-shard keys hit the fast path, so per-key checks add negligible cost; no extra Raft round-trips per HLC tick. The tentative schema load in the transact pre-pass is a cheap, cached Pebble read.readTSis sampled after the lease confirmation in every handler, so any commit that landed before confirmation is visible; this tightens the freshness bound without bypassing the leader-issued read pipeline.transactGetItemssingle-snapshot-ts semantics are preserved.testCoordinatorWrapper/createNodeharness and mirroringgetItem's error class on failure.