feat(tools): queue hosted-key tool calls instead of failing with 429#4416

Merged
TheodoreSpeaks merged 4 commits into staging from feat/queued-hosted-key on May 26, 2026
Conversation

@TheodoreSpeaks
Collaborator

@TheodoreSpeaks TheodoreSpeaks commented May 3, 2026

Summary

  • Hosted-key tool calls (Sim-provided keys, not BYOK) now enqueue onto a per-workspace+provider FIFO queue. Only the head of the queue consumes from the token bucket — strict ordering, no racing.
  • Different workspaces have independent queues. BYOK paths short-circuit before any of this and are unaffected.
  • Total wait (queue position + bucket refill) capped at 5 minutes; over the cap returns the existing 429 result.
  • Crash-tolerant: each ticket has a heartbeat key (TTL 30s, refreshed every 10s while waiting). Dead heads are reaped lazily by the next caller. Queue list TTL is 10 minutes for fully abandoned queues.
  • One Lua script per poll (reap + head-check + self-presence-check atomic) keeps Redis traffic low under contention.
  • Bump Exa search hosted RPM from 5 → 60.
  • New telemetry: platform.hosted_key.queue_waited (with queuePosition field) and platform.hosted_key.queue_wait_exceeded.
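
The queue semantics listed above can be sketched as a small in-memory model. This is only an illustration: the actual implementation keeps the ticket list and heartbeat keys in Redis and runs the reap and head check in one Lua script, and the class and method names below are hypothetical. Time is passed in explicitly so the behavior is deterministic.

```typescript
// In-memory model of the described queue. Hypothetical names; the real
// implementation uses a Redis list plus per-ticket heartbeat keys.
type HeadStatus = 'head' | 'waiting' | 'missing'

const HEARTBEAT_TTL_MS = 30_000 // heartbeat TTL stated in the summary

class InMemoryTicketQueue {
  private tickets: string[] = [] // FIFO order, index 0 is the head
  private heartbeats = new Map<string, number>() // ticketId -> expiry time (ms)

  // Appends a ticket and returns its 0-based queue position.
  enqueue(ticketId: string, now: number): number {
    this.tickets.push(ticketId)
    this.heartbeats.set(ticketId, now + HEARTBEAT_TTL_MS)
    return this.tickets.length - 1
  }

  // Extends the ticket's heartbeat while its owner is still waiting.
  refreshHeartbeat(ticketId: string, now: number): void {
    if (this.heartbeats.has(ticketId)) {
      this.heartbeats.set(ticketId, now + HEARTBEAT_TTL_MS)
    }
  }

  // One poll: reap heads whose heartbeat lapsed, then report the caller's status.
  checkHead(ticketId: string, now: number): HeadStatus {
    let head = this.tickets[0]
    while (head !== undefined) {
      const expiry = this.heartbeats.get(head)
      if (expiry !== undefined && expiry > now) break // head is alive
      this.tickets.shift() // dead head: remove it so the next ticket can advance
      this.heartbeats.delete(head)
      head = this.tickets[0]
    }
    if (head === ticketId) return 'head'
    return this.tickets.includes(ticketId) ? 'waiting' : 'missing'
  }

  // Removes the ticket regardless of where it sits in the queue.
  dequeue(ticketId: string): void {
    this.tickets = this.tickets.filter((t) => t !== ticketId)
    this.heartbeats.delete(ticketId)
  }
}
```

In this model a caller that stops refreshing its heartbeat is removed by whichever caller polls next, which is the lazy reaping the summary describes.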

Type of Change

  • New feature

Testing

  • 39 hosted-key tests pass (15 queue + 24 rate-limiter, including FIFO ordering, head-only consume, dead-head reap, cap-exceeded, missing-ticket fall-through)
  • 141/141 across rate-limiter + tools regression
  • Manually verified in dev: depth, head rotation, heartbeat refresh, drain rate match the bucket config
  • bun run lint clean
  • bun run check:api-validation:strict passes

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented May 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 26, 2026 7:38pm

Replace the per-call distributed lock with a Redis-backed FIFO queue so
callers within a workspace get strict ordering instead of racing the
bucket. Adds heartbeat-based crash recovery and dead-head reaping in a
single Lua script. Bumps Exa search hosted RPM from 5 to 60.
@TheodoreSpeaks
Collaborator Author

@BugBot review

@cursor

cursor Bot commented May 5, 2026

PR Summary

Medium Risk
Introduces blocking FIFO queueing and wait loops around hosted-key acquisition (Redis + polling/heartbeats) and changes tool retry behavior to re-enter the queue after upstream 429s, which could impact throughput/latency and failure modes under contention or Redis issues.

Overview
Hosted-key acquisition is changed from immediate token-bucket racing/429s to a per-workspace+provider FIFO queue: callers enqueue, wait until they reach the head (with heartbeat refresh), then wait for actor and (custom) dimension capacity up to a 5-minute cap before returning the existing 429-style error.

Adds a new Redis-backed HostedKeyQueue (Lua-based checkHead with dead-head reaping, TTLs, and fail-open behavior when Redis is unavailable) plus new telemetry events platform.hosted_key.queue_waited and platform.hosted_key.queue_wait_exceeded.

Tool execution now optionally re-acquires a hosted key and retries once after upstream 429 backoff is exhausted, and Exa search hosted RPM is increased from 5 to 60; tests are expanded to cover queue ordering, heartbeat, cap timeouts, and wait-then-succeed flows.
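
The retry-then-requeue flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in tools/index.ts: `callWithReacquire`, `send`, `reacquireKey`, and `backoff` are hypothetical names, and only the control flow (local retries on 429, then one re-acquire and one final attempt) follows the description.

```typescript
// Hypothetical sketch: retry locally on upstream 429, then re-enter the
// hosted-key queue exactly once for a fresh key and make one final attempt.
interface UpstreamResult {
  status: number
  body?: string
}

async function callWithReacquire(
  send: (key: string) => Promise<UpstreamResult>,
  reacquireKey: () => Promise<string | null>, // resolves null if no key could be acquired
  initialKey: string,
  maxRetries: number,
  backoff: (attempt: number) => Promise<void>
): Promise<UpstreamResult> {
  let result = await send(initialKey)

  // Local backoff retries with the key we already hold.
  for (let attempt = 1; attempt <= maxRetries && result.status === 429; attempt++) {
    await backoff(attempt)
    result = await send(initialKey)
  }
  if (result.status !== 429) return result

  // Retries exhausted: queue again once and retry with whatever key comes back.
  const freshKey = await reacquireKey()
  if (freshKey === null) return result
  return send(freshKey)
}
```

Limiting the re-acquire to a single attempt keeps the worst-case latency bounded by one extra queue wait rather than an open-ended loop.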

Reviewed by Cursor Bugbot for commit 0b80ed3. Bugbot is set up for automated code reviews on this repo.


@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts Outdated
@TheodoreSpeaks
Collaborator Author

@greptile review

@greptile-apps
Contributor

greptile-apps Bot commented May 26, 2026

Greptile Summary

This PR replaces the immediate-429 behavior for hosted-key rate-limit hits with a per-workspace+provider FIFO queue backed by Redis. Each acquireKey call enqueues a ticket, polls until it reaches the head of the queue, and only then attempts to consume from the token bucket; the ticket is always dequeued in a finally block regardless of outcome.

  • Queue lifecycle (queue.ts): RPUSH/EXPIRE/SET on enqueue, a single Lua EVAL that atomically reaps dead heads and returns head/waiting/missing status on every poll, and LREM/DEL on dequeue. Heartbeat keys (30s TTL, refreshed every 10s via shared WaitState) prevent live callers from being reaped as dead. All Redis operations fail-open so the system degrades to plain bucket-racing when Redis is unavailable.
  • Rate-limiter refactor (hosted-key-rate-limiter.ts): acquireKey is restructured into waitForQueueHead → waitForActorCapacity → waitForDimensionCapacity; each phase shares a single WaitState.lastHeartbeatAt (fixing the heartbeat-expiry regression from earlier review), and heartbeatAwareSleep caps every bucket-wait sleep at the heartbeat interval. Wait budget is bounded by the execution AbortSignal when available, falling back to 5 min; both phases respect the shared deadline.
  • tools/index.ts: Passes executionContext?.abortSignal through to acquireKey, and introduces a reacquireAfterRetriesExhausted hook so a single re-queue attempt is made when upstream 429s exhaust local exponential-backoff retries. Exa hosted RPM is bumped 5 → 60 to take advantage of the queuing.
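
To make the shared heartbeat bookkeeping concrete, here is a rough sketch of how one `WaitState` and a capped sleep duration could interact. The names `WaitState.lastHeartbeatAt`, `maybeRefreshHeartbeat`, and `HEARTBEAT_REFRESH_INTERVAL_MS` appear in this PR, but the signatures and bodies below are assumptions, and `nextSleepMs` is a hypothetical helper.

```typescript
// Assumed signatures; only the names WaitState, maybeRefreshHeartbeat and
// HEARTBEAT_REFRESH_INTERVAL_MS come from the PR. nextSleepMs is hypothetical.
const HEARTBEAT_REFRESH_INTERVAL_MS = 10_000

interface WaitState {
  lastHeartbeatAt: number // shared by every wait phase of one acquireKey call
}

// Refreshes the heartbeat only when the interval has elapsed, and records the
// refresh on the shared state so later phases do not restart the clock.
function maybeRefreshHeartbeat(state: WaitState, now: number, refresh: () => void): boolean {
  if (now - state.lastHeartbeatAt < HEARTBEAT_REFRESH_INTERVAL_MS) return false
  refresh()
  state.lastHeartbeatAt = now
  return true
}

// Never sleep longer than the heartbeat interval or the remaining wait budget,
// even if the bucket reports a much longer retryAfterMs.
function nextSleepMs(retryAfterMs: number, deadline: number, now: number): number {
  const remaining = Math.max(0, deadline - now)
  return Math.min(retryAfterMs, HEARTBEAT_REFRESH_INTERVAL_MS, remaining)
}
```

With a 30s heartbeat TTL, capping each sleep at 10s means the waiter wakes at least twice before its heartbeat could lapse, so a long low-RPM `retryAfterMs` cannot cause the head to be reaped mid-wait.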

Confidence Score: 5/5

Safe to merge; the queue logic is well-tested, all Redis failure paths fail-open to plain bucket racing, and the FIFO ordering and heartbeat mechanisms are correctly implemented.

The core queue mechanics — enqueue atomicity, Lua-based head reap, heartbeat sharing across wait phases, dequeue in finally, AbortSignal propagation — are all correct. Test coverage is thorough: FIFO ordering, dead-head reap, cap exceeded, abort mid-sleep, low-RPM heartbeat refresh, and no-Redis fallback are all exercised. The only finding is that attempts in hostedKeyQueueWaited telemetry is always emitted as 1, which only affects observability and not runtime behavior.

No files require special attention; the telemetry attempts field in hosted-key-rate-limiter.ts is the sole minor gap.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/core/rate-limiter/hosted-key/queue.ts | New FIFO queue implementation: Redis list for ordering + per-ticket heartbeat keys + a single Lua EVAL for atomic dead-head reap + head/waiting/missing status. All Redis failure paths fail-open correctly. |
| apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts | Refactored acquireKey to enqueue/waitForQueueHead/waitForActorCapacity/waitForDimensionCapacity phases with shared WaitState heartbeat tracking; dequeue in finally block ensures cleanup on all exit paths. The attempts field in hostedKeyQueueWaited telemetry is hardcoded to 1. |
| apps/sim/lib/core/rate-limiter/hosted-key/queue.test.ts | New test file with 15 tests covering enqueue position math, checkHead Lua result passthrough, heartbeat refresh, dequeue cleanup, and fail-open/no-op Redis scenarios. |
| apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.test.ts | Updated tests inject a MockQueue, add FIFO ordering suite (enqueue, dequeue-on-exit, wait-at-head, heartbeat, cap exceeded, missing fall-through) and execution-budget tests (abort, mid-sleep abort, live-signal past cap, low-RPM heartbeat refresh). |
| apps/sim/lib/core/telemetry.ts | Adds two new platform events: hostedKeyQueueWaited and hostedKeyQueueWaitExceeded. The attempts field is always emitted as 1 from the call site. |
| apps/sim/tools/index.ts | Passes abortSignal to acquireKey; adds reacquireHostedKey helper and reacquireAfterRetriesExhausted callback so a single re-queue attempt is made when upstream 429s exhaust local retries. |
| apps/sim/tools/exa/search.ts | Bumps hosted RPM from 5 → 60 for Exa search, leveraging the new queue-based throttling instead of instant 429s. |

Sequence Diagram

sequenceDiagram
    participant Caller as Tool Caller
    participant RL as HostedKeyRateLimiter
    participant Q as HostedKeyQueue (Redis)
    participant Bucket as Token Bucket

    Caller->>RL: acquireKey(provider, workspaceId, signal)
    RL->>Q: enqueue(provider, workspaceId, ticketId)
    Q-->>RL: "{ position, enabled }"

    loop waitForQueueHead
        RL->>Q: checkHead (Lua: reap dead + check position)
        Q-->>RL: "waiting | head | missing"
        alt not at head and budget remains
            RL->>Q: maybeRefreshHeartbeat
            RL->>RL: interruptibleSleep(200ms, signal)
        end
    end

    alt queue timed out
        RL-->>Caller: 429 (queue wait exceeded)
    end

    loop waitForActorCapacity
        RL->>Bucket: checkActorRateLimit
        Bucket-->>RL: allowed? retryAfterMs?
        alt not allowed and budget remains
            RL->>Q: maybeRefreshHeartbeat
            RL->>RL: heartbeatAwareSleep(min(retryAfterMs,10s), signal)
        end
    end

    RL->>Caller: success true, key
    RL->>Q: dequeue(ticketId) [finally block]

    Note over Caller,Bucket: On upstream 429 after maxRetries
    Caller->>RL: "reacquireHostedKey -> acquireKey (fresh ticket)"
    RL-->>Caller: fresh key injected into params
    Caller->>Caller: executeToolRequest (one final retry)

Reviews (3): Last reviewed commit: "feat(rate-limiter): make hosted-key queu..."

…ix heartbeat + telemetry

Tie the per-workspace hosted-key queue wait to the surrounding execution
budget instead of a flat 5-minute cap. acquireKey now accepts the execution
AbortSignal (threaded from ExecutionContext): when present, the wait is
bounded by the run's actual plan timeout / cancellation, with the enterprise
async ceiling as a backstop; when absent it falls back to MAX_QUEUE_WAIT_MS.
This lets long-running async (Trigger.dev) runs use their full budget while
no longer letting a single queued call burn a short sync run's entire budget.
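
As a rough illustration of the budget rule in this paragraph, the sketch below picks a wait deadline depending on whether an execution AbortSignal is present. `MAX_QUEUE_WAIT_MS` is named in this commit message; `ASYNC_CEILING_MS`, its value, and both function names are placeholders invented for the example and are not taken from the codebase.

```typescript
// MAX_QUEUE_WAIT_MS is named in the commit message (the 5-minute fallback).
// ASYNC_CEILING_MS and its value are placeholders, not the real ceiling.
const MAX_QUEUE_WAIT_MS = 5 * 60 * 1000
const ASYNC_CEILING_MS = 90 * 60 * 1000 // hypothetical backstop value

// With a signal, the run's own timeout/cancellation is the real bound and the
// ceiling is only a backstop; without one, fall back to the flat cap.
function resolveWaitDeadline(now: number, signal?: AbortSignal): number {
  return now + (signal ? ASYNC_CEILING_MS : MAX_QUEUE_WAIT_MS)
}

// A wait loop would check this before every poll or sleep.
function budgetRemains(now: number, deadline: number, signal?: AbortSignal): boolean {
  if (signal?.aborted) return false
  return now < deadline
}
```

Under this scheme a short synchronous run stops waiting as soon as its signal aborts, while a long asynchronous run is not cut off at five minutes.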

Also addresses Greptile review:
- P1: share one lastHeartbeatAt across all wait phases and cap every sleep to
  HEARTBEAT_REFRESH_INTERVAL_MS so a long low-RPM retryAfterMs can no longer
  let the head's heartbeat lapse mid-wait and break FIFO ordering.
- P2: derive hostedKeyQueueWaited telemetry reason from the actual bottleneck
  (queue_position / dimension / actor_requests) instead of hardcoding it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the plain capped sleeps in the queue-head and bucket-capacity wait
loops with an interruptibleSleep that resolves early when the execution
AbortSignal fires (timeout or cancellation), cleaning up its own timer and
listener. Previously a cancelled/timed-out run could overshoot by up to the
heartbeat cap (~10s) before the loop re-checked its budget; now it wakes
within a tick. The cap remains for heartbeat renewal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
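
One way an abort-aware sleep like the one described in this commit might look is sketched below. The function name matches the commit message, but the body is an assumption rather than the merged code; it relies only on the standard `AbortSignal`, `setTimeout`, and `clearTimeout` APIs.

```typescript
// Sketch of an abort-aware sleep: resolves after `ms`, or as soon as the
// signal aborts, and always removes its own timer and listener.
function interruptibleSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve) => {
    if (signal?.aborted) {
      resolve() // already cancelled: do not sleep at all
      return
    }

    const onAbort = () => {
      clearTimeout(timer) // drop the pending timer so it cannot fire later
      resolve()
    }

    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort) // normal wake-up: detach listener
      resolve()
    }, ms)

    signal?.addEventListener('abort', onAbort, { once: true })
  })
}
```

The promise resolves rather than rejects on abort, which fits a loop that re-checks its budget after every sleep and decides there whether to stop.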
@TheodoreSpeaks TheodoreSpeaks marked this pull request as ready for review May 26, 2026 23:34
@TheodoreSpeaks
Collaborator Author

@greptile review

@TheodoreSpeaks TheodoreSpeaks merged commit 4fa7e74 into staging May 26, 2026
13 checks passed
@waleedlatif1 waleedlatif1 deleted the feat/queued-hosted-key branch May 27, 2026 00:13