Skip to content

perf(webapp): parallelize Phase 2 streaming batch-item ingest (TRI-10273)#3777

Open
matt-aitken wants to merge 2 commits into
mainfrom
feature/tri-10273-managed-runs-streaming-batchtriggerandwait-fails-with-408
Open

perf(webapp): parallelize Phase 2 streaming batch-item ingest (TRI-10273)#3777
matt-aitken wants to merge 2 commits into
mainfrom
feature/tri-10273-managed-runs-streaming-batchtriggerandwait-fails-with-408

Conversation

@matt-aitken
Copy link
Copy Markdown
Member

Problem

Phase 2 of the v3 streaming batch API (POST /api/v3/batches/:batchId/items) processed streamed items strictly sequentially. For a batch of many large payloads — each offloaded to object storage inline — this serialized N object-store round-trips inside a single request, exceeding Node's default server.requestTimeout (300s). The webapp returned 408, which the SDK reads as 408 terminated and retries 5×, turning a slow ingest into a ~26-minute failure (BatchTriggerError: Failed to stream items ... 408 terminated).

Closes TRI-10273 — https://linear.app/triggerdotdev/issue/TRI-10273

Fix

Ingest now runs through p-map over the NDJSON async iterable with bounded concurrency (STREAMING_BATCH_INGEST_CONCURRENCY, default 10):

  • p-map pulls lazily from the stream — at most concurrency items are read/in-flight at once, so peak memory is bounded to roughly concurrency × STREAMING_BATCH_ITEM_MAXIMUM_SIZE and request-body backpressure is preserved.
  • Set the env to 1 for fully sequential ingestion (escape hatch).
  • The default lives only in env.server.ts; the service takes a required number.

Why this is safe (ordering/idempotency unchanged)

  • Ordering derives from each item's index (enqueue timestamp = batch.createdAt + index), not enqueue order.
  • Dedup is atomic per index in enqueueBatchItem.
  • The NDJSON parser now stamps oversized-item markers with their emit position, removing the consumer's sequential lastIndex assumption (the only order-dependent bit).
  • The count-check + conditional seal path is untouched.

Tests

  • 100-item batch ingested concurrently → all enqueued + sealed, correct counts
  • in-flight processing never exceeds the configured concurrency (real instrumented payload processor)
  • concurrent dedup on Phase 2 retry (pre-enqueued half re-streamed)
  • emit-position marker indexing (parser unit test)
  • Full existing sealing/idempotency suite still green — 42/42 pass; webapp typecheck clean.

Follow-ups (not in this PR)

  • SDK pre-offload of large item payloads (send application/store refs instead of raw blobs) to remove object-store work from the request hot path and shrink the request body — bigger, protocol-level change.
  • Optional server.requestTimeout bump as a safety net.

🤖 Generated with Claude Code

…273)

Phase 2 of the v3 streaming batch API (POST /api/v3/batches/:batchId/items)
processed streamed items strictly sequentially. For batches of many large
payloads — each offloaded to object storage inline — this serialized N object-store
round-trips inside one request, blowing past Node's default 300s server.requestTimeout.
The webapp then returned 408, which the SDK reads as "408 terminated" and retries 5x,
turning a slow ingest into a ~26-minute failure.

Ingest now runs through p-map over the NDJSON async iterable with bounded concurrency
(STREAMING_BATCH_INGEST_CONCURRENCY, default 10). p-map pulls lazily, so at most
`concurrency` items are read/in-flight at once — bounding peak memory to roughly
concurrency x STREAMING_BATCH_ITEM_MAXIMUM_SIZE while preserving stream backpressure.
Set the env to 1 for fully sequential ingestion.

Safe by construction: run order derives from each item's index (enqueue timestamp =
batch.createdAt + index), and enqueueBatchItem dedups atomically per index — neither
depends on processing order. The NDJSON parser now stamps oversized-item markers with
their emit position, removing the consumer's sequential lastIndex assumption. The
count-check + conditional seal path is unchanged.

Tests: bounded-concurrency ingest of a 100-item batch, in-flight cap assertion,
concurrent dedup on Phase 2 retry, and emit-position marker indexing. Full existing
sealing/idempotency suite still green (42/42).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 29, 2026

⚠️ No Changeset found

Latest commit: 4ca9807

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 66a30d7b-b70c-4b00-87ac-a15da63a51c8

📥 Commits

Reviewing files that changed from the base of the PR and between d5daeb8 and 4ca9807.

📒 Files selected for processing (3)
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • apps/webapp/test/engine/streamBatchItems.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • apps/webapp/test/engine/streamBatchItems.test.ts
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
  • GitHub Check: 🛡️ E2E Auth Tests (full)
  • GitHub Check: audit
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)

Walkthrough

This pull request refactors the Phase 2 streaming batch ingest endpoint to process NDJSON items with bounded concurrency rather than sequentially. It introduces a configurable STREAMING_BATCH_INGEST_CONCURRENCY environment variable (default 10) and replaces sequential for-await iteration with p-map to limit in-flight items and bound memory usage. The NDJSON parser now tracks emit positions to backfill oversized item marker indices when extraction fails, ensuring consistent ordering across concurrent execution. All operational guarantees—index-based ordering, atomic deduplication per index, and idempotency—are preserved. Test coverage includes new validation for concurrency bounds and deduplication behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: parallelizing Phase 2 streaming batch-item ingestion with bounded concurrency, directly addressing the performance issue in the PR objectives.
Description check ✅ Passed The PR description covers the problem, fix, safety guarantees, and test coverage comprehensively. However, it lacks explicit completion of the template's checklist and screenshots section.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/tri-10273-managed-runs-streaming-batchtriggerandwait-fails-with-408

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

coderabbitai[bot]

This comment was marked as resolved.

@mintlify
Copy link
Copy Markdown
Contributor

mintlify Bot commented May 29, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
trigger 🟢 Ready View Preview May 29, 2026, 6:09 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

- Enforce positive STREAMING_BATCH_INGEST_CONCURRENCY in the env schema
  (.int().positive()) — p-map requires concurrency >= 1, so 0/negative would
  throw at runtime.
- Apply the same out-of-range index guard to oversized-item markers as normal
  items, so an oversized item with index >= runCount returns a 4xx instead of
  creating a stray pre-failed run. Covered by a new test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant