replayio · bhackett1024 · Jul 4, 2026
diff --git a/docs/backend/admin-recording-data.md b/docs/backend/admin-recording-data.md
@@ -31,7 +31,7 @@ token, and addresses the data with `recording_id` (below).
 A **task-scoped agent** (authenticated by its own task id) may additionally read **by `recording_id`,
 but only the recording its own task analyzes** (`taskScope.recordingId`). This lets a polish-script
 run fetch the originating task's rrweb + injected events without the admin token — the polish setup
-harness (`scripts/polish/run-recorded.ts`) does exactly this.
+harness (`scripts/run-recorded-mcp.ts`) does exactly this.
 
 - Not authenticated → `401`.
 - Not an admin, and not requesting your own task's `recording_id` → `403 { "error": "Admin access required" }`.

diff --git a/docs/backend/admin-run-journey-eval.md b/docs/backend/admin-run-journey-eval.md
@@ -80,8 +80,9 @@ The work runs asynchronously in a container. Watch progress at `task_url`, or wa
 
 1. **Run the steps deterministically.** `journey-run-script` executes `journey_steps` as the task's
    fast-mode setup — recording itself and, when a failure handler judges a failure to be a real defect,
-   filing at most one bug (through the same judged `file-bug` webhook). Guidance is resolved
-   `--resolve latest` (staged).
+   reporting `bug_seen` + a `bug_data` blob into the run's `setup_output` (fast-mode journeys no longer
+   file bugs inline; a second-stage `file-journey-bug` task files the bug from the recording — see
+   [`warm-recordings.md`](./warm-recordings.md)). Guidance is resolved `--resolve latest` (staged).
 2. **Review + judge.** The review agent (guidance entry **`handle-journey-eval`**, created on demand if
    missing) reads the run's filed bug and decides — fuzzy, natural-language — whether it matches
    `expected_bug`. This is the one behavioral difference from `admin-run-polish-script`, which compares
@@ -104,6 +105,11 @@ When the run reaches a terminal state, `callback_url` is POSTed once (`Content-T
 application/json`). **Unauthenticated** — secured only by the unguessable `runId` in the path.
 Best-effort, exactly-once (edge-triggered via `eval_callback_fired`); not retried.
 
+> **Second-task ordering.** When the run saw a bug, a `file-journey-bug` task is scheduled to file it
+> from the recording. `container-events` defers `finalizeJourneyEvalRun` and the callback until BOTH the
+> review task and the file-journey-bug task are terminal, so the bug is filed (and the review agent's
+> verdict recorded) before `bug_filed` is reported. See [`warm-recordings.md`](./warm-recordings.md) §6.
+
 ```jsonc
 {
   "event": "journey.eval_run.finished",

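The second-task ordering added to `admin-run-journey-eval.md` comes down to a gate over two task states. A minimal sketch of that gate, assuming hypothetical status names and a hypothetical `EvalRunTasks` shape (neither appears in this diff, and the real check in `container-events` may look quite different):

```typescript
// Hypothetical status vocabulary; the real task states are not shown in this diff.
type TaskStatus = "queued" | "running" | "completed" | "failed";

// Hypothetical view of the two tasks an eval run waits on.
interface EvalRunTasks {
  reviewTaskStatus: TaskStatus;
  // null when the run saw no bug, so no file-journey-bug task was scheduled.
  fileJourneyBugTaskStatus: TaskStatus | null;
}

const TERMINAL: ReadonlySet<TaskStatus> = new Set<TaskStatus>(["completed", "failed"]);

// True only once every task the run depends on has reached a terminal state.
function canFinalizeEvalRun(run: EvalRunTasks): boolean {
  const reviewDone = TERMINAL.has(run.reviewTaskStatus);
  const bugTaskDone =
    run.fileJourneyBugTaskStatus === null ||
    TERMINAL.has(run.fileJourneyBugTaskStatus);
  return reviewDone && bugTaskDone;
}
```

Treating a missing file-journey-bug task as "done" keeps runs that saw no bug on the original single-task path.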
diff --git a/docs/backend/admin-run-polish-script.md b/docs/backend/admin-run-polish-script.md
@@ -5,7 +5,7 @@ for a single **signal** — the programmatic counterpart to the polish-script ev
 signal's expected `results` are captured for a recording).
 
 It kicks off one container task that runs **only** the polish setup script
-(`scripts/polish/run-recorded.ts`) against the recording and then stops — **no agent bug-filing
+(`scripts/run-recorded-mcp.ts`) against the recording and then stops — **no agent bug-filing
 afterwards**. The task is pooled in the **guidance-update project / containers** (`proj-guidance`),
 since the script analyzes an existing Replay recording by id and never drives the app. When the run
 finishes, a webhook you supply is POSTed with the results.
@@ -174,7 +174,7 @@ bugs**):
    (The harness only sanity-checks that an `eval_comparison` was produced; if a script forgot to call
    `attachEvalComparison`, the harness records that as an `eval-comparison` setup error so it surfaces
    instead of silently passing. The eval comparison is **no longer** computed harness-side — older
-   builds did it in `run-recorded.ts` and never wrote it back to the output file, so the agent saw no
+   builds did it in `run-recorded-mcp.ts` and never wrote it back to the output file, so the agent saw no
    `eval_comparison` and took no action.)
 3. The container then runs the agent with the **`handle-polish-script-eval`** guidance entry. The agent
    reads `eval_comparison` (and looks for the logged `EVAL FAILED` / `eval-comparison` setup error); if

diff --git a/docs/backend/journeys.md b/docs/backend/journeys.md
@@ -60,7 +60,7 @@ If `step_count` is 0, the journey is unstepped.
 
 1. `container-task-webhook.ts` checks if the journey is unstepped (latest version has zero actions)
 2. If unstepped: builds prompt via `buildUnsteppedQAPrompt()` — agent uses Playwright MCP to execute the journey description. On success it saves the recording and exits; the journey stays unstepped. On failure it files bugs.
-3. If stepped: builds prompt via `buildQAPrompt()` — the journey setup harness (`scripts/journey/run-recorded.ts`) runs the journey runner, which executes the actions programmatically, and records the result into the task info
+3. If stepped: the browser-driver setup harness (`scripts/run-recorded-browse.ts --mode journey`) runs the journey runner, which executes the actions programmatically and records the result into the task info. In normal mode the container agent (`buildQAPrompt()`) then triages the result and files bugs. In **FAST MODE** the runner is setup-only (no agent) and does NOT file bugs — it reports `bug_seen` + a `bug_data` blob into `setup_output`, and a separate `file-journey-bug` task files the bug from the recording. See [`warm-recordings.md`](./warm-recordings.md).
 
 ## Journey origin
 

diff --git a/docs/backend/mcp-error-reporting.md b/docs/backend/mcp-error-reporting.md
@@ -47,7 +47,7 @@ client owns reporting for setup-script calls; the proxy owns reporting for agent
 ## Setup-script path (the common mechanism)
 
 Every polish setup script lives as a guidance entry (`polish-script-<type>`) and drives Replay MCP
-**only** through `scripts/polish/replay-client.ts`. `run-recorded.ts` (the in-container runner)
+**only** through `scripts/polish/replay-client.ts`. `run-recorded-mcp.ts` (the in-container runner)
 bundles the client with the script and execs it with the task context in env:
 
 - `POLISH_ADMIN_TOKEN` — the task-scoped credential, which **is** the task id (`LOOPQA_TASK_ID`).
@@ -90,7 +90,7 @@ polish_pass-sourced guidance update; it no longer auto-creates the update) and
 ## Files
 
 - `scripts/polish/replay-client.ts` — shared setup-script client; self-reports `callTool` failures.
-- `scripts/polish/run-recorded.ts` — in-container runner; injects `POLISH_ADMIN_TOKEN` / `POLISH_SITE_URL`.
+- `scripts/run-recorded-mcp.ts` — in-container runner; injects `POLISH_ADMIN_TOKEN` / `POLISH_SITE_URL`.
 - `netlify/functions/mcp-error-webhook.ts` — receives `mcp.error`, writes `task_mcp_errors`.
 - `netlify/functions/lib/task-mcp-errors.ts` — `task_mcp_errors` table interface + `kind` vocabulary.
 - `netlify/functions/container-task-webhook.ts` — hands each task its `mcpErrorWebhook` URL (agent path).

diff --git a/docs/backend/run-recorded-browse.md b/docs/backend/run-recorded-browse.md
@@ -1,5 +1,9 @@
 # `run-recorded-browse.ts` — Fast-Mode Exploration harness architecture
 
+> Sibling harness: [`run-recorded-mcp.ts`](./warm-recordings.md) drives an existing *recording* via the
+> Replay MCP (no browser) for polish + `file-journey-bug` tasks. Both share `scripts/lib/setup-harness.ts`.
+
+
 This documents how the **exploration ("browse") setup harness** works end to end, what each moving
 part does, and — importantly — the **current open problem** with the driver-script recording, with the
 evidence I actually have (so the parts I'm still unsure about are called out explicitly rather than

diff --git a/docs/backend/tasks-and-containers.md b/docs/backend/tasks-and-containers.md
@@ -251,7 +251,7 @@ replenishes the pool.
 
 ## Polish setup script constraints
 
-Each polish pass runs a **setup script** before the agent: `scripts/polish/run-recorded.ts` fetches
+Each polish pass runs a **setup script** before the agent: `scripts/run-recorded-mcp.ts` fetches
 the per-pass-type TypeScript, bundles it (esbuild), and runs it under `replay-node` to collect
 diagnostic JSON from the recording via Replay MCP. Authoring one of these scripts has hard
 constraints that are easy to violate:
@@ -299,15 +299,15 @@ constraints that are easy to violate:
 
 - **Requirement: each setup script gets a 20-minute run window, and budgets must be ordered
   `TIME_BUDGET_MS < hard-exit backstop < runner deadline`.** The runner hard-kills the bundled script
-  via `execSync` after `SCRIPT_EXEC_TIMEOUT_MS` = **20 min** (`run-recorded.ts`; other commands it
+  via `execSync` after `SCRIPT_EXEC_TIMEOUT_MS` = **20 min** (`run-recorded-mcp.ts`; other commands it
   runs keep the 5-min default). A script's internal `TIME_BUDGET_MS`/`HARD_EXIT_MS` (and, in
   `ui-details`, `BACKSTOP_MS`/`WATCHDOG_MS`) must sit *just under* 20 min — the convention is
   `TIME_BUDGET_MS = 19 min < HARD_EXIT_MS = 19.5 min < 20-min kill` — so the script force-emits
   parseable partial output before the SIGKILL. A process still alive at the runner deadline dies as
   `spawnSync /bin/sh ETIMEDOUT` with **no output at all**. The 15-min `RecordingOverview` ceiling
   lives inside this window. If you change the window, change both ends together and keep the ordering.
 
-- **stdout ≤ 1 MiB.** `run-recorded.ts` captures the script's stdout with Node's default `execSync`
+- **stdout ≤ 1 MiB.** `run-recorded-mcp.ts` captures the script's stdout with Node's default `execSync`
   `maxBuffer` (1 MiB); overflowing it kills the process mid-print with `ENOBUFS` and no parseable
   output. Scripts must bound their JSON (cap candidates, collapse repeated findings).
 
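The budget-ordering requirement in the hunk above (`TIME_BUDGET_MS < hard-exit backstop < runner deadline`) can be checked mechanically. A small sketch using the values the doc gives as the convention; the `budgetsAreOrdered` helper is hypothetical and not part of the harness:

```typescript
const MINUTE_MS = 60_000;

// Values taken from the documented convention.
const TIME_BUDGET_MS = 19 * MINUTE_MS;         // script stops starting new work
const HARD_EXIT_MS = 19.5 * MINUTE_MS;         // script force-emits partial output and exits
const SCRIPT_EXEC_TIMEOUT_MS = 20 * MINUTE_MS; // runner kills the process

// Hypothetical helper: true when the three budgets are strictly ordered.
function budgetsAreOrdered(
  timeBudgetMs: number,
  hardExitMs: number,
  runnerDeadlineMs: number,
): boolean {
  return timeBudgetMs < hardExitMs && hardExitMs < runnerDeadlineMs;
}

if (!budgetsAreOrdered(TIME_BUDGET_MS, HARD_EXIT_MS, SCRIPT_EXEC_TIMEOUT_MS)) {
  throw new Error("setup-script budgets are misordered");
}
```

A check like this at script start would fail loudly if someone changed one end of the window without the other, rather than surfacing later as an output-less `ETIMEDOUT`.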
@@ -335,16 +335,47 @@ can also be set explicitly on the create request (UI, `POST /api/projects`, or `
   report its own result), so no agent runs afterward.
 
 - **Polish → the setup script files its own bugs, no agent.** The regular polish-pass branch gets a
-  no-op prompt, and the setup command sets `FAST_MODE=1`. `scripts/polish/run-recorded.ts` forwards
+  no-op prompt, and the setup command sets `FAST_MODE=1`. `scripts/run-recorded-mcp.ts` forwards
   that (plus `POLISH_LLM_URL`, the run-scoped `call-llm` endpoint) into the setup script's exec env.
   The `polish-script-<passType>` entry, on seeing `FAST_MODE`, calls the method exported by
   **`polish-file-bug-script`**, which reviews its just-computed ("pass 2") results with an LLM via
   **`polish-review-results-script`** and files each resulting bug through the `fileBug` helper from
   **`file-bug-script`** — the same submit + poll bug path the polish agent prompt uses. `script_only`
   (admin eval) and `ui-details` passes are unaffected.
 
+- **Journey → the runner reports a bug SIGNAL, a second task files it.** In fast mode a stepped
+  journey's runner (`journey-run-script`) no longer files bugs. When it sees one it reports
+  `bug_seen: true` plus a `bug_data` blob (a description of what it encountered + browser history) in
+  its result, saved to the run's `setup_output` via `save-setup-result`. `saveTestRunSetupResult` then
+  schedules ONE **file-journey-bug** task (`lib/file-journey-bug.ts`, discriminated by the
+  `File journey bug:` goal prefix, carrying the run's `test_run_id` + the browser `recording_id`),
+  linked from the run's task page (`test_run.file_journey_bug_task_id`). That task runs
+  `scripts/run-recorded-mcp.ts --mode file-journey-bug` — the recording-driver harness with NO browser
+  — which materializes the **`file-journey-bug-script`** entry, hands it the bug blob (fetched via the
+  `get-journey-bug-data` task-webhook action) + the recording, and files the bug against the run
+  (`fileBug` → `/api/test-runs/<run>`). The journey run's finalization (and, for a **journey eval**,
+  its result callback) is DEFERRED in `container-events` until the file-journey-bug task completes, so
+  the bug exists before the run is finalized / the eval verdict is reported.
+
 Like every other setup script, the four new entries (`exploration-run-script`,
 `polish-file-bug-script`, `polish-review-results-script`, `file-bug-script`) live ONLY as guidance
 entries — the repo holds stubs (`scripts/seed-guidance.ts`) with the real TypeScript authored in the
 prod DB via the guidance API. Making fast-mode polish actually file bugs also requires editing the
 `polish-script-<passType>` entries to check `FAST_MODE` and delegate to `polish-file-bug-script`.
+Fast-mode journeys additionally need `journey-run-script` edited to report `bug_seen`/`bug_data`
+instead of filing, and the new **`file-journey-bug-script`** entry authored to file from the recording.
+
+## Warm recordings
+
+> Full write-up: [`warm-recordings.md`](./warm-recordings.md) — warming, `run-recorded-mcp`, the
+> `file-journey-bug` task, and journey-eval second-task ordering. Summary below.
+
+A task that OPERATES ON an existing recording (a polish pass task, or a file-journey-bug task) carries
+that recording in `tasks.recording_id`. When such a task is scheduled we fire
+`warm-recording-background` (`lib/warm-recording.ts`), which drives `RecordingOverview` (a cold
+recording's first overview can take minutes while Replay indexes it) and waits up to **20 min**. On
+success it stamps `tasks.recording_warmed_at` on every queued task for the recording; on timeout/error
+it fails those tasks (marked `no_retry`, so the retry helpers leave them alone) and records a Replay
+MCP error against them so the failure shows in admin activity. `claimNextTask` orders by warm
+readiness FIRST — warmed-recording tasks, then no-recording tasks, then not-yet-warmed-recording tasks
+— so a container never waits on a cold recording while a warmed one is queued.
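The warm-readiness ordering described for `claimNextTask` under "Warm recordings" (warmed-recording tasks, then no-recording tasks, then not-yet-warmed tasks) can be illustrated with an in-memory comparator. This is only a sketch: the `QueuedTask` shape and both function names are hypothetical, and the real `claimNextTask` presumably orders in the database query rather than in application code.

```typescript
// Hypothetical in-memory view of a queued task row.
interface QueuedTask {
  id: string;
  recordingId: string | null;       // mirrors tasks.recording_id
  recordingWarmedAt: string | null; // mirrors tasks.recording_warmed_at
}

// Lower rank is claimed first: warmed (0), no recording (1), still cold (2).
function warmRank(task: QueuedTask): number {
  if (task.recordingId === null) return 1;
  return task.recordingWarmedAt !== null ? 0 : 2;
}

// Returns a new array; Array.prototype.sort is stable, so tasks with the same
// rank keep their existing queue order.
function orderByWarmReadiness(tasks: QueuedTask[]): QueuedTask[] {
  return [...tasks].sort((a, b) => warmRank(a) - warmRank(b));
}

const queue: QueuedTask[] = [
  { id: "cold", recordingId: "rec-1", recordingWarmedAt: null },
  { id: "plain", recordingId: null, recordingWarmedAt: null },
  { id: "warm", recordingId: "rec-2", recordingWarmedAt: "2026-07-04T00:00:00Z" },
];
const claimOrder = orderByWarmReadiness(queue).map((t) => t.id);
```

With this ranking a cold-recording task is only claimed when nothing else is queued, which is the property the doc calls out.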
diff --git a/docs/backend/test-runs.md b/docs/backend/test-runs.md
@@ -25,7 +25,7 @@
 ### Reads
 
 - `getTestRun(id)` — fetch a single test run with project name.
-- `getTestRunWithBugs(id)` — fetch a test run with project name and associated bugs.
+- `getTestRunWithBugs(id)` — fetch a test run with project name and associated bugs. Also returns `file_journey_bug_task_id`, the id of the second-stage task that files a FAST MODE journey's bug (if any), so the run's task page can link to it (see [`warm-recordings.md`](./warm-recordings.md)).
 - `listTestRuns(filters?)` — paginated list of test runs, optionally filtered by project.
 - `listRecentRunsForProject(opts)` — lightweight timing rows for a project's recent runs (newest first, default 20, max 50), optionally excluding one run and flagging each row with `overlaps_this_run` relative to a given `[overlapsAfter, overlapsBefore]` window (no `overlapsBefore` = window still open). Lets a running task — and the bug-submission judge — see what other journeys executed concurrently, since parallel journeys can mutate shared app state.
 
@@ -37,6 +37,7 @@
 - `infraFailTestRun(id, data?)` — mark an in-progress run as `infra-failed` (transient infra error, retryable).
 - `incompleteTestRun(id, data?)` — mark an in-progress run as `incomplete` (journey couldn't be completed, no error).
 - `updateTestRunProgress(id, data)` — update bugs count or recording ID while a run is in progress.
+- `saveTestRunSetupResult(id, data)` — store a setup harness's `setup_output` + recordings on the run (the `save-setup-result` action). When the output carries a FAST MODE journey's `bug_seen: true` signal, it also schedules a `file-journey-bug` task to file the bug from the recording (see [`warm-recordings.md`](./warm-recordings.md) §5).
 
 ## Callers