chore(ci): stabilize e2e — pre-cache experiments, drop bridge spawn, fix XPath#32
chore(ci): stabilize e2e — pre-cache experiments, drop bridge spawn, fix XPath#32joalves wants to merge 1 commit into
Conversation
…fix XPath, raise AI timeout
This addresses four root causes for the e2e shards 1/4 and 3/4 failing
since the FT-1905 PR opened:
1. .experiment-item never visible (4 specs): every sidebar mount fired
a live /v1/experiments call, and under shard concurrency (4 shards
× 4 workers = 16 parallel sidebars) the ABsmartly API throttled.
Extend the existing global-setup pre-fetch to also fetch 25
experiments and route them into chrome.storage.sync under
`experiments-cache` (the key useExperimentLoading.loadCachedExperiments
reads on mount). Cache shape matches ExperimentsCacheSchema.
2. Invalid XPath in ai-dom-changes-persistence.spec.ts (1 spec): the
"Verify preview mode is enabled" step used `'.. .. :has-text("Preview")'`
which isn't valid XPath, and asserted preview was on even though the
test never enables it. The step has been removed — its only effect
was a SyntaxError before reaching the assertion.
3. Dead claude-code-bridge spawn in ai-page-persistence.spec.ts: the
binary doesn't exist on CI runners and the spawn ENOENTed at every
run. The whole bridge-discovery block (BRIDGE_PORTS, isBridgeRunning,
findAvailablePort, startBridge, stopBridge, beforeAll, afterAll) is
removed. The fixture's default seed already routes to anthropic-api
via llmproxy whenever PLASMO_PUBLIC_ANTHROPIC_API_KEY is present,
which it now is.
4. ai-extension-integration timeout was 180s. Under shard contention
that left almost no headroom for the model call itself; bumped to
240s, and the inner waitFor on the assistant message bumped from
60s to 90s to match.
After this CI run goes green, restore `e2e-tests (1/4)` and
`e2e-tests (3/4)` to the required-status-checks list on main:
gh api -X POST "repos/absmartly/browser-extension/branches/main/protection/required_status_checks/contexts" \\
-f "contexts[]=e2e-tests (1/4)" \\
-f "contexts[]=e2e-tests (3/4)"
WalkthroughThe pull request extends the global test setup to prefetch an Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/global-setup.ts (1)
198-230: 🛠️ Refactor suggestion | 🟠 MajorType the
/v1/*response contracts with generics instead of usingunknown[].
fetchOnecurrently returnsPromise<unknown[]>, allowing malformed API shapes to pass silently through to cache generation and fixture seeding. Introduce a typedfetchOne<T>generic contract using Zod schema types (e.g.,z.infer<typeof CachedExperimentSchema>) to keep this path strict at compile time.The codebase already defines proper validation schemas in
src/lib/validation-schemas.ts(e.g.,CachedExperimentSchema,ExperimentsCacheSchema). Use these schema types instead ofunknown, type assertions (as { experiments?: unknown[] }), oranycasts in:
tests/global-setup.tslines 198–230 (fetchOnefunction signature and return)tests/global-setup.tslines 255–275 (destructured variables fromPromise.all)tests/fixtures/extension.tsline 127 (editorResourcesCachedeclaration)tests/fixtures/extension.tsline 181 (type assertion)Per coding guidelines: "Enable strict TypeScript type checking and use interfaces for all API responses".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/global-setup.ts` around lines 198 - 230, Refactor the fetchOne function to use generics for type safety instead of unknown[]. Change the fetchOne function signature to accept a generic type parameter T and return Promise<T[]> instead of Promise<unknown[]>. Import the validation schemas from src/lib/validation-schemas.ts (such as CachedExperimentSchema, ExperimentsCacheSchema) and use their inferred types when calling fetchOne in the Promise.all block around lines 255-275. Update the editorResourcesCache declaration in tests/fixtures/extension.ts line 127 to use the properly typed response instead of unknown[], and remove the type assertion at line 181 by ensuring the fetchOne calls are properly typed with the corresponding schema types.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/e2e/ai-extension-integration.spec.ts`:
- Around line 7-13: The test timeout is being increased to 240s to accommodate
slow model calls, which masks the underlying performance issues. Instead, revert
the test.setTimeout call to a reasonable value that doesn't exceed standard E2E
test budgets, and add targeted debug logging and screenshot captures around the
slow path operations (specifically the model-call wait and assistant
interactions). This will provide evidence of where time is being spent and help
identify and fix the actual bottlenecks in the dependency chain rather than
working around them with inflated timeouts.
In `@tests/e2e/ai-page-persistence.spec.ts`:
- Around line 16-25: Add a fail-fast provider precondition check in the
beforeEach hook to validate that the AI provider is configured as anthropic-api
before tests run. The issue is that the fixture falls back to
claude-subscription when Anthropic environment keys are missing, but the bridge
wiring for claude-subscription has been removed, which can lead to long
downstream waits or flaky failures. In the beforeEach hook, add a check that
verifies the configured provider matches anthropic-api, and immediately fail the
test with a clear error message if the precondition is not met, ensuring setup
issues are caught early rather than causing obscure downstream failures.
In `@tests/fixtures/extension.ts`:
- Around line 214-220: The experiments cache is being stored as JSON strings in
the seedPage.evaluate call where chrome.storage.sync.set is invoked, but the
getExperimentsCache function in src/utils/storage.ts expects to validate the
retrieved value as an object against ExperimentsCacheSchema. Remove the
JSON.stringify calls on both lines (for 'experiments-cache' and
'plasmo:experiments-cache' keys) and store the payload object directly instead.
This ensures the storage format matches what the validation logic expects,
preventing cache validation failures and unnecessary fallback API calls.
- Around line 181-208: In the experiment normalization code block, replace the
`any` type annotations used for the `raw` experiment objects, variant objects
(`v`), application objects (`a`), and the filtered result (`e`) with properly
inferred Zod types. Import the existing Zod schemas `CachedExperimentSchema`,
`CachedVariantSchema`, and `CachedApplicationSchema` from
`src/lib/validation-schemas.ts`, then use `z.infer<typeof SchemaName>` to
extract the proper TypeScript types for each nested entity. This will replace
the loose `any` types in the map and filter operations with strict type-safe
alternatives that match the actual schema definitions.
---
Outside diff comments:
In `@tests/global-setup.ts`:
- Around line 198-230: Refactor the fetchOne function to use generics for type
safety instead of unknown[]. Change the fetchOne function signature to accept a
generic type parameter T and return Promise<T[]> instead of Promise<unknown[]>.
Import the validation schemas from src/lib/validation-schemas.ts (such as
CachedExperimentSchema, ExperimentsCacheSchema) and use their inferred types
when calling fetchOne in the Promise.all block around lines 255-275. Update the
editorResourcesCache declaration in tests/fixtures/extension.ts line 127 to use
the properly typed response instead of unknown[], and remove the type assertion
at line 181 by ensuring the fetchOne calls are properly typed with the
corresponding schema types.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1ba7c473-969e-46fa-84cb-e460a5fb4153
📒 Files selected for processing (5)
tests/e2e/ai-dom-changes-persistence.spec.tstests/e2e/ai-extension-integration.spec.tstests/e2e/ai-page-persistence.spec.tstests/fixtures/extension.tstests/global-setup.ts
| // The full happy path is: setup test page (~3s) + load experiments list | ||
| // (~10s under shard concurrency) + create experiment (~3s) + save + open AI | ||
| // chat + real model call (~30-60s through llmproxy, but Claude can take up | ||
| // to 90s on a cold model + image-grounded prompt). Previously this was | ||
| // 180s, which under shard contention left almost no headroom for the | ||
| // model call itself; bump to 240s. | ||
| test.setTimeout(240000) |
There was a problem hiding this comment.
Timeout inflation masks the underlying E2E bottleneck.
Increasing the test ceiling to 240s and the assistant wait to 90s weakens failure detection and conflicts with the test-timeout policy. Please keep the timeout budget tight and fix the slow path directly (for example, add targeted debug logging/screenshots around the model-call wait and stabilise the slow dependency path).
As per coding guidelines, tests/e2e/**/*.spec.ts should not solve timeouts by extending them and should debug root causes with logging/screenshot evidence.
Also applies to: 73-76
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/e2e/ai-extension-integration.spec.ts` around lines 7 - 13, The test
timeout is being increased to 240s to accommodate slow model calls, which masks
the underlying performance issues. Instead, revert the test.setTimeout call to a
reasonable value that doesn't exceed standard E2E test budgets, and add targeted
debug logging and screenshot captures around the slow path operations
(specifically the model-call wait and assistant interactions). This will provide
evidence of where time is being spent and help identify and fix the actual
bottlenecks in the dependency chain rather than working around them with
inflated timeouts.
Sources: Coding guidelines, Learnings
| log('Configuring extension (anthropic-api via llmproxy, seeded by fixture)') | ||
|
|
||
| // The fixture's default seed (tests/fixtures/extension.ts:80-115) already | ||
| // wires up `aiProvider: 'anthropic-api'` with the llmproxy endpoint when | ||
| // PLASMO_PUBLIC_ANTHROPIC_API_KEY is present in env. No per-test seed is | ||
| // needed — the old `aiProvider: 'claude-subscription'` override was | ||
| // tied to a `spawn('claude-code-bridge')` block in beforeAll that has | ||
| // been removed; on CI runners the binary doesn't exist, the spawn | ||
| // ENOENTed, and downstream tests just observed a dead bridge. | ||
| testPage = await context.newPage() |
There was a problem hiding this comment.
Add a fail-fast provider precondition in beforeEach.
This hook now assumes fixture-seeded anthropic-api, but fixture logic falls back to claude-subscription when Anthropic env keys are missing. With bridge wiring removed, that can turn into long downstream waits/flaky failures instead of a clear setup error.
Proposed patch
test.beforeEach(async ({ context, extensionUrl }) => {
initializeTestLogging()
log('Configuring extension (anthropic-api via llmproxy, seeded by fixture)')
+ const hasAnthropicKey = Boolean(
+ process.env.ANTHROPIC_API_KEY || process.env.PLASMO_PUBLIC_ANTHROPIC_API_KEY
+ )
+ if (!hasAnthropicKey) {
+ throw new Error(
+ 'Missing Anthropic credentials for AI E2E flow. Set ANTHROPIC_API_KEY or PLASMO_PUBLIC_ANTHROPIC_API_KEY.'
+ )
+ }
// The fixture's default seed (tests/fixtures/extension.ts:80-115) already📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| log('Configuring extension (anthropic-api via llmproxy, seeded by fixture)') | |
| // The fixture's default seed (tests/fixtures/extension.ts:80-115) already | |
| // wires up `aiProvider: 'anthropic-api'` with the llmproxy endpoint when | |
| // PLASMO_PUBLIC_ANTHROPIC_API_KEY is present in env. No per-test seed is | |
| // needed — the old `aiProvider: 'claude-subscription'` override was | |
| // tied to a `spawn('claude-code-bridge')` block in beforeAll that has | |
| // been removed; on CI runners the binary doesn't exist, the spawn | |
| // ENOENTed, and downstream tests just observed a dead bridge. | |
| testPage = await context.newPage() | |
| log('Configuring extension (anthropic-api via llmproxy, seeded by fixture)') | |
| const hasAnthropicKey = Boolean( | |
| process.env.ANTHROPIC_API_KEY || process.env.PLASMO_PUBLIC_ANTHROPIC_API_KEY | |
| ) | |
| if (!hasAnthropicKey) { | |
| throw new Error( | |
| 'Missing Anthropic credentials for AI E2E flow. Set ANTHROPIC_API_KEY or PLASMO_PUBLIC_ANTHROPIC_API_KEY.' | |
| ) | |
| } | |
| // The fixture's default seed (tests/fixtures/extension.ts:80-115) already | |
| // wires up `aiProvider: 'anthropic-api'` with the llmproxy endpoint when | |
| // PLASMO_PUBLIC_ANTHROPIC_API_KEY is present in env. No per-test seed is | |
| // needed — the old `aiProvider: 'claude-subscription'` override was | |
| // tied to a `spawn('claude-code-bridge')` block in beforeAll that has | |
| // been removed; on CI runners the binary doesn't exist, the spawn | |
| // ENOENTed, and downstream tests just observed a dead bridge. | |
| testPage = await context.newPage() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/e2e/ai-page-persistence.spec.ts` around lines 16 - 25, Add a fail-fast
provider precondition check in the beforeEach hook to validate that the AI
provider is configured as anthropic-api before tests run. The issue is that the
fixture falls back to claude-subscription when Anthropic environment keys are
missing, but the bridge wiring for claude-subscription has been removed, which
can lead to long downstream waits or flaky failures. In the beforeEach hook, add
a check that verifies the configured provider matches anthropic-api, and
immediately fail the test with a clear error message if the precondition is not
met, ensuring setup issues are caught early rather than causing obscure
downstream failures.
| const cacheBundle = editorResourcesCache as { experiments?: unknown[] } | ||
| const rawExperiments = Array.isArray(cacheBundle.experiments) | ||
| ? cacheBundle.experiments | ||
| : [] | ||
| if (rawExperiments.length > 0) { | ||
| const minimal = rawExperiments | ||
| .map((raw: any) => ({ | ||
| id: raw?.id, | ||
| name: raw?.name, | ||
| display_name: raw?.display_name, | ||
| state: raw?.state, | ||
| status: raw?.status, | ||
| percentage_of_traffic: raw?.percentage_of_traffic, | ||
| traffic_split: raw?.traffic_split, | ||
| variants: raw?.variants?.map((v: any) => ({ | ||
| variant: v?.variant, | ||
| name: v?.name, | ||
| is_control: v?.is_control | ||
| })), | ||
| applications: raw?.applications?.map((a: any) => ({ | ||
| application_id: a?.application_id, | ||
| id: a?.id, | ||
| name: a?.name | ||
| })) | ||
| })) | ||
| // Drop anything that doesn't have the required (id, name) pair — | ||
| // the schema rejects the whole array if any entry is invalid. | ||
| .filter((e: any) => typeof e.id === 'number' && typeof e.name === 'string') |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify `any` usage in the modified experiment cache normalisation block.
# Expected after refactor: no `any` in this file.
rg -nP '\bany\b' tests/fixtures/extension.tsRepository: absmartly/browser-extension
Length of output: 1304
🏁 Script executed:
# Get context around lines 181-208 to understand what this code does
head -n 210 tests/fixtures/extension.ts | tail -n +175Repository: absmartly/browser-extension
Length of output: 1690
🏁 Script executed:
# Search for existing interfaces or types related to experiments in the codebase
rg -i 'interface.*experiment|type.*experiment' --type ts --type tsxRepository: absmartly/browser-extension
Length of output: 99
🏁 Script executed:
# Check if there are type definitions or interfaces defined elsewhere for API responses
fd -e ts -e tsx | xargs rg -l 'interface.*{' | head -20Repository: absmartly/browser-extension
Length of output: 159
🏁 Script executed:
# Search for existing interfaces or types related to experiments
rg -i 'interface|type' tests/fixtures/extension.ts | head -20Repository: absmartly/browser-extension
Length of output: 743
🏁 Script executed:
# Look for ExperimentsCacheSchema or CachedExperimentSchema mentioned in comments
rg -r 'ExperimentsCacheSchema|CachedExperimentSchema' --type tsRepository: absmartly/browser-extension
Length of output: 468
🏁 Script executed:
# Check src/lib/validation-schemas.ts to see if it has the experiment schemas
cat src/lib/validation-schemas.tsRepository: absmartly/browser-extension
Length of output: 13119
🏁 Script executed:
# Check if validation-schemas is imported in test files
rg 'validation-schemas' tests/ --type tsRepository: absmartly/browser-extension
Length of output: 174
🏁 Script executed:
# Look for z.infer usage patterns in the codebase
rg 'z\.infer' --type tsRepository: absmartly/browser-extension
Length of output: 435
Replace any-based experiment normalisation with Zod-inferred types.
Lines 181–208 use any for API-derived entities (raw, v, a, e), bypassing strict type safety. TypeScript interfaces for these shapes already exist as Zod schemas in src/lib/validation-schemas.ts (CachedExperimentSchema, CachedVariantSchema, CachedApplicationSchema). Extract types using z.infer<typeof CachedExperimentSchema> and similar for nested structures to maintain type safety across the normaliser.
As per coding guidelines, "Enable strict TypeScript type checking and use interfaces for all API responses".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/fixtures/extension.ts` around lines 181 - 208, In the experiment
normalization code block, replace the `any` type annotations used for the `raw`
experiment objects, variant objects (`v`), application objects (`a`), and the
filtered result (`e`) with properly inferred Zod types. Import the existing Zod
schemas `CachedExperimentSchema`, `CachedVariantSchema`, and
`CachedApplicationSchema` from `src/lib/validation-schemas.ts`, then use
`z.infer<typeof SchemaName>` to extract the proper TypeScript types for each
nested entity. This will replace the loose `any` types in the map and filter
operations with strict type-safe alternatives that match the actual schema
definitions.
Source: Coding guidelines
| await seedPage.evaluate( | ||
| (payload) => | ||
| chrome.storage.sync.set({ | ||
| 'experiments-cache': JSON.stringify(payload), | ||
| 'plasmo:experiments-cache': JSON.stringify(payload) | ||
| }), | ||
| experimentsCache |
There was a problem hiding this comment.
Store the experiments cache as an object, not a JSON string.
Line 217 and Line 218 serialise payload before chrome.storage.sync.set. The reader path (src/utils/storage.ts, getExperimentsCache) validates the retrieved value against ExperimentsCacheSchema as an object, so this format mismatch can invalidate cache reads and force live /v1/experiments fallbacks.
Suggested fix
await seedPage.evaluate(
(payload) =>
chrome.storage.sync.set({
- 'experiments-cache': JSON.stringify(payload),
- 'plasmo:experiments-cache': JSON.stringify(payload)
+ 'experiments-cache': payload,
+ 'plasmo:experiments-cache': payload
}),
experimentsCache
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| await seedPage.evaluate( | |
| (payload) => | |
| chrome.storage.sync.set({ | |
| 'experiments-cache': JSON.stringify(payload), | |
| 'plasmo:experiments-cache': JSON.stringify(payload) | |
| }), | |
| experimentsCache | |
| await seedPage.evaluate( | |
| (payload) => | |
| chrome.storage.sync.set({ | |
| 'experiments-cache': payload, | |
| 'plasmo:experiments-cache': payload | |
| }), | |
| experimentsCache | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/fixtures/extension.ts` around lines 214 - 220, The experiments cache is
being stored as JSON strings in the seedPage.evaluate call where
chrome.storage.sync.set is invoked, but the getExperimentsCache function in
src/utils/storage.ts expects to validate the retrieved value as an object
against ExperimentsCacheSchema. Remove the JSON.stringify calls on both lines
(for 'experiments-cache' and 'plasmo:experiments-cache' keys) and store the
payload object directly instead. This ensures the storage format matches what
the validation logic expects, preventing cache validation failures and
unnecessary fallback API calls.
Summary
Four root-cause fixes for the e2e shards 1/4 and 3/4 that have been red since FT-1905 opened:
global-setupresources pre-fetch to also fetch 25 experiments and route them intochrome.storage.sync['experiments-cache'](the keyuseExperimentLoading.loadCachedExperimentsreads on mount). Eliminates the.experiment-item never visiblerace under shard concurrency.ai-dom-changes-persistence.spec.tshad'.. .. :has-text(\"Preview\")'which isn't valid XPath; the step also asserted preview was on when the test never enables it. Step removed.claude-code-bridgespawn (1 spec) —ai-page-persistence.spec.tswasspawn('claude-code-bridge', ...)inbeforeAll; the binary doesn't exist on CI runners and the spawn ENOENTed every run. The fixture already auto-seedsanthropic-apivia llmproxy whenPLASMO_PUBLIC_ANTHROPIC_API_KEYis set, which it now is.ai-extension-integration's 180s ceiling left almost no headroom for the model call under shard contention; bumped to 240s (and the inner assistant-message wait from 60s to 90s).Test plan
bun run test:unit— 3217 pass locally.bun run typecheck— clean.Summary by CodeRabbit
Release Notes