
ci: live MCP onboarding suite (opencode + Workers AI) #2496

Draft
WcaleNieWolny wants to merge 4 commits into main from wolny/mcp-live-ci

@WcaleNieWolny WcaleNieWolny commented Jun 13, 2026

Contributor

DRAFT — blocked on PR A. This is the CI half (PR B) of the opencode + Workers AI MCP-live effort.

What this is

Adds .github/workflows/builder_onboarding_mcp_live.yml: a path-filtered, contributor-only GitHub Actions workflow that runs the live MCP onboarding suite against the capgo MCP server, driven by opencode as the agent engine and Cloudflare Workers AI as the model provider.

⛔ Depends on (blockers) — do NOT merge yet

  1. The engine PR (PR A) in Cap-go/cli-mcp-tests must merge first — it adds the opencode actor/judge, cost metering, token budget, per-path retry, and the tree report. The test:mcp:hermetic / test:mcp:live scripts this workflow runs live in that private submodule, not in capgo, so the bun run steps do not exist until PR A is in.
  2. The private/cli-mcp-tests submodule must be bumped in this repo to the merged PR A commit on the submodule's default branch — not to a branch/PR-head commit. This PR intentionally does not bump the submodule, because bumping to an unmerged branch commit would pin capgo to a ref that can be rebased/force-pushed/deleted. The bump is a follow-up once PR A lands.

This PR must not be merged until both land (PR A merged → submodule bumped to that merged SHA). Marked as DRAFT for that reason.

Required manual setup (one new secret)

  • CLOUDFLARE_API_TOKEN — a new repo secret scoped to Workers AI Read/Run, must be added manually (e.g. gh secret set CLOUDFLARE_API_TOKEN --repo Cap-go/capgo).
  • CLOUDFLARE_ACCOUNT_ID, the R2 upload secrets (BUILDER_ONBOARDING_TUI_RESULTS_R2_UPLOAD_ACCESS_KEY_ID, TUI_RESULTS_R2_UPLOAD_SECRET_ACCESS_KEY), and the submodule token already exist.

How it runs

  • Runner: ubuntu-latest.
  • Triggers: workflow_dispatch (with an optional pr_number input) + pull_request (opened/synchronize/reopened/ready_for_review), path-filtered to the workflow, .gitmodules, the private/cli-mcp-tests pointer, cli/src/build/onboarding/**, cli/src/mcp/**, and the onboarding test paths.
  • Resolve PR context first: a pr step mirrors the TUI workflow — it outputs number/sha from the pull_request event, the pr_number dispatch input, or gh pr list --head. On a dispatch with no resolvable PR it runs on the branch ref and skips the R2 publish + PR comment instead of hard-failing.
  • Contributor-only: the job-level if condition restricts runs to workflow_dispatch or to PRs whose head repo is this repo, so secrets are never exposed to fork PRs.
  • Installs a pinned opencode CLI: bun install -g opencode-ai@1.17.4 (the opencode-ai npm package ships the opencode binary via platform optionalDependencies); $HOME/.bun/bin is added to $GITHUB_PATH and opencode --version verifies it. Replaces the previous unversioned curl … | bash installer.
  • SHA-pinned actions: actions/checkout@df4cb1c… (v6.0.3) and actions/setup-node@48b55a0… (v6.4.0). This diverges from the repo's floating-tag convention — intentional supply-chain hardening for this workflow.
  • Hard-fails on both the hermetic MCP suites and the live opencode + Workers AI tree (no continue-on-error on those steps).
  • Per-path retry + token budget: CAPGO_E2E_MAX_ATTEMPTS=2, CAPGO_E2E_TOKEN_BUDGET=8000000.
  • Cost line is appended to $GITHUB_STEP_SUMMARY from results/cost.json inside a fenced code block with backticks stripped, so the value can't inject markdown/HTML.
  • R2 report upload reuses upload-r2-results.mjs but publishes under a distinct prefix (R2_PREFIX=builder-onboarding-mcp) with its own REPORT_TITLE, so it no longer collides with / overwrites the TUI report in the shared bucket. The script gained backwards-compatible R2_PREFIX/REPORT_TITLE env overrides whose defaults preserve the TUI workflow's behavior exactly. Upload is continue-on-error so a publish hiccup never masks a test failure.
  • Sticky PR comment: posts/refreshes a marker-keyed comment with the R2 report URL + run link (justifies pull-requests: write), only when a PR number is resolved.
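
The marker-keyed sticky comment in the last bullet can be sketched as a small planning function. This is an illustrative sketch only, not the workflow's actual script: the marker string, `buildCommentBody`, and `planStickyComment` are hypothetical names, and the real step would still need to call the GitHub API or `gh` to fetch and write comments.

```javascript
// Hypothetical marker; the workflow's real marker string is not shown in this PR description.
const MARKER = '<!-- builder-onboarding-mcp-live-report -->'

// Build the comment body: hidden marker first, then the report and run links.
function buildCommentBody(reportUrl, runUrl) {
  return [MARKER, `MCP live report: ${reportUrl}`, `Workflow run: ${runUrl}`].join('\n')
}

// Decide what to do with the sticky comment:
// - no PR number resolved -> skip entirely
// - a comment carrying the marker exists -> update it in place
// - otherwise -> create a new comment
function planStickyComment(existingComments, prNumber, body) {
  if (!prNumber) {
    return { action: 'skip' }
  }
  const previous = existingComments.find(
    comment => typeof comment.body === 'string' && comment.body.includes(MARKER),
  )
  return previous
    ? { action: 'update', id: previous.id, body }
    : { action: 'create', body }
}
```

Keying on a hidden marker rather than on the comment author means the same bot identity can post other comments on the PR without them being overwritten.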

Validation

  • YAML validated locally with python3 -c "import yaml; yaml.safe_load(...)" → parses clean.
  • actionlint (+ shellcheck) run locally on both this workflow and the TUI workflow → 0 issues.
  • node --check on upload-r2-results.mjs → clean; verified defaults reproduce the TUI prefix/title and the MCP overrides produce the builder-onboarding-mcp prefix.

Test plan

  • PR A merges in Cap-go/cli-mcp-tests.
  • Bump private/cli-mcp-tests submodule to the merged PR A commit.
  • Add the CLOUDFLARE_API_TOKEN repo secret (Workers AI Read/Run).
  • gh workflow run "Builder onboarding MCP live (opencode + Workers AI)" --repo Cap-go/capgo → expect green; the job summary shows the fenced cost line; the sticky PR comment links the run and the R2 tree-report.html under the builder-onboarding-mcp/ prefix.
  • Confirm a concurrent TUI-preview run still publishes under builder-onboarding-tui/ (no collision).
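
The fenced cost line that the test plan expects in the job summary could be produced along these lines. This is a minimal sketch under assumptions: `formatCostSummary` is a hypothetical name, the input is treated as an already-formatted string because the shape of results/cost.json is not shown here, and the `text` info string on the fence is an illustrative choice.

```javascript
// Build the fence from single backticks so this example does not itself contain a literal fence.
const FENCE = '`'.repeat(3)

// Wrap a cost line in a fenced code block for $GITHUB_STEP_SUMMARY.
// Every backtick is stripped from the value first, so the text cannot close
// the fence early and get rendered as markdown or HTML.
function formatCostSummary(rawCost) {
  const safe = String(rawCost).replace(/`/g, '').trim()
  return `${FENCE}text\n${safe}\n${FENCE}\n`
}
```

Inside a backtick fence, markdown and HTML are shown literally, so removing backticks from the value is enough to keep it inert. The workflow step would append the returned string to the file named by `$GITHUB_STEP_SUMMARY`.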

coderabbitai Bot commented Jun 13, 2026

Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8b716c4a-da25-4770-b451-f12e2f51a205

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

codspeed-hq Bot commented Jun 13, 2026

Contributor

Merging this PR will not alter performance

✅ 43 untouched benchmarks
⏩ 2 skipped benchmarks [1]


Comparing wolny/mcp-live-ci (599d80d) with main (a412c78)

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

The TUI uploader hardcoded the builder-onboarding-tui/ prefix and a
TUI-labeled step-summary heading, so a second workflow reusing it would
collide with / overwrite the TUI report. Add backwards-compatible
R2_PREFIX and REPORT_TITLE env overrides (defaults preserve the existing
TUI behavior exactly) plus a trimSlashes helper that normalizes the
prefix.
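
The override behavior described above might look roughly like the following. It is a sketch, not the code in upload-r2-results.mjs: only `R2_PREFIX`, `REPORT_TITLE`, `trimSlashes`, and the `builder-onboarding-tui` default prefix come from this PR; `resolveReportConfig` and the default title text are assumptions made for illustration.

```javascript
const DEFAULT_PREFIX = 'builder-onboarding-tui'
// Assumed placeholder: the real default heading is not quoted in this PR.
const DEFAULT_TITLE = 'Builder onboarding TUI report'

// Strip leading and trailing slashes so "/a/b/" and "a/b" produce the same key prefix.
function trimSlashes(value) {
  return value.replace(/^\/+|\/+$/g, '')
}

// Resolve the upload prefix and report title from the environment.
// Unset or empty variables fall back to the TUI defaults, which keeps the
// existing TUI workflow's behavior unchanged.
function resolveReportConfig(env) {
  const prefix = trimSlashes(env.R2_PREFIX || DEFAULT_PREFIX)
  return {
    // An override consisting only of slashes normalizes to '', so fall back again.
    prefix: prefix || DEFAULT_PREFIX,
    title: env.REPORT_TITLE || DEFAULT_TITLE,
  }
}
```

Called as `resolveReportConfig(process.env)`, a workflow that sets `R2_PREFIX=builder-onboarding-mcp` gets its own prefix, while the TUI workflow, which sets neither variable, keeps `builder-onboarding-tui`.
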
- Resolve PR context in a first step (mirrors the TUI workflow) so
  workflow_dispatch can publish: outputs number/sha from the
  pull_request event, the pr_number input, or gh pr list --head. When
  no PR resolves on dispatch, run on the branch ref and skip the
  R2/comment steps instead of hard-failing (PR_NUMBER was previously
  empty on dispatch, failing the R2 uploader).
- Replace the unpinned 'curl | bash' opencode installer with a pinned
  registry install: bun install -g opencode-ai@1.17.4 (ships the
  opencode binary via platform optionalDependencies); $HOME/.bun/bin
  is added to $GITHUB_PATH.
- SHA-pin actions/checkout@v6 and actions/setup-node@v6 (diverges from
  the repo's floating-tag convention, intentional per review).
- Publish the MCP report under a distinct R2 prefix
  (R2_PREFIX=builder-onboarding-mcp) with its own REPORT_TITLE so it no
  longer collides with the TUI report.
- Add a sticky PR comment with the report URL (justifies
  pull-requests: write), keyed by a unique marker, only when a PR is
  resolved.
- Emit the cost line inside a fenced code block with backticks stripped
  so it can't inject markdown/HTML into the step summary.
- Drop the dead cli/test/e2e-mcp/** path filter (that dir lives in the
  submodule, not capgo).
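
The PR-context resolution from the first item above can be sketched as a pure function. The resolution order follows the description (pull_request event, then the pr_number input, then a lookup by head branch); everything else is an assumption. In particular, `resolvePrContext`, the `lookups.byNumber` / `lookups.byHead` helpers (standing in for `gh` calls), and the `publish` flag are hypothetical names, and the real step is a workflow shell step rather than JavaScript.

```javascript
// Resolve which PR (if any) a run belongs to.
// `lookups.byNumber(n)` and `lookups.byHead(branch)` are hypothetical helpers
// returning { number, sha } or null.
function resolvePrContext({ eventName, payload, inputs, ref }, lookups) {
  // 1. pull_request event: the webhook payload already carries number and head SHA.
  if (eventName === 'pull_request') {
    const pr = payload.pull_request
    return { number: pr.number, sha: pr.head.sha, publish: true }
  }

  // 2. workflow_dispatch with an explicit pr_number input.
  const fromInput = inputs.pr_number
    ? lookups.byNumber(Number(inputs.pr_number))
    : null

  // 3. Otherwise try to find an open PR whose head is the dispatched branch.
  const resolved = fromInput || lookups.byHead(ref.branch)
  if (resolved) {
    return { number: resolved.number, sha: resolved.sha, publish: true }
  }

  // 4. No PR found: run on the branch ref and skip the R2 publish and PR comment.
  return { number: null, sha: ref.sha, publish: false }
}
```

Returning `publish: false` instead of throwing mirrors the change described above, where a dispatch without a PR no longer fails the uploader.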