Skip to content

Implement stateless core VMCP constructor#5457

Open
tgrunnagle wants to merge 6 commits into
mainfrom
vmcp-core-p1-4_issue_5437
Open

Implement stateless core VMCP constructor#5457
tgrunnagle wants to merge 6 commits into
mainfrom
vmcp-core-p1-4_issue_5437

Conversation

@tgrunnagle
Copy link
Copy Markdown
Contributor

@tgrunnagle tgrunnagle commented Jun 4, 2026

Summary

RFC Phase 1 of the vMCP refactor (epic #5419, story #5430) extracts a stateless, identity-explicit domain core out of the server.New god-object. This turns the VMCP contract from #5434 into a working core that aggregates backend capabilities, routes calls, and drives composite-tool workflows — without mcp-go types crossing the boundary and without changing server.New's signature or observable behavior.

  • Why: The domain wiring (telemetry decoration, workflow engine, composer factory, health filtering, workflow validation) is currently buried in server.New's body, coupling the domain logic to the transport god-object and to context-injected capabilities. Phase 1 relocates that wiring behind a clean, identity-parameterized contract so later phases can re-home the transport (Serve, P2.1 Serve skeleton + ServerConfig #5439) and wire the Cedar admission seam (P1.5 Core admission seam (bounded rewrite) #5438) without untangling it again.
  • What:
    • New pkg/vmcp/core implements *coreVMCP and New(cfg *Config) (VMCP, error) with all 10 interface methods, relocating (not rewriting) the domain wiring from server.New: telemetry backend-client decoration, the optional workflow auditor, the in-memory state store + workflow engine, the per-call composer factory, and fail-fast workflow validation.
    • The core is stateless with respect to sessions: List/Lookup/Call health-filter the registry and aggregate on demand, then route through a per-call router.NewSessionRouter bound to the freshly aggregated routing table. This removes composite tools' dependency on context-injected capabilities (vmcp anti-pattern fix(typo): corrects readme #1). The core filters; Serve caches (a later decorator).
    • An injected health.StatusProvider drives filterHealthyBackends before aggregation, preserving today's exact include/exclude rules (healthy/degraded/empty included; unhealthy/unknown/unauthenticated excluded). A nil provider means no filtering (all backends included), matching today's no-monitor behavior.
    • Composite tools are advertised in ListTools (the core is the only layer that can add them, since decorators may only subtract reachability) and executed in CallTool through the relocated per-session composer pattern.
    • Close() stops the state-store cleanup goroutine and is idempotent (sync.Once).
    • The backend-client telemetry decorator moved from pkg/vmcp/server/telemetry.go to a new shared leaf package pkg/vmcp/internal/backendtelemetry (exported as MonitorBackends) so both server and core can use it without an import cycle.
    • Identity is an explicit *auth.Identity on every method, never read from context and never logged. The Cedar admission seam (P1.5 Core admission seam (bounded rewrite) #5438) is intentionally not wired here.

Closes #5437

Type of change

  • Refactoring (no behavior change)

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

Unit tests for pkg/vmcp/core cover: constructor validation (loud failure on nil required collaborators), on-demand aggregation, health-filtering parity with discovery/middleware.go, lookups (including renamed/prefixed capability names and unknown-name errors), copy-before-mutate of args/meta, composite-tool execution across all executeComposite branches, idempotent Close, and verification that identity is never logged. Existing pkg/vmcp/server and the relocated pkg/vmcp/internal/backendtelemetry tests pass unchanged. golangci-lint reports 0 issues for the changed packages.

Changes

File Change
pkg/vmcp/core/core_vmcp.go New: *coreVMCP, New constructor, list/lookup paths, health filtering, workflow validation, idempotent Close.
pkg/vmcp/core/core_calls.go New: CallTool/ReadResource/GetPrompt call paths and composite-tool execution.
pkg/vmcp/core/core.go Wire New to the previously contract-only Config.
pkg/vmcp/elicitation.go New: domain-typed ElicitationRequester seam (no mcp-go types).
pkg/vmcp/internal/backendtelemetry/ Relocated backend-client telemetry decorator from server/telemetry.go; exported as MonitorBackends.
pkg/vmcp/server/server.go Call the relocated backendtelemetry.MonitorBackends; signature and behavior unchanged.
pkg/vmcp/composer/, pkg/vmcp/server/sdk_elicitation_adapter*.go Consume the domain-typed ElicitationRequester; SDK translation stays confined to the adapter.

Does this introduce a user-facing change?

No. The core is additive and not yet wired into the request path; server.New's signature and observable behavior are unchanged.

Special notes for reviewers

  • Approved scope exception (exceeds the 400-LOC criterion). This PR intentionally implements both pre-planned sub-tasks (TASK-104a list/lookup/constructor + TASK-104b call/read/get) in one change, totaling roughly 580 net-new production LOC across 4 production files — over the 400-LOC acceptance criterion but within the 10-file limit. This was an explicit, up-front decision; the alternative was a 104a PR plus a 104b follow-up.
  • Intentional temporary duplication (C2). filterHealthyBackends is a deliberate copy of discovery/middleware.go's implementation. The discovery middleware keeps its own copy on the legacy server.New path until that path is removed in Phase 3 (P2.4 Replace discovery-into-context with direct VMCP calls #5442/P3.2 Reduce server.New body to the wrapper #5445). The include/exclude rules are kept identical across both copies — this is not accidental duplication.
  • Pre-existing, unrelated findings observed during verification (not introduced by this PR):
    • The only repo-wide task lint-fix finding is a pre-existing gosec G115 in cmd/thv/app/upgrade.go (the .golangci.yml gosec exclude list omits G115).
    • The only task test failures are pkg/mcp/server/TestBuildServerConfig, which require a container runtime (Docker/Podman) not present locally.

Large PR Justification

This PR is large for three reasons, none of which are net-new production logic:

  • Atomic full-scope implementation (multiple related changes that would break if separated).
    Issue P1.4 New(cfg) -> VMCP core constructor #5437 pre-planned a split into TASK-104a (constructor + wiring + list/lookup +
    Close) and TASK-104b (the call/read/get paths + composite execution). Per an explicit
    up-front decision, both land here as one change. Splitting them would leave *coreVMCP
    not satisfying the VMCP interface between PRs (the interface requires all ten methods),
    so the halves are not independently mergeable as working code. Actual net-new production
    logic is ~580 LOC across 4 files (core_vmcp.go, core_calls.go, core.go, server.go).

  • A file move inflates the diff (refactor that must be atomic).
    The backend-client telemetry decorator was relocated from pkg/vmcp/server/telemetry.go
    to a shared pkg/vmcp/internal/backendtelemetry/ package so both server and core can
    use it without an import cycle. Git counts this as ~500 added + deleted lines, but it is a
    near-verbatim move (≈94% similarity) with net-zero behavior change — server.New's
    signature and observable behavior are unchanged.

  • Comprehensive unit tests (test-only, low review risk).
    The remaining bulk is table-driven tests for the new core package (constructor validation,
    on-demand aggregation, health-filtering parity, lookups, copy-before-mutate, composite
    execution, advertise-vs-execute conflict parity, idempotent Close, identity-not-logged)
    plus the relocated telemetry tests.

Splitting further would either ship non-compiling intermediate states or separate tests from
the code they cover, both of which reduce reviewability rather than improve it.

@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label Jun 4, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@tgrunnagle tgrunnagle changed the base branch from main to vmcp-core-p1-3_issue_5436 June 4, 2026 17:33
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 4, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.22%. Comparing base (4a9af53) to head (a7e1b7f).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
pkg/vmcp/core/core_vmcp.go 82.53% 24 Missing and 9 partials ⚠️
pkg/vmcp/core/core_calls.go 85.00% 6 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5457      +/-   ##
==========================================
+ Coverage   69.17%   69.22%   +0.05%     
==========================================
  Files         634      636       +2     
  Lines       64473    64724     +251     
==========================================
+ Hits        44596    44803     +207     
- Misses      16577    16614      +37     
- Partials     3300     3307       +7     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor Author

@tgrunnagle tgrunnagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Multi-agent review — Phase 1 core VMCP constructor

Verdict: COMMENT. Faithful, well-executed "relocate, don't rewrite" refactor; safe to merge as additive code (core not yet wired into the request path). One HIGH-severity latent parity defect and two MEDIUM/LOW test-coverage gaps warrant attention before the core is wired in (Phase 3). Reviewed by 5 specialist agents (general quality, architecture/parity, concurrency, security, tests). Codex cross-review skipped (CLI not installed).

Consensus findings

# Finding Location Severity Score
F1 CallTool composite branch lacks the conflict gate ListTools applies → executes a withheld composite on name conflict (legacy routes to backend) core_calls.go:47 HIGH (latent) 9
F2 C2 filterHealthyBackends copy drops the legacy "source" debug-log field core_vmcp.go:399 LOW 8
F3 Workflow-level telemetry instruments not carried into composerFactory — confirm Serve-layer deferral core_vmcp.go:142 LOW 7
F4 stopStateStore silent interface{ Stop() } assertion; a 2nd store impl lacks Stop() (latent leak) core_vmcp.go:337 LOW 7
F5 TestIdentityNotLogged covers 2 of ~8 paths; auth.Identity has no slog.LogValuer core_vmcp_test.go:393 LOW 7
F6 Missing issue-mandated capability-set parity test vs ProcessPreQueriedCapabilities core_vmcp_test.go:158 MEDIUM 7
F7 executeComposite json.Marshal-failure branch (6th) untested core_calls_test.go:260 LOW 6
F8 GetPrompt copy-before-mutate unverified (asymmetric with CallTool) core_calls_test.go:178 LOW 6
F9 No goroutine-leak guard for the constructor error-path core_vmcp_test.go:116 LOW 6

Holistic notes

  • Strengths: state-store constants, telemetry-before-engine ordering, executeComposite error semantics, and filterHealthyBackends include/exclude rules all match legacy; comment line-citations verified accurate; clean dependency direction (no cycle); no mcp-go type crosses the boundary; identity never logged. Close() actually fixes a pre-existing goroutine leak in the relocated server.New code.
  • F1 is the one to weigh: it contradicts the "no behavior change" label and the issue's parity criteria. Not reachable in production from this PR (additive), so non-blocking — but fix it or add an explicit tracking note before Phase 3 wires the core in.
  • Docs: no task docs needed. Minor: the PR "Changes" table lists elicitation.go/composer//sdk_elicitation_adapter*.go, which are not in this diff (they're in the stacked base #5436).

Inline comments below map 1:1 to the table. REQUEST_CHANGES was intentionally not auto-set — set it yourself if you'd prefer to block on F1.

Comment thread pkg/vmcp/core/core_calls.go Outdated
Comment thread pkg/vmcp/core/core_vmcp.go
Comment thread pkg/vmcp/core/core_vmcp.go
Comment thread pkg/vmcp/core/core_vmcp.go Outdated
Comment thread pkg/vmcp/core/core_vmcp_test.go
Comment thread pkg/vmcp/core/core_vmcp_test.go
Comment thread pkg/vmcp/core/core_calls_test.go
Comment thread pkg/vmcp/core/core_calls_test.go
Comment thread pkg/vmcp/core/core_vmcp_test.go
tgrunnagle added a commit that referenced this pull request Jun 4, 2026
CallTool's composite branch only applied the accessibility filter, so on a
composite/backend tool-name collision it executed the withheld composite while
ListTools advertised the backend tool (advertise-vs-execute divergence). Extract
a shared accessibleComposites helper used by both advertisedTools and CallTool:
on a conflict all composites are dropped and the name falls through to backend
routing, so advertised == executed, matching the legacy decorator.

Addresses #5457 review comments:
- HIGH core_calls.go:47 (3358054909): apply the conflict gate to composite
  execution so a name collision routes to the backend, not the composite
tgrunnagle added a commit that referenced this pull request Jun 4, 2026
Addresses #5457 review comments:
- LOW core_vmcp.go:399 (3358054913): restore the "source" (health_monitor|
  registry) field on the backend-exclusion debug log to match the legacy filter
- LOW core_vmcp.go:142 (3358054917): note that workflow-level telemetry wrapping
  is intentionally deferred to the session/Serve layer (#5439), not a regression
- LOW core_vmcp.go:337 (3358054921): capture the state-store stopper at
  construction and warn loudly if absent, replacing the silent Close-time
  capability assertion (go-style: no silent no-ops)
tgrunnagle added a commit that referenced this pull request Jun 4, 2026
Addresses #5457 review comments:
- LOW core_vmcp_test.go:393 (3358054924): drive all identity-taking methods in
  TestIdentityNotLogged and assert the log buffer is non-empty so the check is
  not vacuous
- LOW core_calls_test.go:260 (3358054931): cover executeComposite's
  json.Marshal-failure branch (unserializable Output)
- LOW core_calls_test.go:178 (3358054936): add TestGetPrompt_CopyBeforeMutate
  asserting GetPrompt does not mutate the caller's args
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 4, 2026
Base automatically changed from vmcp-core-p1-3_issue_5436 to main June 5, 2026 15:28
tgrunnagle and others added 5 commits June 5, 2026 09:46
Relocate the domain wiring buried in server.New (telemetry decoration,
workflow auditor, in-memory state store + workflow engine, per-session
composer factory, fail-fast workflow validation) into a working
New(cfg) (VMCP, error) in pkg/vmcp/core, turning the #5434 contract into
an identity-parameterized core that aggregates capabilities, routes
calls, and drives composite workflows.

Implements #5437:
- All ten VMCP methods on *coreVMCP. The core is stateless w.r.t.
  sessions: List/Lookup/Call health-filter the registry and aggregate on
  demand, then route through a per-call SessionRouter bound to the fresh
  routing table (removing context-injected capability coupling). "The
  core filters, Serve caches."
- Injected health.StatusProvider drives filterHealthyBackends before
  aggregation, preserving today's exact include/exclude rules (copied
  from discovery/middleware.go; temporary C2 duplication until Phase 3).
- Composite tools are advertised in ListTools (decorators may only
  subtract) and executed in CallTool via the relocated per-session
  composer pattern.
- Close() stops the state store cleanup goroutine and is idempotent.
- Relocate the backend-client telemetry decorator to
  pkg/vmcp/internal/backendtelemetry so server and core share it without
  an import cycle; server.New behavior is unchanged.

Identity is an explicit *auth.Identity on every method, never read from
context and never logged. The Cedar admission seam (#5438) is not wired
here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixed issues from code review:
- HIGH: ReadResource/GetPrompt now pass the advertised name straight to
  the backend client (which owns the single name translation), matching
  CallTool and removing the fragile reliance on idempotent re-translation
- MEDIUM: add TestCallTool_ResolvesRenamedTool (dot-convention/prefixed
  name resolution) and table-driven TestExecuteComposite covering the
  error/timeout/nil-result/result.Error branches
- MEDIUM: clarify that core aggregation is intentionally uncached (Serve
  caches) and that validateWorkflowDefs returns nil-when-empty by design

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CallTool's composite branch only applied the accessibility filter, so on a
composite/backend tool-name collision it executed the withheld composite while
ListTools advertised the backend tool (advertise-vs-execute divergence). Extract
a shared accessibleComposites helper used by both advertisedTools and CallTool:
on a conflict all composites are dropped and the name falls through to backend
routing, so advertised == executed, matching the legacy decorator.

Addresses #5457 review comments:
- HIGH core_calls.go:47 (3358054909): apply the conflict gate to composite
  execution so a name collision routes to the backend, not the composite
Addresses #5457 review comments:
- LOW core_vmcp.go:399 (3358054913): restore the "source" (health_monitor|
  registry) field on the backend-exclusion debug log to match the legacy filter
- LOW core_vmcp.go:142 (3358054917): note that workflow-level telemetry wrapping
  is intentionally deferred to the session/Serve layer (#5439), not a regression
- LOW core_vmcp.go:337 (3358054921): capture the state-store stopper at
  construction and warn loudly if absent, replacing the silent Close-time
  capability assertion (go-style: no silent no-ops)
Addresses #5457 review comments:
- LOW core_vmcp_test.go:393 (3358054924): drive all identity-taking methods in
  TestIdentityNotLogged and assert the log buffer is non-empty so the check is
  not vacuous
- LOW core_calls_test.go:260 (3358054931): cover executeComposite's
  json.Marshal-failure branch (unserializable Output)
- LOW core_calls_test.go:178 (3358054936): add TestGetPrompt_CopyBeforeMutate
  asserting GetPrompt does not mutate the caller's args
@tgrunnagle tgrunnagle force-pushed the vmcp-core-p1-4_issue_5437 branch from 3d6bcd2 to 7dc752c Compare June 5, 2026 16:47
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 5, 2026
@github-actions github-actions Bot dismissed their stale review June 5, 2026 16:51

Large PR justification has been provided. Thank you!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@tgrunnagle tgrunnagle marked this pull request as ready for review June 5, 2026 16:58
@tgrunnagle tgrunnagle requested a review from ChrisJBurns as a code owner June 5, 2026 16:58
Comment thread pkg/vmcp/core/core_vmcp.go
A core built with a nil Elicitation produced a non-nil DefaultElicitationHandler
that would nil-deref the moment a workflow reached an elicitation step — a new
failure mode versus the legacy server.New path, which always wired a non-nil
adapter. Add defense in depth:
- composer.DefaultElicitationHandler.RequestElicitation returns the new
  ErrElicitationNotConfigured instead of dereferencing a nil requester, so every
  caller fails cleanly.
- core.New rejects construction when Elicitation is nil and any workflow contains
  an elicitation step (directly or as a forEach inner step), surfacing the
  misconfiguration at startup per the documented contract.

Addresses #5457 review comments:
- blocker core_vmcp.go:134 (3365728160): nil cfg.Elicitation would nil-deref on
  the first elicitation step
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

P1.4 New(cfg) -> VMCP core constructor

2 participants