Skip to content

Telemetry: optional, default-off authenticated-user attribution (user.id) on the MCP server span #5455

@jeff-jsq

Description

@jeff-jsq

Telemetry: optional, default-off authenticated-user attribution (user.id) on the MCP server span

Summary

When ToolHive is fronted with authentication (incomingAuth: oidc on a vMCP, or any deployment that populates auth.Identity), it knows which authenticated principal made each MCP call — but that identity never reaches telemetry. The inbound server span from pkg/telemetry/middleware.go carries mcp.method.name, gen_ai.tool.name, mcp.client.name, transport/HTTP attributes, etc., but nothing tying a span to the authenticated user, so per-user questions ("which principal invoked this tool / hit this backend") can't be answered from traces.

This proposes optionally emitting the authenticated subject as the OTel user.id span attribute on the inbound MCP server span, defaulting to off. Scope is deliberately narrow; design choices that need maintainer direction are left as open questions rather than prescribed.

Current behavior

  • HTTPMiddleware.addMCPAttributes(ctx, span, r) sets MCP/JSON-RPC semconv attributes from the parsed request.
  • auth.Identity (with Subject, the OIDC sub) is placed on the request context by the auth middleware and already consumed elsewhere via auth.IdentityFromContext (e.g. pkg/audit, pkg/ratelimit, pkg/vmcp/server/telemetry.go).
  • In ToolHive's default middleware chain, authentication runs outer to telemetry, so the identity is present on the context at the span site. (Custom/pre-populated chains would need verification.)
  • The telemetry middleware does not read it, so no user attribution lands on spans.

Proposed change (narrow)

Add an optional, default-off user.id attribute to the inbound MCP server span, sourced from auth.IdentityFromContext(ctx).Subject, emitted only when the feature is enabled and an identity is present (so anonymous/unauthenticated requests are unaffected).

Feasibility note: addMCPAttributes already receives ctx, and attribute keys in this file are plain string literals, so this is an additive read with no signature change and no semconv-package bump. Exact placement is for maintainers to choose given the in-flight middleware refactor (below).

Attribute key

Proposing user.id rather than enduser.id or a vendor key like jwt.sub. Both user.id and enduser.id currently exist in the OTel registry at Development stability, but enduser.id carries an explicit PII warning and the namespace is trending toward user.*; user.id is also consistent with #3791's preference for OTel-standard attributes over custom ones. Open to enduser.id if maintainers prefer to track the (not-yet-merged) semconv consolidation differently.

Why default-off

  • The subject can be personally- or tenant-identifying even though auth.Identity is documented as "credential-free." Not-a-credential is not the same as not-sensitive.
  • [vMCP] Add observability for session lifecycle (Phase 4) #3873's acceptance criteria require that no sensitive data appears in logs, metrics, or traces by default.
  • This file already takes a sensitivity stance — sanitizeArguments/isSensitiveKey redact sensitive tool-call arguments, and addEnvironmentAttributes is conditionally gated — so a default-off identity attribute is consistent with existing patterns here.

Not in scope for this issue

  • Metrics. user.id is high-cardinality and should not be added to any metric instrument in ToolHive (collectors can still derive span-based metrics downstream if an operator chooses).
  • No change to auth, session, or token handling — this only reads an identity already on the context.

Open questions for maintainers

  1. Raw vs. pseudonymous. Should an enabled mode emit the raw subject, or also offer a stable hashed/pseudonymous form (preserving group-by-user correlation without an identifying value)? And does pseudonymization belong in ToolHive at all, or in a collector/processor? (Hashing brings salt/secret-rotation and cross-instance-correlation questions that are worth a deliberate decision rather than a default.)
  2. Config surface. Where should the opt-in live — the telemetry config consumed by NewHTTPMiddleware, the operator MCPTelemetryConfig CRD, or both? Happy to follow your convention.
  3. Placement vs. refactor. Should this target the current middleware.go, or land after the middleware-chain refactor (P2.3 Move middleware chain under Serve; remove authz + annotation mw #5441 and the server.New split, Phase 2: Serve transport helper, re-home transport, replace discovery #5431/Phase 3: Reduce server.New to wrapper + config split #5432/P1.4 New(cfg) -> VMCP core constructor #5437/P3.1 deriveCoreConfig/deriveServerConfig config split #5444) settles?
  4. Backend/client spans. Worth extending the same attribute to the vMCP backend/client spans (pkg/vmcp/server/telemetry.go), or keep this to the inbound server span first?

Acceptance criteria (core)

  • No identity attribute on spans by default — test asserting absence when the feature is unset/disabled.
  • When enabled with an identity on the context, user.id is emitted from Subject; anonymous requests emit nothing.
  • user.id is not added to any metric instrument.
  • A test covers identity present vs. absent at the telemetry site (and flags custom middleware orderings).
  • Docs note the attribute, its opt-in nature, and the PII consideration.

(Additional criteria — e.g. a hashed mode — to follow if maintainers want that direction.)

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions