Skip to content

Configure rate limits on VirtualMCPServer PR B 1#5276

Open
Sanskarzz wants to merge 6 commits into
stacklok:mainfrom
Sanskarzz:ratelimitingVMCP2
Open

Configure rate limits on VirtualMCPServer PR B 1#5276
Sanskarzz wants to merge 6 commits into
stacklok:mainfrom
Sanskarzz:ratelimitingVMCP2

Conversation

@Sanskarzz
Copy link
Copy Markdown
Contributor

@Sanskarzz Sanskarzz commented May 14, 2026

Summary

This PR wires the VirtualMCPServer.spec.config.rateLimiting config into the vMCP runtime for normal tools/call requests.

The change now focuses only on the base rate-limit runtime path:

  • Builds vMCP rate-limit middleware in a vMCP-side factory.
  • Adds pkg/ratelimit.NewMiddleware as the typed, value-returning builder for Redis setup, limiter construction, cleanup, and HTTP wire behavior.
  • Keeps pkg/ratelimit.CreateMiddleware as the legacy runner-based entry point by delegating to pkg/ratelimit.NewMiddleware.
  • Passes the built middleware into vmcpserver.Config as RateLimitMiddleware.
  • Inserts rate limiting after incoming auth + MCP parsing and before audit/discovery.
  • Uses parsed.ResourceID as the rate-limit tool key.
  • Keeps pkg/ratelimit/middleware.go as the single owner of request-time rate-limit behavior.

Optimizer-aware call_tool inner-tool-name resolution is intentionally deferred to a follow-up PR.

Fixes: #4552

Type of Change

  • Bug fix
  • New feature
  • Refactoring
  • Dependency update
  • Documentation

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Changes

File Change
pkg/ratelimit/middleware.go Adds NewMiddleware as the typed reusable builder and keeps CreateMiddleware as the runner API wrapper.
pkg/vmcp/ratelimit/factory/middleware.go Adds the vMCP-side factory that validates vMCP config and calls pkg/ratelimit.NewMiddleware directly.
pkg/vmcp/cli/serve.go Builds the rate-limit middleware next to incoming auth middleware and passes it to the server config.
pkg/vmcp/server/server.go Adds RateLimitMiddleware to server config and inserts it into the MCP middleware chain.
pkg/ratelimit/middleware_test.go Covers the typed NewMiddleware builder and preserves runner-wrapper coverage.
pkg/vmcp/ratelimit/factory/middleware_test.go Covers Redis validation, per-user shared buckets, and post-aggregation tool names.
test/e2e/thv-operator/virtualmcp/virtualmcp_rate_limiting_test.go Adds focused E2E coverage for per-user rate limiting with OIDC.

Notes for Reviewers

  • This PR adds one small public API to pkg/ratelimit: NewMiddleware(MiddlewareParams), so vMCP can reuse the rate-limit implementation without a fake MiddlewareRunner or JSON marshal/unmarshal round-trip.
  • This PR intentionally does not implement optimizer-aware call_tool tool-name extraction.
  • Request-time Redis errors fail open, matching existing rate-limit middleware behavior.
  • Startup still validates Redis connectivity when rate limiting is configured.
  • pkg/ratelimit/middleware.go remains the single owner of Redis setup, limiter construction, cleanup, the JSON-RPC 429 response, and request-time rate-limit behavior.

Manual Testing

Prerequisites

Use a local kind cluster with locally built operator and vMCP images:

task kind-with-toolhive-operator-local
export KUBECONFIG="$PWD/kconfig.yaml"

Verify the cluster and operator:

kubectl get nodes
kubectl get pods -n toolhive-system
kubectl get crd virtualmcpservers.toolhive.stacklok.dev mcpservers.toolhive.stacklok.dev

Expected:

  • The kind node is Ready.
  • The ToolHive operator pod is Running.
  • The VirtualMCPServer and MCPServer CRDs exist.

1. Validate Admission: Redis Is Required

Apply a vMCP with config.rateLimiting but without Redis session storage:

kubectl apply -f - <<'EOF'
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: vmcp-rl-invalid-no-redis
  namespace: default
spec:
  groupRef:
    name: vmcp-rl-manual-group
  incomingAuth:
    type: anonymous
  config:
    rateLimiting:
      shared:
        maxTokens: 1
        refillPeriod: 1m
EOF

Expected rejection:

config.rateLimiting requires sessionStorage with provider 'redis'

2. Validate Admission: Per-User Requires OIDC

Apply a vMCP with perUser rate limiting but anonymous auth:

kubectl apply -f - <<'EOF'
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: vmcp-rl-invalid-peruser-anon
  namespace: default
spec:
  groupRef:
    name: vmcp-rl-manual-group
  incomingAuth:
    type: anonymous
  sessionStorage:
    provider: redis
    address: redis-rl-manual.default.svc.cluster.local:6379
  config:
    rateLimiting:
      perUser:
        maxTokens: 1
        refillPeriod: 1m
EOF

Expected rejection:

config.rateLimiting.perUser requires incomingAuth.type oidc

3. Deploy Redis

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-rl-manual
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-rl-manual
  template:
    metadata:
      labels:
        app: redis-rl-manual
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-rl-manual
  namespace: default
spec:
  selector:
    app: redis-rl-manual
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
EOF

kubectl wait --for=condition=Available deployment/redis-rl-manual -n default --timeout=120s

4. Verify ConfigMap Serialization

Create a valid vMCP with a shared limit. This uses anonymous auth so setup stays simple; runtime per-user enforcement is validated by the focused E2E test in the next section.

kubectl apply -f - <<'EOF'
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPGroup
metadata:
  name: vmcp-rl-manual-group
  namespace: default
spec:
  description: Manual vMCP rate-limit group
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPServer
metadata:
  name: vmcp-rl-manual-yardstick
  namespace: default
spec:
  groupRef:
    name: vmcp-rl-manual-group
  image: ghcr.io/stackloklabs/yardstick/yardstick-server:1.1.1
  transport: streamable-http
  proxyPort: 8080
  mcpPort: 8080
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: vmcp-rl-manual
  namespace: default
spec:
  groupRef:
    name: vmcp-rl-manual-group
  incomingAuth:
    type: anonymous
  sessionStorage:
    provider: redis
    address: redis-rl-manual.default.svc.cluster.local:6379
  serviceType: NodePort
  config:
    rateLimiting:
      shared:
        maxTokens: 1
        refillPeriod: 1m
EOF

Wait for readiness:

kubectl wait --for=jsonpath='{.status.phase}'=Ready mcpserver/vmcp-rl-manual-yardstick -n default --timeout=180s
kubectl wait --for=jsonpath='{.status.phase}'=Ready virtualmcpserver/vmcp-rl-manual -n default --timeout=300s
kubectl get virtualmcpserver vmcp-rl-manual -n default

Inspect the generated vMCP config:

kubectl get configmap vmcp-rl-manual-vmcp-config -n default -o yaml

Expected config.yaml contains:

sessionStorage:
  provider: redis
  address: redis-rl-manual.default.svc.cluster.local:6379
rateLimiting:
  shared:
    maxTokens: 1
    refillPeriod:
      duration: 1m0s

5. Verify Runtime Enforcement

Run the focused E2E. It deploys Redis, a parameterized OIDC server, one backend MCPServer, and a vMCP with perUser.maxTokens=1. It then sends two authenticated tools/call requests from the same user and expects the second call to be rate-limited.

KUBECONFIG="$PWD/kconfig.yaml" go test ./test/e2e/thv-operator/virtualmcp \
  -run TestE2E \
  -count=1 \
  -timeout=15m \
  -args -ginkgo.v -ginkgo.focus="VirtualMCPServer Rate Limiting"

Expected:

ok      github.com/stacklok/toolhive/test/e2e/thv-operator/virtualmcp   105.995s

This validates:

  • vMCP reads config.rateLimiting.
  • The vMCP rate-limit factory creates the Redis client and limiter.
  • Middleware runs after auth and MCP parsing.
  • The per-user bucket is keyed by authenticated subject.
  • The second tools/call receives the expected rate-limit error.

6. Cleanup

kubectl delete virtualmcpserver vmcp-rl-manual -n default --ignore-not-found
kubectl delete mcpserver vmcp-rl-manual-yardstick -n default --ignore-not-found
kubectl delete mcpgroup vmcp-rl-manual-group -n default --ignore-not-found
kubectl delete deployment redis-rl-manual -n default --ignore-not-found
kubectl delete service redis-rl-manual -n default --ignore-not-found

The focused E2E uses timestamped resources and cleans them up automatically.

Large PR Justification

This is a new feature package with a large test suite, and it needs to land as one coherent phase.

@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label May 14, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 14, 2026

Codecov Report

❌ Patch coverage is 66.10169% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.20%. Comparing base (05f11b5) to head (5776155).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
pkg/vmcp/cli/serve.go 28.57% 15 Missing ⚠️
pkg/ratelimit/middleware.go 61.53% 3 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5276      +/-   ##
==========================================
+ Coverage   68.87%   69.20%   +0.32%     
==========================================
  Files         634      635       +1     
  Lines       64439    64522      +83     
==========================================
+ Hits        44384    44650     +266     
+ Misses      16779    16569     -210     
- Partials     3276     3303      +27     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Sanskarzz Sanskarzz marked this pull request as ready for review May 14, 2026 18:17
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels May 14, 2026
@Sanskarzz Sanskarzz force-pushed the ratelimitingVMCP2 branch from 695a96c to 480152f Compare May 15, 2026 14:09
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels May 15, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@Sanskarzz Sanskarzz force-pushed the ratelimitingVMCP2 branch from 480152f to 619264d Compare May 23, 2026 20:07
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 23, 2026
@Sanskarzz Sanskarzz force-pushed the ratelimitingVMCP2 branch from 619264d to f3cf1e4 Compare May 25, 2026 19:00
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 25, 2026
@github-actions github-actions Bot dismissed their stale review May 25, 2026 20:16

Large PR justification has been provided. Thank you!

@github-actions
Copy link
Copy Markdown
Contributor

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR wires the VirtualMCPServer.spec.config.rateLimiting field (introduced in PR A) into the vMCP runtime so rate limits are actually enforced. It builds a Redis-backed limiter during server construction, refactors the MCP middleware chain so rate limiting runs after auth + MCP parsing (allowing per-user/per-tool keying), and adds optimizer-aware tool-name resolution for call_tool.

Changes:

  • Pass namespace and RateLimiting config through vmcpserver.Config and initialize a Redis-backed rate-limit middleware (with cleanup on Stop) in New().
  • Refactor the vMCP MCP middleware composition into small applyXxx helpers and place rate limiting between MCP parser and audit/discovery; resolve call_tool to its inner tool_name when optimizer is active.
  • Generalize pkg/ratelimit/middleware.go to expose NewMiddleware with an optional ToolNameResolver, and add unit + focused E2E coverage.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pkg/vmcp/cli/serve.go Plumbs VMCP_NAMESPACE env (defaulting to local) and RateLimiting into vmcpserver.Config.
pkg/vmcp/cli/serve_test.go Adds unit tests for vmcpNamespace() default and env override.
pkg/vmcp/server/server.go Adds Namespace + RateLimiting config fields, initializes rate-limit middleware in New(), and splits MCP middleware chain into helpers with new execution order.
pkg/vmcp/server/ratelimit.go New file: builds Redis client + limiter, validates Redis session storage, returns middleware + cleanup, and resolves optimizer call_tool inner tool name.
pkg/vmcp/server/ratelimit_test.go New unit tests for config validation, per-user/per-tool/optimizer behavior, and chain helpers.
pkg/ratelimit/middleware.go Introduces ToolNameResolver/DefaultToolNameResolver and exported NewMiddleware; CreateMiddleware now uses it.
pkg/ratelimit/middleware_test.go Adds tests for custom resolver, default resolver nil-handling, and end-to-end CreateMiddleware registration.
test/e2e/thv-operator/virtualmcp/virtualmcp_rate_limiting_test.go New focused E2E: deploys Redis + OIDC + backend + vMCP with perUser.maxTokens=1 and asserts second tools/call is rejected.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Sanskarzz Sanskarzz force-pushed the ratelimitingVMCP2 branch from f3cf1e4 to 589dfe9 Compare May 26, 2026 19:22
@github-actions github-actions Bot removed the size/XL Extra large PR: 1000+ lines changed label May 26, 2026
@Sanskarzz
Copy link
Copy Markdown
Contributor Author

Sanskarzz commented May 29, 2026

@jerm-dro Thanks, that split makes sense.
I’ve updated this branch to keep only the base vMCP rate-limit runtime wiring, and I've addressed the review comment changes.
I’ll raise a separate follow-up PR for optimizer call_tool support using the pass-through tools pattern you suggested, without adding optimizer concepts to pkg/ratelimit. I don’t see a blocker that requires optimizer support to land this PR.

Copy link
Copy Markdown
Contributor

@jerm-dro jerm-dro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline question on the duplication with pkg/ratelimit/middleware.go — blocking until we've decided whether to share or document why we can't.

return rateLimitHandler(limiter), cleanup, nil
}

func rateLimitHandler(limiter ratelimit.Limiter) func(http.Handler) http.Handler {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (blocking): rateLimitHandler, writeRateLimited, and rateLimitedBody here look like close copies of pkg/ratelimit/middleware.go:110-175 — same tools/call early-return, same identity extraction, same JSON-RPC 429 body. The Redis-ping flow above also overlaps with lines 80-95 of that file.

Is there a way to share the rate-limit logic between the two paths, or is there a constraint that makes that worse than it looks? Marking blocking mainly because two copies of the wire contract tend to drift. Open on how to address it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the copied vMCP rate-limit handler/429 response/Redis setup logic.
The vMCP factory now delegates to the existing pkg/ratelimit.CreateMiddleware using a small runner adapter that captures the middleware and cleanup. That keeps pkg/ratelimit/middleware.go as the single owner of the tools/call filtering, identity extraction, fail-open behavior, Redis ping/client setup, and JSON-RPC 429 wire contract, while still letting vMCP pass RateLimitMiddleware into server.Config.

@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 3, 2026
@Sanskarzz Sanskarzz force-pushed the ratelimitingVMCP2 branch from d09cb69 to f5a7973 Compare June 3, 2026 11:49
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 3, 2026
@Sanskarzz Sanskarzz changed the title Configure rate limits on VirtualMCPServer PR B Configure rate limits on VirtualMCPServer PR B 1 Jun 3, 2026
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 3, 2026
Sanskarzz added 4 commits June 4, 2026 10:54
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
@Sanskarzz Sanskarzz force-pushed the ratelimitingVMCP2 branch from f5a7973 to 727c50e Compare June 4, 2026 05:28
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 4, 2026
return runner.middleware.Handler(), cleanup, nil
}

type captureRunner struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: captureRunner implements the MiddlewareRunner interface only to catch one callback — the other five methods (SetAuthInfoHandler, SetPrometheusHandler, GetConfig, GetUpstreamTokenReader, GetKeyProvider) are dead-weight stubs that will need updating every time the interface grows. There's also a JSON round-trip: this factory marshals MiddlewareParams into MiddlewareConfig.Parameters, then ratelimit.CreateMiddleware immediately unmarshals it back into the same struct in the same process. Both round-trips exist only to fit through an API shaped for a different caller.

The cleaner path is to export a value-returning builder from pkg/ratelimit and let CreateMiddleware become a thin wrapper that calls it:

// In pkg/ratelimit/middleware.go
func NewMiddleware(params MiddlewareParams) (types.Middleware, error) {
    if params.RedisAddr == "" {
        return nil, fmt.Errorf("rate limit middleware requires a Redis address")
    }
    client := redis.NewClient(&redis.Options{
        Addr:     params.RedisAddr,
        DB:       int(params.RedisDB),
        Password: os.Getenv(redisPasswordEnvVar),
    })
    pingCtx, pingCancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer pingCancel()
    if err := client.Ping(pingCtx).Err(); err != nil {
        _ = client.Close()
        return nil, fmt.Errorf("rate limit middleware: failed to connect to Redis at %s: %w", params.RedisAddr, err)
    }
    limiter, err := NewLimiter(client, params.Namespace, params.ServerName, params.Config)
    if err != nil {
        _ = client.Close()
        return nil, fmt.Errorf("failed to create rate limiter: %w", err)
    }
    return &rateLimitMiddleware{handler: rateLimitHandler(limiter), client: client}, nil
}

// CreateMiddleware stays as the Runner-API entry point for legacy callers
func CreateMiddleware(config *types.MiddlewareConfig, runner types.MiddlewareRunner) error {
    var params MiddlewareParams
    if err := json.Unmarshal(config.Parameters, &params); err != nil {
        return fmt.Errorf("failed to unmarshal rate limit middleware parameters: %w", err)
    }
    mw, err := NewMiddleware(params)
    if err != nil {
        return err
    }
    runner.AddMiddleware(MiddlewareType, mw)
    return nil
}

This file then collapses to:

func NewMiddleware(_ context.Context, cfg Config) (func(http.Handler) http.Handler, func(context.Context) error, error) {
    if cfg.RateLimiting == nil {
        return nil, nil, nil
    }
    if cfg.SessionStorage == nil || cfg.SessionStorage.Provider != "redis" {
        return nil, nil, fmt.Errorf("rate limiting requires Redis session storage")
    }
    if cfg.SessionStorage.Address == "" {
        return nil, nil, fmt.Errorf("rate limiting requires Redis session storage address")
    }
    mw, err := ratelimit.NewMiddleware(ratelimit.MiddlewareParams{
        Namespace:  cfg.Namespace,
        ServerName: cfg.ServerName,
        Config:     cfg.RateLimiting,
        RedisAddr:  cfg.SessionStorage.Address,
        RedisDB:    cfg.SessionStorage.DB,
    })
    if err != nil {
        return nil, nil, err
    }
    cleanup := func(context.Context) error { return mw.Close() }
    return mw.Handler(), cleanup, nil
}

captureRunner and the five stubs go away, the JSON marshal/unmarshal round-trip goes away, and ratelimit.CreateMiddleware is still available for the legacy Runner-based callers. The single keep-this-tradeoff-clear change is one new exported symbol (ratelimit.NewMiddleware).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that make sense, I've added the commit with suggested changes.

@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 6, 2026
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 6, 2026
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Large PR: 600-999 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Configure rate limits on VirtualMCPServer

3 participants