
Conversation

@JAORMX (Collaborator) commented Oct 10, 2025

Summary

This proposal extends the ToolHive Kubernetes operator to support proxying remote MCP servers with OAuth authentication, bringing the thv run <remote-url> CLI capability to Kubernetes.

Problem

Organizations currently lack a way to deploy centralized remote MCP proxies in Kubernetes that would give them visibility, policy enforcement, and audit logs for employee interactions with external SaaS MCP services (e.g., Salesforce MCP, GitHub MCP).

Proposed Solution

  • MCPRemoteProxy CRD: Deploy stateless proxies for remote MCP servers
  • MCPExternalAuthConfig CRD: Optional token exchange configuration
  • JWT validation and identity extraction for policy enforcement
  • Reuses existing authz, audit, and tool filtering middleware

Seeking Feedback On

  1. Overall architecture and CRD design
  2. Security model and token handling approach
  3. Integration with existing middleware
  4. Edge cases or concerns not covered

🤖 Generated with Claude Code

codecov bot commented Oct 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.98%. Comparing base (feee96c) to head (2807939).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2151      +/-   ##
==========================================
- Coverage   53.05%   52.98%   -0.07%     
==========================================
  Files         222      222              
  Lines       28868    28868              
==========================================
- Hits        15315    15296      -19     
- Misses      12414    12436      +22     
+ Partials     1139     1136       -3     


@ChrisJBurns (Collaborator) commented:

@claude Can you review the proposal made in this PR? Put your Kubernetes and security hat on for this one

claude bot (Contributor) commented Oct 10, 2025

Claude finished @ChrisJBurns's task.


Remote MCP Proxy Proposal Review

I've completed a thorough review of the proposal from both Kubernetes and security perspectives. Overall, this is a well-architected proposal that builds intelligently on existing ToolHive infrastructure. Below is my detailed feedback.


✅ Strengths

Architecture & Design

  1. Excellent reuse of existing middleware - Leveraging auth, authz, audit, and tool filtering middleware is the right approach
  2. Stateless design - No token storage, horizontally scalable, follows cloud-native principles
  3. Clear CRD separation - MCPRemoteProxy vs MCPExternalAuthConfig provides good separation of concerns
  4. Defense in depth - Both proxy and remote MCP validate tokens independently
  5. Follows operator patterns - Consistent with existing MCPServer and MCPRegistry CRDs

Security Model

  1. JWT signature validation via JWKS is the right approach for token trust
  2. Token introspection fallback already implemented in pkg/auth/token.go
  3. No credential storage - Tokens flow through without persistence
  4. TLS enforcement mentioned for all connections
  5. Multi-tenancy via stateless architecture - Per-request validation ensures isolation

🔴 Critical Issues

1. Missing CRD Type Definition for MCPExternalAuthConfig

Severity: High

The proposal references MCPExternalAuthConfig CRD extensively (lines 223-237, 258) but doesn't define its structure.

Required:

// In cmd/thv-operator/api/v1alpha1/mcpexternalauthconfig_types.go
type MCPExternalAuthConfigSpec struct {
    Type          string               `json:"type"` // "tokenExchange"
    TokenExchange *TokenExchangeConfig `json:"tokenExchange,omitempty"`
}

type TokenExchangeConfig struct {
    TokenURL        string    `json:"token_url"`
    ClientID        string    `json:"client_id"`
    ClientSecretRef SecretRef `json:"client_secret_ref"`
    Audience        string    `json:"audience,omitempty"`
    Scope           string    `json:"scope,omitempty"`
}

Questions:

  • Should this be namespace-scoped or cluster-scoped?
  • Should it support cross-namespace references? (I'd recommend NO for security)
  • What status fields are needed? (e.g., connection health, last exchange time)

2. OIDC Configuration Ambiguity

Severity: High | Location: Lines 172-180, 250-255

The proposal shows two different uses of oidcConfig:

  • Base case: Validates tokens from remote MCP IDP (line 176: issuer: https://login.salesforce.com)
  • Token exchange: Validates company tokens (line 253: issuer: https://auth.company.com)

Problem: This overloading creates confusion. When externalAuthConfigRef is present, whose tokens should oidcConfig validate?

Recommendation:

# Option A: Make it explicit with two separate fields
spec:
  remoteURL: https://mcp.salesforce.com
  
  # For validating incoming tokens from employees/clients
  incomingAuth:
    oidcConfig: {...}
  
  # Optional: For exchanging tokens with remote service
  externalAuthConfigRef:
    name: salesforce-token-exchange

Or document clearly: "When externalAuthConfigRef is present, oidcConfig validates incoming tokens. When absent, oidcConfig validates tokens destined for the remote MCP."

3. Token Exchange Middleware Registration Missing

Severity: Medium | Location: Line 610

The proposal states: "⚠️ Needs registration in middleware factory (1-line fix)"

This needs to be addressed before implementation. Where exactly should this be registered? Looking at the existing auth middleware pattern in pkg/auth/middleware.go, you'll need:

  1. A middleware type constant in pkg/auth/tokenexchange/middleware.go
  2. Registration in the middleware factory (likely in pkg/transport or pkg/runner)
  3. Proper ordering guarantees (token exchange MUST run after auth)
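As a rough illustration of that ordering requirement, here is a sketch; the middleware type, chain helper, and constructor parameters are placeholders, not ToolHive's actual factory API:

package runner

import "net/http"

// middleware is the usual http.Handler wrapper shape; the constructor names
// passed below stand in for the real ToolHive middlewares.
type middleware func(http.Handler) http.Handler

// chain applies middlewares so the first argument sees the request first.
func chain(h http.Handler, mws ...middleware) http.Handler {
    for i := len(mws) - 1; i >= 0; i-- {
        h = mws[i](h)
    }
    return h
}

// buildProxyHandler wires the chain. The only hard constraint stated in the
// proposal is that token exchange runs after authentication (it needs a
// validated identity); placing authz and audit after the exchange is an
// assumption of this sketch.
func buildProxyHandler(upstream http.Handler, auth, tokenExchange, authz, audit middleware) http.Handler {
    return chain(upstream, auth, tokenExchange, authz, audit)
}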

4. RBAC Permissions Undefined

Severity: Medium

What Kubernetes RBAC permissions does the MCPRemoteProxy controller need? Looking at mcpserver_controller.go:141-150, you'll need kubebuilder RBAC markers like:

// +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteproxies,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpremoteproxies/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=toolhive.stacklok.dev,resources=mcpexternalauthconfigs,verbs=get;list;watch

⚠️ Security Concerns

1. JWKS Endpoint Trust

Location: Lines 468-472

Concern: If the remote IDP's JWKS endpoint is compromised, the proxy will accept forged tokens.

Mitigations (already mentioned but emphasize):

  • ✅ Certificate validation (mentioned line 477)
  • ✅ Periodic refresh with caching (line 537)
  • ⚠️ Missing: JWKS endpoint pinning or allowlist for production environments
  • ⚠️ Missing: Monitoring/alerting for JWKS changes

Recommendation: Add CRD field for JWKS certificate pinning:

oidcConfig:
  inline:
    issuer: https://login.salesforce.com
    jwksUrl: https://login.salesforce.com/.well-known/jwks.json
    jwksCertFingerprint: "sha256:abc123..."  # Optional pinning

2. Token Replay Prevention

Severity: Medium | Location: Architecture diagrams

Observation: The proposal doesn't mention token replay protection (e.g., jti claim tracking, nonce validation).

Assessment: Probably acceptable because:

  • Tokens are validated by both proxy and remote MCP (defense in depth)
  • Short-lived tokens (typical OAuth2 practice)
  • HTTPS prevents network-level replay

Recommendation: Document this explicitly in "Security Considerations" and recommend short token TTLs (e.g., 5-15 minutes).
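To make the recommendation concrete, here is a small sketch of a lifetime check the proxy could optionally apply to already-validated claims; the 15-minute limit and helper name are illustrative, and the proposal itself only asks for documentation, not enforcement:

package auth

import (
    "fmt"
    "time"
)

// maxTokenTTL is an illustrative policy limit, not something the proposal mandates.
const maxTokenTTL = 15 * time.Minute

// checkTokenTTL rejects tokens whose lifetime exceeds the policy maximum.
// iat and exp would come from the already-validated JWT claims.
func checkTokenTTL(iat, exp time.Time) error {
    if ttl := exp.Sub(iat); ttl > maxTokenTTL {
        return fmt.Errorf("token lifetime %s exceeds maximum allowed %s", ttl, maxTokenTTL)
    }
    return nil
}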

3. Authorization Policy Context

Location: Lines 354-383

Excellent: Cedar policies can reference principal.email, principal.groups extracted from validated JWT.

Question: How are groups extracted? Different IDPs put groups in different claims:

  • Some use groups (array)
  • Some use roles (array)
  • Some use cognito:groups (AWS Cognito)
  • Some embed in nested structures

Recommendation: Add configurable claim mapping to OIDCConfig:

oidcConfig:
  inline:
    issuer: https://auth.company.com
    claimMappings:
      email: "email"
      groups: "cognito:groups"  # Or "roles", etc.
      name: "preferred_username"

4. Network Policy Not Enforced

Location: Line 479

"Kubernetes NetworkPolicies to restrict egress" is mentioned but not in the CRD design.

Recommendation: Consider adding to MCPRemoteProxySpec:

type MCPRemoteProxySpec struct {
    // ...
    
    // NetworkPolicy defines network access controls
    // +optional
    NetworkPolicy *NetworkPolicyConfig `json:"networkPolicy,omitempty"`
}

type NetworkPolicyConfig struct {
    // Enabled controls whether NetworkPolicy is created
    Enabled bool `json:"enabled,omitempty"`
    
    // AllowedEgress defines allowed egress destinations
    AllowedEgress []NetworkPolicyEgressRule `json:"allowedEgress,omitempty"`
}
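For reference, a sketch of the egress NetworkPolicy the controller might generate from such a config; the label key and the blanket port-443 rule are assumptions, and a real implementation would derive peers and ports from allowedEgress:

package controllers

import (
    corev1 "k8s.io/api/core/v1"
    networkingv1 "k8s.io/api/networking/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
)

// buildEgressPolicy sketches the NetworkPolicy the controller could create for
// one proxy, restricting egress to HTTPS only.
func buildEgressPolicy(proxyName, namespace string) *networkingv1.NetworkPolicy {
    httpsPort := intstr.FromInt(443)
    tcp := corev1.ProtocolTCP
    return &networkingv1.NetworkPolicy{
        ObjectMeta: metav1.ObjectMeta{Name: proxyName + "-egress", Namespace: namespace},
        Spec: networkingv1.NetworkPolicySpec{
            // Select only this proxy's pods (hypothetical label key).
            PodSelector: metav1.LabelSelector{
                MatchLabels: map[string]string{"toolhive.stacklok.dev/proxy": proxyName},
            },
            PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeEgress},
            Egress: []networkingv1.NetworkPolicyEgressRule{{
                // Allow outbound HTTPS (remote MCP, IDP/JWKS, token exchange endpoint).
                Ports: []networkingv1.NetworkPolicyPort{{Protocol: &tcp, Port: &httpsPort}},
            }},
        },
    }
}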

🟡 Design Recommendations

1. Status Field Design

Location: Lines 304-309

The proposed status is minimal. Recommend adding:

type MCPRemoteProxyStatus struct {
    Phase       MCPRemoteProxyPhase `json:"phase,omitempty"`
    URL         string              `json:"url,omitempty"`
    ExternalURL string              `json:"externalURL,omitempty"`
    Conditions  []metav1.Condition  `json:"conditions,omitempty"`
    
    // New fields:
    LastHealthCheck    *metav1.Time `json:"lastHealthCheck,omitempty"`
    RemoteReachable    bool         `json:"remoteReachable,omitempty"`
    TokenExchangeReady bool         `json:"tokenExchangeReady,omitempty"` // If configured
    ObservedGeneration int64        `json:"observedGeneration,omitempty"`
}

Conditions should include:

  • RemoteAvailable - Can proxy reach remote MCP?
  • AuthConfigured - Is OIDC/token exchange properly configured?
  • Ready - Overall readiness
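A minimal sketch of how the reconciler could maintain one of these conditions with meta.SetStatusCondition; the condition names follow the list above and the helper itself is hypothetical:

package controllers

import (
    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Suggested condition types; these names are a recommendation, not part of the
// proposal's current status definition.
const (
    ConditionRemoteAvailable = "RemoteAvailable"
    ConditionAuthConfigured  = "AuthConfigured"
    ConditionReady           = "Ready"
)

// setRemoteAvailable records whether the proxy could reach the remote MCP;
// conditions would be &proxy.Status.Conditions in the real reconciler.
func setRemoteAvailable(conditions *[]metav1.Condition, generation int64, reachable bool, msg string) {
    status, reason := metav1.ConditionTrue, "RemoteReachable"
    if !reachable {
        status, reason = metav1.ConditionFalse, "RemoteUnreachable"
    }
    meta.SetStatusCondition(conditions, metav1.Condition{
        Type:               ConditionRemoteAvailable,
        Status:             status,
        Reason:             reason,
        Message:            msg,
        ObservedGeneration: generation,
    })
}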

2. Consistent with DESIGN.md Principles

Location: cmd/thv-operator/DESIGN.md

Your proposal correctly uses dedicated CRD attributes for business logic (remoteURL, oidcConfig, authzConfig) and would defer to PodTemplateSpec for infrastructure concerns. ✅

However: The proposal doesn't show PodTemplateSpec support in MCPRemoteProxySpec. Recommend adding:

type MCPRemoteProxySpec struct {
    // ...
    
    // PodTemplateSpec for infrastructure customization (node selection, resources, etc.)
    // +optional
    PodTemplateSpec *corev1.PodTemplateSpec `json:"podTemplateSpec,omitempty"`
}

3. Validation Webhooks

Not mentioned in proposal

You'll need admission webhooks for:

  1. Validating MCPRemoteProxy:

    • Ensure remoteURL is valid HTTPS
    • Ensure externalAuthConfigRef exists in same namespace (if specified)
    • Ensure oidcConfig OR externalAuthConfigRef is present (not neither)
    • Validate port ranges
  2. Validating MCPExternalAuthConfig:

    • Ensure client_secret_ref secret exists
    • Validate token_url is valid HTTPS
  3. Defaulting: Set default port (8080), default replicas (1), etc.

Example from existing operator:

// In cmd/thv-operator/api/v1alpha1/mcpremoteproxy_webhook.go
func (r *MCPRemoteProxy) ValidateCreate() error {
    if r.Spec.OIDCConfig == nil && r.Spec.ExternalAuthConfigRef == nil {
        return fmt.Errorf("must specify either oidcConfig or externalAuthConfigRef")
    }
    // ...
}

4. Observability

Location: Lines 534-546 (Performance Considerations)

Good: Mentions JWKS caching and introspection caching.

Missing: Metrics to expose. Recommend:

# Request metrics
toolhive_remote_proxy_requests_total{proxy="salesforce-mcp",status="success|failure"}
toolhive_remote_proxy_request_duration_seconds{proxy="salesforce-mcp"}

# Token metrics
toolhive_remote_proxy_token_validations_total{proxy="salesforce-mcp",result="valid|invalid|error"}
toolhive_remote_proxy_token_exchanges_total{proxy="salesforce-mcp",result="success|failure"}

# Cache metrics
toolhive_remote_proxy_jwks_cache_hits_total{proxy="salesforce-mcp"}
toolhive_remote_proxy_jwks_refreshes_total{proxy="salesforce-mcp"}

🟢 Kubernetes-Specific Feedback

1. Controller Implementation Path

Based on existing patterns, you'll create:

cmd/thv-operator/
├── api/v1alpha1/
│   ├── mcpremoteproxy_types.go (NEW)
│   ├── mcpexternalauthconfig_types.go (NEW)
│   ├── mcpremoteproxy_webhook.go (NEW)
│   └── mcpexternalauthconfig_webhook.go (NEW)
└── controllers/
    ├── mcpremoteproxy_controller.go (NEW)
    └── mcpremoteproxy_runconfig.go (NEW)

2. Resource Ownership

The Deployment and Service created by MCPRemoteProxy should have owner references:

controllerutil.SetControllerReference(proxy, deployment, r.Scheme)

This ensures automatic cleanup when proxy is deleted.

3. Proxy Deployment Strategy

The proposal doesn't specify. Recommend:

Deployment.Spec.Strategy = appsv1.DeploymentStrategy{
    Type: appsv1.RollingUpdateDeploymentStrategyType,
    RollingUpdate: &appsv1.RollingUpdateDeployment{
        MaxUnavailable: &intstr.IntOrString{Type: intstr.Int, IntVal: 0},
        MaxSurge:       &intstr.IntOrString{Type: intstr.Int, IntVal: 1},
    },
}

This ensures zero-downtime updates.

4. Service Account for Proxy

Will the proxy pods need a service account with specific permissions? If externalAuthConfigRef reads secrets, you'll need:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: remote-proxy-secret-reader  # hypothetical name
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]
  resourceNames: ["token-exchange-creds"]  # Specific secrets only

📋 Edge Cases & Concerns

1. Token Expiration Mid-Request

What happens if a token expires while a long-running MCP tool call is executing?

Recommendation: Document that clients should ensure tokens have sufficient TTL for expected operation duration.

2. Remote MCP Authentication Changes

If the remote MCP changes its OIDC configuration (new issuer, rotated keys), the proxy will start rejecting tokens.

Mitigation: JWKS refresh handles key rotation. But issuer changes require proxy reconfiguration.

3. Cross-Namespace References

The proposal doesn't address whether externalAuthConfigRef can reference resources in other namespaces.

Recommendation: Enforce same-namespace only (consistent with ToolConfigRef in mcpserver_types.go:477-481):

// The referenced MCPExternalAuthConfig must exist in the same namespace.
// Cross-namespace references are not supported for security and isolation reasons.

4. Multiple Proxies to Same Remote MCP

Can multiple MCPRemoteProxy resources proxy to the same remote URL with different configs?

Use case: Different teams with different authz policies for same external service.

Recommendation: Yes, allow this. Each proxy is independent with its own policies.

5. Graceful Degradation

If token exchange is configured but the exchange endpoint is down, should the proxy:

  • A) Fail closed (reject all requests) ✅ Recommended
  • B) Fall through to direct token forwarding ❌ Security risk

Recommendation: Fail closed. Document this behavior.
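A sketch of what fail-closed could look like at the middleware level; the exchanger interface and handler wiring are assumptions of this sketch, not existing ToolHive code:

package tokenexchange

import (
    "context"
    "net/http"
    "strings"
)

// exchanger stands in for the token exchange client configured from
// MCPExternalAuthConfig; the interface is hypothetical.
type exchanger interface {
    Exchange(ctx context.Context, subjectToken string) (accessToken string, err error)
}

// failClosed implements option A: if the exchange endpoint is unavailable,
// reject the request instead of falling back to forwarding the incoming token.
func failClosed(ex exchanger, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        subject := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
        exchanged, err := ex.Exchange(r.Context(), subject)
        if err != nil {
            // Fail closed: never forward the original token upstream.
            http.Error(w, "token exchange unavailable", http.StatusBadGateway)
            return
        }
        r.Header.Set("Authorization", "Bearer "+exchanged)
        next.ServeHTTP(w, r)
    })
}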


🎯 Summary Recommendations

Must Fix Before Implementation

  1. ✅ Define MCPExternalAuthConfig CRD types
  2. ✅ Clarify oidcConfig usage in token exchange vs direct auth scenarios
  3. ✅ Register token exchange middleware in factory
  4. ✅ Define RBAC permissions for controller
  5. ✅ Add validation webhooks

Should Consider

  1. ✅ Add JWKS certificate pinning support
  2. ✅ Document token replay prevention strategy
  3. ✅ Add configurable claim mapping for groups/roles
  4. ✅ Enhance status fields with health checks
  5. ✅ Add NetworkPolicy support in CRD
  6. ✅ Define prometheus metrics
  7. ✅ Add PodTemplateSpec support for consistency

Nice to Have

  1. ✅ Add examples for common IDPs (Okta, Auth0, Keycloak)
  2. ✅ Document migration path from CLI to operator
  3. ✅ E2E test scenarios with mock remote MCP

✨ Final Assessment

This is a strong proposal that demonstrates deep understanding of both the MCP protocol and Kubernetes operator patterns. The architecture is sound, security model is reasonable, and the reuse of existing middleware is excellent.

The main gaps are around CRD completeness and operational details (RBAC, webhooks, observability). These are standard implementation concerns, not fundamental design flaws.

Recommendation: Proceed with implementation after addressing the critical issues above.


Great work on this proposal! The level of detail and the diagrams make it very clear what you're building. Looking forward to seeing this feature in ToolHive.



@JAORMX (Collaborator, Author) commented Oct 10, 2025

Note that the external auth config CRD is here: #2150


**With OAuth-capable clients (future):**
Collaborator:

Are we sure this is the structure of future clients config files?

Collaborator Author:

Yeah, it wasn't very clever of me to add samples here, so I removed them. It's not up to us what these look like; the point was that it should work.

name: salesforce-tools

port: 8080
replicas: 3
Collaborator:

I'm not sure about this bit. If we want the proxy runner to dynamically scale itself based on usage, this value will keep overriding it back to 3.

Collaborator Author:

I removed the replicas field from the CRD. Once this is implemented, we should probably document what the resulting Deployment looks like so folks can configure HPA (or similar) without conflicting with the operator.

@JAORMX force-pushed the proposal/remote-mcp-proxy branch from 1a50fe9 to daf7537 on October 13, 2025 14:13
- Add validation rules and defaults

**Controller**: `cmd/thv-operator/controllers/mcpremoteproxy_controller.go`
- Create Deployment running `thv-proxyrunner`
Contributor:

Is everything in proxyrunner implemented to support this use-case?

Replicas int32 `json:"replicas,omitempty"`

// Optional: Company OIDC for token exchange
OIDCConfig *OIDCConfigRef `json:"oidcConfig,omitempty"`
Contributor:

I don't think you need the OIDCConfig for token exchange; IIRC all the information is in ExternalAuthConfigRef... I guess you just need OIDCConfig to validate the incoming token?


### Token Handling

**No Token Storage**: Proxy never stores user tokens.
Contributor:

This is how it works currently, but note that this might change for token exchange: since every exchange has a cost, we might want to implement some sort of caching. I was thinking about using oauth2.ReuseTokenSource.
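For illustration, a sketch of how oauth2.ReuseTokenSource could wrap a (hypothetical) exchange TokenSource; the type and function names are assumptions, and the cache would have to be keyed per incoming user to preserve the per-user token model:

package tokenexchange

import (
    "errors"

    "golang.org/x/oauth2"
)

// exchangeTokenSource is a hypothetical TokenSource that performs an RFC 8693
// token exchange for one subject (user) token; the real client would be built
// from MCPExternalAuthConfig (token URL, client credentials, audience, scope).
type exchangeTokenSource struct {
    subjectToken string
}

func (s *exchangeTokenSource) Token() (*oauth2.Token, error) {
    // The actual exchange HTTP call would go here.
    return nil, errors.New("token exchange not implemented in this sketch")
}

// cachedExchange wraps the source in oauth2.ReuseTokenSource so a fresh
// exchange only happens once the previously issued token nears expiry.
func cachedExchange(subjectToken string) oauth2.TokenSource {
    return oauth2.ReuseTokenSource(nil, &exchangeTokenSource{subjectToken: subjectToken})
}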

JAORMX and others added 2 commits October 14, 2025 09:33
This proposal extends the ToolHive Kubernetes operator to support
proxying remote Model Context Protocol (MCP) servers with OAuth
authentication, bringing feature parity with the existing
`thv run <remote-url>` CLI capability.

Key features:
- New MCPRemoteProxy CRD for deploying remote MCP proxies
- JWT validation and identity extraction for policy enforcement
- Optional token exchange support via MCPExternalAuthConfig CRD
- Integration with existing authz, audit, and tool filtering
- Stateless, horizontally scalable architecture

This enables enterprises to gain centralized observability and policy
enforcement over external MCP usage without requiring changes to
remote services or storing user credentials.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Juan Antonio Osorio <[email protected]>
Add status field enhancements:
- Add ObservedGeneration field
- Document recommended condition types (Ready, RemoteAvailable)

Add implementation clarifications:
- Explain authentication model differences between CLI and K8s proxy
- Document two proxy modes: direct (token forwarding) vs token exchange
- Clarify OIDC configuration validates different issuers in different modes
- Document cross-namespace reference policy
- Specify middleware ordering
- Note that token exchange may use oauth2.ReuseTokenSource for caching

Fix numbering in best practices list.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@JAORMX force-pushed the proposal/remote-mcp-proxy branch from d041022 to 2807939 on October 14, 2025 06:49
@JAORMX (Collaborator, Author) commented Oct 14, 2025

Thanks for the review! I've addressed your comments in commit 2807939:

Re: OIDC Config for token exchange (line 283):
Exactly right - the oidcConfig validates the incoming token. The externalAuthConfigRef (from #2150) contains the token exchange configuration. Added clarification in "Implementation Notes" section.

Re: Token exchange caching (line 425):
Good point! Updated "Authentication Model" section to note that token exchange may cache using oauth2.ReuseTokenSource to avoid expensive exchanges on every request.

Re: ProxyRunner implementation (line 395):
The proxyrunner has the middleware plumbing. Main work is:

Added middleware ordering to the proposal.

