diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 11e0a42..9b093a5 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -1,4 +1,4 @@ -* xref:index.adoc[Overview] +* xref:ROOT:index.adoc[Overview] * xref:get-started:index.adoc[Get started] ** xref:get-started:byoc-prereqs.adoc[Prerequisites] ** xref:get-started:byoc-quickstart.adoc[Quickstart] diff --git a/modules/ROOT/partials/ai-hub/configure-ai-hub.adoc b/modules/ROOT/partials/ai-hub/configure-ai-hub.adoc index d7b8d0f..2d4daf6 100644 --- a/modules/ROOT/partials/ai-hub/configure-ai-hub.adoc +++ b/modules/ROOT/partials/ai-hub/configure-ai-hub.adoc @@ -91,7 +91,7 @@ AI Hub mode automatically provisions 6 backend pools to handle different request * Authentication: x-api-key header * Transform: OpenAI → Anthropic Messages API * Timeout: Standard (60 seconds) -* Models: All `anthropic/*` models via OpenAI-compatible endpoint +* Models: All `anthropic/*` models through OpenAI-compatible endpoint -- . *Anthropic with Transform (Streaming)*: Converts OpenAI format to Anthropic's native format for streaming requests @@ -180,7 +180,7 @@ While routing rules are immutable, you can customize routing behavior through us include::ROOT:partial$ai-hub-preference-toggles.adoc[] -=== Set preferences via Console +=== Set preferences through Console . Navigate to your AI Hub gateway. . Click *Settings* → *Preferences*. @@ -196,7 +196,7 @@ include::ROOT:partial$ai-hub-preference-toggles.adoc[] Changes take effect immediately for new requests. -=== Set preferences via API +=== Set preferences through API [,bash] ---- diff --git a/modules/ROOT/partials/ai-hub/eject-to-custom-mode.adoc b/modules/ROOT/partials/ai-hub/eject-to-custom-mode.adoc index c27a4cc..128490c 100644 --- a/modules/ROOT/partials/ai-hub/eject-to-custom-mode.adoc +++ b/modules/ROOT/partials/ai-hub/eject-to-custom-mode.adoc @@ -140,7 +140,7 @@ Store these files securely. 
You'll reference them when configuring Custom mode r Define your post-ejection configuration: . *Routing rules*: Write CEL expressions that replicate AI Hub behavior, then add your custom rules -. *Backend pools*: Identify modifications needed (timeouts, custom providers, etc.) +. *Backend pools*: Identify modifications needed (timeouts, custom providers, and so on) . *Testing strategy*: Plan how you'll validate that existing functionality still works . *Rollout approach*: Decide whether to eject immediately or test in staging first @@ -185,13 +185,13 @@ Provide builders with clear expectations: The ejection process is irreversible. Follow these steps carefully. -=== Step 1: Initiate ejection +=== Initiate ejection . Navigate to your gateway in the console. . Click *Settings*. . Click *Eject to Custom Mode* button. -=== Step 2: Confirm understanding +=== Confirm understanding The console presents warnings about ejection: @@ -204,14 +204,14 @@ The console presents warnings about ejection: Check all boxes to proceed. -=== Step 3: Execute ejection +=== Execute ejection . Enter the gateway name to confirm: `[Your Gateway Name]` . Click *Eject to Custom Mode*. Ejection typically completes in seconds. The gateway remains available during the transition. 
-You can also eject via API: +You can also eject through API: [,bash] ---- @@ -230,7 +230,7 @@ Expected response: } ---- -=== Step 4: Verify ejection +=== Verify ejection After ejection completes: diff --git a/modules/ROOT/partials/ai-hub/gateway-modes.adoc b/modules/ROOT/partials/ai-hub/gateway-modes.adoc index 32075cf..bdb490e 100644 --- a/modules/ROOT/partials/ai-hub/gateway-modes.adoc +++ b/modules/ROOT/partials/ai-hub/gateway-modes.adoc @@ -44,7 +44,7 @@ When you create a gateway, you choose between two modes that differ in configura |*Routing preferences* |6 configurable toggles -|N/A (full control via rules) +|N/A (full control through rules) |*Modify backends* |Cannot modify/delete @@ -67,7 +67,7 @@ AI Hub mode provides instant, pre-configured access to OpenAI, Anthropic, and Go AI Hub mode eliminates complex LLM gateway configuration by providing pre-built routing rules and backend pools. Platform admins add provider credentials (OpenAI, Anthropic, Google Gemini) once, and all teams immediately benefit from intelligent routing across both providers. -Teams adopting LLMs typically face significant friction: configuring backends and routing rules takes hours, different providers have incompatible APIs, and developers must learn each provider's quirks. AI Hub mode solves this by providing instant access—IT adds API keys once, all teams benefit immediately. +Teams adopting LLMs typically face significant friction: configuring backends and routing rules takes hours, different providers have incompatible APIs, and developers must learn each provider's quirks. AI Hub mode solves this by providing instant access: IT adds API keys once, all teams benefit immediately. 
=== Pre-configured components @@ -86,7 +86,7 @@ When you create an AI Hub gateway, you automatically get: * Model prefix routing: `openai/*`, `anthropic/*` * Model name pattern routing: `gpt-*`, `claude-*`, `o1-*` -* Special routing: embeddings, images, audio → OpenAI only +* Special routing: Embeddings, images, audio → OpenAI only * Native SDK detection: `/v1/messages` → Anthropic passthrough * Streaming detection → Extended timeout backends diff --git a/modules/ROOT/partials/ai-hub/use-ai-hub-gateway.adoc b/modules/ROOT/partials/ai-hub/use-ai-hub-gateway.adoc index 4e8d6e7..5a899a2 100644 --- a/modules/ROOT/partials/ai-hub/use-ai-hub-gateway.adoc +++ b/modules/ROOT/partials/ai-hub/use-ai-hub-gateway.adoc @@ -258,8 +258,8 @@ model = "openai/gpt-5.2" model = "anthropic/claude-sonnet-4.5" # ⚠️ Works but relies on pattern matching -model = "gpt-5.2" # Routes to OpenAI via pattern matching -model = "claude-sonnet-4.5" # Routes to Anthropic via pattern matching +model = "gpt-5.2" # Routes to OpenAI through pattern matching +model = "claude-sonnet-4.5" # Routes to Anthropic through pattern matching ---- Explicit provider prefixes ensure deterministic routing and make your code more maintainable. diff --git a/modules/ROOT/partials/integrations/claude-code-admin.adoc b/modules/ROOT/partials/integrations/claude-code-admin.adoc index af82e65..0fc5f80 100644 --- a/modules/ROOT/partials/integrations/claude-code-admin.adoc +++ b/modules/ROOT/partials/integrations/claude-code-admin.adoc @@ -30,8 +30,8 @@ Claude Code connects to AI Gateway through two primary endpoints: The gateway handles: -. Authentication via bearer tokens in the `Authorization` header -. Gateway selection via the endpoint URL +. Authentication through bearer tokens in the `Authorization` header +. Gateway selection through the endpoint URL . Model routing using the `vendor/model_id` format . MCP server aggregation for multi-tool workflows . 
Request logging and cost tracking per gateway @@ -389,7 +389,7 @@ Track Claude Code activity through gateway observability features. |=== -=== Query logs via API +=== Query logs through API Programmatically access logs for integration with monitoring systems: diff --git a/modules/ROOT/partials/integrations/cline-admin.adoc b/modules/ROOT/partials/integrations/cline-admin.adoc index 991f97b..5a5fd37 100644 --- a/modules/ROOT/partials/integrations/cline-admin.adoc +++ b/modules/ROOT/partials/integrations/cline-admin.adoc @@ -42,7 +42,7 @@ Cline connects to AI Gateway through two primary endpoints: The gateway handles: -. Authentication via bearer tokens in the `Authorization` header +. Authentication through bearer tokens in the `Authorization` header . Model routing using the `vendor/model_id` format . MCP server aggregation for multi-tool workflows . Request logging and cost tracking per gateway @@ -326,9 +326,9 @@ Provide these instructions to users configuring Cline in VS Code. Users configure Cline's API provider and credentials through the Cline extension interface. -IMPORTANT: API provider configuration (API keys, base URLs, custom headers) is managed via Cline's extension global state, not VS Code `settings.json`. These settings are stored in the extension's internal state and must be configured through the Cline UI. +IMPORTANT: API provider configuration (API keys, base URLs, custom headers) is managed through Cline's extension global state, not VS Code `settings.json`. These settings are stored in the extension's internal state and must be configured through the Cline UI. -==== Configure via Cline UI +==== Configure through Cline UI . Open the Cline extension panel in VS Code . Click the settings icon or gear menu @@ -352,13 +352,13 @@ Configure Cline to connect to the aggregated MCP endpoint through the Cline UI o . Search for "Cline > Mcp: Mode" . 
Enable the MCP mode toggle -==== Configure MCP server via Cline UI +==== Configure MCP server through Cline UI . Open the Cline extension panel in VS Code . Navigate to MCP server settings . Add the Redpanda AI Gateway MCP server with the connection details -==== Configure via cline_mcp_settings.json +==== Configure through cline_mcp_settings.json Alternatively, edit `cline_mcp_settings.json` (located in the Cline extension storage directory): @@ -438,7 +438,7 @@ Cline autonomous operations may generate request sequences. Look for patterns to |=== -=== Query logs via API +=== Query logs through API Programmatically access logs for integration with monitoring systems: diff --git a/modules/ROOT/partials/integrations/continue-admin.adoc b/modules/ROOT/partials/integrations/continue-admin.adoc index ec2f631..64aee0d 100644 --- a/modules/ROOT/partials/integrations/continue-admin.adoc +++ b/modules/ROOT/partials/integrations/continue-admin.adoc @@ -29,8 +29,8 @@ Key characteristics: * Uses native provider formats (Anthropic format for Anthropic, OpenAI format for OpenAI) * Supports multiple LLM providers simultaneously with per-provider configuration -* Custom API endpoints via `apiBase` configuration -* Custom headers via `requestOptions.headers` +* Custom API endpoints through `apiBase` configuration +* Custom headers through `requestOptions.headers` * Built-in MCP support for tool discovery and execution * Autocomplete, chat, and inline edit modes @@ -44,7 +44,7 @@ Continue.dev connects to AI Gateway differently than unified-format clients: The gateway handles: -. Authentication via bearer tokens in the `Authorization` header +. Authentication through bearer tokens in the `Authorization` header . Provider-specific request formats without transformation . Model routing using provider-native model identifiers . 
MCP server aggregation for multi-tool workflows @@ -580,7 +580,7 @@ Continue.dev generates different request patterns: |=== -=== Query logs via API +=== Query logs through API Programmatically access logs for integration with monitoring systems: diff --git a/modules/ROOT/partials/integrations/cursor-admin.adoc b/modules/ROOT/partials/integrations/cursor-admin.adoc index 006df4d..0b5faae 100644 --- a/modules/ROOT/partials/integrations/cursor-admin.adoc +++ b/modules/ROOT/partials/integrations/cursor-admin.adoc @@ -32,7 +32,7 @@ Key characteristics: * Limited support for custom headers (makes multi-tenant deployments challenging) * Supports MCP protocol with a 40-tool limit * Built-in code completion and chat modes -* Configuration via settings file (`~/.cursor/config.json`) +* Configuration through settings file (`~/.cursor/config.json`) == Architecture overview @@ -43,10 +43,10 @@ Cursor IDE connects to AI Gateway through standardized endpoints: The gateway handles: -. Authentication via bearer tokens in the `Authorization` header -. Gateway selection via the endpoint URL +. Authentication through bearer tokens in the `Authorization` header +. Gateway selection through the endpoint URL . Model routing using vendor prefixes (for example, `anthropic/claude-sonnet-4.5`) -. Format transforms from OpenAI format to provider-native formats (for Anthropic, Google, etc.) +. Format transforms from OpenAI format to provider-native formats (for Anthropic, Google, and so on) . MCP server aggregation for multi-tool workflows . 
Request logging and cost tracking per gateway @@ -627,7 +627,7 @@ Cursor generates different request patterns: |Metric |Purpose |Request volume by provider -|Identify which providers are most used via model prefix routing +|Identify which providers are most used through model prefix routing |Token usage by model |Track consumption patterns (completion vs chat) @@ -646,7 +646,7 @@ Cursor generates different request patterns: |=== -=== Query logs via API +=== Query logs through API Programmatically access logs for integration with monitoring systems: diff --git a/modules/ROOT/partials/integrations/github-copilot-admin.adoc b/modules/ROOT/partials/integrations/github-copilot-admin.adoc index d55a4f1..2474fc1 100644 --- a/modules/ROOT/partials/integrations/github-copilot-admin.adoc +++ b/modules/ROOT/partials/integrations/github-copilot-admin.adoc @@ -32,7 +32,7 @@ Key characteristics: * Limited support for custom headers (similar to Cursor IDE) * Supports BYOK for Business/Enterprise subscriptions * Built-in code completion, chat, and inline editing modes -* Configuration via IDE settings or organization policies +* Configuration through IDE settings or organization policies * High request volume from code completion features == Architecture overview @@ -44,8 +44,8 @@ GitHub Copilot connects to AI Gateway through standardized endpoints: The gateway handles: -. Authentication via bearer tokens in the `Authorization` header -. Gateway selection via URL path routing or query parameters +. Authentication through bearer tokens in the `Authorization` header +. Gateway selection through URL path routing or query parameters . Model routing and aliasing for friendly names . Format transforms from OpenAI format to provider-native formats . 
Request logging and cost tracking per gateway @@ -621,7 +621,7 @@ GitHub Copilot generates distinct request patterns: |Metric |Purpose |Request volume by model -|Identify most-used models via aliases +|Identify most-used models through aliases |Token usage by model |Track consumption patterns (completion vs chat) @@ -643,7 +643,7 @@ GitHub Copilot generates distinct request patterns: |=== -=== Query logs via API +=== Query logs through API Programmatically access logs for integration with monitoring systems: diff --git a/modules/ROOT/partials/integrations/github-copilot-user.adoc b/modules/ROOT/partials/integrations/github-copilot-user.adoc index 16fc444..9487e6a 100644 --- a/modules/ROOT/partials/integrations/github-copilot-user.adoc +++ b/modules/ROOT/partials/integrations/github-copilot-user.adoc @@ -28,7 +28,7 @@ Before configuring GitHub Copilot, ensure you have: ** API key with access to the gateway * Your IDE: ** VS Code with GitHub Copilot extension installed -** Or JetBrains IDE (IntelliJ IDEA, PyCharm, etc.) with GitHub Copilot plugin +** Or JetBrains IDE (IntelliJ IDEA, PyCharm, and so on) with GitHub Copilot plugin == About GitHub Copilot and AI Gateway @@ -112,7 +112,7 @@ Replace `https://gw.ai.panda.com/v1` with your gateway endpoint. IMPORTANT: This experimental feature requires configuring API keys and custom headers through the Copilot Chat UI, not in `settings.json`. -==== Configure API key and headers via Copilot Chat UI +==== Configure API key and headers through Copilot Chat UI . Open Copilot Chat in VS Code (`Cmd+I` or `Ctrl+I`) . Click the model selector dropdown @@ -163,7 +163,7 @@ Add the base URL configuration in VS Code settings: Replace `https://gw.ai.panda.com/v1` with your gateway endpoint. -==== Configure API key and headers via Copilot Chat UI +==== Configure API key and headers through Copilot Chat UI IMPORTANT: Do not configure API keys or custom headers in `settings.json`. Use the Copilot Chat UI instead. 
@@ -195,7 +195,7 @@ JetBrains IDE integration requires GitHub Copilot Enterprise with Bring Your Own === Configure BYOK with AI Gateway -. Open your JetBrains IDE (IntelliJ IDEA, PyCharm, etc.) +. Open your JetBrains IDE (IntelliJ IDEA, PyCharm, and so on) . Navigate to *Settings/Preferences*: ** macOS: `Cmd+,` ** Windows/Linux: `Ctrl+Alt+S` @@ -320,7 +320,7 @@ For large organizations deploying GitHub Copilot Enterprise with AI Gateway acro === Centralized configuration management -Distribute IDE configuration files via: +Distribute IDE configuration files through: * **Git repository**: Store `settings.json` or IDE configuration in a shared repository * **Configuration management tools**: Puppet, Chef, Ansible @@ -372,13 +372,13 @@ Single key for all developers: === Automated provisioning workflow . Developer joins organization -. Identity system (Okta, Azure AD, etc.) triggers provisioning: +. Identity system (Okta, Azure AD, and so on) triggers provisioning: .. Create Redpanda API key .. Assign to appropriate gateway .. Generate IDE configuration file with embedded keys .. Distribute to developer workstation . Developer installs IDE and GitHub Copilot -. Configuration auto-applies (via MDM or configuration management) +. Configuration auto-applies (through MDM or configuration management) . 
Developer starts using Copilot immediately === Observability and governance diff --git a/modules/ROOT/partials/migration-guide.adoc b/modules/ROOT/partials/migration-guide.adoc index eb880a0..c86551a 100644 --- a/modules/ROOT/partials/migration-guide.adoc +++ b/modules/ROOT/partials/migration-guide.adoc @@ -189,7 +189,7 @@ response = client.messages.create( ---- -After (Gateway via OpenAI-compatible wrapper) +After (Gateway through OpenAI-compatible wrapper) Because AI Gateway provides an OpenAI-compatible endpoint, we recommend migrating Anthropic SDK usage to OpenAI SDK for consistency: @@ -264,7 +264,7 @@ else: ---- -After (Unified via Gateway) +After (Unified through Gateway) [source,python] ---- @@ -403,8 +403,8 @@ Option D: Feature flag service (recommended) [source,python] ---- -# LaunchDarkly, Split.io, etc. -use_gateway = feature_flags.is_enabled("ai-gateway", user_context) +# LaunchDarkly, Split.io, and so on +use_gateway = feature_flags.is_enabled("ai-gateway", user_context) ---- @@ -869,7 +869,7 @@ A/B testing * Test new models without code changes * Compare quality/cost/latency -* Gradual rollout via routing policies +* Gradual rollout through routing policies == Next steps diff --git a/modules/ROOT/partials/observability-logs.adoc b/modules/ROOT/partials/observability-logs.adoc index 8c10e8f..62aad40 100644 --- a/modules/ROOT/partials/observability-logs.adoc +++ b/modules/ROOT/partials/observability-logs.adoc @@ -374,7 +374,7 @@ Shows: * Full request headers * Full request body (formatted JSON) -* All parameters (temperature, max_tokens, etc.) +* All parameters (temperature, max_tokens, and so on) * Custom headers used for routing Example: @@ -506,8 +506,8 @@ Token Generation Rate: 71 tokens/second 3. Check error message: - * Gateway error: Issue with configuration, rate limits, etc. - * Provider error: Issue with upstream API (OpenAI, Anthropic, etc.) 
+ * Gateway error: Issue with configuration, rate limits, and so on + * Provider error: Issue with upstream API (OpenAI, Anthropic, and so on) 4. Check routing: * Was fallback triggered? (May indicate primary provider issue) @@ -619,7 +619,7 @@ Use case: Chargeback/showback to customers // PLACEHOLDER: Confirm log retention policy -Retention period: // PLACEHOLDER: e.g., 30 days, 90 days, configurable +Retention period: // PLACEHOLDER: for example, 30 days, 90 days, configurable After retention period: @@ -642,7 +642,7 @@ Export logs (if needed for longer retention): 2. Click "Export to CSV" 3. Download includes all filtered logs with full fields -=== Export via API +=== Export through API // PLACEHOLDER: If API is available for log export @@ -693,7 +693,7 @@ AI Gateway does not log (if applicable): If redaction is supported: * Configure redaction rules for specific fields -* Mask PII (email addresses, phone numbers, etc.) +* Mask PII (email addresses, phone numbers, and so on) * Redact custom header values Example: diff --git a/modules/ROOT/partials/observability-metrics.adoc b/modules/ROOT/partials/observability-metrics.adoc index 371b3b2..a196ca5 100644 --- a/modules/ROOT/partials/observability-metrics.adoc +++ b/modules/ROOT/partials/observability-metrics.adoc @@ -121,7 +121,7 @@ Breakdowns: * By gateway (for chargeback/showback) * By model (for cost optimization) * By provider (for negotiation leverage) -* By custom header (if configured, e.g., `x-customer-id`) +* By custom header (if configured, for example, `x-customer-id`) Use cases: @@ -165,7 +165,7 @@ Breakdowns: Use cases: * Identify slow models or providers -* Set SLO targets (e.g., "p95 < 2 seconds") +* Set SLO targets (for example, "p95 < 2 seconds") * Detect performance regressions Example insights: @@ -189,7 +189,7 @@ What it shows: Percentage of failed requests over time Metrics: * Total error rate (%) -* Errors by status code (400, 401, 429, 500, etc.) 
+* Errors by status code (400, 401, 429, 500, and so on) * Errors by model * Errors by provider @@ -207,7 +207,7 @@ Breakdowns: Use cases: * Detect provider outages -* Identify configuration issues (e.g., model not enabled) +* Identify configuration issues (for example, model not enabled) * Monitor rate limit breaches Example insights: @@ -226,7 +226,7 @@ Target: Typically 99%+ for production workloads Use cases: * Monitor overall health -* Set up alerts (e.g., "Alert if success rate < 95%") +* Set up alerts (for example, "Alert if success rate < 95%") === Fallback rate @@ -360,7 +360,7 @@ Widgets: * Spend by gateway (stacked bar chart) * Spend by model (pie chart) * Spend by provider (pie chart) -* Spend by custom dimension (if configured, e.g., customer ID) +* Spend by custom dimension (if configured, for example, customer ID) * Spend trend (time series with forecast) * Budget utilization (progress bar: $X / $Y monthly limit) @@ -507,7 +507,7 @@ alerts: Use case: Import into spreadsheet for analysis, reporting -=== Export via API +=== Export through API // PLACEHOLDER: If API is available for metrics @@ -545,7 +545,7 @@ Response: Supported integrations: * *Prometheus*: Native metrics endpoint on port 9090 at `/metrics` -* *OpenTelemetry*: Traces exported to Redpanda topics via the OpenTelemetry exporter +* *OpenTelemetry*: Traces exported to Redpanda topics through the OpenTelemetry exporter == Common analysis tasks @@ -630,7 +630,7 @@ Decision: If mini's error rate is acceptable, save 10x on costs === "Why did costs spike yesterday?" 1. View cost trend graph -2. Identify spike (e.g., Jan 10th: $500 vs usual $100) +2. Identify spike (for example, Jan 10th: $500 vs usual $100) 3. Drill down: * By gateway: Which gateway caused the spike? * By model: Did someone switch to expensive model? 
@@ -807,7 +807,7 @@ Track trends, not point-in-time * Day-to-day variance is normal * Look for week-over-week and month-over-month trends -* Seasonal patterns (e.g., more usage on weekdays) +* Seasonal patterns (for example, more usage on weekdays) == Troubleshoot metrics issues diff --git a/modules/agents/pages/a2a-concepts.adoc b/modules/agents/pages/a2a-concepts.adoc index 5c96f76..cc5b428 100644 --- a/modules/agents/pages/a2a-concepts.adoc +++ b/modules/agents/pages/a2a-concepts.adoc @@ -44,7 +44,7 @@ For example, if your agent URL is `\https://my-agent.ai-agents.abc123.cloud.redp The `.well-known` path follows internet standards for service discovery, making agents discoverable without configuration. -To configure the agent card, see xref:create-agent.adoc#configure-a2a-discovery-metadata-optional[Configure A2A discovery metadata]. +To configure the agent card, see xref:agents:create-agent.adoc#configure-a2a-discovery-metadata-optional[Configure A2A discovery metadata]. == Where A2A is used in ADP @@ -54,7 +54,7 @@ ADP uses the A2A protocol in two contexts: External applications and agents hosted outside ADP use A2A to call ADP agents. This includes backend services, CLI tools, custom UIs, and agents hosted on other platforms. -For integration pattern guidance, see xref:integration-overview.adoc[]. +For integration pattern guidance, see xref:agents:integration-overview.adoc[]. === Internal pipeline-to-agent integration @@ -64,7 +64,7 @@ Redpanda Connect pipelines use the xref:redpanda-cloud:develop:connect/component * Streaming data enrichment with AI-generated fields. * Event-driven agent invocation for automated processing. -The `a2a_message` processor uses the A2A protocol internally to discover and call agents. For pipeline patterns, see xref:pipeline-integration-patterns.adoc[]. +The `a2a_message` processor uses the A2A protocol internally to discover and call agents. For pipeline patterns, see xref:agents:pipeline-integration-patterns.adoc[]. 
== How agents discover each other @@ -90,7 +90,7 @@ ADP agents use OAuth2 client credentials flow. When you create an agent, the sys External callers use these credentials to obtain access tokens: . Agent creation automatically provisions a service account with credentials. -. Applications exchange the client ID and secret for a time-limited access token via OAuth2. +. Applications exchange the client ID and secret for a time-limited access token through OAuth2. . Applications include the access token in the Authorization header when calling the agent endpoint. . When tokens expire, applications exchange credentials again for a new token. @@ -116,5 +116,5 @@ The A2A protocol uses semantic versioning (major.minor.patch). Agents declare th == Next steps -* xref:integration-overview.adoc[] -* xref:create-agent.adoc[] +* xref:agents:integration-overview.adoc[] +* xref:agents:create-agent.adoc[] diff --git a/modules/agents/pages/architecture-patterns.adoc b/modules/agents/pages/architecture-patterns.adoc index 02366e5..e36adba 100644 --- a/modules/agents/pages/architecture-patterns.adoc +++ b/modules/agents/pages/architecture-patterns.adoc @@ -59,7 +59,7 @@ Every architecture pattern involves trade-offs. - *Complexity now versus complexity later:* Starting simple means faster initial development but may require refactoring. Starting structured requires more upfront work but makes the system easier to extend. -For foundational concepts on how agents execute and manage complexity, see xref:concepts.adoc[]. +For foundational concepts on how agents execute and manage complexity, see xref:agents:concepts.adoc[]. == Single-agent pattern @@ -117,7 +117,7 @@ Use external glossterm:Agent2Agent (A2A) protocol[] for multi-organization workf === How it works -Agents communicate using the xref:a2a-concepts.adoc[A2A protocol], a standard HTTP-based protocol for discovery and invocation. 
Each agent manages its own credentials and access control independently, and can deploy, scale, and update without coordinating with other agents. Agent cards define capabilities without exposing implementation details. +Agents communicate using the xref:agents:a2a-concepts.adoc[A2A protocol], a standard HTTP-based protocol for discovery and invocation. Each agent manages its own credentials and access control independently, and can deploy, scale, and update without coordinating with other agents. Agent cards define capabilities without exposing implementation details. === Example: Multi-platform customer service @@ -137,7 +137,7 @@ External A2A lets different teams own and deploy their agents independently, wit External A2A adds network latency on every cross-agent call, and authentication complexity multiplies with each agent requiring credential management. Removing capabilities or changing contracts requires coordination across consuming systems, and debugging requires tracing requests across organizational boundaries. -For implementation details on external A2A integration, see xref:integration-overview.adoc[]. +For implementation details on external A2A integration, see xref:agents:integration-overview.adoc[]. == Common anti-patterns @@ -255,8 +255,8 @@ Provide clear error messages to users. Log errors for debugging. 
== Next steps -* xref:integration-overview.adoc[] -* xref:a2a-concepts.adoc[] +* xref:agents:integration-overview.adoc[] +* xref:agents:a2a-concepts.adoc[] * xref:mcp:overview.adoc[] -* xref:overview.adoc[] +* xref:agents:overview.adoc[] * xref:mcp:overview.adoc[] diff --git a/modules/agents/pages/byoa-register.adoc b/modules/agents/pages/byoa-register.adoc index 424871a..81a2dd3 100644 --- a/modules/agents/pages/byoa-register.adoc +++ b/modules/agents/pages/byoa-register.adoc @@ -31,7 +31,7 @@ The two models differ in who runs the agent, who owns scaling, and how the agent |Question |Choose BYOA when… |Choose a managed agent when… |Where does your agent run? -|You have an existing runtime (LangGraph, custom Go, etc.) you want to keep. +|You have an existing runtime (LangGraph, custom Go, and so on) you want to keep. |You want Redpanda to host and operate the agent runtime for you. |How is the agent defined? @@ -133,7 +133,7 @@ Conceptually, registration captures these pieces of data: |*Agent endpoint URL* |The HTTPS base URL where your agent's `/.well-known/agent-card.json` lives. AI Gateway uses this to fetch the agent card when callers reference your agent by registered name. -// TODO: confirm whether the registration message carries the agent endpoint URL directly, or requires it via the agent card only. The proto will resolve this when the BYOA arm lands. +// TODO: confirm whether the registration message carries the agent endpoint URL directly, or requires it through the agent card only. The proto will resolve this when the BYOA arm lands. |*Agent type / variant* |Set to the BYOA arm of `AgentCreate.agent_type`. 
diff --git a/modules/agents/pages/concepts.adoc b/modules/agents/pages/concepts.adoc index 0acd428..756b1e8 100644 --- a/modules/agents/pages/concepts.adoc +++ b/modules/agents/pages/concepts.adoc @@ -154,7 +154,7 @@ include::ROOT:partial$service-account-authorization.adoc[] == Next steps -* xref:architecture-patterns.adoc[] -* xref:quickstart.adoc[] -* xref:system-prompts.adoc[] +* xref:agents:architecture-patterns.adoc[] +* xref:agents:quickstart.adoc[] +* xref:agents:system-prompts.adoc[] * xref:mcp:overview.adoc[] diff --git a/modules/agents/pages/create-agent.adoc b/modules/agents/pages/create-agent.adoc index 674dbca..aceb6c5 100644 --- a/modules/agents/pages/create-agent.adoc +++ b/modules/agents/pages/create-agent.adoc @@ -19,7 +19,7 @@ After reading this page, you will be able to: * An ADP BYOC environment. * xref:ai-gateway:gateway-quickstart.adoc[AI Gateway configured] with at least one LLM provider enabled. * At least one xref:mcp:overview.adoc[Remote MCP server] deployed with tools. -* System prompt prepared (see xref:system-prompts.adoc[System Prompt Best Practices]). +* System prompt prepared (see xref:agents:system-prompts.adoc[System Prompt Best Practices]). == Access the agents UI @@ -85,7 +85,7 @@ For detailed model specifications and pricing: * link:https://docs.anthropic.com/claude/docs/models-overview[Anthropic Claude Models^] * link:https://ai.google.dev/gemini-api/docs/models[Google Gemini Models^] -For model selection based on architecture patterns, see xref:architecture-patterns.adoc#model-selection-guide[Model selection guide]. +For model selection based on architecture patterns, see xref:agents:architecture-patterns.adoc#model-selection-guide[Model selection guide]. == Write the system prompt @@ -120,7 +120,7 @@ Response format: - [Format guideline] ---- -For complete prompt guidelines, see xref:system-prompts.adoc[System Prompt Best Practices]. 
+For complete prompt guidelines, see xref:agents:system-prompts.adoc[System Prompt Best Practices]. == Add MCP servers and select tools @@ -152,11 +152,11 @@ Subagents are internal specialists within a single agent. Each subagent can have The root agent orchestrates and delegates work to appropriate subagents based on the request. -For multi-agent design patterns, see xref:architecture-patterns.adoc[Agent Architecture Patterns]. +For multi-agent design patterns, see xref:agents:architecture-patterns.adoc[Agent Architecture Patterns]. === Set max iterations -Max iterations determine how many reasoning loops the agent can perform before stopping. Each iteration consumes tokens and adds latency. For detailed cost calculations and the cost/capability/latency trade-off, see xref:concepts.adoc[]. +Max iterations determine how many reasoning loops the agent can perform before stopping. Each iteration consumes tokens and adds latency. For detailed cost calculations and the cost/capability/latency trade-off, see xref:agents:concepts.adoc[]. In the *Execution Settings* section, configure *Max Iterations* (range: 10-100, default: 30). @@ -201,7 +201,7 @@ Skills describe what your agent can do for capability-based discovery. External . Click *Save Changes*. -The updated metadata appears immediately at `\https://your-agent-url/.well-known/agent-card.json`. For more about what these fields mean and how they're used, see xref:a2a-concepts.adoc#agent-cards[Agent cards]. +The updated metadata appears immediately at `\https://your-agent-url/.well-known/agent-card.json`. For more about what these fields mean and how they're used, see xref:agents:a2a-concepts.adoc#agent-cards[Agent cards]. === Review and create @@ -211,7 +211,7 @@ The updated metadata appears immediately at `\https://your-agent-url/.well-known + A service account is automatically created to authenticate your agent with ADP resources. 
You can customize the default name (3-128 characters, cannot contain `<` or `>` characters). + -For details about default permissions and how to manage service accounts, see xref:concepts.adoc#service-account-authorization[Service account authorization]. +For details about default permissions and how to manage service accounts, see xref:agents:concepts.adoc#service-account-authorization[Service account authorization]. . Click *Create Agent*. @@ -223,7 +223,7 @@ You can use this URL to call your agent programmatically or integrate it with ex The *Inspector* tab in ADP automatically uses this URL to connect to your agent for testing. -For programmatic access or external agent integration, see xref:integration-overview.adoc[]. +For programmatic access or external agent integration, see xref:agents:integration-overview.adoc[]. == Test your agent @@ -237,7 +237,7 @@ For programmatic access or external agent integration, see xref:integration-over . Iterate on the system prompt or tool selection as needed. -For detailed testing strategies, see xref:monitor.adoc[]. +For detailed testing strategies, see xref:agents:monitor.adoc[]. 
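As a sketch of what calling the agent URL programmatically can look like, the following builds an A2A `message/send` request body. The JSON-RPC shape follows the public A2A protocol specification, and `AGENT_URL` plus the request text are placeholders; verify the exact field names against the A2A spec version your agent implements.

```python
import json
import uuid

# Placeholder for the URL shown on your agent's details page.
AGENT_URL = "https://your-agent-url"  # hypothetical

def build_a2a_request(text: str) -> dict:
    """Build a JSON-RPC message/send request for an A2A-compatible agent."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "messageId": str(uuid.uuid4()),
                "parts": [{"kind": "text", "text": text}],
            }
        },
    }

body = build_a2a_request("What is the status of order 1001?")
print(json.dumps(body, indent=2))
# POST this body to AGENT_URL with an Authorization: Bearer <token> header.
```

The Inspector tab performs the equivalent call for you, so this sketch is only needed for external integrations.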
== Example configurations @@ -267,8 +267,8 @@ Here are example configurations for different agent types: == Next steps -* xref:integration-overview.adoc[] -* xref:system-prompts.adoc[] +* xref:agents:integration-overview.adoc[] +* xref:agents:system-prompts.adoc[] * xref:mcp:create-server.adoc[] -* xref:architecture-patterns.adoc[] +* xref:agents:architecture-patterns.adoc[] * xref:troubleshoot/troubleshoot-ai-agents.adoc[] diff --git a/modules/agents/pages/integration-overview.adoc b/modules/agents/pages/integration-overview.adoc index d522374..5e78b66 100644 --- a/modules/agents/pages/integration-overview.adoc +++ b/modules/agents/pages/integration-overview.adoc @@ -30,12 +30,12 @@ ADP supports three primary integration scenarios based on who initiates the call | Pipeline processes events | Your Redpanda Connect pipeline invokes agents for each event in a stream using the `a2a_message` processor | Event-driven, automated, high-volume stream processing -| xref:pipeline-integration-patterns.adoc[] +| xref:agents:pipeline-integration-patterns.adoc[] | External system calls agent | Your application or agent (hosted outside ADP) calls ADP agents using the A2A protocol | Backend services, CLI tools, custom UIs, multi-platform agent workflows -| xref:a2a-concepts.adoc[] +| xref:agents:a2a-concepts.adoc[] |=== == Common use cases by pattern @@ -64,7 +64,7 @@ The pipeline controls when agents execute. This pattern is ideal for automated, Common scenarios include real-time fraud detection, sentiment scoring for customer reviews, and content moderation that classifies and routes content. -For implementation details, see xref:pipeline-integration-patterns.adoc[]. +For implementation details, see xref:agents:pipeline-integration-patterns.adoc[]. 
=== External system calls agent @@ -74,7 +74,7 @@ External systems send requests using the A2A protocol and receive responses sync Common scenarios include backend services analyzing data as part of workflows, CLI tools invoking agents for batch tasks, custom UIs displaying agent responses, CRM agents coordinating with Redpanda agents, and multi-platform workflows spanning different infrastructure. -To learn how the A2A protocol enables this integration, see xref:a2a-concepts.adoc[]. +To learn how the A2A protocol enables this integration, see xref:agents:a2a-concepts.adoc[]. == Pattern comparison @@ -123,6 +123,6 @@ Access tokens grant full access to the agent. Anyone with a valid token can send == Next steps -* xref:a2a-concepts.adoc[] +* xref:agents:a2a-concepts.adoc[] * xref:mcp:overview.adoc[] -* xref:pipeline-integration-patterns.adoc[] +* xref:agents:pipeline-integration-patterns.adoc[] diff --git a/modules/agents/pages/monitor.adoc b/modules/agents/pages/monitor.adoc index 6e7c727..5ebd001 100644 --- a/modules/agents/pages/monitor.adoc +++ b/modules/agents/pages/monitor.adoc @@ -18,7 +18,7 @@ For conceptual background on traces and observability, see xref:observability:co == Prerequisites -You must have a running agent. If you do not have one, see xref:quickstart.adoc[]. +You must have a running agent. If you do not have one, see xref:agents:quickstart.adoc[]. == Debug agent execution with Transcripts @@ -71,7 +71,7 @@ Cost = (input_tokens x input_price) + (output_tokens x output_price) Example: GPT-5.2 with 4,302 input tokens and 1,340 output tokens at $0.00000175 per input token and $0.000014 per output token costs $0.026 per request. -For cost optimization strategies, see xref:concepts.adoc#cost-calculation[Cost calculation]. +For cost optimization strategies, see xref:agents:concepts.adoc#cost-calculation[Cost calculation]. 
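The cost formula above is easy to sanity-check in a few lines of Python. The token counts and per-token prices below are the illustrative figures from the example, not current provider pricing:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Per-request cost: (input_tokens x input_price) + (output_tokens x output_price)."""
    return input_tokens * input_price + output_tokens * output_price

# Figures from the example: 4,302 input tokens and 1,340 output tokens
# at $0.00000175 per input token and $0.000014 per output token.
cost = request_cost(4302, 1340, 0.00000175, 0.000014)
print(f"${cost:.3f}")  # $0.026
```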
== Test agent behavior with Inspector @@ -96,4 +96,4 @@ Monitor iteration counts during complex requests to ensure they complete within * xref:observability:concepts.adoc[] * xref:troubleshoot/troubleshoot-ai-agents.adoc[] -* xref:concepts.adoc[] +* xref:agents:concepts.adoc[] diff --git a/modules/agents/pages/overview.adoc b/modules/agents/pages/overview.adoc index 372e29a..8212ea8 100644 --- a/modules/agents/pages/overview.adoc +++ b/modules/agents/pages/overview.adoc @@ -32,7 +32,7 @@ Agents can invoke Redpanda Connect components as tools on-demand. Redpanda Conne When a user makes a request, the LLM receives the system prompt and context, decides which tools to invoke, and processes the results. This cycle repeats until the task completes. -For a deeper understanding of how agents execute, manage context, and maintain state, see xref:concepts.adoc[]. +For a deeper understanding of how agents execute, manage context, and maintain state, see xref:agents:concepts.adoc[]. == Key benefits @@ -58,13 +58,13 @@ Process every event with AI reasoning at scale. Invoke agents automatically from * Agents are available only on ADP BYOC environments * MCP servers must be hosted in ADP -// TODO(review-before-publish): forward-looking framing — "is not currently supported" implies it's coming. Confirm whether to keep this limitation note as-is, drop "currently" to make it a static limitation, or remove the bullet entirely if the workaround (internal subagents) is the durable answer. +// TODO(review-before-publish): forward-looking framing: "is not currently supported" implies it's coming. Confirm whether to keep this limitation note as-is, drop "currently" to make it a static limitation, or remove the bullet entirely if the workaround (internal subagents) is the durable answer. 
* Cross-agent calling between separate agents is not currently supported (use internal subagents for delegation within a single agent) == Next steps -* xref:quickstart.adoc[] -* xref:concepts.adoc[] -* xref:architecture-patterns.adoc[] -* xref:integration-overview.adoc[] -* xref:create-agent.adoc[] +* xref:agents:quickstart.adoc[] +* xref:agents:concepts.adoc[] +* xref:agents:architecture-patterns.adoc[] +* xref:agents:integration-overview.adoc[] +* xref:agents:create-agent.adoc[] diff --git a/modules/agents/pages/pipeline-integration-patterns.adoc b/modules/agents/pages/pipeline-integration-patterns.adoc index ed26b57..4831dd0 100644 --- a/modules/agents/pages/pipeline-integration-patterns.adoc +++ b/modules/agents/pages/pipeline-integration-patterns.adoc @@ -14,11 +14,11 @@ After reading this page, you will be able to: * [ ] {learning-objective-2} * [ ] {learning-objective-3} -This page focuses on pipelines calling agents (pipeline-initiated integration). For agents invoking MCP tools, see xref:integration-overview.adoc#agent-needs-capabilities[Agent needs capabilities]. For external applications calling agents, see xref:integration-overview.adoc#external-system-calls-agent[External system calls agent]. +This page focuses on pipelines calling agents (pipeline-initiated integration). For agents invoking MCP tools, see xref:agents:integration-overview.adoc#agent-needs-capabilities[Agent needs capabilities]. For external applications calling agents, see xref:agents:integration-overview.adoc#external-system-calls-agent[External system calls agent]. == How pipelines invoke agents -Pipelines use the xref:redpanda-cloud:develop:connect/components/processors/a2a_message.adoc[`a2a_message`] processor to invoke agents for each event in a stream. The processor uses the xref:a2a-concepts.adoc[A2A protocol] to discover and communicate with agents. 
+Pipelines use the xref:redpanda-cloud:develop:connect/components/processors/a2a_message.adoc[`a2a_message`] processor to invoke agents for each event in a stream. The processor uses the xref:agents:a2a-concepts.adoc[A2A protocol] to discover and communicate with agents. When the `a2a_message` processor receives an event, it sends the event data to the specified agent along with any prompt you provide. The agent processes the event using its reasoning capabilities and returns a response. The processor then adds the agent's response to the event for further processing or output. @@ -49,7 +49,7 @@ Invoke agents automatically for each event: include::ROOT:example$pipelines/event-driven-invocation.yaml[] ---- -Replace `AGENT_CARD_URL` with your actual agent card URL. See xref:a2a-concepts.adoc#agent-card-location[Agent card location]. +Replace `AGENT_CARD_URL` with your actual agent card URL. See xref:agents:a2a-concepts.adoc#agent-card-location[Agent card location]. **Use case:** Real-time fraud detection on every transaction. @@ -62,7 +62,7 @@ Add AI-generated metadata to events: include::ROOT:example$pipelines/streaming-enrichment.yaml[tag=processors,indent=0] ---- -Replace `AGENT_CARD_URL` with your actual agent card URL. See xref:a2a-concepts.adoc#agent-card-location[Agent card location]. +Replace `AGENT_CARD_URL` with your actual agent card URL. See xref:agents:a2a-concepts.adoc#agent-card-location[Agent card location]. **Use case:** Add sentiment scores to every customer review in real-time. @@ -75,7 +75,7 @@ Process events in the background: include::ROOT:example$pipelines/async-workflows.yaml[tag=pipeline,indent=0] ---- -Replace `AGENT_CARD_URL` with your actual agent card URL. See xref:a2a-concepts.adoc#agent-card-location[Agent card location]. +Replace `AGENT_CARD_URL` with your actual agent card URL. See xref:agents:a2a-concepts.adoc#agent-card-location[Agent card location]. **Use case:** Nightly batch summarization of reports where latency is acceptable. 
@@ -88,7 +88,7 @@ Chain multiple agents in sequence: include::ROOT:example$pipelines/multi-agent-orchestration.yaml[tag=processors,indent=0] ---- -Replace the agent URL variables with your actual agent card URLs. See xref:a2a-concepts.adoc#agent-card-location[Agent card location]. +Replace the agent URL variables with your actual agent card URLs. See xref:agents:a2a-concepts.adoc#agent-card-location[Agent card location]. **Use case:** Translate feedback, analyze sentiment, then route to appropriate team. @@ -101,7 +101,7 @@ Use agent reasoning for complex transformations: include::ROOT:example$pipelines/agent-transformation.yaml[tag=processors,indent=0] ---- -Replace `AGENT_CARD_URL` with your actual agent card URL. See xref:a2a-concepts.adoc#agent-card-location[Agent card location]. +Replace `AGENT_CARD_URL` with your actual agent card URL. See xref:agents:a2a-concepts.adoc#agent-card-location[Agent card location]. **Use case:** Convert natural language queries to SQL for downstream processing. @@ -113,7 +113,7 @@ Do not use the `a2a_message` processor when: * The transformation is simple and does not require AI reasoning. * Agents need to dynamically decide what data to fetch based on context. -For a detailed comparison between pipeline-initiated and agent-initiated integration patterns, see xref:integration-overview.adoc#pattern-comparison[Pattern comparison]. +For a detailed comparison between pipeline-initiated and agent-initiated integration patterns, see xref:agents:integration-overview.adoc#pattern-comparison[Pattern comparison]. == Example: Real-time fraud detection @@ -126,7 +126,7 @@ This example shows a complete pipeline that analyzes every transaction with an a include::ROOT:example$pipelines/fraud-detection-routing.yaml[] ---- -Replace `AGENT_CARD_URL` with your agent card URL. See xref:a2a-concepts.adoc#agent-card-location[Agent card location]. +Replace `AGENT_CARD_URL` with your agent card URL. 
See xref:agents:a2a-concepts.adoc#agent-card-location[Agent card location]. This pipeline: @@ -138,6 +138,6 @@ This pipeline: == Next steps * xref:mcp:overview.adoc[] -* xref:integration-overview.adoc[] -* xref:a2a-concepts.adoc[] +* xref:agents:integration-overview.adoc[] +* xref:agents:a2a-concepts.adoc[] * xref:redpanda-cloud:develop:connect/components/processors/about.adoc[] diff --git a/modules/agents/pages/quickstart.adoc b/modules/agents/pages/quickstart.adoc index 4255ff7..88b498a 100644 --- a/modules/agents/pages/quickstart.adoc +++ b/modules/agents/pages/quickstart.adoc @@ -98,7 +98,7 @@ Response format: . Review your configuration and click *Create Agent*. + -TIP: A service account is automatically created to authenticate your agent with ADP resources. For details about default permissions and how to manage service accounts, see xref:concepts.adoc#service-account-authorization[Service account authorization]. +TIP: A service account is automatically created to authenticate your agent with ADP resources. For details about default permissions and how to manage service accounts, see xref:agents:concepts.adoc#service-account-authorization[Service account authorization]. . Wait for the agent status to change from *Starting* to *Running*. @@ -177,8 +177,8 @@ Common quickstart issue: == Next steps -* xref:overview.adoc[] -* xref:create-agent.adoc[] -* xref:system-prompts.adoc[] -* xref:architecture-patterns.adoc[] +* xref:agents:overview.adoc[] +* xref:agents:create-agent.adoc[] +* xref:agents:system-prompts.adoc[] +* xref:agents:architecture-patterns.adoc[] * xref:mcp:overview.adoc[] diff --git a/modules/agents/pages/system-prompts.adoc b/modules/agents/pages/system-prompts.adoc index 9f1a9df..0656f8d 100644 --- a/modules/agents/pages/system-prompts.adoc +++ b/modules/agents/pages/system-prompts.adoc @@ -184,7 +184,7 @@ Design prompts to recognize escalation conditions: When you cannot complete the task: 1. Explain what you tried and why it didn't work 2. 
Tell the user what information or capability is missing -3. Suggest how they can help (provide more details, contact support, etc.) +3. Suggest how they can help (provide more details, contact support, and so on) ---- === Common error scenarios @@ -290,7 +290,7 @@ Guide agents to: * Avoid redundant tool calls (check context before calling) * Stop when the task completes (don't continue exploring) -For cost management strategies including iteration limits and monitoring, see xref:concepts.adoc[]. +For cost management strategies including iteration limits and monitoring, see xref:agents:concepts.adoc[]. == Example: System prompt with all best practices @@ -419,6 +419,6 @@ Decision criteria enable reliable tool selection based on request context. == Next steps -* xref:quickstart.adoc[] -* xref:overview.adoc[] +* xref:agents:quickstart.adoc[] +* xref:agents:overview.adoc[] * xref:mcp:overview.adoc[] diff --git a/modules/agents/pages/troubleshoot/troubleshoot-ai-agents.adoc b/modules/agents/pages/troubleshoot/troubleshoot-ai-agents.adoc index 7235c78..06d7c3a 100644 --- a/modules/agents/pages/troubleshoot/troubleshoot-ai-agents.adoc +++ b/modules/agents/pages/troubleshoot/troubleshoot-ai-agents.adoc @@ -67,7 +67,7 @@ NEVER respond about order status without calling the tool first. 
**Prevention:** * Write explicit tool selection criteria in system prompts -* Test agents with the xref:system-prompts.adoc#evaluation-and-testing[systematic testing approach] +* Test agents with the xref:agents:system-prompts.adoc#evaluation-and-testing[systematic testing approach] * Use models appropriate for your task complexity === Calling wrong tools @@ -144,7 +144,7 @@ If a tool fails after 2 attempts: **Prevention:** * Design tools to return complete information in one call -* Set max iterations appropriate for task complexity (see xref:concepts.adoc#why-iterations-matter[Why iterations matter]) +* Set max iterations appropriate for task complexity (see xref:agents:concepts.adoc#why-iterations-matter[Why iterations matter]) * Test with ambiguous requests that might cause loops === Making up information @@ -274,7 +274,7 @@ Efficiency guidelines: * Set appropriate max iterations (10-20 for simple, 30-40 for complex) * Design tools to return minimal necessary data * Monitor token usage trends -* See cost calculation guidance in xref:concepts.adoc#cost-calculation[Cost calculation] +* See cost calculation guidance in xref:agents:concepts.adoc#cost-calculation[Cost calculation] == Tool execution issues @@ -438,7 +438,7 @@ The agent card is always available at `/.well-known/agent-card.json` according t * Always append `/.well-known/agent-card.json` to the agent endpoint URL * Test the agent card URL in a browser before using it in pipeline configuration -* See xref:a2a-concepts.adoc#agent-card-location[Agent card location] for details +* See xref:agents:a2a-concepts.adoc#agent-card-location[Agent card location] for details === Pipeline integration failures @@ -472,15 +472,15 @@ processors: * Test pipeline-agent integration with low volume first * Size agent resources appropriately for event rate -* See integration patterns in xref:pipeline-integration-patterns.adoc[] +* See integration patterns in xref:agents:pipeline-integration-patterns.adoc[] == Monitor and 
debug agents -For comprehensive guidance on monitoring agent activity, analyzing conversation history, tracking token usage, and debugging issues, see xref:monitor.adoc[]. +For comprehensive guidance on monitoring agent activity, analyzing conversation history, tracking token usage, and debugging issues, see xref:agents:monitor.adoc[]. == Next steps -* xref:system-prompts.adoc[] -* xref:concepts.adoc[] +* xref:agents:system-prompts.adoc[] +* xref:agents:concepts.adoc[] * xref:mcp:overview.adoc[] -* xref:architecture-patterns.adoc[] +* xref:agents:architecture-patterns.adoc[] diff --git a/modules/agents/pages/tutorials/customer-support-agent.adoc b/modules/agents/pages/tutorials/customer-support-agent.adoc index 3d4a043..7d645ee 100644 --- a/modules/agents/pages/tutorials/customer-support-agent.adoc +++ b/modules/agents/pages/tutorials/customer-support-agent.adoc @@ -268,6 +268,6 @@ Use these documented test IDs when testing the agent. If you replace the mock to == Next steps * xref:mcp:overview.adoc[Call external APIs] -* xref:system-prompts.adoc[] -* xref:architecture-patterns.adoc[] +* xref:agents:system-prompts.adoc[] +* xref:agents:architecture-patterns.adoc[] * xref:troubleshoot/troubleshoot-ai-agents.adoc[] diff --git a/modules/agents/pages/tutorials/transaction-dispute-resolution.adoc b/modules/agents/pages/tutorials/transaction-dispute-resolution.adoc index 88dca9f..d10440e 100644 --- a/modules/agents/pages/tutorials/transaction-dispute-resolution.adoc +++ b/modules/agents/pages/tutorials/transaction-dispute-resolution.adoc @@ -510,7 +510,7 @@ This pipeline: * Consumes transactions from `bank.transactions` topic * Filters for high-value transactions (>$500) or pre-flagged transactions * Calculates preliminary risk score based on location, amount, velocity, and category -* Routes transactions with risk score ≥40 to the dispute-resolution-agent via A2A +* Routes transactions with risk score ≥40 to the dispute-resolution-agent through A2A * Outputs 
investigation results to `bank.dispute_results` topic === Test the pipeline @@ -656,8 +656,8 @@ For production deployments, replace the mock tools with API calls to your accoun == Next steps -* xref:architecture-patterns.adoc[] -* xref:integration-overview.adoc[] -* xref:pipeline-integration-patterns.adoc[] -* xref:monitor.adoc[] +* xref:agents:architecture-patterns.adoc[] +* xref:agents:integration-overview.adoc[] +* xref:agents:pipeline-integration-patterns.adoc[] +* xref:agents:monitor.adoc[] * xref:mcp:overview.adoc[] diff --git a/modules/ai-gateway/pages/admin/setup-guide.adoc b/modules/ai-gateway/pages/admin/setup-guide.adoc index 35e1913..bc14752 100644 --- a/modules/ai-gateway/pages/admin/setup-guide.adoc +++ b/modules/ai-gateway/pages/admin/setup-guide.adoc @@ -22,7 +22,7 @@ After completing this guide, you will be able to: == Enable a provider -Providers represent upstream services (Anthropic, OpenAI, Google AI) and associated credentials. Providers are disabled by default and must be enabled explicitly by an administrator. +Providers represent upstream services (Anthropic, OpenAI, Google AI) and associated credentials. Providers default to disabled; an administrator must explicitly enable each one. . In ADP, navigate to *Agentic* → *AI Gateway* → *Providers*. . Select a provider (for example, Anthropic). @@ -91,7 +91,7 @@ Before creating a gateway, decide which mode fits your needs. You can start with AI Hub mode and later eject to Custom mode if you need more control. Ejection is a one-way transition. See xref:admin/eject-to-custom-mode.adoc[]. ==== -For detailed comparison, see xref:gateway-modes.adoc[]. +For detailed comparison, see xref:ai-gateway:gateway-modes.adoc[]. 
*Next sections:* @@ -186,7 +186,7 @@ Provider pools define which LLM providers handle requests, with support for prim * *Name*: For example, `primary-anthropic` * *Providers*: Select one or more providers (for example, Anthropic) * *Models*: Choose which models to include (for example, `anthropic/claude-sonnet-4.5`) -* *Load balancing*: If multiple providers are selected, choose distribution strategy (round-robin, weighted, etc.) +* *Load balancing*: If multiple providers are selected, choose distribution strategy (round-robin, weighted, and so on) -- . (Optional) Click *Add fallback pool* to configure automatic failover: @@ -205,7 +205,7 @@ Provider pools define which LLM providers handle requests, with support for prim + For simple routing, select *Route all requests to primary pool*. + -For advanced routing based on request properties, use CEL expressions. See xref:routing-cel.adoc[] for examples. +For advanced routing based on request properties, use CEL expressions. See xref:ai-gateway:routing-cel.adoc[] for examples. + Example CEL expression for tier-based routing: + @@ -231,7 +231,7 @@ If a provider pool contains multiple providers, you can distribute traffic to ba == Configure MCP tools (optional) -If your users will build glossterm:AI agent[,AI agents] that need access to glossterm:MCP tool[,tools] via glossterm:MCP[,Model Context Protocol (MCP)], configure MCP tool aggregation. +If your users will build glossterm:AI agent[,AI agents] that need access to glossterm:MCP tool[,tools] through glossterm:MCP[,Model Context Protocol (MCP)], configure MCP tool aggregation. On the gateway details page, select the *MCP* tab to configure tool discovery and execution. The MCP proxy aggregates multiple glossterm:MCP server[,MCP servers], allowing agents to find and call tools through a single endpoint. @@ -297,7 +297,7 @@ Rate limits for MCP work the same way as LLM rate limits. Repeat for each MCP server you want to aggregate. 
-See xref:aggregation.adoc[] for detailed information about MCP aggregation. +See xref:ai-gateway:aggregation.adoc[] for detailed information about MCP aggregation. === Configure the MCP orchestrator @@ -378,5 +378,5 @@ Users can then discover and connect to the gateway using the information provide == Next steps -* xref:routing-cel.adoc[CEL Routing Cookbook] +* xref:ai-gateway:routing-cel.adoc[CEL Routing Cookbook] * xref:integrations:index.adoc[Integrations] diff --git a/modules/ai-gateway/pages/aggregation.adoc b/modules/ai-gateway/pages/aggregation.adoc index 768ea61..c6f6b5c 100644 --- a/modules/ai-gateway/pages/aggregation.adoc +++ b/modules/ai-gateway/pages/aggregation.adoc @@ -801,7 +801,7 @@ Solution: Orchestrator sandbox: * No file system access -* No network access (except via MCP tools) +* No network access (except through MCP tools) * No system calls * Memory limit: // PLACEHOLDER: for example, 128MB * Execution timeout: // PLACEHOLDER: for example, 30s @@ -931,7 +931,7 @@ response = client.chat.completions.create( # Handle tool calls if response.choices[0].message.tool_calls: for tool_call in response.choices[0].message.tool_calls: - # Execute tool via gateway + # Execute tool through gateway tool_result = requests.post( f"{os.getenv('GATEWAY_ENDPOINT')}/mcp/tools/{tool_call.function.name}", headers={ diff --git a/modules/ai-gateway/pages/builders/discover-gateways.adoc b/modules/ai-gateway/pages/builders/discover-gateways.adoc index 205f8db..7095ee3 100644 --- a/modules/ai-gateway/pages/builders/discover-gateways.adoc +++ b/modules/ai-gateway/pages/builders/discover-gateways.adoc @@ -168,7 +168,7 @@ include::ROOT:partial$ai-hub-mode-indicator.adoc[] * Routing is pre-configured and intelligent * Models are automatically routed based on system-managed rules * You cannot see or modify routing rules (they're managed by Redpanda) -* Limited customization via administrator-configured preference toggles +* Limited customization through 
administrator-configured preference toggles * See xref:builders/use-ai-hub-gateway.adoc[] for AI Hub-specific guidance *Custom Mode:* @@ -302,4 +302,4 @@ echo -e "\n=== Gateway validated successfully ===" == Next steps -* xref:connect-agent.adoc[Connect Your Agent] +* xref:ai-gateway:connect-agent.adoc[Connect Your Agent] diff --git a/modules/ai-gateway/pages/configure-provider.adoc b/modules/ai-gateway/pages/configure-provider.adoc index b68e126..c8acce6 100644 --- a/modules/ai-gateway/pages/configure-provider.adoc +++ b/modules/ai-gateway/pages/configure-provider.adoc @@ -123,7 +123,7 @@ Google AI:: + [IMPORTANT] ==== -Gemini uses the `x-goog-api-key` header for authentication, not `Authorization: Bearer`. This matters when you wire up clients. See xref:connect-agent.adoc[Connect your agent]. +Gemini uses the `x-goog-api-key` header for authentication, not `Authorization: Bearer`. This matters when you wire up clients. See xref:ai-gateway:connect-agent.adoc[Connect your agent]. ==== AWS Bedrock:: @@ -173,7 +173,7 @@ For Bedrock, the picker exposes inference profiles, not raw foundation-model IDs [NOTE] ==== -Models are stored as structured `ProviderModel` entries (one entry per model, with the model name as the only required field). A future Phase 2 release will add per-model metadata such as custom pricing overrides. The legacy flat `models` field still works on writes for backward compatibility. +Redpanda stores models as structured `ProviderModel` entries (one entry per model, with the model name as the only required field). A future Phase 2 release will add per-model metadata such as custom pricing overrides. The legacy flat `models` field still works on writes for backward compatibility. ==== After you create the provider, the detail page renders each model as a row with capability badges (*Vision*, *Reasoning*, *Streaming*, and others lifted from the catalog), the model's 7-day spend, and a link to the per-model detail page. 
The model list supports search and filtering. @@ -185,7 +185,7 @@ The detail page also carries a *Last 7 days* KPI strip (*TOTAL SPEND*, *REQUESTS . Click *Create provider*. The button activates once *Name* and *Type* are both set; the right-hand *Summary* panel checks them off as you fill them in. . On the provider's detail page, the *Connection* card shows your *Proxy URL*, *Discovery* URL, *Base URL*, and *API key ref*. Copy the *Proxy URL*: this is where your applications point. . Scroll to the *Verify connection* section. Pick a model from the dropdown and click *Test Connection*. The status updates from _Not tested yet_ to a pass/fail indicator. Use the *Show commands* disclosure if you want to see the equivalent curl or SDK call. -. To wire up an application, open *Connect your app* further down the page or follow xref:connect-agent.adoc[Connect your agent]. +. To wire up an application, open *Connect your app* further down the page or follow xref:ai-gateway:connect-agent.adoc[Connect your agent]. A successful Test Connection result confirms that the provider's credentials, region (Bedrock), and network path are all correct. If the call fails, see <>. @@ -336,7 +336,7 @@ A list/grid view toggle in the top-right switches between table and card layouts |Confirm the client is sending its own `Authorization` header and the *API key* field on the provider is empty. |Gemini returns 401 -|Gemini uses the `x-goog-api-key` header, not `Authorization`. If you're seeing 401s on Gemini, check that the client is sending the correct header. See xref:connect-agent.adoc[Connect your agent]. +|Gemini uses the `x-goog-api-key` header, not `Authorization`. If you're seeing 401s on Gemini, check that the client is sending the correct header. See xref:ai-gateway:connect-agent.adoc[Connect your agent]. |Provider list empty or 403 |Confirm your account has the `dataplane_adp_llmprovider_*` permissions in ADP. @@ -357,4 +357,4 @@ AI Gateway does not provide these capabilities. 
For current status, consult the == Next steps -* xref:connect-agent.adoc[Connect your agent] +* xref:ai-gateway:connect-agent.adoc[Connect your agent] diff --git a/modules/ai-gateway/pages/connect-agent.adoc b/modules/ai-gateway/pages/connect-agent.adoc index f424e11..5a77aaf 100644 --- a/modules/ai-gateway/pages/connect-agent.adoc +++ b/modules/ai-gateway/pages/connect-agent.adoc @@ -19,7 +19,7 @@ After completing this guide, you will be able to: * A configured LLM provider. If you haven't created one yet, see xref:ai-gateway:configure-provider.adoc[Configure an LLM provider]. * For local development, nothing else. You'll install `rpk ai` in the next section. -* For CI or programmatic clients: a Redpanda Cloud service account with OIDC client credentials. See xref:redpanda-cloud:security:cloud-authentication.adoc[Authenticate to Redpanda Cloud]. +* For CI or programmatic clients: A Redpanda Cloud service account with OIDC client credentials. See xref:redpanda-cloud:security:cloud-authentication.adoc[Authenticate to Redpanda Cloud]. + // TODO: confirm whether ADP hosts its own service-account IAM post-standalone, or continues to share Redpanda Cloud Organization IAM. * A development environment with your chosen programming language. @@ -33,9 +33,9 @@ Every provider you create in AI Gateway gets its own proxy URL: <gateway-url>/llm/v1/providers/<provider-name>/<native-path> ---- -* `<gateway-url>`: the AI Gateway base URL for your dataplane. Cluster-specific subdomain on `clusters.rdpa.co` (for example, `https://aigw.<cluster-id>.clusters.rdpa.co`). Copy the exact value from the *Proxy URL* field on any provider's *Connection* card. -* `<provider-name>`: the name you gave the provider when you created it, for example `my-openai` or `prod-anthropic`. -* `<native-path>`: the upstream provider's native API path (for example, `v1/chat/completions` for OpenAI, `v1/messages` for Anthropic). +* `<gateway-url>`: The AI Gateway base URL for your dataplane. Cluster-specific subdomain on `clusters.rdpa.co` (for example, `https://aigw.<cluster-id>.clusters.rdpa.co`).
Copy the exact value from the *Proxy URL* field on any provider's *Connection* card. +* `<provider-name>`: The name you gave the provider when you created it, for example `my-openai` or `prod-anthropic`. +* `<native-path>`: The upstream provider's native API path (for example, `v1/chat/completions` for OpenAI, `v1/messages` for Anthropic). AI Gateway forwards the request to the upstream provider, attaches the configured credentials, and records the request for observability. Your application never sees the upstream API key. @@ -417,26 +417,26 @@ AI Gateway returns standard HTTP status codes. The upstream provider's error bod == Best practices * Use environment variables for the proxy URL and token. Never hard-code them. -* Refresh OIDC tokens through your client library so refresh is invisible to your SDK code (`authlib` for Python, `openid-client` for Node.js, etc.). +* Refresh OIDC tokens through your client library so refresh is invisible to your SDK code (`authlib` for Python, `openid-client` for Node.js, and so on). * Implement retry with exponential backoff for 5xx and timeout conditions. * Respect `Retry-After` on 429 responses. * Rotate service account credentials on a schedule your organization accepts. -// TODO(review-before-publish): forward-looking framing — "is in development" and "today". Same wording duplicated in `ai-gateway/pages/overview.adoc`. Confirm whether the *Cost & usage* surface is shipped before merge; rephrase or drop the sentence based on the current state. +// TODO(review-before-publish): forward-looking framing: "is in development" and "today". Same wording duplicated in `ai-gateway/pages/overview.adoc`. Confirm whether the *Cost & usage* surface is shipped before merge; rephrase or drop the sentence based on the current state. * Observe usage through the ADP UI on each provider's detail page. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today).
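The retry guidance above (exponential backoff for 5xx and timeouts, honoring `Retry-After` on 429) can be sketched as follows. This is a minimal illustration using a stubbed response object, not a drop-in client:

```python
import random
import time

def retry_with_backoff(send, max_attempts=5, base_delay=1.0):
    """Retry send() on 5xx and 429, honoring Retry-After, with jittered
    exponential backoff. send() may be any callable returning an object
    that exposes status_code and headers (for example, a wrapped requests call)."""
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code < 500 and resp.status_code != 429:
            return resp  # success, or a non-retryable 4xx
        if resp.status_code == 429 and "Retry-After" in resp.headers:
            delay = float(resp.headers["Retry-After"])  # server-directed wait
        else:
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
        time.sleep(delay)
    return resp  # caller inspects the final failure

# Example with a stubbed response sequence (no network involved):
class StubResponse:
    def __init__(self, code, headers=None):
        self.status_code = code
        self.headers = headers or {}

attempts = iter([StubResponse(503), StubResponse(200)])
final = retry_with_backoff(lambda: next(attempts), base_delay=0.01)
print(final.status_code)  # 200
```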
== Troubleshooting === 401 Unauthorized -* If you're using `rpk ai`: rerun `rpk cloud login` to refresh the cached cloud token. Token expiry surfaces as a 401 with this hint in the error. -* If you're using OIDC client credentials: check the token hasn't expired and refresh it. Verify the audience is `cloudv2-production.redpanda.cloud` and the `Authorization` header is formatted `Bearer `. -* For Gemini: ensure the token is sent as `x-goog-api-key`, not `Authorization`. -* For Anthropic with passthrough: ensure the client is sending a valid Anthropic `Authorization` header. +* If you're using `rpk ai`: Rerun `rpk cloud login` to refresh the cached cloud token. Token expiry surfaces as a 401 with this hint in the error. +* If you're using OIDC client credentials: Check the token hasn't expired and refresh it. Verify the audience is `cloudv2-production.redpanda.cloud` and the `Authorization` header is formatted `Bearer `. +* For Gemini: Ensure the token is sent as `x-goog-api-key`, not `Authorization`. +* For Anthropic with passthrough: Ensure the client is sending a valid Anthropic `Authorization` header. === 404 Not found * Re-check the provider name in the proxy URL. The segment after `/providers/` must match the provider's `Name` exactly. -* For model-not-found: confirm the model identifier is one your provider's catalog actually serves. OpenAI-compatible endpoints accept whatever model IDs the upstream exposes. +* For model-not-found: Confirm the model identifier is one your provider's catalog actually serves. OpenAI-compatible endpoints accept whatever model IDs the upstream exposes. 
=== 403 Forbidden diff --git a/modules/ai-gateway/pages/gateway-architecture.adoc b/modules/ai-gateway/pages/gateway-architecture.adoc index 388931e..a492d5b 100644 --- a/modules/ai-gateway/pages/gateway-architecture.adoc +++ b/modules/ai-gateway/pages/gateway-architecture.adoc @@ -6,7 +6,7 @@ :learning-objective-2: Explain the request lifecycle through policy evaluation stages :learning-objective-3: Identify supported providers, features, and current limitations -This page provides technical details about AI Gateway's architecture, request processing, and capabilities. For an overview of AI Gateway, see xref:overview.adoc[] +This page provides technical details about AI Gateway's architecture, request processing, and capabilities. For an overview of AI Gateway, see xref:ai-gateway:overview.adoc[] == Architecture overview @@ -45,7 +45,7 @@ The control plane manages gateway configuration and policy definition: The data plane handles all runtime request processing: -* **Request ingestion**: Accept requests via OpenAI-compatible API endpoints +* **Request ingestion**: Accept requests through OpenAI-compatible API endpoints * **Authentication**: Validate API keys and gateway access * **Policy evaluation**: Apply rate limits, spend limits, and routing policies * **Provider pool management**: Select primary or fallback providers based on availability @@ -82,7 +82,7 @@ Each policy evaluation happens synchronously in the request path. If rate limits For MCP tool requests, the lifecycle differs slightly to support deferred tool loading: -. Application discovers tools via `/mcp` endpoint +. Application discovers tools through `/mcp` endpoint . Gateway aggregates tools from approved MCP servers . Application receives search + orchestrator tools (deferred loading) . 
Application invokes specific tool @@ -122,7 +122,7 @@ Immutable rules that route requests based on: *Automatic Failover:* -Built-in fallback behavior when primary providers are unavailable (configurable via preference toggles). +Built-in fallback behavior when primary providers are unavailable (configurable through preference toggles). *6 User Preference Toggles:* @@ -215,5 +215,5 @@ endif::[] == Next steps -* xref:gateway-quickstart.adoc[] -* xref:aggregation.adoc[] +* xref:ai-gateway:gateway-quickstart.adoc[] +* xref:ai-gateway:aggregation.adoc[] diff --git a/modules/ai-gateway/pages/gateway-quickstart.adoc b/modules/ai-gateway/pages/gateway-quickstart.adoc index a6a623e..07d6c9d 100644 --- a/modules/ai-gateway/pages/gateway-quickstart.adoc +++ b/modules/ai-gateway/pages/gateway-quickstart.adoc @@ -63,7 +63,7 @@ When creating a gateway, you choose between two modes: * *AI Hub Mode*: Zero-configuration with pre-configured routing and backend pools. Just add provider credentials and start routing requests. Ideal for quickstarts and standard use cases. * *Custom Mode*: Full control over all routing rules, backend pools, and policies. Requires manual configuration. Ideal for custom routing logic and specialized requirements. -See xref:gateway-modes.adoc[] to understand which mode fits your needs. This quickstart focuses on Custom mode configuration. +See xref:ai-gateway:gateway-modes.adoc[] to understand which mode fits your needs. This quickstart focuses on Custom mode configuration. ==== endif::[] @@ -371,7 +371,7 @@ Guard for field existence: has(request.body.max_tokens) && request.body.max_tokens > 1000 ---- -For more CEL examples, see xref:routing-cel.adoc[]. +For more CEL examples, see xref:ai-gateway:routing-cel.adoc[]. 
== Connect AI tools to your gateway @@ -527,8 +527,8 @@ const openai = new OpenAI({ == Next steps -* xref:routing-cel.adoc[] -* xref:aggregation.adoc[] +* xref:ai-gateway:routing-cel.adoc[] +* xref:ai-gateway:aggregation.adoc[] * xref:integrations:index.adoc[] -* xref:gateway-architecture.adoc[] -* xref:overview.adoc[] +* xref:ai-gateway:gateway-architecture.adoc[] +* xref:ai-gateway:overview.adoc[] diff --git a/modules/ai-gateway/pages/overview.adoc b/modules/ai-gateway/pages/overview.adoc index 1269dd6..1f13a7e 100644 --- a/modules/ai-gateway/pages/overview.adoc +++ b/modules/ai-gateway/pages/overview.adoc @@ -50,7 +50,7 @@ Applications authenticate to ADP with OIDC service accounts instead of long-live === Per-provider observability -// TODO(review-before-publish): forward-looking framing — "is in development" and "today". Confirm whether the *Cost & usage* surface is shipped before merge; if so, drop the development qualifier and describe what users see; if not, drop the sentence entirely or rephrase to describe the current "Coming soon" placeholder without the in-development claim. +// TODO(review-before-publish): forward-looking framing: "is in development" and "today". Confirm whether the *Cost & usage* surface is shipped before merge; if so, drop the development qualifier and describe what users see; if not, drop the sentence entirely or rephrase to describe the current "Coming soon" placeholder without the in-development claim. The provider's detail page in the ADP UI records request and token counts. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today). 
== What's in the UI diff --git a/modules/ai-gateway/partials/ai-hub/configure-ai-hub.adoc b/modules/ai-gateway/partials/ai-hub/configure-ai-hub.adoc index d7b8d0f..2d4daf6 100644 --- a/modules/ai-gateway/partials/ai-hub/configure-ai-hub.adoc +++ b/modules/ai-gateway/partials/ai-hub/configure-ai-hub.adoc @@ -91,7 +91,7 @@ AI Hub mode automatically provisions 6 backend pools to handle different request * Authentication: x-api-key header * Transform: OpenAI → Anthropic Messages API * Timeout: Standard (60 seconds) -* Models: All `anthropic/*` models via OpenAI-compatible endpoint +* Models: All `anthropic/*` models through OpenAI-compatible endpoint -- . *Anthropic with Transform (Streaming)*: Converts OpenAI format to Anthropic's native format for streaming requests @@ -180,7 +180,7 @@ While routing rules are immutable, you can customize routing behavior through us include::ROOT:partial$ai-hub-preference-toggles.adoc[] -=== Set preferences via Console +=== Set preferences through Console . Navigate to your AI Hub gateway. . Click *Settings* → *Preferences*. @@ -196,7 +196,7 @@ include::ROOT:partial$ai-hub-preference-toggles.adoc[] Changes take effect immediately for new requests. -=== Set preferences via API +=== Set preferences through API [,bash] ---- diff --git a/modules/ai-gateway/partials/ai-hub/eject-to-custom-mode.adoc b/modules/ai-gateway/partials/ai-hub/eject-to-custom-mode.adoc index c27a4cc..128490c 100644 --- a/modules/ai-gateway/partials/ai-hub/eject-to-custom-mode.adoc +++ b/modules/ai-gateway/partials/ai-hub/eject-to-custom-mode.adoc @@ -140,7 +140,7 @@ Store these files securely. You'll reference them when configuring Custom mode r Define your post-ejection configuration: . *Routing rules*: Write CEL expressions that replicate AI Hub behavior, then add your custom rules -. *Backend pools*: Identify modifications needed (timeouts, custom providers, etc.) +. *Backend pools*: Identify modifications needed (timeouts, custom providers, and so on) . 
*Testing strategy*: Plan how you'll validate that existing functionality still works . *Rollout approach*: Decide whether to eject immediately or test in staging first @@ -185,13 +185,13 @@ Provide builders with clear expectations: The ejection process is irreversible. Follow these steps carefully. -=== Step 1: Initiate ejection +=== Initiate ejection . Navigate to your gateway in the console. . Click *Settings*. . Click *Eject to Custom Mode* button. -=== Step 2: Confirm understanding +=== Confirm understanding The console presents warnings about ejection: @@ -204,14 +204,14 @@ The console presents warnings about ejection: Check all boxes to proceed. -=== Step 3: Execute ejection +=== Execute ejection . Enter the gateway name to confirm: `[Your Gateway Name]` . Click *Eject to Custom Mode*. Ejection typically completes in seconds. The gateway remains available during the transition. -You can also eject via API: +You can also eject through API: [,bash] ---- @@ -230,7 +230,7 @@ Expected response: } ---- -=== Step 4: Verify ejection +=== Verify ejection After ejection completes: diff --git a/modules/ai-gateway/partials/ai-hub/gateway-modes.adoc b/modules/ai-gateway/partials/ai-hub/gateway-modes.adoc index 32075cf..bdb490e 100644 --- a/modules/ai-gateway/partials/ai-hub/gateway-modes.adoc +++ b/modules/ai-gateway/partials/ai-hub/gateway-modes.adoc @@ -44,7 +44,7 @@ When you create a gateway, you choose between two modes that differ in configura |*Routing preferences* |6 configurable toggles -|N/A (full control via rules) +|N/A (full control through rules) |*Modify backends* |Cannot modify/delete @@ -67,7 +67,7 @@ AI Hub mode provides instant, pre-configured access to OpenAI, Anthropic, and Go AI Hub mode eliminates complex LLM gateway configuration by providing pre-built routing rules and backend pools. Platform admins add provider credentials (OpenAI, Anthropic, Google Gemini) once, and all teams immediately benefit from intelligent routing across both providers. 
-Teams adopting LLMs typically face significant friction: configuring backends and routing rules takes hours, different providers have incompatible APIs, and developers must learn each provider's quirks. AI Hub mode solves this by providing instant access—IT adds API keys once, all teams benefit immediately.
+Teams adopting LLMs typically face significant friction: configuring backends and routing rules takes hours, different providers have incompatible APIs, and developers must learn each provider's quirks. AI Hub mode solves this by providing instant access: IT adds API keys once, and all teams benefit immediately.
 === Pre-configured components
@@ -86,7 +86,7 @@ When you create an AI Hub gateway, you automatically get:
 * Model prefix routing: `openai/*`, `anthropic/*`
 * Model name pattern routing: `gpt-*`, `claude-*`, `o1-*`
-* Special routing: embeddings, images, audio → OpenAI only
+* Special routing: Embeddings, images, audio → OpenAI only
 * Native SDK detection: `/v1/messages` → Anthropic passthrough
 * Streaming detection → Extended timeout backends
diff --git a/modules/ai-gateway/partials/ai-hub/use-ai-hub-gateway.adoc b/modules/ai-gateway/partials/ai-hub/use-ai-hub-gateway.adoc
index 4e8d6e7..5a899a2 100644
--- a/modules/ai-gateway/partials/ai-hub/use-ai-hub-gateway.adoc
+++ b/modules/ai-gateway/partials/ai-hub/use-ai-hub-gateway.adoc
@@ -258,8 +258,8 @@ model = "openai/gpt-5.2"
 model = "anthropic/claude-sonnet-4.5"
 # ⚠️ Works but relies on pattern matching
-model = "gpt-5.2" # Routes to OpenAI via pattern matching
-model = "claude-sonnet-4.5" # Routes to Anthropic via pattern matching
+model = "gpt-5.2" # Routes to OpenAI through pattern matching
+model = "claude-sonnet-4.5" # Routes to Anthropic through pattern matching
 ----
 Explicit provider prefixes ensure deterministic routing and make your code more maintainable.
diff --git a/modules/governance/pages/budgets.adoc b/modules/governance/pages/budgets.adoc index b3b0479..fdd212c 100644 --- a/modules/governance/pages/budgets.adoc +++ b/modules/governance/pages/budgets.adoc @@ -24,7 +24,7 @@ Every LLM call routed through AI Gateway becomes a *spending event*. Each event * Request count. * The provider, model, user, and organization context the call ran under. -Events flow through a Kafka pipeline and roll up into queryable storage. No setup required — spending is captured the moment your first agent or MCP server runs through the gateway. +Events flow through a Kafka pipeline and roll up into queryable storage. No setup required: spending is captured the moment your first agent or MCP server runs through the gateway. Streaming and non-streaming requests are tracked the same way. Cache-write tokens (Anthropic 4.x, OpenAI 4.x prompt caches) are attributed correctly on streaming responses so cost rollups stay accurate when an agent reuses long system prompts. @@ -68,7 +68,7 @@ For more expressive queries, `SpendingFilter` also accepts an AIP-160 `filter` e Some guardrail evaluators call an LLM to do their work. A toxicity classifier, for example, runs the request or response through a separate model and accrues per-call cost in the process. PII detection over regex doesn't, but anything LLM-based does. -Guardrail evaluator cost surfaces in the same spending pipeline as user-facing LLM calls. The evaluator's cost is attributed to the *evaluator's configured upstream provider* — usually a small classifier model, separate from the user-facing LLM — so per-provider breakdowns separate the two automatically. +Guardrail evaluator cost surfaces in the same spending pipeline as user-facing LLM calls. The evaluator's cost is attributed to the *evaluator's configured upstream provider* (usually a small classifier model, separate from the user-facing LLM), so per-provider breakdowns separate the two automatically. 
For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails/index.adoc[Configure guardrails]. @@ -150,7 +150,7 @@ After saving an override, send a test request through the affected model and eit If the cost still reflects the catalog price, the override may not have propagated yet. Wait a few seconds for cost-reporting to pick it up, then re-test. -// TODO: roadmap items not yet shipped — configurable caps (daily/monthly, per org/agent/user, per provider/model); halt-vs-notify behavior on cap hits; per-agent caps; alert hooks (webhook/email/chat); multi-tenant cap-setting with override semantics. Once cap-management lands, add a "Set caps" how-to and split this page into a sub-folder if content outgrows a single page. Open Q C1 in the companion plan. +// TODO: roadmap items not yet shipped: configurable caps (daily/monthly, per org/agent/user, per provider/model); halt-vs-notify behavior on cap hits; per-agent caps; alert hooks (webhook/email/chat); multi-tenant cap-setting with override semantics. Once cap-management lands, add a "Set caps" how-to and split this page into a sub-folder if content outgrows a single page. Open Q C1 in the companion plan. == Next steps diff --git a/modules/governance/pages/guardrails/cost-tracking.adoc b/modules/governance/pages/guardrails/cost-tracking.adoc index 7dc4cbd..9c7bb18 100644 --- a/modules/governance/pages/guardrails/cost-tracking.adoc +++ b/modules/governance/pages/guardrails/cost-tracking.adoc @@ -2,7 +2,7 @@ :description: See what each evaluator costs, where the cost surfaces in transcripts and dashboards, and how guardrail spend interacts with token budgets. :page-topic-type: reference :personas: platform_admin -// TODO: confirm persona vocabulary against docs-team-standards. If a Guardrails-specific persona exists (e.g., security_admin), apply it here. Open Q D4 in the companion plan. +// TODO: confirm persona vocabulary against docs-team-standards. 
If a Guardrails-specific persona exists (for example, security_admin), apply it here. Open Q D4 in the companion plan.
 // TODO: this page lands at GA. The Guardrails plan (https://redpandadata.atlassian.net/wiki/spaces/DOC/pages/1881702438) lists this page as a should-ship deliverable; the cost-pool integration with Budgets fills in once eng confirms whether evaluator cost flows into the user-facing budget pool, a separate guardrail-evaluator pool, or both. Open Qs C2, C3 in the companion plan.
@@ -25,7 +25,7 @@ Each evaluator type has a different cost shape:
 |No transcript cost line. Compute time absorbed into gateway latency metrics.
 |*Toxicity*
-|Per-call LLM cost. Counts against the *evaluator's configured upstream provider* — typically a small classifier model, separate from the user-facing LLM.
+|Per-call LLM cost. Counts against the *evaluator's configured upstream provider* (typically a small classifier model, separate from the user-facing LLM).
 |Per-call cost line in the transcript, alongside the user-facing LLM call. Aggregated into provider-breakdown views in the governance dashboard.
 |*Custom webhook*
@@ -37,9 +37,9 @@ Each evaluator type has a different cost shape:
 Guardrail-attributed cost surfaces in three places, ordered from most granular to most aggregated:
-* *Transcripts* — per-call cost line per fired evaluator, recorded alongside the user-facing LLM call. See xref:observability:transcripts.adoc[Read a transcript].
-* *Metrics* — aggregate cost per guardrail per provider per time window. See xref:observability:metrics.adoc[Metrics].
-* *Governance dashboard* — guardrail-attributed spend appears in the spend view, broken down by provider. See xref:governance:dashboard/overview.adoc[Read the governance overview].
+* *Transcripts*: Per-call cost line per fired evaluator, recorded alongside the user-facing LLM call. See xref:observability:transcripts.adoc[Read a transcript].
+* *Metrics*: Aggregate cost per guardrail per provider per time window. 
See xref:observability:metrics.adoc[Metrics]. +* *Governance dashboard*: Guardrail-attributed spend appears in the spend view, broken down by provider. See xref:governance:dashboard/overview.adoc[Read the governance overview]. // TODO: confirm whether the dashboard's spend view distinguishes guardrail-evaluator spend from user-facing LLM spend. Open Q C3 in the companion plan. @@ -47,19 +47,19 @@ Guardrail-attributed cost surfaces in three places, ordered from most granular t Guardrail spend can grow unexpectedly when traffic spikes or when a Toxicity guardrail runs at `BOTH` phases on a high-throughput provider. Three knobs control it: -* *Per-guardrail toggle* — disable a guardrail to short-circuit its evaluator. The guardrail config is preserved; re-enable when ready. Useful as a kill switch when an evaluator's cost runs away. -* *Phase scoping* — running a Toxicity evaluator at `OUTPUT` only (instead of `BOTH`) halves the per-request cost. -* *Token budgets* — see xref:governance:budgets.adoc[Token budgets and limits]. Guardrail evaluator cost flows into the same spending-event pipeline as user-facing LLM cost; per-provider breakdowns separate the two. +* *Per-guardrail toggle*: Disable a guardrail to short-circuit its evaluator. The guardrail config is preserved; re-enable when ready. Useful as a kill switch when an evaluator's cost runs away. +* *Phase scoping*: Running a Toxicity evaluator at `OUTPUT` only (instead of `BOTH`) halves the per-request cost. +* *Token budgets*: See xref:governance:budgets.adoc[Token budgets and limits]. Guardrail evaluator cost flows into the same spending-event pipeline as user-facing LLM cost; per-provider breakdowns separate the two. -// TODO: confirm whether evaluator cost flows into the same budget pool as user-facing LLM cost, or a separate guardrail-evaluator pool. The master plan calls for "guardrail-cost separation documented" in the Budgets workflow GA scope. Open Q C2 in the companion plan. 
+// TODO: confirm whether evaluator cost flows into the same budget pool as user-facing LLM cost, or a separate guardrail-evaluator pool. The parent plan calls for "guardrail-cost separation documented" in the Budgets workflow GA scope. Open Q C2 in the companion plan.
 == Cost versus latency tradeoff
 Each evaluator type has a different cost-versus-latency profile:
-* *PII* is cheap and fast — regex-based detection adds milliseconds, no LLM call.
-* *Toxicity* is expensive and slow — the classifier call adds tokens and latency.
-* *Custom webhook* is whatever your webhook makes it — control your own infrastructure spend and latency profile.
+* *PII* is cheap and fast: regex-based detection adds milliseconds, no LLM call.
+* *Toxicity* is expensive and slow: the classifier call adds tokens and latency.
+* *Custom webhook* is whatever your webhook makes it: you control your own infrastructure spend and latency profile.
 A typical optimization: disable Toxicity on `INPUT` and run it only on `OUTPUT`. Most policy violations are about what the model generates, not what the user asks; cutting the `INPUT` phase halves both the cost and the latency of the Toxicity guardrail without losing meaningful coverage.
diff --git a/modules/governance/pages/guardrails/create-guardrail.adoc b/modules/governance/pages/guardrails/create-guardrail.adoc
index a2aba8f..3136991 100644
--- a/modules/governance/pages/guardrails/create-guardrail.adoc
+++ b/modules/governance/pages/guardrails/create-guardrail.adoc
@@ -2,7 +2,7 @@
 :description: Configure a guardrail, pick an evaluator type and phase, attach it to one or more LLM providers, and verify that it fires.
 :page-topic-type: how-to
 :personas: platform_admin
-// TODO: confirm persona vocabulary against docs-team-standards. If a Guardrails-specific persona exists (e.g., security_admin), apply it here. Open Q D4 in the companion plan.
+// TODO: confirm persona vocabulary against docs-team-standards. 
If a Guardrails-specific persona exists (for example, security_admin), apply it here. Open Q D4 in the companion plan. :learning-objective-1: Create and configure a guardrail of a chosen evaluator type :learning-objective-2: Attach the guardrail to one or more LLM providers and enable it :learning-objective-3: Verify the guardrail fires and trace the violation through the transcript @@ -67,7 +67,7 @@ Fill in the per-type config block. The form fields differ per evaluator type; se Select one or more LLM providers to attach the guardrail to. Multi-attach is supported: one guardrail can apply to many providers. -// TODO: confirm whether guardrails also attach at other scopes (agents, MCP servers, organizations). The pre-pivot proto attached via `provider_ids[]` and `route_ids[]`; routes were removed in cloudv2 commit `7eff2ecbbf`. Open Qs A3, A4 in the companion plan. +// TODO: confirm whether guardrails also attach at other scopes (agents, MCP servers, organizations). The pre-pivot proto attached through `provider_ids[]` and `route_ids[]`; routes were removed in cloudv2 commit `7eff2ecbbf`. Open Qs A3, A4 in the companion plan. == Enable the guardrail @@ -100,7 +100,7 @@ The request should return an error. Open the request's transcript and confirm a * *Evaluator returns false positives*: see xref:governance:guardrails/violations.adoc[Read violations] for tuning patterns per evaluator type. * *Evaluator times out or is unavailable*: see xref:governance:guardrails/violations.adoc[Read violations] for the evaluator-down section. -* *Attached provider doesn't fire the guardrail*: confirm attachment (right provider, right phase), enabled state, and that requests are actually reaching the gateway (not bypassing via a direct provider URL). +* *Attached provider doesn't fire the guardrail*: confirm attachment (right provider, right phase), enabled state, and that requests are actually reaching the gateway (not bypassing through a direct provider URL). 
== Next steps diff --git a/modules/governance/pages/guardrails/overview.adoc b/modules/governance/pages/guardrails/overview.adoc index 7564db1..575de7c 100644 --- a/modules/governance/pages/guardrails/overview.adoc +++ b/modules/governance/pages/guardrails/overview.adoc @@ -2,9 +2,9 @@ :description: Learn what guardrails are, the evaluator types you can choose from, the INPUT and OUTPUT phase model, and where violations show up. :page-topic-type: overview :personas: platform_admin, evaluator, app_developer -// TODO: confirm persona vocabulary against docs-team-standards. The Guardrails plan uses canonical personas; if a Guardrails-specific persona exists (e.g., security_admin), apply it here. Open Q D4 in the companion plan. +// TODO: confirm persona vocabulary against docs-team-standards. The Guardrails plan uses canonical personas; if a Guardrails-specific persona exists (for example, security_admin), apply it here. Open Q D4 in the companion plan. :learning-objective-1: Describe what a guardrail does and why you would attach one to an LLM provider -:learning-objective-2: Distinguish between the three evaluator types — PII, Toxicity, and Custom webhook — and the situations each fits +:learning-objective-2: Distinguish between the three evaluator types (PII, Toxicity, and Custom webhook) and the situations each fits :learning-objective-3: Recognize where a guardrail violation surfaces and which page to read next A *guardrail* is a configurable safety or policy filter that runs on the request or response side of every LLM call routed through AI Gateway. Use a guardrail to prevent personally identifiable information (PII) from leaving your organization, filter toxic or off-policy responses before they reach end users, or delegate the decision to a custom webhook that enforces policy your way. 
@@ -19,9 +19,9 @@ After reading this page, you will be able to: Every guardrail runs at one or both of two phases: -* *INPUT* — the gateway evaluates the user's prompt before forwarding it upstream. Use INPUT to stop sensitive content from reaching a third-party model in the first place. -* *OUTPUT* — the gateway evaluates the model's response before returning it to the caller. Use OUTPUT to filter what the model generates. -* *BOTH* — runs the evaluator at both phases. Common for PII (defend in both directions); rare for Toxicity (where INPUT-side filtering is usually less useful). +* *INPUT*: The gateway evaluates the user's prompt before forwarding it upstream. Use INPUT to stop sensitive content from reaching a third-party model in the first place. +* *OUTPUT*: The gateway evaluates the model's response before returning it to the caller. Use OUTPUT to filter what the model generates. +* *BOTH*: Runs the evaluator at both phases. Common for PII (defend in both directions); rare for Toxicity (where INPUT-side filtering is usually less useful). Streaming responses change the timing slightly: where async evaluation is supported, OUTPUT evaluators run alongside the stream rather than blocking it. Sync evaluators (and all INPUT evaluators) run before the request continues. @@ -34,14 +34,14 @@ Streaming responses change the timing slightly: where async evaluation is suppor |Type |What it does |Where it fits |Cost shape |*PII* -|Detects personally identifiable information (names, emails, phone numbers, SSNs, etc.) in text using regex and entity-recognition rules. +|Detects personally identifiable information (names, emails, phone numbers, SSNs, and so on) in text using regex and entity-recognition rules. |Defending data exfiltration to third-party models. Typically runs at `BOTH` phases. |No per-call LLM cost. Compute time only. |*Toxicity* |Runs the input or output through a toxicity classifier and flags content above a configurable threshold. 
|Filtering what the model generates. Typically runs at `OUTPUT` only. -|Per-call LLM cost — counts against the evaluator's configured upstream provider, not the user-facing LLM. +|Per-call LLM cost: counts against the evaluator's configured upstream provider, not the user-facing LLM. |*Custom webhook* |Delegates the decision to a user-provided HTTPS endpoint. The gateway POSTs the content to your endpoint and acts on the pass/block response. @@ -57,19 +57,19 @@ For per-type config schemas, supported phases, and behavior on match, see xref:g When an evaluator decides to block a request, the gateway stops forwarding it (or stops returning the response, on OUTPUT) and returns an error to the caller. Every fired guardrail records a *violation* entry on the request's transcript, captured in the same observability pipeline that records the LLM call itself. Read the transcript to see which guardrail fired, at which phase, and what content matched. See xref:observability:transcripts.adoc[Read a transcript]. -A different scenario — the evaluator itself errored out (for example, a custom webhook timed out or a classifier model is unavailable) — is handled separately. See xref:governance:guardrails/violations.adoc[Read violations] for evaluator-down behavior, fail-closed versus fail-open defaults, and per-guardrail overrides. +A different scenario is handled separately: the evaluator itself errored out (for example, a custom webhook timed out or a classifier model is unavailable). See xref:governance:guardrails/violations.adoc[Read violations] for evaluator-down behavior, fail-closed versus fail-open defaults, and per-guardrail overrides. // TODO: confirm fail-closed vs. fail-open default at GA, and whether it's configurable per guardrail. Open Qs B2 and B5 in the companion plan. == Where you attach a guardrail -A guardrail attaches to one or more LLM providers. 
Each provider can carry many guardrails — a typical setup pairs one PII guardrail with one Toxicity guardrail on the same provider, then layers a Custom-webhook guardrail on top for org-specific policy.
+A guardrail attaches to one or more LLM providers. Each provider can carry many guardrails: a typical setup pairs one PII guardrail with one Toxicity guardrail on the same provider, then layers a Custom-webhook guardrail on top for org-specific policy.
-// TODO: confirm whether guardrails also attach at other scopes — agents, MCP servers, organizations — once team-ai answers the post-pivot resource-shape question. The pre-pivot proto attached via `provider_ids[]` and `route_ids[]`; routes were removed in cloudv2 commit `7eff2ecbbf`. Open Qs A1, A3, A4 in the companion plan.
+// TODO: confirm whether guardrails also attach at other scopes (agents, MCP servers, organizations) once team-ai answers the post-pivot resource-shape question. The pre-pivot proto attached through `provider_ids[]` and `route_ids[]`; routes were removed in cloudv2 commit `7eff2ecbbf`. Open Qs A1, A3, A4 in the companion plan.
 == Where to go next
-* xref:governance:guardrails/create-guardrail.adoc[Create a guardrail] — walk through configuring and attaching your first guardrail.
-* xref:governance:guardrails/types-reference.adoc[Evaluator types reference] — full config schemas for PII, Toxicity, and Custom-webhook evaluators.
-* xref:governance:guardrails/violations.adoc[Read violations] — investigate why a guardrail fired and tune false-positive rates.
-* xref:governance:guardrails/cost-tracking.adoc[Cost tracking] — see what each evaluator costs and where the cost surfaces.
+* xref:governance:guardrails/create-guardrail.adoc[Create a guardrail]: Walk through configuring and attaching your first guardrail.
+* xref:governance:guardrails/types-reference.adoc[Evaluator types reference]: Full config schemas for PII, Toxicity, and Custom-webhook evaluators. 
+* xref:governance:guardrails/violations.adoc[Read violations]: investigate why a guardrail fired and tune false-positive rates. +* xref:governance:guardrails/cost-tracking.adoc[Cost tracking]: see what each evaluator costs and where the cost surfaces. diff --git a/modules/governance/pages/guardrails/types-reference.adoc b/modules/governance/pages/guardrails/types-reference.adoc index 8314fad..33c425e 100644 --- a/modules/governance/pages/guardrails/types-reference.adoc +++ b/modules/governance/pages/guardrails/types-reference.adoc @@ -2,7 +2,7 @@ :description: Definitive reference for every evaluator type's config schema, supported phases, behavior on match, and gotchas. :page-topic-type: reference :personas: platform_admin -// TODO: confirm persona vocabulary against docs-team-standards. If a Guardrails-specific persona exists (e.g., security_admin), apply it here. Open Q D4 in the companion plan. +// TODO: confirm persona vocabulary against docs-team-standards. If a Guardrails-specific persona exists (for example, security_admin), apply it here. Open Q D4 in the companion plan. // TODO: this page lands at GA. The Guardrails plan (https://redpandadata.atlassian.net/wiki/spaces/DOC/pages/1881702438) lists this page as a must-ship deliverable; the per-type config schemas are filled in once eng confirms the post-pivot Guardrail resource shape and the evaluator type set at GA. Open Qs A1, A2, A5, B3, B4 in the companion plan. @@ -16,63 +16,63 @@ Each evaluator type has its own config schema, supported phase set, behavior-on- == PII evaluator -*What it does* — detects personally identifiable information (names, emails, phone numbers, SSNs, addresses, and other entity types) in text using regex and entity-recognition rules. +*What it does*: Detects personally identifiable information (names, emails, phone numbers, SSNs, addresses, and other entity types) in text using regex and entity-recognition rules. -*Phases supported* — `INPUT`, `OUTPUT`, `BOTH`. 
+*Phases supported*: `INPUT`, `OUTPUT`, `BOTH`. // TODO: lift the full config schema from `apps/aigw/internal/guardrails/pii.go` once the post-pivot proto is final. Document each field's name, type, default, and example. Likely fields: entity types to detect (allowlist or denylist), locale (US-only patterns versus EU patterns), confidence threshold. -*Behavior on match* — block (default). Redact-and-pass behavior may be configurable; confirm whether the build exposes redact-mode and how it interacts with the per-type config. +*Behavior on match*: Block (default). Redact-and-pass behavior may be configurable; confirm whether the build exposes redact-mode and how it interacts with the per-type config. // TODO: confirm block-vs-redact options at GA. Open Q B3 in the companion plan. -*Cost* — none beyond compute. Regex is negligible; entity-recognition can be non-trivial. +*Cost*: None beyond compute. Regex is negligible; entity-recognition can be non-trivial. *Gotchas:* * Regex-based detection produces false positives on uncommon PII formats (international phone numbers, non-US SSN equivalents). -* Locale-specific patterns matter — a US-tuned config will miss EU PII patterns and vice versa. +* Locale-specific patterns matter: A US-tuned config will miss EU PII patterns and vice versa. * PII matches in code blocks or quoted JSON payloads can produce surprising blocks; tune the entity allowlist if your traffic includes structured payloads. == Toxicity evaluator -*What it does* — runs the input or output through a toxicity classifier and flags content above a configurable threshold. +*What it does*: Runs the input or output through a toxicity classifier and flags content above a configurable threshold. -*Phases supported* — typically `OUTPUT`; `INPUT` and `BOTH` also valid but rarely useful. +*Phases supported*: Typically `OUTPUT`; `INPUT` and `BOTH` also valid but rarely useful. -// TODO: confirm the config schema with eng. 
The phase5-aigw-guardrails branch in cloudv2 ships a "keyword" evaluator that may rename to Toxicity at GA, stay as a fourth type, or be dropped entirely. Open Q A5 in the companion plan. Likely fields: classifier model identifier, threshold (0.0–1.0), category set to flag (hate, harassment, self-harm, sexual, violence, etc.). +// TODO: confirm the config schema with eng. The phase5-aigw-guardrails branch in cloudv2 ships a "keyword" evaluator that may rename to Toxicity at GA, stay as a fourth type, or be dropped entirely. Open Q A5 in the companion plan. Likely fields: classifier model identifier, threshold (0.0–1.0), category set to flag (hate, harassment, self-harm, sexual, violence, and so on).

-*Behavior on match* — block. +*Behavior on match*: Block.

-*Cost* — per-call LLM cost. Counts against the *evaluator's configured upstream provider* (typically a small classifier model, separate from the user-facing LLM). Token cost surfaces alongside the user-facing LLM call in the same transcript. See xref:governance:guardrails/cost-tracking.adoc[Cost tracking]. +*Cost*: Per-call LLM cost. Counts against the *evaluator's configured upstream provider* (typically a small classifier model, separate from the user-facing LLM). Token cost surfaces alongside the user-facing LLM call in the same transcript. See xref:governance:guardrails/cost-tracking.adoc[Cost tracking].

*Gotchas:*

-* Threshold tuning matters — too aggressive blocks legitimate traffic; too permissive lets toxic content through. Start at the classifier's recommended default and tune from violation review. +* Threshold tuning matters: A threshold that's too aggressive blocks legitimate traffic; one that's too permissive lets toxic content through. Start at the classifier's recommended default and tune from violation review. * Latency adds to overall response time. If async-OUTPUT evaluation isn't supported for your model's stream type, the user-visible latency includes the classifier call. 
* The classifier model itself can fail or be down. See xref:governance:guardrails/violations.adoc[Read violations] for the evaluator-down section. == Custom webhook evaluator -*What it does* — delegates the evaluation to a user-provided HTTPS endpoint. The gateway POSTs the content to your endpoint and acts on the response. +*What it does*: Delegates the evaluation to a user-provided HTTPS endpoint. The gateway POSTs the content to your endpoint and acts on the response. -*Phases supported* — `INPUT`, `OUTPUT`, `BOTH`. +*Phases supported*: `INPUT`, `OUTPUT`, `BOTH`. *Webhook contract:* // TODO: lift the exact request and response shape from `apps/aigw/internal/guardrails/registry.go` and the custom-webhook handler once the webhook contract lands. Open Q B4 in the companion plan. -* *Request shape* — the gateway POSTs a JSON document containing the phase (`INPUT` or `OUTPUT`), the content payload (prompt or response text), request metadata (request ID for correlation, model identifier, attached provider), and any extra fields the contract specifies. -* *Response shape* — your endpoint returns a JSON document containing the decision (`pass` or `block`), an optional reason string surfaced in the violation entry, and (if redact-mode is supported) an optional redacted-content payload. -* *Authentication* — the gateway authenticates to your webhook using a shared secret stored in the ADP secret store. mTLS or signed-JWT alternatives may be available. -* *Retry / timeout* — the gateway honors a default per-call timeout. On webhook unavailable, the evaluator-down behavior applies (see xref:governance:guardrails/violations.adoc[Read violations]). +* *Request shape*: The gateway POSTs a JSON document containing the phase (`INPUT` or `OUTPUT`), the content payload (prompt or response text), request metadata (request ID for correlation, model identifier, attached provider), and any extra fields the contract specifies. 
+* *Response shape*: Your endpoint returns a JSON document containing the decision (`pass` or `block`), an optional reason string surfaced in the violation entry, and (if redact-mode is supported) an optional redacted-content payload.
+* *Authentication*: The gateway authenticates to your webhook using a shared secret stored in the ADP secret store. mTLS or signed-JWT alternatives may be available.
+* *Retry / timeout*: The gateway honors a default per-call timeout. If the webhook is unavailable, the evaluator-down behavior applies (see xref:governance:guardrails/violations.adoc[Read violations]).

// TODO: confirm webhook authentication options at GA. Open Q B4c in the companion plan.

-*Cost* — gateway charges nothing per call. Your webhook's compute cost is your own. +*Cost*: The gateway charges nothing per call. Your webhook's compute cost is your own.

*Gotchas:*

* Slow webhooks add to user-visible latency, especially on `INPUT` (the request is blocked until the webhook responds). * Webhook errors should fail closed (block) by default for safety, but make this configurable per guardrail if your use case favors availability. -* Logging and observability of the webhook itself is your responsibility — the gateway only records the decision the webhook returned. +* Logging and observability of the webhook itself are your responsibility: The gateway only records the decision the webhook returned. diff --git a/modules/governance/pages/guardrails/violations.adoc b/modules/governance/pages/guardrails/violations.adoc index 4fa4ca8..81d55e5 100644 --- a/modules/governance/pages/guardrails/violations.adoc +++ b/modules/governance/pages/guardrails/violations.adoc @@ -2,7 +2,7 @@ :description: Investigate why a guardrail fired, distinguish a violation from an evaluator failure, and tune the configuration. :page-topic-type: how-to :personas: app_developer, platform_admin -// TODO: confirm persona vocabulary against docs-team-standards. 
If a Guardrails-specific persona exists (e.g., security_admin), apply it here. Open Q D4 in the companion plan. +// TODO: confirm persona vocabulary against docs-team-standards. If a Guardrails-specific persona exists (for example, security_admin), apply it here. Open Q D4 in the companion plan. :learning-objective-1: Locate a violation entry in a transcript and identify which guardrail fired :learning-objective-2: Distinguish a guardrail violation from an evaluator failure and apply the right response :learning-objective-3: Recognize common false-positive patterns per evaluator type and tune the configuration @@ -21,14 +21,14 @@ After reading this page, you will be able to:

A violation is the gateway's record that a guardrail's evaluator returned a `block` decision on a specific request, at a specific phase. Every violation carries the guardrail name, the phase (`INPUT` or `OUTPUT`), a redacted summary of what content matched, and the action the gateway took (block, redact, or pass-through-with-warning).

-A violation is distinct from an *evaluator failure*: a failure is when the evaluator itself errored out (custom webhook timed out, classifier model unavailable, regex parser crashed). Failures and violations surface differently and are handled differently. See xref:_evaluator_down_behavior[Evaluator-down behavior] below. +A violation is distinct from an *evaluator failure*: A failure means the evaluator itself errored out (custom webhook timed out, classifier model unavailable, regex parser crashed). Failures and violations surface differently and are handled differently. See xref:_evaluator_down_behavior[Evaluator-down behavior] below.

== Where violations show up

Violations surface in two places:

-* *Transcripts* — each request's transcript carries a violation entry per fired guardrail, alongside the LLM call entry, tool calls, and cost data. See xref:observability:transcripts.adoc[Read a transcript] for the full transcript walkthrough. 
-* *Metrics* — aggregate violation counts per guardrail per provider per time window. See xref:observability:metrics.adoc[Metrics]. +* *Transcripts*: Each request's transcript carries a violation entry per fired guardrail, alongside the LLM call entry, tool calls, and cost data. See xref:observability:transcripts.adoc[Read a transcript] for the full transcript walkthrough. +* *Metrics*: Aggregate violation counts per guardrail per provider per time window. See xref:observability:metrics.adoc[Metrics]. // TODO: confirm the violation field shape in the transcript proto. The Transcripts plan (workflow #7) didn't call out a violation field specifically; coordinate with that workflow's author so the xref above resolves to a real proto field. Open Q C1 in the companion plan. @@ -36,10 +36,10 @@ Violations surface in two places: Open the transcript for a request that fired a guardrail and walk through the violation entry: -* *Guardrail name* — the human-readable identifier you assigned at create time. -* *Phase* — `INPUT` (matched the user's prompt) or `OUTPUT` (matched the model's response). -* *Matched content* — a redacted summary, not the full payload. The full payload remains in the request body itself; the violation entry is a pointer. -* *Action taken* — `block` (request stopped, error returned to caller), `redact` (matched fields stripped, request continued), or `pass-through-with-warning` (request continued, violation logged for review). +* *Guardrail name*: The human-readable identifier you assigned at create time. +* *Phase*: `INPUT` (matched the user's prompt) or `OUTPUT` (matched the model's response). +* *Matched content*: A redacted summary, not the full payload. The full payload remains in the request body itself; the violation entry is a pointer. +* *Action taken*: `block` (request stopped, error returned to caller), `redact` (matched fields stripped, request continued), or `pass-through-with-warning` (request continued, violation logged for review). 
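+
+The violation field shape in the transcript proto isn't final, but the four fields above could render along these lines. This is a purely hypothetical illustration: every field name below is an assumption, not the confirmed schema.
+
+[,json]
+----
+{
+  "guardrail": "pii-default",
+  "phase": "INPUT",
+  "matched_content": "[REDACTED: email address]",
+  "action_taken": "block"
+}
+----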
// TODO: confirm action-taken value set at GA. Open Q B3 in the companion plan. @@ -49,19 +49,19 @@ Use these patterns as a starting checklist when a guardrail fires unexpectedly. === PII -* *Regex too broad* — the entity allowlist matched on a benign substring. Tune the entity types to the specific PII categories you care about. -* *Locale mismatch* — the config is tuned for US patterns but traffic includes EU PII (or vice versa). Add the relevant locale's pattern set or scope the guardrail to a per-region provider. -* *Structured-payload matches* — code blocks, JSON payloads, or sample data contain strings that resemble PII. Adjust the allowlist or scope the guardrail to user-text fields only. +* *Regex too broad*: The entity allowlist matched on a benign substring. Tune the entity types to the specific PII categories you care about. +* *Locale mismatch*: The config is tuned for US patterns but traffic includes EU PII (or vice versa). Add the relevant locale's pattern set or scope the guardrail to a per-region provider. +* *Structured-payload matches*: Code blocks, JSON payloads, or sample data contain strings that resemble PII. Adjust the allowlist or scope the guardrail to user-text fields only. === Toxicity -* *Threshold too aggressive* — drop the threshold and re-evaluate from the violation history. Categories matter too — disable categories that don't apply to your use case. -* *Wrong phase* — `INPUT` toxicity blocks the user from asking certain questions; usually `OUTPUT` is what you want. Switch the phase to `OUTPUT` only. +* *Threshold too aggressive*: Drop the threshold and re-evaluate from the violation history. Categories matter too; disable categories that don't apply to your use case. +* *Wrong phase*: `INPUT` toxicity blocks the user from asking certain questions; usually `OUTPUT` is what you want. Switch the phase to `OUTPUT` only. === Custom webhook -* *Webhook returning `block` for legitimate content* — the bug is in your webhook. 
Add logging to your webhook to see which inputs are being flagged. -* *Webhook timing out* — see xref:_evaluator_down_behavior[Evaluator-down behavior] below; the gateway fails closed (or open, depending on configuration) when a webhook can't be reached. +* *Webhook returning `block` for legitimate content*: The bug is in your webhook. Add logging to your webhook to see which inputs are being flagged. +* *Webhook timing out*: See xref:_evaluator_down_behavior[Evaluator-down behavior] below; the gateway fails closed (or open, depending on configuration) when a webhook can't be reached. == Guardrail expected to fire but didn't @@ -70,15 +70,15 @@ If you expect a guardrail to fire and it doesn't: * Confirm the guardrail is *enabled*. Disabled guardrails skip evaluation entirely. * Confirm the *attached provider* is the one the request actually used. A guardrail attached to `provider-a` doesn't fire on requests routed to `provider-b`. * Confirm the *phase alignment*. A guardrail set to `INPUT` only doesn't fire on the response side. A guardrail set to `OUTPUT` only doesn't fire on the request side. -* Confirm the request *actually reached the gateway* — direct-to-provider requests that bypass the gateway are invisible to guardrails. +* Confirm the request *actually reached the gateway*: Direct-to-provider requests that bypass the gateway are invisible to guardrails. [#_evaluator_down_behavior] == Evaluator-down behavior When an evaluator can't run (custom webhook unreachable, classifier model down, internal evaluator panic), the gateway has two options: -* *Fail closed* — block the request. Safe default; preserves the policy at the cost of availability. -* *Fail open* — pass the request through. Available default; preserves throughput at the cost of policy coverage. +* *Fail closed*: Block the request. Safe default; preserves the policy at the cost of availability. +* *Fail open*: Pass the request through. 
Available default; preserves throughput at the cost of policy coverage. // TODO: confirm the GA default and whether it's configurable per guardrail. Open Qs B2, B5 in the companion plan. @@ -86,9 +86,9 @@ The default is fail-closed. Per-guardrail override is available for guardrails w == Async versus sync evaluation -Per the AI Gateway design, evaluators run async where possible — specifically, `OUTPUT` evaluators alongside non-streaming responses can complete in parallel with the response delivery. `INPUT` evaluators always run synchronously: the request is blocked until the evaluator returns, because pass/block has to be decided before the request can dispatch upstream. +Per the AI Gateway design, evaluators run async where possible: specifically, `OUTPUT` evaluators alongside non-streaming responses can complete in parallel with the response delivery. `INPUT` evaluators always run synchronously: the request is blocked until the evaluator returns, because pass/block has to be decided before the request can dispatch upstream. -// TODO: confirm shipping behavior at GA — the design intent is async OUTPUT where possible, but streaming responses change the timing. Open Q B1 in the companion plan. +// TODO: confirm shipping behavior at GA: the design intent is async OUTPUT where possible, but streaming responses change the timing. Open Q B1 in the companion plan. == Next steps diff --git a/modules/integrations/pages/remote-mcp-clients.adoc b/modules/integrations/pages/remote-mcp-clients.adoc index df8337f..be1507a 100644 --- a/modules/integrations/pages/remote-mcp-clients.adoc +++ b/modules/integrations/pages/remote-mcp-clients.adoc @@ -59,7 +59,7 @@ Before you wire up the chat-client connector, make sure you have: * An MCP server already created in AI Gateway. See xref:mcp:create-server.adoc[Create an MCP Server]. * The MCP server's *API URL*. Copy it from the server's *Overview* tab. 
-* For user-delegated MCP servers: an OAuth Provider configured for the upstream system. See xref:mcp:oauth-providers.adoc[Configure an OAuth Provider]. +* For user-delegated MCP servers: An OAuth Provider configured for the upstream system. See xref:mcp:oauth-providers.adoc[Configure an OAuth Provider]. * End-users have accounts with the chat client (Claude, ChatGPT, Gemini, Cursor) and the upstream system the MCP server connects to. == Register an OAuth Client in AI Gateway @@ -120,7 +120,7 @@ Anthropic supports custom MCP connectors in Claude.ai (web), Claude Desktop, and . Click *Add*. The connector appears in the Connectors list with a `CUSTOM` badge. . Click *Connect* on the new connector row. Claude opens a browser tab pointed at AI Gateway's authorization endpoint. Sign in with your AI Gateway identity (Auth0 today, Zitadel in a future release). Once approved, the connector becomes invokable in any conversation. + -// TODO(review-before-publish): forward-looking framing — "Auth0 today, Zitadel in a future release" exposes an internal IdP migration plan. Confirm whether to name the IdP at all in customer prose, or drop both names and just say "your AI Gateway identity provider". +// TODO(review-before-publish): forward-looking framing: "Auth0 today, Zitadel in a future release" exposes an internal IdP migration plan. Confirm whether to name the IdP at all in customer prose, or drop both names and just say "your AI Gateway identity provider". // TODO: capture screenshots of the Add custom connector modal and the post-connect Connectors list against `adp-production`. @@ -163,7 +163,7 @@ This handshake runs *once per user* when the connector is first added. (Only for user-delegated MCP servers.) -This handshake runs *once per user, per upstream*. For an MCP server using user-delegated OAuth (GitHub, Slack, Atlassian, Workday, etc.): +This handshake runs *once per user, per upstream*. 
For an MCP server using user-delegated OAuth (GitHub, Slack, Atlassian, Workday, and so on): . The user invokes a tool that requires upstream auth. . AI Gateway has no stored upstream token for this user yet. The MCP protocol returns a `FAILED_PRECONDITION` response with an `OAuthConnectionRequired` error detail. The detail carries an `authorize_url` pointing at AI Gateway's OAuth bridge for the configured upstream provider, for example: `\https://aigw..clusters.rdpa.co/oauth/v1/authorize?provider_name=github&scopes=read:user,repo`. diff --git a/modules/integrations/partials/integrations/claude-code-admin.adoc b/modules/integrations/partials/integrations/claude-code-admin.adoc index af82e65..0fc5f80 100644 --- a/modules/integrations/partials/integrations/claude-code-admin.adoc +++ b/modules/integrations/partials/integrations/claude-code-admin.adoc @@ -30,8 +30,8 @@ Claude Code connects to AI Gateway through two primary endpoints: The gateway handles: -. Authentication via bearer tokens in the `Authorization` header -. Gateway selection via the endpoint URL +. Authentication through bearer tokens in the `Authorization` header +. Gateway selection through the endpoint URL . Model routing using the `vendor/model_id` format . MCP server aggregation for multi-tool workflows . Request logging and cost tracking per gateway @@ -389,7 +389,7 @@ Track Claude Code activity through gateway observability features. |=== -=== Query logs via API +=== Query logs through API Programmatically access logs for integration with monitoring systems: diff --git a/modules/integrations/partials/integrations/cline-admin.adoc b/modules/integrations/partials/integrations/cline-admin.adoc index 991f97b..5a5fd37 100644 --- a/modules/integrations/partials/integrations/cline-admin.adoc +++ b/modules/integrations/partials/integrations/cline-admin.adoc @@ -42,7 +42,7 @@ Cline connects to AI Gateway through two primary endpoints: The gateway handles: -. 
Authentication via bearer tokens in the `Authorization` header +. Authentication through bearer tokens in the `Authorization` header . Model routing using the `vendor/model_id` format . MCP server aggregation for multi-tool workflows . Request logging and cost tracking per gateway @@ -326,9 +326,9 @@ Provide these instructions to users configuring Cline in VS Code. Users configure Cline's API provider and credentials through the Cline extension interface. -IMPORTANT: API provider configuration (API keys, base URLs, custom headers) is managed via Cline's extension global state, not VS Code `settings.json`. These settings are stored in the extension's internal state and must be configured through the Cline UI. +IMPORTANT: API provider configuration (API keys, base URLs, custom headers) is managed through Cline's extension global state, not VS Code `settings.json`. These settings are stored in the extension's internal state and must be configured through the Cline UI. -==== Configure via Cline UI +==== Configure through Cline UI . Open the Cline extension panel in VS Code . Click the settings icon or gear menu @@ -352,13 +352,13 @@ Configure Cline to connect to the aggregated MCP endpoint through the Cline UI o . Search for "Cline > Mcp: Mode" . Enable the MCP mode toggle -==== Configure MCP server via Cline UI +==== Configure MCP server through Cline UI . Open the Cline extension panel in VS Code . Navigate to MCP server settings . Add the Redpanda AI Gateway MCP server with the connection details -==== Configure via cline_mcp_settings.json +==== Configure through cline_mcp_settings.json Alternatively, edit `cline_mcp_settings.json` (located in the Cline extension storage directory): @@ -438,7 +438,7 @@ Cline autonomous operations may generate request sequences. 
Look for patterns to |=== -=== Query logs via API +=== Query logs through API Programmatically access logs for integration with monitoring systems: diff --git a/modules/integrations/partials/integrations/continue-admin.adoc b/modules/integrations/partials/integrations/continue-admin.adoc index ec2f631..64aee0d 100644 --- a/modules/integrations/partials/integrations/continue-admin.adoc +++ b/modules/integrations/partials/integrations/continue-admin.adoc @@ -29,8 +29,8 @@ Key characteristics: * Uses native provider formats (Anthropic format for Anthropic, OpenAI format for OpenAI) * Supports multiple LLM providers simultaneously with per-provider configuration -* Custom API endpoints via `apiBase` configuration -* Custom headers via `requestOptions.headers` +* Custom API endpoints through `apiBase` configuration +* Custom headers through `requestOptions.headers` * Built-in MCP support for tool discovery and execution * Autocomplete, chat, and inline edit modes @@ -44,7 +44,7 @@ Continue.dev connects to AI Gateway differently than unified-format clients: The gateway handles: -. Authentication via bearer tokens in the `Authorization` header +. Authentication through bearer tokens in the `Authorization` header . Provider-specific request formats without transformation . Model routing using provider-native model identifiers . 
MCP server aggregation for multi-tool workflows @@ -580,7 +580,7 @@ Continue.dev generates different request patterns: |=== -=== Query logs via API +=== Query logs through API Programmatically access logs for integration with monitoring systems: diff --git a/modules/integrations/partials/integrations/cursor-admin.adoc b/modules/integrations/partials/integrations/cursor-admin.adoc index 006df4d..0b5faae 100644 --- a/modules/integrations/partials/integrations/cursor-admin.adoc +++ b/modules/integrations/partials/integrations/cursor-admin.adoc @@ -32,7 +32,7 @@ Key characteristics: * Limited support for custom headers (makes multi-tenant deployments challenging) * Supports MCP protocol with a 40-tool limit * Built-in code completion and chat modes -* Configuration via settings file (`~/.cursor/config.json`) +* Configuration through settings file (`~/.cursor/config.json`) == Architecture overview @@ -43,10 +43,10 @@ Cursor IDE connects to AI Gateway through standardized endpoints: The gateway handles: -. Authentication via bearer tokens in the `Authorization` header -. Gateway selection via the endpoint URL +. Authentication through bearer tokens in the `Authorization` header +. Gateway selection through the endpoint URL . Model routing using vendor prefixes (for example, `anthropic/claude-sonnet-4.5`) -. Format transforms from OpenAI format to provider-native formats (for Anthropic, Google, etc.) +. Format transforms from OpenAI format to provider-native formats (for Anthropic, Google, and so on) . MCP server aggregation for multi-tool workflows . 
Request logging and cost tracking per gateway @@ -627,7 +627,7 @@ Cursor generates different request patterns: |Metric |Purpose |Request volume by provider -|Identify which providers are most used via model prefix routing +|Identify which providers are most used through model prefix routing |Token usage by model |Track consumption patterns (completion vs chat) @@ -646,7 +646,7 @@ Cursor generates different request patterns: |=== -=== Query logs via API +=== Query logs through API Programmatically access logs for integration with monitoring systems: diff --git a/modules/integrations/partials/integrations/github-copilot-admin.adoc b/modules/integrations/partials/integrations/github-copilot-admin.adoc index d55a4f1..2474fc1 100644 --- a/modules/integrations/partials/integrations/github-copilot-admin.adoc +++ b/modules/integrations/partials/integrations/github-copilot-admin.adoc @@ -32,7 +32,7 @@ Key characteristics: * Limited support for custom headers (similar to Cursor IDE) * Supports BYOK for Business/Enterprise subscriptions * Built-in code completion, chat, and inline editing modes -* Configuration via IDE settings or organization policies +* Configuration through IDE settings or organization policies * High request volume from code completion features == Architecture overview @@ -44,8 +44,8 @@ GitHub Copilot connects to AI Gateway through standardized endpoints: The gateway handles: -. Authentication via bearer tokens in the `Authorization` header -. Gateway selection via URL path routing or query parameters +. Authentication through bearer tokens in the `Authorization` header +. Gateway selection through URL path routing or query parameters . Model routing and aliasing for friendly names . Format transforms from OpenAI format to provider-native formats . 
Request logging and cost tracking per gateway @@ -621,7 +621,7 @@ GitHub Copilot generates distinct request patterns: |Metric |Purpose |Request volume by model -|Identify most-used models via aliases +|Identify most-used models through aliases |Token usage by model |Track consumption patterns (completion vs chat) @@ -643,7 +643,7 @@ GitHub Copilot generates distinct request patterns: |=== -=== Query logs via API +=== Query logs through API Programmatically access logs for integration with monitoring systems: diff --git a/modules/integrations/partials/integrations/github-copilot-user.adoc b/modules/integrations/partials/integrations/github-copilot-user.adoc index 16fc444..9487e6a 100644 --- a/modules/integrations/partials/integrations/github-copilot-user.adoc +++ b/modules/integrations/partials/integrations/github-copilot-user.adoc @@ -28,7 +28,7 @@ Before configuring GitHub Copilot, ensure you have: ** API key with access to the gateway * Your IDE: ** VS Code with GitHub Copilot extension installed -** Or JetBrains IDE (IntelliJ IDEA, PyCharm, etc.) with GitHub Copilot plugin +** Or JetBrains IDE (IntelliJ IDEA, PyCharm, and so on) with GitHub Copilot plugin == About GitHub Copilot and AI Gateway @@ -112,7 +112,7 @@ Replace `https://gw.ai.panda.com/v1` with your gateway endpoint. IMPORTANT: This experimental feature requires configuring API keys and custom headers through the Copilot Chat UI, not in `settings.json`. -==== Configure API key and headers via Copilot Chat UI +==== Configure API key and headers through Copilot Chat UI . Open Copilot Chat in VS Code (`Cmd+I` or `Ctrl+I`) . Click the model selector dropdown @@ -163,7 +163,7 @@ Add the base URL configuration in VS Code settings: Replace `https://gw.ai.panda.com/v1` with your gateway endpoint. -==== Configure API key and headers via Copilot Chat UI +==== Configure API key and headers through Copilot Chat UI IMPORTANT: Do not configure API keys or custom headers in `settings.json`. 
Use the Copilot Chat UI instead. @@ -195,7 +195,7 @@ JetBrains IDE integration requires GitHub Copilot Enterprise with Bring Your Own === Configure BYOK with AI Gateway -. Open your JetBrains IDE (IntelliJ IDEA, PyCharm, etc.) +. Open your JetBrains IDE (IntelliJ IDEA, PyCharm, and so on) . Navigate to *Settings/Preferences*: ** macOS: `Cmd+,` ** Windows/Linux: `Ctrl+Alt+S` @@ -320,7 +320,7 @@ For large organizations deploying GitHub Copilot Enterprise with AI Gateway acro === Centralized configuration management -Distribute IDE configuration files via: +Distribute IDE configuration files through: * **Git repository**: Store `settings.json` or IDE configuration in a shared repository * **Configuration management tools**: Puppet, Chef, Ansible @@ -372,13 +372,13 @@ Single key for all developers: === Automated provisioning workflow . Developer joins organization -. Identity system (Okta, Azure AD, etc.) triggers provisioning: +. Identity system (Okta, Azure AD, and so on) triggers provisioning: .. Create Redpanda API key .. Assign to appropriate gateway .. Generate IDE configuration file with embedded keys .. Distribute to developer workstation . Developer installs IDE and GitHub Copilot -. Configuration auto-applies (via MDM or configuration management) +. Configuration auto-applies (through MDM or configuration management) . Developer starts using Copilot immediately === Observability and governance diff --git a/modules/mcp/pages/create-server.adoc b/modules/mcp/pages/create-server.adoc index 6d0f457..dfd09b9 100644 --- a/modules/mcp/pages/create-server.adoc +++ b/modules/mcp/pages/create-server.adoc @@ -111,7 +111,7 @@ Both managed and self-managed servers offer the same five authentication modes. |2-legged OAuth client credentials. One shared upstream identity for every caller. Provide `client_id`, `client_secret_ref`, `token_url`, and any required `scopes`. 
|*User-delegated OAuth* -|Each end-user authenticates against the upstream system with their own credentials, and Redpanda injects the user's token at call time. Pick the configured *OAuth Provider* and the required scopes. The first time a user calls a tool that needs this server, Redpanda surfaces a consent prompt; the resulting connection is stored in the token vault and shows up under *My Connections*. See xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth] for the full flow. +|Each end-user authenticates against the upstream system with their own credentials, and Redpanda injects the user's token at call time. Pick the configured *OAuth Provider* and the required scopes. The first time a user calls a tool that needs this server, Redpanda surfaces a consent prompt; Redpanda stores the resulting connection in the token vault, where it shows up under *My Connections*. See xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth] for the full flow. |=== NOTE: Choosing between *Service-account OAuth* and *User-delegated OAuth* is the credential-mode decision. Service-account auth gives every caller the same identity at the upstream; user-delegated auth gives each caller their own. diff --git a/modules/mcp/pages/managed/ironclad.adoc b/modules/mcp/pages/managed/ironclad.adoc index b3cc1c3..e024aa4 100644 --- a/modules/mcp/pages/managed/ironclad.adoc +++ b/modules/mcp/pages/managed/ironclad.adoc @@ -25,8 +25,8 @@ It is *not* a replacement for the Ironclad web UI for complex workflow managemen Before you create the server, make sure you have: * An Ironclad tenant where you can register an OAuth app. -* An OAuth Provider configured in ADP for Ironclad. See xref:../oauth-providers.adoc[Configure an OAuth Provider]. -* Familiarity with xref:../user-delegated-oauth.adoc[User-delegated OAuth]. +* An OAuth Provider configured in ADP for Ironclad. See xref:mcp:oauth-providers.adoc[Configure an OAuth Provider]. 
+* Familiarity with xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth]. == Get Ironclad credentials @@ -159,6 +159,6 @@ This page does not cover: == Related topics -* xref:../oauth-providers.adoc[Configure an OAuth Provider] -* xref:../user-delegated-oauth.adoc[User-delegated OAuth] -* xref:../create-server.adoc[Create an MCP Server] +* xref:mcp:oauth-providers.adoc[Configure an OAuth Provider] +* xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth] +* xref:mcp:create-server.adoc[Create an MCP Server] diff --git a/modules/mcp/pages/managed/jira.adoc b/modules/mcp/pages/managed/jira.adoc index 62f4eb9..3942684 100644 --- a/modules/mcp/pages/managed/jira.adoc +++ b/modules/mcp/pages/managed/jira.adoc @@ -6,7 +6,7 @@ :learning-objective-2: Pick the right scopes for read-only vs. read-write workflows :learning-objective-3: Walk a user through the consent flow and verify the connection -The *Jira* managed MCP server gives agents access to Jira issues, projects, and workflows on behalf of the calling user. It's the enterprise counterpart to xref:managed/slack.adoc[the Slack deep-dive] — both use user-delegated OAuth, but Atlassian's flow has its own scope model and quirks worth calling out. +The *Jira* managed MCP server gives agents access to Jira issues, projects, and workflows on behalf of the calling user. It's the enterprise counterpart to xref:managed/slack.adoc[the Slack deep-dive]: both use user-delegated OAuth, but Atlassian's flow has its own scope model and quirks worth calling out. After completing this guide, you will be able to: @@ -32,7 +32,7 @@ The Jira managed type exposes tools for: + // TODO: confirm whether Redpanda publishes a reference Atlassian app or whether each customer brings their own. * An OAuth Provider in the ADP UI configured for Atlassian's authorize/token URLs and carrying the app's client credentials. -* Familiarity with xref:../user-delegated-oauth.adoc[]. +* Familiarity with xref:mcp:user-delegated-oauth.adoc[]. 
== Atlassian's scope model @@ -46,13 +46,13 @@ Atlassian uses a granular, prefixed scope namespace. Common scopes: |Read user profile. |`read:jira-work` -|Read issues, projects, sprints, etc. (read-only operation). +|Read issues, projects, sprints, and so on (read-only operation). |`write:jira-work` |Create, update, transition issues, add comments. |`offline_access` -|Issue a refresh token so Redpanda can refresh expired access tokens. *Required* for any long-lived MCP server — without it, tokens expire after one hour and users re-consent every time. +|Issue a refresh token so Redpanda can refresh expired access tokens. *Required* for any long-lived MCP server: without it, tokens expire after one hour and users re-consent every time. |=== // TODO: confirm minimum required scopes per tool from the Jira server's tool registration. @@ -66,9 +66,9 @@ NOTE: Always include `offline_access` in `required_scopes`. Without it, `OAuthTo . Fill in identity fields (`name`, `description`). . In the Jira configuration form: + -* *Auth* — choose *User-delegated OAuth*. -* *OAuth Provider* — pick the Atlassian provider you configured. -* *Required scopes* — at minimum `read:jira-user`, `read:jira-work`, `offline_access`. Add `write:jira-work` if your agents will create or update issues. +* *Auth*: Choose *User-delegated OAuth*. +* *OAuth Provider*: Pick the Atlassian provider you configured. +* *Required scopes*: At minimum `read:jira-user`, `read:jira-work`, `offline_access`. Add `write:jira-work` if your agents will create or update issues. . Click *Create*. // TODO: capture screenshots of the Jira form on `adp-production`. @@ -86,7 +86,7 @@ NOTE: Always include `offline_access` in `required_scopes`. Without it, `OAuthTo Point an agent at the *API URL* on the server's detail page. Each user calling the agent will trigger their own consent flow on first call. 
-For agents that need both read and write capabilities, define the server's `required_scopes` to include all the scopes any tool might need — Atlassian doesn't allow per-tool scope upgrades, so the user consents once with the full set. +For agents that need both read and write capabilities, define the server's `required_scopes` to include all the scopes any tool might need: Atlassian doesn't allow per-tool scope upgrades, so the user consents once with the full set. == Troubleshooting @@ -111,6 +111,6 @@ For agents that need both read and write capabilities, define the server's `requ == Limitations -* *Atlassian app management*: the OAuth app and its callback URLs are managed in `developer.atlassian.com`, not in ADP. -* *Jira Server / Data Center* (self-hosted): this MCP type targets Atlassian Cloud. Self-hosted Jira may need a self-managed MCP server instead. See xref:../register-remote.adoc[Register a self-managed MCP server]. -* *Confluence access*: separate scope namespace; not exposed by this MCP server. +* *Atlassian app management*: The OAuth app and its callback URLs are managed in `developer.atlassian.com`, not in ADP. +* *Jira Server / Data Center* (self-hosted): This MCP type targets Atlassian Cloud. Self-hosted Jira may need a self-managed MCP server instead. See xref:mcp:register-remote.adoc[Register a self-managed MCP server]. +* *Confluence access*: Separate scope namespace; not exposed by this MCP server. 
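The scope guidance above lends itself to a quick pre-flight check before you fill in *Required scopes*. This is an illustrative sketch, not a Redpanda tool: the scope names come from the table on this page, and the check simply refuses a scope list that omits `offline_access`, since that omission is the most common cause of hourly re-consent.

```shell
# Hypothetical pre-flight check on a candidate scope list.
# Scope names are the Atlassian scopes documented above.
scopes="read:jira-user read:jira-work write:jira-work offline_access"

case " $scopes " in
  *" offline_access "*)
    echo "scopes ok: $scopes" ;;
  *)
    echo "missing offline_access: tokens will expire after one hour" >&2
    exit 1 ;;
esac
```

If the guard trips, add `offline_access` before creating the server rather than after: extending `required_scopes` later forces every existing user through re-consent.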
diff --git a/modules/mcp/pages/managed/kafka.adoc b/modules/mcp/pages/managed/kafka.adoc index 2011ee9..3789af1 100644 --- a/modules/mcp/pages/managed/kafka.adoc +++ b/modules/mcp/pages/managed/kafka.adoc @@ -6,7 +6,7 @@ :learning-objective-2: Produce a test message through the Inspector and consume it back :learning-objective-3: Pick the right authentication mode for your broker (PLAIN / SCRAM / mTLS / OAuth) -The *Kafka* managed MCP server gives agents read and write access to topics on either an Apache Kafka cluster or a Redpanda cluster — the same protocol, the same tools. Despite the name, it works against any Kafka-compatible broker. +The *Kafka* managed MCP server gives agents read and write access to topics on either an Apache Kafka cluster or a Redpanda cluster: the same protocol, the same tools. Despite the name, it works against any Kafka-compatible broker. After completing this guide, you will be able to: @@ -29,9 +29,9 @@ The Kafka managed type proxies a managed Kafka client. It exposes tools for: * A Kafka or Redpanda cluster reachable from the Agentic Data Plane. + -// TODO: confirm reachability requirements (public bootstrap, peering, etc.) once the standalone ADP product surface ships. +// TODO: confirm reachability requirements (public bootstrap, peering, and so on) once the standalone ADP product surface ships. * The cluster's bootstrap servers and SASL/TLS settings. -* For SCRAM or PLAIN: secrets in the ADP secret store for the username and password (`UPPER_SNAKE_CASE`, for example `KAFKA_SASL_USER` and `KAFKA_SASL_PASSWORD`). +* For SCRAM or PLAIN: Secrets in the ADP secret store for the username and password (`UPPER_SNAKE_CASE`, for example `KAFKA_SASL_USER` and `KAFKA_SASL_PASSWORD`). == Configure @@ -42,10 +42,10 @@ The Kafka managed type proxies a managed Kafka client. It exposes tools for: + // TODO: enumerate exact fields from `proto/mcps/redpanda/mcps/kafka/v1/kafka_config.proto`. 
+ -* *Seed brokers* — comma-separated bootstrap addresses. -* *TLS* settings — usually on for production, off for local dev. -* *SASL mechanism* — `PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512`, or `OAUTHBEARER`. -* *Username / password references* — `UPPER_SNAKE_CASE` secret references. +* *Seed brokers*: Comma-separated bootstrap addresses +* *TLS* settings: Usually on for production, off for local dev +* *SASL mechanism*: `PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512`, or `OAUTHBEARER` +* *Username / password references*: `UPPER_SNAKE_CASE` secret references . Click *Create*. // TODO: screenshot of the Kafka form filled in for a Redpanda Cloud cluster on `adp-production`. @@ -59,11 +59,11 @@ The Kafka managed type proxies a managed Kafka client. It exposes tools for: // TODO: confirm exact tool names for produce/consume/list-topics and capture screenshots. -See xref:../test-tools.adoc[] for general Inspector usage. +See xref:mcp:test-tools.adoc[] for general Inspector usage. == Authentication -The Kafka managed type's authentication is part of its config, not the generic MCP auth oneof — it speaks Kafka protocol auth (SASL/SSL), not MCP auth. +The Kafka managed type's authentication is part of its config, not the generic MCP auth oneof: it speaks Kafka protocol auth (SASL/SSL), not MCP auth. [cols="1,2"] |=== @@ -111,6 +111,6 @@ Once the Kafka server is created, point an agent at the *API URL* on the server' == Limitations -* *Kafka administration* (creating topics, managing ACLs): this server is read/produce/consume-focused. For administration, use rpk or your broker's admin API. -* *Schema registry*: not exposed by this MCP server. -* *Streaming joins or processing*: for stream processing, use Redpanda Connect. +* *Kafka administration* (creating topics, managing ACLs): This server is read/produce/consume-focused. For administration, use rpk or your broker's admin API. +* *Schema registry*: Not exposed by this MCP server. 
+* *Streaming joins or processing*: For stream processing, use Redpanda Connect. diff --git a/modules/mcp/pages/managed/openapi.adoc b/modules/mcp/pages/managed/openapi.adoc index e054465..09688d7 100644 --- a/modules/mcp/pages/managed/openapi.adoc +++ b/modules/mcp/pages/managed/openapi.adoc @@ -6,7 +6,7 @@ :learning-objective-2: Pick the right authentication mode for the upstream API :learning-objective-3: Verify generated tools through the Inspector -The *OpenAPI* managed MCP server is the "bring your own API" escape hatch. Hand it an OpenAPI 3 (or Swagger 2) spec, and it generates one MCP tool per operation in the spec. No custom code, no per-API managed type — useful when the API you want to expose is not in the catalog. +The *OpenAPI* managed MCP server is the "bring your own API" escape hatch. Hand it an OpenAPI 3 (or Swagger 2) spec, and it generates one MCP tool per operation in the spec. No custom code, no per-API managed type: useful when the API you want to expose is not in the catalog. After completing this guide, you will be able to: @@ -39,15 +39,15 @@ The OpenAPI managed type: + // TODO: enumerate exact fields from `apps/aigw/internal/mcp/managed/mcps/openapi/register.go` (spec source, base URL override, tool-name strategy, operation include/exclude lists). + -* *Spec source* — URL or inline text. -* *Base URL override* (optional) — useful when the spec's `servers` block doesn't match your environment. -* *Operations to expose* (optional) — filter by `operationId`, path, or method if you want only a subset. +* *Spec source*: URL or inline text. +* *Base URL override* (optional): Useful when the spec's `servers` block doesn't match your environment. +* *Operations to expose* (optional): Filter by `operationId`, path, or method if you want only a subset. . Configure auth (see <>). . Click *Create*. == Authentication -OpenAPI is the most flexible managed type for auth — the upstream API can need anything. 
All the standard auth modes apply: +OpenAPI is the most flexible managed type for auth: the upstream API can need anything. All the standard auth modes apply: [cols="1,2"] |=== @@ -57,7 +57,7 @@ OpenAPI is the most flexible managed type for auth — the upstream API can need |Public APIs (rare in practice). |*Static key* -|API expects a token in a header. Set `header_name` to the API's expected header (`Authorization`, `X-Api-Key`, etc.). +|API expects a token in a header. Set `header_name` to the API's expected header (`Authorization`, `X-Api-Key`, and so on). |*Token passthrough* |The API already validates the caller's `Authorization` header. @@ -66,7 +66,7 @@ OpenAPI is the most flexible managed type for auth — the upstream API can need |API supports OAuth client credentials and you want one shared identity. |*User-delegated OAuth* -|API supports OAuth on behalf of users and you want per-user identities. Requires an OAuth Provider configured for that API. See xref:../user-delegated-oauth.adoc[]. +|API supports OAuth on behalf of users and you want per-user identities. Requires an OAuth Provider configured for that API. See xref:mcp:user-delegated-oauth.adoc[]. |=== == Test @@ -77,7 +77,7 @@ OpenAPI is the most flexible managed type for auth — the upstream API can need // TODO: capture screenshots of a non-trivial OpenAPI spec rendered in the Inspector once we walk a real example on `adp-production`. -See xref:../test-tools.adoc[] for general Inspector usage. +See xref:mcp:test-tools.adoc[] for general Inspector usage. == Use with agents @@ -96,7 +96,7 @@ Once tools generate cleanly, point an agent at the *API URL* on the server's det |The operation may be missing an `operationId`, or the operation include/exclude filters might be excluding it. |Tool input schema looks wrong -|Spec features (oneOf, discriminator, etc.) might not translate cleanly. See the TODO above on supported versions/features. 
+|Spec features (oneOf, discriminator, and so on) might not translate cleanly. See the TODO above on supported versions/features. |Calls return 401 |Auth mode or credentials are wrong. Confirm secret content and the API's expected auth header. @@ -109,6 +109,6 @@ Once tools generate cleanly, point an agent at the *API URL* on the server's det == Limitations -* *Custom tool logic*: the OpenAPI type is purely a spec-to-tools generator. For business logic on top of the API, use a xref:../register-remote.adoc[self-managed MCP server]. +* *Custom tool logic*: The OpenAPI type is purely a spec-to-tools generator. For business logic on top of the API, use a xref:mcp:register-remote.adoc[self-managed MCP server]. * *GraphQL APIs*: OpenAPI doesn't describe GraphQL. For GraphQL APIs, use a self-managed server. -* *gRPC services*: same as GraphQL — use a self-managed server. +* *gRPC services*: Same as GraphQL: use a self-managed server. diff --git a/modules/mcp/pages/managed/ramp.adoc b/modules/mcp/pages/managed/ramp.adoc index 0516888..4d44041 100644 --- a/modules/mcp/pages/managed/ramp.adoc +++ b/modules/mcp/pages/managed/ramp.adoc @@ -25,8 +25,8 @@ It is suitable for expense analysis, spend-policy enforcement, and corporate car Before you create the server, make sure you have: * A Ramp account with admin access to the Ramp Developer Portal. -* An OAuth Provider configured in ADP for Ramp. See xref:../oauth-providers.adoc[Configure an OAuth Provider]. -* Familiarity with xref:../user-delegated-oauth.adoc[User-delegated OAuth]. +* An OAuth Provider configured in ADP for Ramp. See xref:mcp:oauth-providers.adoc[Configure an OAuth Provider]. +* Familiarity with xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth]. 
== Get Ramp credentials @@ -216,6 +216,6 @@ This page does not cover: == Related topics -* xref:../oauth-providers.adoc[Configure an OAuth Provider] -* xref:../user-delegated-oauth.adoc[User-delegated OAuth] -* xref:../create-server.adoc[Create an MCP Server] +* xref:mcp:oauth-providers.adoc[Configure an OAuth Provider] +* xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth] +* xref:mcp:create-server.adoc[Create an MCP Server] diff --git a/modules/mcp/pages/managed/slack.adoc b/modules/mcp/pages/managed/slack.adoc index 388d26c..7688b55 100644 --- a/modules/mcp/pages/managed/slack.adoc +++ b/modules/mcp/pages/managed/slack.adoc @@ -32,8 +32,8 @@ Before you create the server, make sure you have: * A Slack OAuth app registered (your own or a Redpanda-published reference app). + // TODO: confirm whether Redpanda ships a reference Slack OAuth app or whether each customer brings their own. Document the path. -* An OAuth Provider configured in the ADP UI under *OAuth Providers*, pointing at Slack's authorize/token URLs and carrying the OAuth app's client credentials. See xref:../oauth-providers.adoc[Configure an OAuth Provider]. -* Familiarity with xref:../user-delegated-oauth.adoc[]. +* An OAuth Provider configured in the ADP UI under *OAuth Providers*, pointing at Slack's authorize/token URLs and carrying the OAuth app's client credentials. See xref:mcp:oauth-providers.adoc[Configure an OAuth Provider]. +* Familiarity with xref:mcp:user-delegated-oauth.adoc[]. == Configure diff --git a/modules/mcp/pages/managed/sql.adoc b/modules/mcp/pages/managed/sql.adoc index da7c103..d2b1d38 100644 --- a/modules/mcp/pages/managed/sql.adoc +++ b/modules/mcp/pages/managed/sql.adoc @@ -43,19 +43,19 @@ It exposes a small set of tools for querying schemas and running parameterized q . Fill in the identity fields (`name`, `description`). . 
In the SQL configuration form, provide: + -// TODO: enumerate exact fields from `proto/mcps/redpanda/mcps/sql/v1/sql_config.proto` — driver, DSN/connection-string format, query timeout, read-vs-write-tool split, max rows. +// TODO: enumerate exact fields from `proto/mcps/redpanda/mcps/sql/v1/sql_config.proto`: driver, DSN/connection-string format, query timeout, read-vs-write-tool split, max rows. + -* *Driver* — Postgres, MySQL, ClickHouse, MSSQL, or SQLite. -* *Connection string* — driver-specific. -* *Password reference* — `UPPER_SNAKE_CASE` secret reference. -* *Query timeout* (optional). +* *Driver*: Postgres, MySQL, ClickHouse, MSSQL, or SQLite +* *Connection string*: Driver-specific +* *Password reference*: `UPPER_SNAKE_CASE` secret reference +* *Query timeout* (optional) . Click *Create*. // TODO: screenshot of the SQL form filled in for a Postgres example on `adp-production`. == Test -After create, exercise the server through the Inspector tab. See xref:../test-tools.adoc[]. +After create, exercise the server through the Inspector tab. See xref:mcp:test-tools.adoc[]. A canonical first call: @@ -70,10 +70,10 @@ A canonical first call: The SQL managed type supports: -* *None* — only useful for local SQLite without auth. -* *Static key* — most common; the password lives in a secret reference. +* *None*: Only useful for local SQLite without auth. +* *Static key*: Most common; the password lives in a secret reference. -User-delegated OAuth and service-account OAuth are *not* supported for SQL — there's no per-user identity model that maps to a database password. +User-delegated OAuth and service-account OAuth are *not* supported for SQL: there's no per-user identity model that maps to a database password. // TODO: confirm exact auth modes the SQL type registers in its `register_mcp.go` once an auth-mode list is exposed on `ManagedMCPType`. 
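As a concrete illustration of the configuration fields above, here is a hedged sketch of what a Postgres setup might look like. The DSN uses the standard Postgres URI form, and every name in it (host, database, user, and the `SQL_DB_PASSWORD` secret reference) is a placeholder of ours, not a documented default; check the form's field-level help for the exact format your driver expects.

```shell
# Illustrative values only; adjust for your driver and environment.
DRIVER="postgres"
# Standard Postgres URI. The password is deliberately absent from the DSN.
DSN="postgres://analytics_ro@db.internal:5432/analytics?sslmode=require"
# UPPER_SNAKE_CASE reference to a secret created in the ADP secret store.
PASSWORD_REF="SQL_DB_PASSWORD"

echo "driver=$DRIVER password_ref=$PASSWORD_REF"
```

Keeping the password out of the connection string and behind a secret reference matches the *Static key* auth mode described above, where the password lives in a secret reference rather than in the server config.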
@@ -106,5 +106,5 @@ Once the SQL server is created, point an agent at the *API URL* on the server's == Limitations -* *Schema migrations*: this server is read/query-focused; running DDL is not the intended use. -* *Per-user database identities*: see <>. +* *Schema migrations*: This server is read/query-focused; running DDL is not the intended use. +* *Per-user database identities*: See <>. diff --git a/modules/mcp/pages/managed/workday.adoc b/modules/mcp/pages/managed/workday.adoc index 15c1289..4185fcd 100644 --- a/modules/mcp/pages/managed/workday.adoc +++ b/modules/mcp/pages/managed/workday.adoc @@ -197,5 +197,5 @@ This page does not cover: == Related topics -* xref:create-server.adoc[Create an MCP Server] -* xref:test-tools.adoc[Test a server's tools] +* xref:mcp:create-server.adoc[Create an MCP Server] +* xref:mcp:test-tools.adoc[Test a server's tools] diff --git a/modules/mcp/pages/managed/zendesk.adoc b/modules/mcp/pages/managed/zendesk.adoc index 4247802..9cf7aba 100644 --- a/modules/mcp/pages/managed/zendesk.adoc +++ b/modules/mcp/pages/managed/zendesk.adoc @@ -31,7 +31,7 @@ Before you create the server, make sure you have: * A Zendesk Support instance. * For *API token* mode: ability to create an API token under *Apps and integrations > APIs > Zendesk API*. -* For *User OAuth* mode: a Zendesk OAuth client and an OAuth Provider configured in ADP. See xref:../oauth-providers.adoc[Configure an OAuth Provider]. +* For *User OAuth* mode: a Zendesk OAuth client and an OAuth Provider configured in ADP. See xref:mcp:oauth-providers.adoc[Configure an OAuth Provider]. == Get Zendesk credentials @@ -50,7 +50,7 @@ Before you create the server, make sure you have: For per-user authentication, register an OAuth client on Zendesk and a matching OAuth Provider in ADP: . Configure a Zendesk OAuth client under *Apps and integrations > APIs > OAuth Clients* (Confidential client, Authorization Code grant). -. Register a matching OAuth Provider in ADP. 
See xref:../oauth-providers.adoc[Configure an OAuth Provider]. Use Zendesk's authorize and token endpoints. +. Register a matching OAuth Provider in ADP. See xref:mcp:oauth-providers.adoc[Configure an OAuth Provider]. Use Zendesk's authorize and token endpoints. . Each end-user authenticates once through the OAuth flow; tokens are stored in the gateway's token vault. *Required scopes*: `read tickets:write hc:read` covers all 12 tools. Drop `tickets:write` if the MCP only needs to read. @@ -255,7 +255,7 @@ Common symptoms and fixes: |The agent role on Zendesk's side is below *Light Agent*. Upgrade the role or use API-token mode with a Light Agent or Admin email. |`OAuthConnectionRequired` (User-OAuth mode) -|First call from a user with no stored token. The user completes Zendesk's OAuth consent flow, the token lands in the vault, and subsequent calls reuse it. See xref:../user-delegated-oauth.adoc[User-delegated OAuth]. +|First call from a user with no stored token. The user completes Zendesk's OAuth consent flow, the token lands in the vault, and subsequent calls reuse it. See xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth]. |`scope_upgrade_required` (User-OAuth mode) |Server's `required_scopes` was extended after users had already consented. Users re-consent with the higher scope. 
@@ -273,7 +273,7 @@ This page does not cover: == Related topics -* xref:../oauth-providers.adoc[Configure an OAuth Provider] -* xref:../user-delegated-oauth.adoc[User-delegated OAuth] -* xref:../create-server.adoc[Create an MCP Server] -* xref:../test-tools.adoc[Test a server's tools] +* xref:mcp:oauth-providers.adoc[Configure an OAuth Provider] +* xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth] +* xref:mcp:create-server.adoc[Create an MCP Server] +* xref:mcp:test-tools.adoc[Test a server's tools] diff --git a/modules/mcp/pages/oauth-providers.adoc b/modules/mcp/pages/oauth-providers.adoc index 019b19f..76ec2f9 100644 --- a/modules/mcp/pages/oauth-providers.adoc +++ b/modules/mcp/pages/oauth-providers.adoc @@ -18,7 +18,7 @@ After completing this guide, you will be able to: ==== OAuth providers and OAuth clients are different resources. An *OAuth provider* (this page) is a definition of an upstream system the gateway authenticates against. An *OAuth client* is a per-application credential issued *by* the AI Gateway's own identity provider, used by external clients to call ADP. They live under separate sidebar entries (*OAuth Providers* and *OAuth Clients*) and have separate proto, permissions, and lifecycles. -To register or manage an OAuth Client (Claude Desktop, ChatGPT, Cursor, and so on) — including revoking its refresh tokens to force a re-sign-in — see xref:integrations:remote-mcp-clients.adoc[Connect remote MCP clients to AI Gateway]. +To register or manage an OAuth Client (Claude Desktop, ChatGPT, Cursor, and so on), including revoking its refresh tokens to force a re-sign-in, see xref:integrations:remote-mcp-clients.adoc[Connect remote MCP clients to AI Gateway]. 
==== == Prerequisites diff --git a/modules/mcp/pages/register-remote.adoc b/modules/mcp/pages/register-remote.adoc index 2a20056..f240ded 100644 --- a/modules/mcp/pages/register-remote.adoc +++ b/modules/mcp/pages/register-remote.adoc @@ -6,7 +6,7 @@ :learning-objective-2: Pick the right transport (SSE vs. Streamable HTTP) and authentication mode :learning-objective-3: Confirm tool discovery completed and the server is reachable through its proxy URL -Register your existing MCP server with Redpanda to add authentication, observability, and agent aggregation without changing your server's code. This guide covers the self-managed path from xref:create-server.adoc[Create an MCP Server] in depth. Choose this when you already run a server and want Redpanda to proxy it. +Register your existing MCP server with Redpanda to add authentication, observability, and agent aggregation without changing your server's code. This guide covers the self-managed path from xref:mcp:create-server.adoc[Create an MCP Server] in depth. Choose this when you already run a server and want Redpanda to proxy it. After completing this guide, you will be able to: @@ -32,7 +32,7 @@ If you don't already run a server, prefer a managed type. See xref:managed/manag * The endpoint URL. `http://` is allowed for everything except user-delegated OAuth, which requires `https://` (proto rule `remote_mcp.user_oauth_requires_https`). * Knowledge of which transport the server speaks (SSE or Streamable HTTP). If you don't know, see <>. * If using static-key or service-account-OAuth: secrets pre-created in the ADP secret store, `UPPER_SNAKE_CASE` (proto regex `^[A-Z][A-Z0-9_]*$`). -* If using user-delegated OAuth: an OAuth Provider already configured. See xref:user-delegated-oauth.adoc[User-delegated OAuth]. +* If using user-delegated OAuth: an OAuth Provider already configured. See xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth]. 
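The `UPPER_SNAKE_CASE` constraint above is enforced by the proto regex `^[A-Z][A-Z0-9_]*$`, so you can sanity-check candidate secret names before creating them. A minimal sketch follows; the function name and example names are ours, not part of any Redpanda CLI.

```shell
# Check a candidate secret name against the documented proto regex.
check_secret_name() {
  if printf '%s' "$1" | grep -Eq '^[A-Z][A-Z0-9_]*$'; then
    echo "valid: $1"
  else
    echo "invalid: $1"
  fi
}

check_secret_name MCP_STATIC_KEY    # uppercase with underscores: accepted
check_secret_name mcp-static-key    # lowercase and hyphens: rejected
check_secret_name 2ND_KEY           # must start with a letter: rejected
```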
== Create the server @@ -40,7 +40,7 @@ If you don't already run a server, prefer a managed type. See xref:managed/manag . In the marketplace picker, choose *Remote (Proxied)*. + // TODO: screenshot of the marketplace picker with Remote (Proxied) highlighted. -. Fill in the identity fields (`name`, `description`, `enabled`); same constraints as in xref:create-server.adoc[Create an MCP Server]. +. Fill in the identity fields (`name`, `description`, `enabled`); same constraints as in xref:mcp:create-server.adoc[Create an MCP Server]. . Provide the *URL* and *Transport*. . Configure authentication (see <>). . Click *Create*. @@ -73,7 +73,7 @@ curl -X POST -H "Content-Type: application/json" \ [[authentication]] == Authentication -The five auth modes from xref:create-server.adoc#configure-authentication[create-server.adoc] all apply. Three patterns are particularly common for self-managed servers: +The five auth modes from xref:mcp:create-server.adoc#configure-authentication[create-server.adoc] all apply. Three patterns are particularly common for self-managed servers: [cols="1,2"] |=== @@ -89,13 +89,13 @@ The five auth modes from xref:create-server.adoc#configure-authentication[create |The upstream server already validates client tokens; Redpanda just forwards the caller's `Authorization` header. |=== -For user-delegated OAuth, the URL must be `https://` and you also need an OAuth Provider. See xref:user-delegated-oauth.adoc[User-delegated OAuth]. +For user-delegated OAuth, the URL must be `https://` and you also need an OAuth Provider. See xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth]. // TODO: screenshots of each auth-mode form panel after walking `adp-production`. == Tool discovery -After create, Redpanda runs a live `tools/list` (the `ListMCPServerTools` RPC) against the server. The result is cached on the `MCPServer.tools` output-only field and shown on the detail page's *Overview* tab. 
The *Inspector* tab (see xref:test-tools.adoc[Test a server's tools]) exercises individual glossterm:tool[,tools]. +After create, Redpanda runs a live `tools/list` (the `ListMCPServerTools` RPC) against the server. The result is cached on the `MCPServer.tools` output-only field and shown on the detail page's *Overview* tab. The *Inspector* tab (see xref:mcp:test-tools.adoc[Test a server's tools]) exercises individual glossterm:tool[,tools]. If the tools list is empty or stale, hit the *Refresh tools* action on the Overview tab to re-query the server. @@ -108,7 +108,7 @@ If the tools list is empty or stale, hit the *Refresh tools* action on the Overv |Error |What it means |`OAuthConnectionRequired` -|The user-delegated auth path needs a stored token vault entry for the calling user. Redpanda surfaces an `authorize_url` so the user can complete the consent flow. See xref:user-delegated-oauth.adoc[User-delegated OAuth]. +|The user-delegated auth path needs a stored token vault entry for the calling user. Redpanda surfaces an `authorize_url` so the user can complete the consent flow. See xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth]. |`OAuthTokenExpired` |The user's stored token has expired and refresh failed. Surface the new authorize URL and have the user re-consent. @@ -139,6 +139,6 @@ If the tools list is empty or stale, hit the *Refresh tools* action on the Overv == Related topics -* xref:user-delegated-oauth.adoc[User-delegated OAuth] for the consent flow. +* xref:mcp:user-delegated-oauth.adoc[User-delegated OAuth] for the consent flow. * xref:ai-gateway:aggregation.adoc[MCP aggregation] for fronting many MCP servers behind a single URL with AI Gateway. * Server hosting and deployment guidance is your responsibility: Redpanda doesn't operate self-managed servers; deployment, scaling, and patching are up to you. 
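The tool discovery described above is ordinary MCP traffic, so you can reproduce it by hand when debugging an empty or stale tools list. The sketch below builds the standard JSON-RPC `tools/list` request (the same method the `ListMCPServerTools` RPC issues); the proxy URL in the comment is a placeholder you replace with the *API URL* from the server's detail page.

```shell
# Standard MCP JSON-RPC request for listing tools.
payload='{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
echo "$payload"

# Manual check (placeholder URL; substitute your server's proxy URL):
#   curl -sS -X POST -H "Content-Type: application/json" \
#     -d "$payload" "https://<your-proxy-url>"
```

If the manual call returns tools but the detail page still shows none, the cached `MCPServer.tools` field is stale; use *Refresh tools* on the Overview tab.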
diff --git a/modules/mcp/pages/user-delegated-oauth.adoc b/modules/mcp/pages/user-delegated-oauth.adoc index b20efe8..77129e2 100644 --- a/modules/mcp/pages/user-delegated-oauth.adoc +++ b/modules/mcp/pages/user-delegated-oauth.adoc @@ -16,14 +16,14 @@ After completing this guide, you will be able to: == Prerequisites -* An OAuth provider resource configured in the ADP UI under *OAuth providers*. The provider declares the upstream's `authorize_url`, `token_url`, supported scopes, and client credentials. See xref:oauth-providers.adoc[Configure an OAuth Provider]. +* An OAuth provider resource configured in the ADP UI under *OAuth providers*. The provider declares the upstream's `authorize_url`, `token_url`, supported scopes, and client credentials. See xref:mcp:oauth-providers.adoc[Configure an OAuth Provider]. * The required scopes for the upstream API you plan to call. * For *self-managed* MCP servers: the server URL must be `https://` (proto rule `remote_mcp.user_oauth_requires_https`). HTTP is rejected at create time. * For *managed* MCP servers: the type must support user-delegated OAuth. SQL doesn't; Slack, Jira, and Google managed types do. Check xref:managed/managed-catalog.adoc[Managed catalog] before configuring. == Configure the server -. Create or edit your MCP server (see xref:create-server.adoc[Create an MCP Server]). +. Create or edit your MCP server (see xref:mcp:create-server.adoc[Create an MCP Server]). . In the auth section, choose *User-delegated OAuth*. . Pick the configured *OAuth provider* (`UserOAuthAuth.provider_name`). . List the *required scopes* (`UserOAuthAuth.required_scopes`). Redpanda enforces these at consent time. @@ -34,7 +34,7 @@ After completing this guide, you will be able to: NOTE: Choosing user-delegated OAuth instead of service-account OAuth *is* the credential-mode decision: there's no separate field. User-delegated gives each caller a per-user upstream identity; service-account gives every caller one shared identity. 
Switching between them later requires re-consent for every active user. -TIP: To configure user-delegated OAuth from the CLI, use `--user-oauth-provider` and `--user-oauth-scopes` on `rpk ai mcp create` or `rpk ai mcp update`. See xref:create-server.adoc[Create an MCP Server]. +TIP: To configure user-delegated OAuth from the CLI, use `--user-oauth-provider` and `--user-oauth-scopes` on `rpk ai mcp create` or `rpk ai mcp update`. See xref:mcp:create-server.adoc[Create an MCP Server]. == The user connection flow @@ -66,7 +66,7 @@ When refresh fails (revoked token, idle too long, upstream error), the next tool If you want one shared upstream identity for every caller (instead of per-user identities), choose *Service-account OAuth* on the server instead of *User-delegated OAuth*. With service-account OAuth, every caller of every tool sees the same upstream identity; the upstream system has no idea which ADP user invoked the tool. With user-delegated OAuth, the upstream system sees each end-user as themselves and applies their own permissions. -For the field-by-field service-account-OAuth setup, see xref:create-server.adoc#configure-authentication[create-server.adoc]. +For the field-by-field service-account-OAuth setup, see xref:mcp:create-server.adoc#configure-authentication[create-server.adoc]. == Worked examples @@ -97,6 +97,6 @@ For the field-by-field service-account-OAuth setup, see xref:create-server.adoc# == Related topics -* xref:oauth-providers.adoc[Configure an OAuth Provider] for the separate workflow that creates the providers referenced from MCP servers. -* xref:create-server.adoc#configure-authentication[Service-account OAuth setup] for the shared-identity alternative. +* xref:mcp:oauth-providers.adoc[Configure an OAuth Provider] for the separate workflow that creates the providers referenced from MCP servers. +* xref:mcp:create-server.adoc#configure-authentication[Service-account OAuth setup] for the shared-identity alternative. 
* Token vault internals are not user-configurable. Redpanda manages the vault; users see their own connections under *My Connections*. diff --git a/modules/observability/pages/concepts.adoc b/modules/observability/pages/concepts.adoc index 9094f19..acec47c 100644 --- a/modules/observability/pages/concepts.adoc +++ b/modules/observability/pages/concepts.adoc @@ -18,7 +18,7 @@ After reading this page, you will be able to: == What are transcripts -A transcript records the complete execution of an agentic behavior from start to finish. It captures every step — across multiple agents, tools, models, and services — in a single, traceable record. The AI Gateway and every glossterm:AI agent[,agent] and glossterm:MCP server[] in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. Redpanda's immutable distributed log stores these traces. +A transcript records the complete execution of an agentic behavior from start to finish. It captures every step (across multiple agents, tools, models, and services) in a single, traceable record. The AI Gateway and every glossterm:AI agent[,agent] and glossterm:MCP server[] in your Agentic Data Plane (ADP) automatically emit OpenTelemetry traces to a glossterm:topic[] called `redpanda.otel_traces`. Redpanda's immutable distributed log stores these traces. Transcripts capture: @@ -60,7 +60,7 @@ Agent transcripts contain these span types: | Track reasoning time and identify iteration patterns. | `invoke_agent` -| Agent and sub-agent invocation in multi-agent architectures, following the https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/[OpenTelemetry agent invocation semantic conventions^]. Represents one agent calling another via the glossterm:Agent2Agent (A2A) protocol[,A2A protocol]. 
+| Agent and sub-agent invocation in multi-agent architectures, following the https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/[OpenTelemetry agent invocation semantic conventions^]. Represents one agent calling another through the glossterm:Agent2Agent (A2A) protocol[,A2A protocol]. | Trace calls between root agents and sub-agents, measure cross-agent latency, and identify which sub-agent was invoked. | `openai`, `anthropic`, or other LLM providers @@ -334,12 +334,12 @@ Transcripts are optimized for execution-level observability and governance. For [[history-reconstruction]] == Reconstructed transcript history -Trace data on `redpanda.otel_traces` is subject to a retention policy. When a transcript covers a long-running conversation whose earliest spans have already been evicted, Redpanda reconstructs the missing turns from the LLM message context carried on later spans (`gen_ai.input.messages`) and sets the boolean field `is_reconstructed` to `true` on each affected turn. The UI surfaces this as a **Reconstructed** badge on those turns — `is_reconstructed` is the backing data field; "Reconstructed" is the visible label. Reconstructed turns preserve the high-level intent and role ordering of the conversation, but do not preserve byte-level fidelity: token counts, per-turn latency, and tool-call arguments are unavailable for the reconstructed range. +Trace data on `redpanda.otel_traces` is subject to a retention policy. When a transcript covers a long-running conversation whose earliest spans have already been evicted, Redpanda reconstructs the missing turns from the LLM message context carried on later spans (`gen_ai.input.messages`) and sets the boolean field `is_reconstructed` to `true` on each affected turn. The UI surfaces this as a **Reconstructed** badge on those turns: `is_reconstructed` is the backing data field; "Reconstructed" is the visible label. 
Reconstructed turns preserve the high-level intent and role ordering of the conversation, but do not preserve byte-level fidelity: token counts, per-turn latency, and tool-call arguments are unavailable for the reconstructed range. -// TODO: Verify model names on this page against the GA build (gpt-5.2, claude-sonnet-4-5, gpt-5-nano, gemini-3-flash-preview) — replace any that ship with a different GA identifier. +// TODO: Verify model names on this page against the GA build (gpt-5.2, claude-sonnet-4-5, gpt-5-nano, gemini-3-flash-preview): replace any that ship with a different GA identifier. == Next steps -* xref:transcripts.adoc[] +* xref:observability:transcripts.adoc[] * xref:agents:monitor.adoc[] * xref:mcp:test-tools.adoc[] diff --git a/modules/observability/pages/ingest-custom-traces.adoc b/modules/observability/pages/ingest-custom-traces.adoc index f301579..4c0d5ac 100644 --- a/modules/observability/pages/ingest-custom-traces.adoc +++ b/modules/observability/pages/ingest-custom-traces.adoc @@ -1,7 +1,7 @@ = Ingest OpenTelemetry Traces from Custom Agents :description: Configure a Redpanda Connect pipeline to ingest OpenTelemetry traces from custom agents into Redpanda's immutable log for unified governance and observability. 
:page-topic-type: how-to -:learning-objective-1: pass:q[Configure and deploy a Redpanda Connect pipeline to receive OpenTelemetry traces from custom agents via HTTP and publish them to `redpanda.otel_traces`] +:learning-objective-1: pass:q[Configure and deploy a Redpanda Connect pipeline to receive OpenTelemetry traces from custom agents through HTTP and publish them to `redpanda.otel_traces`] :learning-objective-2: Validate trace data format and compatibility with existing MCP server traces :learning-objective-3: Secure the ingestion endpoint using authentication mechanisms @@ -142,7 +142,7 @@ gRPC:: The `otlp_grpc` input component: * Exposes an OpenTelemetry Collector gRPC receiver -* Accepts traces via the OTLP gRPC protocol +* Accepts traces through the OTLP gRPC protocol * Converts incoming OTLP data into individual Redpanda OTEL v1 Protobuf messages The following example shows a minimal pipeline configuration. ADP automatically injects authentication handling. @@ -497,8 +497,8 @@ Following the OpenTelemetry semantic conventions, agent spans should include the ** `gen_ai.operation.name` - Set to `"invoke_agent"` for agent execution spans ** `gen_ai.agent.name` - Human-readable name of your agent (displayed in Transcripts view) * LLM provider details: -** `gen_ai.provider.name` - LLM provider identifier (e.g., `"openai"`, `"anthropic"`, `"gcp.vertex_ai"`) -** `gen_ai.request.model` - Model name (e.g., `"gpt-4"`, `"claude-sonnet-4"`) +** `gen_ai.provider.name` - LLM provider identifier (for example, `"openai"`, `"anthropic"`, `"gcp.vertex_ai"`) +** `gen_ai.request.model` - Model name (for example, `"gpt-4"`, `"claude-sonnet-4"`) * Token usage (for cost tracking): ** `gen_ai.usage.input_tokens` - Number of input tokens consumed ** `gen_ai.usage.output_tokens` - Number of output tokens generated @@ -520,10 +520,10 @@ Set these attributes on your spans for proper display and filtering in the Trans | Human-readable name displayed in Transcripts view | 
`gen_ai.provider.name` -| LLM provider (e.g., `"openai"`, `"anthropic"`) +| LLM provider (for example, `"openai"`, `"anthropic"`) | `gen_ai.request.model` -| Model name (e.g., `"gpt-4"`, `"claude-sonnet-4"`) +| Model name (for example, `"gpt-4"`, `"claude-sonnet-4"`) | `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens` | Token counts for cost tracking diff --git a/modules/observability/pages/transcripts.adoc b/modules/observability/pages/transcripts.adoc index b1c5d47..cd7ebfa 100644 --- a/modules/observability/pages/transcripts.adoc +++ b/modules/observability/pages/transcripts.adoc @@ -33,13 +33,13 @@ The transcripts list shows every recent execution across the agents and MCP serv Each row in the list represents one execution (one trace). Columns include: -* *Start time* — when the execution began (sortable). -* *Agent* / *Service* — the agent or MCP server name from the span's `service.name` resource attribute. -* *Status* — `RUNNING`, `COMPLETED`, or `ERROR`. -* *Turn count* — number of turns in the conversation. -* *Total tokens* — sum of input + output tokens across the conversation. -* *USD cost* — total cost for the execution, derived from per-model pricing. See <> if this column shows `0`. -* *Duration* — wall-clock time between the first and last span. +* *Start time*: When the execution began (sortable). +* *Agent* / *Service*: The agent or MCP server name from the span's `service.name` resource attribute. +* *Status*: `RUNNING`, `COMPLETED`, or `ERROR`. +* *Turn count*: Number of turns in the conversation. +* *Total tokens*: Sum of input + output tokens across the conversation. +* *USD cost*: Total cost for the execution, derived from per-model pricing. See <> if this column shows `0`. +* *Duration*: Wall-clock time between the first and last span. A transcript marked _reconstructed_ is one in which some turns were rebuilt from LLM message context after the original spans were evicted from `redpanda.otel_traces`. 
See xref:observability:concepts.adoc#history-reconstruction[Reconstructed transcript history] for what that means. @@ -51,12 +51,12 @@ Use filters to narrow the list to the executions you care about. // TODO: Re-verify every filter chip against the GA UI on adp-production. The list below reflects the beta Console MFE and may have moved. -* *Service* — isolate operations from a single agent, MCP server, or AI Gateway. -* *LLM Calls* — executions that invoked a language model. -* *Tool Calls* — executions that invoked one or more tools. -* *Errors Only* — executions with `TranscriptStatus.ERROR` or a per-turn error. -* *Slow (>5s)* — executions exceeding five seconds end-to-end. -* *Attribute* — exact-match filter on a specific span attribute. Supported keys include agent names, LLM model names, tool names, span/trace IDs, and `gen_ai.conversation.id`. +* *Service*: Isolate operations from a single agent, MCP server, or AI Gateway. +* *LLM Calls*: Executions that invoked a language model. +* *Tool Calls*: Executions that invoked one or more tools. +* *Errors Only*: Executions with `TranscriptStatus.ERROR` or a per-turn error. +* *Slow (>5s)*: Executions exceeding five seconds end-to-end. +* *Attribute*: Exact-match filter on a specific span attribute. Supported keys include agent names, LLM model names, tool names, span/trace IDs, and `gen_ai.conversation.id`. Combine filters to narrow further (for example, *Tool Calls* + *Errors Only* to find failed tool executions). @@ -81,15 +81,15 @@ Click any row to open the transcript detail view. The view has two parts: a summ The summary header reports: -* *Duration* — end-to-end wall-clock time. -* *Status* — `RUNNING`, `COMPLETED`, or `ERROR`. -* *Turn count* — number of turns (SYSTEM, USER, ASSISTANT, TOOL). -* *Tokens* — input, output, and total. -* *USD cost* — total cost, with a breakdown by turn on hover. -* *LLM calls* — number of assistant turns that invoked a model. 
-* *Tool calls* — number of tool invocations across the conversation. -* *Conversation ID* — the `gen_ai.conversation.id` shared by related invocations. Follow it to find earlier or later executions in the same user session. -* *Service* — the agent or MCP server that produced the transcript. +* *Duration*: End-to-end wall-clock time. +* *Status*: `RUNNING`, `COMPLETED`, or `ERROR`. +* *Turn count*: Number of turns (SYSTEM, USER, ASSISTANT, TOOL). +* *Tokens*: Input, output, and total. +* *USD cost*: Total cost, with a breakdown by turn on hover. +* *LLM calls*: Number of assistant turns that invoked a model. +* *Tool calls*: Number of tool invocations across the conversation. +* *Conversation ID*: The `gen_ai.conversation.id` shared by related invocations. Follow it to find earlier or later executions in the same user session. +* *Service*: The agent or MCP server that produced the transcript. // TODO: Verify the exact field set and labels on the GA UI. The list above reflects the proto (`TranscriptSummary`, `TranscriptUsage`) plus the beta UI; the Console MFE may add or rename fields. @@ -97,10 +97,10 @@ The summary header reports: Turns are listed in order by role: -* *SYSTEM* — the system prompt and any priming instructions. -* *USER* — a user message that started or continued the conversation. -* *ASSISTANT* — a response from the LLM. Shows the model, input/output token counts, USD cost for that turn, and latency. If the assistant turn called a tool, its tool calls are nested underneath. -* *TOOL* — a tool invocation. Shows the tool name, the arguments passed, the result, and the latency of the call. +* *SYSTEM*: The system prompt and any priming instructions. +* *USER*: A user message that started or continued the conversation. +* *ASSISTANT*: A response from the LLM. Shows the model, input/output token counts, USD cost for that turn, and latency. If the assistant turn called a tool, its tool calls are nested underneath. +* *TOOL*: A tool invocation. 
Shows the tool name, the arguments passed, the result, and the latency of the call. Any turn may carry the `is_reconstructed` marker. Reconstructed turns preserve role order and the high-level content of the conversation but do not carry per-turn token counts, latency, or tool-call arguments. See xref:observability:concepts.adoc#history-reconstruction[Reconstructed transcript history] for the mechanics. @@ -108,8 +108,8 @@ Any turn may carry the `is_reconstructed` marker. Reconstructed turns preserve r An errored transcript shows `TranscriptStatus.ERROR` in the summary header. The specific failure appears on the turn that raised it, with: -* *Code* — `TranscriptError.code` (for example, a provider error code or `INVALID_ARGUMENT`). -* *Message* — a short description from the LLM provider, the tool, or the agent runtime. +* *Code*: `TranscriptError.code` (for example, a provider error code or `INVALID_ARGUMENT`) +* *Message*: A short description from the LLM provider, the tool, or the agent runtime If the failure happened during a tool call, the error is attached to the TOOL turn; if it was an LLM call, it's on the ASSISTANT turn; if neither, it's on the trace root. @@ -125,7 +125,7 @@ If the failure happened during a tool call, the error is attached to the TOOL tu . Apply the *Slow (>5s)* filter. . Open a slow transcript and scan the per-turn latency column to find the bottleneck turn. -. For tool-bound bottlenecks, expand the tool call to see arguments and result size — a large result often correlates with slow tool execution. +. For tool-bound bottlenecks, expand the tool call to see arguments and result size: a large result often correlates with slow tool execution. === Analyze tool usage @@ -160,7 +160,7 @@ If the failure happened during a tool call, the error is attached to the TOOL tu A transcript stays in `RUNNING` until the root span closes. Common causes: -* The agent or MCP server is still executing (this is normal — wait, or open a newer transcript). 
+* The agent or MCP server is still executing (this is normal; wait, or open a newer transcript). * The root span never flushed because the process was killed mid-execution. Expect this to resolve once the OTLP ingestion lag clears; if it doesn't after several minutes, the trace is likely orphaned. === USD cost shows 0 @@ -172,7 +172,7 @@ If cost is `0` for a transcript that clearly used tokens, check: * The model is in the pricing table. To use a custom rate (negotiated contract, internal chargeback), see xref:governance:budgets.adoc#override-per-model-pricing[Override per-model pricing]. * The cost-reporting pipeline is enabled on your ADP environment. -* The LLM-call spans carry the `gen_ai.usage.*` attributes the pipeline reads — either the token-count inputs (`gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`) or the explicit USD-cost fields listed on the concepts page. +* The LLM-call spans carry the `gen_ai.usage.*` attributes the pipeline reads: either the token-count inputs (`gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`) or the explicit USD-cost fields listed on the concepts page. [NOTE] ==== @@ -190,7 +190,7 @@ For long-running conversations, accept some reconstruction; for short conversati === Transcript missing entirely -* Confirm the agent or MCP server actually ran — check its logs and the corresponding session or task topic. +* Confirm the agent or MCP server actually ran: check its logs and the corresponding session or task topic. * Confirm your user has read access to `redpanda.otel_traces`. // TODO: Replace with the standalone-ADP permission model once available. * Confirm the feature flag enabling Transcripts is on for your environment.
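The USD-cost troubleshooting above hinges on the `gen_ai.usage.*` token counts. As an illustration of the derivation only (the per-token rates below are invented placeholders, not ADP's pricing table):

```shell
# Placeholder rates -- NOT real pricing; they only illustrate the arithmetic.
input_tokens=1200
output_tokens=350
rate_in_per_mtok=2.50    # USD per million input tokens (made up)
rate_out_per_mtok=10.00  # USD per million output tokens (made up)

# cost = (input_tokens * in_rate + output_tokens * out_rate) / 1e6
cost=$(awk -v i="$input_tokens" -v o="$output_tokens" \
           -v ri="$rate_in_per_mtok" -v ro="$rate_out_per_mtok" \
           'BEGIN { printf "%.6f", (i * ri + o * ro) / 1000000 }')
echo "$cost"    # 0.006500
```

If the LLM-call spans lack the token-count attributes, the pipeline has nothing to multiply, which is consistent with a transcript that clearly used tokens still reporting `0`.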
diff --git a/modules/observability/partials/observability-logs.adoc b/modules/observability/partials/observability-logs.adoc index 8c10e8f..62aad40 100644 --- a/modules/observability/partials/observability-logs.adoc +++ b/modules/observability/partials/observability-logs.adoc @@ -374,7 +374,7 @@ Shows: * Full request headers * Full request body (formatted JSON) -* All parameters (temperature, max_tokens, etc.) +* All parameters (temperature, max_tokens, and so on) * Custom headers used for routing Example: @@ -506,8 +506,8 @@ Token Generation Rate: 71 tokens/second 3. Check error message: - * Gateway error: Issue with configuration, rate limits, etc. - * Provider error: Issue with upstream API (OpenAI, Anthropic, etc.) + * Gateway error: Issue with configuration, rate limits, and so on + * Provider error: Issue with upstream API (OpenAI, Anthropic, and so on) 4. Check routing: * Was fallback triggered? (May indicate primary provider issue) @@ -619,7 +619,7 @@ Use case: Chargeback/showback to customers // PLACEHOLDER: Confirm log retention policy -Retention period: // PLACEHOLDER: e.g., 30 days, 90 days, configurable +Retention period: // PLACEHOLDER: for example, 30 days, 90 days, configurable After retention period: @@ -642,7 +642,7 @@ Export logs (if needed for longer retention): 2. Click "Export to CSV" 3. Download includes all filtered logs with full fields -=== Export via API +=== Export through API // PLACEHOLDER: If API is available for log export @@ -693,7 +693,7 @@ AI Gateway does not log (if applicable): If redaction is supported: * Configure redaction rules for specific fields -* Mask PII (email addresses, phone numbers, etc.)
+* Mask PII (email addresses, phone numbers, and so on) * Redact custom header values Example: diff --git a/modules/observability/partials/observability-metrics.adoc b/modules/observability/partials/observability-metrics.adoc index 371b3b2..a196ca5 100644 --- a/modules/observability/partials/observability-metrics.adoc +++ b/modules/observability/partials/observability-metrics.adoc @@ -121,7 +121,7 @@ Breakdowns: * By gateway (for chargeback/showback) * By model (for cost optimization) * By provider (for negotiation leverage) -* By custom header (if configured, e.g., `x-customer-id`) +* By custom header (if configured, for example, `x-customer-id`) Use cases: @@ -165,7 +165,7 @@ Breakdowns: Use cases: * Identify slow models or providers -* Set SLO targets (e.g., "p95 < 2 seconds") +* Set SLO targets (for example, "p95 < 2 seconds") * Detect performance regressions Example insights: @@ -189,7 +189,7 @@ What it shows: Percentage of failed requests over time Metrics: * Total error rate (%) -* Errors by status code (400, 401, 429, 500, etc.) 
+* Errors by status code (400, 401, 429, 500, and so on) * Errors by model * Errors by provider @@ -207,7 +207,7 @@ Breakdowns: Use cases: * Detect provider outages -* Identify configuration issues (e.g., model not enabled) +* Identify configuration issues (for example, model not enabled) * Monitor rate limit breaches Example insights: @@ -226,7 +226,7 @@ Target: Typically 99%+ for production workloads Use cases: * Monitor overall health -* Set up alerts (e.g., "Alert if success rate < 95%") +* Set up alerts (for example, "Alert if success rate < 95%") === Fallback rate @@ -360,7 +360,7 @@ Widgets: * Spend by gateway (stacked bar chart) * Spend by model (pie chart) * Spend by provider (pie chart) -* Spend by custom dimension (if configured, e.g., customer ID) +* Spend by custom dimension (if configured, for example, customer ID) * Spend trend (time series with forecast) * Budget utilization (progress bar: $X / $Y monthly limit) @@ -507,7 +507,7 @@ alerts: Use case: Import into spreadsheet for analysis, reporting -=== Export via API +=== Export through API // PLACEHOLDER: If API is available for metrics @@ -545,7 +545,7 @@ Response: Supported integrations: * *Prometheus*: Native metrics endpoint on port 9090 at `/metrics` -* *OpenTelemetry*: Traces exported to Redpanda topics via the OpenTelemetry exporter +* *OpenTelemetry*: Traces exported to Redpanda topics through the OpenTelemetry exporter == Common analysis tasks @@ -630,7 +630,7 @@ Decision: If mini's error rate is acceptable, save 10x on costs === "Why did costs spike yesterday?" 1. View cost trend graph -2. Identify spike (e.g., Jan 10th: $500 vs usual $100) +2. Identify spike (for example, Jan 10th: $500 vs usual $100) 3. Drill down: * By gateway: Which gateway caused the spike? * By model: Did someone switch to expensive model? 
@@ -807,7 +807,7 @@ Track trends, not point-in-time * Day-to-day variance is normal * Look for week-over-week and month-over-month trends -* Seasonal patterns (e.g., more usage on weekdays) +* Seasonal patterns (for example, more usage on weekdays) == Troubleshoot metrics issues diff --git a/modules/reference/pages/rpk/rpk-ai/rpk-ai-install.adoc b/modules/reference/pages/rpk/rpk-ai/rpk-ai-install.adoc index 1463804..126c001 100644 --- a/modules/reference/pages/rpk/rpk-ai/rpk-ai-install.adoc +++ b/modules/reference/pages/rpk/rpk-ai/rpk-ai-install.adoc @@ -21,7 +21,7 @@ rpk ai install [flags] |=== |*Value* |*Type* |*Description* -|--ai-version |string |Redpanda AI CLI version to install (e.g. 0.1.2) (default "latest"). +|--ai-version |string |Redpanda AI CLI version to install (for example, 0.1.2) (default "latest"). |--force |- |Force install of the Redpanda AI CLI. diff --git a/modules/reference/pages/rpk/rpk-x-options.adoc b/modules/reference/pages/rpk/rpk-x-options.adoc index 64afe79..bc010c3 100644 --- a/modules/reference/pages/rpk/rpk-x-options.adoc +++ b/modules/reference/pages/rpk/rpk-x-options.adoc @@ -1,6 +1,6 @@ = rpk -X -Use the `-X` flag to override any rpk-specific configuration option for a single command, without modifying your rpk profile. Each option follows the form `key=value` — for example, `rpk -X tls.enabled=true` enables TLS for the Kafka API. +Use the `-X` flag to override any rpk-specific configuration option for a single command, without modifying your rpk profile. Each option follows the form `key=value`: for example, `rpk -X tls.enabled=true` enables TLS for the Kafka API. Every `-X` option also has an environment-variable equivalent: prefix with `RPK_` and replace periods (`.`) with underscores (`_`). For example, `tls.enabled` becomes `RPK_TLS_ENABLED`.
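The `-X`-to-environment-variable rule at the end of this diff is mechanical enough to sketch; this illustration is not part of rpk itself:

```shell
# Derive the env-var name per the documented rule: prefix with RPK_ and
# replace periods with underscores (rpk env vars are uppercase).
key="tls.enabled"
env_name="RPK_$(printf '%s' "$key" | tr '.' '_' | tr '[:lower:]' '[:upper:]')"
echo "$env_name"    # RPK_TLS_ENABLED
```

The two forms `rpk -X tls.enabled=true <command>` and `RPK_TLS_ENABLED=true rpk <command>` are then equivalent for that single option.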