
Route LLM calls through AI Gateway for per-request cost tracking#2803

Merged
tim-inkeep merged 40 commits into main from implement/usage-tracker
Mar 25, 2026

Conversation


@tim-inkeep tim-inkeep commented Mar 23, 2026

Usage Tracking

  • Enriched OTel spans at all LLM call sites (main generation, compression, distillation, status updates, artifact metadata, evaluations) with token counts, generation type, and scoping metadata (tenant/project/agent/conversation IDs)
  • Added ai-sdk-callbacks with onFinish / onStepFinish handlers that capture usage.promptTokens, usage.completionTokens, finish reason, and cost data onto OTel spans
  • Added usage_events table to runtime schema for future persistent usage tracking
  • Moved token-estimator from agents-api to agents-core and deprecated heuristic estimation in favor of actual AI SDK usage data

Gateway Routing & Cost Tracking

  • When AI_GATEWAY_API_KEY is set, Anthropic, OpenAI, and Google models are automatically routed through the Vercel AI Gateway via ModelFactory
  • Cost is extracted per-request from providerMetadata.gateway.cost (actual credits debited), falling back to providerMetadata.gateway.marketCost (market rate estimate), then 0
  • Removed PricingService entirely — no more rate card lookups from models.dev or gateway catalog, no alias maps, no silent lookup failures
  • Non-routable providers (Azure, OpenRouter, NIM, Custom) continue to connect directly with cost reported as $0
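
The cost-priority chain above (gateway.cost > gateway.marketCost > 0) can be sketched as a small extraction helper. The function name matches the one mentioned in the PR's file list, but the field types and string-coercion handling are assumptions:

```typescript
// Sketch of the cost extraction priority: actual credits debited first,
// then the market-rate estimate (BYOK case), then 0 when no gateway data exists.
type GatewayMetadata = {
  gateway?: { cost?: number | string; marketCost?: number | string };
};

function extractGatewayCost(providerMetadata: GatewayMetadata | undefined): number {
  const gw = providerMetadata?.gateway;
  const raw = gw?.cost ?? gw?.marketCost ?? 0;
  // Provider metadata values may arrive as strings; coerce defensively.
  const cost = typeof raw === 'string' ? Number.parseFloat(raw) : raw;
  return Number.isFinite(cost) ? cost : 0;
}
```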

Manage UI

  • Added cost dashboard (/cost and /projects/:id/cost) with CostDashboard component
  • Enhanced conversation trace views with per-step cost, token counts, and model info on timeline items
  • Updated SigNoz stats queries for new OTel attribute structure
  • Added cost nav item to sidebar

Docs

  • Added note in models.mdx about automatic gateway routing for cost tracking when AI_GATEWAY_API_KEY is set

Key Technical Decisions

  • Route via gateway at ModelFactory.createModel() level — transparent to all call sites; no changes needed upstream
  • Cost priority: gateway.cost > gateway.marketCost > 0 — cost = actual billing, marketCost = estimate (used for BYOK where credits aren't debited), 0 = no gateway
  • Gateway-routable providers: anthropic, openai, google only — Azure needs resource config, OpenRouter is itself a routing layer, NIM is self-hosted, Custom is arbitrary endpoints
  • Delete PricingService (no fallback) — gateway response metadata is the source of truth; rate card estimation was fragile and inaccurate
  • Keep extractUsageTokens() in middleware — still needed to normalize AI SDK v3 structured / v1 flat usage formats for token recording
  • Deprecate estimateTokens() heuristic — AI SDK provides actual token counts; char/4 estimation kept only for pre-generation checks where SDK data isn't available yet
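
The routing decision in the first row reduces to a small predicate. This is a sketch under the assumptions stated above (a GATEWAY_ROUTABLE_PROVIDERS set and an AI_GATEWAY_API_KEY environment check); the helper name is illustrative, not the PR's code:

```typescript
// Sketch: a model is routed through the AI Gateway only when the key is set
// AND the provider is one of the three routable ones. Azure, OpenRouter, NIM,
// and Custom keep their direct connections.
const GATEWAY_ROUTABLE_PROVIDERS = new Set(['anthropic', 'openai', 'google']);

function shouldRouteThroughGateway(
  provider: string,
  gatewayApiKey: string | undefined
): boolean {
  return Boolean(gatewayApiKey) && GATEWAY_ROUTABLE_PROVIDERS.has(provider);
}
```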

Files Changed

Core (packages/agents-core)

  • model-factory.ts — gateway routing logic, GATEWAY_ROUTABLE_PROVIDERS, wrapLanguageModel with gatewayCostMiddleware
  • usage-cost-middleware.ts — new file: extractGatewayCost() reads from providerMetadata.gateway, extractUsageTokens() normalizes token counts
  • otel-attributes.ts — added GEN_AI_COST_ESTIMATED_USD, generation type constants, scoping attributes
  • token-estimator.ts — moved from agents-api, deprecated
  • usage-tracker.ts — type export for GenerationType
  • runtime-schema.ts — usage_events table definition
  • index.ts — new exports
  • Deleted: pricing-service.ts, pricing-service.test.ts

API (agents-api)

  • ai-sdk-callbacks.ts — onFinish/onStepFinish callbacks that write usage + cost to OTel spans
  • generate.ts — passes callbacks and generation context to AI SDK calls
  • AgentSession.ts — passes usage context to status update and artifact metadata generations
  • distill-utils.ts — passes usage context to distillation/compression calls
  • EvaluationService.ts — passes usage context to eval simulation and scoring calls
  • BaseCompressor.ts, ConversationCompressor.ts, MidGenerationCompressor.ts — pass usage context to compression calls
  • agent-types.ts — extended generation response types with usage fields

Manage UI (agents-manage-ui)

  • cost-dashboard.tsx — new cost analytics dashboard component
  • cost/page.tsx — org-level cost page
  • projects/[projectId]/cost/page.tsx — project-level cost page
  • signoz-stats.ts — updated OTel queries for cost and usage data
  • conversation trace routes — enriched with per-step cost/token data
  • sidebar-nav — added cost navigation

tim-inkeep and others added 20 commits March 19, 2026 14:16
Add append-only usage_events table for tracking LLM generation usage
across all call sites. Includes token counts (input, output, reasoning,
cached), dynamic pricing cost estimate, generation type classification,
and OTel correlation fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two-tier dynamic pricing: gateway getAvailableModels() as primary
(when AI_GATEWAY_API_KEY is set), models.dev API as universal fallback.
In-memory cache with periodic refresh (1h gateway, 6h models.dev).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Insert, query (paginated), and summary aggregation functions for
usage_events table. Supports groupBy model/agent/day/generation_type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
recordUsage() extracts tokens from AI SDK responses, looks up pricing,
sets OTel span attributes, and fire-and-forgets a usage_event insert.
New SPAN_KEYS: total_tokens, reasoning_tokens, cached_read_tokens,
response.model, cost.estimated_usd, generation.step_count, generation.type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add usage, totalUsage, and response fields to ResolvedGenerationResponse.
resolveGenerationResponse now resolves these Promise-based getters from
the AI SDK alongside steps/text/finishReason/output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Call recordUsage() after resolveGenerationResponse in runGenerate(),
capturing tenant/project/agent/subAgent context, model, streaming
status, and finish reason. Fire-and-forget, non-blocking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add recordUsage() calls for status_update and artifact_metadata
generation types in AgentSession. Compression call sites deferred
(need context threading through function signatures).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate estimateTokens() and AssembleResult into
packages/agents-core/src/utils/token-estimator.ts. Update all 10
import sites in agents-api to use @inkeep/agents-core. Removes
duplicate code and prepares for usage tracker integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace recordUsage() with trackedGenerate() — wraps generateText/
streamText calls to automatically record usage on success AND failure.
Failed calls check error type: 429/network = 0 tokens, other errors =
estimated input tokens from prompt. All call sites (generate.ts,
AgentSession status updates + artifact metadata, EvaluationService
simulation) now use the wrapper consistently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GET /manage/v1/usage/summary — aggregated usage by model/agent/day/
generation_type with optional projectId filter.
GET /manage/v1/usage/events — paginated individual usage events with
filters for project, agent, model, generation type.
Both enforce tenant auth with project-level access checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tenant-level usage dashboard at /{tenantId}/usage with:
- Summary stats: total tokens, estimated cost, generation count, models
- Token usage over time chart (daily buckets via AreaChartCard)
- Breakdown tables by model and generation type
- Project filter and date range picker
- Nav item added to sidebar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract UsageDashboard, UsageStatCards, UsageBreakdownTable into
reusable component. Both tenant-level (/{tenantId}/usage) and
project-level (/{tenantId}/projects/{projectId}/usage) pages import
the shared component. Register Usage tag in OpenAPI spec + docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Route handlers use c.get('tenantId') from middleware context
- Client fetches through /api/usage Next.js proxy (forwards cookies)
- Initialize PricingService at server startup for cost estimation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
resolvedModel from the AI SDK doesn't include provider prefix
(e.g. 'claude-sonnet-4-6' not 'anthropic/claude-sonnet-4-6').
Parse requestedModel once at the top and use the extracted modelName
for pricing lookup, falling back to resolvedModel when available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cking

Data layer:
- Add steps JSONB column for per-step token breakdown
- Populate traceId/spanId from active OTel span
- Add conversation/message groupBy + conversationId filter
- Thread agentId/conversationId through compression call chain
- Wrap compression generateText calls with trackedGenerate

Traces integration:
- Conversation detail route fetches usage events and merges cost
  into activities by spanId (with parentSpanId fallback)
- Cost shows on timeline items and span detail panels
- Usage Cost card on conversation detail page

UI:
- Events table with pagination, trace links, agent/sub-agent columns
- 50/50 chart + events layout
- conversationId filter in usage API client

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Write path:
- Remove Postgres insert from persistEvent, keep OTel span attributes
- Add all schema fields as span attributes (requested_model, provider,
  status, streamed, byok, finish_reason, duration_ms, error_code, message_id)
- Add UsageCostSpanProcessor that enriches doGenerate/doStream spans
  with per-step cost from PricingService before export
- Standardize experimental_telemetry.metadata.generationType across all
  call sites (status_update, artifact_metadata, compression)
- Extract USAGE_GENERATION_TYPES constant for shared filter

Read path:
- Add getUsageCostSummary (SigNoz aggregation by model/agent/type/conversation)
- Add getUsageEventsList (SigNoz LIST query for individual spans)
- Dashboard fetches from SigNoz via existing signoz-stats client
- Events table and chart derived from SigNoz span data
- Filter to only spans with known generationType to reduce noise

trackedGenerate no longer takes db parameter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel bot commented Mar 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project — Status — Updated (UTC)
  • agents-api — Ready (Preview, Comment) — Mar 25, 2026 8:11pm
  • agents-docs — Ready (Preview, Comment) — Mar 25, 2026 8:11pm
  • agents-manage-ui — Ready (Preview, Comment) — Mar 25, 2026 8:11pm



changeset-bot bot commented Mar 23, 2026

🦋 Changeset detected

Latest commit: 1df0a2a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages (all Patch):
  • @inkeep/agents-core
  • @inkeep/agents-api
  • @inkeep/agents-manage-ui
  • @inkeep/agents-cli
  • @inkeep/agents-sdk
  • @inkeep/agents-work-apps
  • @inkeep/ai-sdk-provider
  • @inkeep/create-agents
  • @inkeep/agents-email
  • @inkeep/agents-mcp


@tim-inkeep
Contributor Author

@pullfrog, please review my PR

@pullfrog pullfrog bot left a comment

Solid PR — the shift from heuristic token estimation to actual AI SDK usage data is well-motivated and the implementation is clean. The wrapLanguageModel approach to fix pricing lookups is elegant. Two medium-severity issues (potential PricingService interval leak, v3 middleware spec version risk) and a few minor items below.


if (this.initialized) return;
this.initialized = true;

await Promise.allSettled([this.refreshGateway(), this.refreshModelsDev()]);
Medium: initialize() is not idempotent against concurrent callers. The initialized guard prevents re-entry but uses a synchronous boolean — if two callers race past the guard before the first sets this.initialized = true, both will set up duplicate intervals. Consider storing the init promise:

private initPromise: Promise<void> | null = null;

async initialize(): Promise<void> {
  if (!this.initPromise) {
    this.initPromise = this.doInitialize();
  }
  return this.initPromise;
}

Alternatively, guard the interval creation behind this.gatewayInterval == null to be safe.

if (this.modelsDevInterval) clearInterval(this.modelsDevInterval);
this.gatewayInterval = null;
this.modelsDevInterval = null;
this.initialized = false;
Minor: destroy() does not clear initPromise / caches. If someone calls destroy() then initialize() again, this.initialized is false but the caches still contain stale data from the previous lifecycle. Not blocking — the singletons are long-lived in practice — but worth noting for test hygiene.

}

export const usageCostMiddleware: LanguageModelMiddleware = {
specificationVersion: 'v3',
Medium: specificationVersion: 'v3' ties this to an unreleased/experimental middleware API version. If the AI SDK ships a breaking change to the v3 spec (usage shape, callback signatures), this will silently break cost tracking. Confirm this version is stable in the ai package version pinned in your lockfile. If not, add a comment noting the version dependency.

const result = await doGenerate();

try {
const inputTokens = result.usage.inputTokens.total ?? 0;
Minor: result.usage.inputTokens.total assumes a nested .total property. This matches the v3 spec's structured usage shape, but the old v1/v2 shape used flat inputTokens: number. If any codepath bypasses wrapLanguageModel and hits this middleware with the old shape, it will throw. The try/catch on line 77 guards against this, so it's safe — just noting the implicit contract.
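
The implicit contract the comment describes can be made explicit with a tolerant normalizer that accepts both shapes. This is a sketch of the idea behind extractUsageTokens(), with assumed types, not the PR's actual implementation:

```typescript
// Sketch: normalize AI SDK usage whether it arrives in the v3 structured shape
// ({ inputTokens: { total } }) or the older flat shape ({ inputTokens: number }).
type StructuredCount = { total?: number };
type UsageShape = {
  inputTokens?: number | StructuredCount;
  outputTokens?: number | StructuredCount;
};

function normalizeCount(value: number | StructuredCount | undefined): number {
  if (typeof value === 'number') return value; // flat v1/v2 shape
  return value?.total ?? 0; // structured v3 shape, or missing
}

function extractUsageTokens(usage: UsageShape | undefined): { input: number; output: number } {
  return {
    input: normalizeCount(usage?.inputTokens),
    output: normalizeCount(usage?.outputTokens),
  };
}
```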

`To access other models, use OpenRouter (openrouter/model-id), Vercel AI Gateway (gateway/model-id), NVIDIA NIM (nim/model-id), or Custom OpenAI-compatible (custom/model-id).`
);
}
return wrapLanguageModel({
The modelId: modelString here passes the full provider/model-name string (e.g. anthropic/claude-sonnet-4). This is what calculateAndSetCost receives as modelId, and then it splits on / to extract the model name when providerId is present (line 29 of usage-cost-middleware.ts). This works correctly — just confirming the data flow is intentional since the middleware does its own parsing.

if (hasReliableUsage) {
// Use actual token counts from the last completed step
// Next step's context ≈ last step's input + last step's output (assistant response appended)
totalTokens = actualInputTokens + (actualOutputTokens ?? 0);
Correctness check: totalTokens = actualInputTokens + (actualOutputTokens ?? 0) approximates the next step's context size as "last input + last output". This is a good heuristic but slightly oversimplifies — the output gets appended as a new assistant message, so the actual input for the next step includes the original context plus the output tokens, which is what inputTokens already captures for the current step. So the formula effectively double-counts the prior context. In practice this is conservative (triggers compression earlier), which is arguably safer. Worth documenting the rationale.

safetyBuffer,
triggerAt,
remaining: hardLimit - totalTokens,
source: steps.length > 0 ? 'actual_sdk_usage' : 'estimated',
Nit: source: steps.length > 0 ? 'actual_sdk_usage' : 'estimated' — at this point in the code, we're inside the compressionNeeded branch. The source was already determined above, but this ternary re-derives it from steps.length which doesn't account for the hasReliableUsage check (e.g. steps.length > 0 but inputTokens was 0 → fell back to estimate). Consider using a local source variable set at the decision point.

// USAGE GENERATION TYPES (table removed — usage now tracked via OTel/SigNoz)
// ============================================================================

import { USAGE_GENERATION_TYPES } from '../../constants/otel-attributes';
Importing from ../../constants/otel-attributes inside a schema file is a bit unusual — it creates a dependency from the DB schema layer to the telemetry constants layer. Since this is just a type re-export and the comment says "table removed — usage now tracked via OTel/SigNoz", it makes sense, but consider whether USAGE_GENERATION_TYPES + GenerationType belong in otel-attributes.ts or in a shared usage-types.ts to keep the schema file focused on DB concerns.

}),
};

const result = await generateText(genConfig as Parameters<typeof generateText>[0]);
The as Parameters<typeof generateText>[0] cast here and in several other places (AgentSession.ts, EvaluationService.ts) suggests the config object doesn't naturally satisfy the generateText parameter type. This is a known pattern when building configs incrementally, but the number of casts in this PR is growing. Not blocking — just flagging for awareness.

Comment on lines +60 to +67
const MODEL_ALIASES: Record<string, string[]> = {
'claude-sonnet-4': ['claude-sonnet-4'],
'claude-opus-4': ['claude-opus-4'],
'claude-haiku-3.5': ['claude-3-5-haiku', 'claude-3.5-haiku'],
'claude-sonnet-3.5': ['claude-3-5-sonnet', 'claude-3.5-sonnet'],
'claude-opus-3': ['claude-3-opus'],
'claude-haiku-3': ['claude-3-haiku'],
};
The alias map is Anthropic-only right now. OpenAI, Google, and other providers have similar aliasing needs (e.g. gpt-4o vs gpt-4o-2024-08-06). This is fine as a starting point — the stripDateSuffix regex handles the most common case — but the map will need expansion as users hit pricing misses for other providers.
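
The lookup path implied here (date-suffix strip, then alias expansion) could be sketched as follows. The regex in stripDateSuffix and the candidateNames helper are assumptions for illustration; only MODEL_ALIASES mirrors the snippet above:

```typescript
// Sketch: normalize a model name by stripping a trailing -YYYYMMDD date suffix,
// then expand through the alias map to get every name worth trying in a
// pricing lookup. stripDateSuffix's exact pattern is an assumption.
const MODEL_ALIASES: Record<string, string[]> = {
  'claude-haiku-3.5': ['claude-3-5-haiku', 'claude-3.5-haiku'],
};

function stripDateSuffix(modelName: string): string {
  // e.g. claude-sonnet-4-20250514 -> claude-sonnet-4
  return modelName.replace(/-\d{8}$/, '');
}

function candidateNames(modelName: string): string[] {
  const base = stripDateSuffix(modelName);
  return [base, ...(MODEL_ALIASES[base] ?? [])];
}
```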


pullfrog bot commented Mar 23, 2026

TL;DR — Replaces the inaccurate character-based token estimation heuristic with actual AI SDK token usage for mid-generation compression decisions, adds a PricingService that resolves model costs from AI Gateway and models.dev, wraps every language model with cost-tracking middleware that writes gen_ai.cost.estimated_usd to OTEL spans, and introduces a Usage Dashboard in the manage UI for visualizing costs and token consumption.

Key changes

  • Use actual AI SDK StepResult.usage for compression — Mid-generation compression now reads inputTokens + outputTokens from the last completed step instead of the text.length / 4 heuristic, falling back to estimates only on step 0 or when usage data is unavailable.
  • Add PricingService with dual-source pricing lookup — New service fetches per-model pricing from AI Gateway (hourly) and models.dev (6-hourly), with alias resolution, date-suffix stripping, and deduplicated miss logging.
  • Wrap all models with usageCostMiddleware — Every non-mock LanguageModel from ModelFactory is now wrapped to intercept generate/stream results, look up pricing, and record estimated cost on the active OTEL span.
  • Enrich OTEL span attributes for usage telemetry — 30+ new span keys (gen_ai.cost.estimated_usd, gen_ai.generation.type, gen_ai.provider, etc.) and a USAGE_GENERATION_TYPES enum classify every generation call.
  • Consolidate estimateTokens into @inkeep/agents-core — The local token-estimator in agents-api is deleted; the canonical version in agents-core is marked @deprecated with documented accepted usages.
  • Add Usage Dashboard UI — New /usage pages at tenant and project levels display cost summaries, model breakdowns, generation-type breakdowns, cost-over-time charts, and detailed event tables sourced from SigNoz.
  • Show cost and token data in conversation trace timeline — Timeline items now display token counts and estimated cost from OTEL spans.

Summary | 46 files | 22 commits | base: main ← implement/usage-tracker


Actual token usage for mid-generation compression

Before: Compression decisions used calculateContextSize() — a text.length / 4 heuristic — which could trigger compression too early or too late.
After: The prepareStep callback now passes the AI SDK steps array; handlePrepareStepCompression reads steps[last].usage.inputTokens and outputTokens for the real count, falling back to estimates with a warning log only when actual data is unavailable (step 0 or undefined usage).

The new isCompressionNeededFromActualUsage(totalTokens) method on MidGenerationCompressor compares actual tokens against the configured threshold directly, bypassing estimation entirely.

When does the fallback kick in? On the very first step (step 0) there is no prior usage data, and some providers may return undefined or 0 for inputTokens. In both cases the system logs a warning and uses the character-based estimate as a safety net.
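
That decision logic can be sketched as a single function: real usage wins only when the last step reported a positive input count; otherwise the char/4 estimate is the safety net. The names (StepUsage, contextTokens) and the serialized-messages parameter are illustrative assumptions:

```typescript
// Sketch of the compression-check fallback: use actual SDK usage from the last
// completed step when it looks reliable, else fall back to the char/4 heuristic
// (step 0, or a provider that returned undefined/0 for inputTokens).
interface StepUsage {
  inputTokens?: number;
  outputTokens?: number;
}

function contextTokens(steps: StepUsage[], serializedMessages: string): number {
  const last = steps[steps.length - 1];
  const hasReliableUsage = last !== undefined && (last.inputTokens ?? 0) > 0;
  if (hasReliableUsage) {
    // Approximate next step's context as last input + last output.
    return (last.inputTokens ?? 0) + (last.outputTokens ?? 0);
  }
  // Character-based estimate: roughly 4 chars per token.
  return Math.ceil(serializedMessages.length / 4);
}
```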

ai-sdk-callbacks.ts · generate.ts · MidGenerationCompressor.ts


Pricing service and cost middleware

Before: No pricing data or cost tracking existed anywhere in the system.
After: PricingService fetches model pricing from AI Gateway and models.dev on startup, usageCostMiddleware wraps every language model to compute per-call cost, and the result is recorded as gen_ai.cost.estimated_usd on the active OTEL span.

PricingService handles model name normalization — stripping date suffixes (e.g. claude-sonnet-4-20250514 → claude-sonnet-4), resolving static aliases (e.g. claude-sonnet-4 → claude-3-5-sonnet), and scoping lookups by provider. Cost calculation accounts for five token types: input, output, reasoning, cached-read, and cached-write.

How are pricing misses handled? When a model has no pricing entry, the miss is logged once per refresh cycle (deduplicated via a Set of model IDs) so operators are alerted without log spam.

pricing-service.ts · usage-cost-middleware.ts · model-factory.ts · index.ts


Enriched OTEL span attributes

Before: Generation spans carried minimal metadata (subAgentId, phase).
After: Spans now include gen_ai.cost.estimated_usd, gen_ai.generation.type, gen_ai.provider, gen_ai.generation.status, gen_ai.generation.duration_ms, context.breakdown.actual_input_tokens, and tenant/project/agent/conversation/session IDs.

The USAGE_GENERATION_TYPES enum classifies every generation call (sub_agent_generation, mid_generation_compression, conversation_compression, status_update, artifact_metadata, eval_simulation, eval_scoring), enabling downstream aggregation by type.
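
Written out, the seven types listed above form a const tuple plus a type guard. The constant name matches the PR's otel-attributes.ts export, but the exact shape (tuple vs enum) and the guard helper are assumptions:

```typescript
// The generation-type values enumerated in the prose above, as a const tuple.
const USAGE_GENERATION_TYPES = [
  'sub_agent_generation',
  'mid_generation_compression',
  'conversation_compression',
  'status_update',
  'artifact_metadata',
  'eval_simulation',
  'eval_scoring',
] as const;

type GenerationType = (typeof USAGE_GENERATION_TYPES)[number];

// Narrowing guard, e.g. for filtering SigNoz spans to known generation types.
function isUsageGenerationType(value: string): value is GenerationType {
  return (USAGE_GENERATION_TYPES as readonly string[]).includes(value);
}
```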

otel-attributes.ts · AgentSession.ts · distill-utils.ts


Usage Dashboard UI

Before: No cost or usage visibility in the management interface.
After: New /usage pages at tenant and project levels display stat cards (total tokens, estimated cost, generation count, models used), breakdown tables by model and generation type, a cost-over-time area chart, and a detailed events table linking to conversation traces.

Data is fetched from SigNoz via two new query methods — getUsageCostSummary() for aggregated data and getUsageEventsList() for individual span-level events. A "Cost" nav item is added to both the tenant and project sidebars.

usage-dashboard.tsx · signoz-stats.ts · usage/page.tsx (tenant) · usage/page.tsx (project)



pullfrog bot commented Mar 23, 2026

TL;DR — Replaces the text.length / 4 token estimation heuristic with actual AI SDK StepResult.usage token counts for mid-generation compression decisions, adds a new PricingService that resolves model costs from AI Gateway and models.dev, and wires automatic cost annotation into every model call via usageCostMiddleware. Includes a new usage dashboard UI backed by SigNoz/OTel span queries.

Key changes

  • Use actual SDK token counts for compression — handlePrepareStepCompression now reads inputTokens + outputTokens from the last completed AI SDK step instead of estimating from serialized message length, falling back to the old heuristic only on step 0.
  • New PricingService with dual-source lookup — Fetches model pricing from AI Gateway and models.dev with periodic refresh, date-suffix stripping, and a Claude alias map for model name normalization.
  • usageCostMiddleware for automatic cost annotation — A LanguageModelMiddleware that intercepts completions, calculates cost via PricingService, and writes gen_ai.cost.estimated_usd to the active OTel span.
  • ModelFactory wraps all models with cost middleware — wrapLanguageModel is applied to every non-mock model with correct modelId and providerId propagation, fixing the pricing lookup context.
  • Expanded OTel attributes for usage tracking — 20+ new SPAN_KEYS covering generation type, tenant/project/agent/conversation IDs, cost, reasoning tokens, cached tokens, and finish reason.
  • Generation telemetry enrichment across all call sites — generate.ts, AgentSession, artifact metadata, and distill utilities now emit generationType, tenantId, projectId, agentId, and conversationId in telemetry metadata.
  • estimateTokens moved to agents-core and deprecated — Token estimator relocated from agents-api to packages/agents-core/src/utils/token-estimator.ts and marked @deprecated.
  • Usage dashboard UI — New UsageDashboard component with stat cards, breakdown tables by model and generation type, cost-over-time chart, and events list.
  • Usage pages at org and project level — New routes at /{tenantId}/usage and /{tenantId}/projects/{projectId}/usage with sidebar "Cost" nav items.
  • SigNoz API methods for usage queries — getUsageCostSummary and getUsageEventsList aggregate and list usage data from OTel spans.
  • Per-span cost in traces timeline — ActivityItem gains costUsd and timeline items display estimated cost inline for AI generation spans.

Summary | 46 files | 21 commits | base: main ← implement/usage-tracker


Actual SDK token counts for compression decisions

Before: isCompressionNeeded serialized messages to JSON and used estimateTokens(text) (~4 chars per token heuristic) to decide when to compress.
After: isCompressionNeededFromActualUsage uses inputTokens + outputTokens from the AI SDK's StepResult.usage, falling back to the estimate only on step 0 or when the provider returns 0.

The prepareStep callback signature now receives { messages, steps } instead of just { messages }. handlePrepareStepCompression extracts the last step's usage data and calls the new isCompressionNeededFromActualUsage method on MidGenerationCompressor, which compares against hardLimit - safetyBuffer. The old isCompressionNeeded(messages[]) method is marked @deprecated.
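
The threshold comparison that isCompressionNeededFromActualUsage performs reduces to one line. A minimal sketch, assuming the hardLimit/safetyBuffer parameters described in the prose (the real method lives on MidGenerationCompressor and presumably reads these from config):

```typescript
// Sketch: compression triggers once actual token usage reaches
// hardLimit - safetyBuffer, bypassing estimation entirely.
function isCompressionNeededFromActualUsage(
  totalTokens: number,
  hardLimit: number,
  safetyBuffer: number
): boolean {
  const triggerAt = hardLimit - safetyBuffer;
  return totalTokens >= triggerAt;
}
```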

When does the fallback heuristic still fire?

On step 0 (no prior steps exist) and when a provider returns undefined or 0 for token counts — ensuring compression still triggers for providers that don't report usage.

ai-sdk-callbacks.ts · MidGenerationCompressor.ts · BaseCompressor.ts


Pricing service and cost middleware

Before: No pricing data existed — token costs were not tracked anywhere.
After: PricingService fetches from AI Gateway and models.dev on startup, resolves costs per input/output/cache token, and usageCostMiddleware writes gen_ai.cost.estimated_usd to every generation's OTel span.

PricingService normalizes model names by stripping date suffixes (e.g. claude-sonnet-4-20250514 → claude-sonnet-4) and applying a static alias map for Claude model families. Pricing misses are logged once per refresh cycle to avoid noise. ModelFactory.createLanguageModel wraps every non-mock model with wrapLanguageModel({ middleware: usageCostMiddleware, modelId, providerId }), ensuring the middleware always has provider context for lookups.

How does the dual-source pricing lookup work?

The service first checks AI Gateway pricing, then falls back to models.dev. It refreshes the AI Gateway source every hour and models.dev every 6 hours. A combined lookup map is built at each refresh, and model names are normalized through date-suffix stripping and alias resolution before lookup.
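
The source-priority part of that lookup can be sketched with two map-backed caches standing in for the refreshed gateway and models.dev data. The function signature and PricingEntry shape are assumptions for illustration:

```typescript
// Sketch of the dual-source lookup order: the AI Gateway cache (refreshed
// hourly) wins over the models.dev cache (refreshed every 6 hours).
interface PricingEntry {
  inputPerMTok: number;
  outputPerMTok: number;
}

function lookupPricing(
  modelName: string,
  gatewayCache: Map<string, PricingEntry>,
  modelsDevCache: Map<string, PricingEntry>
): PricingEntry | undefined {
  return gatewayCache.get(modelName) ?? modelsDevCache.get(modelName);
}
```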

pricing-service.ts · usage-cost-middleware.ts · model-factory.ts


OTel instrumentation and generation telemetry

Before: Generation metadata carried only an operation field with no tenant, project, or agent context.
After: Every generation emits generationType, tenantId, projectId, agentId, and conversationId — plus 20+ new span attributes for cost, reasoning tokens, cached tokens, and generation status.

All generation call sites — generate.ts, AgentSession status updates, artifact metadata, and distill-utils — now populate telemetry metadata with full context. SPAN_KEYS in otel-attributes.ts gains constants for GEN_AI_COST_ESTIMATED_USD, GENERATION_TYPE, GENERATION_STATUS, GENERATION_DURATION_MS, GENERATION_IS_BYOK, GENERATION_IS_STREAMED, and token-level breakdowns.

otel-attributes.ts · generate.ts · AgentSession.ts


Usage dashboard UI and SigNoz API integration

Before: No visibility into token usage or costs in the Manage UI.
After: Org-level and project-level usage pages show stat cards (total tokens, estimated cost, generation count, models used), breakdown tables, a cost-over-time chart, and an events list — all queried from SigNoz spans.

UsageDashboard renders at /{tenantId}/usage (with a project filter dropdown) and /{tenantId}/projects/{projectId}/usage (pre-filtered). The sidebar gains a "Cost" nav item at both levels. SigNozStatsAPI adds getUsageCostSummary (aggregated by model/agent/type/conversation) and getUsageEventsList (individual span events). The traces timeline also gains inline cost display for AI generation spans.

usage-dashboard.tsx · signoz-stats.ts · usage/page.tsx


@claude claude bot left a comment

PR Review Summary

(0) Total Issues | Risk: Low

This is a delta review covering 5 commits since the last automated review (2140a3678).

✅ All Prior Issues Resolved

All Critical, Major, and Minor issues from the 11 prior automated review cycles have been addressed:

| Issue | Status |
| --- | --- |
| 🔴 External HTTP call without timeout | ✅ Fixed — AbortSignal.timeout(10_000) added to models.dev fetch |
| 🟠 Nested property access may throw TypeError | ✅ Fixed — extractUsageTokens() helper handles both nested and flat usage shapes |
| 🟠 initialize() not idempotent | ✅ Fixed — initPromise pattern ensures single initialization |
| 🟠 Serverless optimization | ✅ Fixed — On-access refresh strategy, no setInterval timers |
| 🟡 Project filter state not persisted in URL | ✅ Fixed — Now uses useQueryState('projectId', parseAsString) |
| 🟡 Unused SPAN_KEYS constants | ✅ Fixed — Trimmed to only 4 used GEN_AI_* constants |
| 🟡 Hardcoded gen_ai.cost.pricing_unavailable string | ✅ Fixed — Uses SPAN_KEYS.GEN_AI_COST_PRICING_UNAVAILABLE |

Delta Changes Reviewed

| Commit | Summary | Assessment |
| --- | --- | --- |
| 026772a45 | Removed inkeep-agents-manage-ui from service name filter | ✅ Intentional — component render events won't be tracked in cost dashboard |
| 57519243a | Trimmed unused SPAN_KEYS, updated projectId persistence | ✅ Addresses @shagun-singh-inkeep's feedback |
| f29e54b72 | Major signoz-stats refactoring | ✅ Cleaner query construction |
| 7308c05cb, a2994de20 | Merge commits from main | ✅ Unrelated to usage-tracker feature |

Implementation Quality ✅

The PR implements end-to-end LLM cost tracking with solid patterns:

  • PricingService: Dual-source lookup (AI Gateway + models.dev) with lazy initialization and on-access refresh (serverless-optimized)
  • usageCostMiddleware: AI SDK v3 middleware with extractUsageTokens() handling both nested and flat usage shapes
  • Cost Dashboard: URL-persisted filters (useQueryState), time range presets, project filtering
  • Test Coverage: Comprehensive tests for pricing-service.ts (285 lines) and usage-cost-middleware.ts (296 lines)
  • OTEL Attributes: Clean constant organization with only used keys exported
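The extractUsageTokens() helper mentioned above could look roughly like this. Hedged sketch only: the two usage shapes (flat and nested) are described in the review, but the exact field names and fallback behavior here are assumptions:

```typescript
// Tolerates both a flat usage object and one nesting the counts under `usage`.
// Field names are assumed from the AI SDK's promptTokens/completionTokens convention.
interface TokenCounts {
  promptTokens: number;
  completionTokens: number;
}

function extractUsageTokens(usage: unknown): TokenCounts {
  const u = (usage ?? {}) as Record<string, any>;
  // Flat shape: { promptTokens, completionTokens }
  if (typeof u.promptTokens === "number" || typeof u.completionTokens === "number") {
    return { promptTokens: u.promptTokens ?? 0, completionTokens: u.completionTokens ?? 0 };
  }
  // Nested shape: { usage: { promptTokens, completionTokens } }
  const nested = u.usage as Record<string, any> | undefined;
  return {
    promptTokens: nested?.promptTokens ?? 0,
    completionTokens: nested?.completionTokens ?? 0,
  };
}
```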

Test Coverage Verified

Both new utility files have comprehensive test coverage:

| File | Tests | Coverage |
| --- | --- | --- |
| pricing-service.test.ts | 15 tests | calculateCost, getModelPricing, initialize idempotency, date suffix stripping, error handling, destroy lifecycle |
| usage-cost-middleware.test.ts | 14 tests | v3 structured usage, flat usage, pricing_unavailable attribute, no-span scenarios, model ID parsing |

✅ APPROVE

Summary: After 11 review iterations and comprehensive human reviewer feedback from @shagun-singh-inkeep, this PR is production-ready. All Critical, Major, and Minor issues have been properly addressed. The implementation follows best practices for:

  • Serverless optimization — Lazy init with stale-while-revalidate refresh pattern
  • Error handling — Graceful degradation when pricing unavailable
  • Type safety — Handles both AI SDK v3 structured and flat usage shapes
  • Observability — Proper OTEL span attributes for cost tracking
  • UI/UX — URL-persisted filters for shareable dashboard state

Ship it! 🚀

Note: Unable to submit formal GitHub approval due to permission constraints — this is a COMMENT review with approval recommendation.

Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
| --- | --- | --- | --- | --- | --- | --- | --- |
| orchestrator (delta) | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 5 | 0 | 0 | 0 | 0 | 0 | 0 |

Note: Delta review verified all prior findings addressed. No new issues found.

@itoqa

itoqa bot commented Mar 24, 2026

Ito Test Report ✅

15 test cases ran. 15 passed.

The unified QA run passed all 15/15 test cases with zero failures, confirming stable behavior across cost, AI-calls, conversation-trace, and component render-generation flows in local non-production testing. Key findings: access controls and deep-link login gating worked correctly; filtering and time-range interactions (including empty and future ranges plus rapid toggling) remained coherent without crashes; usage/cost totals and timeline estimated costs were accurate; mobile cost pages and trace navigation were usable; and security checks (cross-project tampering, malformed IDs, and query-parameter XSS payloads) produced safe denied/inert outcomes with no data leakage or exposure of backend internals.

✅ Passed (15)
| Category | Summary | Screenshot |
| --- | --- | --- |
| Adversarial | Unauthenticated direct access to /default/cost redirected to login with returnUrl; protected cost content stayed hidden. | ADV-1 |
| Adversarial | Tampered projectId query and unauthorized conversation deep-link stayed empty-safe/denied with no foreign trace exposure. | ADV-2 |
| Adversarial | Script payloads in query params remained inert and window.__xss stayed undefined across cost and AI-calls routes. | ADV-3 |
| Adversarial | Malformed conversation ID route shows safe error behavior without exposing backend internals. | ADV-4 |
| Edge | A far-past custom range rendered correct empty states with zeroed totals and no stale or malformed rows. | EDGE-1 |
| Edge | Future custom end dates are tolerated via end-time clamping to now-1ms. | EDGE-2 |
| Logic | Tenant and project AI Calls pages use the intended usage-scoped query paths. | LOGIC-2 |
| Mobile | Mobile 390x844 cost pages remained usable and View trace navigation to conversation details worked as expected. | MOBILE-1 |
| Rapid | Repeated rapid project-filter and preset toggling settled to a coherent final state without runtime error overlays. | RAPID-1 |
| Happy-path | Sidebar Cost navigation opened /default/cost and rendered the Cost & Token Usage page shell. | ROUTE-1 |
| Happy-path | Project filter apply/remove behavior worked: selecting a project set projectId in the URL and removing the filter cleared it back to all projects. | ROUTE-3 |
| Happy-path | Tenant and project cost pages handled preset transitions (30d/24h/7d) with consistent query-state updates and stable widgets. | ROUTE-4 |
| Happy-path | AI Usage & Cost summary totals align with the same usage event rows used for per-event rendering. | ROUTE-6 |
| Happy-path | Timeline row and expanded detail panel both surface estimated cost for eligible AI activity. | ROUTE-7 |
| Happy-path | Data and artifact component render generation completed in new and modify modes with streaming endpoint responses observed. | ROUTE-8 |
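The end-time clamping behavior EDGE-2 verifies can be sketched as a one-line guard. The function name and signature are illustrative, not taken from the PR's code:

```typescript
// Clamp a user-supplied range end to just before "now" so future end dates
// never reach the query layer. Times are epoch milliseconds.
function clampRangeEnd(endMs: number, nowMs: number = Date.now()): number {
  return Math.min(endMs, nowMs - 1); // now-1ms, matching the observed behavior
}
```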

Commit: 2eb1617

View Full Run



@tim-inkeep tim-inkeep changed the title Use actual AI SDK token usage for compression and fix pricing lookup Route LLM calls through AI Gateway for per-request cost tracking Mar 25, 2026
@claude claude bot left a comment

PR Review Summary

(0) Total Issues | Risk: Low

This is a delta review covering 2 commits since the last automated review (026772a45f0d).

✅ Clean Architectural Simplification

The delta removes the PricingService entirely (573 lines) and simplifies cost tracking to use the gateway's own response metadata — exactly what was needed for serverless optimization.

| Change | Impact |
| --- | --- |
| Deleted pricing-service.ts (288 lines) | ✅ No more interval-based refresh, no dual-source lookups |
| Deleted pricing-service.test.ts (285 lines) | ✅ Tests no longer needed |
| Simplified usage-cost-middleware.ts | ✅ Now extracts cost from providerMetadata.gateway.cost directly |
| Added GATEWAY_ROUTABLE_PROVIDERS | ✅ Auto-routes anthropic/openai/google through gateway when AI_GATEWAY_API_KEY set |
| Removed GEN_AI_COST_PRICING_UNAVAILABLE | ✅ No longer needed — gateway provides cost or we log a warning |
| Updated tests (293 lines) | ✅ Comprehensive coverage of new approach |
| Documentation note | ✅ Added note about automatic gateway routing |

Implementation Quality ✅

1. Gateway routing logic — Clean conditional routing:

```typescript
const shouldRouteViaGateway =
  !!process.env.AI_GATEWAY_API_KEY &&
  (GATEWAY_ROUTABLE_PROVIDERS as readonly string[]).includes(provider) &&
  Object.keys(providerConfig).length === 0;
```
  • Only activates when gateway API key is set
  • Only for supported providers (anthropic, openai, google)
  • Preserves direct connections for custom configurations

2. Cost extraction — Simple priority chain:

```typescript
// gateway.cost (actual billing) → gateway.marketCost (BYOK estimate) → 0
const cost = parseFloat(gw.cost as string);
if (!Number.isNaN(cost) && cost > 0) return cost;
const marketCost = parseFloat(gw.marketCost as string);
if (!Number.isNaN(marketCost) && marketCost > 0) return marketCost;
return 0;
```
  • Logs a warning when a call was routed through the gateway but no cost data is present (lines 53-58)

3. Test coverage — Comprehensive:

  • Cost extraction from gateway.cost and marketCost fallback
  • BYOK scenario (cost=0 with marketCost available)
  • Non-numeric strings handled gracefully
  • No active span scenario
  • extractUsageTokens for v3 structured and flat formats
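The priority chain quoted in the review can be wrapped as a self-contained function. The providerMetadata.gateway shape (cost, marketCost) comes from the PR description; the function name extractGatewayCost is illustrative:

```typescript
// Extract per-request cost from gateway response metadata.
// Priority: gateway.cost (actual credits debited) → gateway.marketCost
// (BYOK market estimate) → 0.
function extractGatewayCost(gw: { cost?: unknown; marketCost?: unknown } | undefined): number {
  if (!gw) return 0;
  const cost = parseFloat(gw.cost as string); // actual billing
  if (!Number.isNaN(cost) && cost > 0) return cost;
  const marketCost = parseFloat(gw.marketCost as string); // BYOK estimate
  if (!Number.isNaN(marketCost) && marketCost > 0) return marketCost;
  return 0;
}
```

Note that a cost of "0" (the BYOK case) deliberately falls through to marketCost, matching the test scenario listed above.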

Prior Issues Status

All prior issues from 12+ automated review cycles have been addressed. This delta directly resolves the serverless optimization concern from @robert-inkeep:

"@claude there's a network call being made for the pricing, can you check how this should be optimized, given we are running this with serverless vercel"

Resolved — No more external network calls for pricing. Cost comes directly from the gateway response metadata that's already part of the LLM call result.


✅ APPROVE

Summary: Excellent simplification. The PR removes 573 lines of complex pricing service code and replaces it with a ~15-line cost extraction function that uses the gateway's own response metadata. This is the right architectural choice — the gateway knows the actual cost, so there's no need for fragile rate card lookups. Tests are comprehensive, documentation is updated, and serverless compatibility is now guaranteed. Ship it! 🚀

Note: Unable to submit formal GitHub approval due to permission constraints — this is a COMMENT review with approval recommendation.

Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
| --- | --- | --- | --- | --- | --- | --- | --- |
| orchestrator | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Note: Delta review of a clean simplification — no sub-reviewers dispatched as changes are straightforward architectural cleanup.

@claude claude bot left a comment

PR Review Summary

(0) Total Issues | Risk: Low

This is a delta review covering 12 commits since the last automated review (df5ec748210c).

✅ Delta Analysis

The delta contains no changes to the usage-tracker feature. All commits are merge commits from main bringing in unrelated changes:

| Commit | Description | Impact on PR |
| --- | --- | --- |
| 1df0a2af7 | Merge branch 'main' | N/A — merge commit |
| 1199d452f | PDF URL attachments (#2804) | Unrelated — file upload feature |
| 96aee7243 | upgrade create-agents-template (#2832) | Unrelated — template upgrade |
| 0ec1c5c25 | Update deploy-signoz-nango.mdx (#2831) | Unrelated — docs update |
| 4c57d6b8c | Version Packages (#2812) | Unrelated — version bumps |
| ef7987f69, 1b37b2966, 82bd424e4 | UI component unification (#2829, #2830, #2810) | Unrelated — shadcn refactor |
| 249183068, 8a0ab005c, 6b9498616 | Skills updates (#2824, #2825, #2826) | Unrelated — AI tooling |
| 296c79aeb | Sidebar tooltip fix (#2811) | Unrelated — UI fix |

✅ Core Feature Unchanged

The usage-tracker implementation files remain stable:

  • usage-cost-middleware.ts — No changes
  • model-factory.ts — No changes
  • otel-attributes.ts — No changes
  • ai-sdk-callbacks.ts — No changes
  • cost-dashboard.tsx — No changes
  • signoz-stats.ts — No changes

✅ Prior Issues Status

All Critical, Major, and Minor issues from the 14 prior automated review cycles have been addressed:

| Category | Status |
| --- | --- |
| 🔴 Critical issues | ✅ All resolved |
| 🟠 Major issues | ✅ All resolved |
| 🟡 Minor issues | ✅ All resolved |
| 👤 Human feedback (@shagun-singh-inkeep) | ✅ All addressed |

Implementation Summary

The PR successfully implements end-to-end LLM cost tracking:

  1. Gateway Routing — When AI_GATEWAY_API_KEY is set, anthropic/openai/google models are automatically routed through Vercel AI Gateway
  2. Cost Extraction — gatewayCostMiddleware extracts cost from providerMetadata.gateway.cost (actual billing) → marketCost (BYOK estimate) → 0
  3. PricingService Removal — Deleted 573 lines of fragile rate card lookup code in favor of gateway response metadata
  4. Comprehensive Tests — 293 lines of test coverage for the middleware
  5. Cost Dashboard — New UI for cost analytics with project filtering and time range presets
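The gateway routing decision summarized in point 1 can be expressed as a standalone predicate. The provider names come from the PR; the function signature is illustrative (the real check reads process.env.AI_GATEWAY_API_KEY inline):

```typescript
// Providers that can be auto-routed through the Vercel AI Gateway (per the PR).
const GATEWAY_ROUTABLE_PROVIDERS = ["anthropic", "openai", "google"] as const;

function shouldRouteViaGateway(
  provider: string,
  providerConfig: Record<string, unknown>,
  gatewayApiKey: string | undefined
): boolean {
  return (
    !!gatewayApiKey && // only when a gateway API key is configured
    (GATEWAY_ROUTABLE_PROVIDERS as readonly string[]).includes(provider) &&
    Object.keys(providerConfig).length === 0 // custom provider configs connect directly
  );
}
```

Non-routable providers (Azure, OpenRouter, NIM, Custom) fail the membership check and keep their direct connections.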

✅ APPROVE

Summary: Clean merge from main with no changes to the usage-tracker feature. After 14 review cycles with all issues addressed, this PR is ready to ship. The implementation elegantly routes LLM calls through Vercel AI Gateway for per-request cost tracking, removing the need for fragile rate card lookups. 🚀

Note: Unable to submit formal GitHub approval due to permission constraints — this is a COMMENT review with approval recommendation.

Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
| --- | --- | --- | --- | --- | --- | --- | --- |
| orchestrator | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Note: Delta review — no usage-tracker files changed since last review. No sub-reviewers dispatched.

@github-actions github-actions bot deleted a comment from claude bot Mar 25, 2026
@tim-inkeep tim-inkeep added this pull request to the merge queue Mar 25, 2026
@github-actions

🔎💬 Inkeep AI search and chat service is syncing content for source 'Inkeep Agent Framework Docs'

@itoqa

itoqa bot commented Mar 25, 2026

Ito Test Report ❌

22 test cases ran. 1 failed, 21 passed.

Overall, 21 of 22 tests passed, showing strong coverage and stable behavior across cost-dashboard routing and filter/query-state handling, rapid-interaction resilience, XSS-inert query rendering, auth sentinel redirects, back/forward restoration, cost empty-state/formatting/status edge cases, conversation-detail cost and timeline fallbacks, AI-calls query and date-range safeguards, and mobile usability at iPhone 13 size. The single important failure is a medium-severity, pre-existing API defect: the generate-render endpoints return HTTP 500 instead of the expected HTTP 404 for unknown data/artifact component IDs, which can mislead clients and monitoring by classifying normal not-found cases as server errors.

❌ Failed (1)
| Category | Summary | Screenshot |
| --- | --- | --- |
| Adversarial | 🟠 Unknown data/artifact component IDs return HTTP 500 due to thrown upstream ApiError handling, instead of returning a not-found HTTP 404. | ADV-7 |
🟠 Generate-render API invalid ID handling
  • What failed: The routes return HTTP 500 for unknown component IDs; expected behavior is HTTP 404 with a not-found message for the missing resource.
  • Impact: API consumers receive server-error semantics for a normal not-found condition, which breaks reliable client error handling and produces misleading failure signals. This can also mask true server faults in monitoring because expected 404s are misclassified as 500s.
  • Introduced by this PR: No – pre-existing bug (code not changed in this PR)
  • Steps to reproduce:
    1. Send POST to /api/data-components/does-not-exist/generate-render with valid tenantId and projectId.
    2. Send POST to /api/artifact-components/does-not-exist/generate-render with valid tenantId and projectId.
    3. Observe response status codes and compare with expected not-found behavior.
  • Code analysis: I inspected both generate-render route handlers plus the shared management API request layer. The route code attempts to return 404 only when fetchDataComponent/fetchArtifactComponent return falsy values, but these helpers call makeManagementApiRequest, which throws ApiError on any non-2xx response (including 404); that thrown error is then caught by the route-level catch block and converted to HTTP 500.
  • Why this is likely a bug: The not-found path is effectively unreachable for upstream 404 responses because non-2xx responses throw before the route can branch, and the catch unconditionally maps that case to 500.

Relevant code:

agents-manage-ui/src/app/api/data-components/[dataComponentId]/generate-render/route.ts (lines 34-38)

```typescript
const dataComponent = await fetchDataComponent(tenantId, projectId, dataComponentId);

if (!dataComponent) {
  return new Response('Data component not found', { status: 404 });
}
```

agents-manage-ui/src/lib/api/api-config.ts (lines 81-89)

```typescript
if (!response.ok) {
  let errorData: any;
  try {
    const text = await response.text();
    errorData = text ? JSON.parse(text) : null;
  } catch {
    errorData = null;
  }
```

agents-manage-ui/src/lib/api/api-config.ts (lines 132-138)

```typescript
throw new ApiError(
  {
    code: errorCode,
    message: errorMessage,
  },
  response.status
);
```

agents-manage-ui/src/app/api/artifact-components/[artifactComponentId]/generate-render/route.ts (lines 139-143)

```typescript
} catch (error) {
  console.error('Error generating artifact component render:', error);
  return new Response(error instanceof Error ? error.message : 'Internal server error', {
    status: 500,
  });
}
```
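One possible fix for this failure is to let the route-level catch distinguish an upstream 404 from a genuine server fault. ApiError is modeled locally here with a simplified constructor; the real class (which takes a {code, message} object) lives in agents-manage-ui/src/lib/api/api-config.ts, and statusForRouteError is a hypothetical helper name:

```typescript
// Local stand-in for the real ApiError, simplified to message + status.
class ApiError extends Error {
  constructor(message: string, public readonly status: number) {
    super(message);
  }
}

// Map a caught error to the HTTP status the route should return:
// upstream not-found stays 404, everything else is a genuine 500.
function statusForRouteError(error: unknown): number {
  if (error instanceof ApiError && error.status === 404) return 404;
  return 500;
}
```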
✅ Passed (21)
| Category | Summary | Screenshot |
| --- | --- | --- |
| Adversarial | Rapid filter/time-range churn preserved interactivity and converged to the final selected state. | ADV-1 |
| Adversarial | Encoded XSS payload in projectId stayed inert on org cost view; no script execution observed. | ADV-2 |
| Adversarial | Invalid conversation deep link shows controlled error UI and Back to Overview returns to traces. | ADV-3 |
| Adversarial | Logged-out sentinel redirected without session, and protected access resumed after clearing sentinel with session context. | ADV-4 |
| Adversarial | Rapid Back/Forward over project/org cost and conversation routes restored expected screens without navigation lockups. | ADV-5 |
| Adversarial | Both generate-render endpoints correctly reject empty JSON bodies with HTTP 400 and a clear missing tenant/project validation message. | ADV-6 |
| Edge | With only customStartDate set, the AI Calls page remained stable and rendered a single-date range without crashing. | EDGE-1 |
| Edge | Custom-range handling prevented future-end submission and outgoing AI-calls trace payloads stayed at or before current time. | EDGE-2 |
| Edge | Project cost dashboard showed empty-state messaging across breakdown tables and events card with empty mocked usage responses. | EDGE-3 |
| Edge | Cost formatting preserved precision boundaries: very small non-zero values at 6 decimals and standard values at 2 decimals. | EDGE-4 |
| Edge | Conversation details remain visible when usage-events fails, and AI Usage & Cost degrades to a no-data message. | EDGE-6 |
| Edge | Cost event status chips correctly mapped hasError true and 'true' to failed, and missing hasError to succeeded. | EDGE-7 |
| Mobile | Mobile project cost view remains usable at 390x844, and cost events remain reachable via horizontal scroll. | MOBILE-1 |
| Happy-path | Sidebar navigation from Projects to org Cost page loaded the expected route and page header. | ROUTE-1 |
| Happy-path | Project sidebar Cost link opened the correct project-scoped Cost page with expected descriptive content. | ROUTE-2 |
| Happy-path | Project filter query-state behavior worked as expected, including removing projectId via clear action. | ROUTE-3 |
| Happy-path | Time-range preset switching (24h → 7d → 30d) updated state and remained responsive. | ROUTE-4 |
| Happy-path | Org-level cost events rendered without trace deep links, showing em dashes in Conversation when no project scope is selected. | ROUTE-5 |
| Happy-path | AI Usage & Cost aggregates mocked usage events and renders generation type/model rows on conversation details. | ROUTE-6 |
| Happy-path | Conversation timeline renders a visible cost badge and generation subtitle for successful AI generation items. | ROUTE-7 |
| Happy-path | Org AI-calls requests included usage-scoped generationType IN filters and the AI Calls Breakdown page rendered correctly. | ROUTE-8 |
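The formatting rule EDGE-4 verifies can be sketched in a few lines. The $0.01 threshold and the function name are assumptions inferred from the observed behavior, not taken from the PR's code:

```typescript
// Tiny non-zero costs keep six decimals so they don't round to "$0.00";
// everything else shows the usual two decimal places.
function formatCostUsd(cost: number): string {
  if (cost > 0 && cost < 0.01) return `$${cost.toFixed(6)}`;
  return `$${cost.toFixed(2)}`;
}
```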

Commit: 1df0a2a

View Full Run


