fix(gateway): avoid local fallback after accepted agent runs #11

Open
lionrooter wants to merge 7 commits into main from wip/gateway-agent-1006-fix-2026-03-22

@lionrooter (Owner)

Summary

  • avoid unsafe embedded fallback when the gateway already accepted an agent run
  • distinguish accepted-then-closed websocket failures from ordinary pre-accept gateway failures
  • keep one-shot gateway callers from attempting reconnects during callGateway() flows

Changes

  • add GatewayRequestAcceptedError in src/gateway/client.ts
  • track ackReceived for expectFinal:true requests
  • reject accepted-then-closed requests with a typed accepted error instead of a generic close error
  • disable reconnects for one-shot callGateway() clients
  • prevent agentCliCommand() from falling back to embedded execution after gateway acceptance
  • add focused tests for client, call, and agent CLI behavior
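
The accepted-versus-not-accepted distinction described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in src/gateway/client.ts: the error's fields, the `PendingRequest` shape, and the helpers `closeErrorFor` and `canFallBackToEmbedded` are hypothetical names introduced here. Only `GatewayRequestAcceptedError`, `ackReceived`, and `expectFinal` come from the PR description.

```typescript
// Hypothetical sketch: the real client.ts may structure this differently.
class GatewayRequestAcceptedError extends Error {
  readonly requestId: string;
  readonly closeCode: number;

  constructor(requestId: string, closeCode: number) {
    super(
      `gateway accepted request ${requestId} but the socket closed ` +
        `(code ${closeCode}) before the final response`,
    );
    this.name = "GatewayRequestAcceptedError";
    this.requestId = requestId;
    this.closeCode = closeCode;
    // Keep instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Assumed per-request bookkeeping kept by the client.
interface PendingRequest {
  expectFinal: boolean; // caller waits for a final response after the ack
  ackReceived: boolean; // set once the gateway acknowledges the request
}

// Pick the rejection error for a pending request when the socket closes.
function closeErrorFor(id: string, pending: PendingRequest, closeCode: number): Error {
  if (pending.expectFinal && pending.ackReceived) {
    return new GatewayRequestAcceptedError(id, closeCode);
  }
  return new Error(`gateway closed (code ${closeCode})`);
}

// Caller side: embedded fallback is only safe if the run was never accepted.
function canFallBackToEmbedded(err: unknown): boolean {
  return !(err instanceof GatewayRequestAcceptedError);
}
```

With a typed error the CLI can tell "the gateway never took the run" (fallback is safe) from "the gateway took the run and then the connection dropped" (a fallback could execute the same run twice).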

Validation

  • node_modules/.bin/vitest run src/gateway/client.test.ts src/gateway/call.test.ts src/commands/agent-via-gateway.test.ts
  • 77 / 77 tests passed
  • live smoke:
    • node dist/index.js agent --agent cody --message 'Final smoke after gateway 1006 fix. Reply only OK.' --json --timeout 30
    • returned status: "ok" and OK

Notes

  • This fixes the transport/fallback bug only.
  • It does not address unrelated budget.routing-status log spam currently visible in the gateway logs.

Bryan Fisher and others added 7 commits March 17, 2026 09:26
Preserve explicit agent IDs for isolated cron jobs instead of silently downgrading to main.

Harden wake routing for legacy unqualified session keys so the target session is not dropped.

Add regression coverage for isolated Maclern cron runs and guard against agent:main fallback.

Connect the existing provider-usage tracking infrastructure to the model
fallback pipeline so routing decisions consider real-time subscription
quota state. This closes the observation→action gap where usage was
tracked comprehensively but never influenced model selection.

New files:
- complexity-tier.ts: Lightweight prompt complexity classifier
  (SIMPLE/MEDIUM/COMPLEX/REASONING) that maps to model preferences
  (local qwen for simple, cheap CLIs for medium, premium for complex)
- provider-usage.cache.ts: 90s TTL cache for provider usage summary
  to avoid API call storms in the hot path
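
A single-value TTL cache of the kind described for provider-usage.cache.ts might look like the sketch below. The class name `TtlCache`, its API, and the injectable clock are assumptions made for illustration; only the 90-second TTL comes from the commit message. The loader is synchronous here to keep the example short, whereas the real usage fetch is presumably asynchronous.

```typescript
// Hypothetical sketch of a single-value cache with a time-to-live.
class TtlCache<T> {
  private entry: { value: T; expiresAt: number } | null = null;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so tests can control the clock.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value while it is fresh; otherwise reload and re-cache.
  get(load: () => T): T {
    const t = this.now();
    if (this.entry !== null && t < this.entry.expiresAt) {
      return this.entry.value;
    }
    const value = load();
    this.entry = { value, expiresAt: t + this.ttlMs };
    return value;
  }
}

// 90s TTL, as stated in the commit message.
const usageCache = new TtlCache<string>(90_000);
```

Routing code in the hot path would call `usageCache.get(...)` on every decision and hit the provider API at most once per TTL window.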

Changes:
- model-fallback.ts: reorderCandidatesByBudget() runs after
  resolveFallbackCandidates() and scores candidates by quota state,
  burn-rate projection, reset-time awareness, and complexity tier match
- provider-usage.shared.ts: resolveUsageProviderIdForRouting() maps
  CLI providers (claude-cli, codex) to their subscription quota providers
- provider-usage.ts: barrel re-exports for new functions
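
As a rough illustration of budget-aware reordering, the sketch below scores candidates by remaining quota plus a tier-match bonus and sorts them stably. It is not the scoring used by reorderCandidatesByBudget(): the types, weights, and the function name `reorderByBudgetSketch` are hypothetical, and burn-rate projection and reset-time awareness are left out.

```typescript
type Tier = "SIMPLE" | "MEDIUM" | "COMPLEX" | "REASONING";

// Hypothetical shapes; the real candidate and usage types are not shown in the PR.
interface Candidate {
  provider: string;
  model: string;
  preferredTiers: Tier[];
}

interface QuotaState {
  usedFraction: number; // 0 = untouched quota, 1 = exhausted
}

// Assumed weights: remaining quota in [0, 1] plus a fixed bonus for a tier match.
function scoreCandidate(c: Candidate, tier: Tier, quota: Map<string, QuotaState>): number {
  const state = quota.get(c.provider);
  const remaining = state ? 1 - state.usedFraction : 0.5; // unknown quota is neutral
  const tierBonus = c.preferredTiers.includes(tier) ? 0.25 : 0;
  return remaining + tierBonus;
}

// Highest score first; ties keep the original fallback order.
function reorderByBudgetSketch(
  candidates: Candidate[],
  tier: Tier,
  quota: Map<string, QuotaState>,
): Candidate[] {
  return candidates
    .map((c, index) => ({ c, index, score: scoreCandidate(c, tier, quota) }))
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .map((x) => x.c);
}
```

Breaking ties by original index means the existing fallback order is preserved whenever quota data gives no reason to change it.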

Feature flags: BUDGET_ROUTING_DISABLED=1, BUDGET_ROUTING_ENFORCEMENT=strict

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…atus RPC

Records recent model routing decisions (tier, scores, reorder, selected
model) in a 50-entry ring buffer and exposes them via a new gateway RPC
endpoint for Command Post dashboard visibility.
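
A fixed-capacity ring buffer like the one described can be sketched as below. The class name `RingBuffer`, its methods, and the `RoutingDecision` fields are assumptions for illustration; only the 50-entry capacity and the kinds of data recorded come from the commit message.

```typescript
// Hypothetical sketch: keeps the most recent `capacity` items, overwriting the oldest.
class RingBuffer<T> {
  private readonly items: T[] = [];
  private start = 0; // index of the oldest item once the buffer is full
  private readonly capacity: number;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  push(item: T): void {
    if (this.items.length < this.capacity) {
      this.items.push(item);
      return;
    }
    // Full: overwrite the oldest slot and advance the start pointer.
    this.items[this.start] = item;
    this.start = (this.start + 1) % this.capacity;
  }

  // Snapshot in oldest-to-newest order.
  toArray(): T[] {
    return [...this.items.slice(this.start), ...this.items.slice(0, this.start)];
  }
}

// Assumed record shape for one routing decision.
interface RoutingDecision {
  tier: string;
  selectedModel: string;
  reordered: boolean;
}

const recentDecisions = new RingBuffer<RoutingDecision>(50);
```

An RPC handler could return `recentDecisions.toArray()` directly, since memory use stays bounded no matter how many decisions are recorded.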

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>