Combo falls through to next model on transient 503 instead of retrying #335
Description
Problem
When all API keys for a provider are temporarily locked due to transient errors (e.g., Anthropic 503 "No capacity available"), the combo handler immediately falls through to the next model in the chain instead of waiting for the short cooldown to expire and retrying.
The cooldown for these transient errors starts at just 1-2 seconds (exponential backoff: 1s → 2s → 4s...), but the combo handler treats "all accounts temporarily locked" the same as "provider permanently unavailable" and moves on.
This causes unnecessary fallthrough to providers that may not work (see #334), when simply waiting 1-2 seconds would have resolved the issue.
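To make the backoff schedule mentioned above (1s → 2s → 4s...) concrete, here is a minimal sketch of how such a cooldown could be computed. The function name `transientCooldownMs`, the constants, and the 60-second cap are assumptions for illustration and are not taken from the project's code.

```javascript
// Hypothetical helper (not from the project): cooldown applied after the
// n-th consecutive transient failure on a key, doubling each time.
const BASE_COOLDOWN_MS = 1000; // assumed starting cooldown (1s, per the issue)
const MAX_COOLDOWN_MS = 60000; // assumed cap; the real cap is not stated

function transientCooldownMs(consecutiveFailures) {
  if (consecutiveFailures < 1) return 0;
  const delay = BASE_COOLDOWN_MS * 2 ** (consecutiveFailures - 1);
  return Math.min(delay, MAX_COOLDOWN_MS);
}

console.log([1, 2, 3].map(transientCooldownMs)); // [ 1000, 2000, 4000 ]
```

Under these assumptions the first two failures cost at most 2 seconds of waiting, which is the window the combo handler currently skips past.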
Example Flow
- Combo: antigravity/claude-opus-4-6-thinking → github/claude-opus-4-6-thinking
- Antigravity key seif gets 503 → locked for 1s
- Antigravity key personal gets 503 → locked for 1s
- handleSingleModelChat returns allRateLimited with retryAfter = 1 second from now
- Combo handler receives non-ok response → checks shouldFallback → moves to github
- GitHub fails → client gets a fatal error
- Meanwhile, both antigravity keys would have been available again in 1 second
Expected Behavior
When a provider returns allRateLimited with a short retryAfter (e.g., under 5-10 seconds), the combo handler should wait for the cooldown to expire and retry the same provider before falling through to the next combo model.
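The decision described above could look roughly like the sketch below. It assumes the result object exposes `allRateLimited` and a `retryAfter` that is either an epoch-milliseconds number or a parseable date string; the exact shape in the project may differ. The helper names and the 5-second threshold are hypothetical.

```javascript
// Assumed threshold; the issue suggests something in the 5-10 second range.
const SHORT_COOLDOWN_THRESHOLD_MS = 5000;

// Milliseconds until `retryAfter`, or null if it cannot be interpreted.
// Assumption: retryAfter is epoch ms or a date string such as ISO 8601.
function remainingCooldownMs(retryAfter, now = Date.now()) {
  const until = typeof retryAfter === "number" ? retryAfter : Date.parse(retryAfter);
  if (!Number.isFinite(until)) return null;
  return Math.max(0, until - now);
}

// True only when every key is rate limited AND the cooldown is short enough
// that waiting is cheaper than falling through to the next combo model.
function shouldWaitForCooldown(result, now = Date.now(), thresholdMs = SHORT_COOLDOWN_THRESHOLD_MS) {
  if (!result || !result.allRateLimited) return false;
  const remaining = remainingCooldownMs(result.retryAfter, now);
  return remaining !== null && remaining <= thresholdMs;
}

// Cooldown ends 1s after "now" (0 here), so waiting is worthwhile.
console.log(shouldWaitForCooldown({ allRateLimited: true, retryAfter: 1000 }, 0)); // true
```

A missing or unparseable `retryAfter` deliberately yields `false`, so the existing fallthrough behavior stays the default.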
Suggested Fix
In handleComboChat (open-sse/services/combo.js), after receiving a failed response from handleSingleModel:
- Check if the error response contains a Retry-After header or the error body contains retryAfter
- If the retry delay is short (under a configurable threshold, e.g., 5-10 seconds), await the delay and retry the same model
- Limit retries to 1-2 attempts per model to avoid infinite loops
- Only apply this to transient/capacity errors, not permanent failures (401, 403, etc.)
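The steps above could be sketched as a small wrapper around the per-model call. This is an illustration under assumptions, not the actual combo.js code: `callWithShortRetry`, `retryDelayMs`, the option names, and the set of statuses treated as transient are all hypothetical, and only the Retry-After header is read here (the body's retryAfter field is left out because its format is not specified in this issue).

```javascript
// Assumed set of transient statuses; 401/403 and similar are never retried.
const TRANSIENT_STATUSES = new Set([429, 503]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry-After may be delay-seconds or an HTTP-date; returns ms or null.
function retryDelayMs(response, now = Date.now()) {
  const header = response.headers?.get?.("retry-after");
  if (!header) return null;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isFinite(date) ? Math.max(0, date - now) : null;
}

// Calls handleSingleModel(model); on a transient failure with a short
// Retry-After, waits and retries the same model up to maxRetries times.
async function callWithShortRetry(handleSingleModel, model, opts = {}) {
  const { maxRetries = 2, thresholdMs = 5000 } = opts;
  let response = await handleSingleModel(model);
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    if (response.ok || !TRANSIENT_STATUSES.has(response.status)) break;
    const delay = retryDelayMs(response);
    if (delay === null || delay > thresholdMs) break; // too long: fall through
    await sleep(delay);
    response = await handleSingleModel(model);
  }
  return response; // caller applies the existing shouldFallback logic
}
```

Because the wrapper returns the last response unchanged, the existing fallthrough check can run on it as before; the retry only adds a bounded wait in front of that check.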