6 changes: 6 additions & 0 deletions docs/getting-started.md
@@ -205,6 +205,12 @@ pip uninstall -y framework tools
./quickstart.sh
```

#### Gemini 503 (High Demand)

If you see repeated `503 UNAVAILABLE` or `429 RESOURCE_EXHAUSTED` errors while using Gemini models, see:

- [Gemini 503 Troubleshooting Guide](./troubleshooting/gemini-503.md)

## Getting Help

- **Documentation**: Check the `/docs` folder
50 changes: 50 additions & 0 deletions docs/troubleshooting/gemini-503.md
@@ -0,0 +1,50 @@
# Troubleshooting Gemini 503 (UNAVAILABLE / High Demand)

## Symptoms

When running an agent with Gemini (Vertex), the run may retry repeatedly and then fail with an error similar to:

- `503 UNAVAILABLE`
- Message includes: "This model is currently experiencing high demand"

You may also see LiteLLM retry logs and messages like `MidStreamFallbackError`.

## Why this happens

This is a provider-side overload condition (temporary capacity or demand spike). Your environment can be correctly configured and still hit this error.

## Quick fixes

Try these in order:

1. **Retry later**
- Spikes are often temporary.

2. **Switch models**
- If using `gemini-3-flash-preview`, try `gemini-3.1-pro-preview` (often more stable during spikes).

3. **Reduce workload**
- Shorten the request scope (example: “5 items from last 7 days”).
- Ask for concise output.

4. **Avoid long streaming outputs**
- If a setting exists to disable streaming, try turning it off.
- Mid-stream failures can be more common under provider instability.

5. **Switch providers**
- If you have keys available, try another provider temporarily (OpenAI, Anthropic, Groq, Cerebras).
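If you script the retries yourself, exponential backoff with jitter spreads requests out instead of hammering an already overloaded endpoint. A minimal Python sketch, where `ProviderOverloadedError` and the `call` parameter are hypothetical stand-ins for whatever your client raises on a 503/429:

```python
import random
import time


class ProviderOverloadedError(Exception):
    """Hypothetical stand-in for a 503 UNAVAILABLE / 429 RESOURCE_EXHAUSTED error."""


def call_with_backoff(call, prompt, max_attempts=5, base_delay=1.0):
    """Retry `call(prompt)` with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call(prompt)
        except ProviderOverloadedError:
            if attempt == max_attempts - 1:
                raise  # exhausted all attempts; let the error surface
            # Delay doubles each attempt; random jitter avoids retry storms
            # where many clients hit the provider at the same instant.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
```

This is a sketch, not the project's built-in behavior; frameworks such as LiteLLM already retry internally, so an outer loop like this mainly helps when the built-in retry budget is exhausted.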

## How to confirm it is not a local misconfiguration

If you can:
- launch the Hive UI successfully,
- run other lightweight prompts successfully at least some of the time,
- and the logs specifically show `503 UNAVAILABLE` with “high demand”,

then this is almost certainly provider-side overload, not a local setup issue.

## Suggested resilience behavior (future improvement)

When transient 503 errors exceed retry thresholds, consider:
- configurable fallback model routing, or
- returning partial results with a clear “degraded” status.
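As a rough illustration of that fallback idea, a routing helper could walk an ordered model chain and return a clearly labeled degraded status when every model is overloaded. Everything here is an assumption for illustration: the chain contents, the `call` signature, and the result shape are not the project's actual API.

```python
class ProviderOverloadedError(Exception):
    """Hypothetical stand-in for a provider 503/429 overload error."""


# Illustrative fallback chain; the model names are examples only.
FALLBACK_CHAIN = ["gemini-3-flash-preview", "gemini-3.1-pro-preview", "gpt-4o-mini"]


def call_with_fallback(call, prompt, chain=FALLBACK_CHAIN):
    """Try each model in order; report a degraded status if all are overloaded."""
    errors = {}
    for model in chain:
        try:
            return {"status": "ok", "model": model, "result": call(model, prompt)}
        except ProviderOverloadedError as exc:
            errors[model] = str(exc)  # record the failure, move to the next model
    # Every model in the chain failed transiently: surface that clearly
    # instead of crashing, so the caller can show a "degraded" state.
    return {"status": "degraded", "errors": errors}
```

The key design point is that the caller always gets a structured result: either `status: ok` with the model that answered, or `status: degraded` with per-model errors it can log or display.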