Centralized AI provider quota monitoring daemon. Tracks usage and rate-limit status across multiple AI coding providers, delivers real-time velocity analysis, exhaustion projections, per-instance budget allocation, and proactively switches OpenCode to fallback models before credits run out.
- Multi-provider monitoring -- Claude, GitHub Copilot, Z.ai, MiniMax, DeepSeek, and Alibaba Cloud (Tongyi Lingma)
- Proactive model switching -- automatically updates `opencode.json` to fallback models when a provider approaches quota exhaustion, with hysteresis to prevent flapping and automatic recovery when credits are replenished
- Velocity analysis -- linear regression over rolling samples to compute consumption rate (%/hour)
- Exhaustion projection -- predicts time-to-quota-exhaustion per provider and window
- Health status engine -- GREEN / YELLOW / RED based on utilization, velocity, and configurable safety margins
- Dynamic budget allocation -- divides provider caps across active instances with weighted fairness (active > idle > paused) and overcommit support
- Multi-framework recommendations -- tailored advice for Claude (strict stop/proceed) and OpenCode (fallback-aware) workflows
- Provider deduplication -- instances sharing the same API key are pooled into a single polling thread
- Heartbeat-based GC -- dead instances are automatically reaped after a configurable timeout
- Python 3.11+
```bash
pipx install .
```

```bash
quota-sentinel start
```

By default the daemon listens on `127.0.0.1:7878`. Override with `--host` / `--port`:

```bash
quota-sentinel start --host 0.0.0.0 --port 9090
```

```bash
docker build -t quota-sentinel .
docker run -p 7878:7878 quota-sentinel
```

| Command | Description |
|---|---|
| `quota-sentinel start` | Launch the daemon |
| `quota-sentinel status` | Global daemon status |
| `quota-sentinel health` | Health check |
| `quota-sentinel switch` | Proactively switch OpenCode models based on quota |
All endpoints live under /v1.
| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/instances` | Register; returns `instance_id` AND `api_key` (save this!) |
| `DELETE` | `/v1/instances/{id}` | Deregister instance |
| `PATCH` | `/v1/instances/{id}/heartbeat` | Keep-alive with optional state update |
The POST /v1/instances endpoint accepts different auth formats:
```json
{
  "project_name": "my-project",
  "framework": "opencode",
  "auth": {
    "opencode_auth": {
      "PROVIDER_KEY": {"key": "api-key"}
    }
  }
}
```

Valid provider keys:
| Auth Key | Provider | Notes |
|---|---|---|
| `zai-coding-plan` or `zai` | zai | Z.ai |
| `deepseek-coding-plan` or `deepseek` | deepseek | DeepSeek |
| `minimax-coding-plan` or `minimax` | minimax | MiniMax |
| `bailian-coding-plan` or `alibaba-coding-plan` or `dashscope` or `alibaba` | alibaba | Alibaba Cloud |
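
The alias table above can be read as a simple lookup. The following is a minimal sketch of that mapping, not the daemon's actual code: `providers_from_opencode_auth` is a hypothetical helper, and silently skipping unknown keys is an assumption made for illustration.

```python
# Alias table transcribed from the "Valid provider keys" table.
AUTH_KEY_TO_PROVIDER = {
    "zai-coding-plan": "zai",
    "zai": "zai",
    "deepseek-coding-plan": "deepseek",
    "deepseek": "deepseek",
    "minimax-coding-plan": "minimax",
    "minimax": "minimax",
    "bailian-coding-plan": "alibaba",
    "alibaba-coding-plan": "alibaba",
    "dashscope": "alibaba",
    "alibaba": "alibaba",
}


def providers_from_opencode_auth(opencode_auth: dict) -> list[str]:
    """Return distinct provider names for the keys of an opencode_auth dict.

    Hypothetical helper: unknown keys are skipped here, which may differ
    from how the daemon treats them.
    """
    providers: list[str] = []
    for auth_key in opencode_auth:
        provider = AUTH_KEY_TO_PROVIDER.get(auth_key)
        if provider is not None and provider not in providers:
            providers.append(provider)
    return providers
```

For example, passing both `dashscope` and `alibaba` keys yields a single `alibaba` entry, since both aliases name the same provider.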
Claude OAuth credentials:

```json
{
  "project_name": "my-project",
  "framework": "claude",
  "auth": {
    "claude_credentials": {
      "accessToken": "...",
      "refreshToken": "...",
      "expiresAt": 1234567890000
    }
  }
}
```

GitHub token (used for Copilot):

```json
{
  "project_name": "my-project",
  "framework": "opencode",
  "auth": {
    "github_token": "gho_..."
  }
}
```

All auth formats combined, together with the optional fields:

```json
{
  "project_name": "my-project",
  "framework": "opencode",
  "poll_interval": 300,
  "provider_config": {
    "minimax": {"group_id": "group-123"},
    "copilot": {"github_username": "myuser", "plan": "pro"}
  },
  "hard_caps": {
    "deepseek_default": 90.0,
    "claude_five_hour": 70.0
  },
  "auth": {
    "opencode_auth": {
      "deepseek-coding-plan": {"key": "sk-..."},
      "minimax-coding-plan": {"key": "sk-...", "group_id": "group-123"}
    },
    "claude_credentials": {
      "accessToken": "...",
      "refreshToken": "...",
      "expiresAt": 1234567890000
    },
    "github_token": "gho_..."
  }
}
```

Optional fields:
| Field | Description |
|---|---|
| `poll_interval` | Poll frequency in seconds (default: 300, min: 30) |
| `provider_config` | Provider-specific settings (see below) |
| `hard_caps` | Override default hard caps per provider |
provider_config options:
| Provider | Options |
|---|---|
| minimax | group_id: string |
| copilot | github_username: string, plan: "pro" or "business" |
| Method | Path | Description |
|---|---|---|
| `GET` | `/v1/status` | Instance-specific status (requires `X-API-Key` header) |
| `GET` | `/v1/status/{id}` | Single-instance TOKEN_STATUS with recommendation |
| `GET` | `/v1/providers` | Instance providers (requires `X-API-Key` header) |
| `GET` | `/v1/providers/{name}` | Detailed provider info with subscriber list |
| `POST` | `/v1/poll` | Trigger repoll (requires `X-API-Key` header) |
| `GET` | `/v1/health` | Server health check |
Returns a dict of provider name → status. Empty {} if no instances registered.
```json
{
  "deepseek": {
    "status": "GREEN",
    "windows": {
      "rolling": {
        "utilization": 45.2,
        "velocity_pct_per_hour": 5.3,
        "resets_at": "2026-04-10T00:00:00+00:00",
        "status": "GREEN",
        "metadata": {
          "total_balance": 2500.50,
          "is_available": true,
          "currency": "USD"
        }
      }
    }
  }
}
```

Detailed provider info (`GET /v1/providers/{name}`):

```json
{
  "name": "deepseek",
  "fingerprint": "abc123...",
  "subscribers": ["instance-id-1", "instance-id-2"],
  "windows": {
    "rolling": {
      "utilization": 45.2,
      "resets_at": "2026-04-10T00:00:00+00:00"
    }
  }
}
```

Or if no data:

```json
{
  "name": "deepseek",
  "status": "UNKNOWN",
  "error": "no data"
}
```

The `quota-sentinel switch` command monitors Quota Sentinel and proactively updates your `opencode.json` before you hit rate limits:

```bash
quota-sentinel switch /path/to/opencode.json --recovery-hold 5
```

How it works:
- Monitors providers and switches to fallback models when a provider reaches YELLOW/RED status
- Uses hysteresis to prevent flapping (only restores the original model after `--recovery-hold` consecutive GREEN polls)
- Remembers original models across restarts via `.opencode-switcher.json`
- Works with any `opencode.json` file and supports multiple instances
The model switcher expects fallback_models to be defined in your opencode.json using the Agent Caddy format:
Basic Structure:
```json
{
  "agent": {
    "YOUR_AGENT_NAME": {
      "model": "original-provider/original-model",
      "fallback_models": [
        "fallback-provider/fallback-model",
        "backup-provider/backup-model"
      ]
    }
  }
}
```

Key Requirements:
- Model Identifier Format: Must be `provider/model-name` (e.g., `deepseek/deepseek-v3.2`)
- Provider Matching: The switcher uses the prefix before `/` to match against Quota Sentinel provider names
- Selection Priority: Fallback models are tried in order; the first one with GREEN status is selected
Complete Example:
```json
{
  "agent": {
    "build": {
      "model": "github-copilot/gpt-5-mini",
      "fallback_models": [
        "deepseek/deepseek-v3.2",
        "alibaba/qwen3-max",
        "zai/glm-5.1"
      ]
    },
    "production": {
      "model": "deepseek/deepseek-v3.2",
      "fallback_models": [
        "alibaba/qwen3-max",
        "zai/glm-5.1"
      ]
    }
  }
}
```

Selection Logic:
- When original provider reaches YELLOW/RED status
- Iterates through `fallback_models` in order
- Selects first model whose provider is currently GREEN
- If all fallbacks are unhealthy, stays on current model
- Recovery occurs only after original provider has been GREEN for `--recovery-hold` consecutive polls
If you use Agent Caddy to scaffold your projects,
`fallback_models` are generated automatically per agent based on model scoring and provider
diversity. No manual configuration is needed: just run `agentcaddy generate` and the resulting
`opencode.json` will include fallbacks.
To integrate `quota-sentinel` switching into an Agent Caddy project, add a skill file
(e.g. `.opencode/skills/quota-sentinel.md`) so your agents know how to use it:
````markdown
## Quota Sentinel Management
Quota Sentinel monitors AI provider quotas and switches models proactively
before credits run out.

### Check if installed and running
```bash
command -v quota-sentinel   # check if installed
quota-sentinel health       # check if server is running
curl -s http://127.0.0.1:7878/v1/health
```

### Install
```bash
pipx install git+https://github.com/millaguie/quota-sentinel.git
```

### Usage
```bash
quota-sentinel switch opencode.json             # daemon mode
quota-sentinel switch opencode.json --once      # single check and exit
quota-sentinel switch opencode.json --restore   # restore original models
```

### Setup steps
- Check if installed: `command -v quota-sentinel`
- If not installed: `pipx install git+https://github.com/millaguie/quota-sentinel.git`
- Start the switcher: `quota-sentinel switch opencode.json &`
- Confirm it started: `quota-sentinel health`
````
And define your agents with fallbacks in `data/catalog/agents.yaml`:
```yaml
agents:
  build:
    description: "Development agent - writes code, runs tests, builds the project"
    temperature: 0.2
    color: "#4CAF50"
    role: implementer
    domain: development
  test:
    description: "Test specialist - writes tests, validates coverage"
    temperature: 0.1
    color: "#8BC34A"
    role: tester
    domain: development
```
Agent Caddy's generator will automatically produce an `opencode.json` with
`fallback_models` for each agent, selecting the best alternative model from
different providers:
```json
{
  "agent": {
    "build": {
      "model": "github-copilot/gpt-5-mini",
      "fallback_models": [
        "deepseek/deepseek-v3.2",
        "alibaba/qwen3-max",
        "zai/glm-5.1"
      ]
    },
    "test": {
      "model": "github-copilot/gpt-5-mini",
      "fallback_models": [
        "deepseek/deepseek-v3.2",
        "alibaba/qwen3-max"
      ]
    }
  }
}
```

When `quota-sentinel switch` detects a provider approaching quota exhaustion,
it rewrites the `model` field to the first healthy fallback, transparently and
with no manual intervention.
All endpoints except `/v1/health` and `/v1/instances` require authentication
via the `X-API-Key` header.
```bash
curl -X POST http://127.0.0.1:7878/v1/instances \
  -H "Content-Type: application/json" \
  -d '{
    "project_name": "my-project",
    "framework": "opencode",
    "auth": {
      "opencode_auth": {
        "deepseek-coding-plan": {"key": "sk-..."}
      }
    }
  }'
```

Response:
```json
{
  "instance_id": "abc123...",
  "api_key": "qs_AbCdEfGhIjKlMnOpQrStUvWx",
  "providers": ["deepseek"],
  "poll_interval": 300
}
```

Save the `api_key` - you'll need it for all subsequent requests.
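
The same register-then-query flow can be scripted from Python with only the standard library. This is a minimal sketch assuming the daemon runs on the default `127.0.0.1:7878`; the helper names (`build_register_request`, `build_status_request`, `send`) are hypothetical and not part of quota-sentinel.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7878"  # default listen address from this README


def build_register_request(project_name: str, framework: str, auth: dict,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /v1/instances request (no API key needed to register)."""
    body = json.dumps({
        "project_name": project_name,
        "framework": framework,
        "auth": auth,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/instances",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET /v1/status request, authenticated via X-API-Key."""
    return urllib.request.Request(
        f"{base_url}/v1/status",
        headers={"X-API-Key": api_key},
        method="GET",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a request to a running daemon and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical session would call `send(build_register_request(...))`, keep the returned `api_key`, and pass it to `build_status_request` for later polls.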
```bash
# Get status for your instance
curl -H "X-API-Key: qs_AbCdEfGhIjKlMnOpQrStUvWx" http://127.0.0.1:7878/v1/status
# Get your providers
curl -H "X-API-Key: qs_AbCdEfGhIjKlMnOpQrStUvWx" http://127.0.0.1:7878/v1/providers
# Trigger poll for your providers
curl -X POST -H "X-API-Key: qs_AbCdEfGhIjKlMnOpQrStUvWx" http://127.0.0.1:7878/v1/poll
```

| Provider | Rate-limit windows | Auth |
|---|---|---|
| Claude (Anthropic) | 5-hour, 7-day | OAuth token |
| GitHub Copilot | Monthly | GitHub token |
| Z.ai | Hours, days, MCP | API token |
| MiniMax | Per-model, weekly | API token + group ID |
| DeepSeek | Rolling balance | API token |
| Alibaba Cloud | 5-hour, weekly, monthly | Session cookie |
Default hard caps (% utilization before RED):
| Key | Default |
|---|---|
| `claude_five_hour` | 80% |
| `claude_seven_day` | 90% |
| `copilot_monthly` | 85% |
| `zai_default` | 80% |
| `minimax_default` | 85% |
| `deepseek_default` | 85% |
| `alibaba_default` | 80% |
Other defaults: poll interval 300s, safety margin 30s, velocity window 10 samples, overcommit factor 1.5x, heartbeat timeout 3x poll interval.
- Clients register via `POST /v1/instances` with their providers, auth tokens, framework type, and desired poll interval.
- Daemon polls each provider on the fastest registered interval, deduplicating by API key fingerprint.
- Velocity tracker maintains a rolling window of (timestamp, utilization) samples and computes linear regression slope.
- Engine evaluates each provider-window pair: compares utilization against dynamic threshold, projects minutes-to-exhaustion, and assigns health status.
- Allocator divides the hard cap budget across subscribed instances weighted by state (active 1.0, idle 0.3, paused 0.0) with configurable overcommit.
- Clients query with the `X-API-Key` header for a recommendation (`PROCEED` / `PROCEED_SMALL_ONLY` / `STOP`) tailored to their framework.
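
The allocator step can be illustrated with a small weighted split. This sketch uses the state weights given above; how overcommit is applied is an assumption (here it scales the shared pool, and each share is clamped to the hard cap), and `allocate_budget` is a hypothetical name rather than the daemon's API.

```python
# State weights as listed in this README.
STATE_WEIGHTS = {"active": 1.0, "idle": 0.3, "paused": 0.0}


def allocate_budget(hard_cap_pct: float,
                    instance_states: dict[str, str],
                    overcommit: float = 1.5) -> dict[str, float]:
    """Split a provider's hard-cap budget across instances by state weight.

    Assumption: the pool is hard_cap_pct * overcommit, shared in proportion
    to weight, and no single instance may exceed the hard cap itself.
    """
    weights = {iid: STATE_WEIGHTS[state] for iid, state in instance_states.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        # Every instance is paused (or there are none): nothing to hand out.
        return {iid: 0.0 for iid in weights}
    pool = hard_cap_pct * overcommit
    return {
        iid: min(hard_cap_pct, pool * weight / total_weight)
        for iid, weight in weights.items()
    }
```

Under these assumptions, overcommit lets the shares add up to more than the cap, on the expectation that not every instance spends its full share at the same time.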
This project is licensed under the GNU General Public License v3.0.