feat: configurable LLM API usage rate limiting and automatic 429 retry #408

Open
luquiluke wants to merge 43 commits into 666ghj:main from
Conversation
… Zep with local KuzuDB

- Translate entire codebase (60+ files) from Chinese to English: backend prompts, API routes, services, utilities, frontend UI/components
- Add native Anthropic Claude SDK support alongside OpenAI (auto-detects provider from model name or LLM_PROVIDER env var)
- Replace Zep Cloud dependency with local embedded graph database: new graph_db.py (KuzuDB-backed storage) and entity_extractor.py (LLM-based entity/relationship extraction from text)
- Rewrite graph_builder, zep_entity_reader, zep_graph_memory_updater, zep_tools to use local GraphDatabase instead of Zep Cloud API
- Remove ZEP_API_KEY requirement — zero cloud dependencies for graph layer
- Update dependencies: add anthropic, kuzu; remove zep-cloud
- Update .env.example with Anthropic/OpenAI configuration examples
- Dockerize with Traefik labels for HTTPS via Cloudflare proxy
- Add synth.scty.org to Vite allowedHosts
- Translate index.html to English
- LLM client now supports 4 providers: openai, anthropic, claude-cli, codex-cli
- CLI providers use subprocess calls to claude/codex binaries (no API key needed)
- Docker compose mounts host CLI tools + auth into container
- Traefik labels for synth.scty.org with Let's Encrypt TLS
- Allow all hosts in Vite dev server for tunnel/proxy access
- Add codex exec --skip-git-repo-check flag for Docker environments
- Parse codex output to extract assistant response (strip headers/token counts)
- Increase CLI timeout to 180s for large prompts
- Allow empty LLM_API_KEY for CLI providers in config validation
refactor: streamline workbench core and deploy
Add regression coverage for the graph task list, twitter profile loading, report status polling, and timeline/stat aggregation so the runtime fixes stay pinned.
Remove the temporary regression test file so the final diff only touches the files listed in the task manifest. Verification stays in the PR body and manual route checks.
Keep the GET status route aligned with report_id-based polling by returning a specific not-found response when no active task or persisted report matches the requested id.
Align the tasks endpoint with the task spec and acceptance criteria while preserving compatibility with both Task objects and pre-serialized dict payloads.
feat(graph): add graph storage abstraction
feat(codex-proxy): add OpenAI-compatible sidecar
fix(runtime): resolve p0 twitter, report, and timeline bugs
- Codebase map (7 docs): STACK, INTEGRATIONS, ARCHITECTURE, STRUCTURE, CONVENTIONS, TESTING, CONCERNS
- PROJECT.md: Slater Consulting context, brand colors, success criteria
- REQUIREMENTS.md: R1 localization, R2 brand UI, R3 rate limit control
- ROADMAP.md: 3 coarse phases, Milestone 1
- STATE.md: project memory initialized
- config.json: balanced model, plan_check + verifier enabled

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- 01-RESEARCH.md: full Chinese character audit (3 files, 25 lines)
- 01-PLAN-frontend-localization.md: Step4Report.vue regex backward compat
- 01-PLAN-backend-localization.md: graph_tools.py period fix

Plan checker: PASS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… entry points

- Create frontend/src/style.css with 13 CSS custom property tokens (Slater Consulting palette)
- Install @fontsource/geist-sans and import weights 400+600 in main.js
- Update index.html: title/meta to MiroFish SIPE — Slater Consulting, favicon to SVG, CDN trimmed to JetBrains Mono only
- Create frontend/public/favicon.svg with SC initials on dark navy
- Update App.vue global styles: Geist Sans font, var() tokens for all colors and scrollbar
…e orange/black/white vars
- Delete entire :root {} block from Home.vue scoped styles (conflict with global token system)
- Replace all var(--orange) with var(--primary) for interactive elements and var(--accent) for hover states
- Replace var(--black) -> var(--foreground), var(--white) -> var(--background)
- Replace var(--gray-text) -> var(--muted-foreground)
- Replace var(--font-mono) with explicit JetBrains Mono stack
- Fix gradient-text to use foreground tokens (was invisible #000000 on dark background)
- Build passes with exit code 0
… updated

- Create 02-01-SUMMARY.md with full task record, deviations, and self-check
- Update STATE.md: Phase 2 IN PROGRESS, Plan 01 complete, session notes, decisions added
- Updated nav-brand/brand text to "Slater Consulting" in all 7 view files
- Replaced hardcoded hex colors with CSS custom property tokens in 5 main views
- Removed Space Grotesk/Noto Sans SC font-family declarations from 4 views
- Status dots: #FF5722 -> var(--primary), #4CAF50 -> var(--accent), #F44336 -> var(--destructive)
- Backgrounds: #FFF -> var(--background), headers -> var(--secondary), borders -> var(--border)
- Home.vue upload zone, console, disabled button state tokenized
- [Rule 1 - Bug] Fixed brand text in InteractionView.vue and Process.vue (out-of-scope files)
…sign tokens

- Step1GraphBuild: replaced badge/card/button hex values with var(--primary), var(--accent), var(--card)
- Step2EnvSetup: replaced 55 hex values with token vars (accent, primary, secondary, border, muted-foreground)
- Step3Simulation: replaced border-top-color #FFF with var(--primary-foreground)
- Step4Report: replaced 31 hex values, converted tool badge classes to dark-theme token vars
- Step5Interaction: replaced 119 hex values, SVG strokes, chat UI, survey, markdown styles
- HistoryDatabase: replaced 86 hex values with card/secondary/border/accent/muted-foreground tokens
- GraphPanel: replaced D3 color palette with Slater brand palette, replaced all D3 JS stroke/fill and CSS hex values
…date

- 02-02-SUMMARY.md: documents all 7 component tokenizations, D3 palette swap, decisions
- STATE.md: advanced to Plan 03, updated last session and key decisions
…-square from Home.vue
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _TokenBucket class for fixed-window RPM/TPM enforcement
- Add _is_rate_limit_error() to detect 429s across all providers
- Add _check_token_bucket() for proactive pre-call throttling
- chat() accepts rate_limit_config, retries on 429 with exponential backoff (base 30s, max 300s)
- chat_json() accepts rate_limit_config and passes through to chat()
- Safe import guards for openai/anthropic RateLimitError exceptions
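A fixed-window limiter of the kind this commit describes might look like the sketch below. This is an illustrative reconstruction, not the PR's actual `_TokenBucket` code; the class and method names here are assumptions.

```python
import time


class TokenBucket:
    """Sketch of a fixed-window per-minute limiter: count requests (RPM)
    and tokens (TPM) in a 60s window, and sleep until the window resets
    when a call would exceed either limit. A limit of 0 disables the check."""

    def __init__(self, rpm_limit=0, tpm_limit=0):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.window_start = time.monotonic()
        self.requests = 0
        self.tokens = 0

    def _maybe_reset(self):
        # start a fresh window once 60s have elapsed
        if time.monotonic() - self.window_start >= 60:
            self.window_start = time.monotonic()
            self.requests = 0
            self.tokens = 0

    def acquire(self, estimated_tokens=0):
        self._maybe_reset()
        over_rpm = self.rpm_limit and self.requests + 1 > self.rpm_limit
        over_tpm = self.tpm_limit and self.tokens + estimated_tokens > self.tpm_limit
        if over_rpm or over_tpm:
            # proactive throttling: wait out the remainder of the window
            wait = 60 - (time.monotonic() - self.window_start)
            if wait > 0:
                time.sleep(wait)
            self._maybe_reset()
        self.requests += 1
        self.tokens += estimated_tokens
```

The key design point, reflected in the PR, is that enforcement happens *before* the call (proactive), while the 429 retry loop handles the cases the window accounting misses.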
…tion scripts

- Add POST /<id>/config route to merge rate_limit into simulation_config.json
- Add import json to simulation.py
- Inject inter_turn_delay_s (default 500ms) after env.step() in all 3 scripts
- Wrap env.step() with retry loop for 429/rate limit errors in all 3 scripts
- Rate limit config read from simulation_config.json rate_limit section
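The per-round pattern these bullets describe, retry `env.step()` on rate-limit errors, then pause between turns, can be sketched as below. The helper names and the string-matching fallback are assumptions modeled on the commit messages, not the scripts' actual code.

```python
import asyncio


def is_rate_limit_error(exc):
    # string fallback in the spirit of the PR's _is_rate_limit_error()
    msg = str(exc).lower()
    return "429" in msg or "rate limit" in msg or "too many requests" in msg


async def step_with_retry(env, max_retries=3, retry_base_delay_s=30,
                          inter_turn_delay_s=0.5):
    """Hypothetical loop body: retry env.step() on 429s with exponential
    backoff (capped at 300s), then sleep the configured inter-turn delay."""
    for attempt in range(max_retries + 1):
        try:
            result = await env.step()
            break
        except Exception as exc:
            if not is_rate_limit_error(exc) or attempt == max_retries:
                raise
            wait = min(retry_base_delay_s * (2 ** attempt), 300)
            await asyncio.sleep(wait)
    # inter-turn delay spreads calls out even when no limit is hit
    await asyncio.sleep(inter_turn_delay_s)
    return result
```

Non-rate-limit exceptions are re-raised immediately, so only 429-class failures consume retries.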
…AP updated

- 03-01-SUMMARY.md created with full task details and decisions
- STATE.md updated: Phase 3 Plan 01 complete, 3/4 plans done
- ROADMAP.md updated: Phase 3 In Progress (1/2 summaries)
- Add updateSimulationConfig API helper to simulation.js (POST /api/simulation/{id}/config)
- Add collapsible rate limit settings panel to Step3Simulation.vue (phase === 0 only)
- Add 5 controls: inter-turn delay slider, max retries, retry base delay, TPM limit, RPM limit
- Persist settings to localStorage (key: mirofish_rate_limit_settings)
- Load persisted settings on mount via onMounted
- Watch rateLimitSettings deeply for auto-save
- Call updateSimulationConfig before startSimulation in doStartSimulation()
- Add scoped CSS using existing CSS custom properties (--card, --border, --primary, etc.)
…DMAP updated

- 03-02-SUMMARY.md created with full task record and deviation log
- STATE.md updated: Phase 03 complete, all 4 plans done, milestone v1.0 reached
- ROADMAP.md updated: Phase 03 shows 2/2 plans complete
Summary
Large simulations (300+ agents, 80+ rounds) generate hundreds of LLM calls per run and reliably hit provider rate limits. This PR adds a complete API usage rate limiting layer that prevents simulation crashes and gives users control over throttling behavior.
Changes
Backend — `backend/app/utils/llm_client.py`

- `_TokenBucket` class — fixed-window per-minute rate limiter. Tracks RPM and TPM counts, resets every 60s, sleeps until the window resets if a limit is reached
- `_is_rate_limit_error()` — detects 429 errors across OpenAI SDK, Anthropic SDK, and string fallback ("429", "rate limit", "too many requests")
- `_check_token_bucket()` — proactive enforcement before each LLM call; sleeps if over RPM/TPM limits
- `chat()` retry loop — catches rate limit errors and retries with exponential backoff: `wait = min(base_delay * (2 ** attempt), 300)`. Defaults: base 30s, cap 300s, 3 retries
- `chat_json()` — updated to accept and pass through `rate_limit_config`

Backend — `backend/app/api/simulation.py`

- `POST /<simulation_id>/config` — new endpoint that merges a `rate_limit` object into the simulation's `simulation_config.json` without touching other fields

Backend — simulation scripts (all three)

- `backend/scripts/run_twitter_simulation.py`
- `backend/scripts/run_reddit_simulation.py`
- `backend/scripts/run_parallel_simulation.py`

Each script now:

- reads the `rate_limit` config from `simulation_config.json` before the round loop
- wraps `env.step()` in a retry loop for 429 errors from camel-ai
- calls `asyncio.sleep(inter_turn_delay_s)` after each step

Frontend — `frontend/src/api/simulation.js`

- `updateSimulationConfig()` — new API helper that calls `POST /api/simulation/{id}/config`

Frontend — `frontend/src/components/Step3Simulation.vue`

- collapsible rate limit settings panel, shown only before the run starts (`phase === 0`)
- settings persisted to `localStorage` under key `mirofish_rate_limit_settings`
- `updateSimulationConfig()` called before `startSimulation()` on every run
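The config endpoint's merge step amounts to a small read-merge-write over the JSON file. A hypothetical sketch (the `merge_rate_limit` helper and its signature are illustrative, not the PR's code):

```python
import json
from pathlib import Path


def merge_rate_limit(config_path, rate_limit):
    """Merge a rate_limit object into simulation_config.json without
    touching other fields, mirroring what POST /<id>/config is described
    as doing. Keys already present in rate_limit are overwritten;
    everything else in the file is left intact."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("rate_limit", {}).update(rate_limit)
    path.write_text(json.dumps(config, indent=2))
    return config
```

Using `setdefault(...).update(...)` means a partial payload (say, just `{"rpm_limit": 60}`) only changes that one key rather than replacing the whole `rate_limit` block.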
Configuration

All parameters are optional with safe defaults. Set via UI or directly in `simulation_config.json`:

```json
{
  "rate_limit": {
    "inter_turn_delay_ms": 500,
    "max_retries": 3,
    "retry_base_delay_s": 30,
    "tpm_limit": 0,
    "rpm_limit": 0
  }
}
```

Set `tpm_limit` and `rpm_limit` to `0` to disable proactive throttling and rely on retry-only behavior.

No breaking changes

- `LLMClient.__init__` signature unchanged — `rate_limit_config` is passed per-call, not at construction
- `GET /<simulation_id>/config` endpoint untouched
- Behavior is unchanged when `rate_limit` is absent from config
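As a worked example of the retry timing these defaults produce, the wait per attempt follows the `min(base_delay * 2**attempt, 300)` formula quoted in the Changes section. The helper below just evaluates that formula; it is not code from the PR.

```python
def backoff_delay(attempt, base_delay_s=30, cap_s=300):
    """Exponential backoff wait for a given retry attempt (0-indexed),
    per the PR's formula: wait = min(base_delay * 2**attempt, cap)."""
    return min(base_delay_s * (2 ** attempt), cap_s)


# With the default retry_base_delay_s of 30:
# attempt 0 -> 30s, attempt 1 -> 60s, attempt 2 -> 120s
```

So with `max_retries: 3` and defaults, a persistently rate-limited call waits at most 30 + 60 + 120 = 210 seconds across its retries before giving up.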