feat: configurable LLM API usage rate limiting and automatic 429 retry#408

Open
luquiluke wants to merge 43 commits into 666ghj:main from luquiluke:feat/rate-limit-control

Conversation


luquiluke commented Mar 30, 2026

Summary

Large simulations (300+ agents, 80+ rounds) generate hundreds of LLM calls per run and reliably hit provider rate limits. This PR adds a complete API usage rate limiting layer that prevents simulation crashes and gives users control over throttling behavior.

Changes

Backend — backend/app/utils/llm_client.py

  • _TokenBucket class — fixed-window per-minute rate limiter. Tracks RPM and TPM counts, resets every 60s, sleeps until window resets if limit is reached
  • _is_rate_limit_error() — detects 429 errors across OpenAI SDK, Anthropic SDK, and string fallback ("429", "rate limit", "too many requests")
  • _check_token_bucket() — proactive enforcement before each LLM call; sleeps if over RPM/TPM limits
  • chat() retry loop — catches rate limit errors and retries with exponential backoff: wait = min(base_delay * (2 ** attempt), 300). Defaults: base 30s, cap 300s, 3 retries
  • chat_json() — updated to accept and pass through rate_limit_config
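The fixed-window limiter and backoff schedule described above can be sketched as follows. This is an illustrative sketch, not the PR's exact code: the names `TokenBucket`, `backoff_delay`, and `acquire` are assumptions, but the window reset, the `0 = disabled` convention, and the `min(base_delay * (2 ** attempt), 300)` formula follow the description.

```python
import time

def backoff_delay(attempt, base_delay=30, cap=300):
    # Exponential backoff as described for the chat() retry loop:
    # 30s, 60s, 120s, 240s, then capped at 300s.
    return min(base_delay * (2 ** attempt), cap)

class TokenBucket:
    """Fixed-window per-minute limiter: counts requests and tokens in
    the current 60s window and sleeps out the remainder of the window
    when either cap would be exceeded. A limit of 0 disables that check."""

    def __init__(self, rpm_limit=0, tpm_limit=0):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.window_start = time.monotonic()
        self.requests = 0
        self.tokens = 0

    def acquire(self, est_tokens=0):
        now = time.monotonic()
        if now - self.window_start >= 60:
            # Window expired: reset counters.
            self.window_start, self.requests, self.tokens = now, 0, 0
        over_rpm = self.rpm_limit and self.requests + 1 > self.rpm_limit
        over_tpm = self.tpm_limit and self.tokens + est_tokens > self.tpm_limit
        if over_rpm or over_tpm:
            # Proactive throttle: wait until the current window resets.
            time.sleep(max(0.0, 60 - (now - self.window_start)))
            self.window_start = time.monotonic()
            self.requests = self.tokens = 0
        self.requests += 1
        self.tokens += est_tokens
```

A fixed window is simpler than a true token bucket and matches the described behavior: at worst a burst straddles two windows, which is acceptable for staying under provider RPM/TPM quotas.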

Backend — backend/app/api/simulation.py

  • POST /<simulation_id>/config — new endpoint that merges a rate_limit object into the simulation's simulation_config.json without touching other fields
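The core of that merge can be sketched as below, assuming the config lives in a `simulation_config.json` file; the route handler wiring is omitted and `merge_rate_limit` is a hypothetical helper name.

```python
import json
from pathlib import Path

def merge_rate_limit(config_path, rate_limit):
    """Merge a rate_limit object into simulation_config.json without
    touching any other top-level field (sketch of the endpoint's core)."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    # Shallow-merge so rate_limit keys not in the request keep old values.
    config["rate_limit"] = {**config.get("rate_limit", {}), **rate_limit}
    path.write_text(json.dumps(config, indent=2))
    return config
```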

Backend — simulation scripts (all three)

  • backend/scripts/run_twitter_simulation.py
  • backend/scripts/run_reddit_simulation.py
  • backend/scripts/run_parallel_simulation.py

Each script now:

  1. Reads rate_limit config from simulation_config.json before the round loop
  2. Wraps env.step() in a retry loop for 429 errors from camel-ai
  3. Calls asyncio.sleep(inter_turn_delay_s) after each step
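The three steps above can be sketched as a per-round wrapper (function and parameter names are illustrative; `env` stands in for the camel-ai environment, and the string-matching fallback mirrors the `_is_rate_limit_error()` markers):

```python
import asyncio

RATE_LIMIT_MARKERS = ("429", "rate limit", "too many requests")

def is_rate_limit_error(exc):
    # String fallback used when no SDK-specific exception type matches.
    return any(m in str(exc).lower() for m in RATE_LIMIT_MARKERS)

async def step_with_retry(env, max_retries=3, base_delay_s=30,
                          inter_turn_delay_s=0.5):
    """Retry env.step() on rate limit errors with exponential backoff,
    then pause before the next turn (sketch of the script-level wrapper)."""
    for attempt in range(max_retries + 1):
        try:
            result = await env.step()
            break
        except Exception as exc:
            if not is_rate_limit_error(exc) or attempt == max_retries:
                raise
            await asyncio.sleep(min(base_delay_s * (2 ** attempt), 300))
    # Inter-turn delay applied after every successful step.
    await asyncio.sleep(inter_turn_delay_s)
    return result
```

Non-429 errors propagate immediately, so the retry loop only absorbs throttling failures rather than masking genuine bugs.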

Frontend — frontend/src/api/simulation.js

  • updateSimulationConfig() — new API helper that calls POST /api/simulation/{id}/config

Frontend — frontend/src/components/Step3Simulation.vue

  • Collapsible Rate Limit Settings panel, visible only in pre-run state (phase === 0)
  • 5 controls: Inter-turn Delay slider (0–5000ms), Max Retries, Retry Base Delay, TPM Limit, RPM Limit
  • Settings load from and save to localStorage under key mirofish_rate_limit_settings
  • updateSimulationConfig() called before startSimulation() on every run

Configuration

All parameters are optional with safe defaults. Set via UI or directly in simulation_config.json:

{
  "rate_limit": {
    "inter_turn_delay_ms": 500,
    "max_retries": 3,
    "retry_base_delay_s": 30,
    "tpm_limit": 0,
    "rpm_limit": 0
  }
}

Set tpm_limit and rpm_limit to 0 to disable proactive throttling and rely on retry-only behavior.

No breaking changes

  • LLMClient.__init__ signature unchanged — rate_limit_config is passed per-call, not at construction
  • Existing GET /<simulation_id>/config endpoint untouched
  • All new behavior is opt-in via config; default behavior is unchanged if rate_limit is absent from config

amadad and others added 30 commits March 16, 2026 02:51
… Zep with local KuzuDB

- Translate entire codebase (60+ files) from Chinese to English:
  backend prompts, API routes, services, utilities, frontend UI/components
- Add native Anthropic Claude SDK support alongside OpenAI
  (auto-detects provider from model name or LLM_PROVIDER env var)
- Replace Zep Cloud dependency with local embedded graph database:
  new graph_db.py (KuzuDB-backed storage), entity_extractor.py
  (LLM-based entity/relationship extraction from text)
- Rewrite graph_builder, zep_entity_reader, zep_graph_memory_updater,
  zep_tools to use local GraphDatabase instead of Zep Cloud API
- Remove ZEP_API_KEY requirement — zero cloud dependencies for graph layer
- Update dependencies: add anthropic, kuzu; remove zep-cloud
- Update .env.example with Anthropic/OpenAI configuration examples
- Dockerize with Traefik labels for HTTPS via Cloudflare proxy
- Add synth.scty.org to Vite allowedHosts
- Translate index.html to English
- LLM client now supports 4 providers: openai, anthropic, claude-cli, codex-cli
- CLI providers use subprocess calls to claude/codex binaries (no API key needed)
- Docker compose mounts host CLI tools + auth into container
- Traefik labels for synth.scty.org with Let's Encrypt TLS
- Allow all hosts in Vite dev server for tunnel/proxy access
- Add codex exec --skip-git-repo-check flag for Docker environments
- Parse codex output to extract assistant response (strip headers/token counts)
- Increase CLI timeout to 180s for large prompts
- Allow empty LLM_API_KEY for CLI providers in config validation
refactor: streamline workbench core and deploy
Add regression coverage for the graph task list, twitter profile loading, report status polling, and timeline/stat aggregation so the runtime fixes stay pinned.
Remove the temporary regression test file so the final diff only touches the files listed in the task manifest. Verification stays in the PR body and manual route checks.
Keep the GET status route aligned with report_id-based polling by returning a specific not-found response when no active task or persisted report matches the requested id.
Align the tasks endpoint with the task spec and acceptance criteria while preserving compatibility with both Task objects and pre-serialized dict payloads.
feat(graph): add graph storage abstraction
feat(codex-proxy): add OpenAI-compatible sidecar
fix(runtime): resolve p0 twitter, report, and timeline bugs
- Codebase map (7 docs): STACK, INTEGRATIONS, ARCHITECTURE, STRUCTURE, CONVENTIONS, TESTING, CONCERNS
- PROJECT.md: Slater Consulting context, brand colors, success criteria
- REQUIREMENTS.md: R1 localization, R2 brand UI, R3 rate limit control
- ROADMAP.md: 3 coarse phases, Milestone 1
- STATE.md: project memory initialized
- config.json: balanced model, plan_check + verifier enabled

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- 01-RESEARCH.md: full Chinese character audit (3 files, 25 lines)
- 01-PLAN-frontend-localization.md: Step4Report.vue regex backward compat
- 01-PLAN-backend-localization.md: graph_tools.py period fix

Plan checker: PASS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… entry points

- Create frontend/src/style.css with 13 CSS custom property tokens (Slater Consulting palette)
- Install @fontsource/geist-sans and import weights 400+600 in main.js
- Update index.html: title/meta to MiroFish SIPE — Slater Consulting, favicon to SVG, CDN trimmed to JetBrains Mono only
- Create frontend/public/favicon.svg with SC initials on dark navy
- Update App.vue global styles: Geist Sans font, var() tokens for all colors and scrollbar
…e orange/black/white vars

- Delete entire :root {} block from Home.vue scoped styles (conflict with global token system)
- Replace all var(--orange) with var(--primary) for interactive elements and var(--accent) for hover states
- Replace var(--black) -> var(--foreground), var(--white) -> var(--background)
- Replace var(--gray-text) -> var(--muted-foreground)
- Replace var(--font-mono) with explicit JetBrains Mono stack
- Fix gradient-text to use foreground tokens (was invisible #000000 on dark background)
- Build passes with exit code 0
… updated

- Create 02-01-SUMMARY.md with full task record, deviations, and self-check
- Update STATE.md: Phase 2 IN PROGRESS, Plan 01 complete, session notes, decisions added
- Updated nav-brand/brand text to "Slater Consulting" in all 7 view files
- Replaced hardcoded hex colors with CSS custom property tokens in 5 main views
- Removed Space Grotesk/Noto Sans SC font-family declarations from 4 views
- Status dots: #FF5722->var(--primary), #4CAF50->var(--accent), #F44336->var(--destructive)
- Backgrounds: #FFF->var(--background), headers->var(--secondary), borders->var(--border)
- Home.vue upload zone, console, disabled button state tokenized
- [Rule 1 - Bug] Fixed brand text in InteractionView.vue and Process.vue (out-of-scope files)
…sign tokens

- Step1GraphBuild: replaced badge/card/button hex values with var(--primary), var(--accent), var(--card)
- Step2EnvSetup: replaced 55 hex values with token vars (accent, primary, secondary, border, muted-foreground)
- Step3Simulation: replaced border-top-color #FFF with var(--primary-foreground)
- Step4Report: replaced 31 hex values, converted tool badge classes to dark-theme token vars
- Step5Interaction: replaced 119 hex values, SVG strokes, chat UI, survey, markdown styles
- HistoryDatabase: replaced 86 hex values with card/secondary/border/accent/muted-foreground tokens
- GraphPanel: replaced D3 color palette with Slater brand palette, replaced all D3 JS stroke/fill and CSS hex values
…date

- 02-02-SUMMARY.md: documents all 7 component tokenizations, D3 palette swap, decisions
- STATE.md: advanced to Plan 03, updated last session and key decisions
luquiluke and others added 13 commits March 28, 2026 15:28
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _TokenBucket class for fixed-window RPM/TPM enforcement
- Add _is_rate_limit_error() to detect 429s across all providers
- Add _check_token_bucket() for proactive pre-call throttling
- chat() accepts rate_limit_config, retries on 429 with exp backoff (base 30s, max 300s)
- chat_json() accepts rate_limit_config and passes through to chat()
- Safe import guards for openai/anthropic RateLimitError exceptions
…tion scripts

- Add POST /<id>/config route to merge rate_limit into simulation_config.json
- Add import json to simulation.py
- Inject inter_turn_delay_s (default 500ms) after env.step() in all 3 scripts
- Wrap env.step() with retry loop for 429/rate limit errors in all 3 scripts
- Rate limit config read from simulation_config.json rate_limit section
…AP updated

- 03-01-SUMMARY.md created with full task details and decisions
- STATE.md updated: Phase 3 Plan 01 complete, 3/4 plans done
- ROADMAP.md updated: Phase 3 In Progress (1/2 summaries)
- Add updateSimulationConfig API helper to simulation.js (POST /api/simulation/{id}/config)
- Add collapsible rate limit settings panel to Step3Simulation.vue (phase === 0 only)
- Add 5 controls: inter-turn delay slider, max retries, retry base delay, TPM limit, RPM limit
- Persist settings to localStorage (key: mirofish_rate_limit_settings)
- Load persisted settings on mount via onMounted
- Watch rateLimitSettings deeply for auto-save
- Call updateSimulationConfig before startSimulation in doStartSimulation()
- Add scoped CSS using existing CSS custom properties (--card, --border, --primary, etc.)
…DMAP updated

- 03-02-SUMMARY.md created with full task record and deviation log
- STATE.md updated: Phase 03 complete, all 4 plans done, milestone v1.0 reached
- ROADMAP.md updated: Phase 03 shows 2/2 plans complete
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. enhancement New feature or request labels Mar 30, 2026
@luquiluke luquiluke changed the title feat: configurable LLM rate limiting and automatic 429 retry feat: configurable LLM API rate limiting and automatic 429 retry Mar 30, 2026
@luquiluke luquiluke changed the title feat: configurable LLM API rate limiting and automatic 429 retry feat: configurable LLM API usage rate limiting and automatic 429 retry Mar 30, 2026