feat: configurable LLM API usage rate limiting and automatic 429 retry #408

Open
luquiluke wants to merge 43 commits into 666ghj:main from
Conversation
… Zep with local KuzuDB

- Translate entire codebase (60+ files) from Chinese to English: backend prompts, API routes, services, utilities, frontend UI/components
- Add native Anthropic Claude SDK support alongside OpenAI (auto-detects provider from model name or LLM_PROVIDER env var)
- Replace Zep Cloud dependency with local embedded graph database: new graph_db.py (KuzuDB-backed storage) and entity_extractor.py (LLM-based entity/relationship extraction from text)
- Rewrite graph_builder, zep_entity_reader, zep_graph_memory_updater, zep_tools to use local GraphDatabase instead of Zep Cloud API
- Remove ZEP_API_KEY requirement — zero cloud dependencies for graph layer
- Update dependencies: add anthropic, kuzu; remove zep-cloud
- Update .env.example with Anthropic/OpenAI configuration examples
- Dockerize with Traefik labels for HTTPS via Cloudflare proxy
- Add synth.scty.org to Vite allowedHosts
- Translate index.html to English
- LLM client now supports 4 providers: openai, anthropic, claude-cli, codex-cli
- CLI providers use subprocess calls to claude/codex binaries (no API key needed)
- Docker compose mounts host CLI tools + auth into container
- Traefik labels for synth.scty.org with Let's Encrypt TLS
- Allow all hosts in Vite dev server for tunnel/proxy access
- Add codex exec --skip-git-repo-check flag for Docker environments
- Parse codex output to extract assistant response (strip headers/token counts)
- Increase CLI timeout to 180s for large prompts
- Allow empty LLM_API_KEY for CLI providers in config validation
refactor: streamline workbench core and deploy
Add regression coverage for the graph task list, twitter profile loading, report status polling, and timeline/stat aggregation so the runtime fixes stay pinned.
Remove the temporary regression test file so the final diff only touches the files listed in the task manifest. Verification stays in the PR body and manual route checks.
Keep the GET status route aligned with report_id-based polling by returning a specific not-found response when no active task or persisted report matches the requested id.
Align the tasks endpoint with the task spec and acceptance criteria while preserving compatibility with both Task objects and pre-serialized dict payloads.
feat(graph): add graph storage abstraction
feat(codex-proxy): add OpenAI-compatible sidecar
fix(runtime): resolve p0 twitter, report, and timeline bugs
- Codebase map (7 docs): STACK, INTEGRATIONS, ARCHITECTURE, STRUCTURE, CONVENTIONS, TESTING, CONCERNS
- PROJECT.md: Slater Consulting context, brand colors, success criteria
- REQUIREMENTS.md: R1 localization, R2 brand UI, R3 rate limit control
- ROADMAP.md: 3 coarse phases, Milestone 1
- STATE.md: project memory initialized
- config.json: balanced model, plan_check + verifier enabled

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- 01-RESEARCH.md: full Chinese character audit (3 files, 25 lines)
- 01-PLAN-frontend-localization.md: Step4Report.vue regex backward compat
- 01-PLAN-backend-localization.md: graph_tools.py period fix

Plan checker: PASS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… entry points

- Create frontend/src/style.css with 13 CSS custom property tokens (Slater Consulting palette)
- Install @fontsource/geist-sans and import weights 400+600 in main.js
- Update index.html: title/meta to MiroFish SIPE — Slater Consulting, favicon to SVG, CDN trimmed to JetBrains Mono only
- Create frontend/public/favicon.svg with SC initials on dark navy
- Update App.vue global styles: Geist Sans font, var() tokens for all colors and scrollbar
…e orange/black/white vars
- Delete entire :root {} block from Home.vue scoped styles (conflict with global token system)
- Replace all var(--orange) with var(--primary) for interactive elements and var(--accent) for hover states
- Replace var(--black) -> var(--foreground), var(--white) -> var(--background)
- Replace var(--gray-text) -> var(--muted-foreground)
- Replace var(--font-mono) with explicit JetBrains Mono stack
- Fix gradient-text to use foreground tokens (was invisible #000000 on dark background)
- Build passes with exit code 0
… updated

- Create 02-01-SUMMARY.md with full task record, deviations, and self-check
- Update STATE.md: Phase 2 IN PROGRESS, Plan 01 complete, session notes, decisions added
- Updated nav-brand/brand text to "Slater Consulting" in all 7 view files
- Replaced hardcoded hex colors with CSS custom property tokens in 5 main views
- Removed Space Grotesk/Noto Sans SC font-family declarations from 4 views
- Status dots: #FF5722 -> var(--primary), #4CAF50 -> var(--accent), #F44336 -> var(--destructive)
- Backgrounds: #FFF -> var(--background), headers -> var(--secondary), borders -> var(--border)
- Home.vue upload zone, console, disabled button state tokenized
- [Rule 1 - Bug] Fixed brand text in InteractionView.vue and Process.vue (out-of-scope files)
…sign tokens

- Step1GraphBuild: replaced badge/card/button hex values with var(--primary), var(--accent), var(--card)
- Step2EnvSetup: replaced 55 hex values with token vars (accent, primary, secondary, border, muted-foreground)
- Step3Simulation: replaced border-top-color #FFF with var(--primary-foreground)
- Step4Report: replaced 31 hex values, converted tool badge classes to dark-theme token vars
- Step5Interaction: replaced 119 hex values, SVG strokes, chat UI, survey, markdown styles
- HistoryDatabase: replaced 86 hex values with card/secondary/border/accent/muted-foreground tokens
- GraphPanel: replaced D3 color palette with Slater brand palette, replaced all D3 JS stroke/fill and CSS hex values
…date

- 02-02-SUMMARY.md: documents all 7 component tokenizations, D3 palette swap, decisions
- STATE.md: advanced to Plan 03, updated last session and key decisions
…-square from Home.vue
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add _TokenBucket class for fixed-window RPM/TPM enforcement
- Add _is_rate_limit_error() to detect 429s across all providers
- Add _check_token_bucket() for proactive pre-call throttling
- chat() accepts rate_limit_config, retries on 429 with exponential backoff (base 30s, max 300s)
- chat_json() accepts rate_limit_config and passes through to chat()
- Safe import guards for openai/anthropic RateLimitError exceptions
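A fixed-window limiter of the kind this commit describes might look like the sketch below. This is an illustrative reconstruction, not the PR's actual `_TokenBucket` code; the class and method names here are assumptions.

```python
import time


class TokenBucket:
    """Sketch of a fixed-window per-minute limiter: count requests (RPM)
    and tokens (TPM) in a 60s window, and sleep until the window resets
    when a call would exceed either limit. A limit of 0 disables the check."""

    def __init__(self, rpm_limit=0, tpm_limit=0):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.window_start = time.monotonic()
        self.requests = 0
        self.tokens = 0

    def _maybe_reset(self):
        # start a fresh window once 60s have elapsed
        if time.monotonic() - self.window_start >= 60:
            self.window_start = time.monotonic()
            self.requests = 0
            self.tokens = 0

    def acquire(self, estimated_tokens=0):
        self._maybe_reset()
        over_rpm = self.rpm_limit and self.requests + 1 > self.rpm_limit
        over_tpm = self.tpm_limit and self.tokens + estimated_tokens > self.tpm_limit
        if over_rpm or over_tpm:
            # proactive throttling: wait out the remainder of the window
            wait = 60 - (time.monotonic() - self.window_start)
            if wait > 0:
                time.sleep(wait)
            self._maybe_reset()
        self.requests += 1
        self.tokens += estimated_tokens
```

The key design point, reflected in the PR, is that enforcement happens *before* the call (proactive), while the 429 retry loop handles the cases the window accounting misses.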
…tion scripts

- Add POST /<id>/config route to merge rate_limit into simulation_config.json
- Add import json to simulation.py
- Inject inter_turn_delay_s (default 500ms) after env.step() in all 3 scripts
- Wrap env.step() with retry loop for 429/rate limit errors in all 3 scripts
- Rate limit config read from simulation_config.json rate_limit section
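The per-round pattern these bullets describe, retry `env.step()` on rate-limit errors, then pause between turns, can be sketched as below. The helper names and the string-matching fallback are assumptions modeled on the commit messages, not the scripts' actual code.

```python
import asyncio


def is_rate_limit_error(exc):
    # string fallback in the spirit of the PR's _is_rate_limit_error()
    msg = str(exc).lower()
    return "429" in msg or "rate limit" in msg or "too many requests" in msg


async def step_with_retry(env, max_retries=3, retry_base_delay_s=30,
                          inter_turn_delay_s=0.5):
    """Hypothetical loop body: retry env.step() on 429s with exponential
    backoff (capped at 300s), then sleep the configured inter-turn delay."""
    for attempt in range(max_retries + 1):
        try:
            result = await env.step()
            break
        except Exception as exc:
            if not is_rate_limit_error(exc) or attempt == max_retries:
                raise
            wait = min(retry_base_delay_s * (2 ** attempt), 300)
            await asyncio.sleep(wait)
    # inter-turn delay spreads calls out even when no limit is hit
    await asyncio.sleep(inter_turn_delay_s)
    return result
```

Non-rate-limit exceptions are re-raised immediately, so only 429-class failures consume retries.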
…AP updated

- 03-01-SUMMARY.md created with full task details and decisions
- STATE.md updated: Phase 3 Plan 01 complete, 3/4 plans done
- ROADMAP.md updated: Phase 3 In Progress (1/2 summaries)
- Add updateSimulationConfig API helper to simulation.js (POST /api/simulation/{id}/config)
- Add collapsible rate limit settings panel to Step3Simulation.vue (phase === 0 only)
- Add 5 controls: inter-turn delay slider, max retries, retry base delay, TPM limit, RPM limit
- Persist settings to localStorage (key: mirofish_rate_limit_settings)
- Load persisted settings on mount via onMounted
- Watch rateLimitSettings deeply for auto-save
- Call updateSimulationConfig before startSimulation in doStartSimulation()
- Add scoped CSS using existing CSS custom properties (--card, --border, --primary, etc.)
…DMAP updated

- 03-02-SUMMARY.md created with full task record and deviation log
- STATE.md updated: Phase 03 complete, all 4 plans done, milestone v1.0 reached
- ROADMAP.md updated: Phase 03 shows 2/2 plans complete
Summary
Large simulations (300+ agents, 80+ rounds) generate hundreds of LLM calls per run and reliably hit provider rate limits. This PR adds a complete API usage rate limiting layer that prevents simulation crashes and gives users control over throttling behavior.
Changes
Backend — `backend/app/utils/llm_client.py`

- `_TokenBucket` class — fixed-window per-minute rate limiter. Tracks RPM and TPM counts, resets every 60s, sleeps until the window resets if a limit is reached
- `_is_rate_limit_error()` — detects 429 errors across OpenAI SDK, Anthropic SDK, and string fallback ("429", "rate limit", "too many requests")
- `_check_token_bucket()` — proactive enforcement before each LLM call; sleeps if over RPM/TPM limits
- `chat()` retry loop — catches rate limit errors and retries with exponential backoff: `wait = min(base_delay * (2 ** attempt), 300)`. Defaults: base 30s, cap 300s, 3 retries
- `chat_json()` — updated to accept and pass through `rate_limit_config`

Backend — `backend/app/api/simulation.py`

- `POST /<simulation_id>/config` — new endpoint that merges a `rate_limit` object into the simulation's `simulation_config.json` without touching other fields

Backend — simulation scripts (all three)

- `backend/scripts/run_twitter_simulation.py`
- `backend/scripts/run_reddit_simulation.py`
- `backend/scripts/run_parallel_simulation.py`

Each script now:

- reads the `rate_limit` config from `simulation_config.json` before the round loop
- wraps `env.step()` in a retry loop for 429 errors from camel-ai
- calls `asyncio.sleep(inter_turn_delay_s)` after each step

Frontend — `frontend/src/api/simulation.js`

- `updateSimulationConfig()` — new API helper that calls `POST /api/simulation/{id}/config`

Frontend — `frontend/src/components/Step3Simulation.vue`

- collapsible rate limit settings panel, shown only before the run starts (`phase === 0`)
- settings persisted to `localStorage` under key `mirofish_rate_limit_settings`
- `updateSimulationConfig()` called before `startSimulation()` on every run
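The config endpoint's merge step amounts to a small read-merge-write over the JSON file. A hypothetical sketch (the `merge_rate_limit` helper and its signature are illustrative, not the PR's code):

```python
import json
from pathlib import Path


def merge_rate_limit(config_path, rate_limit):
    """Merge a rate_limit object into simulation_config.json without
    touching other fields, mirroring what POST /<id>/config is described
    as doing. Keys already present in rate_limit are overwritten;
    everything else in the file is left intact."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("rate_limit", {}).update(rate_limit)
    path.write_text(json.dumps(config, indent=2))
    return config
```

Using `setdefault(...).update(...)` means a partial payload (say, just `{"rpm_limit": 60}`) only changes that one key rather than replacing the whole `rate_limit` block.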
Configuration

All parameters are optional with safe defaults. Set via UI or directly in `simulation_config.json`:

```json
{
  "rate_limit": {
    "inter_turn_delay_ms": 500,
    "max_retries": 3,
    "retry_base_delay_s": 30,
    "tpm_limit": 0,
    "rpm_limit": 0
  }
}
```

Set `tpm_limit` and `rpm_limit` to `0` to disable proactive throttling and rely on retry-only behavior.

No breaking changes

- `LLMClient.__init__` signature unchanged — `rate_limit_config` is passed per-call, not at construction
- `GET /<simulation_id>/config` endpoint untouched
- Behavior is unchanged when `rate_limit` is absent from config
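As a worked example of the retry timing these defaults produce, the wait per attempt follows the `min(base_delay * 2**attempt, 300)` formula quoted in the Changes section. The helper below just evaluates that formula; it is not code from the PR.

```python
def backoff_delay(attempt, base_delay_s=30, cap_s=300):
    """Exponential backoff wait for a given retry attempt (0-indexed),
    per the PR's formula: wait = min(base_delay * 2**attempt, cap)."""
    return min(base_delay_s * (2 ** attempt), cap_s)


# With the default retry_base_delay_s of 30:
# attempt 0 -> 30s, attempt 1 -> 60s, attempt 2 -> 120s
```

So with `max_retries: 3` and defaults, a persistently rate-limited call waits at most 30 + 60 + 120 = 210 seconds across its retries before giving up.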