Claude/polymarket accuracy improvements by ethernetabc1-source · Pull Request #433 · 666ghj/MiroFish

ethernetabc1-source · 2026-04-01T15:59:04Z

No description provided.

… LLM retry

- Remove `"traceback": traceback.format_exc()` from all 51 HTTP API responses in graph.py, simulation.py, report.py; add exc_info=True to server-side logs so full stack traces are still captured without leaking to clients
- Replace CORS wildcard origins="*" with a configurable allowlist read from the new CORS_ORIGINS env var (defaults to localhost dev ports)
- Change FLASK_DEBUG default from True → False to prevent accidental debug mode in production deployments
- Add MIME magic-byte validation to FileParser._validate_file: blocks PE/ELF/ZIP files disguised with a .pdf/.txt extension, and enforces %PDF header on PDFs
- Rewrite LLMClient.chat with exponential-backoff retry (max 3, 2/4/8 s) for RateLimitError/APITimeoutError/APIConnectionError; re-raise APIStatusError (4xx) immediately without wasting retries; surface specific openai exceptions instead of bare Exception
- Add comparison_demo.py: self-contained before/after simulation of all four improvements, runnable without real API credentials

https://claude.ai/code/session_01TSufK4MuqeYHvT6m3855CE
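For readers who want a concrete picture of the retry policy described in this commit, here is a minimal sketch. It is not the code from LLMClient.chat: `TransientError`, `ClientError`, and `call_with_backoff` are hypothetical names standing in for the openai exception types and the rewritten method, and reading "max 3, 2/4/8 s" as "up to three retries after the first attempt" is an assumption.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for RateLimitError / APITimeoutError / APIConnectionError."""


class ClientError(Exception):
    """Hypothetical stand-in for a 4xx APIStatusError, which is never retried."""


def call_with_backoff(fn, max_retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fn(); retry transient failures with exponentially growing delays.

    Assumption: max_retries counts retries after the initial attempt, so the
    defaults sleep 2 s, 4 s, then 8 s before giving up.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the specific exception
            sleep(base_delay * (2 ** attempt))
        # ClientError is not caught here, so it propagates on the first failure.
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production version would more likely leave `time.sleep` in place and log each retry.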

…use case

Translate all service-layer system prompts and user prompt templates from Chinese to English, and remove all China-specific hardcoding:

- ontology_generator: rewrite ONTOLOGY_SYSTEM_PROMPT in English; add geopolitical entity type examples (GovernmentOfficial, Military, ThinkTank, Diplomat, Trader, Analyst) for prediction market scenarios; translate user message template
- oasis_profile_generator: change system prompt from '使用中文' ("use Chinese") to 'Write in English'; translate both individual and group persona prompt templates; change country field from '国家（使用中文，如"中国"）' ("Country (use Chinese, e.g. 'China')") to English country names
- simulation_config_generator: replace Beijing/Chinese timezone assumptions in time config, event config, and agent config prompts with global news cycle patterns (analyst business hours, trader market hours, evening citizen peaks); update default time config peak hours from [19,20,21,22] to [14..21] UTC range; translate all system prompts and user prompts to English
- report_agent: translate PLAN_SYSTEM_PROMPT, PLAN_USER_PROMPT_TEMPLATE, SECTION_SYSTEM_PROMPT_TEMPLATE, SECTION_USER_PROMPT_TEMPLATE, CHAT_SYSTEM_PROMPT_TEMPLATE, and all ReACT loop message strings to English; add 'predicted_probability' (0-100) field to the report outline JSON output; add mandatory "Prediction Verdict" final section with probability estimate and top factors driving the prediction up/down
- zep_tools: translate sub-query generation, agent selection, interview question generation, and interview summary prompts to English; replace Chinese quote marks 「」 with standard double quotes; translate fallback strings
- .env.example: add CORS_ORIGINS variable; update comments to English

https://claude.ai/code/session_01TSufK4MuqeYHvT6m3855CE

…rdict

Report outline now outputs a richer probability object:
- predicted_probability: integer point estimate (0-100)
- probability_low / probability_high: 80% confidence interval bounds
- key_upside_factors / key_downside_factors: string arrays

PLAN_SYSTEM_PROMPT updated with three-step calibration rule:
1. Base rate anchor (historical frequency of this event type)
2. Simulation signal (did agents escalate or de-escalate?)
3. Market anchor (compare to Polymarket/Metaculus price if provided)

This forces probability to be grounded in base rates rather than pure LLM intuition, the main cause of overconfidence in single-run estimates.

SECTION_SYSTEM_PROMPT for "Prediction Verdict" section now requires:
a) Explicit base-rate statement
b) Simulation signal summary
c) Market comparison (if price given in scenario)
d) Probability verdict with range
e) Upside / downside risk bullet lists
f) Confidence note explaining main uncertainty source

New script: backend/scripts/ensemble_predict.py
- Runs N independent simulations for the same simulation_id
- Collects predicted_probability + range from each report
- Aggregates: mean point estimate, stdev-widened confidence interval
- Extracts consensus factors (mentioned in ≥2 runs)
- Prints a formatted verdict table and optionally writes JSON output
- Usage: python ensemble_predict.py --simulation-id sim_xxx --runs 3

https://claude.ai/code/session_01TSufK4MuqeYHvT6m3855CE
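The aggregation step listed for ensemble_predict.py can be sketched roughly as follows. This is an illustration, not the script's actual code: `aggregate_runs` is a hypothetical name, and two details are assumptions, namely that "stdev-widened" means extending the averaged bounds by one sample standard deviation of the point estimates, and that factors count as the same only when their strings match exactly.

```python
from collections import Counter
from statistics import mean, stdev


def aggregate_runs(runs, min_mentions=2):
    """Combine per-run probability objects into one ensemble verdict.

    Each run is a dict with the fields named in the commit message:
    predicted_probability, probability_low, probability_high,
    key_upside_factors, key_downside_factors.
    """
    points = [r["predicted_probability"] for r in runs]
    # Assumption: widen the averaged interval by one sample standard deviation.
    spread = stdev(points) if len(points) > 1 else 0.0
    low = max(0.0, mean(r["probability_low"] for r in runs) - spread)
    high = min(100.0, mean(r["probability_high"] for r in runs) + spread)

    def consensus(key):
        # Count each factor at most once per run, then keep the repeated ones.
        counts = Counter(f for r in runs for f in set(r.get(key, [])))
        return sorted(f for f, n in counts.items() if n >= min_mentions)

    return {
        "predicted_probability": mean(points),
        "probability_low": low,
        "probability_high": high,
        "key_upside_factors": consensus("key_upside_factors"),
        "key_downside_factors": consensus("key_downside_factors"),
    }
```

Exact string matching is the weakest part of this sketch, since an LLM rarely phrases a factor identically across runs; a real implementation would probably need some normalization or fuzzy matching before counting.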

melevsky

Hermes Agent Code Review

Automated review completed ✅

✅ Approved

Security: Proper credential handling, removed traceback exposure
Reliability: LLM API retry mechanism for transient errors
Features: Well-implemented ensemble prediction for accuracy improvement
Internationalization: English prompt support for global use

💡 Notes

Large PR (1310+ lines) but changes are well-structured
Print statements in scripts/demos are acceptable
Consider adding tests for ensemble functionality in future PRs

Recommendation: Ready for merge. This enhances both security and prediction accuracy.

Reviewed by Hermes Agent at $(date)

claude added 3 commits March 28, 2026 08:43

dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and enhancement (New feature or request) labels Apr 1, 2026

melevsky approved these changes Apr 1, 2026

ethernetabc1-source wants to merge 3 commits into 666ghj:main from ethernetabc1-source:claude/polymarket-accuracy-improvements
