A sophisticated probabilistic forecasting system that leverages multi-agent LLM reasoning, prompt chaining, web scraping, and statistical aggregation to generate accurate predictions for Metaculus questions.
This forecasting bot competes in the Metaculus Bot Tournament by predicting outcomes for various binary, multiple-choice and numeric questions. It employs a multi-stage retrieval-augmented architecture with parallel LLM agent processing, real-time news analysis, and ensemble aggregation to generate well-calibrated probability estimates.
- Multi-agent reasoning system utilizing both Claude 3.7 Sonnet and OpenAI GPT-O3/O4-mini models
- Asynchronous workflow for concurrent API calls and efficient processing
- Robust web content extraction from diverse sources using a multi-strategy approach
- Intelligent search aggregation with Google Search, AskNews, and Perplexity APIs
- Statistical ensemble methods for probability calibration and forecast aggregation
- Comprehensive benchmarking with normalized peer scoring and simulation
The system employs a multi-stage architecture:
- Question Analysis: Parses Metaculus questions to extract relevant context and parameters
- Search Query Generation: Uses LLMs to generate targeted search queries based on question content
- Information Retrieval: Asynchronously retrieves and extracts content from multiple sources
- Context Synthesis: Filters and consolidates retrieved information
- Parallel Forecast Generation: Multiple forecaster instances generate independent predictions
- Ensemble Aggregation: Combines forecasts with weighted averaging based on model reliability
- Benchmarking: Evaluates performance using normalized peer scoring metrics
Question → Query Generation → Parallel Retrieval → Content Extraction
↓
Final Prediction ← Ensemble Aggregation ← Parallel Forecasting ← Context Synthesis
The bot implements a forecasting committee with five specialized agents:
- Two Claude 3.7 Sonnet instances for robust reasoning about evidence
- Two GPT-o4-mini instances for complementary perspective
- One GPT-o3 instance with double weighting for ensemble diversity
Each agent approaches the problem differently, with prompt engineering directing some toward outside view (historical/reference class) reasoning and others toward inside view (mechanistic/causal) reasoning.
Implemented with Python's asyncio and aiohttp libraries:
- Concurrent API calls with intelligent retry mechanisms
- Configurable timeout handling and backoff strategies
- Fault-tolerant design with graceful degradation when services are unavailable
The FastContentExtractor system combines multiple extraction strategies:
- BeautifulSoup-based DOM analysis with site-specific selectors
- Trafilatura content extraction for structured content
- Readability algorithm implementation for article text
- Fallback mechanisms for handling diverse web content formats
- Metadata extraction and entity recognition
Integrates multiple information retrieval services:
- Google Search API for general web content
- Google News API for current events
- AskNews API for specialized news retrieval
- Perplexity API for research synthesis
Specialized forecast generation based on question type:
- Binary questions: probabilities as percentages (0-100%)
- Multiple-choice questions: probability distributions across options
- Numeric questions: 201-point CDF extrapolated from predicted percentile values
- Statistical calibration using historical performance data
Comprehensive evaluation system:
- Normalized peer scoring with simulated forecaster populations
- Monte Carlo simulation of good/bad forecaster behaviors
- Performance comparison against Metaculus community predictions
- Visualization of prediction distributions and calibration
- Python 3.9+
- API keys for:
- OpenAI (GPT)
- Anthropic (Claude)
- Google Serper
- Perplexity
- AskNews
- Bright Data (optional)
# Clone repository
git clone https://github.com/yourusername/metaculus-forecasting-bot.git
cd metaculus-forecasting-bot
# Install dependencies
pip install -r requirements.txt
# Configure environment variables
cp .env.example .env
# Edit .env with your API keys# Run a benchmark test
python benchmark.py
# Generate forecasts for specific questions
python forecaster.py --question_id 12345- Integration of structured numerical data sources (e.g., economic indicators, polls)
- UI for forecast exploration and explanation
- Customizable forecasting strategies based on question domain
- Fine-tuning of smaller LLMs on curated forecasting dataset
This project is licensed under the GNU Affero General Public License v3.0 (AGPLv3).
You may use, modify, and distribute this code under the terms of the license, provided that:
- Any derivative works are also licensed under AGPLv3
- If used as part of a hosted service (e.g., deployed forecasting interface), source code must be made available
- Proper attribution is maintained, including citation in research papers or public reports
For the full license text, see LICENSE.