4 changes: 4 additions & 0 deletions .env.example
@@ -0,0 +1,4 @@
# OpenRouter API Key
# Get your API key at: https://openrouter.ai/
# Make sure to purchase credits or enable automatic top-up
OPENROUTER_API_KEY=sk-or-v1-...
115 changes: 115 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,115 @@
# LLM Council - AI Coding Agent Instructions

## Project Overview

LLM Council is a 3-stage deliberation system where multiple LLMs collaboratively answer questions via OpenRouter. Think "ChatGPT but with a panel of experts who debate and synthesize answers."

**Available in Two Modes:**
1. **Web Application**: Interactive React UI with FastAPI backend
2. **MCP Server**: Model Context Protocol server for Claude Desktop, VS Code, etc.

**Core Flow:**
1. **Stage 1**: Parallel queries to all council models (defined in [backend/config.py](../backend/config.py))
2. **Stage 2**: Each model ranks anonymized responses (prevents favoritism)
3. **Stage 3**: Chairman model synthesizes final answer from all inputs + rankings
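
A compact sketch of that flow; the helper names loosely echo functions mentioned later in this file, but their signatures here are assumptions:

```python
# Sketch only: helper names are placeholders; real stage functions live in backend/council.py.
async def deliberate(question: str, council_models: list[str], chairman_model: str) -> str:
    # Stage 1: fan the question out to every council model in parallel.
    stage1 = await query_models_parallel(council_models, question)
    # Stage 2: each model ranks the anonymized peer responses.
    rankings = await stage2_collect_rankings(council_models, question, stage1)
    # Stage 3: the chairman synthesizes the final answer from responses + rankings.
    return await synthesize_final_answer(chairman_model, question, stage1, rankings)
```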

## Critical Architecture Decisions

### Anonymization Strategy
- Stage 2 uses "Response A", "Response B" labels to prevent model bias
- Backend creates `label_to_model` mapping for de-anonymization
- Frontend displays real model names **client-side** in bold (e.g., "**gpt-5.1**")
- This is intentional: models judge anonymously, users see transparently
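
A minimal sketch of the labeling scheme, assuming Stage 1 responses arrive as a model-to-text mapping (illustrative names, not the exact helpers in [backend/council.py](../backend/council.py)):

```python
import string

def anonymize_responses(responses: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Map each model's response to a neutral label like 'Response A'."""
    label_to_model: dict[str, str] = {}
    labeled: dict[str, str] = {}
    for letter, (model, text) in zip(string.ascii_uppercase, responses.items()):
        label = f"Response {letter}"
        label_to_model[label] = model
        labeled[label] = text
    return labeled, label_to_model

# Stage 2 models only see `labeled`; the frontend uses `label_to_model`
# from the API response to show real model names in bold.
```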

### Metadata NOT Persisted
Key gotcha: `label_to_model` and `aggregate_rankings` are computed per-request but **NOT saved to JSON**. They exist only in API responses and frontend state. See [backend/main.py](../backend/main.py) POST endpoint and [backend/storage.py](../backend/storage.py).
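
Roughly, the persisted record is the API payload minus that metadata (the values below are invented for illustration; check [backend/storage.py](../backend/storage.py) for the real schema):

```python
# Illustrative only -- field values are made up; see backend/main.py and backend/storage.py.
api_response = {
    "stage1": ["...per-model responses..."],
    "stage2": ["...per-model rankings..."],
    "stage3": "final synthesized answer",
    "label_to_model": {"Response A": "openai/gpt-5.1"},  # returned to the client, not saved
    "aggregate_rankings": {"openai/gpt-5.1": 1.5},        # returned to the client, not saved
}

persisted = {k: v for k, v in api_response.items()
             if k not in ("label_to_model", "aggregate_rankings")}
```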

### Port Configuration
Backend runs on **port 8001** (not the default 8000); this was a deliberate choice to avoid port conflicts. CORS allows localhost:5173 (Vite) and localhost:3000.
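
The CORS setup presumably follows the standard FastAPI middleware pattern, roughly:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173", "http://localhost:3000"],  # Vite dev server + fallback
    allow_methods=["*"],
    allow_headers=["*"],
)
# Served on port 8001, e.g.: uvicorn backend.main:app --port 8001
```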

## Development Workflow

### Running the App
```bash
# Web App: Use start script
./start.sh

# Web App: Manual (two terminals)
# Terminal 1: uv run llm-council-web
# Terminal 2: cd frontend && npm run dev

# MCP Server: For Claude Desktop / VS Code
uv run llm-council-mcp
```

### Dependencies
- **Backend**: `uv sync` (Python 3.10+, FastAPI, httpx)
- **Frontend**: `cd frontend && npm install` (React 19, Vite, react-markdown)

### Environment Setup
Create `.env` in root with:
```
OPENROUTER_API_KEY=sk-or-v1-...
```

## Code Patterns & Conventions

### MCP Server Architecture (backend/mcp_server.py)
The MCP server exposes council deliberation as tools and resources:
- **Tools**: `council_query`, `council_stage1`, `council_list_conversations`, `council_get_conversation`
- **Resources**: `council://conversations/{id}` for accessing saved deliberations
- **Progress Notifications**: Uses MCP's `send_log_message` to report stage progress
- **Stateful**: Reuses existing [backend/storage.py](../backend/storage.py) for conversation persistence
- **Model Customization**: Tools accept optional `council_models` and `chairman_model` parameters to override [backend/config.py](../backend/config.py) defaults
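
A sketch of how such tools and resources could be exposed with the MCP Python SDK's `FastMCP` helper; the real [backend/mcp_server.py](../backend/mcp_server.py) (which reports progress via `send_log_message`) may be organized differently, and the helper functions below are hypothetical:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("llm-council")

@mcp.tool()
async def council_stage1(question: str, council_models: list[str] | None = None) -> str:
    """Get each council model's individual answer (no ranking or synthesis)."""
    responses = await run_stage1(question, council_models)  # hypothetical helper
    return "\n\n".join(f"## {model}\n{text}" for model, text in responses.items())

@mcp.resource("council://conversations/{conversation_id}")
async def get_conversation(conversation_id: str) -> str:
    """Expose a saved deliberation as a readable resource."""
    return load_conversation_json(conversation_id)  # hypothetical storage helper

if __name__ == "__main__":
    mcp.run()  # stdio transport, as expected by Claude Desktop / VS Code
```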

### Stage 2 Prompt Format (Strict Parsing)
See [backend/council.py](../backend/council.py) `stage2_collect_rankings()`. The prompt enforces:
```
FINAL RANKING:
1. Response C
2. Response A
```
No extra text is allowed after the ranking, which lets `parse_ranking_from_text()` reliably extract results.
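
A minimal sketch of the kind of parsing this strict format makes possible (the actual `parse_ranking_from_text()` may differ):

```python
import re

def parse_ranking_from_text(text: str) -> list[str]:
    """Extract an ordered list of labels like ['Response C', 'Response A']."""
    # Only consider the part after the required "FINAL RANKING:" header.
    _, _, tail = text.partition("FINAL RANKING:")
    return re.findall(r"^\s*\d+\.\s*(Response [A-Z])\s*$", tail, flags=re.MULTILINE)
```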

### Async Parallelism
All model queries use `asyncio.gather()` for speed. See [backend/openrouter.py](../backend/openrouter.py) `query_models_parallel()`.

### Graceful Degradation
If a model fails in Stage 1, the council continues with the successful responses: failed queries return `None` and are filtered out before proceeding.
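
Combined with the parallelism above, the fan-out-and-filter pattern looks roughly like this (a sketch; `query_model` is a hypothetical single-model call, see [backend/openrouter.py](../backend/openrouter.py) for the real code):

```python
import asyncio

async def query_models_parallel(models: list[str], prompt: str) -> dict[str, str]:
    """Query all models concurrently and drop any that failed."""
    async def safe_query(model: str) -> str | None:
        try:
            return await query_model(model, prompt)  # hypothetical single-model call
        except Exception:
            return None  # graceful degradation: a failed model is simply skipped

    results = await asyncio.gather(*(safe_query(m) for m in models))
    return {m: r for m, r in zip(models, results) if r is not None}
```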

### React Component Structure
- [frontend/src/App.jsx](../frontend/src/App.jsx): Conversation orchestration
- [frontend/src/components/Stage1.jsx](../frontend/src/components/Stage1.jsx): Tab view of individual responses
- [frontend/src/components/Stage2.jsx](../frontend/src/components/Stage2.jsx): Peer rankings + aggregate scores ("Street Cred")
- [frontend/src/components/Stage3.jsx](../frontend/src/components/Stage3.jsx): Final synthesized answer (green background #f0fff0)

### Styling
- **Light mode** theme (not dark mode)
- Primary blue: #4a90e2
- All markdown is wrapped in a `.markdown-content` class with 12px padding (prevents a cluttered look)
- See [frontend/src/index.css](../frontend/src/index.css) for global markdown styles

## Common Tasks

### Changing Council Models
Edit [backend/config.py](../backend/config.py) `COUNCIL_MODELS` and `CHAIRMAN_MODEL`. Use OpenRouter model identifiers (e.g., "anthropic/claude-sonnet-4.5").

### Adding a New Stage
1. Add async function to [backend/council.py](../backend/council.py)
2. Update [backend/main.py](../backend/main.py) POST endpoint to call it
3. Add response data to storage schema if persisting (else just return via API)
4. Create React component in [frontend/src/components/](../frontend/src/components/)
5. Import and render in [frontend/src/components/ChatInterface.jsx](../frontend/src/components/ChatInterface.jsx)
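
For step 1, a new stage would follow the same async pattern as the existing ones. A hypothetical example (the stage name and the `query_model` helper are illustrative):

```python
# backend/council.py -- hypothetical new stage
async def stage4_fact_check(question: str, final_answer: str, model: str) -> str:
    """Ask a single model to fact-check the chairman's synthesis."""
    prompt = (
        f"Question: {question}\n\n"
        f"Proposed answer: {final_answer}\n\n"
        "List any factual errors, or reply 'No issues found.'"
    )
    return await query_model(model, prompt)  # hypothetical single-model helper
```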

### Debugging API Responses
Check the POST `/api/conversations/{id}/message` endpoint in [backend/main.py](../backend/main.py). It returns the full response with `stage1`, `stage2`, `stage3` plus metadata. The frontend stores the metadata in state, but the backend doesn't persist it.

## Project Philosophy

Per [README.md](../README.md): "99% vibe coded as a fun Saturday hack." This means:
- Prioritize working code over perfect architecture
- Code is meant to be ephemeral; customize freely
- No long-term support planned
- JSON file storage is intentionally simple (not a database)

When making changes, maintain the spirit: fast iterations, clear stage boundaries, and transparent multi-LLM deliberation.
157 changes: 157 additions & 0 deletions MCP_QUICKSTART.md
@@ -0,0 +1,157 @@
# LLM Council MCP Server - Quick Start Guide

## Installation

1. **Install dependencies:**
```bash
uv sync
```

2. **Configure your API key:**

Copy `.env.example` to `.env`:
```bash
cp .env.example .env
```

Edit `.env` and add your OpenRouter API key:
```
OPENROUTER_API_KEY=sk-or-v1-your-actual-key-here
```

3. **Test the server:**
```bash
uv run llm-council-mcp
```

The server will start and wait for JSON-RPC input (this is normal; MCP clients communicate via stdin/stdout).

## Claude Desktop Configuration

Add this to your Claude Desktop config:

- **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
"mcpServers": {
"llm-council": {
"command": "uv",
"args": ["--directory", "C:\\path\\to\\llm-council", "run", "llm-council-mcp"],
"env": {
"OPENROUTER_API_KEY": "sk-or-v1-your-actual-key"
}
}
}
}
```

**Important:** Replace `C:\\path\\to\\llm-council` with the actual path to this directory.
- Windows: Use double backslashes `C:\\src\\llm-council`
- macOS/Linux: Use forward slashes `/Users/you/llm-council`

## VS Code Configuration

Add this to your VS Code settings (`.vscode/settings.json` or User Settings):

```json
{
"mcp.servers": {
"llm-council": {
"command": "uv",
"args": ["--directory", "/path/to/llm-council", "run", "llm-council-mcp"],
"env": {
"OPENROUTER_API_KEY": "sk-or-v1-your-actual-key"
}
}
}
}
```

## Usage Examples

Once configured, you can use these tools in your MCP client:

### 1. Full Council Query

In Claude Desktop, ask:

> "Use the council_query tool to answer: What are the key differences between supervised and unsupervised learning?"

This will:
- Stage 1: Get responses from all 4 council models
- Stage 2: Each model ranks the others' responses (anonymized)
- Stage 3: Chairman synthesizes the final answer
- Save the conversation to history

### 2. Quick Model Comparison (Stage 1 Only)

> "Use council_stage1 to compare how different models explain recursion"

This skips ranking and synthesis for faster results.

### 3. Custom Models

> "Use council_query with these models: ['openai/gpt-4', 'anthropic/claude-3-opus', 'google/gemini-pro'] to answer: What is the future of quantum computing?"

### 4. Access Past Conversations

> "List all my saved council conversations using council_list_conversations"

> "Show me the full details of conversation ID abc-123 using council_get_conversation"

## Available Tools

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `council_query` | Full 3-stage deliberation | `question`, `council_models` (optional), `chairman_model` (optional), `save_conversation` (default: true) |
| `council_stage1` | Individual responses only | `question`, `council_models` (optional) |
| `council_list_conversations` | List all saved conversations | None |
| `council_get_conversation` | Get conversation details | `conversation_id` |

## Available Resources

- `council://conversations/{id}` - Access any saved conversation as a resource

## Customizing Default Models

Edit `backend/config.py` to change the default council members:

```python
COUNCIL_MODELS = [
"openai/gpt-5.1",
"google/gemini-3-pro-preview",
"anthropic/claude-sonnet-4.5",
"x-ai/grok-4",
]

CHAIRMAN_MODEL = "google/gemini-3-pro-preview"
```

These can be overridden per-query using tool parameters.

## Troubleshooting

### Server doesn't start
- Check that your `.env` file exists with a valid `OPENROUTER_API_KEY`
- Verify dependencies are installed: `uv sync`
- Test imports: `uv run python -c "from backend.mcp_server import main; print('OK')"`

### No response from tools
- Council queries take 30-120 seconds to complete (multiple LLM calls)
- Check OpenRouter API key has sufficient credits
- Look for progress notifications in your MCP client logs

### Permission errors on Windows
- Ensure the path in config uses double backslashes: `C:\\src\\llm-council`
- Run VS Code or Claude Desktop with appropriate permissions

## Cost Considerations

Each `council_query` makes 9 LLM API calls:
- Stage 1: 4 council models in parallel
- Stage 2: 4 models ranking in parallel
- Stage 3: 1 chairman model

Using cheaper models or `council_stage1` can reduce costs.