4 changes: 4 additions & 0 deletions .env.example
@@ -0,0 +1,4 @@
# OpenRouter API Key
# Get your API key at: https://openrouter.ai/
# Make sure to purchase credits or enable automatic top-up
OPENROUTER_API_KEY=sk-or-v1-...
115 changes: 115 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,115 @@
# LLM Council - AI Coding Agent Instructions

## Project Overview

LLM Council is a 3-stage deliberation system where multiple LLMs collaboratively answer questions via OpenRouter. Think "ChatGPT but with a panel of experts who debate and synthesize answers."

**Available in Two Modes:**
1. **Web Application**: Interactive React UI with FastAPI backend
2. **MCP Server**: Model Context Protocol server for Claude Desktop, VS Code, etc.

**Core Flow:**
1. **Stage 1**: Parallel queries to all council models (defined in [backend/config.py](../backend/config.py))
2. **Stage 2**: Each model ranks anonymized responses (prevents favoritism)
3. **Stage 3**: Chairman model synthesizes final answer from all inputs + rankings
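
A compact sketch of that flow; the helper names loosely echo functions mentioned later in this file, but their signatures here are assumptions:

```python
# Sketch only: helper names are placeholders; real stage functions live in backend/council.py.
async def deliberate(question: str, council_models: list[str], chairman_model: str) -> str:
    # Stage 1: fan the question out to every council model in parallel.
    stage1 = await query_models_parallel(council_models, question)
    # Stage 2: each model ranks the anonymized peer responses.
    rankings = await stage2_collect_rankings(council_models, question, stage1)
    # Stage 3: the chairman synthesizes the final answer from responses + rankings.
    return await synthesize_final_answer(chairman_model, question, stage1, rankings)
```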

## Critical Architecture Decisions

### Anonymization Strategy
- Stage 2 uses "Response A", "Response B" labels to prevent model bias
- Backend creates `label_to_model` mapping for de-anonymization
- Frontend displays real model names **client-side** in bold (e.g., "**gpt-5.1**")
- This is intentional: models judge anonymously, users see transparently
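
A minimal sketch of the labeling scheme, assuming Stage 1 responses arrive as a model-to-text mapping (illustrative names, not the exact helpers in [backend/council.py](../backend/council.py)):

```python
import string

def anonymize_responses(responses: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Map each model's response to a neutral label like 'Response A'."""
    label_to_model: dict[str, str] = {}
    labeled: dict[str, str] = {}
    for letter, (model, text) in zip(string.ascii_uppercase, responses.items()):
        label = f"Response {letter}"
        label_to_model[label] = model
        labeled[label] = text
    return labeled, label_to_model

# Stage 2 models only see `labeled`; the frontend uses `label_to_model`
# from the API response to show real model names in bold.
```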

### Metadata NOT Persisted
Key gotcha: `label_to_model` and `aggregate_rankings` are computed per-request but **NOT saved to JSON**. They exist only in API responses and frontend state. See [backend/main.py](../backend/main.py) POST endpoint and [backend/storage.py](../backend/storage.py).
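
Roughly, the persisted record is the API payload minus that metadata (the values below are invented for illustration; check [backend/storage.py](../backend/storage.py) for the real schema):

```python
# Illustrative only -- field values are made up; see backend/main.py and backend/storage.py.
api_response = {
    "stage1": ["...per-model responses..."],
    "stage2": ["...per-model rankings..."],
    "stage3": "final synthesized answer",
    "label_to_model": {"Response A": "openai/gpt-5.1"},  # returned to the client, not saved
    "aggregate_rankings": {"openai/gpt-5.1": 1.5},        # returned to the client, not saved
}

persisted = {k: v for k, v in api_response.items()
             if k not in ("label_to_model", "aggregate_rankings")}
```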

### Port Configuration
Backend runs on **port 8001** (not the default 8000); this was a deliberate choice to avoid port conflicts. CORS allows localhost:5173 (Vite) and localhost:3000.
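
The CORS setup presumably follows the standard FastAPI middleware pattern, roughly:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173", "http://localhost:3000"],  # Vite dev server + fallback
    allow_methods=["*"],
    allow_headers=["*"],
)
# Served on port 8001, e.g.: uvicorn backend.main:app --port 8001
```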

## Development Workflow

### Running the App
```bash
# Web App: Use start script
./start.sh

# Web App: Manual (two terminals)
# Terminal 1: uv run llm-council-web
# Terminal 2: cd frontend && npm run dev

# MCP Server: For Claude Desktop / VS Code
uv run llm-council-mcp
```

### Dependencies
- **Backend**: `uv sync` (Python 3.10+, FastAPI, httpx)
- **Frontend**: `cd frontend && npm install` (React 19, Vite, react-markdown)

### Environment Setup
Create `.env` in root with:
```
OPENROUTER_API_KEY=sk-or-v1-...
```

## Code Patterns & Conventions

### MCP Server Architecture (backend/mcp_server.py)
The MCP server exposes council deliberation as tools and resources:
- **Tools**: `council_query`, `council_stage1`, `council_list_conversations`, `council_get_conversation`
- **Resources**: `council://conversations/{id}` for accessing saved deliberations
- **Progress Notifications**: Uses MCP's `send_log_message` to report stage progress
- **Stateful**: Reuses existing [backend/storage.py](../backend/storage.py) for conversation persistence
- **Model Customization**: Tools accept optional `council_models` and `chairman_model` parameters to override [backend/config.py](../backend/config.py) defaults
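
A sketch of how such tools and resources could be exposed with the MCP Python SDK's `FastMCP` helper; the real [backend/mcp_server.py](../backend/mcp_server.py) (which reports progress via `send_log_message`) may be organized differently, and the helper functions below are hypothetical:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("llm-council")

@mcp.tool()
async def council_stage1(question: str, council_models: list[str] | None = None) -> str:
    """Get each council model's individual answer (no ranking or synthesis)."""
    responses = await run_stage1(question, council_models)  # hypothetical helper
    return "\n\n".join(f"## {model}\n{text}" for model, text in responses.items())

@mcp.resource("council://conversations/{conversation_id}")
async def get_conversation(conversation_id: str) -> str:
    """Expose a saved deliberation as a readable resource."""
    return load_conversation_json(conversation_id)  # hypothetical storage helper

if __name__ == "__main__":
    mcp.run()  # stdio transport, as expected by Claude Desktop / VS Code
```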

### Stage 2 Prompt Format (Strict Parsing)
See [backend/council.py](../backend/council.py) `stage2_collect_rankings()`. The prompt enforces:
```
FINAL RANKING:
1. Response C
2. Response A
```
No extra text is allowed after the ranking, which lets `parse_ranking_from_text()` reliably extract results.
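
A minimal sketch of the kind of parsing this strict format makes possible (the actual `parse_ranking_from_text()` may differ):

```python
import re

def parse_ranking_from_text(text: str) -> list[str]:
    """Extract an ordered list of labels like ['Response C', 'Response A']."""
    # Only consider the part after the required "FINAL RANKING:" header.
    _, _, tail = text.partition("FINAL RANKING:")
    return re.findall(r"^\s*\d+\.\s*(Response [A-Z])\s*$", tail, flags=re.MULTILINE)
```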

### Async Parallelism
All model queries use `asyncio.gather()` for speed. See [backend/openrouter.py](../backend/openrouter.py) `query_models_parallel()`.

### Graceful Degradation
If a model fails in Stage 1, the council continues with the successful responses: failed queries return `None` and are filtered out before proceeding.
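
Combined with the parallelism above, the fan-out-and-filter pattern looks roughly like this (a sketch; `query_model` is a hypothetical single-model call, see [backend/openrouter.py](../backend/openrouter.py) for the real code):

```python
import asyncio

async def query_models_parallel(models: list[str], prompt: str) -> dict[str, str]:
    """Query all models concurrently and drop any that failed."""
    async def safe_query(model: str) -> str | None:
        try:
            return await query_model(model, prompt)  # hypothetical single-model call
        except Exception:
            return None  # graceful degradation: a failed model is simply skipped

    results = await asyncio.gather(*(safe_query(m) for m in models))
    return {m: r for m, r in zip(models, results) if r is not None}
```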

### React Component Structure
- [frontend/src/App.jsx](../frontend/src/App.jsx): Conversation orchestration
- [frontend/src/components/Stage1.jsx](../frontend/src/components/Stage1.jsx): Tab view of individual responses
- [frontend/src/components/Stage2.jsx](../frontend/src/components/Stage2.jsx): Peer rankings + aggregate scores ("Street Cred")
- [frontend/src/components/Stage3.jsx](../frontend/src/components/Stage3.jsx): Final synthesized answer (green background #f0fff0)

### Styling
- **Light mode** theme (not dark mode)
- Primary blue: #4a90e2
- All markdown is wrapped in a `.markdown-content` class with 12px padding (prevents a cluttered look)
- See [frontend/src/index.css](../frontend/src/index.css) for global markdown styles

## Common Tasks

### Changing Council Models
Edit [backend/config.py](../backend/config.py) `COUNCIL_MODELS` and `CHAIRMAN_MODEL`. Use OpenRouter model identifiers (e.g., "anthropic/claude-sonnet-4.5").

### Adding a New Stage
1. Add async function to [backend/council.py](../backend/council.py)
2. Update [backend/main.py](../backend/main.py) POST endpoint to call it
3. Add response data to storage schema if persisting (else just return via API)
4. Create React component in [frontend/src/components/](../frontend/src/components/)
5. Import and render in [frontend/src/components/ChatInterface.jsx](../frontend/src/components/ChatInterface.jsx)
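
For step 1, a new stage would follow the same async pattern as the existing ones. A hypothetical example (the stage name and the `query_model` helper are illustrative):

```python
# backend/council.py -- hypothetical new stage
async def stage4_fact_check(question: str, final_answer: str, model: str) -> str:
    """Ask a single model to fact-check the chairman's synthesis."""
    prompt = (
        f"Question: {question}\n\n"
        f"Proposed answer: {final_answer}\n\n"
        "List any factual errors, or reply 'No issues found.'"
    )
    return await query_model(model, prompt)  # hypothetical single-model helper
```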

### Debugging API Responses
Check the POST `/api/conversations/{id}/message` endpoint in [backend/main.py](../backend/main.py). It returns the full response with `stage1`, `stage2`, `stage3` plus metadata. The frontend stores the metadata in state, but the backend doesn't persist it.

## Project Philosophy

Per [README.md](../README.md): "99% vibe coded as a fun Saturday hack." This means:
- Prioritize working code over perfect architecture
- Code is meant to be ephemeral; customize freely
- No long-term support planned
- JSON file storage is intentionally simple (not a database)

When making changes, maintain the spirit: fast iterations, clear stage boundaries, and transparent multi-LLM deliberation.
157 changes: 157 additions & 0 deletions MCP_QUICKSTART.md
@@ -0,0 +1,157 @@
# LLM Council MCP Server - Quick Start Guide

## Installation

1. **Install dependencies:**
```bash
uv sync
```

2. **Configure your API key:**

Copy `.env.example` to `.env`:
```bash
cp .env.example .env
```

Edit `.env` and add your OpenRouter API key:
```
OPENROUTER_API_KEY=sk-or-v1-your-actual-key-here
```

3. **Test the server:**
```bash
uv run llm-council-mcp
```

The server will start and wait for JSON-RPC input (this is normal; MCP clients communicate via stdin/stdout).

## Claude Desktop Configuration

Add this to your Claude Desktop config:

- **macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows:** `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
"mcpServers": {
"llm-council": {
"command": "uv",
"args": ["--directory", "C:\\path\\to\\llm-council", "run", "llm-council-mcp"],
"env": {
"OPENROUTER_API_KEY": "sk-or-v1-your-actual-key"
}
}
}
}
```

**Important:** Replace `C:\\path\\to\\llm-council` with the actual path to this directory.
- Windows: Use double backslashes `C:\\src\\llm-council`
- macOS/Linux: Use forward slashes `/Users/you/llm-council`

## VS Code Configuration

Add this to your VS Code settings (`.vscode/settings.json` or User Settings):

```json
{
"mcp.servers": {
"llm-council": {
"command": "uv",
"args": ["--directory", "/path/to/llm-council", "run", "llm-council-mcp"],
"env": {
"OPENROUTER_API_KEY": "sk-or-v1-your-actual-key"
}
}
}
}
```

## Usage Examples

Once configured, you can use these tools in your MCP client:

### 1. Full Council Query

In Claude Desktop, ask:

> "Use the council_query tool to answer: What are the key differences between supervised and unsupervised learning?"

This will:
- Stage 1: Get responses from all 4 council models
- Stage 2: Each model ranks the others' responses (anonymized)
- Stage 3: Chairman synthesizes the final answer
- Save the conversation to history

### 2. Quick Model Comparison (Stage 1 Only)

> "Use council_stage1 to compare how different models explain recursion"

This skips ranking and synthesis for faster results.

### 3. Custom Models

> "Use council_query with these models: ['openai/gpt-4', 'anthropic/claude-3-opus', 'google/gemini-pro'] to answer: What is the future of quantum computing?"

### 4. Access Past Conversations

> "List all my saved council conversations using council_list_conversations"

> "Show me the full details of conversation ID abc-123 using council_get_conversation"

## Available Tools

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `council_query` | Full 3-stage deliberation | `question`, `council_models` (optional), `chairman_model` (optional), `save_conversation` (default: true) |
| `council_stage1` | Individual responses only | `question`, `council_models` (optional) |
| `council_list_conversations` | List all saved conversations | None |
| `council_get_conversation` | Get conversation details | `conversation_id` |

## Available Resources

- `council://conversations/{id}` - Access any saved conversation as a resource

## Customizing Default Models

Edit `backend/config.py` to change the default council members:

```python
COUNCIL_MODELS = [
"openai/gpt-5.1",
"google/gemini-3-pro-preview",
"anthropic/claude-sonnet-4.5",
"x-ai/grok-4",
]

CHAIRMAN_MODEL = "google/gemini-3-pro-preview"
```

These can be overridden per-query using tool parameters.

## Troubleshooting

### Server doesn't start
- Check that your `.env` file exists with a valid `OPENROUTER_API_KEY`
- Verify dependencies are installed: `uv sync`
- Test imports: `uv run python -c "from backend.mcp_server import main; print('OK')"`

### No response from tools
- Council queries take 30-120 seconds to complete (multiple LLM calls)
- Check OpenRouter API key has sufficient credits
- Look for progress notifications in your MCP client logs

### Permission errors on Windows
- Ensure the path in config uses double backslashes: `C:\\src\\llm-council`
- Run VS Code or Claude Desktop with appropriate permissions

## Cost Considerations

Each `council_query` makes 9 LLM API calls:
- Stage 1: 4 council models in parallel
- Stage 2: 4 models ranking in parallel
- Stage 3: 1 chairman model

Using cheaper models or `council_stage1` can reduce costs.