Merged
26 changes: 3 additions & 23 deletions docs/benchmarks.md → docs/evals.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Running Benchmarks
# Running Evals

<em class="wags-brand">wags</em> includes evaluation support for the [Berkeley Function Call Leaderboard (BFCL)](https://gorilla.cs.berkeley.edu/leaderboard.html), enabling systematic testing of LLM function calling capabilities across multi-turn conversations.
Here's how to run BFCL multi-turn evaluations with <span class="wags-brand">wags</span>.

## Setup

@@ -33,7 +33,7 @@ git submodule update --remote
# Run specific test
.venv/bin/pytest 'tests/benchmarks/bfcl/test_bfcl.py::test_bfcl[multi_turn_base_121]'

# Run test category
# Run test category (multi_turn_base, multi_turn_miss_func, multi_turn_miss_param, multi_turn_long_context)
.venv/bin/pytest tests/benchmarks/bfcl/test_bfcl.py -k "multi_turn_miss_func"
```
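The test IDs and category names above suggest a simple naming scheme: a category followed by a numeric index. The sketch below assumes that scheme holds for every ID (it is inferred only from `multi_turn_base_121` and the category list in the comment). The helper names `split_test_id` and `pytest_node_id` are hypothetical and not part of wags.

```python
import re

# Category names as listed in the "Run test category" comment above.
CATEGORIES = (
    "multi_turn_base",
    "multi_turn_miss_func",
    "multi_turn_miss_param",
    "multi_turn_long_context",
)


def split_test_id(test_id: str) -> tuple[str, int]:
    """Split an ID such as 'multi_turn_base_121' into (category, index).

    Assumes the ID is '<category>_<number>'; raises ValueError otherwise.
    """
    match = re.fullmatch(r"(.+)_(\d+)", test_id)
    if match is None:
        raise ValueError(f"unrecognized test id: {test_id!r}")
    category, index = match.group(1), int(match.group(2))
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return category, index


def pytest_node_id(test_id: str) -> str:
    """Build the pytest node ID used in the 'Run specific test' command."""
    split_test_id(test_id)  # validate the ID before building the node ID
    return f"tests/benchmarks/bfcl/test_bfcl.py::test_bfcl[{test_id}]"
```

The string returned by `pytest_node_id` can be passed to `.venv/bin/pytest` in place of the hand-written node ID.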

@@ -69,27 +69,7 @@ Validate existing logs without running new tests:
.venv/bin/pytest tests/benchmarks/bfcl/test_bfcl.py --validate-only --log-dir outputs/experiment1/raw
```

## Test Categories

- **multi_turn_base**: Standard multi-turn function calling (800 tests)
- **multi_turn_miss_func**: Tests handling of missing function scenarios
- **multi_turn_miss_param**: Tests handling of missing parameters
- **multi_turn_long_context**: Context window stress tests with overwhelming information
- **Memory tests**: Tests with key-value, vector, or recursive summarization backends


## Developer Guide

1. **Discovery**: pytest collects tests from `loader.find_all_test_ids()`
2. **Setup**: Creates MCP servers wrapping BFCL API classes using `uv run python`
3. **Execution**: Runs multi-turn conversations with FastAgent
4. **Serialization**: Saves complete message history to `complete.json`
5. **Extraction**: Extracts tool calls from JSON (preserves what FastAgent drops)
6. **Validation**: Uses BFCL validators to check correctness
7. **Result**: Pass/fail based on `validation["valid"]`

## Further Reading

- **Test organization and patterns**: See [tests/README.md](../tests/README.md)
- **BFCL leaderboard**: Visit [gorilla.cs.berkeley.edu](https://gorilla.cs.berkeley.edu/leaderboard.html)
- **Official BFCL repository**: [github.com/ShishirPatil/gorilla](https://github.com/ShishirPatil/gorilla)
14 changes: 11 additions & 3 deletions docs/index.md
@@ -10,12 +10,20 @@ The <span class="wags-brand">wags</span> toolkit is based on state-of-the-art re

<div class="grid cards" markdown>

- **[Getting Started](quickstart.md)**
- **[Quick Start](quickstart.md)**

Install <span class="wags-brand">wags</span> and create your first <span class="wags-brand">wags</span> enhanced server in minutes. See [`servers/github/`](https://github.com/chughtapan/wags/tree/main/servers/github) for a complete example.
Connect to existing <span class="wags-brand">wags</span> servers and start using enhanced MCP features immediately.

- **[Onboarding New Servers](onboarding.md)**

Create your first <span class="wags-brand">wags</span> enhanced server with middleware. See [`servers/github/`](https://github.com/chughtapan/wags/tree/main/servers/github) for a complete example.

- **[Middleware](middleware/overview.md)**

Learn about available middleware and how to use them
Learn about available middleware and how to use them.

- **[Running Evals](evals.md)**

Execute BFCL benchmarks to evaluate agent performance with <span class="wags-brand">wags</span> servers.

</div>
51 changes: 20 additions & 31 deletions docs/middleware/todo.md
@@ -11,40 +11,29 @@
proxy = create_proxy(server, enable_todos=True)
```

## Example Workflow
## How it works

```python
# Agent receives task: "Build project and fix errors"

# 1. Agent calls TodoWrite
TodoWrite(todos=[
    {"content": "Build project", "status": "pending"},
    {"content": "Fix errors", "status": "pending"}
])

# 2. Agent starts first task
TodoWrite(todos=[
    {"content": "Build project", "status": "in_progress"},
    {"content": "Fix errors", "status": "pending"}
])

# 3. Agent runs build tool, finds 3 errors

# 4. Agent updates todos
TodoWrite(todos=[
    {"content": "Build project", "status": "completed"},
    {"content": "Fix error in utils.py", "status": "in_progress"},
    {"content": "Fix error in api.py", "status": "pending"},
    {"content": "Fix error in models.py", "status": "pending"}
])

# 5. Agent fixes each error, updating status after each one
# ... continues until all completed
```
When todo integration is enabled, the target MCP server is given `TodoWrite` tools that track the detailed tasks to be done and their current progress. Additionally, detailed instructions are provided, among them: break tasks down into actionable steps, update status before and after each task, keep exactly one task `in_progress`, and mark a task completed immediately after finishing it.

## How it works
For example, when an agent receives the task "Build project and fix errors":

1. Agent calls `TodoWrite` to create initial todos:
    - "Build project" (pending)
    - "Fix errors" (pending)

2. Agent starts first task by updating status to `in_progress`

3. Agent runs build tool, finds 3 errors

4. Agent updates todos to reflect discovered errors:
    - "Build project" (completed)
    - "Fix error in utils.py" (in_progress)
    - "Fix error in api.py" (pending)
    - "Fix error in models.py" (pending)

5. Agent fixes each error, updating status after each one until all completed
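The status rules in this workflow can be sketched as a small check over a todo list. This is a minimal illustration, not the middleware's implementation: `check_todos` is a hypothetical helper, and the `{"content": ..., "status": ...}` shape is assumed from the statuses named in the steps above. The check accepts zero `in_progress` tasks as well as one, because the freshly created list in step 1 has none.

```python
# Statuses that appear in the workflow steps above.
VALID_STATUSES = {"pending", "in_progress", "completed"}


def check_todos(todos: list[dict[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means the todos look consistent."""
    problems = []
    for todo in todos:
        if todo.get("status") not in VALID_STATUSES:
            problems.append(f"invalid status for {todo.get('content')!r}")
    # Relaxed form of the "one task in_progress" rule: flag only more than one.
    in_progress = [t for t in todos if t.get("status") == "in_progress"]
    if len(in_progress) > 1:
        problems.append(
            f"expected at most one in_progress task, found {len(in_progress)}"
        )
    return problems


# The todo list from step 4 of the workflow.
step_4 = [
    {"content": "Build project", "status": "completed"},
    {"content": "Fix error in utils.py", "status": "in_progress"},
    {"content": "Fix error in api.py", "status": "pending"},
    {"content": "Fix error in models.py", "status": "pending"},
]
```

Calling `check_todos(step_4)` returns an empty list, while a list with two `in_progress` entries yields one problem message.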

When todo integration is enabled, the target MCP server is given `TodoWrite` tools that track the detailed tasks to be done and their current progress. Additionally, detailed instructions are provided, among them: break tasks down into actionable steps, update status before and after each task, keep exactly one task `in_progress`, and mark a task completed immediately after finishing it. See `src/wags/middleware/todo.py` for the full instruction text.
See `src/wags/middleware/todo.py` for the full instruction text.

**Note:** Instructions from the proxy server must be included in the agent prompt. For `fast-agent`, the `{{serverInstructions}}` macro does this.

98 changes: 98 additions & 0 deletions docs/onboarding.md
@@ -0,0 +1,98 @@
# Onboarding New Servers

Learn how to create a <span class="wags-brand">wags</span> proxy server that wraps existing MCP servers with middleware.

## Prerequisites

- <span class="wags-brand">wags</span> installed (see [Quick Start](quickstart.md) for installation)
- Basic understanding of [MCP (Model Context Protocol)](https://modelcontextprotocol.io/docs/getting-started/intro)
- An existing MCP server to work with

## Creating a <span class="wags-brand">wags</span> Proxy Server

<span class="wags-brand">wags</span> provides the `quickstart` command to generate proxy servers that wrap existing MCP servers with middleware.

!!! tip "Complete Example Available"
    The complete implementation for the [GitHub MCP Server](https://github.com/github/github-mcp-server) is in `servers/github/`.

### Step 1: Prepare Your MCP Server Configuration

Create a configuration file that describes your MCP server. Save it as `config.json`:

```json title="config.json"
--8<-- "snippets/quickstart/config.json"
```

### Step 2: Generate the Proxy Server

Use the `quickstart` command to generate middleware handlers and main file:

```bash
# Generate both handlers and main files
wags quickstart config.json

# Or with custom file names
wags quickstart config.json \
  --handlers-file github_handlers.py \
  --main-file github_proxy.py
```

### Step 3: Add Middleware Decorators

Edit the generated handlers file to add middleware decorators:

```python linenums="1" title="handlers.py"
--8<-- "snippets/quickstart/handlers.py"
```

### Step 4: Attach Middleware to your MCP Server

The automatically generated `main.py` includes commented-out code that attaches <span class="wags-brand">wags</span> middleware to your MCP server. Edit the file and uncomment the middleware you need:

```python linenums="1" title="main.py"
--8<-- "snippets/quickstart/main.py"
```

### Step 5: Run Your Proxy Server

```bash
# Run directly
python main.py

# Or use wags start-server
wags start-server servers/your-server
```

Your proxy server is now running!

### Step 6 (Optional): Add to Shared Configuration

To use your server with `wags run`, add it to `servers/fastagent.config.yaml`:

```yaml
mcp:
  servers:
    your-server:
      transport: stdio
      command: wags
      args:
        - start-server
        - servers/your-server
      env:
        API_KEY: ${YOUR_API_KEY}
      roots:
        - uri: https://example.com/allowed
          name: "Allowed Resources"
```
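The `env` entry above refers to `${YOUR_API_KEY}`, so that variable presumably has to be set before the server starts. The following sketch lists the `${NAME}` placeholders a configuration references and reports which ones are unset. It is a pre-flight check under assumptions: the configuration is written as a Python dict mirroring the YAML (to avoid a YAML dependency), and `referenced_env_vars` and `missing_env_vars` are hypothetical helpers, not wags or fast-agent APIs. How fast-agent itself resolves placeholders is not covered here.

```python
import os
import re

# Matches ${NAME} placeholders such as ${YOUR_API_KEY}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def referenced_env_vars(node) -> set[str]:
    """Collect every ${NAME} placeholder in the string values of a nested config."""
    if isinstance(node, str):
        return set(PLACEHOLDER.findall(node))
    if isinstance(node, dict):
        node = list(node.values())
    if isinstance(node, list):
        found = set()
        for item in node:
            found |= referenced_env_vars(item)
        return found
    return set()


def missing_env_vars(config, environ=os.environ) -> list[str]:
    """Return the referenced variable names that are not set in environ."""
    return sorted(n for n in referenced_env_vars(config) if n not in environ)


# Dict form of the YAML example above (an illustration, not loaded from disk).
config = {
    "mcp": {
        "servers": {
            "your-server": {
                "transport": "stdio",
                "command": "wags",
                "args": ["start-server", "servers/your-server"],
                "env": {"API_KEY": "${YOUR_API_KEY}"},
                "roots": [
                    {"uri": "https://example.com/allowed", "name": "Allowed Resources"}
                ],
            }
        }
    }
}
```

Running `missing_env_vars(config)` before `wags run` gives an early hint about variables that still need to be exported.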

Now you can connect to your server with:

```bash
wags run --servers your-server
```

## Learn More

- **[Middleware Overview](middleware/overview.md)** - Understand how middleware works
- **[Roots](middleware/roots.md)** - Access control with URI templates
- **[Elicitation](middleware/elicitation.md)** - Parameter review and collection
69 changes: 15 additions & 54 deletions docs/quickstart.md
@@ -1,17 +1,15 @@
# Quick Start Guide

Get up and running with <span class="wags-brand">wags</span> in just a few minutes. This guide will walk you through installation and creating a proxy server with middleware for existing MCP servers.
Get started with <span class="wags-brand">wags</span> by connecting to existing wags servers.

## Prerequisites

- Python 3.13.5 or higher
- [`uv` package manager](https://docs.astral.sh/uv/getting-started/installation/) (recommended) or `pip`
- Basic understanding of [MCP (Model Context Protocol)](https://modelcontextprotocol.io/docs/getting-started/intro)
- An existing MCP server to work with
- [`uv` package manager](https://docs.astral.sh/uv/getting-started/installation/)

## Installation

> ⚠️ **Warning**: <span class="wags-brand">wags</span> is based on ongoing research and is under active development. Features and APIs may change.
> ⚠️ **Warning**: <span class="wags-brand">wags</span> is based on ongoing research and is under active development. Features and APIs may change. Some experimental MCP features are only supported in our fork of [fast-agent](https://github.com/chughtapan/fast-agent) included with <span class="wags-brand">wags</span>.

```bash
# Clone the repository
@@ -38,61 +36,24 @@ WAGS version 0.1.0
FastMCP version x.x.x
```

## Creating a <span class="wags-brand">wags</span> Proxy Server
## Getting Started

<span class="wags-brand">wags</span> provides the `quickstart` command to generate proxy servers that wrap existing MCP servers with middleware.

!!! tip "Complete Example Available"
    The complete implementation for the [GitHub MCP Server](https://github.com/github/github-mcp-server) is in `servers/github/`.

### Step 1: Prepare Your MCP Server Configuration

Create a configuration file that describes your MCP server. Save it as `config.json`:

```json title="config.json"
--8<-- "snippets/quickstart/config.json"
```

### Step 2: Generate the Proxy Server

Use the `quickstart` command to generate middleware handlers and main file:
The easiest way to connect to wags servers is using the `wags run` command:

```bash
# Generate both handlers and main files
wags quickstart config.json

# Or with custom file names
wags quickstart config.json \
  --handlers-file github_handlers.py \
  --main-file github_proxy.py
```
# Connect to all configured servers
wags run

### Step 3: Add Middleware Decorators
# Connect to specific servers only
wags run --servers github

Edit the generated handlers file to add middleware decorators:

```python linenums="1" title="handlers.py"
--8<-- "snippets/quickstart/handlers.py"
```

### Step 4: Attach Middleware to your MCP Server

The automatically generated main.py includes (commented) code to attach <span class="wags-brand">wags</span> middleware to your MCP server. You should edit the file to uncomment the middleware you need:

```python linenums="1" title="main.py"
--8<-- "snippets/quickstart/main.py"
```

### Step 5: Run Your Proxy Server

```bash
python main.py
# Use a different model
wags run --model claude-3-5-sonnet-20241022
```

Your proxy server is now running!
`wags run` invokes fast-agent with a configuration file ([`servers/fastagent.config.yaml`](https://github.com/chughtapan/wags/blob/main/servers/fastagent.config.yaml)) and basic instructions ([`src/wags/utils/agent_instructions.txt`](https://github.com/chughtapan/wags/blob/main/src/wags/utils/agent_instructions.txt)), and connects to all servers by default. Use the `--servers` flag to choose which servers to connect to, or create your own configuration and instruction files. See the [fast-agent documentation](https://github.com/chughtapan/fast-agent) for more details.

## Learn More
## Next Steps

- **[Middleware Overview](middleware/overview.md)** - Understand how middleware works
- **[Roots](middleware/roots.md)** - Access control with URI templates
- **[Elicitation](middleware/elicitation.md)** - Parameter review and collection
- **[Onboarding New Servers](onboarding.md)** - Create your own wags server with middleware
- **[Middleware Overview](middleware/overview.md)** - Understand available middleware features
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -38,12 +38,13 @@ theme:
nav:
- Home: index.md
- Quick Start: quickstart.md
- Onboarding Servers: onboarding.md
- Middleware:
- Overview: middleware/overview.md
- TodoList: middleware/todo.md
- Roots: middleware/roots.md
- Elicitation: middleware/elicitation.md
- Benchmarks: benchmarks.md
- Running Evals: evals.md

plugins:
- search
15 changes: 15 additions & 0 deletions servers/fastagent.config.yaml
@@ -0,0 +1,15 @@
mcp:
  servers:
    github:
      transport: stdio
      command: wags
      args:
        - start-server
        - servers/github
      env:
        GITHUB_PERSONAL_ACCESS_TOKEN: ${GITHUB_PERSONAL_ACCESS_TOKEN}
      roots:
        - uri: https://github.com/anthropics/courses
          name: "Anthropics Courses"
        - uri: https://github.com/modelcontextprotocol/
          name: "MCP Organization"