Merged
14 changes: 7 additions & 7 deletions README.md
@@ -22,14 +22,14 @@ harbor run -p harbor_cookbook/recipes/<name> -a claude-code -m anthropic/claude-

| Name | Description |
|:--|:--|
| [simple&#8209;task](harbor_cookbook/recipes/simple-task/) | Minimal single-container task with a pytest verifier. |
| [multi&#8209;container](harbor_cookbook/recipes/multi-container/) | Docker Compose setup where the agent interacts with a companion service over the network. |
| [mcp&#8209;tools](harbor_cookbook/recipes/mcp-tools/) | Giving the agent custom tools via a FastMCP server in a companion container. |
| [simulated&#8209;user](harbor_cookbook/recipes/simulated-user/) | Agent discovers requirements by talking to a simulated user exposed as an MCP tool. |
| [computer&#8209;use&#8209;ubuntu](harbor_cookbook/recipes/computer-use-ubuntu/) | GUI interaction on an Ubuntu (XFCE4) virtual desktop via screenshot/click/type MCP tools. |
| [computer&#8209;use&#8209;windows](harbor_cookbook/recipes/computer-use-windows/) | GUI interaction on a remote Windows desktop (Daytona) via MCP tools. |
| [dns&#8209;blacklisting](harbor_cookbook/recipes/dns-blacklisting/) | Network-level hostname blacklisting with exact, wildcard, and regex rules. |
| [simple&#8209;task](harbor_cookbook/recipes/simple-task/) | Minimal single-container task. |
| [multi&#8209;container](harbor_cookbook/recipes/multi-container/) | Docker Compose task where the agent interacts with a locally hosted REST API. |
| [mcp&#8209;tools](harbor_cookbook/recipes/mcp-tools/) | Giving the agent custom tools via a locally hosted FastMCP server. |
| [multi&#8209;reward](harbor_cookbook/recipes/multi-reward/) | Multiple independent verifiers each producing their own score. |
| [simulated&#8209;user](harbor_cookbook/recipes/simulated-user/) | Agent discovers requirements by talking to a simulated user. |
| [computer&#8209;use&#8209;ubuntu](harbor_cookbook/recipes/computer-use-ubuntu/) | Computer use reference implementation on an Ubuntu virtual desktop. |
| [computer&#8209;use&#8209;windows](harbor_cookbook/recipes/computer-use-windows/) | Computer use reference implementation on a remote Windows desktop (Daytona). |
| [dns&#8209;blacklisting](harbor_cookbook/recipes/dns-blacklisting/) | Network-level hostname blacklisting with exact, wildcard, and regex rules. |

## Optimization Examples

9 changes: 4 additions & 5 deletions harbor_cookbook/gepa/optimize.py
@@ -4,7 +4,7 @@
# ///
"""GEPA prompt optimization for MedAgentBench.

Evolves a prompt template that wraps each task's instruction.md via
Evolves the agent prompt template that wraps each task's instruction.md via
Harbor's prompt_template_path mechanism. Evaluated by running a coding
agent (codex by default) on medagentbench@1.0 Harbor Trials.
"""
@@ -40,16 +40,15 @@
"""

OBJECTIVE = (
"Optimize a prompt template that wraps MedAgentBench task instructions. "
"Optimize a prompt template that wraps medical task instructions. "
"The prompt guides a coding agent on how to query a FHIR server and "
"answer clinical EHR questions. "
"Graded by the official MedAgentBench verifier (binary: 1=correct, 0=wrong)."
)

BACKGROUND = (
"MedAgentBench tasks span 10 categories: patient lookup, lab results, "
"The medical tasks span 10 categories: patient lookup, lab results, "
"vitals, medications, conditions, procedures, service requests, and "
"clinical reasoning. Each task runs in a Docker container with a FHIR "
"clinical reasoning. Each task runs with a FHIR "
"server at http://localhost:8080/fhir/.\n\n"
"The agent receives the task's instruction.md wrapped by the prompt "
"template being optimized. The agent must query the FHIR server "
2 changes: 1 addition & 1 deletion harbor_cookbook/gepa/utils.py
@@ -26,7 +26,7 @@
DEFAULT_MODEL = "openai/gpt-5-nano"
DEFAULT_ENVIRONMENT = EnvironmentType.DOCKER

# Single event loop shared across GEPA worker threads (required by Daytona's async singleton).
# Single event loop shared across GEPA worker threads.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()

6 changes: 3 additions & 3 deletions harbor_cookbook/recipes/computer-use-ubuntu/README.md
@@ -1,10 +1,10 @@
# computer-use-ubuntu

Computer-use task where an agent interacts with an Ubuntu (XFCE4) virtual desktop to solve a multi-step GUI challenge. The tool set mirrors the [Anthropic computer-use demo](https://github.com/anthropics/claude-quickstarts/tree/main/computer-use-demo). Desktop setup based on the [Harbor OSWorld adapter](https://github.com/Mascobot/harbor/tree/main/adapters/osworld).
Computer-use task where an agent interacts with an Ubuntu (XFCE4) virtual desktop to solve a multi-step GUI challenge. The tool set mirrors the [Anthropic computer-use demo](https://github.com/anthropics/claude-quickstarts/tree/main/computer-use-demo).

## How it works

A companion container runs an Ubuntu desktop (Xvfb + XFCE4) alongside a FastMCP server that exposes computer-use tools. The desktop and MCP server share a container because the tools need direct access to the X display.
The Ubuntu desktop (Xvfb + XFCE4) runs in a separate container alongside a FastMCP server that exposes computer-use tools. The desktop and MCP server share a container because the tools need direct access to the X display.

A tkinter application (`challenge.py`) presents a multi-step challenge that requires the agent to click buttons, type text, and read the result from the screen. The task cannot be solved without genuine GUI interaction.

@@ -16,4 +16,4 @@ harbor run -p harbor_cookbook/recipes/computer-use-ubuntu --agent claude-code --

## Limitations

Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on cloud providers (Daytona, Modal, E2B, etc.).
Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on some of the cloud providers in Harbor (Modal, E2B, etc.).
9 changes: 2 additions & 7 deletions harbor_cookbook/recipes/computer-use-windows/README.md
@@ -1,8 +1,8 @@
# computer-use-windows

Computer-use task on a remote Windows desktop. A companion MCP server creates a [Daytona](https://daytona.io) Windows sandbox, deploys a multi-step tkinter challenge, and proxies computer-use tools (screenshot, click, type) to the agent. Same tool set as `computer-use-ubuntu`, backed by the Daytona Computer Use API.
Computer-use task on a remote Windows desktop. An MCP server creates a [Daytona](https://daytona.io) Windows sandbox, deploys a multi-step tkinter challenge, and exposes computer-use tools (screenshot, click, type) to the agent. The tool set mirrors the [Anthropic computer-use demo](https://github.com/anthropics/claude-quickstarts/tree/main/computer-use-demo).

> **Note:** Daytona Windows Computer Use is currently in early preview. Access requires a beta account at [win.trydaytona.com](https://win.trydaytona.com/) with the `windows-base` snapshot available.
> **Note:** Daytona Windows Computer Use is currently in early preview. Access requires a beta account at [win.trydaytona.com](https://win.trydaytona.com/) with the `windows-base` snapshot available. [More info](https://www.daytona.io/docs/en/computer-use/)

## Run

@@ -17,8 +17,3 @@ Your `.env` needs:
DAYTONA_API_KEY=your_key_here
DAYTONA_API_URL=https://win.trydaytona.com/api
```

## Limitations

- Requires internet access (MCP server connects to the Daytona API)
- Windows sandbox takes ~30-60s to start after creation
2 changes: 1 addition & 1 deletion harbor_cookbook/recipes/dns-blacklisting/README.md
@@ -40,4 +40,4 @@ Supported pattern formats:

## Limitations

This is still a teaching recipe, not a hardened network control. An agent could inspect or modify the seeded files, or bypass the recipe's routing setup entirely. The verifier therefore checks the final runtime behavior of the candidate domains rather than trusting the seed files alone.
The blacklisting is not airtight. An agent could inspect or modify the seeded files, or bypass the task's routing setup entirely. The access-denied message should therefore tell the agent that the domain is not available in this environment, so that the agent does not try to sidestep the block. The grader in `tests/` can also check whether the agent complied.
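For illustration, a minimal sketch of how exact, wildcard, and regex rules could be matched against a hostname, together with a denial message of the kind described above. The rule syntax here (a `re:` prefix for regexes, shell-style `*` for wildcards) and the names `is_blocked` and `denial_message` are assumptions made for this sketch; the recipe's actual pattern formats are not shown in this diff.

```python
import fnmatch
import re

# Hypothetical rule syntax, for illustration only:
#   "example.com"         exact match
#   "*.example.com"       wildcard (shell-style glob)
#   r"re:^ads\d+\.net$"   regex, marked with a "re:" prefix


def is_blocked(hostname, rules):
    """Return True if `hostname` matches any rule in `rules`."""
    host = hostname.lower().rstrip(".")  # normalize case and a trailing root dot
    for rule in rules:
        if rule.startswith("re:"):
            if re.search(rule[3:], host):
                return True
        elif "*" in rule:
            if fnmatch.fnmatchcase(host, rule.lower()):
                return True
        elif host == rule.lower():
            return True
    return False


def denial_message(hostname):
    """Message that tells the agent the block is deliberate, not a fault to work around."""
    return f"Access denied: {hostname} is not available in this environment."
```

With these assumed rules, `*.example.com` blocks `a.example.com` but not the bare `example.com`, which would need its own exact rule.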
6 changes: 3 additions & 3 deletions harbor_cookbook/recipes/mcp-tools/README.md
@@ -1,6 +1,6 @@
# mcp-tools

Giving agents access to custom MCP tools via a FastMCP server running in a companion container.
Giving agents access to custom MCP tools via a FastMCP server running in a separate container.

## Structure

@@ -17,7 +17,7 @@ mcp-tools/
│ └── server.py
├── tests/
│ ├── test.sh # Verifier entrypoint
│ └── test_outputs.py # Pytest assertions
│ └── test_outputs.py # Pytest tests
└── solution/
└── solve.sh # Reference solution
```
@@ -38,4 +38,4 @@ harbor run -p harbor_cookbook/recipes/mcp-tools --agent claude-code --model anth

## Limitations

Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on cloud providers (Daytona, Modal, E2B, etc.).
Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on some of the cloud providers in Harbor (Modal, E2B, etc.).
2 changes: 1 addition & 1 deletion harbor_cookbook/recipes/multi-container/README.md
@@ -36,4 +36,4 @@ harbor run -p harbor_cookbook/recipes/multi-container --agent claude-code --mode

## Limitations

Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on cloud providers (Daytona, Modal, E2B, etc.).
Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on some of the cloud providers in Harbor (Modal, E2B, etc.).
4 changes: 2 additions & 2 deletions harbor_cookbook/recipes/simulated-user/README.md
@@ -1,6 +1,6 @@
# simulated-user

Example implementation of a task with a simulated user that the agent can ask questions.
Example implementation of a task with a simulated user that the agent can interact with and ask questions.

## Structure

@@ -46,4 +46,4 @@ harbor run -p harbor_cookbook/recipes/simulated-user --agent claude-code --model

## Limitations

Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on cloud providers (Daytona, Modal, E2B, etc.).
Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on some of the cloud providers in Harbor (Modal, E2B, etc.).