harbor-framework · benediktstroebl · Mar 24, 2026 · Mar 24, 2026
diff --git a/README.md b/README.md
@@ -22,14 +22,14 @@ harbor run -p harbor_cookbook/recipes/<name> -a claude-code -m anthropic/claude-
 
 | Name | Description |
 |:--|:--|
-| [simple&#8209;task](harbor_cookbook/recipes/simple-task/) | Minimal single-container task with a pytest verifier. |
-| [multi&#8209;container](harbor_cookbook/recipes/multi-container/) | Docker Compose setup where the agent interacts with a companion service over the network. |
-| [mcp&#8209;tools](harbor_cookbook/recipes/mcp-tools/) | Giving the agent custom tools via a FastMCP server in a companion container. |
-| [simulated&#8209;user](harbor_cookbook/recipes/simulated-user/) | Agent discovers requirements by talking to a simulated user exposed as an MCP tool. |
-| [computer&#8209;use&#8209;ubuntu](harbor_cookbook/recipes/computer-use-ubuntu/) | GUI interaction on an Ubuntu (XFCE4) virtual desktop via screenshot/click/type MCP tools. |
-| [computer&#8209;use&#8209;windows](harbor_cookbook/recipes/computer-use-windows/) | GUI interaction on a remote Windows desktop (Daytona) via MCP tools. |
-| [dns&#8209;blacklisting](harbor_cookbook/recipes/dns-blacklisting/) | Network-level hostname blacklisting with exact, wildcard, and regex rules. |
+| [simple&#8209;task](harbor_cookbook/recipes/simple-task/) | Minimal single-container task. |
+| [multi&#8209;container](harbor_cookbook/recipes/multi-container/) | Docker Compose task where the agent interacts with a locally hosted REST API. |
+| [mcp&#8209;tools](harbor_cookbook/recipes/mcp-tools/) | Giving the agent custom tools via a locally hosted FastMCP server. |
 | [multi&#8209;reward](harbor_cookbook/recipes/multi-reward/) | Multiple independent verifiers each producing their own score. |
+| [simulated&#8209;user](harbor_cookbook/recipes/simulated-user/) | Agent discovers requirements by talking to a simulated user. |
+| [computer&#8209;use&#8209;ubuntu](harbor_cookbook/recipes/computer-use-ubuntu/) | Computer use reference implementation on an Ubuntu virtual desktop. |
+| [computer&#8209;use&#8209;windows](harbor_cookbook/recipes/computer-use-windows/) | Computer use reference implementation on a remote Windows desktop (Daytona). |
+| [dns&#8209;blacklisting](harbor_cookbook/recipes/dns-blacklisting/) | Network-level hostname blacklisting with exact, wildcard, and regex rules. |
 
 ## Optimization Examples
 

diff --git a/harbor_cookbook/gepa/optimize.py b/harbor_cookbook/gepa/optimize.py
@@ -4,7 +4,7 @@
 # ///
 """GEPA prompt optimization for MedAgentBench.
 
-Evolves a prompt template that wraps each task's instruction.md via
+Evolves the agent prompt template that wraps each task's instruction.md via
 Harbor's prompt_template_path mechanism.  Evaluated by running a coding
 agent (codex by default) on medagentbench@1.0 Harbor Trials.
 """
@@ -40,16 +40,15 @@
 """
 
 OBJECTIVE = (
-    "Optimize a prompt template that wraps MedAgentBench task instructions. "
+    "Optimize a prompt template that wraps medical task instructions. "
     "The prompt guides a coding agent on how to query a FHIR server and "
     "answer clinical EHR questions. "
-    "Graded by the official MedAgentBench verifier (binary: 1=correct, 0=wrong)."
 )
 
 BACKGROUND = (
-    "MedAgentBench tasks span 10 categories: patient lookup, lab results, "
+    "The medical tasks span 10 categories: patient lookup, lab results, "
     "vitals, medications, conditions, procedures, service requests, and "
-    "clinical reasoning. Each task runs in a Docker container with a FHIR "
+    "clinical reasoning. Each task runs with a FHIR "
     "server at http://localhost:8080/fhir/.\n\n"
     "The agent receives the task's instruction.md wrapped by the prompt "
     "template being optimized. The agent must query the FHIR server "
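The docstring says GEPA evolves a template that wraps each task's `instruction.md`. Below is a minimal sketch of what "wrapping" means. It assumes a `$instruction` placeholder rendered with Python's `string.Template`. Harbor's actual placeholder syntax for `prompt_template_path` is not part of this diff, and `render_prompt` and the template text are hypothetical.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical candidate template. Harbor's real placeholder syntax for
# prompt_template_path is not shown in this diff; "$instruction" is an assumption.
CANDIDATE_TEMPLATE = """You are answering a clinical EHR question.
Query the FHIR server at http://localhost:8080/fhir/ before answering.

Task:
$instruction
"""


def render_prompt(template_path: Path, instruction_path: Path) -> str:
    """Wrap one task's instruction.md with the candidate template (hypothetical helper)."""
    template = Template(template_path.read_text())
    # substitute() raises KeyError if the template uses a placeholder with no value supplied.
    return template.substitute(instruction=instruction_path.read_text().strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        template_path = Path(tmp) / "prompt_template.md"
        instruction_path = Path(tmp) / "instruction.md"
        template_path.write_text(CANDIDATE_TEMPLATE)
        instruction_path.write_text("Example instruction text.\n")
        print(render_prompt(template_path, instruction_path))
```

Each GEPA candidate would only change the template text. The per-task instruction stays fixed, which keeps candidates comparable across the same set of trials.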

diff --git a/harbor_cookbook/gepa/utils.py b/harbor_cookbook/gepa/utils.py
@@ -26,7 +26,7 @@
 DEFAULT_MODEL = "openai/gpt-5-nano"
 DEFAULT_ENVIRONMENT = EnvironmentType.DOCKER
 
-# Single event loop shared across GEPA worker threads (required by Daytona's async singleton).
+# Single event loop shared across GEPA worker threads.
 _loop = asyncio.new_event_loop()
 threading.Thread(target=_loop.run_forever, daemon=True).start()
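The comment describes one event loop shared across GEPA worker threads. Below is a minimal sketch of how synchronous worker threads might hand coroutines to that loop. It assumes `asyncio.run_coroutine_threadsafe` is the bridge, because the rest of `utils.py` is not shown in this diff. `fake_trial`, `run_sync`, and `evaluate` are hypothetical stand-ins, not Harbor or GEPA APIs.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# One loop running forever on a daemon thread, mirroring the two lines in utils.py.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()


async def fake_trial(task_id: int) -> float:
    """Hypothetical stand-in for an async Harbor trial; returns a dummy reward."""
    await asyncio.sleep(0.01)
    return float(task_id % 2)


def run_sync(coro):
    """Schedule a coroutine on the shared loop from any thread and block for its result."""
    future = asyncio.run_coroutine_threadsafe(coro, _loop)
    return future.result()


def evaluate(task_id: int) -> float:
    # Called from worker threads; every coroutine runs on the same shared loop.
    return run_sync(fake_trial(task_id))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(evaluate, range(4))))
```

Routing every coroutine through one loop avoids creating a loop per thread. This matters when async objects created on one loop must not be awaited from another.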
 

diff --git a/harbor_cookbook/recipes/computer-use-ubuntu/README.md b/harbor_cookbook/recipes/computer-use-ubuntu/README.md
@@ -1,10 +1,10 @@
 # computer-use-ubuntu
 
-Computer-use task where an agent interacts with an Ubuntu (XFCE4) virtual desktop to solve a multi-step GUI challenge. The tool set mirrors the [Anthropic computer-use demo](https://github.com/anthropics/claude-quickstarts/tree/main/computer-use-demo). Desktop setup based on the [Harbor OSWorld adapter](https://github.com/Mascobot/harbor/tree/main/adapters/osworld).
+Computer-use task where an agent interacts with an Ubuntu (XFCE4) virtual desktop to solve a multi-step GUI challenge. The tool set mirrors the [Anthropic computer-use demo](https://github.com/anthropics/claude-quickstarts/tree/main/computer-use-demo).
 
 ## How it works
 
-A companion container runs an Ubuntu desktop (Xvfb + XFCE4) alongside a FastMCP server that exposes computer-use tools. The desktop and MCP server share a container because the tools need direct access to the X display.
+The Ubuntu desktop (Xvfb + XFCE4) runs in a separate container alongside a FastMCP server that exposes computer-use tools. The desktop and MCP server share a container because the tools need direct access to the X display.
 
 A tkinter application (`challenge.py`) presents a multi-step challenge that requires the agent to click buttons, type text, and read the result from the screen. The task cannot be solved without genuine GUI interaction.
 
@@ -16,4 +16,4 @@ harbor run -p harbor_cookbook/recipes/computer-use-ubuntu --agent claude-code --
 
 ## Limitations
 
-Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on cloud providers (Daytona, Modal, E2B, etc.).
+Multi-container tasks rely on Docker Compose networking, so they require an environment provider that supports it, such as **docker**. Some of Harbor's cloud providers (for example, Modal and E2B) do not support them.
diff --git a/harbor_cookbook/recipes/computer-use-windows/README.md b/harbor_cookbook/recipes/computer-use-windows/README.md
@@ -1,8 +1,8 @@
 # computer-use-windows
 
-Computer-use task on a remote Windows desktop. A companion MCP server creates a [Daytona](https://daytona.io) Windows sandbox, deploys a multi-step tkinter challenge, and proxies computer-use tools (screenshot, click, type) to the agent. Same tool set as `computer-use-ubuntu`, backed by the Daytona Computer Use API.
+Computer-use task on a remote Windows desktop. An MCP server creates a [Daytona](https://daytona.io) Windows sandbox, deploys a multi-step tkinter challenge, and exposes computer-use tools (screenshot, click, type) to the agent. The tool set mirrors the [Anthropic computer-use demo](https://github.com/anthropics/claude-quickstarts/tree/main/computer-use-demo).
 
-> **Note:** Daytona Windows Computer Use is currently in early preview. Access requires a beta account at [win.trydaytona.com](https://win.trydaytona.com/) with the `windows-base` snapshot available.
+> **Note:** Daytona Windows Computer Use is currently in early preview. Access requires a beta account at [win.trydaytona.com](https://win.trydaytona.com/) with the `windows-base` snapshot available. See the [Daytona computer-use docs](https://www.daytona.io/docs/en/computer-use/) for more information.
 
 ## Run
 
@@ -17,8 +17,3 @@ Your `.env` needs:
 DAYTONA_API_KEY=your_key_here
 DAYTONA_API_URL=https://win.trydaytona.com/api
 ```
-
-## Limitations
-
-- Requires internet access (MCP server connects to the Daytona API)
-- Windows sandbox takes ~30-60s to start after creation
diff --git a/harbor_cookbook/recipes/dns-blacklisting/README.md b/harbor_cookbook/recipes/dns-blacklisting/README.md
@@ -40,4 +40,4 @@ Supported pattern formats:
 
 ## Limitations
 
-This is still a teaching recipe, not a hardened network control. An agent could inspect or modify the seeded files, or bypass the recipe's routing setup entirely. The verifier therefore checks the final runtime behavior of the candidate domains rather than trusting the seed files alone.
+The blacklisting is not airtight. An agent could inspect or modify the seeded files, or bypass the task's routing setup entirely. The access-denied message should therefore tell the agent that the domain is not available in this environment, so that it does not try to sidestep the block. The grader in `tests/` can also check whether the agent complied.
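The README lists exact, wildcard, and regex rules, but the concrete pattern syntax is elided from this diff ("Supported pattern formats:"). Below is a minimal sketch of such a matcher, paired with a denial message that states plainly that the domain is unavailable, as the Limitations text recommends. The `*.` wildcard prefix, the `regex:` prefix, and all function names are hypothetical. The sketch illustrates matching logic only and is not the recipe's routing setup.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    kind: str     # "exact", "wildcard", or "regex" (hypothetical encoding)
    pattern: str


def parse_rule(entry: str) -> Rule:
    """Parse one blacklist entry. The "*." and "regex:" prefixes are assumptions."""
    entry = entry.strip()
    if entry.startswith("regex:"):
        # Regex patterns are kept verbatim so escapes like \d are not altered.
        return Rule("regex", entry[len("regex:"):])
    if entry.startswith("*."):
        return Rule("wildcard", entry[2:].lower())
    return Rule("exact", entry.lower())


def is_blocked(hostname: str, rules: list[Rule]) -> bool:
    """Return True if the hostname (lowercased, trailing dot removed) matches any rule."""
    host = hostname.rstrip(".").lower()
    for rule in rules:
        if rule.kind == "exact" and host == rule.pattern:
            return True
        # Wildcard "*.example.com" matches subdomains only, not "example.com" itself.
        if rule.kind == "wildcard" and host.endswith("." + rule.pattern):
            return True
        if rule.kind == "regex" and re.fullmatch(rule.pattern, host):
            return True
    return False


def denial_message(hostname: str) -> str:
    """Message shown to the agent; it says the domain is unavailable rather than just failing."""
    return (
        f"Access to {hostname} is blocked: this domain is not available in this "
        "environment. Continue the task without it."
    )


if __name__ == "__main__":
    rules = [parse_rule(e) for e in ["example.com", "*.tracker.net", r"regex:^ads\d*\.site\.org$"]]
    for host in ["example.com", "cdn.tracker.net", "ads42.site.org", "news.site.org"]:
        print(host, denial_message(host) if is_blocked(host, rules) else "allowed")
```

A grader could reuse the same `is_blocked` check against the agent's logged requests to see whether it complied.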
diff --git a/harbor_cookbook/recipes/mcp-tools/README.md b/harbor_cookbook/recipes/mcp-tools/README.md
@@ -1,6 +1,6 @@
 # mcp-tools
 
-Giving agents access to custom MCP tools via a FastMCP server running in a companion container.
+Giving agents access to custom MCP tools via a FastMCP server running in a separate container.
 
 ## Structure
 
@@ -17,7 +17,7 @@ mcp-tools/
 │       └── server.py
 ├── tests/
 │   ├── test.sh                # Verifier entrypoint
-│   └── test_outputs.py        # Pytest assertions
+│   └── test_outputs.py        # Pytest tests
 └── solution/
     └── solve.sh               # Reference solution
 ```
@@ -38,4 +38,4 @@ harbor run -p harbor_cookbook/recipes/mcp-tools --agent claude-code --model anth
 
 ## Limitations
 
-Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on cloud providers (Daytona, Modal, E2B, etc.).
+Multi-container tasks rely on Docker Compose networking, so they require an environment provider that supports it, such as **docker**. Some of Harbor's cloud providers (for example, Modal and E2B) do not support them.
diff --git a/harbor_cookbook/recipes/multi-container/README.md b/harbor_cookbook/recipes/multi-container/README.md
@@ -36,4 +36,4 @@ harbor run -p harbor_cookbook/recipes/multi-container --agent claude-code --mode
 
 ## Limitations
 
-Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on cloud providers (Daytona, Modal, E2B, etc.).
+Multi-container tasks rely on Docker Compose networking, so they require an environment provider that supports it, such as **docker**. Some of Harbor's cloud providers (for example, Modal and E2B) do not support them.
diff --git a/harbor_cookbook/recipes/simulated-user/README.md b/harbor_cookbook/recipes/simulated-user/README.md
@@ -1,6 +1,6 @@
 # simulated-user
 
-Example implementation of a task with a simulated user that the agent can ask questions.
+Example implementation of a task with a simulated user that the agent can interact with and ask questions.
 
 ## Structure
 
@@ -46,4 +46,4 @@ harbor run -p harbor_cookbook/recipes/simulated-user --agent claude-code --model
 
 ## Limitations
 
-Multi-container tasks require the **docker** environment provider because they rely on Docker Compose networking. They are not supported on cloud providers (Daytona, Modal, E2B, etc.).
+Multi-container tasks rely on Docker Compose networking, so they require an environment provider that supports it, such as **docker**. Some of Harbor's cloud providers (for example, Modal and E2B) do not support them.