Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 10 additions & 28 deletions harbor_cookbook/recipes/computer-use-windows/README.md
Original file line number Diff line number Diff line change
@@ -1,42 +1,24 @@
# computer-use-windows

Computer-use task where an agent interacts with a Windows virtual desktop to solve a multi-step GUI challenge. Requires the [Mascobot/harbor fork](https://github.com/Mascobot/harbor) which adds Windows desktop support via Daytona.
Computer-use task on a remote Windows desktop. A companion MCP server creates a [Daytona](https://daytona.io) Windows sandbox, deploys a multi-step tkinter challenge, and proxies computer-use tools (screenshot, click, type) to the agent. Same tool set as `computer-use-ubuntu`, backed by the Daytona Computer Use API.

## How it works

Harbor creates a Daytona sandbox from a pre-built Windows snapshot (`windows-base`). A setup script deploys a tkinter challenge application and launches it on the desktop. CUA agents (anthropic-cua, openai-cua) interact with the Windows desktop through Harbor's `DaytonaWindowsDesktopInterface`, which executes pyautogui commands on the sandbox via the Daytona SDK.

The challenge presents a multi-step GUI task requiring the agent to click buttons, type text, and read the result from the screen. The task cannot be solved without genuine GUI interaction.

## Prerequisites

- [Mascobot/harbor fork](https://github.com/Mascobot/harbor) (not upstream harbor-framework/harbor)
- Daytona API key (`DAYTONA_API_KEY`)
- `windows-base` snapshot in your Daytona account (Windows Computer Use private alpha)
- Anthropic or OpenAI API key for the CUA agent
> **Note:** Daytona Windows Computer Use is currently in early preview. Access requires a beta account at [win.trydaytona.com](https://win.trydaytona.com/) with the `windows-base` snapshot available.

## Run

```bash
harbor run -p harbor_cookbook/recipes/computer-use-windows \
--agent anthropic-cua --model anthropic/claude-sonnet-4-6 \
--environment-type daytona \
--environment-kwarg windows_snapshot=windows-base \
--environment-kwarg windows_setup_script=harbor_cookbook/recipes/computer-use-windows/scripts/setup.py \
--environment-kwarg skip_osworld_setup=true
--agent claude-code --model anthropic/claude-sonnet-4-6 \
--env-file .env -y
```

## Docker oracle test

The oracle test validates the test/solution pipeline on Docker (no Windows desktop needed):

```bash
harbor trials start -p harbor_cookbook/recipes/computer-use-windows --agent oracle
Your `.env` needs:
```
DAYTONA_API_KEY=your_key_here
DAYTONA_API_URL=https://win.trydaytona.com/api
```

## Limitations

- Requires the Mascobot/harbor fork with CUA agent and Windows desktop support
- Requires the `daytona` environment provider with a Windows snapshot
- Windows Computer Use is currently in Daytona private alpha
- Standard agents (claude-code, codex) cannot interact with the desktop — use CUA agents
- Requires internet access (MCP server connects to the Daytona API)
- Windows sandbox takes ~30-60s to start after creation
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
FROM python:3.12-slim

COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

ENV UV_PRERELEASE=allow

WORKDIR /app

COPY server.py .
COPY challenge.py .

EXPOSE 8000

CMD ["uv", "run", "server.py"]
Original file line number Diff line number Diff line change
@@ -0,0 +1,252 @@
# /// script
# requires-python = ">=3.12"
# dependencies = ["fastmcp", "daytona==0.131.0a1"]
# ///
"""MCP server exposing computer-use tools backed by a Daytona Windows sandbox.

On startup the server creates a Windows sandbox from the ``windows-base``
snapshot, deploys a tkinter challenge application, and launches it on the
desktop. Every MCP tool call is proxied to the sandbox via the Daytona
Computer Use API.

If the sandbox cannot be created (e.g. missing credentials), the server
still starts so that Docker healthchecks pass and oracle tests work.
"""

import atexit
import base64
import functools
import logging
import os
import time

from fastmcp import FastMCP
from fastmcp.utilities.types import Image

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

WIN_APP_DIR = "C:/Users/Administrator/harbor"

sandbox = None


def _setup_sandbox():
"""Create the Windows sandbox and deploy the challenge app."""
global sandbox

from daytona import CreateSandboxFromSnapshotParams, Daytona, DaytonaConfig

api_url = os.environ.get("DAYTONA_API_URL", "https://win.trydaytona.com/api")
log.info("Connecting to Daytona at %s", api_url)
daytona = Daytona(
DaytonaConfig(
api_key=os.environ["DAYTONA_API_KEY"],
api_url=api_url,
)
)

log.info("Creating Windows sandbox from 'windows-base' snapshot …")
sandbox = daytona.create(
CreateSandboxFromSnapshotParams(snapshot="windows-base"),
timeout=120,
)
log.info("Sandbox created: %s (state=%s)", sandbox.id, sandbox.state)

def _cleanup():
try:
log.info("Deleting sandbox %s …", sandbox.id)
daytona.delete(sandbox)
except Exception as exc:
log.warning("Sandbox cleanup failed: %s", exc)

atexit.register(_cleanup)

try:
sandbox.computer_use.start()
log.info("computer_use.start() succeeded")
except Exception as exc:
log.info("computer_use.start() skipped: %s", exc)

sandbox.process.exec(f"mkdir {WIN_APP_DIR}", timeout=5)

log.info("Deploying challenge app …")
with open(os.path.join(os.path.dirname(__file__) or ".", "challenge.py")) as f:
challenge_source = f.read()

sandbox.fs.upload_file(challenge_source.encode(), f"{WIN_APP_DIR}/challenge.py")

# DETACHED_PROCESS flag so the GUI outlives the launcher
launcher = (
"import subprocess\n"
"subprocess.Popen(\n"
f" ['python', r'{WIN_APP_DIR}/challenge.py'],\n"
" creationflags=0x00000008,\n"
")\n"
"print('launched')\n"
)
sandbox.fs.upload_file(launcher.encode(), f"{WIN_APP_DIR}/launch.py")

log.info("Launching challenge app on desktop …")
r = sandbox.process.exec(f"python {WIN_APP_DIR}/launch.py", timeout=15)
log.info("Launch result: %s", r.result)
time.sleep(5)

try:
windows = sandbox.computer_use.display.get_windows()
titles = [w.title for w in windows.windows] if windows.windows else []
log.info("Open windows: %s", titles)
if not any("Harbor" in t for t in titles):
log.warning("Challenge window not found, retrying …")
sandbox.process.exec(f"python {WIN_APP_DIR}/launch.py", timeout=15)
time.sleep(5)
except Exception as exc:
log.warning("Could not verify window list: %s", exc)

log.info("Setup complete")


try:
_setup_sandbox()
except Exception as exc:
log.warning("Sandbox setup failed (tools will be unavailable): %s", exc)

log.info("Starting MCP server (sandbox=%s)", "ready" if sandbox else "unavailable")

mcp = FastMCP("computer-use")


def requires_sandbox(fn):
"""Decorator that raises if the Windows sandbox is unavailable."""

@functools.wraps(fn)
def wrapper(*args, **kwargs):
if sandbox is None:
raise RuntimeError(
"Windows sandbox is not available. "
"Check DAYTONA_API_KEY and DAYTONA_API_URL."
)
return fn(*args, **kwargs)

return wrapper


@mcp.tool()
@requires_sandbox
def screenshot() -> Image:
"""Take a screenshot of the Windows desktop and return it as an image."""
resp = sandbox.computer_use.screenshot.take_full_screen()
return Image(data=base64.b64decode(resp.screenshot), format="png")


@mcp.tool()
@requires_sandbox
def mouse_move(x: int, y: int) -> str:
"""Move the mouse cursor to the given (x, y) pixel coordinate."""
result = sandbox.computer_use.mouse.move(x, y)
return f"Moved mouse to ({result.x}, {result.y})"


@mcp.tool()
@requires_sandbox
def left_click(x: int, y: int) -> str:
"""Left-click at the given (x, y) pixel coordinate."""
sandbox.computer_use.mouse.click(x, y)
return f"Left clicked at ({x}, {y})"


@mcp.tool()
@requires_sandbox
def right_click(x: int, y: int) -> str:
"""Right-click at the given (x, y) pixel coordinate."""
sandbox.computer_use.mouse.click(x, y, button="right")
return f"Right clicked at ({x}, {y})"


@mcp.tool()
@requires_sandbox
def middle_click(x: int, y: int) -> str:
"""Middle-click at the given (x, y) pixel coordinate."""
sandbox.computer_use.mouse.click(x, y, button="middle")
return f"Middle clicked at ({x}, {y})"


@mcp.tool()
@requires_sandbox
def double_click(x: int, y: int) -> str:
"""Double-click at the given (x, y) pixel coordinate."""
sandbox.computer_use.mouse.click(x, y, double=True)
return f"Double clicked at ({x}, {y})"


@mcp.tool()
@requires_sandbox
def triple_click(x: int, y: int) -> str:
"""Triple-click at the given (x, y) coordinate (e.g. to select a line)."""
sandbox.computer_use.mouse.click(x, y)
sandbox.computer_use.mouse.click(x, y)
sandbox.computer_use.mouse.click(x, y)
return f"Triple clicked at ({x}, {y})"


@mcp.tool()
@requires_sandbox
def left_click_drag(start_x: int, start_y: int, end_x: int, end_y: int) -> str:
"""Click and drag from (start_x, start_y) to (end_x, end_y)."""
sandbox.computer_use.mouse.drag(start_x, start_y, end_x, end_y)
return f"Dragged from ({start_x}, {start_y}) to ({end_x}, {end_y})"


@mcp.tool()
@requires_sandbox
def scroll(x: int, y: int, direction: str, clicks: int = 3) -> str:
"""Scroll at (x, y). direction is 'up' or 'down'. clicks controls amount."""
sandbox.computer_use.mouse.scroll(x, y, direction, clicks)
return f"Scrolled {direction} {clicks} clicks at ({x}, {y})"


@mcp.tool()
@requires_sandbox
def cursor_position() -> str:
"""Return the current (x, y) position of the mouse cursor."""
pos = sandbox.computer_use.mouse.get_position()
return f"Cursor position: ({pos.x}, {pos.y})"


@mcp.tool()
@requires_sandbox
def type_text(text: str) -> str:
"""Type the given text on the Windows desktop."""
sandbox.computer_use.keyboard.type(text)
return f"Typed: {text}"


@mcp.tool()
@requires_sandbox
def press_key(key: str) -> str:
"""Press a key or key combination (e.g. 'Return', 'ctrl+c', 'alt+F4')."""
if "+" in key:
sandbox.computer_use.keyboard.hotkey(key)
else:
sandbox.computer_use.keyboard.press(key)
return f"Pressed: {key}"


@mcp.tool()
@requires_sandbox
def hold_key(key: str, duration: float = 0.5) -> str:
"""Hold a key down for the given duration in seconds."""
sandbox.computer_use.keyboard.press(key)
time.sleep(duration)
return f"Held {key} for {duration}s"


@mcp.tool()
def wait(seconds: float = 1.0) -> str:
"""Wait for the specified number of seconds."""
time.sleep(seconds)
return f"Waited {seconds} seconds"


if __name__ == "__main__":
mcp.run(transport="streamable-http", host="0.0.0.0", port=8000)
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# This file is merged on top of Harbor's base docker-compose config.
# The `main` service is automatically configured by Harbor with the build
# context, image, command, volumes, and resource limits.
# You only need to specify overrides for `main` and define additional services.
services:
main:
depends_on:
desktop:
condition: service_healthy

desktop:
build:
context: ./desktop
expose:
- "8000"
environment:
- DAYTONA_API_KEY
- DAYTONA_API_URL
healthcheck:
test: ["CMD", "python", "-c", "import socket; s=socket.create_connection(('localhost',8000),timeout=2); s.close()"]
interval: 2s
timeout: 5s
retries: 15
start_period: 60s
7 changes: 1 addition & 6 deletions harbor_cookbook/recipes/computer-use-windows/instruction.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,11 +7,6 @@ Your task:
1. Take screenshots and interact with the application on the desktop
2. The application has a multi-step challenge — you will need to navigate through it
3. Find the secret code revealed at the end of the challenge
4. Write **only the secret code value** to `C:\app\secret.txt`

You can open a command prompt or PowerShell and run:
```
echo SECRET_CODE_HERE > C:\app\secret.txt
```
4. Write **only the secret code value** to `/app/secret.txt`

Write the value exactly as displayed, with no additional formatting or whitespace.
Loading
Loading