Merged
1 change: 1 addition & 0 deletions README.md
@@ -1,5 +1,6 @@
# Harbor Cookbook

[![](https://dcbadge.limes.pink/api/server/https://discord.gg/6xWPKhGDbA)](https://discord.gg/6xWPKhGDbA)
[![Docs](https://img.shields.io/badge/Docs-000000?style=for-the-badge&logo=mdbook&color=105864)](https://harborframework.com/docs)

Realistic examples of building evals and optimizing agents using [Harbor](https://github.com/harbor-framework/harbor).
9 changes: 6 additions & 3 deletions harbor_cookbook/recipes/dns-blacklisting/tests/test.sh
@@ -1,15 +1,18 @@
#!/bin/bash
set -uo pipefail

apt-get update
apt-get install -y curl

curl -LsSf https://astral.sh/uv/0.9.7/install.sh | sh

source $HOME/.local/bin/env

uvx \
--with pytest==8.4.1 \
--with pytest-json-ctrf==0.3.5 \
-pytest --ctrf /logs/verifier/ctrf.json /tests/test_dns.py -rA || true
+pytest --ctrf /logs/verifier/ctrf.json /tests/test_dns.py -rA

-if [ "${PIPESTATUS[0]}" -eq 0 ]; then
+if [ $? -eq 0 ]; then
echo 1 > /logs/verifier/reward.txt
else
echo 0 > /logs/verifier/reward.txt
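
The change to the exit-status check above can be reproduced in isolation. The sketch below is not part of the recipe: the two helper functions are hypothetical, and `false` stands in for a failing `pytest` run. It suggests why the `|| true` form paired with `PIPESTATUS[0]` would record a reward of 1 even when the command fails, while checking `$?` directly after the command does not.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; they print the reward instead of
# writing it to /logs/verifier/reward.txt.

# Form with `|| true`: when the command fails, `true` runs next, so the most
# recently executed pipeline is `true` and PIPESTATUS[0] is 0.
reward_with_or_true() {
  "$@" || true
  if [ "${PIPESTATUS[0]}" -eq 0 ]; then echo 1; else echo 0; fi
}

# Form without `|| true`: $? still holds the command's own exit status.
reward_with_exit_status() {
  "$@"
  if [ $? -eq 0 ]; then echo 1; else echo 0; fi
}

# `false` always exits non-zero, standing in for a failing test run.
old_result=$(reward_with_or_true false)
new_result=$(reward_with_exit_status false)

echo "check with || true: $old_result, check with \$?: $new_result"
```

Dropping `|| true` is safe here only because the script uses `set -uo pipefail` without `-e`; under `set -e` a failing test command would end the script before the reward is written.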
12 changes: 5 additions & 7 deletions harbor_cookbook/recipes/multi-reward/README.md
@@ -24,16 +24,14 @@ multi-reward/

## Run

+This recipe writes two reward dimensions (`correctness`, `performance`) to `reward.json`. Harbor's default `mean` metric only supports single-key rewards, so you must pass the included `config.yaml` which uses a custom per-dimension metric:
+
```bash
-harbor trials start -p harbor_cookbook/recipes/multi-reward
+harbor run -p harbor_cookbook/recipes/multi-reward -c harbor_cookbook/recipes/multi-reward/config.yaml
```

-## Metrics note
-
-Harbor's default `mean` metric only supports single-key `reward.json`. Since this recipe writes two keys (`correctness`, `performance`), running `harbor run` requires a custom metric config:
+To run a single trial without metrics (useful for quick iteration):

```bash
-harbor run -p harbor_cookbook/recipes/multi-reward -c harbor_cookbook/recipes/multi-reward/config.yaml
+harbor trials start -p harbor_cookbook/recipes/multi-reward
```
-
-The included `config.yaml` uses a `uv-script` metric (`metrics/per_dimension.py`) that computes mean reward per dimension.
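
For illustration, a two-key reward file of the kind the README text describes might be produced as follows. This is only a sketch: the scores and the temporary directory are hypothetical stand-ins, and the recipe's actual verifier script is not shown in this diff.

```shell
#!/bin/bash
# Hypothetical stand-in for the verifier's log directory.
reward_dir=$(mktemp -d)

# Made-up scores, one per reward dimension named in the README.
correctness=1.0
performance=0.5

# Write both dimensions as keys of a single JSON object.
printf '{"correctness": %s, "performance": %s}\n' \
  "$correctness" "$performance" > "$reward_dir/reward.json"

reward_json=$(cat "$reward_dir/reward.json")
echo "$reward_json"
```

With two keys in the file, a metric that expects a single reward value has nothing unambiguous to average, which matches the README's reason for passing `config.yaml` to `harbor run`.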
4 changes: 2 additions & 2 deletions harbor_cookbook/recipes/multi-reward/task.toml
@@ -9,5 +9,5 @@ timeout_sec = 120.0
[environment]
build_timeout_sec = 600.0
cpus = 1
-memory = "2G"
-storage = "10G"
+memory_mb = 2048
+storage_mb = 10240
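
The new integer values correspond to the old size strings under the assumption that 1G means 1024 MB. A small sketch of that arithmetic, using a hypothetical helper `gib_to_mb` that is not part of Harbor:

```shell
#!/bin/bash
# Hypothetical helper: convert a "<n>G" size string to megabytes,
# assuming 1G = 1024 MB.
gib_to_mb() {
  local size="$1"
  # Strip the trailing "G" and multiply by 1024.
  echo $(( ${size%G} * 1024 ))
}

memory_mb=$(gib_to_mb 2G)
storage_mb=$(gib_to_mb 10G)

echo "memory_mb = $memory_mb"
echo "storage_mb = $storage_mb"
```

The results agree with the values in both `task.toml` changes in this diff (2048 and 10240).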
4 changes: 2 additions & 2 deletions harbor_cookbook/recipes/simple-task/task.toml
@@ -9,5 +9,5 @@ timeout_sec = 120.0
[environment]
build_timeout_sec = 600.0
cpus = 1
-memory = "2G"
-storage = "10G"
+memory_mb = 2048
+storage_mb = 10240