feat: add autoharness worker #7
base: main
Changes from 1 commit
2ba8bd6
f75fe0c
b5c57ec
e122b4d
e2fa91c
1aa86c9
7e0adfa
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| .git | ||
| data/ | ||
| tasks/*/logs/ | ||
| node_modules/ | ||
| target/ | ||
| __pycache__ | ||
| *.pyc | ||
| .env | ||
| workers/ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| data/ | ||
| node_modules/ | ||
| __pycache__/ | ||
| *.pyc | ||
| .env | ||
| target/ | ||
| tasks/*/logs/ | ||
| .agent/ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim | ||
|
|
||
| RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
| git \ | ||
| ca-certificates \ | ||
| curl \ | ||
| build-essential \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
| nodejs \ | ||
| npm \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| RUN mkdir -p /task/logs | ||
| WORKDIR /task | ||
|
|
||
| CMD ["sleep", "infinity"] | ||
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,155 @@ | ||
| """ | ||
| autoagent-iii harness — the file that the meta-agent modifies. | ||
|
|
||
| Everything above the FIXED ADAPTER line is fair game for the meta-agent. | ||
| Everything below is the Harbor integration and must not be modified | ||
| unless a human explicitly requests it. | ||
| """ | ||
|
|
||
| import os | ||
| import json | ||
| import asyncio | ||
| import subprocess | ||
| from agents import Agent, Runner, function_tool | ||
| from agents.run import RunResult | ||
|
|
||
|
|
||
| # ========================== EDITABLE SECTION ========================== | ||
|
|
||
| SYSTEM_PROMPT = """\ | ||
| You are an autonomous coding agent. You have access to a shell tool that | ||
| lets you run any command in the task container. | ||
|
|
||
| Approach: | ||
| 1. Read the task instruction carefully. | ||
| 2. Explore the environment (ls, cat files, check language/framework). | ||
| 3. Plan your approach before writing code. | ||
| 4. Implement the solution step by step. | ||
| 5. Verify your work by running tests or checking output. | ||
| 6. If something fails, read the error, diagnose, and fix. | ||
|
|
||
| Rules: | ||
| - Do not ask for help. You are autonomous. | ||
| - Do not give up. Try alternative approaches. | ||
| - Verify your solution before finishing. | ||
| """ | ||
|
|
||
| MODEL = "gpt-5" | ||
|
🧩 Analysis chain
🌐 Web query:
💡 Result: For the OpenAI Agents SDK (which uses the Responses API), the currently available GPT models listed in the API docs are:
Source(s): [1]
🏁 Script executed: cat -n autoharness/agent.py | sed -n '30,45p'
Repository: iii-hq/workers
Length of output: 529
Update model identifier to a complete, available version.
🤖 Prompt for AI Agents
Author
Intentional —
🐇✨ ✏️ Learnings added
|
||
| MAX_TURNS = 30 | ||
|
|
||
|
|
||
| @function_tool | ||
| def run_shell(command: str) -> str: | ||
| """Execute a shell command and return combined stdout+stderr.""" | ||
| try: | ||
| result = subprocess.run( | ||
| command, shell=True, capture_output=True, text=True, timeout=120 | ||
| ) | ||
coderabbitai[bot] marked this conversation as resolved.
|
||
| output = result.stdout + result.stderr | ||
| total_tokens = 0 | ||
| estimated_cost = 0.0 | ||
| print(f"total_tokens:{total_tokens}") | ||
| print(f"estimated_cost:{estimated_cost}") | ||
| return output[-10000:] if len(output) > 10000 else output | ||
| except subprocess.TimeoutExpired: | ||
| return "ERROR: command timed out after 120 seconds" | ||
|
|
||
|
|
||
| def create_tools(): | ||
| return [run_shell] | ||
|
|
||
|
|
||
| def create_agent(): | ||
| return Agent( | ||
| name="autoagent", | ||
| instructions=SYSTEM_PROMPT, | ||
| model=MODEL, | ||
| tools=create_tools(), | ||
| ) | ||
|
|
||
|
|
||
| async def run_task(instruction: str) -> RunResult: | ||
| agent = create_agent() | ||
| result = await Runner.run(agent, instruction, max_turns=MAX_TURNS) | ||
| return result | ||
|
|
||
|
|
||
| # --- FIXED ADAPTER BELOW --- do not modify unless human requests --- | ||
|
|
||
|
|
||
| def to_atif(result: RunResult, duration: float, instruction: str) -> dict: | ||
| steps = [] | ||
| for item in result.raw_responses: | ||
| step = { | ||
| "action": {"type": "message"}, | ||
| "observation": "", | ||
| } | ||
| if hasattr(item, "output"): | ||
| for output in item.output: | ||
| if hasattr(output, "type"): | ||
| if output.type == "function_call": | ||
| step["action"] = { | ||
| "type": "tool_call", | ||
| "tool": output.name, | ||
| "input": output.arguments, | ||
| } | ||
| elif output.type == "message": | ||
| step["observation"] = getattr(output, "content", "") | ||
| steps.append(step) | ||
|
|
||
| return { | ||
| "version": "atif-v1.6", | ||
| "steps": steps, | ||
| "metrics": { | ||
| "duration_seconds": round(duration, 1), | ||
| "turns": len(result.raw_responses), | ||
| "final_output": result.final_output[:2000] if result.final_output else "", | ||
| }, | ||
| } | ||
|
|
||
|
|
||
| class AutoAgent: | ||
| """Harbor BaseAgent adapter.""" | ||
|
|
||
| async def run(self, task_path: str) -> dict: | ||
| import time | ||
|
|
||
| instruction_file = os.path.join(task_path, "instruction.md") | ||
| with open(instruction_file) as f: | ||
| instruction = f.read() | ||
|
|
||
| start = time.time() | ||
| result = await run_task(instruction) | ||
| duration = time.time() - start | ||
|
|
||
| trajectory = to_atif(result, duration, instruction) | ||
|
|
||
| logs_dir = os.path.join(task_path, "logs", "agent") | ||
| os.makedirs(logs_dir, exist_ok=True) | ||
| with open(os.path.join(logs_dir, "trajectory.json"), "w") as f: | ||
| json.dump(trajectory, f, indent=2) | ||
|
|
||
| return trajectory | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| import time | ||
|
|
||
| task_dir = os.environ.get("TASK_DIR", "/task") | ||
| instruction_file = os.path.join(task_dir, "instruction.md") | ||
|
|
||
| with open(instruction_file) as f: | ||
| instruction = f.read() | ||
|
|
||
| start = time.time() | ||
| result = asyncio.run(run_task(instruction)) | ||
| duration = time.time() - start | ||
|
|
||
| trajectory = to_atif(result, duration, instruction) | ||
|
|
||
| logs_dir = os.path.join(task_dir, "logs", "agent") | ||
| os.makedirs(logs_dir, exist_ok=True) | ||
| with open(os.path.join(logs_dir, "trajectory.json"), "w") as f: | ||
| json.dump(trajectory, f, indent=2) | ||
|
|
||
| print(json.dumps(trajectory["metrics"], indent=2)) | ||
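
The `run_shell` tool in the diff above concatenates stdout and stderr, keeps only the last 10,000 characters, and turns a timeout into an error string instead of raising. The following is a minimal standalone sketch of that behaviour, not the PR's code: it omits the `@function_tool` decorator so it runs without the Agents SDK, and the name `run_shell_plain` and the `timeout` / `max_chars` parameters are hypothetical, added only to make the limits easy to try out.

```python
import subprocess

# Limits copied from the diff; exposed as parameters here only for illustration.
MAX_OUTPUT_CHARS = 10_000
TIMEOUT_SECONDS = 120


def run_shell_plain(
    command: str,
    timeout: float = TIMEOUT_SECONDS,
    max_chars: int = MAX_OUTPUT_CHARS,
) -> str:
    """Run a shell command and return the tail of stdout followed by stderr."""
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # Report the timeout as text so the caller always gets a string back.
        return f"ERROR: command timed out after {timeout:g} seconds"

    # stdout and stderr are captured separately and joined, not interleaved.
    output = result.stdout + result.stderr
    # Keep the tail: the end of a long log usually holds the error message.
    return output[-max_chars:] if len(output) > max_chars else output


if __name__ == "__main__":
    print(run_shell_plain("echo hello"))
    print(run_shell_plain("printf abcdef", max_chars=3))
```

Keeping the tail rather than the head is a design choice visible in the original slice `output[-10000:]`. Because the two streams are joined after the command finishes, the returned text does not preserve the order in which stdout and stderr lines were actually written.
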
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,185 @@ | ||||||||||||||||||||||||||||||||||||||||||||||
| #!/bin/bash | ||||||||||||||||||||||||||||||||||||||||||||||
| set -euo pipefail | ||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||
| API="http://localhost:3111" | ||||||||||||||||||||||||||||||||||||||||||||||
| TAG="${1:-$(date +%b%d | tr '[:upper:]' '[:lower:]')}" | ||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||
| echo "==========================================" | ||||||||||||||||||||||||||||||||||||||||||||||
| echo " autoagent-iii benchmark runner" | ||||||||||||||||||||||||||||||||||||||||||||||
| echo " tag: $TAG" | ||||||||||||||||||||||||||||||||||||||||||||||
| echo "==========================================" | ||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||
| # ------------------------------------------------------------------- | ||||||||||||||||||||||||||||||||||||||||||||||
| # 1. Check prerequisites | ||||||||||||||||||||||||||||||||||||||||||||||
| # ------------------------------------------------------------------- | ||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||
| check() { | ||||||||||||||||||||||||||||||||||||||||||||||
| if ! command -v "$1" &>/dev/null; then | ||||||||||||||||||||||||||||||||||||||||||||||
| echo "ERROR: $1 not found. Install it first." | ||||||||||||||||||||||||||||||||||||||||||||||
| exit 1 | ||||||||||||||||||||||||||||||||||||||||||||||
| fi | ||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||
| check iii | ||||||||||||||||||||||||||||||||||||||||||||||
| check curl | ||||||||||||||||||||||||||||||||||||||||||||||
| check jq | ||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||
| if ! curl -sf "$API/api/report/tags" >/dev/null 2>&1; then | ||||||||||||||||||||||||||||||||||||||||||||||
| echo "" | ||||||||||||||||||||||||||||||||||||||||||||||
| echo "iii-engine + orchestrator not running. Starting them..." | ||||||||||||||||||||||||||||||||||||||||||||||
| echo "" | ||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||
| cd "$(dirname "$0")" | ||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||
| iii --config iii-config.yaml & | ||||||||||||||||||||||||||||||||||||||||||||||
| III_PID=$! | ||||||||||||||||||||||||||||||||||||||||||||||
| sleep 2 | ||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||
| cd workers/orchestrator | ||||||||||||||||||||||||||||||||||||||||||||||
| python3 orchestrator.py & | ||||||||||||||||||||||||||||||||||||||||||||||
| ORCH_PID=$! | ||||||||||||||||||||||||||||||||||||||||||||||
| sleep 3 | ||||||||||||||||||||||||||||||||||||||||||||||
| cd ../.. | ||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||
| cd "$(dirname "$0")" | |
| iii --config iii-config.yaml & | |
| III_PID=$! | |
| sleep 2 | |
| cd workers/orchestrator | |
| python3 orchestrator.py & | |
| ORCH_PID=$! | |
| sleep 3 | |
| cd ../.. | |
| cd "$(dirname "$0")" | |
| iii --config iii-config.yaml & | |
| III_PID=$! | |
| sleep 2 | |
| cd orchestrator | |
| python3 orchestrator.py & | |
| ORCH_PID=$! | |
| sleep 3 | |
| cd .. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@autoharness/bench.sh` around lines 32 - 42, The script currently attempts to
start the orchestrator from workers/orchestrator but the actual worker is at
orchestrator/orchestrator.py; update the bootstrapping sequence in bench.sh so
after cd "$(dirname "$0")" you change into the correct directory (relative path
from autoharness to the orchestrator, e.g. ../orchestrator) before launching
python3 orchestrator.py &, then capture its PID into ORCH_PID as before; ensure
the sleep and subsequent cd ../.. (or adjusted return path) are updated so the
script returns to the original working location and III_PID/ORCH_PID behavior is
unchanged.
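
The prompt above asks the reader to verify the finding against the current code before changing `bench.sh`. One way to do that is to check which candidate directory actually contains `orchestrator.py`. The sketch below is a hypothetical helper, not part of the PR: the function name `find_orchestrator_dir` is invented, the candidate list is taken from the paths mentioned in this thread, and the real repository layout is not assumed.

```python
from pathlib import Path
from typing import Optional, Sequence

# Paths mentioned in the review thread, relative to the directory holding bench.sh.
# Which of them exists depends on the real repository layout (not assumed here).
CANDIDATES = ("workers/orchestrator", "orchestrator", "../orchestrator")


def find_orchestrator_dir(
    base: Path,
    candidates: Sequence[str] = CANDIDATES,
    script: str = "orchestrator.py",
) -> Optional[Path]:
    """Return the first candidate directory under `base` that contains `script`."""
    for rel in candidates:
        directory = base / rel
        if (directory / script).is_file():
            return directory.resolve()
    return None


if __name__ == "__main__":
    # Run from the directory that contains bench.sh.
    found = find_orchestrator_dir(Path.cwd())
    print(found if found else "orchestrator.py not found in any candidate directory")
```

If the helper returns `None` for every candidate, neither the path in the script nor the path in the suggestion matches the checkout, and the finding needs a closer look before any edit is made.
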