feat(swe): fix harness for Docker agent execution and add test results by alpha1122x · Pull Request #12 · CortexLM/swe-forge

alpha1122x · 2026-02-17T15:33:03Z

Summary

Fix SWE harness Docker execution issues and add baseagent-echo test results across multiple difficulty levels.

Changes

Harness fixes (`src/swe/harness.rs`)

- Add `DOCKER_AGENT_DIR` env var support for overriding the agent directory path in containerized environments
- Pass the `OPENROUTER_API_KEY` environment variable into Docker containers for LLM-based agents
- Install `python3`, `python3-pip`, and `python3-venv` alongside system tools in container setup, with a fallback symlink for `python`
- Increase the system tools install timeout from 120s to 180s
- Add the `--break-system-packages` flag to `pip install` for compatibility with externally-managed Python environments
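
The harness changes listed above can be sketched roughly as follows. This is an illustration only, not the code in `src/swe/harness.rs`: the function names `resolve_agent_dir`, `docker_env_args`, and `pip_install_args`, the constant name `SYSTEM_TOOLS_TIMEOUT`, and the default path `/agent` are all hypothetical, and the real harness may structure these steps differently.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Timeout for the system tools install step (raised from 120s to 180s in this PR).
/// The constant name is hypothetical.
const SYSTEM_TOOLS_TIMEOUT: Duration = Duration::from_secs(180);

/// Resolve the agent directory, preferring an override when one is set.
/// `override_dir` would typically be `std::env::var("DOCKER_AGENT_DIR").ok()`.
fn resolve_agent_dir(override_dir: Option<String>, default_dir: &str) -> PathBuf {
    match override_dir {
        // Treat an empty or whitespace-only override as unset.
        Some(dir) if !dir.trim().is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(default_dir),
    }
}

/// Build `-e KEY=VALUE` arguments for a `docker` invocation,
/// skipping variables that are not set on the host.
fn docker_env_args(vars: &[(&str, Option<String>)]) -> Vec<String> {
    let mut args = Vec::new();
    for (key, value) in vars {
        if let Some(v) = value {
            args.push("-e".to_string());
            args.push(format!("{key}={v}"));
        }
    }
    args
}

/// Arguments for installing agent requirements on images whose system
/// Python is marked as externally managed (assumes `pip` is on PATH).
fn pip_install_args(requirements: &str) -> Vec<String> {
    ["pip", "install", "--break-system-packages", "-r", requirements]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

fn main() {
    // "/agent" is a placeholder default, not the harness's actual path.
    let agent_dir = resolve_agent_dir(std::env::var("DOCKER_AGENT_DIR").ok(), "/agent");
    let env_args = docker_env_args(&[(
        "OPENROUTER_API_KEY",
        std::env::var("OPENROUTER_API_KEY").ok(),
    )]);

    println!("agent dir: {}", agent_dir.display());
    // Report only the count so the key itself is never printed.
    println!("forwarding {} env var(s)", env_args.len() / 2);
    println!("pip command: {}", pip_install_args("requirements.txt").join(" "));
    println!("system tools timeout: {}s", SYSTEM_TOOLS_TIMEOUT.as_secs());
}
```

Taking the override as an `Option<String>` rather than reading the environment inside the function keeps the resolution logic easy to test without mutating process state.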

Agent test results (`agent-tests/`)

- Add test results for 9 tasks across easy, medium, and hard difficulties
- Include per-task JSON results and execution logs
- Add `summary.json` with aggregated test outcomes
- Add `README.md` documenting test methodology and results
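
The commit message reports 2/2 tasks resolved "on tasks with valid sanity checks", with 5 tasks failing sanity checks and 2 hitting setup errors out of 9. A minimal sketch of that arithmetic follows, assuming the effective rate excludes both sanity-check failures and setup errors from the denominator. The `Outcome` and `Summary` types are hypothetical and do not reflect the actual schema of `summary.json`.

```rust
/// Hypothetical per-task outcome categories.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Resolved,
    Unresolved,
    SanityCheckFailed,
    SetupError,
}

/// Hypothetical aggregate counts.
#[derive(Debug, PartialEq)]
struct Summary {
    total: usize,
    evaluable: usize,
    resolved: usize,
}

impl Summary {
    /// Resolved tasks divided by evaluable tasks; `None` when nothing was evaluable.
    fn effective_rate(&self) -> Option<f64> {
        if self.evaluable == 0 {
            None
        } else {
            Some(self.resolved as f64 / self.evaluable as f64)
        }
    }
}

/// Count only tasks that reached a verdict (resolved or unresolved) as evaluable.
fn summarize(outcomes: &[Outcome]) -> Summary {
    let resolved = outcomes
        .iter()
        .filter(|o| **o == Outcome::Resolved)
        .count();
    let evaluable = outcomes
        .iter()
        .filter(|o| matches!(**o, Outcome::Resolved | Outcome::Unresolved))
        .count();
    Summary {
        total: outcomes.len(),
        evaluable,
        resolved,
    }
}

fn main() {
    // Counts taken from the commit message: 2 resolved, 5 sanity-check failures, 2 setup errors.
    let mut outcomes = vec![Outcome::Resolved; 2];
    outcomes.extend(vec![Outcome::SanityCheckFailed; 5]);
    outcomes.extend(vec![Outcome::SetupError; 2]);

    let summary = summarize(&outcomes);
    println!("{} of {} tasks evaluable", summary.evaluable, summary.total);
    if let Some(rate) = summary.effective_rate() {
        println!("effective resolution rate: {:.0}%", rate * 100.0);
    }
}
```

Under this reading, the 100% figure describes 2 evaluable tasks out of 9 attempted, so it is worth reading alongside the raw counts.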

Other

- Add baseagent-echo submodule reference
- Update `.cargo/config.toml` to switch the linker from clang+mold to cc for broader build compatibility

Test the baseagent-echo agent (echobt/baseagent-echo) against 9 SWE-bench tasks from the validated-dataset (3 easy, 3 medium, 3 hard) using the swe-forge Docker harness.

Results: 2/2 resolved on tasks with valid sanity checks (100% effective resolution rate).

Harness fixes in src/swe/harness.rs:
- Pass OPENROUTER_API_KEY env var into Docker containers so the agent can authenticate with OpenRouter for LLM calls
- Install python3/pip/venv in all containers (not just Python-based ones) since the agent itself requires Python regardless of task language
- Add --break-system-packages flag to pip install for requirements.txt to work on Debian-based images without virtualenv conflicts
- Support DOCKER_AGENT_DIR env var override for agent directory path resolution in nested Docker environments
- Increase system tools install timeout from 120s to 180s

Build config change in .cargo/config.toml:
- Switch linker from clang+mold to cc for broader build compatibility

Test results in agent-tests/:
- Per-task JSON results organized by difficulty (easy/medium/hard)
- Execution logs for resolved tasks (batocera.linux-15418, happier-35)
- summary.json with aggregate metrics
- README.md documenting methodology and findings
- 5 tasks failed sanity checks (dataset issues), 2 had setup errors

Added baseagent-echo as embedded git repo for local testing.


Co-authored-by: echobt <mathis.massimino+echo@cortex.foundation>

echobt added 2 commits February 17, 2026 15:30

Merge remote-tracking branch 'origin/main' into feat/swe-harness-fixes-agent-tests

a7e90e8

alpha1122x merged commit ef19b21 into main Feb 17, 2026
9 checks passed

alpha1122x deleted the feat/swe-harness-fixes-agent-tests branch February 17, 2026 15:36
