CCA-SWE-Bench: Agent Harness for SWE-bench

This repository contains a minimal but production-usable harness around Confucius Code Agent (CCA) for SWE-bench style coding tasks.

Why this repo exists

The core design keeps the agent inside the target runtime instead of building a heavy host-to-container abstraction layer.

Benefits:

  • simple execution path
  • low latency tool calls
  • fewer moving pieces when debugging
  • easier extension of agent capabilities

Tradeoff:

  • less platform abstraction compared with runtime-proxy frameworks

What is in this repository

  • confucius/: CCA framework core, orchestrators, extensions, IO, model managers
  • scripts/run_swebench.py: reference one-shot runner for SWE-bench style tasks
  • scripts/cca.py: unified runner (TUI + one-shot + provider runtime)
  • scripts/provider_runtime.py: provider auth/model discovery/validation runtime
  • tests/: unit and e2e tests for launcher/runtime behavior

Execution Modes

Reference CCA workflow

  • Entry points:
    • python -m scripts.run_swebench --prompt <file>
    • confucius code (interactive REPL)
  • Focus:
    • SWE-bench style execution with CCA harness extensions

Unified cca workflow (recommended)

  • Entry point:
    • cca
  • Features:
    • one command for interactive TUI and one-shot execution
    • dynamic model discovery and alias resolution
    • live model ID validation
    • provider runtime setup from existing local OAuth sessions

For a full architecture and workflow reference, see:

  • docs/cca-harness-workflow.md

Quick Start (Recommended)

1) Install dependencies

python -m venv .venv
.venv/bin/pip install -r requirements.txt

2) Install global cca command

python3 -m pip install --user -r requirements.txt
python3 -m pip install --user -e .

3) Run

cca

By default, cca enters interactive TUI mode when stdin is a TTY and no prompt is provided.

Provider Authentication

Codex provider

Use your existing Codex CLI login state:

codex login

cca reads:

  • ~/.codex/auth.json
  • ~/.codex/models_cache.json

It uses those credentials to call the Codex backend API via OpenAI SDK runtime settings.
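
A quick sanity check (assuming the standard Codex CLI layout) is to confirm both files are present before launching cca:

ls -l ~/.codex/auth.json ~/.codex/models_cache.json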

OpenAI provider

export OPENAI_API_KEY=...

Gemini provider

Either:

  • API key (GOOGLE_API_KEY or GEMINI_API_KEY)

Or an OAuth flow with local credentials:

  • ~/.gemini/oauth_creds.json
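
For example, the API-key route is a single export, and for the OAuth route the local credentials file must already exist (the key value and the login flow that produces the file depend on your own Google setup):

export GEMINI_API_KEY=...   # or GOOGLE_API_KEY
ls ~/.gemini/oauth_creds.json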

Bedrock provider

export AWS_REGION=us-east-1
# plus normal AWS credentials/profile
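
"Normal AWS credentials/profile" means anything the standard AWS credential chain resolves, for example a named profile or static keys (placeholders below):

export AWS_PROFILE=<profile-name>
# or
export AWS_ACCESS_KEY_ID=<key-id>
export AWS_SECRET_ACCESS_KEY=<secret>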

Common cca Usage

Interactive TUI

cca

Or force TUI mode:

cca --tui

One-shot with file prompt

cca --provider codex --prompt /path/to/problem.txt

One-shot with inline text

cca --provider codex --prompt "Fix failing tests" --raw-prompt

One-shot with stdin

echo "Fix failing tests" | cca --provider codex --raw-prompt

Show resolved runtime only

cca --tui --dry-run

Select provider/model explicitly

cca --provider codex --model gpt-5.3-codex-spark --tui
cca --provider openai --model gpt-5.2 --prompt "Update docs" --raw-prompt
cca --provider bedrock --model claude-sonnet-4-5 --prompt /path/to/task.txt

Model discovery and validation

List discovered models and aliases

cca --provider codex --list-models
cca --provider gemini --list-models

Validate discovered model IDs with live API calls

cca --provider codex --validate-models-live
cca --provider gemini --validate-models-live

cca mode behavior

Mode selection is automatic:

  • TTY + no prompt -> TUI
  • prompt provided -> one-shot
  • piped stdin + no prompt -> one-shot
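
For example, assuming Codex auth is already configured and a local task.txt exists, these invocations map onto the rules above:

cca --provider codex                                            # TTY, no prompt -> TUI
cca --provider codex --prompt task.txt                          # prompt provided -> one-shot
echo "Fix failing tests" | cca --provider codex --raw-prompt    # piped stdin -> one-shot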

Solo mode defaults:

  • one-shot: on
  • TUI: off

Override:

cca --solo-mode on
cca --solo-mode off

TUI commands

  • /help
  • /model
  • /exit
  • /quit

Reference run paths

One-shot SWE-bench style

.venv/bin/python -m scripts.run_swebench --prompt /path/to/problem.txt

REPL

.venv/bin/python -m confucius.cli.main code

Testing

Unit and behavior tests

.venv/bin/pytest tests/test_cca.py tests/test_tasks.py tests/test_provider_runtime.py tests/test_intent_guard.py -q

Real interactive TUI e2e

Requires valid Codex auth state.

.venv/bin/pytest tests/test_cca_tui_e2e.py -m e2e -q

SWE-Bench Pro (Reproducible)

Use the reproducible runner:

  • scripts/run_swebench_pro_codex.py

Full step-by-step guide:

  • docs/swebench-pro-repro.md

Docker workflow

This repo supports the packaged SWE-bench container workflow.

Build artifacts

conda activate confucius
conda install -c conda-forge conda-pack
conda-pack -n confucius -o cf_env.tar.gz

pex . \
  -r requirements.txt \
  -m scripts.run_swebench \
  -o app.pex \
  --python-shebang="/usr/bin/env python3"

Expected workspace layout

workspace/
|- app.pex
|- cf_env.tar.gz
|- solutions/
|- logs/
`- problem_statements/
   `- <task_id>.txt
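
One way to assemble that layout (a sketch; app.pex and cf_env.tar.gz come from the build step above):

mkdir -p workspace/solutions workspace/logs workspace/problem_statements
cp app.pex cf_env.tar.gz workspace/
# add one problem statement per task at workspace/problem_statements/<task_id>.txt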

Container run

docker run --rm \
  -e TASK_ID=<task_id> \
  -e AWS_BEARER_TOKEN_BEDROCK=<token> \
  -v <workspace>:/data \
  --network host \
  --userns=host \
  --entrypoint /data/run_sbp.sh \
  <image>

Troubleshooting

Failed to prepare runtime: stdin prompt is empty

You ran one-shot mode without prompt input. Use one of:

  • --prompt <file-or-text>
  • piped stdin
  • --tui

Codex auth not found

Run:

codex login

Then verify files exist:

  • ~/.codex/auth.json
  • ~/.codex/models_cache.json

Gemini OAuth project errors

Set one explicitly:

export GEMINI_OAUTH_PROJECT=<project-id>
export GEMINI_OAUTH_LOCATION=us-central1

Model not discovered

Use explicit model ID:

cca --provider codex --model <model-id> ...

License

MIT. See LICENSE.
