This repository contains a minimal but production-usable harness around Confucius Code Agent (CCA) for SWE-bench style coding tasks.
The core design keeps the agent inside the target runtime instead of building a heavy host-to-container abstraction layer.
Benefits:
- simple execution path
- low latency tool calls
- fewer moving pieces when debugging
- easier extension of agent capabilities
Tradeoff:
- less platform abstraction compared with runtime-proxy frameworks
- `confucius/`: CCA framework core, orchestrators, extensions, IO, model managers
- `scripts/run_swebench.py`: reference one-shot runner for SWE-bench style tasks
- `scripts/cca.py`: unified runner (TUI + one-shot + provider runtime)
- `scripts/provider_runtime.py`: provider auth/model discovery/validation runtime
- `tests/`: unit and e2e tests for launcher/runtime behavior
- Entry points:
  - `python -m scripts.run_swebench --prompt <file>`
  - `confucius code` (interactive REPL)
- Focus:
- SWE-bench style execution with CCA harness extensions
- Entry point:
  - `cca`
- Features:
- one command for interactive TUI and one-shot execution
- dynamic model discovery and alias resolution
- live model ID validation
- provider runtime setup from existing local OAuth sessions
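The alias-resolution feature above can be illustrated with a minimal sketch; the function name and cache layout here are assumptions for illustration, not the repo's actual API:

```python
# Hypothetical sketch of model alias resolution against a local models cache.
# resolve_model_alias and the cache dict layout are illustrative only.

def resolve_model_alias(requested: str, cache: dict) -> str:
    """Return the canonical model ID for a requested name or alias."""
    ids = set(cache.get("models", []))
    if requested in ids:
        return requested                      # already a canonical ID
    alias_target = cache.get("aliases", {}).get(requested)
    if alias_target in ids:
        return alias_target                   # alias mapped to a known ID
    raise ValueError(f"unknown model: {requested!r}")

cache = {
    "models": ["gpt-5.2", "gpt-5.3-codex-spark"],
    "aliases": {"spark": "gpt-5.3-codex-spark"},
}
```

Live validation (`--validate-models-live`) would then confirm the resolved ID against the provider before use.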
For a full architecture and workflow reference, see:
docs/cca-harness-workflow.md
```bash
python -m venv .venv
.venv/bin/pip install -r requirements.txt
```

Or, for a user-level install without a virtualenv:

```bash
python3 -m pip install --user -r requirements.txt
python3 -m pip install --user -e .
```

Then run:

```bash
cca
```

By default, `cca` enters interactive TUI mode when stdin is a TTY and no prompt is provided.
Use your existing Codex CLI login state:
```bash
codex login
```

`cca` reads:

- `~/.codex/auth.json`
- `~/.codex/models_cache.json`
It uses those credentials to call the Codex backend API via OpenAI SDK runtime settings.
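As an illustrative sketch only, a runner might load that local Codex CLI state roughly like this (the helper name is hypothetical, and the JSON field layout inside the files is not specified here):

```python
# Hedged sketch: load existing Codex CLI state from the files listed above.
# load_codex_state is a hypothetical helper, not part of this repo.
import json
from pathlib import Path


def load_codex_state(home: Path) -> tuple[dict, dict]:
    """Read auth credentials and the cached model list from ~/.codex/."""
    codex_dir = home / ".codex"
    auth = json.loads((codex_dir / "auth.json").read_text())
    models = json.loads((codex_dir / "models_cache.json").read_text())
    return auth, models
```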
```bash
export OPENAI_API_KEY=...
```

Either:

- API key (`GOOGLE_API_KEY` or `GEMINI_API_KEY`)
Or OAuth flow with local creds:
- `~/.gemini/oauth_creds.json`
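One plausible precedence for these Gemini options — API key first, then local OAuth creds — can be sketched as follows (purely illustrative; `gemini_credential_source` is not part of this repo):

```python
# Hedged sketch of Gemini credential-source selection, assuming API-key env
# vars take precedence over a local OAuth creds file. Names are hypothetical.
from pathlib import Path


def gemini_credential_source(env: dict, home: Path) -> str:
    """Return which credential source would be used: api-key, oauth, or none."""
    if env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY"):
        return "api-key"
    if (home / ".gemini" / "oauth_creds.json").exists():
        return "oauth"
    return "none"
```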
```bash
export AWS_REGION=us-east-1
# plus normal AWS credentials/profile
```

```bash
cca
```

Or force TUI mode:

```bash
cca --tui
```

```bash
cca --provider codex --prompt /path/to/problem.txt
cca --provider codex --prompt "Fix failing tests" --raw-prompt
echo "Fix failing tests" | cca --provider codex --raw-prompt
cca --tui --dry-run
cca --provider codex --model gpt-5.3-codex-spark --tui
cca --provider openai --model gpt-5.2 --prompt "Update docs" --raw-prompt
cca --provider bedrock --model claude-sonnet-4-5 --prompt /path/to/task.txt
```

```bash
cca --provider codex --list-models
cca --provider gemini --list-models
```

```bash
cca --provider codex --validate-models-live
cca --provider gemini --validate-models-live
```

Mode selection is automatic:
- TTY + no prompt -> TUI
- prompt provided -> one-shot
- piped stdin + no prompt -> one-shot
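The selection rules above can be sketched as a small predicate (the function name and signature are illustrative, not the actual `cca` code):

```python
# Minimal sketch of the automatic mode selection described above.
from typing import Optional


def select_mode(stdin_is_tty: bool, prompt: Optional[str]) -> str:
    if prompt is not None:
        return "one-shot"   # prompt provided -> one-shot
    if stdin_is_tty:
        return "tui"        # TTY + no prompt -> TUI
    return "one-shot"       # piped stdin + no prompt -> one-shot
```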
Solo mode defaults:
- one-shot: `on`
- TUI: `off`

Override:

```bash
cca --solo-mode on
cca --solo-mode off
```

Interactive TUI slash commands: `/help`, `/model`, `/exit`, `/quit`
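A hedged sketch of how the per-mode solo default and the `--solo-mode` override described above could combine (names are hypothetical):

```python
# Illustrative sketch: per-mode solo default (one-shot: on, TUI: off) with an
# explicit --solo-mode flag taking precedence. Not the repo's actual code.
from typing import Optional


def resolve_solo_mode(mode: str, override: Optional[str] = None) -> bool:
    if override is not None:
        return override == "on"     # explicit --solo-mode on/off wins
    return mode == "one-shot"       # default: on for one-shot, off for TUI
```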
```bash
.venv/bin/python -m scripts.run_swebench --prompt /path/to/problem.txt
```

```bash
.venv/bin/python -m confucius.cli.main code
```

```bash
.venv/bin/pytest tests/test_cca.py tests/test_tasks.py tests/test_provider_runtime.py tests/test_intent_guard.py -q
```

Requires valid Codex auth state:

```bash
.venv/bin/pytest tests/test_cca_tui_e2e.py -m e2e -q
```

Use the reproducible runner:
`scripts/run_swebench_pro_codex.py`
Full step-by-step guide:
docs/swebench-pro-repro.md
This repo supports the packaged SWE-bench container workflow.
```bash
conda activate confucius
conda install -c conda-forge conda-pack
conda-pack -n confucius -o cf_env.tar.gz
```
```bash
pex . \
  -r requirements.txt \
  -m scripts.run_swebench \
  -o app.pex \
  --python-shebang="/usr/bin/env python3"
```

Workspace layout:

```
workspace/
|- app.pex
|- cf_env.tar.gz
|- solutions/
|- logs/
`- problem_statements/
   `- <task_id>.txt
```
```bash
docker run --rm \
  -e TASK_ID=<task_id> \
  -e AWS_BEARER_TOKEN_BEDROCK=<token> \
  -v <workspace>:/data \
  --network host \
  --userns=host \
  --entrypoint /data/run_sbp.sh
```

You ran one-shot mode without prompt input. Use one of:

- `--prompt <file-or-text>`
- piped stdin
- `--tui`
Run:
```bash
codex login
```

Then verify these files exist:

- `~/.codex/auth.json`
- `~/.codex/models_cache.json`
Set one explicitly:
```bash
export GEMINI_OAUTH_PROJECT=<project-id>
export GEMINI_OAUTH_LOCATION=us-central1
```

Use an explicit model ID:

```bash
cca --provider codex --model <model-id> ...
```

MIT. See LICENSE.