203 changes: 199 additions & 4 deletions README.md
@@ -80,7 +80,7 @@ We also generate multimodal skin based on the rules of the same maze. Generated
- [ ] Add Multimodal Environment Generation Pipelines.
- [ ] Add Three-stage Verification Pipeline for both text and multimodal environments.
- [ ] Add Learning Experiments Scripts.
- [ ] Add Coding Agents Option: Codex, Gemini Cli.
- [x] Add Coding Agents Option: Codex, Claude Code SDK.
- [ ] Add 3D Environment Generation Pipelines.

## Repository Layout
@@ -129,6 +129,200 @@ python run_environment_skin_generation.py

Cost summaries are automatically saved to `workspace/costs/`.

## Coding Agents

AutoEnv supports multiple coding agent backends for generating and fixing environment code. The selected agent runs in the pipeline's `CodeFixNode`, `LevelGenNode`, and `MaxRewardNode` stages.

### Backend Options

| Backend   | Description                                                                         | Best For                            |
| --------- | ----------------------------------------------------------------------------------- | ----------------------------------- |
| `miniswe` | Default [mini-swe-agent](https://github.com/SWE-agent/mini-swe-agent)               | General use, works with any LLM     |
| `codex`   | OpenAI [Codex CLI](https://github.com/openai/codex)                                 | OpenAI API users, fast execution    |
| `claude`  | Anthropic [Claude Agent SDK](https://github.com/anthropics/claude-agent-sdk-python) | Anthropic API users, custom proxies |

### Configuration

Set the `code_agent_backend` field in `config/env_gen.yaml`:

```yaml
# Code agent backend: "miniswe" (default), "codex", "claude"
code_agent_backend: "codex" # or "claude" or "miniswe"
```
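A mistyped backend name is easiest to catch right after the config is loaded. The following is a minimal sketch of such a check, assuming the YAML has already been parsed into a dictionary; `resolve_backend`, `SUPPORTED_BACKENDS`, and `DEFAULT_BACKEND` are hypothetical names used for illustration and are not part of AutoEnv's API.

```python
# Hypothetical helper for illustration; not AutoEnv's actual loader.
SUPPORTED_BACKENDS = ("miniswe", "codex", "claude")
DEFAULT_BACKEND = "miniswe"  # the README lists miniswe as the default


def resolve_backend(config: dict) -> str:
    """Return the backend named in a parsed config, or the default if unset."""
    backend = config.get("code_agent_backend", DEFAULT_BACKEND)
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown code_agent_backend {backend!r}; "
            f"expected one of: {', '.join(SUPPORTED_BACKENDS)}"
        )
    return backend


if __name__ == "__main__":
    # A parsed config/env_gen.yaml would look roughly like this dictionary.
    print(resolve_backend({"code_agent_backend": "codex"}))
```

Failing fast here avoids discovering a typo only after the pipeline has reached its first code agent stage.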

### MiniSWE Agent (Default)

The default backend, built on mini-swe-agent. It works with any LLM configured in `config/model_config.yaml`.

```yaml
# config/env_gen.yaml
code_agent_backend: "miniswe"
model: "gpt-4o" # or any configured LLM
```

No additional setup required beyond the standard model configuration.

### Codex Agent

Uses OpenAI's official Codex CLI for code generation.

#### Prerequisites

```bash
# Install Codex CLI
npm install -g @openai/codex
# or
brew install --cask codex

# Authenticate (recommended)
codex login

# Verify installation
codex --version
```

#### Authentication Options

**Option 1: CLI Login (Recommended)**

```bash
codex login # Opens browser for OAuth
```

**Option 2: Environment Variable**

```bash
export OPENAI_API_KEY=your-api-key
```

**Option 3: Custom Base URL** (for proxies)

```bash
export OPENAI_API_KEY=your-api-key
export OPENAI_BASE_URL=https://your-proxy.example.com/v1
```

#### Configuration

```yaml
# config/env_gen.yaml
code_agent_backend: "codex"
```
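Before starting a long run with the `codex` backend, it can help to confirm that the CLI is on `PATH` and to see which authentication route will apply. The sketch below is one possible preflight check; `codex_preflight` is a hypothetical helper, not something AutoEnv ships, and it only inspects the executable name and the two environment variables mentioned above.

```python
import os
import shutil
from typing import List, Mapping, Optional


def codex_preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Collect human-readable notes about the local Codex setup.

    Hypothetical helper: it does not contact OpenAI or run the CLI.
    """
    env = os.environ if env is None else env
    notes = []
    # Look for the `codex` executable on the PATH taken from `env`.
    if shutil.which("codex", path=env.get("PATH")) is None:
        notes.append("codex executable not found on PATH")
    has_key = bool(env.get("OPENAI_API_KEY"))
    if not has_key:
        notes.append("OPENAI_API_KEY is not set; `codex login` must have been run")
    if env.get("OPENAI_BASE_URL") and not has_key:
        notes.append("OPENAI_BASE_URL is set but OPENAI_API_KEY is missing")
    return notes


if __name__ == "__main__":
    for note in codex_preflight():
        print("note:", note)
```

An empty list means none of the checked conditions were triggered; it does not prove that the credentials are valid.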

### Claude Agent

Uses Anthropic's Claude Agent SDK for code generation.

#### Prerequisites

```bash
# Install Claude Agent SDK
pip install claude-agent-sdk
```

The Python SDK authenticates via environment variables (see Authentication Options below).

#### Authentication Options

**Option 1: Environment Variables (Recommended)**

```bash
export ANTHROPIC_API_KEY=your-api-key
```

**Option 2: Custom Base URL** (for proxies)

```bash
export ANTHROPIC_API_KEY=your-api-key
export ANTHROPIC_BASE_URL=https://your-proxy.example.com/api
```

#### Configuration

```yaml
# config/env_gen.yaml
code_agent_backend: "claude"
```
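A similar check for the `claude` backend can report whether the SDK is importable and which endpoint will be used, without printing the API key. This is a sketch under stated assumptions: `claude_preflight` is a hypothetical helper, and the only facts it relies on are the `claude_agent_sdk` import name and the two environment variables shown above.

```python
import importlib.util
import os
from typing import Mapping, Optional


def claude_preflight(
    env: Optional[Mapping[str, str]] = None,
    module_name: str = "claude_agent_sdk",
) -> dict:
    """Summarize the local Claude setup without revealing the API key.

    Hypothetical helper: it does not contact Anthropic.
    """
    env = os.environ if env is None else env
    return {
        # find_spec returns None when a top-level module is not installed.
        "sdk_installed": importlib.util.find_spec(module_name) is not None,
        "api_key_set": bool(env.get("ANTHROPIC_API_KEY")),
        # None means no proxy override is configured.
        "base_url": env.get("ANTHROPIC_BASE_URL") or None,
    }


if __name__ == "__main__":
    print(claude_preflight())
```

Reporting `base_url` makes it easy to confirm that a proxy override is actually in effect before a run starts.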

### Generate Environment with Code Agent

#### Quick Start

```bash
# 1. Copy example config
cp config/env_gen_example.yaml config/env_gen.yaml

# 2. Edit config (set code_agent_backend, theme, etc.)
vim config/env_gen.yaml

# 3. Run environment generation
python run_environment_generation.py
```

#### Example Configuration

```yaml
# config/env_gen.yaml
mode: "textual"
model: "gpt-4o"
concurrency: 1
theme: "A strategic puzzle game with resource management"
envs_root_path: "workspace/envs"
code_agent_backend: "codex" # Use Codex for code generation
```

#### Background Execution (Recommended for Long Tasks)

Code agents can take 10-30 minutes on complex environments, so run long jobs in the background:

```bash
# Run in background with logging
nohup python run_environment_generation.py > /tmp/autoenv_gen.log 2>&1 &

# Monitor progress
tail -f /tmp/autoenv_gen.log

# Check if complete
ls workspace/envs/*/done.txt
```
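The `ls` check above can also be scripted. The following sketch assumes, as the `workspace/envs/*/done.txt` pattern suggests, that each environment gets its own subdirectory and that a `done.txt` file marks completion; `generation_status` is a hypothetical helper name.

```python
from pathlib import Path


def generation_status(envs_root: str = "workspace/envs") -> dict:
    """Split environment directories into finished and unfinished ones.

    Assumes a done.txt marker inside each finished environment directory.
    """
    root = Path(envs_root)
    if not root.is_dir():
        return {"done": [], "pending": []}
    env_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    done = [p.name for p in env_dirs if (p / "done.txt").is_file()]
    pending = [p.name for p in env_dirs if not (p / "done.txt").is_file()]
    return {"done": done, "pending": pending}


if __name__ == "__main__":
    status = generation_status()
    print(f"{len(status['done'])} done, {len(status['pending'])} pending")
```

Calling this from a loop with a sleep between iterations gives a simple progress monitor alongside `tail -f`.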

### Troubleshooting

#### Codex CLI Issues

```bash
# Check if Codex is installed
codex --version

# Re-authenticate
codex logout && codex login

# Check login status
codex login status
```

#### Claude Agent Issues

```bash
# Check if Python SDK is installed
pip show claude-agent-sdk

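# Verify the API key is set (without printing it)
[ -n "$ANTHROPIC_API_KEY" ] && echo "ANTHROPIC_API_KEY is set" || echo "ANTHROPIC_API_KEY is missing"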

# Test import
python -c "from claude_agent_sdk import query; print('SDK available')"
```

#### Timeout Issues

For complex environments, increase the timeout in `autoenv/coder.py`:

```python
# Current default: 900 seconds (15 minutes)
agent = CodexAgent(timeout=1200) # 20 minutes
```
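For intuition about what such a timeout does, the sketch below shows how a wall-clock limit is commonly enforced around a CLI subprocess in Python. It is not taken from `autoenv/coder.py` and makes no claim about how `CodexAgent` is implemented; `run_with_timeout` is a hypothetical helper, and the 900-second default simply mirrors the value quoted above.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout: float = 900.0) -> Optional[int]:
    """Run cmd and return its exit code, or None if it exceeded the timeout."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        completed = subprocess.run(cmd, timeout=timeout, capture_output=True)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # A command that finishes immediately, run with a generous limit.
    print(run_with_timeout([sys.executable, "-c", "pass"], timeout=30))
```

With this pattern, a limit that is too low cuts off otherwise healthy runs, which is why raising it helps for environments that need more time.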

## Benchmarking AutoEnv-36

Evaluate agents on the 36 benchmark environments (scores for all; cost only for LLM branch). See `benchmarks/README.md` for details.
@@ -160,9 +354,10 @@ Programmatic APIs are available in `benchmarks/api.py` (`benchmark_llms`, `bench

## Acknowledgements

Thanks to
[mini-swe-agent](https://github.com/SWE-agent/mini-swe-agent),
[codex](https://github.com/openai/codex),
[claude-agent-sdk](https://github.com/anthropics/claude-agent-sdk-python),
[rembg](https://github.com/danielgatis/rembg),
for providing basic support for this project!
