Commit 2010e95

fix: add Group D (Level 1+2) to ablation results table
Group D was in the paper but missing from README. It shows Level 2 works without Level 1.5 (3.8x vs baseline), confirming Level 2 is the primary driver of improvement. Also corrected Group A/B std values to match paper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent cde8568 commit 2010e95
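
The "3.8x" figure in the commit message can be sanity-checked from the rounded means in the README table changed by this commit. The snippet below is a minimal sketch, assuming the "vs Group A" column is each group's mean Δval_bpb divided by Group A's mean and rounded to one decimal; the dictionary and variable names are illustrative and do not come from the repository.

```python
# Mean delta val_bpb per group, copied from the updated README table (rounded values).
mean_delta = {"A": -0.009, "B": -0.006, "C": -0.045, "D": -0.034}

# Assumption: "vs Group A" = group mean / Group A mean, rounded to one decimal place.
ratios = {group: round(delta / mean_delta["A"], 1) for group, delta in mean_delta.items()}

for group, ratio in ratios.items():
    print(f"Group {group}: {ratio}x")
```

Under that assumption the computed ratios for B and D match the 0.7× and 3.8× shown in the table. The Group C cell is blank in this capture, so the value computed for C is only what the rounded means imply, not a figure recovered from the README.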

1 file changed, +4 -3 lines changed
README.md

Lines changed: 4 additions & 3 deletions
@@ -41,13 +41,14 @@ If Level 2 discovers that a mechanism (e.g., parallel multi-agent debate, persis
 
 ## Key Result: Controlled Ablation Experiment
 
-On Karpathy's GPT pretraining benchmark (val_bpb, 300s budget, RTX 5090), we ran a controlled ablation with **3 groups × 3 independent repeats × 30 iterations**, using the **same LLM (DeepSeek)** for all levels:
+On Karpathy's GPT pretraining benchmark (val_bpb, 300s budget, RTX 5090), we ran a controlled ablation with **4 groups × 3 independent repeats × 30 iterations**, using the **same LLM (DeepSeek)** for all levels:
 
 | Group | What it does | Mean Δval_bpb | vs Group A |
 |-------|-------------|--------------|------------|
-| **A** — Level 1 | Standard autoresearch (propose → train → keep/discard) | -0.009 ± 0.001 ||
-| **B** — Level 1 + 1.5 | + Outer loop adjusts search config | -0.007 ± 0.006 | 0.8× |
+| **A** — Level 1 | Standard autoresearch (propose → train → keep/discard) | -0.009 ± 0.002 ||
+| **B** — Level 1 + 1.5 | + Outer loop adjusts search config | -0.006 ± 0.006 | 0.7× |
 | **C** — Level 1 + 1.5 + 2 | + Outer loop generates new mechanisms as code | **-0.045 ± 0.030** | **** |
+| **D** — Level 1 + 2 | + Mechanisms without config adjustment | -0.034 ± 0.031 | 3.8× |
 
 *Baseline val_bpb ≈ 1.10. More negative = better. 3 independent repeats × 30 iterations each.* The outer loop autonomously generated Python code for new search mechanisms, dynamically loaded them via `importlib`, and injected them into the running inner loop.
 

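The README text in this diff says the outer loop generates Python code, loads it via `importlib`, and injects it into the running inner loop. The commit does not show how the project does this, so the following is only a minimal sketch of one way to do it with the standard library. The generated source, the `propose` hook, the `mechanisms` registry, and `inner_loop_step` are all hypothetical names chosen for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical stand-in for code an outer loop might emit (not taken from the repository).
GENERATED_SOURCE = '''
def propose(config):
    """Toy mechanism: return a copy of the config with the learning rate halved."""
    new_config = dict(config)
    new_config["lr"] = config["lr"] * 0.5
    return new_config
'''


def load_mechanism(source: str, name: str):
    """Write generated source to a temporary file and import it as a module."""
    path = Path(tempfile.mkdtemp()) / f"{name}.py"
    path.write_text(source)
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the generated code
    return module


# Assumed registry that the inner loop reads on every iteration.
mechanisms = []


def inner_loop_step(config):
    """Apply each registered mechanism's propose() hook in order."""
    for mechanism in mechanisms:
        config = mechanism.propose(config)
    return config


# "Injection" here simply means appending to the registry while the loop is alive.
mechanisms.append(load_mechanism(GENERATED_SOURCE, "mechanism_001"))
result = inner_loop_step({"lr": 0.002})
print(result)
```

Because `exec_module` runs arbitrary generated code, a real system would likely want sandboxing or validation before loading; this sketch omits both.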