
Commit 0d77726

Takoda Mundy and claude committed
Phase 1: revise shot plan after script inventory — most of "dev work" was phantom
Inventoried train_gpt_phase1.py and discovered it's the complete decoded PR openai#1477 reproduction. It already contains every feature the original 8-shot plan was going to "port": SP8192, parallel residuals (PARALLEL_START_LAYER=7), TTT (eval_val_sliding_ttt), int6 GPTQ, brotli, EMA 0.997, looped layers, XSA, the full set of architecture knobs. Shots 3-7 from the original plan don't need porting — they're already there as default env vars.

New ★ REVISED SHOT PLAN section at the top of "Shot sequence":

- R1 Baseline (in flight): defaults + 600s + TTT_ENABLED=1, no code change
- R2 n=2 seed confirm: SEED=1337, no code change
- R3 Full-budget variant: MAX_WALLCLOCK_SECONDS=3000, no code change
- R4 AR self-gen GPTQ port from PR openai#1019: ~30 lines of new code, -0.003 to -0.005 BPB stretch
- R5 8×H100 SXM submission run: verify DDP + write distributed launcher

R1-R3 fit before noon AEST today. R4-R5 are next-session work. The original 8-shot section is kept below for historical context but is superseded by REVISED.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4480b1f commit 0d77726

1 file changed: PHASE1_PLAN.md (27 additions, 1 deletion)
```diff
@@ -268,7 +268,33 @@ Dropped Pod β (reference baseline) — the 1.3711 champion baseline is already
 
 Total Phase 1 burn target: **$5-12**, hard cap **$15**. Phase 2/3 revert to cheap 3090 fleet (separate budget).
 
-## Shot sequence (ordered, each gates the next)
+## ★ REVISED SHOT PLAN (2026-04-09 — after script inventory)
+
+**Insight**: `train_gpt_phase1.py` is the decoded PR #1477 reproduction. It already
+contains the full target stack (SP8192, parallel residuals via PARALLEL_START_LAYER=7,
+TTT via eval_val_sliding_ttt, int6 GPTQ, brotli, EMA 0.997, looped layers, XSA
+last_n=11, ln_scale, qk_gain=4, softcap=30, Muon row-normalize). The original
+8-shot plan below was written assuming we'd build the stack incrementally — that
+work is unnecessary. Shots 3-7 are already in the script as default behavior.
+
+**The actual remaining shots** (in priority order):
+
+| # | Shot | Goal | Why | Code change? |
+|---|---|---|---|---|
+| **R1** | **Baseline run** (DOING NOW) | seed 42 + 600s wallclock + TTT_ENABLED=1, all PR #1477 defaults | Validate the script runs end-to-end on 1×H100 PCIe, get a number | None — `train_gpt_phase1.py` defaults |
+| **R2** | **n=2 seed confirm** | seed 1337 + same env as R1 | Confirm R1 is not lucky | None — change SEED env var |
+| **R3** | **Full-budget variant** | seed 42 + LONGER wallclock (1500-3000s) | Get a number that's actually competitive with PR #1477's ~1.08 (their full run is 8×H100 × 600s ≈ 4800 GPU-sec; 1×H100 × 3000s = 3000 GPU-sec, comparable) | None — `MAX_WALLCLOCK_SECONDS=3000` |
+| **R4** | **AR self-gen GPTQ port** | port from PR #1019 — replace `collect_hessians(train_loader)` with self-generated calibration | -0.003 to -0.005 BPB on top of #1477 | YES — new function, ~30 lines |
+| **R5** | **8×H100 SXM submission run** | spin up 8×H100 SXM pod, run R4 stack with `WORLD_SIZE=8`, 3-seed mean | Actual submission number | YES — verify DDP path + write `runpod_tests/loop/submission_8h100.sh` launcher |
+
+**What this means for today**: R1 (in flight), R2 (15-30 min), maybe R3 (45-60 min)
+all fit before noon AEST. R4 + R5 are next-session work. **The original 8-shot
+plan's "Shots 3-7 dev work" doesn't exist** — it was already done by whoever
+decoded PR #1477 into `train_gpt_phase1.py`.
+
+---
+
+## Shot sequence (ORIGINAL — kept for historical context, superseded by REVISED above)
 
 ### Shot 1 — SP8192 deployment (45-60 min, $0.25)
 
```
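A note on why R1-R3 in the table above need no code change: the plan treats `SEED`, `MAX_WALLCLOCK_SECONDS`, and `TTT_ENABLED` as environment knobs with PR #1477 defaults. A minimal sketch of that pattern, assuming `train_gpt_phase1.py` reads them roughly like this (the parsing code below is an illustrative reconstruction, not copied from the script):

```python
import os

# Env knobs named in the plan; defaults mirror the R1 "all defaults" run.
# Illustrative reconstruction only -- NOT the actual train_gpt_phase1.py code.
SEED = int(os.environ.get("SEED", "42"))                                     # R2 sets SEED=1337
MAX_WALLCLOCK_SECONDS = int(os.environ.get("MAX_WALLCLOCK_SECONDS", "600"))  # R3 raises to 3000
TTT_ENABLED = os.environ.get("TTT_ENABLED", "1") == "1"                      # sliding-window TTT eval
```

Under that assumption, R1 through R3 are pure shell-environment changes, e.g. `SEED=1337 python train_gpt_phase1.py` for R2.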
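The R4 row is the only new model-quality code. A rough sketch of the idea, assuming a standard causal LM forward that returns `[batch, seq, vocab]` logits; the function name, sampling loop, and the way it feeds `collect_hessians` are all assumptions here, not the PR #1019 code:

```python
import torch

@torch.no_grad()
def self_generated_calibration(model, n_seqs=16, seq_len=512, bos_token=0, device="cuda"):
    """Sample GPTQ calibration sequences autoregressively from the trained model
    itself, rather than drawing them from train_loader. Hypothetical sketch."""
    seqs = torch.full((n_seqs, 1), bos_token, dtype=torch.long, device=device)
    for _ in range(seq_len - 1):
        logits = model(seqs)[:, -1, :]                 # next-token logits (assumed signature)
        probs = torch.softmax(logits.float(), dim=-1)
        seqs = torch.cat([seqs, torch.multinomial(probs, 1)], dim=1)
    return seqs

# R4 would then swap the calibration source, roughly:
#   collect_hessians(train_loader)  ->  collect_hessians([self_generated_calibration(model)])
```

This matches the ~30-line estimate in the table: one sampling function plus a one-line change at the `collect_hessians` call site.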
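For R5, the missing piece is the distributed launcher. The real artifact would be `runpod_tests/loop/submission_8h100.sh`; sketched here in Python for consistency, assuming the script's DDP path works under `torchrun` (which exports `RANK` and `WORLD_SIZE=8` to each worker on an 8-GPU node):

```python
import os
import subprocess

# 3-seed mean per the R5 row; the third seed value and the torchrun-based
# launch are assumptions, not a verified recipe from the repo.
for seed in ("42", "1337", "7"):
    subprocess.run(
        ["torchrun", "--standalone", "--nproc_per_node=8", "train_gpt_phase1.py"],
        env={**os.environ, "SEED": seed, "TTT_ENABLED": "1"},
        check=True,
    )
```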