Commit 0d77726
Phase 1: revise shot plan after script inventory (most of the "dev work" was phantom)
Inventoried train_gpt_phase1.py and discovered it's the complete decoded PR openai#1477
reproduction. It already contains every feature the original 8-shot plan was
going to "port": SP8192, parallel residuals (PARALLEL_START_LAYER=7), TTT
(eval_val_sliding_ttt), int6 GPTQ, brotli, EMA 0.997, looped layers, XSA, the
full set of architecture knobs. Shots 3-7 from the original plan need no
porting; they're already there as default env vars.
New ★ REVISED SHOT PLAN section at the top of "Shot sequence":
- R1 Baseline (in flight): defaults + 600s + TTT_ENABLED=1, no code change
- R2 n=2 seed confirm: SEED=1337, no code change
- R3 Full-budget variant: MAX_WALLCLOCK_SECONDS=3000, no code change
- R4 AR self-gen GPTQ port from PR openai#1019: ~30 lines of new code, stretch
  target of -0.003 to -0.005 BPB
- R5 8×H100 SXM submission run: verify DDP + write distributed launcher
R1-R3 fit before noon AEST today. R4-R5 are next-session work.
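R1-R3 differ only in environment variables, so they can be described as plain env overrides on top of the script's defaults. Below is a minimal sketch of that idea. The helper `shot_env` and the `SHOTS` table are hypothetical illustrations, not part of train_gpt_phase1.py. It also rests on two assumptions: that R1's "600s" corresponds to MAX_WALLCLOCK_SECONDS=600, and that R2 and R3 inherit R1's settings (including TTT_ENABLED=1). The plan does not state either explicitly.

```python
import os

# Hypothetical env overrides for the revised shot plan (R1-R3, no code change).
# ASSUMPTION: "600s" in R1 means MAX_WALLCLOCK_SECONDS=600.
# ASSUMPTION: R2 and R3 inherit R1's settings, changing only one variable each.
R1 = {"MAX_WALLCLOCK_SECONDS": "600", "TTT_ENABLED": "1"}
R2 = {**R1, "SEED": "1337"}                      # n=2 seed confirmation
R3 = {**R1, "MAX_WALLCLOCK_SECONDS": "3000"}     # full-budget variant

SHOTS = {"R1": R1, "R2": R2, "R3": R3}


def shot_env(name, base=None):
    """Return a complete environment for shot `name`: base env plus its overrides."""
    env = dict(os.environ if base is None else base)
    env.update(SHOTS[name])
    return env


if __name__ == "__main__":
    # Print each shot's overrides as a KEY=VALUE prefix line.
    for name, overrides in SHOTS.items():
        print(name + ": " + " ".join(f"{k}={v}" for k, v in sorted(overrides.items())))
```

Keeping the shots as data rather than as separate scripts mirrors the point of this commit: the features are already in the script, so a run is just a different set of env values.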
The original 8-shot section is kept below for historical context but is
superseded by REVISED.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4480b1f, commit 0d77726
1 file changed
Lines changed: 27 additions & 1 deletion
Diff hunk: original lines 268-274 map to new lines 268-300 (original line 271 removed; new lines 271-297 added).