Commit 8daf3f1
PR openai#462 base (1.0672 SOTA) + novel 25-epoch quant-sensitivity TTT
Base: PR openai#462's SwiGLU + XSA4 + U-Net architecture (1.0672 BPB)
Novel additions (untried combination):
1. 25 TTT epochs (up from 10); the loss was still dropping at epoch 10
2. Per-layer TTT LR by quantization sensitivity:
   - MLP output projections: 3x LR (highest quantization damage)
- MLP input projections: 0.5x LR
- Everything else: 1x LR
3. optimize_ddp fix for DDP under PyTorch 2.4
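The per-layer LR scheme in item 2 can be sketched as optimizer param groups keyed on module names. The name substrings below ("c_proj" for MLP output projections, "c_fc" for MLP input projections) are assumptions for illustration; the repo's actual module naming may differ.

```python
# Hypothetical sketch of quant-sensitivity-scaled TTT learning rates.
# Name patterns ("mlp", "c_proj", "c_fc") are assumed, not taken from the repo.
def build_ttt_param_groups(named_params, base_lr):
    """Split (name, param) pairs into optimizer param groups:
    3x LR for MLP output projections (highest quantization damage),
    0.5x LR for MLP input projections, 1x LR for everything else."""
    out_proj, in_proj, rest = [], [], []
    for name, param in named_params:
        if "mlp" in name and "c_proj" in name:
            out_proj.append(param)   # MLP output projection: 3x LR
        elif "mlp" in name and "c_fc" in name:
            in_proj.append(param)    # MLP input projection: 0.5x LR
        else:
            rest.append(param)       # attention, norms, embeddings: 1x LR
    return [
        {"params": out_proj, "lr": 3.0 * base_lr},
        {"params": in_proj, "lr": 0.5 * base_lr},
        {"params": rest, "lr": 1.0 * base_lr},
    ]

# Demo with placeholder parameter names; in practice the groups would be
# passed to a PyTorch optimizer, e.g.
# torch.optim.AdamW(build_ttt_param_groups(model.named_parameters(), 1e-4))
groups = build_ttt_param_groups(
    [("blocks.0.mlp.c_fc.weight", "w_in"),
     ("blocks.0.mlp.c_proj.weight", "w_out"),
     ("blocks.0.attn.qkv.weight", "w_attn")],
    base_lr=1e-4,
)
```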
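Item 3 likely refers to Dynamo's DDP graph-splitting optimizer, which can interact badly with compiled models on some PyTorch versions. A common workaround is the config flag below; whether this commit uses exactly this knob is an assumption.

```python
# Assumed workaround: disable Dynamo's bucket-wise DDP graph splitting
# so torch.compile and DistributedDataParallel cooperate under PyTorch 2.4.
import torch._dynamo

torch._dynamo.config.optimize_ddp = False
```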
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Parent: 869ddf8
1 file changed: +525, -717 lines