Commit 8daf3f1

het4rk and claude committed
PR openai#462 base (1.0672 SOTA) + novel 25-epoch quant-sensitivity TTT
Base: PR openai#462's SwiGLU + XSA4 + U-Net architecture (1.0672 BPB)

Novel additions (untried combination):
1. 25 TTT epochs (up from 10) - loss still dropping at epoch 10
2. Per-layer TTT LR by quantization sensitivity:
   - MLP output projections: 3x LR (highest quant damage)
   - MLP input projections: 0.5x LR
   - Everything else: 1x LR
3. DDP optimize_ddp fix for PyTorch 2.4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
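
The per-layer TTT learning-rate rule in the commit message could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the name fragments `mlp.proj` and `mlp.fc`, and the helpers `lr_multiplier` and `build_param_groups`, are hypothetical stand-ins, since the message does not say how the projections are named or how the groups are built. Only the 3x / 0.5x / 1x multipliers come from the message.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical name fragments: the commit message does not name the modules.
MLP_OUT_KEY = "mlp.proj"  # assumed to mark MLP output projections
MLP_IN_KEY = "mlp.fc"     # assumed to mark MLP input projections


def lr_multiplier(param_name: str) -> float:
    """Return the TTT learning-rate multiplier for one parameter name."""
    if MLP_OUT_KEY in param_name:
        return 3.0  # MLP output projections: 3x LR
    if MLP_IN_KEY in param_name:
        return 0.5  # MLP input projections: 0.5x LR
    return 1.0      # everything else: 1x LR


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]], base_lr: float
) -> List[Dict[str, Any]]:
    """Bucket parameters by multiplier into optimizer-style param groups."""
    buckets: Dict[float, List[Any]] = {}
    for name, param in named_params:
        buckets.setdefault(lr_multiplier(name), []).append(param)
    # One group per multiplier, ordered from smallest to largest multiplier.
    return [
        {"params": params, "lr": base_lr * mult}
        for mult, params in sorted(buckets.items())
    ]
```

PyTorch optimizers accept a list of dicts with `params` and a per-group `lr`, so under these assumptions the result of `build_param_groups(model.named_parameters(), base_lr)` could be passed straight to an optimizer constructor.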
1 parent 869ddf8 commit 8daf3f1

1 file changed: +525 -717 lines changed