Record: SP8192 + Parallel Residuals + Coprime-Stride + TTT — val_bpb 1.08286 (3-seed mean)#10

Open
resouer wants to merge 1 commit into main from submission/sp8192-pr-coprime-ttt

resouer commented Apr 6, 2026

Summary

3-seed mean val_bpb: 1.08286 (std 0.00070) | 15.99 MB | 8xH100 SXM

Merged SOTA (PR openai#1019): 2.88218 nats. This run: 2.79714 nats. Delta: -0.0850 nats.
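The delta above can be checked with a few lines of arithmetic. The sketch below also derives an implied bytes-per-token ratio from this run's own numbers; that part assumes `val_loss` is the mean loss in nats per token and that `val_bpb` is the same loss expressed in bits per byte, which the PR does not state explicitly. Variable names are illustrative only.

```python
import math

# Figures quoted in the summary.
merged_sota_nats = 2.88218   # merged SOTA (PR openai#1019)
this_run_nats = 2.79714      # 3-seed mean val_loss of this run
this_run_bpb = 1.08286       # 3-seed mean val_bpb of this run

# Improvement in nats (negative means lower loss).
delta_nats = this_run_nats - merged_sota_nats

# Assumption: val_loss is nats per token, so dividing by ln(2) gives bits per token.
bits_per_token = this_run_nats / math.log(2)

# Assumption: bpb = bits per token / bytes per token, so the ratio below is the
# average number of bytes covered by one token on the validation set.
implied_bytes_per_token = bits_per_token / this_run_bpb

print(f"delta: {delta_nats:+.4f} nats")
print(f"implied bytes per token: {implied_bytes_per_token:.3f}")
```

Under those assumptions the ratio comes out at roughly 3.7 bytes per token, and the same ratio holds for each individual seed in the results table.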

Results (3-seed)

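| Seed | Post-TTT BPB | val_loss (nats) | Artifact (bytes) |
|------|--------------|-----------------|------------------|
| 1337 | 1.08255 | 2.79633 | 15,988,547 |
| 42 | 1.08237 | 2.79588 | 15,990,325 |
| 2025 | 1.08366 | 2.79921 | 15,989,566 |
| Mean | 1.08286 | 2.79714 | |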
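The mean row and the reported standard deviation can be reproduced from the per-seed values. A minimal sketch using only the standard library follows; it suggests the quoted 0.00070 is the sample standard deviation (n - 1 denominator), since the population figure would be closer to 0.00057.

```python
import statistics

# Per-seed results copied from the table above (seed order: 1337, 42, 2025).
post_ttt_bpb = [1.08255, 1.08237, 1.08366]
val_loss_nats = [2.79633, 2.79588, 2.79921]

mean_bpb = statistics.mean(post_ttt_bpb)
mean_loss = statistics.mean(val_loss_nats)

# Sample standard deviation (n - 1) versus population standard deviation (n).
sample_std_bpb = statistics.stdev(post_ttt_bpb)
population_std_bpb = statistics.pstdev(post_ttt_bpb)

print(f"mean bpb:  {mean_bpb:.5f}")
print(f"mean loss: {mean_loss:.5f}")
print(f"sample std: {sample_std_bpb:.5f}, population std: {population_std_bpb:.5f}")
```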

Changes from Base (PR openai#1394)

  1. Parallel Residuals (layers 7-10, PaLM-style): -0.0016 BPB
  2. Coprime-Stride Loader (pseudo-random shard traversal): -0.0016 BPB
  3. Legal Score-First TTT (SGD, LR=0.005, 3 epochs, following the pattern of PR openai/parameter-golf#549): -0.0015 BPB
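For readers unfamiliar with the coprime-stride idea in item 2, here is a minimal sketch of the general technique. It is not the loader from `train_gpt.py`, and the function names and the stride-selection rule are hypothetical. The point is only that stepping through `n_items` indices with a stride coprime to `n_items` visits every index exactly once, in an order that looks shuffled, without materializing a permutation.

```python
import math


def pick_coprime_stride(n_items: int, seed: int) -> int:
    """Return a stride in [1, n_items - 1] that is coprime with n_items.

    Hypothetical selection rule: start from a seed-derived candidate and
    walk upward. n_items - 1 is always coprime with n_items (for n_items >= 2),
    so the loop terminates.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if n_items == 1:
        return 1
    stride = 1 + seed % (n_items - 1)
    while math.gcd(stride, n_items) != 1:
        stride += 1
    return stride


def coprime_stride_order(n_items: int, seed: int) -> list[int]:
    """Visit order over range(n_items) produced by a coprime stride."""
    stride = pick_coprime_stride(n_items, seed)
    start = seed % n_items
    # Because gcd(stride, n_items) == 1, i * stride mod n_items is a bijection,
    # so every index appears exactly once.
    return [(start + i * stride) % n_items for i in range(n_items)]


if __name__ == "__main__":
    order = coprime_stride_order(10, seed=1337)
    print(order)
    assert sorted(order) == list(range(10))
```

A design note: compared with a full random shuffle, this needs only two integers of state (start and stride), which makes it easy to keep consistent across ranks in a distributed run.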

Compliance

No SLOT, no pre-quant TTT, no n-gram. Score-first TTT under inference_mode().
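The score-first ordering can be illustrated with a toy model. The sketch below is a stand-in, not the PR's code: it uses a pure-Python unigram model over a tiny vocabulary, and all function names are hypothetical. Only the defaults `lr=0.005` and `epochs=3` come from the PR. What it shows is the ordering constraint: each chunk is scored with weights that have not yet been trained on it, and only then used for SGD updates. In a PyTorch implementation the scoring step is what would run under `torch.inference_mode()`.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def chunk_nll(logits, chunk):
    """Summed negative log-likelihood (nats) of a chunk; no parameter updates."""
    logp = log_softmax(logits)
    return -sum(logp[t] for t in chunk)


def sgd_step(logits, chunk, lr):
    """One SGD step on the mean NLL of the chunk (gradient = softmax - frequencies)."""
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    freqs = [0.0] * len(logits)
    for t in chunk:
        freqs[t] += 1.0 / len(chunk)
    return [w - lr * (p - f) for w, p, f in zip(logits, probs, freqs)]


def score_first_ttt(logits, chunks, lr=0.005, epochs=3):
    """Return (mean NLL per token, adapted logits) with score-before-update ordering."""
    total_nll, n_tokens = 0.0, 0
    for chunk in chunks:
        # 1) Score first: the weights have never been trained on this chunk.
        total_nll += chunk_nll(logits, chunk)
        n_tokens += len(chunk)
        # 2) Only then adapt on the chunk that has already been scored.
        for _ in range(epochs):
            logits = sgd_step(logits, chunk, lr)
    return total_nll / n_tokens, logits


if __name__ == "__main__":
    chunks = [[0, 0, 1], [0, 0, 0]]
    frozen, _ = score_first_ttt([0.0, 0.0], chunks, lr=0.0)
    adapted, _ = score_first_ttt([0.0, 0.0], chunks, lr=0.5)
    print(f"frozen: {frozen:.4f} nats/token, adapted: {adapted:.4f} nats/token")
```

With a single chunk the reported loss is independent of the learning rate, because nothing has been learned before that chunk is scored. Gains can only come from adaptation on earlier, already-scored chunks.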

Reproduction

pip install brotli
pip install flash_attn_3 --no-deps --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/
torchrun --standalone --nproc_per_node=8 train_gpt.py

Credits

Base: PR openai#1394 (@clarkkev). TTT: PR openai#549, openai#1413 (@abaybektursun, @dexhunter). Parallel Residuals: PR openai#1334 (@aryanbhosale).

…1.08286 (3-seed mean)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>