
Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT — val_bpb 1.0810 (3-seed mean)#1492

Closed
bigbag wants to merge 3 commits into openai:main from bigbag:submission/sp8192-3recur-parresid-qk525-ttt

Conversation

@bigbag bigbag commented Apr 9, 2026

Summary

  • val_bpb = 1.0810 (3-seed mean, std 0.0002) | ~15.99 MB | 8×H100 SXM
  • SP8192 + 3-layer depth recurrence (L3-5) + parallel residuals (L7+) + QK-Gain 5.25 + legal score-first TTT
  • No SLOT, no pre-quant TTT, no n-gram cache, no ETLB — fully compliant

3-Seed Results

| Seed | Sliding bpb | TTT bpb | Artifact (bytes) |
|------|-------------|---------|------------------|
| 42   | 1.0829      | 1.0808  | 15,991,930       |
| 314  | 1.0827      | 1.0810  | 15,992,919       |
| 999  | 1.0826      | 1.0812  | 15,992,919       |
| Mean | 1.0827      | 1.0810  | 15,992,589       |
| Std  | 0.0002      | 0.0002  |                  |

Merged SOTA (PR #1019): 1.1147 bpb. Delta: −0.0337 bpb.

Key Techniques

  1. SP8192 + GPTQ SDClip — int6 matrices (k=12.85), int8 embeddings (k=20.0), zero pruning (PR #1394 @clarkkev)
  2. 3-Layer Depth Recurrence (L3-5, activated at 0.35) — 17 virtual layers from 11 physical
  3. Parallel Residuals (L7+) — GPT-J style (PR #1412 @Robby955, PR #1204 @msisovic)
  4. QK-Gain 5.25 — monotonic improvement from 4.0 → 5.0 → 5.25
  5. Legal Score-First TTT — SGD (lr=0.005, momentum=0.9), 3 epochs, cosine decay (PR #549 @abaybektursun, PR #1413 @dexhunter)
  6. Tuned Hyperparameters — WD=0.095, MLR=0.022, EMA=0.9965, warmdown=0.72 (PR #1445 @X-Abhishek-X)
  7. LZMA code wrapper — 16.6 KB code footprint
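As a rough illustration of technique 2, the depth recurrence amounts to replaying a span of physical layers during the forward pass. A minimal sketch of the layer schedule (the function name and scheduling details are hypothetical; the PR does not publish its implementation):

```python
def layer_schedule(n_physical=11, recur=(3, 4, 5), extra_passes=2):
    """Order in which physical layers run when a span is replayed.

    With 11 physical layers and layers 3-5 replayed twice more,
    the model executes 11 + 2*3 = 17 virtual layers.
    """
    order = []
    for i in range(n_physical):
        order.append(i)
        if i == max(recur):  # after the recurrent span, replay it
            for _ in range(extra_passes):
                order.extend(recur)
    return order

print(len(layer_schedule()))  # 17 virtual layers from 11 physical
```

Weight sharing across the replayed span keeps the parameter count (and hence the artifact size) at the 11-layer level while adding effective depth.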

Compliance (Track B)

Per Issue #1017:

  • Condition 1 (Causality): Sliding-window eval, prefix only
  • Condition 2 (Normalized): Standard softmax, no n-gram/logit bias
  • Condition 3 (Score before update): Each chunk scored under torch.no_grad() BEFORE SGD
  • Condition 4 (Single pass): Each token scored once, no rescoring

No SLOT, no pre-quant TTT, no ETLB, no n-gram cache. All artifacts < 16MB, train < 600s, eval < 600s.
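The score-first constraint (Conditions 3 and 4) can be sketched on a toy model. This hypothetical example uses plain SGD-with-momentum on a linear least-squares objective (the actual model, loss, and cosine-decay schedule in the PR differ); the key invariant is that each chunk is scored with weights that no update on that chunk has touched:

```python
import numpy as np

def grad_and_loss(w, x, y):
    r = x @ w - y
    return 2 * x.T @ r / len(y), float(np.mean(r ** 2))

def score_first_ttt(chunks, w, lr=0.005, momentum=0.9, epochs=3):
    """Score each chunk BEFORE adapting on it (score-first, single pass)."""
    losses, v = [], np.zeros_like(w)
    for x, y in chunks:
        _, loss = grad_and_loss(w, x, y)   # Condition 3: frozen-weight score
        losses.append(loss)                # Condition 4: scored exactly once
        for _ in range(epochs):            # only now adapt on this chunk
            g, _ = grad_and_loss(w, x, y)
            v = momentum * v - lr * g
            w = w + v
    return float(np.mean(losses)), w

rng = np.random.default_rng(0)
w_true = rng.normal(size=4)
chunks = [(x := rng.normal(size=(32, 4)), x @ w_true) for _ in range(20)]
adapted, _ = score_first_ttt(chunks, np.zeros(4))
frozen, _ = score_first_ttt(chunks, np.zeros(4), epochs=0)
print(adapted < frozen)  # adaptation on earlier chunks helps later ones
```

Later chunks benefit from updates made on earlier chunks, which is legal; what would be illegal is rescoring a chunk after updating on it.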

Credits

PR #1394 @clarkkev, PR #1413 @dexhunter, PR #549 @abaybektursun, PR #1412 @Robby955, PR #1204 @msisovic, PR #1445 @X-Abhishek-X, PR #1331 @dexhunter

Acknowledgements

Thanks to OpenAI's Advanced Competitor grant ($500 compute credit via RunPod) — this was instrumental in running 160+ experiments that led to this result.

Reproduction

SEED=42 QK_GAIN_INIT=5.25 TTT_ENABLED=1 TTT_LR=0.005 TTT_EPOCHS=3 \
  torchrun --standalone --nproc_per_node=8 train_gpt.py
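The `QK_GAIN_INIT` flag above presumably seeds a scalar gain on the query-key logits. A hypothetical single-head sketch, assuming the gain multiplies cosine-normalized QK scores (QK-norm style; the PR does not document the exact placement):

```python
import numpy as np

def qk_gain_attention(q, k, v, gain=5.25):
    """Single-head attention with a scalar gain on normalized QK logits."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    logits = gain * (qn @ kn.T)                   # cosine similarity * gain
    logits -= logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(logits)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 16)) for _ in range(3))
out = qk_gain_attention(q, k, v)
print(out.shape)  # (8, 16)
```

Because normalized logits are bounded in [−1, 1], the gain sets the effective softmax temperature; a larger gain sharpens the attention distribution, which is consistent with the reported monotonic gains from 4.0 to 5.25.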

Test plan

  • 3-seed validation (42, 314, 999)
  • All artifacts under 16,000,000 bytes
  • Training under 600s (588s actual)
  • Eval (sliding + TTT) under 600s (~500s actual)
  • Score-first TTT: compliant with conditions 1-4 of Issue #1017 ("A Field Guide to Valid Submissions")
  • No SLOT, no pre-quant TTT, no ETLB, no n-gram cache

🤖 Generated with Claude Code

Pavel Liashkov and others added 3 commits March 22, 2026 23:41
…25 + Legal TTT — val_bpb 1.0810 (3-seed mean)

3-seed mean: 1.0810 (std 0.0002), seeds 42/314/999
All artifacts under 16MB, training under 600s, eval under 600s
Score-first TTT (SGD 3ep, cosine decay), no SLOT, no pre-quant TTT

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bigbag commented Apr 9, 2026

Thanks to OpenAI's Advanced Competitor grant ($500 compute credit via RunPod) — this was essential for running the experiments that led to this result. The grant covered ~320 compute hours across 160+ experiments over Steps 1-22 of our optimization journey.


bigbag commented Apr 9, 2026

Closing — PR includes unrelated files from working branch. Will resubmit clean.

@bigbag bigbag closed this Apr 9, 2026
@bigbag bigbag deleted the submission/sp8192-3recur-parresid-qk525-ttt branch April 9, 2026 07:18