Record: SLOT-24 + Pre-quant TTT — val_bpb 0.7094 (3-seed mean) by stukenov · Pull Request #1376 · openai/parameter-golf

stukenov · 2026-04-05T09:02:14Z

Summary

val_bpb: 0.7094 (3-seed mean, std 0.0031)
Artifact: <16 MB (max 15,930,472)
Training: 600s | Eval: ~580s (both within limits)

Techniques

Per-Sample SLOT-24 (arXiv:2505.12392v2): per-sample delta [bsz,1,512] + logit bias [bsz,1,1024], 24 AdamW steps (cosine LR 0.024->0.001), stride=96, scored positions only. Model weights frozen.
Pre-quant AdamW TTT: 6 epochs on EMA model before GPTQ. Freeze first 2 blocks, cosine LR 0.0005.
QK-Gain 4.0 + XSA-all + Full Hessian GPTQ int6 + lzma, on the PR #1019 base (Record: AR Self-Gen GPTQ + XSA-all + BigramHash 3072×112, val_bpb 1.11473, 3-seed mean).
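
The SLOT-24 entry specifies 24 AdamW steps with a cosine learning rate going from 0.024 to 0.001. A minimal sketch of such a schedule follows. The function name `cosine_lr` and the endpoint convention (step 0 at the maximum, the last step exactly at the minimum) are assumptions for illustration; the schedule in `train_gpt.py` may differ, for example by including warmup or a different step offset.

```python
import math

def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float) -> float:
    """Cosine-annealed learning rate for a 0-indexed step (hypothetical helper)."""
    if total_steps <= 1:
        return lr_max
    # Assumption: progress reaches 1.0 on the final step, so it lands on lr_min.
    progress = step / (total_steps - 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Values taken from the Techniques list: 24 steps, 0.024 -> 0.001.
SLOT_STEPS, SLOT_LR, SLOT_LR_MIN = 24, 0.024, 0.001
schedule = [cosine_lr(s, SLOT_STEPS, SLOT_LR, SLOT_LR_MIN) for s in range(SLOT_STEPS)]

for step, lr in enumerate(schedule):
    print(f"step {step:2d}: lr = {lr:.5f}")
```

Under this convention the learning rate falls slowly at first, fastest around the middle of the 24 steps, and flattens out again near 0.001.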

3-Seed Results

Seed	SLOT-24 BPB	Artifact
1337	0.7064	15,930,472
42	0.7093	15,930,124
2025	0.7126	15,916,348
Mean	0.7094
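
The summary statistics can be rechecked from the three table rows. The sketch below uses only the numbers shown above; reading the 16 MB cap as 16,000,000 bytes is an assumption. The reported std of 0.0031 matches the sample standard deviation (n - 1 in the denominator).

```python
import statistics

# Per-seed SLOT-24 BPB and artifact sizes, copied from the results table.
results = {
    1337: (0.7064, 15_930_472),
    42: (0.7093, 15_930_124),
    2025: (0.7126, 15_916_348),
}
# Assumption: the "16 MB" cap is read as 16,000,000 bytes (decimal megabytes).
ARTIFACT_LIMIT_BYTES = 16_000_000

bpbs = [bpb for bpb, _ in results.values()]
mean_bpb = statistics.mean(bpbs)
std_bpb = statistics.stdev(bpbs)  # sample standard deviation (n - 1)
max_artifact = max(size for _, size in results.values())

print(f"mean={mean_bpb:.4f} std={std_bpb:.4f} max_artifact={max_artifact:,}")
# prints: mean=0.7094 std=0.0031 max_artifact=15,930,472
print("all under limit:", max_artifact < ARTIFACT_LIMIT_BYTES)
```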

Reproduction

SEED=1337 TTT_ENABLED=1 TTT_EPOCHS=6 SLOT_ENABLED=1 SLOT_STEPS=24 \
SLOT_LR=0.024 SLOT_LR_MIN=0.001 SLOT_STRIDE=96 BIGRAM_VOCAB_SIZE=1536 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
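
The command above covers seed 1337 only. A hedged sketch of a driver for all three seeds follows, assuming seeds 42 and 2025 were run with the same settings and only `SEED` changed (the PR does not state this explicitly). The helper names `build_env` and `run_all` are hypothetical.

```python
import os
import subprocess

# Settings copied from the reproduction command; SEED is filled in per run.
BASE_ENV = {
    "TTT_ENABLED": "1", "TTT_EPOCHS": "6",
    "SLOT_ENABLED": "1", "SLOT_STEPS": "24",
    "SLOT_LR": "0.024", "SLOT_LR_MIN": "0.001", "SLOT_STRIDE": "96",
    "BIGRAM_VOCAB_SIZE": "1536",
}
SEEDS = [1337, 42, 2025]  # seeds listed in the 3-Seed Results table
CMD = ["torchrun", "--standalone", "--nproc_per_node=8", "train_gpt.py"]

def build_env(seed: int) -> dict:
    """Current environment plus the run settings for one seed."""
    return {**os.environ, **BASE_ENV, "SEED": str(seed)}

def run_all() -> None:
    """Launch one training run per seed, stopping on the first failure."""
    for seed in SEEDS:
        subprocess.run(CMD, env=build_env(seed), check=True)
```

Calling `run_all()` from the directory containing `train_gpt.py` on an 8-GPU node would launch the runs sequentially.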

Credits

PR #1019 (@abaybektursun), PR #1229 (@resouer), PR #1306, PR #1263, PR #1125, arXiv:2505.12392v2

Per-sample SLOT-24 (stride=96, LR=0.024) + Pre-quant AdamW TTT (6ep). 3 seeds: 0.7064, 0.7093, 0.7126 (mean 0.7094, std 0.0031). All artifacts under 16MB with lzma compression. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

primary path
- CRITICAL: PR openai#1351 (Discriminative TTT, 1.0807) self-closed by author on 2026-04-05; pre-quant AdamW TTT ruled as pre-eval adaptation on val data. Removed pre-quant TTT from technique table and plan.
- Updated strategy to PR openai#1334 (Depth Recur + Parallel Residuals + MuonEq-R, 1.0897) as primary architecture target (zero legality flags).
- Logged new PRs: openai#1379 (0.4162, n-gram mixer), openai#1376 (0.7094, SLOT-24 + pre-quant TTT), openai#1364 (1.1025, pre-quant TTT at risk), openai#1370 (1.003, GDN).
- SLOT and pre-quant TTT both blocked; discriminative TTT post-quant still legal.
- Updated CLAUDE.md Competition Strategy + Technique Reference + Lessons (v9.0).
https://claude.ai/code/session_01RTLvTuYBp9YMtudwrY8mYM

Combines discriminative TTT (PR openai#1351) with SLOT-24 (PR openai#1376). 3-seed package: seeds 1337/42/2025, mean SLOT-24 BPB 0.7093 ± 0.0025.

Abhishek8108 mentioned this pull request Apr 6, 2026

Non-record: Discriminative TTT + SLOT-24, 3-seed verified (8xH100 SXM) #1414

Closed

stukenov wants to merge 1 commit into openai:main from stukenov:submission/v6-slot24-ttt
