
Record: SLOT + QK-Gain 4.0 + XSA-11 — val_bpb 0.9462 (3-seed mean)#1303

Open
anthony-maio wants to merge 7 commits into openai:main from anthony-maio:submission/slot-qkgain-ttt-frontier

Conversation

@anthony-maio

Summary

  • val_bpb: 0.9462 (3-seed mean, std 0.0030)
  • Artifact: 15.7-15.8 MB (all seeds < 16MB)
  • Training: 600s on 8xH100 SXM | Eval: ~384s (sliding + SLOT)

3-Seed Results

| Seed | Sliding BPB | + SLOT BPB | Artifact (bytes) |
|------|-------------|------------|------------------|
| 1337 | 1.1222 | 0.9493 | 15,742,066 |
| 42 | 1.1209 | 0.9433 | 15,827,886 |
| 2024 | 1.1216 | 0.9458 | 15,757,370 |
| Mean | 1.1216 | 0.9462 +/- 0.0030 | |

Beats merged SOTA (1.1147, PR #1019) by 0.169 BPB (33x the 0.005-nat threshold, p << 0.01).

Techniques

Built on the PR #175 VRL + LeakyReLU² + lzma base; the added techniques are itemized under Credits.

Compliance

  • Score-first SLOT (frozen model, torch.no_grad() hidden states)
  • No n-gram cache, no two-pass rescoring, no eval-time GPTQ
  • Self-contained (no network calls, no env overrides required)
  • All seeds within time and size budgets
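For readers unfamiliar with the score-first SLOT mechanic listed above, here is a minimal toy sketch of eval-time adaptation of a per-sample hidden delta against a frozen model. The linear head, shapes, and hyperparameters are illustrative stand-ins, not the submission's actual forward_hidden/compute_logits implementation; which positions the adaptation loss may legally include is exactly the compliance question discussed later in this thread.

```python
import torch
import torch.nn.functional as F

# Toy sketch of SLOT-style eval-time adaptation: optimize a per-sample delta
# added to frozen hidden states, leaving all model weights untouched.
# (Hypothetical stand-in model; the real submission differs.)
torch.manual_seed(0)
d_model, vocab = 8, 16
head = torch.nn.Linear(d_model, vocab)
for p in head.parameters():
    p.requires_grad_(False)  # frozen model: only the delta is optimized

hidden = torch.randn(1, 5, d_model)        # stand-in for cached hidden states
targets = torch.randint(0, vocab, (1, 5))  # tokens used for adaptation

delta = torch.zeros(1, 1, d_model, requires_grad=True)  # per-sample delta
opt = torch.optim.SGD([delta], lr=0.1)

for _ in range(16):  # "SLOT-16": 16 inner optimization steps
    logits = head(hidden + delta)  # logits from shifted hiddens
    loss = F.cross_entropy(logits.view(-1, vocab), targets.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    adapted = F.cross_entropy(head(hidden + delta).view(-1, vocab), targets.view(-1))
    baseline = F.cross_entropy(head(hidden).view(-1, vocab), targets.view(-1))
print(adapted.item() < baseline.item())  # → True: delta lowers loss on the adaptation tokens
```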

Reproduction

```
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Training: ~600s. Eval: ~384s. Total: ~16 min.

Credits

Integrates four proven post-March-25 techniques:

- QK-Gain 4.0 (PR openai#1125 sweep)
- XSA all 11 layers (PR openai#1176)
- SLOT per-sample delta + logit bias with scored-position masking (PR openai#1229)
- forward_hidden/compute_logits refactor for SLOT compatibility

3-seed results (1337: 0.9493, 42: 0.9433, 2024: 0.9458). Sliding-window baseline: 1.1216; SLOT-16 improvement: -0.175 BPB. All artifacts under the 16 MB cap. Eval time ~384s.
Copilot AI review requested due to automatic review settings April 3, 2026 14:38
Contributor

Copilot AI left a comment


Pull request overview

Adds a new 10min/16mb record submission directory implementing SLOT-16 evaluation on top of the existing VRL + LeakyReLU² + XSA-all + QK-gain=4.0 stack, along with reproducibility artifacts.

Changes:

  • Introduces a full training/eval script (train_gpt.py) including sliding-window scoring and SLOT (per-sample hidden delta + logit bias) eval-time optimization.
  • Adds record metadata (submission.json) and documentation (README.md) describing the run and results.
  • Adds three seed training logs capturing training, quantization, sliding-window, and SLOT metrics.
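The sliding-window scoring mentioned above can be sketched in miniature: slide overlapping windows over the token stream and score only the final `stride` positions of each window, so every scored token is predicted with long left context. The embedding + linear "model", shapes, and stride here are illustrative assumptions, not the values used in train_gpt.py, and the final figure is bits per scored token (real bpb would divide by bytes, not tokens).

```python
import math
import torch
import torch.nn.functional as F

# Toy sliding-window evaluation: overlapping windows, scoring only the last
# `stride` positions of each. (Hypothetical stand-in model and sizes.)
torch.manual_seed(0)
vocab, d, seq, stride = 16, 8, 8, 4
emb = torch.nn.Embedding(vocab, d)
head = torch.nn.Linear(d, vocab)
tokens = torch.randint(0, vocab, (64,))

total_nll, scored = 0.0, 0
with torch.no_grad():
    for start in range(0, tokens.numel() - seq, stride):
        x = tokens[start:start + seq]            # input window
        y = tokens[start + 1:start + seq + 1]    # next-token targets
        logits = head(emb(x))
        # score only the last `stride` positions; earlier positions serve
        # as context (a full implementation also scores the first window)
        nll = F.cross_entropy(logits[-stride:], y[-stride:], reduction="sum")
        total_nll += nll.item()
        scored += stride

bits_per_token = total_nll / scored / math.log(2)  # nats -> bits
print(scored)  # → 56
```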

Reviewed changes

Copilot reviewed 3 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
|------|-------------|
| records/track_10min_16mb/2026-04-02_SLOT_QKGain4_XSA11_TTT/train_gpt.py | End-to-end training + int6+lzma export + sliding-window eval + SLOT eval implementation. |
| records/track_10min_16mb/2026-04-02_SLOT_QKGain4_XSA11_TTT/submission.json | Submission metadata and aggregated 3-seed results. |
| records/track_10min_16mb/2026-04-02_SLOT_QKGain4_XSA11_TTT/README.md | Human-readable summary of techniques, results, and reproduction command. |
| records/track_10min_16mb/2026-04-02_SLOT_QKGain4_XSA11_TTT/train_seed42.log | Seed 42 run log for reproducibility/audit. |
| records/track_10min_16mb/2026-04-02_SLOT_QKGain4_XSA11_TTT/train_seed2024.log | Seed 2024 run log for reproducibility/audit. |
| records/track_10min_16mb/2026-04-02_SLOT_QKGain4_XSA11_TTT/train_seed1337.log | Seed 1337 run log for reproducibility/audit. |


```python
import sys
import time
import uuid
import zlib
```

Copilot AI Apr 3, 2026

zlib is imported but never used in this script. Removing unused imports helps keep the submission minimal and avoids confusion about which compressor/format is actually in use (the script uses lzma everywhere).

Suggested change:

```diff
-import zlib
```
Comment on lines +103 to +107

```python
ttt_enabled = bool(int(os.environ.get("TTT_ENABLED", "1")))
ttt_lr = float(os.environ.get("TTT_LR", 0.002))
ttt_epochs = int(os.environ.get("TTT_EPOCHS", 3))
ttt_muon = bool(int(os.environ.get("TTT_MUON", "1")))
ttt_ns_steps = int(os.environ.get("TTT_NS_STEPS", 5))
```

Copilot AI Apr 3, 2026

TTT_* hyperparameters are defined but not used anywhere in this submission (no TTT code path references them). Consider removing them (or wiring them up) so the configuration surface matches the implemented behavior and avoids readers assuming TTT is active.

Suggested change:

```diff
-ttt_enabled = bool(int(os.environ.get("TTT_ENABLED", "1")))
-ttt_lr = float(os.environ.get("TTT_LR", 0.002))
-ttt_epochs = int(os.environ.get("TTT_EPOCHS", 3))
-ttt_muon = bool(int(os.environ.get("TTT_MUON", "1")))
-ttt_ns_steps = int(os.environ.get("TTT_NS_STEPS", 5))
```
Comment on lines +108 to +109

```python
gptq_calib_batches = int(os.environ.get("GPTQ_CALIB_BATCHES", 256))
gptq_block_size = int(os.environ.get("GPTQ_BLOCK_SIZE", 64))
```

Copilot AI Apr 3, 2026

GPTQ_* hyperparameters are defined but not used anywhere in this submission (no GPTQ calibration/quantization code path references them). Consider removing them (or wiring them up) so the script reflects the actual quantization approach used.

Suggested change:

```diff
-gptq_calib_batches = int(os.environ.get("GPTQ_CALIB_BATCHES", 256))
-gptq_block_size = int(os.environ.get("GPTQ_BLOCK_SIZE", 64))
```
Comment on lines +289 to +292

```python
    if any(pattern in name for pattern in INT8_KEEP_FLOAT_FP32_NAME_PATTERNS):
        return t.float().contiguous()
    if t.dtype in {torch.float32, torch.bfloat16}:
        passthrough_orig_dtypes[name] = str(t.dtype).removeprefix("torch.")
```

Copilot AI Apr 3, 2026

keep_float_tensor is defined but never used by the quantization pipeline (mixed_quantize_int6 does not call it). If the keep-float patterns are intended to affect quantization, route the state_dict handling through this helper; otherwise consider removing the unused function/constants to keep the submission minimal.

Suggested change:

```diff
-    if any(pattern in name for pattern in INT8_KEEP_FLOAT_FP32_NAME_PATTERNS):
-        return t.float().contiguous()
-    if t.dtype in {torch.float32, torch.bfloat16}:
-        passthrough_orig_dtypes[name] = str(t.dtype).removeprefix("torch.")
+    _ = passthrough_orig_dtypes
+    if any(pattern in name for pattern in INT8_KEEP_FLOAT_FP32_NAME_PATTERNS):
+        return t.float().contiguous()
+    if t.dtype in {torch.float32, torch.bfloat16}:
```
Comment on lines +881 to +889

```python
xb = torch.zeros(bsz, seq_s, dtype=torch.int64, device=device)
yb = torch.zeros(bsz, seq_s, dtype=torch.int64, device=device)
wlens = []
for i, ws in enumerate(bws):
    wend = min(ws + seq_s, total_tok)
    wlen = wend - ws
    wlens.append(wlen)
    xb[i, :wlen] = val_tokens[ws:wend]
    yb[i, :wlen] = val_tokens[ws + 1:wend + 1]
```

Copilot AI Apr 3, 2026

In eval_val_slot, val_tokens is a CPU tensor but is copied into GPU tensors via per-sample slice assignments. This creates many small H2D transfers and can significantly slow SLOT eval. Prefer building each window as a contiguous slice and using a single .to(device, non_blocking=...) (similar to eval_val_sliding), or build the whole batch on CPU and transfer once per batch.

Suggested change:

```diff
-xb = torch.zeros(bsz, seq_s, dtype=torch.int64, device=device)
-yb = torch.zeros(bsz, seq_s, dtype=torch.int64, device=device)
-wlens = []
-for i, ws in enumerate(bws):
-    wend = min(ws + seq_s, total_tok)
-    wlen = wend - ws
-    wlens.append(wlen)
-    xb[i, :wlen] = val_tokens[ws:wend]
-    yb[i, :wlen] = val_tokens[ws + 1:wend + 1]
+xb_cpu = torch.zeros(bsz, seq_s, dtype=val_tokens.dtype, device=val_tokens.device)
+yb_cpu = torch.zeros(bsz, seq_s, dtype=val_tokens.dtype, device=val_tokens.device)
+wlens = []
+for i, ws in enumerate(bws):
+    wend = min(ws + seq_s, total_tok)
+    wlen = wend - ws
+    wlens.append(wlen)
+    xb_cpu[i, :wlen] = val_tokens[ws:wend]
+    yb_cpu[i, :wlen] = val_tokens[ws + 1:wend + 1]
+xb = xb_cpu.to(device=device, non_blocking=True)
+yb = yb_cpu.to(device=device, non_blocking=True)
```
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 3, 2026
…nai#1303 at 0.9462

- logs/daily_research.md: full daily report; PR openai#771 rejected confirmed,
  n-gram PRs status, leaderboard unchanged (1.1147), headline PR openai#1303
  (0.9462 bpb, legality unconfirmed), PR openai#1306 Causal SLOT (-0.009) +
  Pre-quant TTT (-0.022), new paper scan (LaCT, pQuant, SLOT paper)
- CLAUDE.md v7.1: updated key reference PRs (openai#1303, openai#1306), corrected SLOT
  technique table (standard SLOT disputed, Causal SLOT lower-risk alternative,
  Pre-quant TTT novel entry)

https://claude.ai/code/session_01AUKKvYMVeeWQzfTKocVaJZ
GitGeeks added a commit to GitGeeks/parameter-golf that referenced this pull request Apr 3, 2026
- train_gpt_sota_slot.py: SOTA PR openai#1303 baseline (SLOT + XSA-11 + QK-Gain 4.0 + VRL)
- train_gpt_slot_recurrence.py: SOTA + partial depth recurrence with per-iteration conditioning
  RECUR_LAYERS=4,5 RECUR_START_STEP=3000 activates recurrence
  Default (no env vars) = exact SOTA behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
anthony-maio added a commit to anthony-maio/parameter-golf that referenced this pull request Apr 3, 2026
SLOT hyperparameter sweep found steps=24, LR=0.012, stride=96 dramatically
improves over PR openai#1303's SLOT-16 (0.9462 -> 0.8637). Same architecture,
same training — only eval-time SLOT parameters changed.

3-seed: 1337=0.8683, 42=0.8582, 2024=0.8647. All artifacts under 16MB.
GitGeeks added a commit to GitGeeks/parameter-golf that referenced this pull request Apr 3, 2026
Adds RECUR_LAYERS=4,5 support with per-iteration conditioning
(iter_embed + iter_gate) on repeated layers. Delayed activation
via RECUR_START_STEP. Compatible with SLOT eval.
@MatoTeziTanka

MatoTeziTanka commented Apr 3, 2026

Hey — took a careful look at this. QK-Gain 4.0 and XSA-11 are solid, well-documented work. The main thing worth examining is the SLOT implementation.

The good news: SLOT as published (Hu et al., arXiv:2505.12392) optimizes the learned delta on context/prompt tokens only, then evaluates on new tokens. That's clean — the optimization doesn't see the tokens being scored.

The concern: The community has already identified that some Parameter Golf SLOT implementations diverge from the paper. PR #1240 ran a direct test on the per-sample + logit_bias variant and found a 100% causal violation rate — flipping a target token changes NLL at other scored positions. @abaybektursun also found and removed a shared-delta variant from their own submission for the same reason (discussed in issue #140).

The key question for your submission: does your SLOT optimization loss mask include the scored positions (the last stride tokens in each window)? If yes, the optimization sees the same tokens it then evaluates on — that's the non-causal variant. PR #1306 (Causal SLOT) restricts optimization to context-only positions and reports ~1.085 BPB, which is still a strong record result.

From an information-theoretic perspective, the prequential principle (Dawid, 1984) requires that p_t is determined before seeing x_t. If the SLOT optimization computes loss over x_t and adjusts delta/bias accordingly, then p_t was influenced by x_t — even though base model weights didn't change.

Suggestion: Check whether switching to context-only optimization (the Causal SLOT approach) preserves your result. If it does, you're on solid ground regardless of any future ruling. If the BPB jumps from ~0.95 to ~1.08, that delta was coming from the causal violation, and the ~1.08 result is still a legitimate record with your QK-Gain + XSA-11 stack.
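The flip-one-token probe described above can be sketched with a toy model. Everything here (the linear head, the SLOT-like inner loop, shapes, hyperparameters) is a hypothetical stand-in, not PR #1240's actual harness: a fixed (causal) scorer leaves NLL at other positions untouched, while an eval-time optimizer whose loss sees the targets leaks the flip into them.

```python
import torch
import torch.nn.functional as F

# Toy causal-violation probe: flip one target token and check whether NLL
# at OTHER scored positions changes. (Illustrative stand-in model.)
torch.manual_seed(0)
vocab, seq, d = 16, 6, 8
hidden = torch.randn(seq, d)
head = torch.nn.Linear(d, vocab)
for p in head.parameters():
    p.requires_grad_(False)  # frozen model, as in the SLOT setup

def nll(logits, targets):
    return F.cross_entropy(logits, targets, reduction="none")

def score_causal(targets):
    # predictive distribution fixed before seeing the targets
    with torch.no_grad():
        return nll(head(hidden), targets)

def score_slot_like(targets, steps=8, lr=0.1):
    # non-causal variant: the delta is optimized on the tokens being scored
    delta = torch.zeros(1, d, requires_grad=True)
    opt = torch.optim.SGD([delta], lr=lr)
    for _ in range(steps):
        loss = nll(head(hidden + delta), targets).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return nll(head(hidden + delta), targets)

targets = torch.randint(0, vocab, (seq,))
flipped = targets.clone()
flipped[3] = (flipped[3] + 1) % vocab  # flip a single target token
others = [i for i in range(seq) if i != 3]

causal_ok = torch.allclose(score_causal(targets)[others],
                           score_causal(flipped)[others])
slot_ok = torch.allclose(score_slot_like(targets)[others],
                         score_slot_like(flipped)[others])
print(causal_ok, slot_ok)  # → True False: the flip leaks into other positions
```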

Either way, the QK-Gain systematic sweep (PR #1125) is genuinely valuable community work. Nice contribution.

@MatoTeziTanka | Agora

Disclosure: I use Claude Code CLI, Codex CLI, and Gemini Pro as tools in my workflow. Human first, AI-assisted.
