Record: SP1024 + SLOT-24 + QK5.25 + Pre-Quant AdamW TTT — val_bpb 0.8265 (3-seed mean)#1488

Closed
ndokutovich wants to merge 1 commit into openai:main from ndokutovich:s5-slot-submission

Conversation

@ndokutovich

Record: SLOT-24 + Pre-Quant AdamW TTT

val_bpb = 0.8265 (3-seed mean, std 0.0029) | ~15.76 MB | 8xH100 SXM

3-Seed Results

| Seed | SLOT BPB   | Sliding (no SLOT) | Artifact (bytes) |
|------|------------|-------------------|------------------|
| 42   | 0.82329038 | 1.08834264        | 15,764,692       |
| 1337 | 0.82916457 | 1.08844016        | 15,756,236       |
| 2024 | 0.82694986 | 1.08842671        | 15,760,000       |
| Mean | 0.82646827 |                   |                  |

Prior SLOT SOTA (PR #1313): 0.8637. Delta: -0.0372 BPB.

Novel Contribution

First combination of pre-quant AdamW TTT (weight-level adaptation, baked into artifact) with SLOT (hidden-state optimization, eval-time). The two are complementary:

  • TTT improves base sliding: ~1.12 -> 1.088
  • SLOT pushes from better base: 0.8637 -> 0.8265
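The pre-quant TTT half of the combination can be sketched as follows. This is a hedged illustration, not the submission's actual `train_gpt.py` code: `TinyStack`, `prequant_ttt`, and the loop shape are assumptions; only the 10-epoch, lr=0.00045, freeze-1-block settings come from this PR.

```python
import torch
import torch.nn.functional as F

class TinyStack(torch.nn.Module):
    """Toy stand-in for the SP1024 model: embedding, block list, head."""
    def __init__(self, vocab=16, dim=8, n_blocks=2):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.blocks = torch.nn.ModuleList(
            torch.nn.Linear(dim, dim) for _ in range(n_blocks))
        self.head = torch.nn.Linear(dim, vocab)

    def forward(self, tokens):
        h = self.emb(tokens)
        for block in self.blocks:
            h = torch.tanh(block(h))
        return self.head(h)

def prequant_ttt(model, windows, epochs=10, lr=4.5e-4, freeze_blocks=1):
    """Weight-level AdamW fine-tune run BEFORE quantization, so the
    adapted weights are baked into the shipped artifact (Track A).
    The first `freeze_blocks` blocks stay frozen."""
    for block in list(model.blocks)[:freeze_blocks]:
        for p in block.parameters():
            p.requires_grad_(False)
    opt = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=lr)
    for _ in range(epochs):
        for tokens in windows:
            logits = model(tokens)
            # next-token loss over the window
            loss = F.cross_entropy(logits[:-1], tokens[1:])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model  # quantization (int6/int8 + LZMA) would follow here
```

Because the adaptation happens before quantization, it costs nothing at eval time; SLOT then starts from the better ~1.088 sliding base.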

Changes from PR #1313

| Parameter     | PR #1313 | This PR                     |
|---------------|----------|-----------------------------|
| QK_GAIN_INIT  | 4.0      | 5.25                        |
| Pre-quant TTT | None     | 10 ep, lr=0.00045, freeze 1 |
| SLOT BPB      | 0.8637   | 0.8265                      |

Architecture

SP1024, 11L 512dim, GQA 8/4, MLP 3x, XSA-all, VRL, BigramHash, SmearGate, U-Net skip, EMA 0.997, Late QAT, Muon, int6/int8 + LZMA.

SLOT Mechanism

Frozen model -> per-window delta + logit_bias -> 24 AdamW steps -> score -> discard. No state carries across windows.
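The per-window loop above can be sketched as below. Everything here is an illustrative stand-in under stated assumptions: `ToyLM` replaces the frozen post-quant model, `slot_adapt_window` and the learning rate are hypothetical; only the throwaway delta + logit_bias, the 24 AdamW steps with cosine decay, and the discard-after-scoring behavior come from this PR.

```python
import math
import torch
import torch.nn.functional as F

class ToyLM(torch.nn.Module):
    """Tiny stand-in for the frozen model; only the
    forward_hidden / compute_logits interface matters here."""
    def __init__(self, vocab=16, dim=8):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.head = torch.nn.Linear(dim, vocab)

    def forward_hidden(self, tokens):
        return self.emb(tokens)        # stand-in for the transformer trunk

    def compute_logits(self, hidden):
        return self.head(hidden)

def slot_adapt_window(model, tokens, dim, vocab, steps=24, lr=1e-2):
    """SLOT on one window: optimize a throwaway hidden-state delta and
    logit bias against the frozen model, score, then discard both."""
    delta = torch.zeros(dim, requires_grad=True)
    logit_bias = torch.zeros(vocab, requires_grad=True)
    opt = torch.optim.AdamW([delta, logit_bias], lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps)
    with torch.no_grad():
        h = model.forward_hidden(tokens)   # trunk runs once per window
    for _ in range(steps):
        logits = model.compute_logits(h + delta) + logit_bias
        loss = F.cross_entropy(logits[:-1], tokens[1:])
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()
    with torch.no_grad():
        nats = F.cross_entropy(
            model.compute_logits(h + delta)[:-1] + logit_bias, tokens[1:])
    # delta and logit_bias go out of scope on return: no cross-window state
    return nats.item() / math.log(2)  # bits per token (byte norm. omitted)
```

Note the delta is optimized on the same tokens it then scores; this retroactive two-pass structure is exactly the causality concern raised against SLOT in later discussion.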

Compliance

  • Training < 600s on 8xH100
  • Pre-quant TTT baked into artifact (Track A)
  • SLOT: frozen weights, throwaway per-window params only
  • No n-gram, no cross-window leakage

Credits

PR #1313 @anthony-maio, PR #1423 @aryanbhosale, PR #1482 @aamodbhatt

Checklist

  • One folder under records/track_10min_16mb/
  • README.md, submission.json, train_gpt.py
  • 3 seed logs
  • All artifacts < 16,000,000 bytes
  • Train wallclock < 600s

…65 (3-seed mean)

SLOT + pre-quant TTT combo on openai#1313 base.
  seed 42:   0.82329038
  seed 1337: 0.82916457
  seed 2024: 0.82694986
  mean:      0.82646827 (std 0.0029)
PhamPhuHoa-23 added a commit to angela231005/parameter-golf that referenced this pull request Apr 9, 2026
EMA_DECAY envvar (default=0.997, sota_32 uses 0.9965):
- PR openai#1435 shows EMA=0.9965 beats 0.997 by +0.017 BPB (1.0980 vs 1.1147)
- args.ema_decay_param wired to replace hardcoded 0.997

RECUR_LAYERS=4,5 at step 3000 (PR openai#1435):
- 13 virtual layers from 11 physical (vs 3,4,5 = 14 virtual)
- PR openai#1435 config: activate at step 3000

SLOT code present but DISABLED (SLOT_ENABLED=0 by default):
- eval_val_slot(), forward_hidden(), compute_logits() added to train_gpt_sota_28.py
- SLOT is retroactive 2-pass: optimizes delta on same tokens it scores = not causal
- All SLOT PRs (openai#1313, openai#1488) remain unmerged

Expected: ~1.095-1.10 BPB (WD=0.04 + EMA=0.9965 + RECUR PR#1435 config)
owizdom added a commit to owizdom/parameter-golf that referenced this pull request Apr 9, 2026
…ib GPTQ + SLOT-24

Replaces the triple-stack (Pre-Quant TTT + Val-Calib GPTQ + Eval-Time Legal TTT)
with a quad-stack that supersedes the legal TTT path with SLOT-24, ported from
PR openai#1488 / PR openai#1313.

Four val-data adaptations stacked for the first time:

1. Pre-Quant AdamW TTT — 11 epochs, freeze_blocks=0 (Track A)
2. Val-Calibrated GPTQ — Hessian H=X^T X from val activations (Track A)
3. SLOT-24 — per-window hidden delta + logit bias on the frozen post-quant
   model, 24 cosine-decayed AdamW steps, throwaway parameters
4. (Optional) Eval-Time Legal Score-First TTT — disabled by default; SLOT
   supersedes it within the eval budget. Set SLOT_ENABLED=0 TTT_ENABLED=1
   to fall back.

Code changes vs the previous synthesis commit:

- GPT class: split forward_logits into forward_hidden + compute_logits so
  SLOT can add the per-window delta to the hidden state without re-running
  the transformer stack.
- New eval_val_slot function ported from PR openai#1488 (per-window AdamW with
  cosine LR decay, stride masking, score-after-delta).
- run_evals: wires SLOT on a fresh post-quant model copy, gated by
  SLOT_ENABLED. Disables legal TTT by default.
- New hyperparameters: SLOT_ENABLED, SLOT_STEPS, SLOT_LR, SLOT_LR_MIN,
  SLOT_BATCH_SEQS, SLOT_EVAL_STRIDE.
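The forward split in the first bullet can be sketched as follows. The class body is illustrative; only the method names `forward_hidden`, `compute_logits`, and `forward_logits` come from the commit description.

```python
import torch

class GPTSplit(torch.nn.Module):
    """Split forward_logits into forward_hidden + compute_logits so SLOT
    can add its per-window delta to a cached hidden state and re-project
    on every AdamW step without re-running the transformer trunk."""
    def __init__(self, vocab=16, dim=8):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.trunk = torch.nn.Linear(dim, dim)  # stand-in for the full stack
        self.head = torch.nn.Linear(dim, vocab)

    def forward_hidden(self, tokens):
        # expensive part: trunk, run once per window
        return torch.tanh(self.trunk(self.emb(tokens)))

    def compute_logits(self, hidden):
        # cheap part: head projection, run once per SLOT step
        return self.head(hidden)

    def forward_logits(self, tokens):
        # original single-call path, preserved for non-SLOT eval
        return self.compute_logits(self.forward_hidden(tokens))
```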

Folder renamed: 2026-04-09_PreQuantTTT11_ValCalibGPTQ_LegalEvalTTT_Synthesis
              -> 2026-04-09_PreQuantTTT11_ValCalibGPTQ_SLOT24_Quad_Synthesis

Time budget: ~530s of 600s eval used (590s train + 190s prequant TTT + 10s
val-calib GPTQ + 80s sliding eval baseline + 250s SLOT-24).

Code: 2322 lines (vs 2039 in PR openai#1487 base, +283 added). py_compile clean.
README rewritten as user's submission with compact credits section.
@ndokutovich
Author

Closing as invalid. Same prequant_ttt_adapt_adamw pre-quant pattern as #1485, which violates Condition 3 of #1017. Full technical analysis in #1485. The SLOT-24 component on top is also in contested territory pending the #1336 ruling, so this PR is withdrawn on both counts.

