openai · ndokutovich · Apr 9, 2026
diff --git a/records/track_10min_16mb/2026-04-09_SP1024_SLOT24_QK525_PreQuantTTT10/README.md b/records/track_10min_16mb/2026-04-09_SP1024_SLOT24_QK525_PreQuantTTT10/README.md
@@ -0,0 +1,62 @@
+# Record: SP1024 + SLOT-24 + QK5.25 + Pre-Quant TTT — val_bpb 0.8265 (3-seed mean)
+
+**val_bpb = 0.8265** (3-seed mean, std 0.0029) | **~15.76 MB** | 8xH100 SXM
+
+## 3-Seed Results
+
+| Seed | SLOT BPB | Sliding BPB (no SLOT) | Steps | Artifact (bytes) |
+|------|----------|----------------------|-------|----------|
+| 42   | **0.82329038** | 1.08834264 | 6591 | 15,764,692 |
+| 1337 | **0.82916457** | 1.08844016 | 6591 | 15,756,236 |
+| 2024 | **0.82694986** | 1.08842671 | 6591 | 15,760,000 |
+| **Mean** | **0.82646827** | | | |
+
+Prior SLOT SOTA (PR #1313): **0.8637 BPB**. Delta: **-0.0372 BPB**.
+
+## Novel Contribution
+
+This PR extends the PR #1313 SLOT base with **pre-quant AdamW TTT** (test-time training). It is the first combination of weight-level TTT and hidden-state SLOT optimization.
+
+Pipeline: train -> EMA -> **pre-quant TTT (10ep)** -> GPTQ -> SLOT eval
+
+Pre-quant TTT improves the base model (sliding BPB ~1.12 -> ~1.09), so SLOT starts from a stronger model and pushes further. The two techniques are complementary: TTT modifies the weights (baked into the artifact), while SLOT modifies hidden states (at eval time, discarded after each window).
+
+## Changes from PR #1313
+
+| Parameter | PR #1313 | This PR |
+|-----------|----------|---------|
+| QK_GAIN_INIT | 4.0 | **5.25** |
+| Pre-quant TTT | None | **10ep, lr=0.00045, freeze 1 block** |
+| Sliding BPB (no SLOT) | ~1.12 | **1.088** |
+| **SLOT BPB** | **0.8637** | **0.8265** |
+
+## Architecture (inherited from PR #1313)
+
+SP1024, 11L 512dim, GQA 8/4, MLP 3x, squared LeakyReLU, XSA-all, VRL (Value Residual Learning), BigramHash (1024, dim 128), SmearGate, U-Net skip connections, EMA 0.997, Late QAT, Muon optimizer, mixed int6/int8 + LZMA.
+
+## SLOT Mechanism
+
+- Frozen model forward pass -> hidden states
+- Per-window learnable: delta (hidden perturbation) + logit_bias
+- 24 AdamW steps, cosine LR 0.012 -> 0.001
+- Optimizes on scored positions (stride=96 window)
+- Delta and logit_bias discarded after each window
+
+## Compliance
+
+- Training within 600s wallclock on 8xH100
+- Pre-quant TTT: trains on val data before quantization; the adapted weights are baked into the artifact
+- SLOT: frozen model weights, only throwaway per-window delta+logit_bias optimized
+- No n-gram cache, no data leakage across windows
+
+## Reproduction
+
+```bash
+pip install brotli sentencepiece kernels
+python3 data/cached_challenge_fineweb.py --variant sp1024
+SEED=42 QK_GAIN_INIT=5.25 PREQUANT_TTT_ENABLED=1 torchrun --standalone --nproc_per_node=8 train_gpt.py
+```
+
+## Credits
+
+PR #1313 @anthony-maio (SLOT architecture, base code), PR #1423 @aryanbhosale (TTT technique inspiration), PR #1482 @aamodbhatt (QK-Gain sweep)
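The README's SLOT section gives the per-window schedule as "24 AdamW steps, cosine LR 0.012 -> 0.001". A minimal sketch of such a schedule follows, assuming a plain half-cosine decay that hits both endpoints exactly. The function name `slot_cosine_lr` and its parameter names are hypothetical, and the schedule actually implemented in `train_gpt.py` may differ (for example in how the final step is handled).

```python
import math


def slot_cosine_lr(step, total_steps=24, lr_max=0.012, lr_min=0.001):
    """Half-cosine decay: lr_max at step 0, lr_min at the last step.

    Hypothetical helper for illustration only; not taken from the PR's code.
    """
    if total_steps <= 1:
        return lr_max
    progress = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# One learning rate per SLOT optimization step within a single window.
schedule = [slot_cosine_lr(s) for s in range(24)]
```

In a SLOT loop of this shape, the schedule would be restarted for every window, since the README states that `delta` and `logit_bias` are discarded after each window.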
diff --git a/records/track_10min_16mb/2026-04-09_SP1024_SLOT24_QK525_PreQuantTTT10/submission.json b/records/track_10min_16mb/2026-04-09_SP1024_SLOT24_QK525_PreQuantTTT10/submission.json
@@ -0,0 +1,25 @@
+{
+  "author": "ndokutovich",
+  "github_id": "ndokutovich",
+  "name": "SP1024 + SLOT-24 + QK5.25 + Pre-Quant AdamW TTT (10ep) + XSA + VRL + BigramHash + SmearGate",
+  "blurb": "PR #1313 SLOT base (frozen-model SLOT-24 with delta+logit_bias optimization) enhanced with QK-Gain 5.25 and pre-quant AdamW TTT (10 epochs, lr=0.00045, freeze 1 block, cosine decay). Architecture: 11L, 512dim, GQA 8/4, MLP 3x, squared LeakyReLU, XSA-all, VRL, BigramHash 1024, SmearGate, EMA 0.997, Late QAT, mixed int6/int8 + LZMA. SLOT: 24 AdamW steps per window, cosine LR 0.012->0.001, delta+logit_bias on scored positions.",
+  "date": "2026-04-09T02:00:00Z",
+  "val_loss": 1.39545390,
+  "val_bpb": 0.82646827,
+  "val_loss_std": 0.00497,
+  "val_bpb_std": 0.00294,
+  "seeds": [42, 1337, 2024],
+  "seed_results": {
+    "42":   {"val_loss": 1.39008818, "val_bpb": 0.82329038},
+    "1337": {"val_loss": 1.40000648, "val_bpb": 0.82916457},
+    "2024": {"val_loss": 1.39626703, "val_bpb": 0.82694986}
+  },
+  "pre_quant_val_loss": 1.87568620,
+  "pre_quant_val_bpb": 1.11088702,
+  "step_stop": 6591,
+  "wallclock_seconds": 600.041,
+  "eval_time_seconds": 253.325,
+  "bytes_total": 15764692,
+  "bytes_model_int6_lzma": 15698076,
+  "bytes_code": 66616
+}
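The aggregate fields in `submission.json` can be cross-checked against the per-seed values and the README with a few lines of arithmetic. The sketch below copies the numbers from the diff above. The 16,000,000-byte limit is an assumption inferred from the `track_10min_16mb` directory name, not something the PR states.

```python
from statistics import mean

# Per-seed values copied from "seed_results" in submission.json.
seed_bpb = {42: 0.82329038, 1337: 0.82916457, 2024: 0.82694986}
seed_loss = {42: 1.39008818, 1337: 1.40000648, 2024: 1.39626703}

mean_bpb = mean(seed_bpb.values())    # compare with "val_bpb"
mean_loss = mean(seed_loss.values())  # compare with "val_loss"

# Improvement over the prior SLOT result quoted in the README (PR #1313).
prior_bpb = 0.8637
delta_bpb = mean_bpb - prior_bpb

# Artifact size: compressed model plus code should equal "bytes_total".
bytes_total = 15_698_076 + 66_616

# Assumption: the track's 16 MB cap is 16,000,000 bytes (decimal megabytes).
size_limit = 16_000_000
headroom = size_limit - bytes_total

print(f"mean val_bpb  = {mean_bpb:.8f}")
print(f"mean val_loss = {mean_loss:.8f}")
print(f"delta vs prior = {delta_bpb:+.4f} BPB")
print(f"bytes_total   = {bytes_total:,} (headroom {headroom:,})")
```

The `bytes_total` field matches the seed-42 artifact size in the README table, so the size fields appear to describe that single run rather than an average over seeds.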