openai · anthony-maio · Apr 4, 2026 · Copilot
diff --git a/records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/README.md b/records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/README.md
@@ -0,0 +1,67 @@
+# SLOT-48 — val_bpb 0.7406 (3-seed mean)
+
+**val_bpb = 0.7406** (3-seed mean, std 0.0051) | 15.75-15.82 MB | 8xH100 SXM
+
+## 3-Seed Results
+
+| Seed | Sliding BPB | + SLOT BPB | Steps | Artifact |
+|------|------------|------------|-------|----------|
+| 1337 | 1.126 | **0.7450** | 6034 | 15,815,983 |
+| 42 | 1.121 | **0.7350** | 6563 | 15,751,595 |
+| 2024 | 1.122 | **0.7416** | 6568 | 15,793,375 |
-| 1337 | 1.126 | **0.7450** | 6034 | 15,815,983 |
-| 42 | 1.121 | **0.7350** | 6563 | 15,751,595 |
-| 2024 | 1.122 | **0.7416** | 6568 | 15,793,375 |
+| 1337 | 1.126 | **0.7450** | 6578 | 15,815,983 |
+| 42 | 1.121 | **0.7350** | 6576 | 15,751,595 |
+| 2024 | 1.122 | **0.7416** | 6588 | 15,793,375 |
+| **Mean** | **1.123** | **0.7406** | | |
+
+Beats PR #1313 (0.8637) by 0.123 BPB and the best pending submission (#1229, 0.9300) by 0.189 BPB.
+
+## What Changed vs PR #1313
+
+Only the SLOT step count changed; the model, training, LR, and stride are the same:
+
+| Parameter | PR #1313 | This PR |
+|-----------|----------|---------|
+| SLOT_STEPS | 24 | **48** |
+
+## Architecture
+
+11L, 512d, 8H/4KV GQA, LeakyReLU(0.5)^2 MLP 3x, VRL, VE128, BigramHash(1024), XSA all 11 layers, QK-Gain 4.0, Partial RoPE 16/64, LN Scale, SmearGate, U-Net skips, EMA(0.997), Late QAT, int6+lzma, FA3 Hopper, Muon WD=0.04.
+
+## SLOT-48 Details
+
+- Per-sample hidden delta [bsz, 1, 512] + logit bias [bsz, 1, 1024]
+- Scored-position masking (last stride=96 tokens per non-first window)
+- 48 AdamW steps, cosine LR 0.012 -> 0.001
+- Model weights frozen, delta optimized through detached hidden states
+- Eval time: ~409s on 8xH100 (under 10-min eval budget)
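
Review note: a minimal sketch of two items from the list above, the cosine LR schedule (0.012 -> 0.001 over 48 steps) and the scored-position mask. It is not taken from `train_gpt.py`. The function names, the `window_len` parameter, the choice to reach the minimum LR exactly on the last step, and the choice to score every position of the first window are all assumptions made for illustration.

```python
import math


def cosine_lr(step, total_steps=48, lr_max=0.012, lr_min=0.001):
    """Half-cosine decay from lr_max at step 0 to lr_min at the last step.

    Assumption: the minimum is reached on step total_steps - 1.
    """
    if total_steps <= 1:
        return lr_max
    progress = step / (total_steps - 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def scored_positions(window_index, window_len, stride=96):
    """Token offsets inside one sliding window that count toward the score.

    Non-first windows score only their last `stride` tokens, as the README
    states. Scoring the whole first window is an assumption.
    """
    if window_index == 0:
        return range(window_len)
    return range(window_len - stride, window_len)


if __name__ == "__main__":
    for step in (0, 12, 24, 36, 47):
        print(f"step {step:>2}: lr = {cosine_lr(step):.5f}")
    # window_len=2048 is a hypothetical value, not taken from the PR.
    print(len(scored_positions(3, window_len=2048)), "scored tokens in a later window")
```

Restricting the SLOT objective to the same positions that are later scored keeps the per-window delta from spending capacity on context tokens that do not affect the reported BPB.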
+
+## SLOT Scaling Behavior
+
+| Steps | BPB (seed 1337) | Δ BPB vs previous row |
+|-------|-----------------|-------|
+| 16 | 0.949 | baseline |
+| 24 | 0.868 | -0.081 |
+| **48** | **0.745** | **-0.123** |
+
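+SLOT continues to improve well beyond the 24-32 step range. BPB is still falling at 48 steps, although the per-step gain drops from about 0.010 BPB (16 to 24 steps) to about 0.005 BPB (24 to 48 steps).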
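
Review note: the scaling table can be read as a gain per extra SLOT step, which makes the trend easier to judge than the raw deltas. The sketch below only re-derives numbers from the three seed-1337 rows in the table; the helper name `per_step_gains` is hypothetical.

```python
# (SLOT steps, BPB) for seed 1337, copied from the scaling table.
POINTS = [(16, 0.949), (24, 0.868), (48, 0.745)]


def per_step_gains(points):
    """Return (from_steps, to_steps, BPB reduction per extra step) per interval."""
    gains = []
    for (s0, b0), (s1, b1) in zip(points, points[1:]):
        gains.append((s0, s1, (b0 - b1) / (s1 - s0)))
    return gains


if __name__ == "__main__":
    for s0, s1, gain in per_step_gains(POINTS):
        print(f"{s0:>2} -> {s1:>2} steps: {gain:.4f} BPB per extra step")
```

With only three points from a single seed, this supports "still improving at 48 steps" but says little about where the curve flattens.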
+
+## Compliance
+
+- **Frozen-model SLOT**: model weights are never modified during evaluation. Only per-window throwaway delta and logit_bias parameters are optimized, then discarded. Same evaluation pattern as accepted PRs #1176 and #1229.
+- No n-gram cache, no eval-time GPTQ
+- Self-contained, no network calls
+- All seeds within time and size budgets
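
Review note: the budget claim in the last bullet can be checked mechanically from the numbers in this PR. The sketch assumes the 16 MB limit means 16,000,000 bytes (the stricter decimal reading, consistent with the "15.75-15.82 MB" figures above) and that the eval limit is 600 s; both constants are assumptions, not values quoted from the track rules.

```python
SIZE_BUDGET_BYTES = 16_000_000  # assumption: decimal megabytes
EVAL_BUDGET_SECONDS = 600       # assumption: "10-min eval budget" = 600 s

# Artifact sizes per seed from the results table; eval time from SLOT-48 Details.
ARTIFACT_BYTES = {"1337": 15_815_983, "42": 15_751_595, "2024": 15_793_375}
EVAL_SECONDS = 409


def within_budgets(artifact_bytes, eval_seconds):
    """Return (all artifacts fit, eval fits) under the assumed budgets."""
    size_ok = all(size <= SIZE_BUDGET_BYTES for size in artifact_bytes.values())
    time_ok = eval_seconds <= EVAL_BUDGET_SECONDS
    return size_ok, time_ok


def min_size_headroom(artifact_bytes):
    """Smallest number of spare bytes across seeds."""
    return min(SIZE_BUDGET_BYTES - size for size in artifact_bytes.values())


if __name__ == "__main__":
    print(within_budgets(ARTIFACT_BYTES, EVAL_SECONDS))
    print(min_size_headroom(ARTIFACT_BYTES), "bytes of headroom on the largest artifact")
```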
+
+## Reproduction
+
+```bash
+torchrun --standalone --nproc_per_node=8 train_gpt.py
+```
+
+Training: ~600s. Eval: ~409s. Total: ~17 min.
+
+## Credits
+
+- Base: PR #175, PR #1303, PR #1313 (@anthony-maio)
+- SLOT: Hu et al. arXiv:2505.12392v2, PR #1176 (@bigbag), PR #1229 (@resouer)
+- QK-Gain 4.0: PR #1125
+- XSA: PR #1176 (@bigbag)
+- VRL: ResFormer (arXiv:2410.17897)
diff --git a/records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/submission.json b/records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/submission.json
@@ -0,0 +1,19 @@
+{
+    "name": "SLOT48_LR012_Stride96",
+    "author": "Anthony Maio",
+    "github_id": "anthony-maio",
+    "date": "2026-04-03",
+    "track": "10min_16mb",
+    "num_gpus": 8,
+    "gpu_type": "H100 SXM",
+    "training_time_seconds": 600,
+    "seed_results": {
+        "1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6034, "artifact_bytes": 15815983},
+        "42":   {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6563, "artifact_bytes": 15751595},
+        "2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6568, "artifact_bytes": 15793375}
-        "1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6034, "artifact_bytes": 15815983},
-        "42":   {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6563, "artifact_bytes": 15751595},
-        "2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6568, "artifact_bytes": 15793375}
+        "1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6578, "artifact_bytes": 15815983},
+        "42":   {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6576, "artifact_bytes": 15751595},
+        "2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6588, "artifact_bytes": 15793375}
+    },
+    "mean_val_loss": 1.2504,
+    "mean_val_bpb": 0.7406,
+    "std_val_bpb": 0.0051,
+    "blurb": "SLOT-48 with LR 0.012 (cosine to 0.001), stride=96, per-sample delta + logit bias, scored-position masked. Same architecture as PRs #1303 and #1313."
+}
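
Review note: the aggregate fields in `submission.json` can be recomputed from `seed_results`. The sketch below copies the per-seed values from the file and checks `mean_val_loss`, `mean_val_bpb`, and `std_val_bpb`; the reported std matches the sample (n-1) standard deviation. It also derives the bytes-per-token ratio implied by the usual conversion bpb = loss / ln(2) / bytes_per_token, assuming `val_loss` is in nats per token. That assumption is mine and is not stated in the PR.

```python
import math
import statistics

# Per-seed values copied from submission.json (steps and sizes omitted).
SEED_RESULTS = {
    "1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015},
    "42": {"val_loss": 1.24104846, "val_bpb": 0.73502047},
    "2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171},
}


def summarize(seed_results):
    """Recompute the aggregate fields from the per-seed results."""
    losses = [r["val_loss"] for r in seed_results.values()]
    bpbs = [r["val_bpb"] for r in seed_results.values()]
    return {
        "mean_val_loss": statistics.mean(losses),
        "mean_val_bpb": statistics.mean(bpbs),
        "std_val_bpb": statistics.stdev(bpbs),  # sample std, n-1 denominator
    }


def implied_bytes_per_token(seed_results):
    """bytes/token implied by bpb = loss_nats / ln(2) / bytes_per_token."""
    return [
        r["val_loss"] / math.log(2) / r["val_bpb"] for r in seed_results.values()
    ]


if __name__ == "__main__":
    for key, value in summarize(SEED_RESULTS).items():
        print(f"{key}: {value:.4f}")
    print("implied bytes/token:", [f"{x:.4f}" for x in implied_bytes_per_token(SEED_RESULTS)])
```

A near-constant bytes-per-token ratio across seeds is a quick consistency check that each `val_loss` and `val_bpb` pair was computed over the same validation tokens.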