Record: Scylla + GPTQ + BH3072 — val_bpb 1.0856 (3-seed mean) #1405
# Scylla + GPTQ + BH3072 — val_bpb 1.0856 (3-seed mean)

**val_bpb = 1.0856** (3-seed mean) | 15.3-15.8 MB | 8xH100 SXM | No SLOT, No TTT

## 3-Seed Results

| Seed | Sliding BPB | Artifact (bytes) |
|------|-------------|------------------|
| 1337 | 1.1009 | 15,267,156 |
| 42 | **1.0782** | 15,813,568 |
| 2024 | **1.0777** | 15,807,116 |
| **Mean** | **1.0856** | |

Beats the merged SOTA (1.1147, PR #1019) by 0.029 BPB (14x the significance threshold).
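The headline statistics can be re-derived from the per-seed values above. A quick sanity check (assuming the reported std is the sample standard deviation, which matches the metadata's 0.013):

```python
import statistics

# Per-seed sliding-window val_bpb from the results table.
seed_bpb = {1337: 1.1009, 42: 1.0782, 2024: 1.0777}

mean_bpb = statistics.mean(seed_bpb.values())
std_bpb = statistics.stdev(seed_bpb.values())   # sample (n-1) standard deviation
margin = 1.1147 - mean_bpb                      # gap to the merged SOTA (PR #1019)

print(f"mean={mean_bpb:.4f}  std={std_bpb:.3f}  margin={margin:.4f}")
```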

## Key Techniques
- **Scylla tokenizer** (998-vocab TokenMonster, PR #1143 @simon-marcus): 37% fewer tokens per byte vs. SentencePiece 1024
- **AR self-gen Full Hessian GPTQ** (int6, Cholesky error compensation): 64 self-generated sequences for calibration
- **BigramHash 3072x112**: matching #1019's configuration
- **Architecture**: 11L 512d 8H/4KV GQA, LeakyReLU(0.5)^2 MLP 3x, VRL, VE128, XSA all 11 layers, QK-Gain 4.0, Partial RoPE 16/64, LN Scale, SmearGate, U-Net skips, EMA(0.997) + SWA, Late QAT, LZMA-9, FA3
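The BigramHash internals aren't spelled out here, but the general idea — hash each (previous, current) token pair into a fixed table of 3072 buckets and read out a learned 112-dim embedding — can be sketched as follows. The class name, hash mix, and initialization are illustrative assumptions, not the actual implementation:

```python
import random

class BigramHashSketch:
    """Hypothetical sketch of a hashed bigram embedding (3072 buckets x 112 dims)."""

    def __init__(self, n_buckets=3072, dim=112, seed=0):
        rng = random.Random(seed)
        # Learned during training; small random init here just for illustration.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(n_buckets)]
        self.n_buckets = n_buckets
        self.dim = dim

    def bucket(self, prev_tok, cur_tok):
        # Illustrative multiplicative hash; the actual mixing function is not given here.
        return (prev_tok * 0x9E3779B1 + cur_tok) % self.n_buckets

    def lookup(self, tokens):
        # One dim-sized feature vector per position; position 0 has no preceding token.
        feats = [[0.0] * self.dim]
        for prev, cur in zip(tokens, tokens[1:]):
            feats.append(self.table[self.bucket(prev, cur)])
        return feats

feats = BigramHashSketch().lookup([5, 17, 421, 17])
print(len(feats), len(feats[0]))  # 4 112
```

In a transformer, such features would typically be added to the token embeddings before the first block; hash collisions in the 3072-bucket table are accepted as noise in exchange for a tiny parameter footprint.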

## Compliance
- No SLOT (no eval-time delta optimization)
- No TTT (no eval-time weight updates)
- No n-gram cache
- No network calls
- Tokenizer byte accounting via validated metadata (candidate.meta.npz)
- All artifacts under 16 MB; all training under 600 s
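The byte-accounting item reflects how bits-per-byte is computed: cross-entropy is measured in nats per token, converted to bits, and normalized by the validation set's raw byte count (taken from the tokenizer metadata). A sketch of the conversion, plus the tokens-per-byte ratio implied by the reported seed-1337 numbers:

```python
import math

def val_bpb(mean_nats_per_token, total_tokens, total_bytes):
    # Total bits = (nats per token) * tokens / ln(2); normalize by raw bytes.
    return mean_nats_per_token * total_tokens / (math.log(2) * total_bytes)

# Sanity check: 1 bit of loss per token at 1 token per byte is exactly 1.0 bpb.
assert abs(val_bpb(math.log(2), 1000, 1000) - 1.0) < 1e-12

# Tokens-per-byte ratio implied by the seed-1337 pair (val_bpb, val_loss):
ratio = 1.10089760 * math.log(2) / 1.95894579
print(f"implied tokens per byte ≈ {ratio:.4f}")
```

The same ratio (about 0.39 tokens per byte, i.e. roughly 2.6 bytes per token) falls out of the other two seeds' (val_bpb, val_loss) pairs as well, which is what a single validated candidate.meta.npz accounting should guarantee.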

## Reproduction

```bash
VOCAB_SIZE=998 BIGRAM_VOCAB_SIZE=3072 BIGRAM_DIM=112 WARMDOWN_ITERS=4000 \
DATA_PATH=./data/datasets/fineweb10B_scylla \
TOKENIZER_PATH=./candidate.vocab TOKENIZER_META_PATH=./candidate.meta.npz \
SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py
```
|
Comment on lines
+23
to
+39
|
||
|
|
||
Requires Scylla-retokenized FineWeb shards (see anthonym21/fineweb10B-scylla on HuggingFace).

## Credits
- Scylla tokenizer: @simon-marcus (PR #1143)
- Training stack lineage: PR #175 (@anthony-maio), PR #1019 (@abaybektursun)
- GPTQ: PR #1019 (@abaybektursun)
- VRL: ResFormer (arXiv:2410.17897)

Submission metadata:

```json
{
  "name": "Scylla_GPTQ_BH3072",
  "author": "Anthony Maio",
  "github_id": "anthony-maio",
  "date": "2026-04-06",
  "track": "10min_16mb",
  "num_gpus": 8,
  "gpu_type": "H100 SXM",
  "training_time_seconds": 600,
  "seed_results": {
    "1337": {"val_loss": 1.95894579, "val_bpb": 1.10089760, "artifact_bytes": 15267156},
    "42": {"val_loss": 1.91853397, "val_bpb": 1.07818677, "artifact_bytes": 15813568},
    "2024": {"val_loss": 1.91764714, "val_bpb": 1.07768838, "artifact_bytes": 15807116}
  },
  "mean_val_bpb": 1.0856,
  "std_val_bpb": 0.013,
  "blurb": "Scylla tokenizer (998 vocab TokenMonster) + AR self-gen GPTQ int6 + BigramHash 3072x112 + VRL + XSA-11 + QK-Gain 4.0 + EMA/SWA + LZMA-9. No SLOT, no TTT. Legally clean."
}
```
The Markdown table in the “3-Seed Results” section uses double leading pipes (`||`) on each row, which doesn’t render as a standard GitHub table. Replace them with single `|` delimiters so the table formats correctly.