diff --git a/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/README.md b/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/README.md
new file mode 100644
index 0000000000..a3354d719b
--- /dev/null
+++ b/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/README.md
@@ -0,0 +1,227 @@
+# Record: SP8192 + Parallel Residuals + 3-Layer Recurrence + Token-Only N-gram Tilt — val_bpb 1.08091 (5-seed mean, causal-corrected)
+
+**val_bpb: 1.08091** (5-seed mean, std 0.00043) | **2.79210 nats per token** | **~16.00 MB** | 8×H100 SXM, 600 s | Legal Score-First TTT + Causal Token-Only N-gram Tilt
+
+Beats [PR #1394](https://github.com/openai/parameter-golf/pull/1394) (1.08563) by **+0.01219 nats per token** — comfortably clearing the 0.005-nat record threshold (2.4× the bar). Also beats merged SOTA [PR #1019](https://github.com/openai/parameter-golf/pull/1019) (1.11473) by **+0.08736 nats per token**.
+
+> **2026-04-07 PM correction note** — see the [Legality Fix](#legality-fix-2026-04-07-pm) section. The originally posted 5-seed mean (1.07807) was produced with a non-causal n-gram kernel inherited from [PR #1420](https://github.com/openai/parameter-golf/pull/1420). @abaybektursun [has acknowledged the bug and proposed the same fix I applied here](https://github.com/openai/parameter-golf/pull/1420#issuecomment-4199452189). The current 5-seed mean (1.08091) is ~+0.00284 BPB above the originally posted (illegal) 1.07807, but it still passes the 0.005-nat record bar against PR #1394 by 2.4×, so this remains a valid record submission. Pre-fix per-seed values are preserved in `submission.json` under `seed_results_pre_fix` for the public record.
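As a sanity check, the headline margins can be reproduced from the reported numbers alone; the nats-per-bpb factor falls out of this submission's own `val_loss` / `val_bpb` pair (a quick arithmetic sketch, not part of the submission code):

```python
# Reproduce the headline margins from the reported summary numbers.
val_bpb, val_loss = 1.08091, 2.79210       # this submission (5-seed mean)
nats_per_bpb = val_loss / val_bpb          # bpb -> nats-per-token factor

delta_vs_1394 = (1.08563 - val_bpb) * nats_per_bpb   # vs PR #1394
delta_vs_1019 = (1.11473 - val_bpb) * nats_per_bpb   # vs merged SOTA PR #1019

print(round(nats_per_bpb, 4))              # 2.5831
print(round(delta_vs_1394, 5))             # 0.01219
print(round(delta_vs_1019, 5))             # 0.08736
print(round(delta_vs_1394 / 0.005, 2))     # 2.44 -> "2.4x the bar"
```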
+
+## Bar comparisons (5-seed mean 1.08091, val_loss 2.79210 nats/token)
+
+| Comparator | val_bpb | Δ (nats per token) | 0.005-nat bar |
+|---|---:|---:|---|
+| Merged SOTA [PR #1019](https://github.com/openai/parameter-golf/pull/1019) (abaybektursun) | 1.11473 | **+0.08736** | ✅ comfortably |
+| [PR #1394](https://github.com/openai/parameter-golf/pull/1394) (clarkkev) | 1.08563 | **+0.01219** | ✅ clears (2.4× the bar) |
+| Our [PR #1413](https://github.com/openai/parameter-golf/pull/1413) | 1.08279 | +0.00486 | ❌ misses by 0.00014 (essentially tied) |
+| [PR #1420](https://github.com/openai/parameter-golf/pull/1420) (same kernel family; direct pre-fix comparison is not apples-to-apples) | 1.08014 | −0.00199 | ⚠️ see note below |
+
+The unit is nats per token (per the README's record threshold). The bpb-to-nats conversion factor is the mean bytes-per-token in the sp8192 val set: 1 bpb ≈ 2.5831 nats per token (verified against this submission's own `val_loss / val_bpb` ratio).
+
+## Results (8×H100 80GB SXM, PyTorch 2.9.1+cu128, causal token-only n-gram tilt)
+
+### Core (TTT) table — 5-seed verification, all seeds re-run via shipped mini wrapper with the patched kernel
+
+| Seed | Steps | Pre-quant BPB | Sliding BPB | **Post-TTT (causal token-only) BPB** | val_loss (nats) | Artifact (bytes) |
+|---:|---:|---:|---:|---:|---:|---:|
+| 0 | 4911 | 1.08730 | 1.08219 | **1.08035** | 2.79067 | **15,994,644** ✅ |
+| 42 | 4906 | 1.08792 | 1.08272 | **1.08097** | 2.79225 | **15,995,572** ✅ |
+| 1234 | 4915 | 1.08823 | 1.08336 | **1.08127** | 2.79303 | **15,993,531** ✅ |
+| 1337 | 4905 | 1.08759 | 1.08235 | **1.08060** | 2.79131 | **15,988,802** ✅ |
+| 2025 | 4911 | 1.08833 | 1.08302 | **1.08135** | 2.79324 | **15,993,360** ✅ |
+| **5-seed mean** | | **1.08787** | **1.08273** | **1.08091** | **2.79210** | all < 16,000,000 |
+
+**Verification status:**
+- All 5 seeds independently re-run via the shipped `train_gpt.py` (~18.9 KB code) with the **patched**
`fused_expert_kernel.cpp` and `NGRAM_WITHIN_BETA=0 NGRAM_WORD_BETA=0`. Each artifact is the actual `Total submission size quantized+brotli` from the mini-wrapper run.
+- All 5 artifacts fit under 16,000,000 bytes (corrected runs use the same model weights as the original submission; only the eval-time kernel changed).
+- 5-seed standard deviation: **0.00043 BPB**.
+- Pre-fix (illegal) per-seed values are preserved in `submission.json` under `seed_results_pre_fix`.
+
+## Legality Fix (2026-04-07 PM)
+
+The original kernel from [PR #1420](https://github.com/openai/parameter-golf/pull/1420) (which this submission ported with `nanobind` removed) had a causality bug in `get_hints_batch`:
+
+- Lines 384–386 read `tok = tokens_[p]` (the **target** token at the position being scored) and derived `is_bnd = is_bnd_[tok]` and `is_ws = has_ls_[tok]`.
+- Lines 399–400 then passed those flags to `within_hint(is_bnd, is_ws, ...)` and `word_hint(is_ws, ...)`, gating hint emission on whether the **current target** is mid-word vs word-start vs boundary.
+
+This means the predictive distribution at position `p` depended on metadata derived from `x_p` itself, leaking 1–2 bits per scored position about the answer. Under the [Issue #1017](https://github.com/openai/parameter-golf/issues/1017) framing, this is a violation of the prefix-only causality requirement. The original 1.07807 5-seed mean reported in PR #1437's first version is therefore tainted.
+
+**The fix** (matches @abaybektursun's [proposed patch](https://github.com/openai/parameter-golf/pull/1420#issuecomment-4199452189)):
+
+1. **Kernel patch**: derive `prev_is_bnd`/`prev_is_ws` from `tokens_[p-1]` (last prefix token) for hint gating only. The current-token reads at lines 384–386 are kept only for the *update* calls at lines 437–439 (causal because they run after hint emission for that position).
+2. **Disable within/word experts**: set `NGRAM_WITHIN_BETA=0 NGRAM_WORD_BETA=0`.
Empirically, the within/word experts under prefix-only gating fire for the wrong positions (within fires for word-starts, word fires for mid-word) and contribute *negative* BPB. Only `token_hint` (which has always been causal — `compute_hashes` only reads `tokens[pos - k - 1]` for `k ≥ 0`) is left active.
+
+**Measured leak magnitude (this submission, 5-seed mean):** TTT `1.07807 BPB` → `1.08091 BPB`, delta **+0.00284 BPB ≈ +0.00734 nats per token** (using 1 bpb ≈ 2.5831 nats per token, the mean bytes-per-token in the sp8192 val set). Sliding (no tilt) and pre-quant numbers are unchanged because the kernel only affects the TTT eval pass.
+
+**PR #1420 cross-reference**: PR #1420 originally shipped the same kernel-family bug. @abaybektursun has [acknowledged it in their thread](https://github.com/openai/parameter-golf/pull/1420#issuecomment-4199452189) and proposed the same fix. Because the original `1.08014` number was reported before that correction, direct pre-fix comparison is not apples-to-apples.
+
+## Key Innovations
+
+A 3-lever stack on top of [@clarkkev's PR #1394](https://github.com/openai/parameter-golf/pull/1394) sp8192 baseline:
+
+### 1. Parallel Residuals on layers 7–10 (from [PR #1412](https://github.com/openai/parameter-golf/pull/1412) by @Robby955)
+
+GPT-J-style parallel attention + MLP for the last 4 layers. Both attention and MLP read the same pre-residual input and their outputs are summed in parallel. Reduces interference between attention and MLP during GPTQ calibration → tighter quantization gap.
+
+```python
+# Parallel (layers 7-10):
+x_out = x + attn_scale * Attn(norm(x)) + mlp_scale * MLP(norm(x))
+
+# Sequential (layers 0-6, unchanged):
+h = x + attn_scale * Attn(norm(x))
+x_out = h + mlp_scale * MLP(norm(h))
+```
+
+Verified standalone contribution: **−0.00048 BPB** on 3-seed mean (par7 alone vs control).
+
+### 2. 3-Layer Depth Recurrence (extending PR #1394's 2-layer recurrence)
+
+Loop layers **3–5 twice** instead of 4–5 twice.
Encoder pattern `[0,1,2,3,4,5,3,4]` and decoder `[5,3,4,5,6,7,8,9,10]`. Costs ~200 training steps, but the additional virtual depth (17 vs 15 layers) more than compensates.
+
+Verified standalone contribution on top of par7: **−0.00128 BPB** on s42.
+
+### 3. Eval-Time Causal N-gram Tilt (from [PR #1420](https://github.com/openai/parameter-golf/pull/1420) by @abaybektursun, lineage [PR #1145](https://github.com/openai/parameter-golf/pull/1145) @AnirudhRahul)
+
+A causal open-addressing n-gram cache (token orders 8/10/12/14/16, within-word orders 1–3, word-start order 4) proposes a single hint token from strict prefix state. The model's full softmax distribution is then **rescaled with a one-token exponential tilt**:
+
+```
+p_tilt(t) = p_model(t) · exp(β · 𝟙[t==hint]) / Z
+Z = 1 + p_model(hint) · (exp(β) − 1)
+```
+
+This is a **renormalized full-vocab distribution**, not a `p(correct_token)`-only blend. The hint at position `p` is computed from `tokens[0..p−1]` only; the cache is updated with `tokens[p]` AFTER position `p`'s score is locked.
+
+Per-position NLL becomes:
+```python
+mixed_nll = scored_nll + has_hint * (Z.log() - β * is_hit)
+```
+
+C++ kernel ported from PR #1420 with the nanobind dependency removed (replaced with an `extern "C"` shim and ctypes loader). Build is a single `g++ -O3 -march=native -std=c++17 -fPIC -shared` invocation against `fused_expert_kernel.cpp`. The kernel processes ~3M tokens/sec; the precompute over the full ~40.5M val tokens runs in ~32 s on rank 0, then broadcasts hints/betas to the other ranks.
+
+Verified standalone contribution on top of par7: **−0.00297 BPB** on s42 (PR #1420 reports −0.0029 — port is byte-correct).
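The tilt algebra above can be checked end-to-end with a toy distribution (a minimal pure-Python sketch; the 3-token vocab, hint index, and β are arbitrary): the closed-form `Z` renormalizes the tilted distribution exactly, and the `mixed_nll` update is identical to scoring under `p_tilt` directly.

```python
import math

p = [0.5, 0.3, 0.2]          # toy model distribution over a 3-token vocab
hint, beta = 1, 2.0          # single proposed hint token and tilt strength

# Z = 1 + p_model(hint) * (exp(beta) - 1): closed form, no full-vocab sum.
Z = 1 + p[hint] * (math.exp(beta) - 1)
p_tilt = [pi * math.exp(beta * (i == hint)) / Z for i, pi in enumerate(p)]
assert abs(sum(p_tilt) - 1.0) < 1e-12          # proper distribution

# -log p_tilt(target) = nll + log Z - beta * 1[target == hint],
# i.e. exactly the mixed_nll update applied to the base NLL.
for tgt in range(3):
    nll = -math.log(p[tgt])
    mixed_nll = nll + math.log(Z) - beta * (tgt == hint)
    assert abs(mixed_nll - (-math.log(p_tilt[tgt]))) < 1e-12
```

Note the only dependence on the realized target is the indicator `tgt == hint`; the hint itself is fixed before the target is consulted.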
+
+## Stacking decomposition (s42)
+
+| Stack | TTT BPB | Δ vs control |
+|---|---|---|
+| Control (PR #1413) | 1.08315 | — |
+| + Parallel residuals layers 7+ | 1.08239 | −0.00076 |
+| + 3-layer recurrence | 1.08111 | −0.00204 |
+| + N-gram tilt | **1.07808** | **−0.00507** |
+
+The three levers stack approximately linearly with slight positive synergy (predicted −0.00473, actual −0.00507). These ablations were measured with the pre-fix kernel; the causal-corrected s42 TTT value is 1.08097 (see the Legality Fix section).
+
+## Changes from baseline (PR #1394 → this PR)
+
+| Component | PR #1394 | This PR |
+|---|---|---|
+| Tokenizer | SentencePiece BPE 8192 | (same) |
+| Architecture core | 11L / 512d / 8H / 4KV, MLP 4× | (same) |
+| Depth recurrence | Loop layers 4–5 twice | **Loop layers 3–5 twice** |
+| Block forward pattern | Sequential attn → MLP all 11 layers | **Parallel attn+MLP for layers 7–10**, sequential layers 0–6 |
+| Optimizer | MuonEq-R, WD=0.085 | (same) |
+| Quantization | GPTQ int6 + int8 embed + SDClip | (same) |
+| Eval | sliding window | sliding window **+ score-first TTT + causal n-gram tilt** |
+| QK_GAIN_INIT | 4.0 | **5.0** |
+| TTT | none | **score-first, LR=0.005, epochs=3, freeze=0** |
+| val_bpb | 1.08563 | **1.08091** (5-seed mean, causal-corrected) |
+| Δ vs PR #1394 (per-token nats) | — | **−0.01219** |
+
+## Architecture
+
+11L × 512d × 8H / 4KV, MLP 4×, LeakyReLU(0.5)² activation, Partial RoPE (16/64 dims), tied token embeddings. Depth recurrence: encoder `[0,1,2,3,4,5,3,4]`, decoder `[5,3,4,5,6,7,8,9,10]` (loops layers 3–5 twice, activated at frac=0.5 of training, ~step 2924). Layers 7–10 use the GPT-J parallel attention+MLP pattern; layers 0–6 stay sequential.
+
+Quantization: full-Hessian GPTQ on all attention/MLP matrices at int6 with SD-based clip (`row_std × 12.85 / 31`); token embedding at int8 with clip `20 × row_std`; small control tensors and scalars kept float16/float32 via passthrough. Compression: byte-shuffle + Brotli-11. Self-extracting LZMA mini runner (~18,905 bytes code).
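The SD-based clip arithmetic can be illustrated with a toy round-to-nearest quantizer (a sketch only: the submission's GPTQ pass does Hessian-aware rounding, and reading `row_std × 12.85 / 31` as the per-row int6 step size is my interpretation of the clip formula; the function name is hypothetical):

```python
import math
import random

def quantize_row_int6(row, k=12.85):
    """Symmetric int6 round-to-nearest with an SD-based clip.

    Illustrative only: `row_std * k / 31` is taken as the int6 step,
    so representable values span roughly +/- k row standard deviations
    across the 63 levels in [-31, 31].
    """
    mean = sum(row) / len(row)
    std = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row)) or 1.0
    step = std * k / 31
    q = [max(-31, min(31, round(x / step))) for x in row]   # int6 levels
    return q, step

random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(512)]
q, step = quantize_row_int6(row)
assert all(-31 <= v <= 31 for v in q)
# Round-to-nearest error is bounded by half a step for unclipped values.
err = max(abs(qi * step - x) for qi, x in zip(q, row) if abs(qi) < 31)
assert err <= step / 2 + 1e-12
```

With k ≈ 12.85 the clip sits far out in the tail, so Gaussian-ish weight rows are essentially never clipped; the constant trades clipping loss against step-size (rounding) loss.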
+
+N-gram tilt subsystem: 5 token-order open-addressing hash tables (orders 8, 10, 12, 14, 16) at `open_table_bits=26` ≈ 67M slots × 16 B/entry = 1 GB each (5 GB token-cache) + 3 within-word tables and 1 word-start table at `bits=20` (≈ 16 MB total) + 1 `WordStartState` Python dict. **Host RAM only** — not counted toward the 16 MB artifact. Built fresh from val tokens on rank 0 in ~32 s, hints/betas broadcast to other ranks before TTT eval starts.
+
+## Rule Compliance
+
+Per the [repo README](https://github.com/openai/parameter-golf) and the [Issue #1017](https://github.com/openai/parameter-golf/issues/1017) four conditions:
+
+- **Condition 1 (Causality)**: The n-gram cache state at position `p` is built solely from `tokens[0..p−1]`; the C++ kernel's `compute_hashes` reads only `tokens[pos − k − 1]` for `k ≥ 0`. The hint at position `p` is written to the output buffer BEFORE the kernel mutates any table with `tokens[p]`. The model forward pass is the standard causal transformer; sliding-window eval never references future tokens. See `fused_expert_kernel.cpp` `get_hints_batch` lines around the explicit `hints[i] = best_hint; betas[i] = best_beta; ... token_update(...);` ordering.
+- **Condition 2 (Normalized full distribution)**: Standard softmax over the full sp8192 vocab. The n-gram tilt rescales each per-position distribution as `p_tilt(t) = p_model(t) · exp(β · 𝟙[t==hint]) / Z` with `Z = 1 + p_model(hint) · (exp(β) − 1)`. This is a proper probability distribution over the entire alphabet — not a `p_t(correct_token)`-only blend. The hint token is chosen from prefix-only state BEFORE the realized target is consulted; the only target dependence is the indicator `𝟙[tgt==hint]`, which is the legitimate "did the realized token land on the boosted token" term.
+- **Condition 3 (Score before update)**: Every TTT chunk is scored under `torch.no_grad()` before any parameter update. Every n-gram tilt position is scored before its target token is mixed into the cache state.
No same-symbol adaptation, no self-exclusion.
+- **Condition 4 (Single pass)**: Each token is scored exactly once. Sliding-window eval is forward-only (`stride < seq_len`). The C++ kernel's `get_hints_batch` walks positions in monotonic order. No rescoring, no oracle selection.
+
+Additional:
+- **No SLOT** (standard or causal). No eval-time delta optimization in hidden space.
+- **No pre-quant TTT on val data**. The model is quantized once after training, then the quantized model is evaluated under score-first TTT + n-gram tilt.
+- **No ETLB.**
+- **No tokenizer change** — uses PR #1394's SentencePiece BPE 8192 unchanged.
+- **GPTQ calibration uses `fineweb_train_*` exclusively**, inside the 588 s training cap (12 s GPTQ reserve).
+- **N-gram cache state lives in host RAM only**, not in the 16 MB artifact.
+- **C++ kernel and Python wrapper live alongside `train_gpt.py`** in the records folder; only `train_gpt.py` (the LZMA self-extracting mini wrapper, ~18.9 KB) counts toward the 16 MB artifact, matching the precedent set by [PR #1145](https://github.com/openai/parameter-golf/pull/1145).
+- **5 distinct seeds** (0, 42, 1234, 1337, 2025) — independent runs on the same hardware; training logs are included for seeds 0, 42, and 1234.
+
+## Requirements
+
+```
+torch==2.9.1+cu128
+flash-attn==2.8.3
+flash-attn-3 (interface wheel; Hopper build)
+sentencepiece
+numpy
+brotli
+g++ (any version supporting C++17)
+```
+
+GCP 8×H100 80GB SXM pod with `NCCL_NET=Socket` (GCP-specific; NCCL 2.27.5 + gIB device issue).
+
+## Run Command
+
+```bash
+export NCCL_NET=Socket
+export QK_GAIN_INIT=5.0
+export PARALLEL_RESIDUAL_START=7
+export LOOP_START=3
+export LOOP_END=5
+export TTT_ENABLED=1
+export TTT_LR=0.005
+export TTT_EPOCHS=3
+export NGRAM_TILT_ENABLED=1
+export NGRAM_BASE_BETA=2.0
+export NGRAM_AGREE_BONUS=0.1
+export NGRAM_WITHIN_THRESHOLD=0.25
+# CAUSAL CORRECTION: disable within/word experts
+export NGRAM_WITHIN_BETA=0.0
+export NGRAM_WORD_BETA=0.0
+
+for SEED in 0 42 1234 1337 2025; do
+  SEED=$SEED uv run torchrun --standalone --nproc_per_node=8 train_gpt.py
+done
+```
+
+The first run compiles `fused_expert_kernel.cpp` to `libfused_ngram.so` via g++; subsequent runs reuse the cached `.so`.
+
+## Lineage
+
+- **[PR #1394](https://github.com/openai/parameter-golf/pull/1394)** (@clarkkev) — sp8192 + GPTQ embeddings + SDClip + MuonEq-R + 2-layer depth recurrence — base stack
+- **[PR #1413](https://github.com/openai/parameter-golf/pull/1413)** (@dexhunter, ours) — sp8192 + QK-Gain 5 + legal score-first TTT — direct predecessor
+- **[PR #1412](https://github.com/openai/parameter-golf/pull/1412)** (@Robby955) — Parallel Residuals + Hessian-Aware SDClip — parallel residuals lever
+- **[PR #1420](https://github.com/openai/parameter-golf/pull/1420)** (@abaybektursun) — Triple Loop + Fused Kernels + N-gram Tilt — n-gram tilt kernel and tilt math
+- **[PR #1145](https://github.com/openai/parameter-golf/pull/1145)** (@AnirudhRahul) — Online Best-Agree N-gram — first legal normalized n-gram cache, organizer-discussed precedent in [issue #677](https://github.com/openai/parameter-golf/issues/677)
+- **[PR #1019](https://github.com/openai/parameter-golf/pull/1019)** (@abaybektursun, merged) — AR Self-Gen GPTQ + XSA + BigramHash 3072 — current merged SOTA, GPTQ pipeline ancestor
+- **[PR #549](https://github.com/openai/parameter-golf/pull/549)** (@abaybektursun, merged) — LeakyReLU² + score-first TTT — legal-TTT precedent
+
+## Credits
+
+- **@clarkkev** for the sp8192 base stack
(PR #1394) this submission builds on
+- **@Robby955** for parallel residuals on layers 7–10 (PR #1412)
+- **@abaybektursun** for the n-gram tilt mechanism, the C++ kernel, and the merged-precedent legal-TTT (PRs #1420, #1019, #549)
+- **@AnirudhRahul** for the original normalized causal n-gram cache pattern (PR #1145)
+- **@msisovic** for depth recurrence (PR #1204)
+- **@bigbag** for MuonEq-R (PR #1217)
+- **@unnir** for XSA (PR #265)
+- **@simon-marcus** for the corrected Scylla byte-accounting reference (PR #1314) — used for legality discussions, not in this submission
+- **@NoesisGenesis** for the four-conditions framework (issue #1017)
+
+## Included Files
+
+- `README.md` (this file)
+- `submission.json`
+- `train_gpt.py` — self-extracting LZMA mini wrapper, ~18.9 KB. The only file counted toward the 16 MB artifact.
+- `ngram_tilt.py` — Python ctypes wrapper for the C++ n-gram kernel. Imported at runtime by `train_gpt.py`. Not counted toward artifact (parallel pattern to PR #1145's separate `online_best_agree_eval.py`).
+- `fused_expert_kernel.cpp` — C++ source for the n-gram cache. Built to `libfused_ngram.so` at runtime via `g++ -O3 -march=native -std=c++17 -fPIC -shared`. Not counted toward artifact.
+- `train_seed0.log`
+- `train_seed42.log`
+- `train_seed1234.log`
diff --git a/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/fused_expert_kernel.cpp b/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/fused_expert_kernel.cpp
new file mode 100644
index 0000000000..990c9faac0
--- /dev/null
+++ b/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/fused_expert_kernel.cpp
@@ -0,0 +1,495 @@
+#include <algorithm>
+#include <cstdint>
+#include <cstring>
+
+#ifdef __linux__
+#include <sys/mman.h>
+#endif
+
+static constexpr uint64_t PRIMES[] = {
+    36313ULL, 27191ULL, 51647ULL, 81929ULL, 131071ULL, 196613ULL,
+    262147ULL, 393241ULL, 524309ULL, 655373ULL, 786433ULL, 917521ULL,
+    1048583ULL, 1179653ULL, 1310729ULL, 1441801ULL, 1572869ULL, 1703941ULL,
+    1835017ULL, 1966087ULL, 2097169ULL, 2228243ULL, 2359319ULL, 2490389ULL,
+    2621471ULL, 2752549ULL, 2883617ULL, 3014687ULL, 3145757ULL, 3276833ULL,
+    3407903ULL, 3538973ULL,
+};
+static constexpr int N_PRIMES = 32;
+static constexpr uint64_t PAIR_MIX = 1000003ULL;
+static constexpr uint64_t PREFIX_BASE = 1099511628211ULL;
+static constexpr uint64_t LEN_MIX = 0x9E3779B185EBCA87ULL;
+static constexpr uint64_t TABLE_MIX = 0x9e3779b97f4a7c15ULL;
+static constexpr uint64_t EMPTY_KEY = 0xFFFFFFFFFFFFFFFFULL;
+
+struct CtxEntry {
+    uint64_t key;
+    uint32_t count;
+    uint16_t best_tok;
+    uint16_t best_count;
+};
+
+struct PairEntry {
+    uint64_t key;
+    uint32_t count;
+    uint32_t _pad;
+};
+
+struct OpenTable {
+    uint32_t mask;
+    static constexpr int MAX_PROBES = 16;
+
+    CtxEntry* ctx = nullptr;
+    PairEntry* pair = nullptr;
+    size_t cap = 0;
+
+    ~OpenTable() { free_tables(); }
+
+    void free_tables() {
+#ifdef __linux__
+        if (ctx) { munmap(ctx, cap * sizeof(CtxEntry)); ctx = nullptr; }
+        if (pair) { munmap(pair, cap * sizeof(PairEntry)); pair = nullptr; }
+#else
+        delete[] ctx; ctx = nullptr;
+        delete[] pair; pair = nullptr;
+#endif
+    }
+
+    void init(int bits) {
+        free_tables();
+        cap = size_t(1) << bits;
mask = uint32_t(cap - 1); +#ifdef __linux__ + ctx = (CtxEntry*)mmap(nullptr, cap * sizeof(CtxEntry), + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0); + pair = (PairEntry*)mmap(nullptr, cap * sizeof(PairEntry), + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0); +#else + ctx = new CtxEntry[cap]; + pair = new PairEntry[cap]; +#endif + clear(); + } + + void clear() { + for (size_t i = 0; i < cap; i++) ctx[i] = {EMPTY_KEY, 0, 0, 0}; + for (size_t i = 0; i < cap; i++) pair[i] = {EMPTY_KEY, 0, 0}; + } + + void reset() { clear(); } + + void prefetch_ctx(uint64_t key) const { + uint32_t slot = uint32_t((key * TABLE_MIX) & mask); + __builtin_prefetch(&ctx[slot], 0, 0); + } + void prefetch_update(uint64_t ctx_key, uint64_t pair_key) const { + __builtin_prefetch(&ctx[uint32_t((ctx_key * TABLE_MIX) & mask)], 1, 0); + __builtin_prefetch(&pair[uint32_t((pair_key * TABLE_MIX) & mask)], 1, 0); + } + + void ctx_lookup(uint64_t key, int& out_tok, double& out_conf, + uint32_t& out_count) const { + uint32_t slot = uint32_t((key * TABLE_MIX) & mask); + for (int p = 0; p < MAX_PROBES; p++) { + uint32_t s = (slot + p) & mask; + if (ctx[s].key == key) { + out_count = ctx[s].count; + out_tok = ctx[s].best_tok; + out_conf = double(ctx[s].best_count) / double(out_count); + return; + } + if (ctx[s].key == EMPTY_KEY) break; + } + out_tok = -1; out_conf = 0.0; out_count = 0; + } + + void update(uint64_t ctx_key, uint64_t pair_key, uint16_t token) { + uint32_t pair_count = 0; + { + uint32_t slot = uint32_t((pair_key * TABLE_MIX) & mask); + for (int p = 0; p < MAX_PROBES; p++) { + uint32_t s = (slot + p) & mask; + if (pair[s].key == pair_key) { + pair[s].count++; pair_count = pair[s].count; break; + } + if (pair[s].key == EMPTY_KEY) { + pair[s].key = pair_key; pair[s].count = 1; + pair_count = 1; break; + } + } + } + { + uint32_t slot = uint32_t((ctx_key * TABLE_MIX) & mask); + for (int p = 0; p < MAX_PROBES; p++) { + uint32_t s = (slot + 
p) & mask; + if (ctx[s].key == ctx_key) { + ctx[s].count++; + if (token == ctx[s].best_tok) ctx[s].best_count++; + else if (pair_count > ctx[s].best_count) { + ctx[s].best_tok = token; + ctx[s].best_count = uint16_t(std::min(pair_count, 65535u)); + } + return; + } + if (ctx[s].key == EMPTY_KEY) { + ctx[s] = {ctx_key, 1, token, 1}; return; + } + } + } + } +}; + +class ContextMixer { + static constexpr int OPEN_MIN = 8; + static constexpr int OPEN_MAX = 16; + static constexpr int N_OPEN = OPEN_MAX - OPEN_MIN + 1; + + OpenTable open_[N_OPEN]; + + struct OrderConfig { double threshold; uint32_t min_count; }; + OrderConfig cfg_[N_OPEN]; + + bool order_active_[N_OPEN]; + int order_stride_; + + static constexpr int WITHIN_ORDERS = 3; + OpenTable within_[WITHIN_ORDERS]; + uint64_t within_hash_; + uint32_t within_len_; + double within_threshold_, within_beta_; + + static constexpr int WORD_ORDER = 4; + OpenTable word_table_; + uint64_t word_ring_[4]; + int word_ring_head_, word_ring_fill_; + uint64_t current_word_hash_; + int current_word_len_; + double word_threshold_, word_beta_; + + double base_beta_, agree_bonus_; + + const int64_t* tokens_ = nullptr; + int64_t n_tokens_ = 0; + const int16_t* base_bytes_ = nullptr; + const uint8_t* has_ls_ = nullptr; + const uint8_t* is_bnd_ = nullptr; + + static void compute_hashes(const int64_t* tokens, int64_t pos, int max_ord, + uint64_t* hashes) { + uint64_t h = 0; + int lim = std::min(max_ord, int(pos)); + for (int k = 0; k < lim; k++) { + h ^= uint64_t(tokens[pos - k - 1]) * PRIMES[k % N_PRIMES]; + hashes[k] = h; + } + for (int k = lim; k < max_ord; k++) hashes[k] = 0; + } + + static uint64_t pair_key(uint64_t ctx, uint16_t tok, int order) { + return (ctx * PAIR_MIX) ^ (uint64_t(tok) * PRIMES[order % N_PRIMES]); + } + + static uint64_t extend_prefix(uint64_t h, uint16_t tok, uint32_t pos) { + return (h * PREFIX_BASE) ^ ((uint64_t(tok) + 1) * PRIMES[pos % N_PRIMES]); + } + + void token_hint(const uint64_t* hashes, int max_avail, + 
int& out_tok, double& out_beta) { + for (int order = std::min(OPEN_MAX, max_avail); order >= OPEN_MIN; order--) { + int oi = order - OPEN_MIN; + if (!order_active_[oi]) continue; + uint64_t ch = hashes[order - 1]; + int hint; double conf; uint32_t count; + open_[oi].ctx_lookup(ch, hint, conf, count); + if (hint >= 0 && conf >= cfg_[oi].threshold + && count >= cfg_[oi].min_count) { + out_tok = hint; + out_beta = base_beta_ * conf; + return; + } + } + out_tok = -1; out_beta = 0.0; + } + + void token_update(const uint64_t* hashes, int max_avail, uint16_t token) { + for (int order = OPEN_MIN; order <= std::min(OPEN_MAX, max_avail); order++) { + int oi = order - OPEN_MIN; + if (!order_active_[oi]) continue; + uint64_t ch = hashes[order - 1]; + uint64_t pk = pair_key(ch, token, order); + open_[oi].update(ch, pk, token); + } + } + + void within_hint(bool is_bnd, bool is_ws, int& out_tok, double& out_beta) { + if (is_bnd || is_ws || within_len_ == 0) { + out_tok = -1; out_beta = 0.0; return; + } + uint64_t ctx = within_hash_ ^ (uint64_t(within_len_) * LEN_MIX); + int oi = std::min(int(within_len_) - 1, WITHIN_ORDERS - 1); + int hint; double conf; uint32_t count; + within_[oi].ctx_lookup(ctx, hint, conf, count); + if (hint >= 0 && conf >= within_threshold_ && count >= 1) { + out_tok = hint; out_beta = within_beta_; + } else { + out_tok = -1; out_beta = 0.0; + } + } + + void within_update(uint16_t token, bool is_bnd, bool is_ws) { + if (is_bnd) { within_hash_ = 0; within_len_ = 0; return; } + if (is_ws || within_len_ == 0) { + within_hash_ = extend_prefix(0, token, 0); + within_len_ = 1; return; + } + uint64_t ctx = within_hash_ ^ (uint64_t(within_len_) * LEN_MIX); + uint64_t pk = (ctx * PAIR_MIX) ^ (uint64_t(token) * PRIMES[0]); + int oi = std::min(int(within_len_) - 1, WITHIN_ORDERS - 1); + within_[oi].update(ctx, pk, token); + within_hash_ = extend_prefix(within_hash_, token, within_len_); + within_len_++; + } + + uint64_t word_ctx_hash() const { + uint64_t h = 0; + int n 
= std::min(word_ring_fill_, WORD_ORDER); + for (int j = 0; j < n; j++) { + int idx = (word_ring_head_ - n + j + WORD_ORDER) % WORD_ORDER; + h ^= word_ring_[idx] * PRIMES[j % N_PRIMES]; + } + return h; + } + + void word_hint(bool is_ws, int& out_tok, double& out_beta) { + if (!is_ws || word_ring_fill_ < WORD_ORDER) { + out_tok = -1; out_beta = 0.0; return; + } + uint64_t ctx = word_ctx_hash(); + int hint; double conf; uint32_t count; + word_table_.ctx_lookup(ctx, hint, conf, count); + if (hint >= 0 && conf >= word_threshold_ && count >= 3) { + out_tok = hint; out_beta = word_beta_; + } else { + out_tok = -1; out_beta = 0.0; + } + } + + void flush_word() { + if (current_word_len_ == 0) return; + word_ring_[word_ring_head_] = current_word_hash_; + word_ring_head_ = (word_ring_head_ + 1) % WORD_ORDER; + if (word_ring_fill_ < WORD_ORDER) word_ring_fill_++; + current_word_hash_ = 0; current_word_len_ = 0; + } + + void word_update(uint16_t token, bool is_bnd, bool is_ws) { + if (is_bnd) { flush_word(); return; } + if (is_ws) { + flush_word(); + if (word_ring_fill_ >= WORD_ORDER) { + uint64_t ctx = word_ctx_hash(); + uint64_t pk = pair_key(ctx, token, WORD_ORDER); + word_table_.update(ctx, pk, token); + } + } + current_word_hash_ = current_word_hash_ * 31 + token; + current_word_len_++; + } + + void prefetch_open_lookups(const uint64_t* hashes, int max_avail) const { + for (int order = std::min(OPEN_MAX, max_avail); order >= OPEN_MIN; order--) { + int oi = order - OPEN_MIN; + if (!order_active_[oi]) continue; + open_[oi].prefetch_ctx(hashes[order - 1]); + } + } + + void prefetch_open_updates(const uint64_t* hashes, int max_avail, uint16_t token) const { + for (int order = OPEN_MIN; order <= std::min(OPEN_MAX, max_avail); order++) { + int oi = order - OPEN_MIN; + if (!order_active_[oi]) continue; + uint64_t ch = hashes[order - 1]; + uint64_t pk = pair_key(ch, token, order); + open_[oi].prefetch_update(ch, pk); + } + } + +public: + ContextMixer(double base_beta = 1.0, double 
agree_bonus = 0.5, + double within_threshold = 0.80, double within_beta = 0.75, + double word_threshold = 0.80, double word_beta = 0.50, + int open_table_bits = 22, double token_threshold_scale = 1.0, + int order_stride = 1) + : within_hash_(0), within_len_(0), + within_threshold_(within_threshold), within_beta_(within_beta), + word_ring_head_(0), word_ring_fill_(0), + current_word_hash_(0), current_word_len_(0), + word_threshold_(word_threshold), word_beta_(word_beta), + base_beta_(base_beta), agree_bonus_(agree_bonus), + order_stride_(order_stride) { + + std::memset(word_ring_, 0, sizeof(word_ring_)); + + for (int i = 0; i < N_OPEN; i++) { + int order = OPEN_MIN + i; + order_active_[i] = ((order - OPEN_MIN) % order_stride == 0); + if (order_active_[i]) + open_[i].init(open_table_bits); + } + + double s = token_threshold_scale; + for (int o = 8; o <= 10; o++) cfg_[o - OPEN_MIN] = {0.70 * s, 3}; + for (int o = 11; o <= 13; o++) cfg_[o - OPEN_MIN] = {0.60 * s, 2}; + for (int o = 14; o <= 16; o++) cfg_[o - OPEN_MIN] = {0.50 * s, 2}; + + for (int i = 0; i < WITHIN_ORDERS; i++) + within_[i].init(20); + + word_table_.init(20); + } + + void set_tokens(const int64_t* t, int64_t n) { + tokens_ = t; n_tokens_ = n; + } + + void set_luts(const int16_t* bb, const uint8_t* ls, const uint8_t* bd) { + base_bytes_ = bb; has_ls_ = ls; is_bnd_ = bd; + } + + void reset() { + for (auto& o : open_) if (o.ctx) o.reset(); + for (auto& w : within_) w.reset(); + word_table_.reset(); + within_hash_ = 0; within_len_ = 0; + word_ring_head_ = 0; word_ring_fill_ = 0; + current_word_hash_ = 0; current_word_len_ = 0; + } + + void get_hints_batch(const int64_t* pos, int n, + int32_t* hints, double* betas) { + + uint64_t hashes[OPEN_MAX]; + uint64_t next_hashes[OPEN_MAX]; + + if (n > 0) { + int64_t p0 = pos[0]; + compute_hashes(tokens_, p0, OPEN_MAX, hashes); + int ma0 = std::min(OPEN_MAX, int(p0)); + prefetch_open_lookups(hashes, ma0); + } + + // CAUSAL FIX (matches @abaybektursun's fix in PR 
#1420 — see + // https://github.com/openai/parameter-golf/pull/1420#issuecomment-4199452189): + // 1. Hint gating: is_bnd / is_ws derived from tokens_[p-1] (last prefix + // token), not tokens_[p]. This makes the predictive distribution at + // position p depend only on the strict prefix, satisfying Issue #1017 + // condition 2. + // 2. Update functions: tok_is_bnd / tok_is_ws derived from the actual + // target tok so within_update / word_update still segment words + // correctly. This is causal because updates happen AFTER the hint + // for position p has been written to the output buffer. + // + // (Variable naming and structure copied verbatim from PR #1420's fix. + // In addition, this submission is run with NGRAM_WITHIN_BETA=0 + // NGRAM_WORD_BETA=0 to disable the within/word experts entirely, + // because empirically they contribute negative BPB once the leak is + // removed — see Legality Fix section in the README.) + for (int i = 0; i < n; i++) { + int64_t p = pos[i]; + auto tok = uint16_t(tokens_[p]); + auto prev_tok = (p > 0) ? 
uint16_t(tokens_[p - 1]) : uint16_t(0); + bool is_bnd = is_bnd_ && is_bnd_[prev_tok]; + bool is_ws = has_ls_ && has_ls_[prev_tok]; + int max_avail = std::min(OPEN_MAX, int(p)); + + if (i + 1 < n) { + int64_t np = pos[i + 1]; + compute_hashes(tokens_, np, OPEN_MAX, next_hashes); + int nma = std::min(OPEN_MAX, int(np)); + prefetch_open_lookups(next_hashes, nma); + } + + int tok_hint, within_tok, word_tok; + double tok_beta, within_b, word_b; + token_hint(hashes, max_avail, tok_hint, tok_beta); + within_hint(is_bnd, is_ws, within_tok, within_b); + word_hint(is_ws, word_tok, word_b); + + struct Cand { int hint; double beta; }; + Cand cands[3]; int nc = 0; + if (tok_hint >= 0) cands[nc++] = {tok_hint, tok_beta}; + if (within_tok >= 0) cands[nc++] = {within_tok, within_b}; + if (word_tok >= 0) cands[nc++] = {word_tok, word_b}; + + int best_hint = -1; double best_beta = 0.0; + if (nc > 0) { + for (int a = 0; a < nc; a++) + for (int b = 0; b < nc; b++) + if (b != a && cands[b].hint == cands[a].hint) + { cands[a].beta += agree_bonus_; break; } + int bi = 0; + for (int a = 1; a < nc; a++) + if (cands[a].beta > cands[bi].beta) bi = a; + best_hint = cands[bi].hint; + best_beta = cands[bi].beta; + } + + hints[i] = best_hint; + betas[i] = best_beta; + + prefetch_open_updates(hashes, max_avail, tok); + + bool tok_is_bnd = is_bnd_ && is_bnd_[tok]; + bool tok_is_ws = has_ls_ && has_ls_[tok]; + token_update(hashes, max_avail, tok); + within_update(tok, tok_is_bnd, tok_is_ws); + word_update(tok, tok_is_bnd, tok_is_ws); + + std::memcpy(hashes, next_hashes, sizeof(hashes)); + } + } + +}; + + + +extern "C" { + +void* ctxmixer_new(double base_beta, double agree_bonus, + double within_threshold, double within_beta, + double word_threshold, double word_beta, + int open_table_bits, double token_threshold_scale, + int order_stride) { + return new ContextMixer(base_beta, agree_bonus, + within_threshold, within_beta, + word_threshold, word_beta, + open_table_bits, token_threshold_scale, + 
order_stride);
+}
+
+void ctxmixer_delete(void* self) {
+  delete static_cast<ContextMixer*>(self);
+}
+
+void ctxmixer_set_tokens(void* self, const int64_t* tokens, int64_t n) {
+  static_cast<ContextMixer*>(self)->set_tokens(tokens, n);
+}
+
+void ctxmixer_set_luts(void* self,
+                       const int16_t* bb,
+                       const uint8_t* ls,
+                       const uint8_t* bd) {
+  static_cast<ContextMixer*>(self)->set_luts(bb, ls, bd);
+}
+
+void ctxmixer_reset(void* self) {
+  static_cast<ContextMixer*>(self)->reset();
+}
+
+void ctxmixer_get_hints_batch(void* self, const int64_t* pos, int n,
+                              int32_t* out_hints, double* out_betas) {
+  static_cast<ContextMixer*>(self)->get_hints_batch(pos, n, out_hints, out_betas);
+}
+
+} // extern "C"
diff --git a/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/ngram_tilt.py b/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/ngram_tilt.py
new file mode 100644
index 0000000000..7d0691f06a
--- /dev/null
+++ b/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/ngram_tilt.py
@@ -0,0 +1,218 @@
+"""N-gram tilt eval-time helper.
+
+Wraps the C++ ContextMixer kernel from PR #1420 (legality argument in
+issue #1017) via ctypes. Builds the open-addressing hash tables on rank 0,
+broadcasts hints/betas to other ranks, and provides a torch helper that
+applies the one-token exponential tilt to per-position NLL.
+
+Math:
+    p_tilt(t) = p_model(t) * exp(beta * 1[t==hint]) / Z
+    Z = 1 + p_hint * (exp(beta) - 1)
+    -log p_tilt(target) = nll + has_hint * (log(Z) - beta * 1[tgt==hint])
+
+Score-before-update is enforced inside the C++ kernel: the hint for position p
+is read from the prefix-only hash tables BEFORE the kernel updates them
+with the token at position p.
+""" +from __future__ import annotations + +import ctypes +import os +import subprocess +import time +from pathlib import Path + +import numpy as np +import torch +import torch.distributed as dist +import torch.nn.functional as F + + +_HERE = Path(__file__).resolve().parent +# Look in ./ngram/ subdir first (dev layout), then current dir (submission layout) +if (_HERE / "ngram" / "fused_expert_kernel.cpp").exists(): + _NGRAM_DIR = _HERE / "ngram" +else: + _NGRAM_DIR = _HERE +_LIB_PATH = _NGRAM_DIR / "libfused_ngram.so" +_SRC_PATH = _NGRAM_DIR / "fused_expert_kernel.cpp" + +_lib = None + + +def _ensure_lib(): + global _lib + if _lib is not None: + return _lib + if (not _LIB_PATH.exists()) or ( + _SRC_PATH.exists() and _SRC_PATH.stat().st_mtime_ns > _LIB_PATH.stat().st_mtime_ns + ): + subprocess.run( + [ + "g++", "-O3", "-march=native", "-std=c++17", + "-fPIC", "-shared", + str(_SRC_PATH), + "-o", str(_LIB_PATH), + ], + check=True, + ) + lib = ctypes.CDLL(str(_LIB_PATH)) + lib.ctxmixer_new.restype = ctypes.c_void_p + lib.ctxmixer_new.argtypes = [ + ctypes.c_double, ctypes.c_double, + ctypes.c_double, ctypes.c_double, + ctypes.c_double, ctypes.c_double, + ctypes.c_int, ctypes.c_double, ctypes.c_int, + ] + lib.ctxmixer_delete.restype = None + lib.ctxmixer_delete.argtypes = [ctypes.c_void_p] + lib.ctxmixer_set_tokens.restype = None + lib.ctxmixer_set_tokens.argtypes = [ + ctypes.c_void_p, ctypes.POINTER(ctypes.c_int64), ctypes.c_int64, + ] + lib.ctxmixer_set_luts.restype = None + lib.ctxmixer_set_luts.argtypes = [ + ctypes.c_void_p, + ctypes.POINTER(ctypes.c_int16), + ctypes.POINTER(ctypes.c_uint8), + ctypes.POINTER(ctypes.c_uint8), + ] + lib.ctxmixer_reset.restype = None + lib.ctxmixer_reset.argtypes = [ctypes.c_void_p] + lib.ctxmixer_get_hints_batch.restype = None + lib.ctxmixer_get_hints_batch.argtypes = [ + ctypes.c_void_p, + ctypes.POINTER(ctypes.c_int64), ctypes.c_int, + ctypes.POINTER(ctypes.c_int32), ctypes.POINTER(ctypes.c_double), + ] + _lib = lib + return _lib 
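As a sanity check on the tilt math in the module docstring: multiplying the hint token's probability by exp(beta) and renormalizing is the same as adding beta to the hint logit before the softmax, so the per-position NLL adjustment can be verified against a direct computation. A minimal standalone sketch (illustrative only, not part of the shipped module; `tilt_nll_reference` is a hypothetical helper):

```python
import math

def tilt_nll_reference(logits, target, hint, beta):
    """-log p_tilt(target) = nll + log(Z) - beta * 1[target == hint],
    with Z = 1 + p_hint * (exp(beta) - 1)."""
    lse = math.log(sum(math.exp(l) for l in logits))   # log-partition of base softmax
    nll = lse - logits[target]                         # base cross-entropy at target
    p_hint = math.exp(logits[hint] - lse)              # base probability of the hint token
    Z = 1.0 + p_hint * (math.exp(beta) - 1.0)          # tilt normalizer
    return nll + math.log(Z) - (beta if target == hint else 0.0)

# Equivalent direct computation: add beta to the hint logit, then take the NLL
# of the resulting softmax. Both routes must agree for every target.
logits = [1.0, -0.5, 2.0, 0.3]
hint, beta = 2, 2.0
tilted = [l + (beta if i == hint else 0.0) for i, l in enumerate(logits)]
lse_t = math.log(sum(math.exp(l) for l in tilted))
for tgt in range(len(logits)):
    assert math.isclose(tilt_nll_reference(logits, tgt, hint, beta),
                        lse_t - tilted[tgt])
```

With beta = 0 (or no hint) Z collapses to 1 and the adjustment vanishes, which is why the batched `tilt_nll` below can clamp missing hints to index 0 and rely on `has_hint` / zero betas to make the tilt a no-op.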
+ + +class NgramTiltState: + """Owns the precomputed hints/betas for the entire validation stream. + + Construction is collective: all ranks call build_hints() but only + rank 0 actually runs the C++ kernel; other ranks receive the hints + via broadcast. + """ + + def __init__( + self, + val_tokens: torch.Tensor, + has_leading_space_lut: torch.Tensor, + is_boundary_token_lut: torch.Tensor, + rank: int, + world_size: int, + device: torch.device, + base_beta: float = 2.0, + agree_bonus: float = 0.1, + within_threshold: float = 0.25, + within_beta: float = 0.92, + word_threshold: float = 0.80, + word_beta: float = 0.50, + open_table_bits: int = 26, + token_threshold_scale: float = 1.0, + order_stride: int = 2, + log=print, + ): + self.rank = rank + self.world_size = world_size + self.device = device + + n_tok = val_tokens.numel() + # Hints[i] = hint for position i (the token at val_tokens[i] given + # the prefix val_tokens[:i]). Position 0 has no prefix => hint -1. + # We compute for positions 1..n_tok-1. + self.hints_cpu = torch.full((n_tok,), -1, dtype=torch.int32) + self.betas_cpu = torch.zeros((n_tok,), dtype=torch.float64) + + if rank == 0: + t0 = time.perf_counter() + lib = _ensure_lib() + tokens_np = val_tokens.cpu().numpy().astype(np.int64, copy=False) + tokens_np = np.ascontiguousarray(tokens_np) + # base_bytes is unused by the kernel hints (only LUTs that + # determine word boundaries matter), but the API expects it. 
+ base_bytes_np = np.zeros(has_leading_space_lut.numel(), dtype=np.int16) + ls_np = has_leading_space_lut.cpu().numpy().astype(np.uint8, copy=False) + bd_np = is_boundary_token_lut.cpu().numpy().astype(np.uint8, copy=False) + base_bytes_np = np.ascontiguousarray(base_bytes_np) + ls_np = np.ascontiguousarray(ls_np) + bd_np = np.ascontiguousarray(bd_np) + + mixer = lib.ctxmixer_new( + base_beta, agree_bonus, + within_threshold, within_beta, + word_threshold, word_beta, + open_table_bits, token_threshold_scale, order_stride, + ) + if not mixer: + raise RuntimeError("ctxmixer_new returned NULL") + try: + lib.ctxmixer_set_tokens( + mixer, + tokens_np.ctypes.data_as(ctypes.POINTER(ctypes.c_int64)), + ctypes.c_int64(int(n_tok)), + ) + lib.ctxmixer_set_luts( + mixer, + base_bytes_np.ctypes.data_as(ctypes.POINTER(ctypes.c_int16)), + ls_np.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8)), + bd_np.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8)), + ) + positions = np.arange(1, n_tok, dtype=np.int64) + positions = np.ascontiguousarray(positions) + out_hints = np.full(n_tok - 1, -1, dtype=np.int32) + out_betas = np.zeros(n_tok - 1, dtype=np.float64) + lib.ctxmixer_get_hints_batch( + mixer, + positions.ctypes.data_as(ctypes.POINTER(ctypes.c_int64)), + ctypes.c_int(int(n_tok - 1)), + out_hints.ctypes.data_as(ctypes.POINTER(ctypes.c_int32)), + out_betas.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), + ) + finally: + lib.ctxmixer_delete(mixer) + self.hints_cpu[1:] = torch.from_numpy(out_hints) + self.betas_cpu[1:] = torch.from_numpy(out_betas) + elapsed = time.perf_counter() - t0 + n_hits = int((out_hints >= 0).sum()) + log( + f"ngram_tilt:precompute n_tok={n_tok} hints={n_hits} " + f"({100*n_hits/(n_tok-1):.2f}%) elapsed={elapsed:.1f}s " + f"base_beta={base_beta} within_beta={within_beta} agree_bonus={agree_bonus}" + ) + + # Move to device, broadcast from rank 0 + self.hints = self.hints_cpu.to(device=device, dtype=torch.int64) + self.betas = self.betas_cpu.to(device=device, 
dtype=torch.float64) + if world_size > 1: + dist.broadcast(self.hints, src=0) + dist.broadcast(self.betas, src=0) + + def tilt_nll( + self, + scored_nll: torch.Tensor, # [N] float64, per-position NLL from base softmax + scored_logits: torch.Tensor, # [N, V] float, logits at scored positions + target_ids: torch.Tensor, # [N] int64, true target tokens + global_positions: torch.Tensor, # [N] int64, position index into the val stream + ) -> torch.Tensor: + """Apply n-gram tilt to per-position NLL. + + Returns mixed_nll [N] float64. When hint is -1 the tilt is a no-op. + """ + hints = self.hints[global_positions] + betas = self.betas[global_positions] + has_hint = (hints >= 0).to(torch.float64) + + # Recover logsumexp from nll: nll = lse - logit_tgt => lse = nll + logit_tgt + logit_tgt = scored_logits.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1).to(torch.float64) + safe_h = hints.clamp(min=0) + logit_hint = scored_logits.gather(-1, safe_h.unsqueeze(-1)).squeeze(-1).to(torch.float64) + lse = scored_nll + logit_tgt + p_hint = (logit_hint - lse).exp().clamp(0.0, 1.0) + Z = 1.0 + p_hint * (betas.exp() - 1.0) + is_hit = (target_ids == hints).to(torch.float64) + mixed_nll = scored_nll + has_hint * (Z.log() - betas * is_hit) + return mixed_nll diff --git a/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/submission.json b/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/submission.json new file mode 100644 index 0000000000..da8c2e7092 --- /dev/null +++ b/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/submission.json @@ -0,0 +1,90 @@ +{ + "name": "Record: SP8192 + Parallel Residuals + 3-Layer Recurrence + Token-Only N-gram Tilt \u2014 val_bpb 1.08091 (5-seed mean, causal-corrected)", + "val_bpb": 1.08091, + "val_loss": 2.7921, + "bytes_total": 15995572, + "blurb": "Causal-corrected 5-seed mean 1.08091 BPB (val_loss 2.79210 nats per token, std 0.00043). 
Beats PR #1394 (clarkkev, 1.08563) by +0.01219 nats per token, clearing the 0.005-nat record bar by 2.4x. Beats merged SOTA PR #1019 (1.11473) by +0.08736 nats per token. The originally posted 1.07807 5-seed mean used a non-causal n-gram kernel inherited from PR #1420 (within_hint and word_hint gated emission on is_bnd[tokens_[p]]/is_ws[tokens_[p]], leaking 1-2 bits about the answer per scored position, an Issue #1017 condition 2 violation). The fix matches @abaybektursun's proposed patch in the PR #1420 thread: derive prev_is_bnd/prev_is_ws from tokens_[p-1] for hint gating, while updates use the actual target token via tok_is_bnd/tok_is_ws (causal because they happen after hint emission for that position). All 5 seeds were re-run from this submission folder with the patched kernel; the corrected number is the legitimate value. PR #1420 has the same bug; @abaybektursun has acknowledged it. Pre-fix per-seed values are preserved in seed_results_pre_fix for the public record.",
+  "author": "dexhunter",
+  "github_id": "dexhunter",
+  "date": "2026-04-07",
+  "seed_results": {
+    "0": {
+      "val_bpb": 1.08035,
+      "val_loss": 2.79067,
+      "steps": 4911,
+      "artifact_bytes": 15994644
+    },
+    "42": {
+      "val_bpb": 1.08097,
+      "val_loss": 2.79225,
+      "steps": 4906,
+      "artifact_bytes": 15995572
+    },
+    "1234": {
+      "val_bpb": 1.08127,
+      "val_loss": 2.79303,
+      "steps": 4915,
+      "artifact_bytes": 15993531
+    },
+    "1337": {
+      "val_bpb": 1.0806,
+      "val_loss": 2.79131,
+      "steps": 4905,
+      "artifact_bytes": 15988802
+    },
+    "2025": {
+      "val_bpb": 1.08135,
+      "val_loss": 2.79324,
+      "steps": 4911,
+      "artifact_bytes": 15993360
+    }
+  },
+  "lineage": [
+    "PR #1394 (clarkkev) \u2014 sp8192 base",
+    "PR #1413 (dexhunter) \u2014 sp8192 + QK5 + legal score-first TTT",
+    "PR #1412 (Robby955) \u2014 parallel residuals on layers 7-10",
+    "PR #1420 (abaybektursun) \u2014 n-gram tilt mechanism + C++ kernel",
+    "PR #1145 (AnirudhRahul) \u2014 original normalized causal n-gram cache pattern",
+    "PR #549 (abaybektursun, merged) 
\u2014 score-first TTT precedent" + ], + "seed_results_pre_fix": { + "0": { + "val_bpb": 1.07751, + "val_loss": 2.78333, + "steps": 4918, + "artifact_bytes": 15992304 + }, + "42": { + "val_bpb": 1.07809, + "val_loss": 2.78481, + "steps": 4911, + "artifact_bytes": 15993733 + }, + "1234": { + "val_bpb": 1.07813, + "val_loss": 2.78492, + "steps": 4908, + "artifact_bytes": 15990539 + }, + "1337": { + "val_bpb": 1.07801, + "val_loss": 2.78461, + "steps": 4909, + "artifact_bytes": 15988039 + }, + "2025": { + "val_bpb": 1.07862, + "val_loss": 2.7862, + "steps": 4908, + "artifact_bytes": 15992215 + } + }, + "correction_note": { + "date": "2026-04-07", + "issue": "Issue #1017 condition 2 (causality)", + "root_cause": "fused_expert_kernel.cpp::get_hints_batch read tokens_[p] (target token) and used is_bnd[tok]/is_ws[tok] to gate within_hint/word_hint emission, leaking 1-2 bits per scored position", + "fix": "kernel patched to derive prev_is_bnd/prev_is_ws from tokens_[p-1] for hint gates only; updates still use current tok (causal because they happen after hint emission). Additionally NGRAM_WITHIN_BETA=0 NGRAM_WORD_BETA=0 disables within/word experts (they contribute negative BPB once causal). 
Only token_hint contributes (already causal).", + "leak_magnitude_nats": 0.00284, + "shared_with": "PR #1420 (acknowledged by @abaybektursun in PR #1420 thread, fix proposal at https://github.com/openai/parameter-golf/pull/1420#issuecomment-4199452189)" + } +} \ No newline at end of file diff --git a/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/train_gpt.py b/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/train_gpt.py new file mode 100644 index 0000000000..97c9dbf21e --- /dev/null +++ b/records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/train_gpt.py @@ -0,0 +1,2 @@ +import lzma as L,base64 as B +exec(L.decompress(B.b85decode(";NAW@u3Z2$n@VT6Qap3bt~@<3h>ok~)Km^%c^ys%R{D_%yAk9-_tV7^coUOo3$w>`(`ci)t`2F7>r>Ltx>>S2CRw|7ov>Wn1e~_!RLQ=%V9g?)G3yPsu%SBy!lj1PaC-x%dDmCDOZ^r^!)+WWz}ejKXTJ#^U6Ra!};QocHHXQC+4UM!QQ!-N5Xd|%~a(9)bTYIO+>B~8~@lqmri%^qEkQUy074Rh6w7V_#^s9J-3BNA`G;qyR$LYcI?e+loZVWi~B$n=TKFp{%SeHYp{oNWh;U@Ahk8M2$OU%K8B$lb*dRQXd-GR_@*KAZdRdwSd#X_bO(lvJ3fp9Otblkh?o!zlDF02+sRjLV6IqG{ieQx44UY(f20c)^AD5kE{7_@f9?Q-ePHMY$wCTcn5ij2k?>T>CFcZ<|5Bh`%hA!j2d4G(X-Bbwu<(#drck2`tR2eo$wi$p$UEHkQdFiFmlJR#zIG@3*smdlqZ?s>Cn@I!i44iGk>T1KUmKDUWEJXYFF3Mh*&Tbca$esa+z^`enxeV%UmK_#Ex_)>$lBJA(Wj|4yV%J<~unPL@@@KfP=NTcv-SVPiG3BDdu=*>C1izrS~RvqEe6Re7Xf)zp2fR3F%Ntl(>3N{Nxb8vzZkhK?{oB36UUOsA0T!NlVVSbwcS(gjl)HEvHq=6Z*=1K+leVBg<%)DReGXK8KQtT=Ob)zlJzHGtJR02}Vu%fKLkM)~HhU5dD=&VlgL!gs|8c=j$D8?oEu_^dRlQ}!6RxLe^t0TcirXeVR}(2Pu1`D!BOi*P<3^5?bu5eTqJT-`}_pGtdpkN4!&mXNZ*FBi3nTQ@igs;6i97g}21;R9YNPDD>?%w6P{8Z_KmZ}_DJFT-f23^K?m9BeF2r#}9&mN6UUNJMb~dcQ6l#Jz=?qDv*uJ-r>z>ZN--6tXeCq2b|Prt+lOeWhp^NJTt-%0@?3KB*V-jCLL5a?YNw~LGAC?q&iOu${8+rK@%`yD!r{NfWrZ7@{)GDBb@&<^lU9Y)rpq#^m-5wiz>qp_@|$E|1w?KK*n#fPl9YiT9Ro@s#LNC|+Om7AIqIyq5_|R4PeHn4I9%nm`+~#WRmIxo%tiNZAzIJBCN$=O0f(1*tJ*Ml1>;0|H>MdSu;v1pyZK8@L^yxT8dVP(;a&7P)9FlWygn#kJ3EtFl*b*90Xf29&>cN%UC*%K&2MAu>iogQ&k>o!fwyXP&%N7w6lx1HCXirEhX+}V%Qt7ZQ%Lrao6UcR_x&{B%Jk5CSY*L@PjJYz|p_{j=nb#80T4Ac-YbLx~y}i(s&xkw
CV5QsrS4gQ-{LosImeTb9(3|paqE)wQBqcL49@UNI6_4>Y)YD8+s(xg-6RsnfKImHjP^=G#HnF6AX=d5~-O8vO66z=4#cmgw#gYtp{JO2`hd;-Q+()1EBRuLvwRID|2#%4M^kjypBTgsyss)JJz|qRyC+J_hU%RhTISOJVx^`ULxysi`+AHS5@!cU;lz7_cZW9cM#a$1VmZf}&P-se+9D!Vs;@i1u>e|k8-wrYGSydxjyptn=cQK2gzYi~hfrvJeH7u9xhc3DBNG&vVTH0Mh}J~JI`?r4$@w(%a|kK5$-;Zc&0n(x0i)|KN_sfXLco76cS=b$+<9R|^=EmIES013ekl6rJS($E<+cujl_xS&l6_)zXROiBKOPx6+~xLTv;ec|T0A3VdD^Nsjq^@Z|WBDk-PUPdc)&A^qhSxF+fX(T-i5|~MH05s?O)TmN3Ut>^&JvXOFD1plN7TroePsLTngI}%Zr$crBi572K`nC-X$Oz0l7ApiE}y3bUxkiSkOShgSd_iPAEDz0khSSAby#CQDCDf-&plvd9GW>`95a>EktzE61>kS*n~=Y?9QwFtv&Ygd8m2ptK6CHbP8gZW$5bE)9%(LKUzjZ5r;OoO?lNM&Su{k$-iLqk7bIYd_s7HwsoRS=Blz-3*X`2lOk$aGnI7CBmi;}lYxJ3elT=c>LH?ose*rB2k8tF7$+PUyHWlK2<80lvbZKz$_3^GJLa;y^$JCl}u}Fy0JZZPF*bAEq8*juQ$m73j*9|k(V0Mc#6GrnX;L9)A1J&1!aDuOKERLoWjgeVKq0TCZD_9lf(C3)>@jeh>++)inHuP2^8i~tHJB;1w+LbNBA9y%Gm`9#mYxHI+K!rF@ENU!M|4VB+G8cshRR>L@I7#y-TDs;%9q)*8#V3b`4B9el-gE)U0qtLj#p6RgwM6Q|R!En|hV4}rG!xZ4y~|Ckj*4#nkAcI0^Fw`SWd%5mCbpc|CyaaVf5<~?Kj-M6z^q4T~D2U+IxU_09W_`<6pQjJQxwSh07rI4Ed1|kV|+YBS`;}~_}tlo2F)2f7#Dth7gCa6dxR!X}#>YTyg^;^L8W%Ka1GbxGuh0Hq#Aa=(1K(o=zRsbvP(=GtfS%Y*o!t4C|DE*Vn$nt62!D~y#gf_%aksxbJEh6uOz98#e-loQIRn4MebhNg>!SPi&m8XPYPD9HC@%OQivgMr45}6dPXXPX$vrOY6BGM8>B(Wp+ro*;I1|`MmpAb@$VQg%(RaNl=H~zRq0avZ0o17WS>$I{P>jrIU(d)SwO?oEXRT7L=v`-`e!ttJAz%7lA=);r#kKQ``15>D8Uu749Q5k^0IfBim5B2#?jY4#y$#r*bHABrfRb;!IDi&aufD&id%%IH2Wx^@E-+NcnPf+*=QrG^7bk(OuMXBhW4AxXWW9Gc_lcBbITsK;IXW0OqMd;+*cUR9^-`vS{h$f1PnFd#{J-jeBVI@baIJ`nKC%rs?_lPYtpj5&Q)VE-WdIp(>25#RTnuFDrUd)1z|A*B;$^wI{*NuG?|8r(1O}Uexu9pT%BhBC3jQgqgwN4Obu+|BIu<*OyesXKuTTyi{B-jYBE4wMRym&1)o}DOvVO|<$vbSs8RE#_wN5z@WS)8`RluiH1;4w?NRmAIHs9INc*IU(!$C9!~qFXcYf2ALjgI8rLY5_jUB-4ytZDXY(fAQR?%j{;DK{;Of@*Xp4xfxa;rD4+0n+s$tfizn!Bhao&OBanxCf%DvDH)pH|q$T}()J__k=Ya-xdQ774qszo?~Yg=d`3ybb|eP)NE(bLpJOKepf6_hB>)KQW-us&f!R{I}FI!2v!OwuQBmHPX>;+G7KjJQkMWXU|6!sl=yT3MO;`;c2L*$a9XP3T|ehA^prt8!zN{m597Ii5BoeV!o5V6?PKmS-wAhM@JuBi_v^~$25z#=Mj02r&Gm!`A*k!98$%|g>-31hy6sqd5a)~+AC{lXdA^11g^f8k87WRGo3YxqRZG~&gf59D^L?SFq#J;KcB55x=$?X^eLw
Fl+}vL_@3u=sgr~KbW9CYS|o?pZVy8n&ycO}*V3_`;1eYwmf1af=LrB>-AH?YZQMlzFLz>ENZK229N4}_QSk0Kcgj#E&0%{yatj0_L=_!_-b#F(zQpig5c*pfoNX05d@qwCz;m-BTu^&7qAi-sb+=NwYTz97q50CRojK%Od8M9edg02b?TT>LcyKMQFHtC#doXU>#;$!$VySEo&>g>UcFU)JaRG)lyNqTX2O;PsAh?|!TXSov8-@kAc-3e}D#oM{aBIu64qq4*XRYol2L@YbVY`Ju2cfwGIu8BIE>T+vQ7z*ZNgA1hBb(HF%e!iBMV>q9(Cj?7$v!1^NP@cmei7lES~!TrqK+?^aeUVX}a2Fw5#IcV7+5loL2A*5uto*Yx?pFFmVbR^cOTpL6^qiF6X5SvBRUg$80-!Yr{bh()C*zlNS?3jZ+O|W^Mzx7(2Zw$bciGOBtY%$$%>Aw1!Xa@woD*GibHCfL-_!uWQzqR2Qirev6mDK}y3Re2gteHk&ZJF^{Ihb3lIF4$ioaXIA@{Rj%AIOngnw}f13*#%L;$`G?Fa=0yY!FMo}K9LzpukJJnbFxoX^FogkM7Zxl@q}BUBq7R7hR>Hf2Ko*j{#lg7$wRFDhn!(K{yh$T16n0zliGr;%g)UZrnA-7CH*Zniy&EZ5>pYrYIyDOh6{}jyZjB~}vBYI|YbYw^de|Xv{Ft8YjV^;f}W2C5nDdRh}pzk)9knyZ$A|E$Or$FmoazJ1By<~RugsnIn2lw?U%_X%`n(ubm^1ku00w2ge8YYoHO{X25beFVvJ9omJ{D)Fy)H+<_!6)fA*iHXhouYp~(0+)l2KF!B2`<$e9gmiuF$uDH5EHuJRXK1dBBR3hf?E>{H>f$OMJ6uWg4P2yt!U!^$70~nNZS!Akj9RHx5)TFqlwUm*s`Zn*PR&@mUYf#Gyg*xe%O`su~+_F^irS1>RuiJ2gOnHRz!(xWPwQ4GGp!l>(*4{8x)!{f3yj$is$janrP$DO`hZC!z^q^@fk6yu%H5q;%U(AW8NG?s0TopSG^6zlD=F1ac|bV%N-ebAUGq-=Nfup@AWCo=jAWOYI=tOb`}dzvei8Y0n|keC5T-l4+{%|{Un{7FR>w8Oe%EkXrPQp~%nqIDX3H9S2|zF{IQOlk-}Mwe+_>gY1XmZM_R-8X#!FNvbT&KAIMO=UgejB6id8xg-uC#NQ8JzxP(;Ytt#-%M_R#9|=Nz!EEI^7PYzKXc*=1k}zdWibc5-nfm*ezsd~(-xIgzJ634TnqS}sI#KG7#7f02gTnk!P#-Tfx#p=gA-0u4EJ=)&g)8{2>Tri_yhOw42uW8G!idB_$l@B8hF59DDxDpqBO*!2%F34{=nh)(zne9^?VucCEhD{Ot*CbYHoO8fxUsaU`7HgcFb>#ptz59nM&Ew3NbTY;Ds$*?pfswbR3w&DCwkIDv`ot_+e%d{i$HD4Qs;xrT}^Abr&!grAnq(JesK64&8+M{|}FT*chmA8JQ51TEq6V3}gMhL5NW$~f&NZ_509D5Xv3Wnkke7w)PGC_>ZQzE*AKwU3mXeY2H39m>0EJ?MNiQo-`JN8KrWEN%VF+jO?$b;Pbn=uiouR#u_)WlG$Ek5LVjF&VPKpMh?l*O1N0?5P|MPChz*(EQ8nyO#u?hmB9YNu$Pa0webnZtx+dVdQ)2_1(#0vQ|G0X@GcO?Slflxy{Qq&<+utg5w4G8Hq3g!{V|%YLp#;4hMhwleAZjb`$L4@;k(Z!wvs;0m!?}PrsnjY~H7&@U2(;Yi!^c#=tcRiwOw{IG&9l-Vi6FN8gURRO_Vndhh)5MVw~=s1z&S0nu}eowh#4lj4I7$4szF0wa9>-Onom}2D@%GQugfwL);6hOc($kxzvO$7)1YGJZ7U)FCLq8t$F{mUF|zuFUmjVBlO(s&sN9`fwY|D@de0>4`hRBM4zCgn5GDL|Kb#jvtLjP6iZqhzj9OBnM*jnu#_uZr^+8N7)7DV1v0qNIGEiqcyhgxohFe<@6w4SEe!6!D
^{6I0uCj9ik#qe0;lF&)mw(p@Aw6g4gh-BHtUwTcRM}y#F%A1wx2k2H^N9?!x|G1*ZaLs<+is{%yVE~}Bh1dDOxs@60el*UH{)MqpwPIEhp*k?Xlt-lhtCgEZ^2P3a`E1$6oGC{(8nqv6Jdx)9T5*M)uxPdt(iNd6jmbiF6NCfc1RBVZ<=Ig`d}f|l1S6jo3sKvs-L7lh5|+~rf+2Ei@?LTr~@4Tx!%K8y+kwf(I>NmOeJ>lccUD>OG^WWP|Q2qBPLqu{N+?C%rUB%+JDydrTsf|r(!?e!4eFoqlUxucsJP&77r-hCoy`FpQQa@wymW&Y`;-aJp$Y((*(AX0cUz2(xa08#H)9+tQeLOT@ikF2IEY+v5*%c2IJdFWLT*bytoP2Id!h9;ON1y(rnJGW`Kb?*Rw-=K7eFd>)D?x>ne65(||Fzsp|Y_$RsPTlk55-Rg}E9PO>QxAE*L+k?r!qr0DDjO1%gp%-AwSN=Z4y5l+83;*Ih4I#F#StdTEQE0@Q9P9H(n?MAm3X!Z7Qt6e1nPoottA`BG*l0t8(-G{L=a#Pa)Co~<$j8j_utZm&BC)AvrmCk)gcxlFSWa@gK69zBpUphXbo96E$ACQ2aPq@ig@pRy34Gtl{F!(KKu=rh}__pu{*>+9i9j-mFxE1ZB<742QUYvq}Fu3*Xssc)lQRT%bLd(PdbHap($b{Dy);`Q(MM1YSJ2tMSc6H_85e2AG4w9TKT$;PJsi*%D8DbeydII+$7w~DIUsKQoXu4=yed{+#*|+r56umNW!SsUPImhv^kSjb&0b+?f?8VycwPDx}AQ7Z7F2lr}4np7{aiI{T1`T*8VRUaQENfIzy@zisW`_Yl5Jch(48RLOqL86_uabjMo;n{)V&<-jJ~$=h3uzZBX{tT^8|!ce+cmA;@vr+31nxoqORy&}G6O#{B}0c>)Hllu;|lAuN9rr!%qwVX!nvIqZEsbzNqR(D+kyb?)=-r9V}-vZ(5ZldGnoa>EO}l3;Lx&nH(9qsfRrL2t_2BEc^1^zgY9gifvte%u#_8V!gZg0SwK#TBFNA=NPJFZWB=~=%AOdC&5v%y(azu^dQB2Co@^-26^9|HMd$iOpWtzhKQ8zb4ETwkjn#gMbV3<>Lw|}vR9!>AX0Y}J=Rby$hH*+XcQsR8C(|Sv35~42`|vJ;H}wKJda#0UB{IUBpA7fezt=2pUlqA-vkLnt*?HWq3(HCj@ffHr(oKk{R1<8hveONuMz^F+eaFlviWxO!f6&WaUniEX_v0o=rMgopNM{d}?K~ikX>;abG7CEPgFe+@;~_HI8wb^~?=7Uu+SrR1bpYaR=??(VyB&%o^qKk%NtgCy_Bvh!_rQ?H>|T{-3uasHw_m3bZqDa~e3m8bAf;)6FR#qCW1Zse%PF}A?btGWVsOccK83WIVVVXs6BuUAHkcK7w2aSeG8Y*?=@C(ktsoAln;4POxqDu;TpEOVUW8|lM*Sr--&Q@0$zhp}ss;Vg(AR|oot@OB+`iLP-Qw$e&>mZ}E6@kEw>=U8K6Ozg;rKFKy)GsVFOJc!lIn~CA(9!DCCVwCuCRgZ>c$P1WEQn>}Fb@ty$(q`K$HYoZc?$S;VFHF$Lq8t$j+~<9&B^O^|K9=LaDVWn+Ryad>J>#4CioR29%KS+nOn4hwg7pS#crW&Yr2i%ZNj^YHphLHfn%5TaoN?SS3al7T7Ivk#PDT-75`7>#%gSb}54YnHR8_0GZtxQY_ygafvnbUwwUg-~+jhd;bu3Z0|Le=sSu?Ib<~LR8gP>s`}^z{(BDiAn0C=|d>&)6WUwBP6erT+2?26LM}ON*13Tjre67~Q-b{H`Kvyhwj|G%CwU4Y(RqAWxSl*EdEK%1PcsyV8Co*=pEmgXaU--rEEM*3)_*v+YSuV$*HL++6EuJ;dQ;7id8HHRiJpq9XO#bA|9A#t-`iHEn&3iYJx*&pg2mqih*K-$7vsrkEvCJR`vx>Pz^z*!Iu{>sRN7LlJ^KZ&LtK<1t^jw;GtYLS(iUVlZTmHU
%GRqvg&N!hr1G~;xRu3(%4U-3s1-x;-bjHw;sPmP{cqGqaL!Dy>|8n^2Ir@kXkHKGS&lD5$nx44AG#DptE?S{o6R;4HcX8=>Q!3yB#+VO@wwfB$!urm&uWqUC+!emOC>hSNJwsGEA*!-LDNu(ctyok~M`-3^>6%v-!)?S5-mM9{*@s<0t(~m*T)L05IXPLYf&9;pv5A250>9&c45W5UVUz);qil@)e!Aq+1wcX&N;h0?X_VHNu56e2)$OUIWntr%{TF{@ne+1UfwO)Cft>p?0w#d+&)HA~9-?$H|;CwQkY~lmgNU4#xVqZX5*PH9$E8#pCa!c97&4gDdJRSC~VI>0!s=d>=PHetg#r1dCxN)ut-i8z?|8EfQhn@jg`_stnoUj+BWeFkCmHfG=xVLrhlDUX>`oQt9OvXE!b1Ic1M)!+*&?Ug6bJ>y$1UtAD24U`EE_n=^n5EpXZP3Qj%no>%7%2ErX$n4A^$E+qXn6#GgnzKmOt%IH%5F);72gmREDlFga(OfgNGbD}gxk*RGV$U3dqPnwkluCG#w@uE^x)RQ+DsIIwPH+VzU0T4b@a$P0PSm1ZBzU}b4mbv35y&b=IrnA)t>*0`sm*aDwBgQfy9&4eCo!>c(lBF*-peo0q!ulYFv9L1Q>xJbH(2NQ4o=n`b9HPuXwUsV|kaz|?Q-8OAdOn4TGw!j0q5Qr8#{F}Zrr>?{qY9I%AGg%n{t%t`O@yM*m$cULT$$YoL~vGXN*n1x|B--16Svr@hQwIwafm|n5~X#|a6+b>iT20XzL`LG)eZ}?VznZCgPVm0pDatw0Q%*EmEo~G(!_KLg|os3fwv%X9Fx2L0@1*AJvMcyoxBC7#zZ)Dl^R@L8M;f3v3(>dyY1?}jw%}QSttqj(!T&(+EWTonb{G=x*FGQ&<-f(PDLC;+&97keiR({#m^Lj38qKGFow0+7W09iTDLN{CB;J>m!wai9XGH>`8U%n`P-b7_>REDsyQ6v)QiO1xYXZGLy6ui*bD({Wab8?dZptsJ=Ym$49Tt|nK)a{)oF2=y6qcM0tT7E*dsxdm1nrfoa2;EG~WAB)tR7J0@aD@l1*T)mUB%tSd3m(CYqtpaE?08i)r2A90ouv*i7+b>P`1d&vxk+f4?uR%C)fS*bX;2=DfpY_`2`JxD!vpLSwj`}P0hVv76REbLkb%C?m@%?r>`4u6B3ll@WPGBC2C%(p@|vXCvw%nDW^ho68t%l)B2;Sbddv=}dO5>_a@JfE!RW^oq!Gw1ymQQb<_GRwP+mxqJ@+K6>=tni1wQ~*re>15>6hmxJPVyE6cyC>ctXC4Zwq57?l-jkF3tfJMpT?7x-*oa~*$wF=DCwO&{UWuUE##B}%!bue#M!C6$Y#_Xd~n!aN`j3bsnv0ByV-)M4&j5Hvch8-kICwk@rCt%ab#>P`=z*Mlo$p#ueOwA&ajS93|c|z~y=XdiG@oYj=Q1PzE>+OMA%T2HW6Gf4=tijw64QJN1_vm>UA?>v6_x`4`ormh2U?)yM@ejj`flo71m3=gb%e?BA&dE9Br8lJB>?l&+;;=cv13w}3K7j{1n%?bWty#ecwdWATwT6Yl@-ejk(mBDfFp%SJik=Z#jM`%F|1;sP^$Og+nSH_eeEOwt9PygjRlTI2*DSLr8t3*?R0^%sM4@_Nd&eQ&)_|BN@6^+ZELj$W;r!z(7=_+>LCkMuCc@w#3oK8ftNlt-ZWnaiu&d_$u%FNBhO^{-Eo^uqIy(1Lb*}_sCMo>*$F)>_zHcysBRhoV4}wC(zGiV!k`YA08xSoi${RZ$Z@S+D2IJ}WIun#I$6uZ(j-O=#eV(RcNy`*ca&1>E6Cow*=BC`H1vk1PI|kImiTNCL09bzc+l8|yAHPSmTt5YzaC(?RH=>b)^9%G0`9Jcxi&LM2R5%ZWYCJ~6f7di5vl6hNt7U`Yj5HR{dqm@?d|)3Q!MsjE-)61e!xr@}TbYANQ6qPj6H74_WAA1D%dMiajsuOadu)Qet%MNN%E|7eO_}6)$IMyk7
24|ONTeiW9CFf^iV+Sl7O(uO$uYnY*3zfE8XkfITTl2-37ZKBzc%N@-M(c1f5P||?M_R&7A~ZacFTO69Zl4BC38;}E)J7X?$C1Y^qeNZtn~>M0Z68>%f2^|x0&3OwI+$dU~LQ6a=*Bvi6dB*b3pE#oq*}MH7SRJ+UXNwhW1BFjLSTR6sAwX!T*6UCb3+LVrf^ayPaYX11KlCn3VllidNK1{{3c;tng7sVi_&;p&NR9zol+2T|n4|^X-<~R}RiK}H3CE$Nimgpai--Wu?XB!a2Y8-YGSm^q_1GMS;n*hyVf-0%rUW0nw=tJ`PK%1eLGce*2csX%zK~xc2{EZchkYC3tsRdg?9FUZ<%rVexw};On(gZq`X!)9q02w%N`7G7fAQy1$mVuBNE2EBqIn=#RjecF6pVM+Q&+{Tb-)RRCDU?h!OjuX)&80)kO*tSL>%7S+`$21j{cq*_kFmwB_21d4$in?DM4P+4Vf0pEPxhL6m3=}${~Ta)_ZLl#?t1B0x}ZIuI9jIE{^4Hz}V3-^Y|ykPAN=>W=r7p21n^~=m0{byRX{D(3Zfyj%yTDkMfNcou!L;(*{1xBu!E%sN(QMRpv$lvD*&_)iV}1llkj>k#>`Eg%p;XN0`XXq(;|$%ki6XTxXL%vHFFIXCH|r{TlE%7r5Myv7;$`(s?!_bI*0J8Ktr7149Y)}I9BeCVz&PmXE>+W5D|prm}QHmg$vz$JsJgnm8%>aCF_Qv^gG~g?CRFTcb+Yy`f5J5Vx>sx4VsnEPrev+lTe#G=84vvqUWZg{aXh>J$oXdjK)*T3t0ZU&gc0Z&NcWc%4f@87`)F6weXiXU>%WIJscSVZ*ZP`gD+>MA@U6Dx_FTpDT^vRd&s@1sCm_Jo%ri8UrNloBT@!I8;s5DP!mck9IB08snRg6$C>b2uLk^@3CO=2mT@CB_117l32;^tCLPh~?aF;&5|X<#KVVMg*SGRJX4sT%SFbYdq%Z##toNk+rlRkX(Gw2~RRy^?@j;euEgn~mbZYO~#HlxEiAX9N`(W+nS4_0fIFxueYJm93jhG3o-ES$uf++tLzS+<^e85ucXKG6Ut4WUATf{elsmo=WNK$O>i|#vV!#vbS7A*b4t*XrTB!78VF%8WCF|8YQGzG}~(xeVV?7j9!>hSd&RH{RnMiYHYpxedBuya}W#GC;repu~qC$v%mruWWZB}D(oKOM|NpHQwf-44_aEiWy2Vo=`5;Qiseu>vq7t=k`q^w!PEe#DHDK=WXg(1wpokp2@r3?)l}O`j%&v>Z*}ecUfa{RXo+^`N=r8n)&c4*yLNBx@csVCucGbs+ewNgOx5Eo+3gt2S4Lji_YWBbl$*m*P2RpAD+VmN-G{SN5ZibfnJ~jCne&8`@vfU=}^0txAG6}Nrz6QGYd4{%Xs~QxX*-hXQ2%CzAkE!<1fo{*oaHS@~SI|Bk#|9OcfT7oYWG3`}(ibBbugZ(T9qkn;@prfWmXMMcSH@V!KKERyLk!OcfeHfraIpQfe_QU<&g$ErNh=aF#0(aGu+Ntuh8IuJ#eI}gtex5S!PkkC-R?kzC~bAtWWroMCpnk7)7R7rEoG%*c?ALM_xs>^A=8;e5n&zAKalORN)jcq|If}`FD%zxamZLLfDuVCW(d4np`a`7ET&ZaP&k#NM$LhRFQKdNDGTpv*bL8WR{BQ&G|E$L;2>)T-^c?`>?Tu4OdnU(=qlP;zx}m_3oNc;0FptW@(VvO$tvgd6U0dCs=Aa>)q8K>ATj)W=G%0mrO~%BL