Record: SP4096 + Depth Recurrence + Parallel Residuals + Causal SLOT-16 — val_bpb 1.0766 (3-seed mean)#1333
8 train-time techniques + causal context-only SLOT at eval. 3-seed mean: 1.0766 BPB, delta −0.0381 vs merged SOTA.
**Self-assessment: Causal SLOT legality**

I want to be transparent about the legality of this submission and invite community review. This submission uses Causal SLOT, a context-only variant of SLOT where the delta vector is optimized on already-scored positions only. Standard SLOT was proven to violate Condition 1 by PR #1240 (100% violation rate). Causal SLOT restricts optimization to context-only positions, which should fix the causal violation.

**Why I think it's legal**

The delta at position t depends only on tokens x_1, ..., x_{t-64} (all scored in prior windows). The gradient of the context-only loss w.r.t. delta flows only through context positions; new positions contribute zero gradient because of the context mask. This is the same causal guarantee as score-first TTT: adapt on scored tokens, apply to future predictions. The only difference is delta optimization (512 dims) vs. weight updates (34M params).

**Why it might not be legal**
**Request**

@0hq @valerio-oai — could you weigh in on whether context-only SLOT (where the optimization loss uses ONLY already-scored positions) satisfies the four conditions from Issue #1017? Several submissions use this approach (PRs #1306, #1322, #1324, and now this one). I also have a fully legal Track A submission at PR #1334 (1.0897 BPB, no SLOT, no TTT, no eval-time adaptation) as a fallback.
> I also have a fully legal Track A submission at PR #1334 (1.0897 BPB, no SLOT, no TTT, no eval-time adaptation) as a fallback.

Did the 4096 vocab get approved? I remember custom token sets needed approval before adoption. Was there any official movement on it, or did everyone just adopt it right away?
@newjordan Good question. The sp4096 tokenizer wasn't individually "approved"; it was introduced by @clarkkev in PR #1218 with data hosted on their HF repo (kevclark/parameter-golf). The README rule (criterion 2) says: "If changes are made to the tokenizer or dataset, prove with certainty that the val_bpb is correctly calculated. Submissions that edit the tokenizer will be examined much more carefully."

I didn't create a new tokenizer — I'm using the same sp4096 SentencePiece BPE model from @clarkkev's export, same as PRs #1218, #1285, #1287, #1291, and several others. The byte-accounting uses the standard …

That said, I don't think there's been an explicit "sp4096 is approved" statement from the maintainers. It's been adopted organically by ~8 PRs at this point. The competition description does say "tokenizer-agnostic" and encourages "novel tokenizers" as a valid approach.
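For context on criterion 2: bits-per-byte is tokenizer-agnostic as long as the denominator counts bytes of the raw validation text rather than tokens. A minimal sketch of that accounting (function name and inputs are illustrative, not the competition's harness; assumes per-token losses come out of the model in nats):

```python
import math

def bits_per_byte(token_nll_nats, raw_text):
    """Tokenizer-agnostic BPB: total loss in bits over total UTF-8 bytes.

    token_nll_nats: per-token negative log-likelihoods in nats, summed
                    over the whole validation split.
    raw_text:       the raw validation text that the tokens encode.
    """
    total_bits = sum(token_nll_nats) / math.log(2)   # nats -> bits
    total_bytes = len(raw_text.encode("utf-8"))      # bytes, not tokens
    return total_bits / total_bytes
```

Because the denominator never mentions the vocabulary, swapping GPT-2's tokenizer for sp4096 changes only how the same byte stream is segmented, which is presumably why the metric stays comparable across these PRs.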
Hope it goes through! Looks great and is shaping up to be a strong result. I've got a couple of tricks to pull on the 11x, but it's a hell of a time trying to keep up with the bob on the old engine.
Record: SP4096 + Depth Recurrence + Parallel Residuals + Causal SLOT-16
val_bpb = 1.0766 (3-seed mean, std 0.0004) | ~16.00 MB | 8×H100 SXM
3-Seed Results
Merged SOTA (PR #1019): 1.1147 BPB. Delta: −0.0381 BPB.
Training (6 techniques)
Evaluation: Causal SLOT (context-only delta optimization)
Per-batch additive delta (dim=512) optimized with AdamW (lr=0.008, 16 steps) on context-only positions. Only already-scored tokens contribute to the optimization loss. Delta re-initialized per batch. Model weights completely frozen.
Provably causal: delta depends only on x_1,...,x_{t-64} (all previously scored). New positions scored with adapted delta but excluded from optimization. Same causal guarantee as score-first TTT but via delta optimization instead of weight updates.
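The two paragraphs above can be sketched end-to-end. This is an illustrative NumPy toy (a frozen linear readout standing in for the frozen model; all names are hypothetical, and AdamW is hand-rolled rather than taken from the submission's code) showing the claimed property: because the loss is masked to context positions before the gradient is formed, new positions contribute exactly zero gradient to the delta.

```python
import numpy as np

def context_grad(delta, H, W, y, ctx):
    """Gradient of the context-only MSE loss w.r.t. the additive delta.

    H: (T, d) frozen hidden states; W: (d,) frozen readout; y: (T,) targets;
    ctx: (T,) 0/1 mask, 1 only on already-scored (context) positions.
    """
    err = ((H + delta) @ W - y) * ctx          # new positions zeroed out here
    return 2.0 * err.sum() * W / max(ctx.sum(), 1.0)

def adapt_delta(H, W, y, ctx, steps=16, lr=0.008,
                b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """Per-batch additive delta, optimized with 16 AdamW steps.

    Model parameters (H, W here) are never touched; only delta moves,
    and it is re-initialized to zero for every batch.
    """
    d = H.shape[1]
    delta = np.zeros(d)
    m = np.zeros(d)
    v = np.zeros(d)
    for t in range(1, steps + 1):
        g = context_grad(delta, H, W, y, ctx)
        m = b1 * m + (1 - b1) * g              # first-moment estimate
        v = b2 * v + (1 - b2) * g * g          # second-moment estimate
        mhat = m / (1 - b1 ** t)               # bias correction
        vhat = v / (1 - b2 ** t)
        # decoupled weight decay, per AdamW
        delta -= lr * (mhat / (np.sqrt(vhat) + eps) + wd * delta)
    return delta
```

Perturbing the targets at masked (new) positions leaves the optimization trajectory bit-for-bit unchanged, which is the causality argument in the description; the adapted delta is then applied when scoring those new positions, mirroring score-first TTT.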
Source: arXiv:2505.12392v2, PR #1306 @resouer (causal variant), PR #1176 @bigbag.
Compliance
Reproduction
Credits
PR #1218 @clarkkev, PR #1285 @dexhunter, PR #1204 @msisovic, PR #1289 @MatoTeziTanka, PR #1260 @dexhunter, PR #1019 @abaybektursun, PR #1287 @dentity007, PR #1217 @bigbag, PR #493 @parinzee, PR #1306 @resouer, PR #1176 @bigbag