
Record: 11L XSA-all + 7-gram cache (mean val_bpb=1.0465)#758

Open
hypery11 wants to merge 1 commit into openai:main from hypery11:submission/2026-03-25_11L_XSA_ngram

Conversation

@hypery11

Results

| Seed | val_bpb |
|------|---------|
| 42   | 1.0467  |
| 1337 | 1.0470  |
| 2024 | 1.0457  |
| Mean | 1.0465  |
| Std  | 0.0007  |
  • Artifact: 13.99 MB
  • Train: 600s on 8xH100 SXM
  • Eval: ~116s

Method

11-layer transformer with XSA-all (Exclusive Self-Attention on all layers), LeakyReLU(0.5)^2, Value Residual, Gated Attention, BigramHash(10240), SmearGate. GPTQ-lite int6 + zstd-22. EMA(0.997) + Tight SWA + Late QAT.
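As a hedged illustration, "LeakyReLU(0.5)^2" most plausibly reads as a LeakyReLU with negative slope 0.5 followed by squaring, in the squared-ReLU family used by speedrun models. The function name and exact form below are assumptions; the PR body does not show the activation code:

```python
import numpy as np

def squared_leaky_relu(x: np.ndarray, negative_slope: float = 0.5) -> np.ndarray:
    """LeakyReLU(0.5)^2: leaky rectification, then squaring.

    One plausible reading of the shorthand in the PR body; the actual
    train_gpt.py activation may differ (e.g. a sign-preserving square).
    """
    y = np.where(x >= 0, x, negative_slope * x)
    return y * y
```

Under this reading, `squared_leaky_relu(np.array([2.0, -2.0]))` gives `[4.0, 1.0]`: the negative branch is scaled by 0.5 before squaring.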

7-gram backward-looking eval cache (alpha=0.40, 4M buckets). Score-first, deterministic, no TTT.
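As described, the cache reads before it writes at every scoring position. A minimal sketch of that score-first loop (illustrative only: the hash, bucket count, and smoothing are assumptions, and the per-candidate keys here are derived from the context plus the candidate index, never the target):

```python
import numpy as np

def score_first_ngram_mix(tokens, model_probs, n=7, alpha=0.40,
                          num_buckets=1 << 16, vocab=4):
    """Blend model probabilities with a backward-looking n-gram cache.

    Score-first: at position t the cache is read, mixed into p_t, and only
    then updated with the now-graded token, so p_t depends only on
    x_1..x_{t-1}. Sketch only -- not the PR's implementation.
    """
    counts = np.zeros(num_buckets, dtype=np.float64)
    out = np.empty(len(tokens))
    for t in range(len(tokens)):
        ctx = tokens[max(0, t - (n - 1)):t]
        h = 0
        for tok in ctx:
            h = (h * 1000003 + int(tok)) & 0xFFFFFFFF
        # keys depend on the context and candidate index only, never the target
        keys = (h * 31 + np.arange(vocab)) % num_buckets
        row = counts[keys]
        cache_p = (row + 0.1) / (row.sum() + 0.1 * vocab)  # smoothed counts
        p = (1.0 - alpha) * model_probs[t] + alpha * cache_p
        out[t] = p[tokens[t]]
        counts[keys[tokens[t]]] += 1.0  # update AFTER scoring
    return out
```

On a repetitive token stream with a uniform base model, the first visit to a context scores at the base probability and later visits score higher, which is the backward-looking behavior the PR body claims.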

Architecture builds on community techniques from PRs #609, #549.

  • 8xH100 SXM, train ≤600s
  • Eval ≤600s (116s)
  • Artifact ≤16MB (13.99MB)
  • 3-seed validation (std 0.0007)

Seeds: 1.0467 / 1.0470 / 1.0457 (std 0.0007).
11L with XSA-all, LeakyReLU^2, VR, GA, GPTQ-lite int6.
13.99MB artifact. Train 600s, eval 116s.
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Mar 28, 2026
…ivot

- Log PR openai#771 CLOSED (TTT rules violation: adapt-then-score same tokens)
- Update competition strategy: pivot from AdamW TTT to n-gram eval cache
- Document legal TTT definition (backward-looking only, already-graded chunks)
- Track new open PRs: openai#933 (0.0804), openai#758 (1.0465), openai#1028 (0.9984 unstable)
- Add Session 4 lessons learned (lessons 17-20)
- Update abandoned approaches and key reference PRs in CLAUDE.md

https://claude.ai/code/session_0173mhLdyzis2j7NKyvDQ8ST
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 4, 2026
… Parallel Residuals path

- PR openai#771 confirmed CLOSED/REJECTED (train-then-score TTT)
- N-gram PRs openai#727/openai#741 CLOSED (illegal); openai#758/openai#731 open but same risk
- Merged SOTA unchanged at 1.1147
- New high-EV targets: PR openai#1351 (Discriminative TTT, 1.0807) and PR openai#1334
  (SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R, 1.0897)
- SLOT still unruled in Issue openai#140 — blocked until @valerio-oai rules
- CLAUDE.md updated to v8.0 with corrected strategy and Session 5 lessons

https://claude.ai/code/session_01X5rVjJpYyqm8DuWTNy2gkt
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 11, 2026
…RA TTT doc-independent legal; BPB bug alert

- PR openai#1541 (bigbag, 1.07785): Improved Parallel Residuals cross-lane + Muon 0.97 — open, hash embed flag pending
- PR openai#1540 (aryanbhosale, 1.0777): VarLen Attention + Doc-Independent LoRA TTT rank-96 (score-first, resets per batch) — appears legal
- PR openai#1539 confirmed illegal (Pre-Quant AdamW TTT, same ruling as openai#771)
- PR openai#1545 BPB double-counting bug: real score ~1.028 claim is ~1.18 actual
- PR openai#758 effectively dead: TTT contradiction + unnormalized n-gram both flagged
- Session 10 lessons: MATRIX_LR=0.03 pairs with Muon 0.97; doc-independent LoRA TTT is adoptable
- No merged SOTA change (still 1.0810); target remains ≤1.0760

https://claude.ai/code/session_01LgqwEDyFnyHsBbyJiSFUjK
@MatoTeziTanka

Community Review — Record: 11L XSA-all + 7-gram cache (mean val_bpb=1.0465)

BPB: 1.0465 | Compliance: FLAG — hashed n-gram cache with target-in-key (PR #779 family pattern)

What I found in the code (head SHA 5ed06ab2129f, file records/track_10min_16mb/2026-03-25_11L_XSA_7gram/train_gpt.py):

The n-gram lookup key at line 1143 is constructed by XOR-ing the target token into the hash:

```
full_key = <hash> ^ (tgt_np * ng_primes[...]) & mask   # train_gpt.py line 1143
```

This matches the full_key = ((ctx_hash ^ (target * primes[k])) & mask) construction that @valerio-oai ruled disallowed on PR #779 (comment 4145781641, 2026-03-27). Per the mechanism explanation, hashing the target token into the lookup key only reweights the correct token — in the hash-collision limit this drives P(correct) → 1 regardless of the data, which inflates the reported BPB without producing real compression.
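To make the mechanism concrete, here is a toy reproduction of the disallowed structural pattern (not the PR's code; the prime, mask, and boost rule are made up): because each candidate is looked up under a key that contains that candidate, and only the key containing the true target was ever written, the boost lands on the correct token even when the model is completely uninformative.

```python
import numpy as np

# Toy reproduction of the target-in-key pattern ruled illegal on PR #779.
# Constants and the reweighting rule are illustrative, not the PR's code.
PRIME, MASK = 1000003, (1 << 20) - 1
counts = {}

def key(ctx_hash, tok):
    # The token being scored is folded into the lookup key; this is
    # exactly what makes the read at position t a function of x_t.
    return (ctx_hash ^ (tok * PRIME)) & MASK

def update(ctx_hash, target):
    k = key(ctx_hash, target)
    counts[k] = counts.get(k, 0) + 1

def reweight(p, ctx_hash, beta=5.0):
    # Each candidate c is looked up under key(ctx, c); only the key that
    # contains the true target was ever incremented, so the boost singles
    # out the correct token regardless of what the model predicted.
    hit = np.array([counts.get(key(ctx_hash, c), 0) > 0 for c in range(len(p))])
    q = p * np.exp(beta * hit)
    return q / q.sum()

vocab = 8
p = np.full(vocab, 1.0 / vocab)   # a model that knows nothing
ctx_hash, target = 12345, 3
update(ctx_hash, target)          # the cache has seen this (ctx, target) pair
q = reweight(p, ctx_hash)
# q[target] rises to ~0.95 while every other token falls to ~0.006.
```

A context-only cache would also sharpen on repeated contexts, but it can only redistribute mass according to observed next-token frequencies; the target-in-key construction instead marks the correct answer directly, which is the inflation the #779 ruling describes.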

Per Issue #1017 condition 1, p_t may depend only on the artifact and x_1...x_{t-1}. Because the lookup key at line 1143 is a function of the target token, the count read at scoring position t depends on x_t itself — which is the core violation the #779 ruling targets.

Cluster context: this same structural pattern has been closed on 15+ PRs under the #779 ruling as of 2026-04-11 (#779 itself, #770, #798, #808, #825, #786, #797, #909, #940, #761, #776, #788, #774, #778, #715, #758, #702 upstream, #1488). The base neural model is unaffected by this flag — in every case where the authors resubmitted without the n-gram cache, the base val_bpb has been in the ~1.10-1.15 range (standard for the SP1024 11L class).

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.10s, dim=512, layers=11, vocab=1024, code=85725 B, SMOKE_TEST_PASS

Verdict: COMPLIANCE FLAG — target-in-key hashed n-gram cache, same family as PR #779.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: CLOSE under the same ruling as the rest of the family-bug cluster. A context-only resubmission (drop the target from the lookup key and use a full-vocabulary reweighting from a single context row, per @valerio-oai's suggested legal path on #779) would be welcomed.
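For reference, the context-only legal path suggested there could be sketched roughly as follows (an illustration of the described pattern, not a vetted or PR-specific implementation; the row count, hash, and smoothing are assumptions): one Laplace-smoothed count row per hashed context, covering the full vocabulary, read before update.

```python
import numpy as np

class ContextOnlyNgramCache:
    """One count row per hashed context, spanning the full vocabulary.

    The row index is a function of the context alone, so the distribution
    read at position t depends only on x_1..x_{t-1} (Issue #1017 cond. 1).
    Sketch of the legal pattern suggested on PR #779; sizes are made up.
    """
    def __init__(self, num_rows: int = 1 << 12, vocab: int = 1024):
        self.rows = np.zeros((num_rows, vocab), dtype=np.float32)
        self.num_rows = num_rows

    def _row_index(self, ctx) -> int:
        h = 0
        for tok in ctx:                      # context tokens only, no target
            h = (h * 1000003 + int(tok)) & 0xFFFFFFFF
        return h % self.num_rows

    def read(self, ctx) -> np.ndarray:
        """Laplace-smoothed next-token distribution; call before update."""
        row = self.rows[self._row_index(ctx)]
        return (row + 1.0) / (row.sum() + row.size)

    def update(self, ctx, target: int) -> None:
        """Record the now-graded target under its context row."""
        self.rows[self._row_index(ctx), target] += 1.0
```

During eval the order is read, score, then update, which mirrors the score-first discipline the PR body claims while keeping the lookup key free of the target.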


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting — if the template misread your code, please call it out so I can iterate the classifier.

sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 12, 2026
…1.01710

Merged SOTA changed from 1.1147 to 1.0810 (PR openai#1493, bigbag, 2026-04-09).
Six PRs merged in 5 days (PRs openai#1334, openai#1285, openai#1394, openai#1412, openai#1413, openai#1477, openai#1493).
New target: ≤1.0760 val_bpb. 18 days to deadline.

Key findings:
- GDN-Hybrid (PR openai#1564): 1.01710 BPB, no TTT/SLOT — monitor for organizer review
- VarLen Attention + Doc-TTT (PR openai#1560): 1.07406 BPB — implement next
- TMA Megakernel + Tap-In (PR openai#1555): 1.07636 BPB — add after openai#1560
- PR openai#731 n-gram (dense count + Laplace): reviewer says LOOKS CLEAN, awaiting 3rd seed
- PR openai#758: major legality flags, do not implement

Updated CLAUDE.md: Competition Strategy, Technique Reference, Lessons Learned (Session 9).
Updated logs/daily_research.md: new 2026-04-12 entry prepended.

https://claude.ai/code/session_011WyxjcwdigLhMFQDjLL5ss