Record: 11L XSA-all + 7-gram cache (mean val_bpb=1.0465)#758
hypery11 wants to merge 1 commit into openai:main from …
Conversation
Seeds: 1.0467 / 1.0470 / 1.0457 (std 0.0007). 11L with XSA-all, LeakyReLU^2, VR, GA, GPTQ-lite int6. 13.99MB artifact. Train 600s, eval 116s.
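For reference, the headline mean and spread can be reproduced from the three per-seed scores; the 4-decimal rounding convention is an assumption, and the quoted 0.0007 corresponds to the sample (n − 1) standard deviation:

```python
# Per-seed val_bpb scores from the PR description.
seeds = [1.0467, 1.0470, 1.0457]

mean_bpb = sum(seeds) / len(seeds)

# Sample standard deviation (n - 1), which matches the quoted 0.0007.
var = sum((s - mean_bpb) ** 2 for s in seeds) / (len(seeds) - 1)
std_bpb = var ** 0.5

print(round(mean_bpb, 4))  # 1.0465
print(round(std_bpb, 4))   # 0.0007
```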
…ivot
- Log PR openai#771 CLOSED (TTT rules violation: adapt-then-score same tokens)
- Update competition strategy: pivot from AdamW TTT to n-gram eval cache
- Document legal TTT definition (backward-looking only, already-graded chunks)
- Track new open PRs: openai#933 (0.0804), openai#758 (1.0465), openai#1028 (0.9984 unstable)
- Add Session 4 lessons learned (lessons 17-20)
- Update abandoned approaches and key reference PRs in CLAUDE.md

https://claude.ai/code/session_0173mhLdyzis2j7NKyvDQ8ST
… Parallel Residuals path
- PR openai#771 confirmed CLOSED/REJECTED (train-then-score TTT)
- N-gram PRs openai#727/openai#741 CLOSED (illegal); openai#758/openai#731 open but same risk
- Merged SOTA unchanged at 1.1147
- New high-EV targets: PR openai#1351 (Discriminative TTT, 1.0807) and PR openai#1334 (SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R, 1.0897)
- SLOT still unruled in Issue openai#140 — blocked until @valerio-oai rules
- CLAUDE.md updated to v8.0 with corrected strategy and Session 5 lessons

https://claude.ai/code/session_01X5rVjJpYyqm8DuWTNy2gkt
…RA TTT doc-independent legal; BPB bug alert
- PR openai#1541 (bigbag, 1.07785): Improved Parallel Residuals cross-lane + Muon 0.97 — open, hash embed flag pending
- PR openai#1540 (aryanbhosale, 1.0777): VarLen Attention + Doc-Independent LoRA TTT rank-96 (score-first, resets per batch) — appears legal
- PR openai#1539 confirmed illegal (Pre-Quant AdamW TTT, same ruling as openai#771)
- PR openai#1545 BPB double-counting bug: claimed score of ~1.028 is actually ~1.18
- PR openai#758 effectively dead: TTT contradiction + unnormalized n-gram both flagged
- Session 10 lessons: MATRIX_LR=0.03 pairs with Muon 0.97; doc-independent LoRA TTT is adoptable
- No merged SOTA change (still 1.0810); target remains ≤1.0760

https://claude.ai/code/session_01LgqwEDyFnyHsBbyJiSFUjK
Community Review — Record: 11L XSA-all + 7-gram cache (mean val_bpb=1.0465)

BPB: 1.0465 | Compliance: FLAG — hashed n-gram cache with target-in-key (PR #779 family pattern)

What I found in the code (head SHA …): the n-gram lookup key at line 1143 is constructed by XOR-ing the target token into the hash. This matches the #779 family pattern and satisfies Issue #1017, condition 1.

Cluster context: this same structural pattern has been closed on 15+ PRs under the #779 ruling as of 2026-04-11 (#779 itself, #770, #798, #808, #825, #786, #797, #909, #940, #761, #776, #788, #774, #778, #715, #758, #702 upstream, #1488).

The base neural model is unaffected by this flag — in every case where the authors resubmitted without the n-gram cache, the base val_bpb has been in the ~1.10-1.15 range (standard for the SP1024 11L class).

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.10s, dim=512, layers=11, vocab=1024, code=85725 B, SMOKE_TEST_PASS.

Verdict: COMPLIANCE FLAG — target-in-key hashed n-gram cache, same family as PR #779.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: CLOSE under the same ruling as the rest of the family-bug cluster. A context-only resubmission (drop the target from the lookup key and use a full-vocabulary reweighting from a single context row, per @valerio-oai's suggested legal path on #779) would be welcomed.

Reviewed by @MatoTeziTanka — The Agora. Classification via deterministic AST-based …
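The distinction the review draws between the flagged lookup and the suggested legal path can be sketched as follows; the function names, hash constants, and bucket count here are illustrative assumptions, not the PR's actual code:

```python
NUM_BUCKETS = 4_000_000  # illustrative size

def ngram_hash(tokens):
    # Order-sensitive context hash (illustrative, not the PR's scheme).
    h = 0
    for t in tokens:
        h = (h * 1000003 + t) & 0xFFFFFFFFFFFF
    return h

# FLAGGED pattern: XOR-ing the candidate target into the key gives every
# (context, target) pair its own bucket, so the cache effectively
# memorizes answers rather than conditioning on context alone.
def flagged_key(context, target):
    return (ngram_hash(context) ^ target) % NUM_BUCKETS

# Suggested legal path per the #779 ruling: key on the context only and
# read a full vocab-sized row of statistics from that single bucket.
def legal_key(context):
    return ngram_hash(context) % NUM_BUCKETS
```

With the flagged form, changing only the candidate target changes which bucket is consulted; with the context-only form, every candidate target is scored from the same single context row.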
…1.01710
Merged SOTA changed from 1.1147 to 1.0810 (PR openai#1493, bigbag, 2026-04-09). Seven PRs merged in 5 days (openai#1334, openai#1285, openai#1394, openai#1412, openai#1413, openai#1477, openai#1493). New target: ≤1.0760 val_bpb. 18 days to deadline.

Key findings:
- GDN-Hybrid (PR openai#1564): 1.01710 BPB, no TTT/SLOT — monitor for organizer review
- VarLen Attention + Doc-TTT (PR openai#1560): 1.07406 BPB — implement next
- TMA Megakernel + Tap-In (PR openai#1555): 1.07636 BPB — add after openai#1560
- PR openai#731 n-gram (dense count + Laplace): reviewer says LOOKS CLEAN, awaiting 3rd seed
- PR openai#758: major legality flags, do not implement

Updated CLAUDE.md: Competition Strategy, Technique Reference, Lessons Learned (Session 9). Updated logs/daily_research.md: new 2026-04-12 entry prepended.

https://claude.ai/code/session_011WyxjcwdigLhMFQDjLL5ss
Results
Method
11-layer transformer with XSA-all (Exclusive Self-Attention on all layers), LeakyReLU(0.5)^2, Value Residual, Gated Attention, BigramHash(10240), SmearGate. GPTQ-lite int6 + zstd-22. EMA(0.997) + Tight SWA + Late QAT.
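One plausible reading of the "LeakyReLU(0.5)^2" activation named above is LeakyReLU with negative slope 0.5 followed by an elementwise square; a signed-square variant (restoring the sign after squaring) also exists in this family, so the plain square is an assumption here:

```python
# Hypothetical scalar sketch of LeakyReLU(0.5)^2: leaky rectification
# with negative slope 0.5, then an elementwise square.
def leaky_relu_sq(x, slope=0.5):
    y = x if x >= 0.0 else slope * x
    return y * y

print(leaky_relu_sq(2.0))   # 4.0
print(leaky_relu_sq(-2.0))  # 1.0  (0.5 * -2.0 = -1.0, squared)
```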
7-gram backward-looking eval cache (alpha=0.40, 4M buckets). Score-first, deterministic, no TTT.
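A minimal sketch of a score-first, backward-looking eval cache of this shape: alpha=0.40 and the 4M bucket count come from the description, while the hash scheme, the blending formula, and all names are assumptions, and the result is reported in bits per token (the bytes normalization needed for true bpb is omitted):

```python
import math

ALPHA = 0.40           # cache weight from the PR description
NUM_BUCKETS = 4_000_000
N = 7                  # 7-gram: key on up to the 6 preceding tokens

def ctx_hash(tokens):
    # Illustrative order-sensitive hash; the PR's actual scheme is not shown.
    h = 0
    for t in tokens:
        h = (h * 1000003 + t) & 0xFFFFFFFFFFFF
    return h % NUM_BUCKETS

def eval_bpb(token_stream, model_prob):
    """Score each token first, then add it to the cache (backward-looking)."""
    cache = {}          # bucket -> {token: count}
    total_bits = 0.0
    for i, tok in enumerate(token_stream):
        ctx = tuple(token_stream[max(0, i - (N - 1)):i])
        bucket = ctx_hash(ctx)
        row = cache.get(bucket)
        p = model_prob(ctx, tok)
        if row:
            p_cache = row.get(tok, 0) / sum(row.values())
            p = (1 - ALPHA) * p + ALPHA * p_cache
        total_bits -= math.log2(max(p, 1e-12))
        # Update strictly after scoring, so the cache never sees a token
        # before that token has been graded (no test-time training).
        row = cache.setdefault(bucket, {})
        row[tok] = row.get(tok, 0) + 1
    return total_bits / len(token_stream)  # bits per token
```

Because the cache is only ever updated with tokens that have already been scored, the procedure is deterministic and score-first, which is the property the description relies on to distinguish it from TTT.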
Architecture builds on community techniques from PRs #609, #549.