Record: 11L + order-adaptive 9-gram backoff (mean val_bpb=0.9059)#788
hypery11 wants to merge 1 commit into openai:main
Conversation
Seeds: 0.9067 / 0.9059 / 0.9050 (std 0.0009). Order-adaptive entropy gating on 2-9 gram backoff. 13.99MB artifact. Train 600s, eval ~150s.
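The reported mean and seed spread can be checked directly; a minimal verification in plain Python (not part of the PR code), assuming the 0.0009 figure is the sample standard deviation:

```python
import statistics

seeds = [0.9067, 0.9059, 0.9050]  # val_bpb across the three seeds
mean = statistics.mean(seeds)
std = statistics.stdev(seeds)  # sample std (ddof=1)
print(round(mean, 4), round(std, 4))  # → 0.9059 0.0009
```

Both reported figures are reproduced, so the std is indeed the sample (n-1) estimate.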
- Base model is ValCalib GPTQ (1.1142 BPB), not PR openai#549 (1.1194)
- Remove stale "not yet deployed" / "we estimate" for EXP-11
- Note α=0.80 (939s) exceeds 600s budget
- Fix PR openai#727 score to 0.9674, PR openai#788 to 0.9059
- Fix PR openai#596 BPB to 0.6430
- "Approved" → "Technique deemed legal" for closed PRs
- Add bucket sweep and per-token overhead proposal
- Replace "neural" with "base LM" throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s 2007) THE biggest legal technique gap after LEGAL_TTT. The top 30 legal PRs in COMPETITION_SCOPE.md all use multi-order n-gram backoff (openai#788/openai#802/openai#828/openai#761 = 0.91-0.96 BPB).

Implementation: at each position, use the highest-confidence n-gram order only:

- if peak(4-gram[h]) > T4: use the 4-gram with weight 1.0
- elif peak(3-gram[h]) > T3: use the 3-gram with weight α=0.4 (Brants 2007)
- else: use the bigram with weight α²=0.16

Here peak = max log-prob across the vocab; a concentrated distribution indicates confident counts. Hash-collision noise in lower orders is stripped by using only the most-confident order.

Marker: NGRAM_BACKOFF_MARKER. Env: USE_NGRAM_BACKOFF=1, NGRAM_BACKOFF_THRESH4=1.0, NGRAM_BACKOFF_THRESH3=1.0, NGRAM_BACKOFF_ALPHA=0.4. Composes with NGRAM_GATE. Smoke test in /tmp passes: marker present in the patched file, syntax-valid Python. EXPECTED_MARKERS is now 46 (was 45). Queued L09_ngram_backoff_S2_seed42/seed1337 on Pod C for n=2 cheap-pod validation.
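The order-selection rule above can be sketched as follows. This is a minimal illustration, not the PR's code: the table layout (`logprob_tables[n]` mapping a hashed context to a log-prob row over the vocab), the function name, and the `-inf` threshold for the lowest order are all assumptions.

```python
import numpy as np

def select_backoff_order(logprob_tables, context_hashes, thresholds, alpha=0.4):
    """Pick the single most-confident n-gram order for one position.

    Tries orders highest-first; "peak" confidence is the max log-prob
    across the vocab for that context's cached row. Each fallback to a
    shorter context decays the mixing weight by alpha (Brants-style).
    """
    weight = 1.0
    for order in sorted(logprob_tables, reverse=True):
        row = logprob_tables[order].get(context_hashes[order])
        if row is not None and row.max() > thresholds[order]:
            return order, weight, row  # use this order ONLY
        weight *= alpha
    return None, 0.0, None  # no order matched; fall back to base LM alone

# Toy usage: the 4-gram context is unseen, the 3-gram row is confident,
# so the 3-gram wins with weight alpha = 0.4.
tables = {
    4: {},
    3: {123: np.log(np.array([0.7, 0.2, 0.1]))},
    2: {456: np.log(np.array([0.4, 0.3, 0.3]))},
}
hashes = {4: 999, 3: 123, 2: 456}
thresholds = {4: 1.0, 3: -1.0, 2: float("-inf")}
order, weight, row = select_backoff_order(tables, hashes, thresholds, alpha=0.4)
```

Using only one order per position (rather than interpolating all of them) is what discards hash-collision noise from the lower-order tables.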
Community Review — Record: 11L + order-adaptive 9-gram backoff (mean val_bpb=0.9059)

BPB: 0.9059 | Compliance: FLAG — hashed n-gram cache with target-in-key (PR #779 family pattern)

What I found in the code (head SHA …): the n-gram lookup key at line 1154 is constructed by XOR-ing the target token into the hash. This matches Issue #1017 condition 1.

Cluster context: this same structural pattern has been closed on 15+ PRs under the #779 ruling as of 2026-04-11 (#779 itself, #770, #798, #808, #825, #786, #797, #909, #940, #761, #776, #788, #774, #778, #715, #758, #702 upstream, #1488). The base neural model is unaffected by this flag — in every case where the authors resubmitted without the n-gram cache, the base val_bpb has been in the ~1.10-1.15 range (standard for the SP1024 11L class).

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.23s, dim=512, layers=11, vocab=1024, code=88116 B, SMOKE_TEST_PASS.

Verdict: COMPLIANCE FLAG — target-in-key hashed n-gram cache, same family as PR #779. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: CLOSE under the same ruling as the rest of the family-bug cluster. A context-only resubmission (drop the target from the lookup key and use a full-vocabulary reweighting from a single context row, per @valerio-oai's suggested legal path on #779) would be welcomed.

Reviewed by @MatoTeziTanka — The Agora. Classification via deterministic AST-based …
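The distinction the review draws can be illustrated abstractly. This is a sketch of the two key constructions, not the PR's actual code: the function names and the mixing constant are invented for illustration.

```python
def target_in_key(context_hash: int, target_token: int) -> int:
    """Flagged pattern: the token being scored is folded into the
    lookup key, so the cache indexes (context, target) pairs and the
    lookup itself depends on the answer being evaluated."""
    # 0x9E3779B1 is an arbitrary mixing constant chosen for this sketch.
    return context_hash ^ (target_token * 0x9E3779B1)

def context_only_key(context_hash: int) -> int:
    """Suggested legal path: key on the context alone; the single
    cached row is then a full-vocabulary reweighting, read once per
    position before the target is known."""
    return context_hash

# With target-in-key, two candidate tokens hit two different cache
# slots; with a context-only key, every candidate shares one row.
h = 0xDEADBEEF
keys_illegal = {target_in_key(h, t) for t in range(5)}
keys_legal = {context_only_key(h) for t in range(5)}
```

The structural tell is that the flagged key cannot be computed without the target, whereas the context-only key can.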
Results
Method
11-layer transformer with XSA-all, LeakyReLU(0.5)^2, Value Residual, Gated Attention. GPTQ-lite int6 + zstd-22.
Order-adaptive entropy-gated n-gram backoff cache (orders 2-9). Higher-order matches use a lower entropy threshold for mixing. Score-first, deterministic, no TTT.
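A rough sketch of the order-adaptive entropy gate described above. The linear threshold schedule and its constants are assumptions for illustration, not the PR's tuned values:

```python
import math

def entropy_bits(probs):
    """Shannon entropy (bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def order_threshold(order, base=3.0, step=0.25):
    # Hypothetical schedule: per the method description, higher-order
    # matches get a lower entropy threshold, i.e. a stricter gate.
    return base - step * (order - 2)

def should_mix(ngram_probs, order):
    # Mix the cached n-gram distribution into the base LM only when it
    # is concentrated enough to clear that order's entropy gate.
    return entropy_bits(ngram_probs) < order_threshold(order)

# A uniform 8-way distribution (3 bits) fails even the loosest gate;
# a sharply peaked one clears the strict order-9 gate.
uniform_ok = should_mix([1 / 8] * 8, order=2)
peaked_ok = should_mix([0.9, 0.05, 0.05], order=9)
```

Gating on the n-gram distribution's own entropy keeps the cache deterministic and score-first: the decision uses only counts from the context, never the target.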