Record: 11L Full GPTQ + Multi-Order N-gram Backoff (fixed-alpha 0.9757 / entropy-adaptive 0.9605, 3-seed) #778
Conversation
…757, 3-seed mean) 3-seed mean val_bpb = 0.9757 (std=0.0002) on 8xH100 SXM. Training 586s + GPTQ 10s = 596s within 600s budget. Multi-order backward-looking n-gram cache (orders 2-7, fixed alpha=0.40). All artifacts under 16MB. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
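The multi-order backward-looking cache described in the commit (orders 2-7, backing off from the longest matching context) can be sketched as follows. This is an illustrative sketch, not the submitted kernel; the class and method names are hypothetical, and the real implementation is hashed rather than dictionary-backed:

```python
from collections import defaultdict

import numpy as np

VOCAB = 1024          # SP1024 vocab size from the record
ORDERS = range(2, 8)  # n-gram orders 2-7 per the commit message

class MultiOrderNgramCache:
    """Next-token count vectors per context, one table per order."""
    def __init__(self):
        # tables[n] maps an (n-1)-token context tuple -> counts over the vocab
        self.tables = {n: defaultdict(lambda: np.zeros(VOCAB)) for n in ORDERS}

    def update(self, tokens):
        # Record every observed (context, next-token) pair for every order
        for n in ORDERS:
            for i in range(n - 1, len(tokens)):
                ctx = tuple(tokens[i - n + 1:i])
                self.tables[n][ctx][tokens[i]] += 1

    def predict(self, tokens):
        # Back off from the longest order whose context has been seen
        for n in reversed(ORDERS):
            if len(tokens) >= n - 1:
                ctx = tuple(tokens[-(n - 1):])
                if ctx in self.tables[n]:
                    counts = self.tables[n][ctx]
                    return counts / counts.sum()
        return np.full(VOCAB, 1.0 / VOCAB)  # uniform fallback
```

In this sketch the highest order that has seen the current suffix wins outright; the record's cache may instead mix orders, which the commit does not specify.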
…0.9757) Both variants included with full 3-seed results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Neural std: 0.00041 (was 0.00033) Fixed std: 0.00024 (was 0.00020) Entropy std: 0.00031 (was 0.00025) All means and individual seed values were already correct. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Community Review: Record: 11L Full GPTQ + Multi-Order N-gram Backoff (fixed-alpha 0.9757 / entropy-adaptive 0.9605, 3-seed)
BPB: 0.9757 | Compliance: FLAG, hashed n-gram cache with target-in-key (PR #779 family pattern)

What I found in the code (head SHA …): the n-gram lookup key at line 1036 is constructed by XOR-ing the target token into the hash. This matches Issue #1017, condition 1.

Cluster context: this same structural pattern has been closed on 15+ PRs under the #779 ruling as of 2026-04-11 (#779 itself, #770, #798, #808, #825, #786, #797, #909, #940, #761, #776, #788, #774, #778, #715, #758, #702 upstream, #1488).

The base neural model is unaffected by this flag: in every case where the authors resubmitted without the n-gram cache, the base val_bpb has been in the ~1.10-1.15 range (standard for the SP1024 11L class).

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.05s, dim=512, layers=11, vocab=1024, code=95919 B, SMOKE_TEST_PASS.

Verdict: COMPLIANCE FLAG, target-in-key hashed n-gram cache, same family as PR #779.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: CLOSE under the same ruling as the rest of the family-bug cluster. A context-only resubmission (drop the target from the lookup key and use a full-vocabulary reweighting from a single context row, per @valerio-oai's suggested legal path on #779) would be welcomed.

Reviewed by @MatoTeziTanka, The Agora. Classification via deterministic AST-based …
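The legal path the review points to (context-only key, one full-vocabulary row per bucket) can be sketched as below. This is a minimal illustration of the distinction, not code from either PR; the bucket count, hash constants, and function names are all hypothetical:

```python
import numpy as np

NUM_BUCKETS = 1 << 12  # hypothetical hash-table size, kept small for the sketch
VOCAB = 1024

def context_key(ctx_tokens):
    # The key is built from context tokens ONLY -- the candidate next token
    # never enters the hash, which is what the #779 ruling requires.
    h = 0
    for t in ctx_tokens:
        h = (h * 1000003 + t) & 0xFFFFFFFF  # simple polynomial rolling hash
    return h % NUM_BUCKETS

# One full-vocabulary count row per bucket: a single context lookup yields a
# distribution over all 1024 next tokens, with no per-target probing.
counts = np.zeros((NUM_BUCKETS, VOCAB), dtype=np.float32)

def ngram_probs(ctx_tokens):
    row = counts[context_key(ctx_tokens)]
    total = row.sum()
    return row / total if total > 0 else np.full(VOCAB, 1.0 / VOCAB)
```

The flagged variant differs in exactly one place: it would fold each candidate target token into `context_key` (e.g. by XOR), so the table is probed once per target instead of returning one reweighting row per context.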
Summary
Fixed alpha (safest): 3-seed mean val_bpb = 0.9757 (std=0.0002)
Entropy-adaptive alpha: 3-seed mean val_bpb = 0.9605 (std=0.0003)
Artifacts: 15.92 MB | 8xH100 SXM | Training + GPTQ: 596s / 600s budget
3-Seed Results
Two Variants
Variant 1: Fixed alpha (safest legal)
NGRAM_ENTROPY=0, constant alpha = 0.40: p = 0.60 * model + 0.40 * ngram (the same weight for every token)

Variant 2: Entropy-adaptive alpha
NGRAM_ENTROPY=1, alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0))

Key Techniques
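The two blending modes above can be sketched in one helper. This is an illustrative sketch: the fixed weight and the sigmoid schedule are taken from the record, while the function name and the assumption that H is the neural model's per-token entropy in bits are mine:

```python
import numpy as np

def blend(model_p, ngram_p, entropy_adaptive=False):
    """Mix neural and n-gram next-token distributions; alpha is the n-gram weight."""
    if entropy_adaptive:
        # Variant 2: lean on the n-gram cache more when the model is uncertain.
        # H = entropy of the model's distribution, assumed here to be in bits.
        H = -np.sum(model_p * np.log2(np.maximum(model_p, 1e-12)))
        alpha = 0.05 + 0.55 / (1.0 + np.exp(-2.0 * (H - 4.0)))  # sigmoid schedule
    else:
        alpha = 0.40  # Variant 1: fixed weight
    return (1.0 - alpha) * model_p + alpha * ngram_p
```

With this schedule, alpha stays near 0.05 when the model is confident (H well below 4 bits) and saturates near 0.60 when it is maximally uncertain, which is consistent with the entropy-adaptive variant's lower mean val_bpb.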
Compliance
Architecture
11L, 512d, GQA 8H/4KV, LeakyReLU(0.5)^2 MLP 3x, XSA-all, VE128, BigramHash(2048), Partial RoPE 16/64, LN Scale, SmearGate, U-Net skips, EMA(0.997), Parallel Muon, Full Hessian GPTQ int6 + LZMA.
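Of the techniques listed above, "Partial RoPE 16/64" means rotary position embeddings are applied to only 16 of each head's 64 dims, with the remaining 48 passed through unrotated. A minimal numpy sketch, assuming the standard RoPE base of 10000 and the rotate-half convention (both assumptions; the record does not specify them):

```python
import numpy as np

def partial_rope(x, rot_dims=16):
    """Rotate the first `rot_dims` of each head's dims; pass the rest through.
    x: (seq, heads, head_dim)."""
    t, h, d = x.shape
    x_rot, x_pass = x[..., :rot_dims], x[..., rot_dims:]
    half = rot_dims // 2
    freqs = 1.0 / (10000.0 ** (np.arange(half) / half))  # assumed RoPE base
    ang = np.arange(t)[:, None] * freqs[None, :]         # (seq, half)
    cos, sin = np.cos(ang)[:, None, :], np.sin(ang)[:, None, :]
    x1, x2 = x_rot[..., :half], x_rot[..., half:]
    # Rotate-half convention: pair dim i with dim i + half
    rotated = np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, x_pass], axis=-1)
```

Restricting rotation to a slice of the head dims leaves the remaining channels position-independent, a common trick in speedrun architectures to keep positional mixing cheap.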
Ablation (seed 1337)
Credits