
Record: 11L + order-adaptive 9-gram backoff (mean val_bpb=0.9059)#788

Open
hypery11 wants to merge 1 commit into openai:main from hypery11:submission/2026-03-25_champion

Conversation

@hypery11

Results

| Seed | val_bpb |
| ---- | ------- |
| 42   | 0.9067  |
| 1337 | 0.9059  |
| 2024 | 0.9050  |
| Mean | 0.9059  |
| Std  | 0.0009  |
  • Artifact: 13.99 MB
  • Train: 600s on 8xH100 SXM
  • Eval: ~150s

Method

11-layer transformer with XSA-all, LeakyReLU(0.5)^2, Value Residual, Gated Attention. GPTQ-lite int6 + zstd-22.
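The PR gives no code for the compression step. As a hedged illustration only, the "int6" half of "GPTQ-lite int6" can be sketched as symmetric per-group quantization (GPTQ-style error compensation omitted; the group size is a guess):

```python
import numpy as np

def quantize_int6(w: np.ndarray, group: int = 64):
    """Symmetric per-group int6 quantization (levels -31..31).

    Illustrative only: the PR's "GPTQ-lite" presumably adds GPTQ-style
    error compensation, which is omitted here. `group` is a guess.
    """
    w = w.reshape(-1, group)
    scale = np.abs(w).max(axis=1, keepdims=True) / 31.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on all-zero groups
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_int6(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, s = quantize_int6(w)
max_err = float(np.abs(dequantize_int6(q, s) - w).max())
```

The int6 tensors would then be serialized and compressed with zstd at level 22 to reach the reported 13.99 MB artifact.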

Order-adaptive entropy-gated n-gram backoff cache (orders 2-9). Higher-order matches use a lower entropy threshold for mixing. Score-first, deterministic, no TTT.
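The PR does not publish its gating constants; a minimal sketch of the order-adaptive entropy gate, with the base threshold, per-order step, and mixing weight all invented for illustration:

```python
import numpy as np

def entropy_bits(p: np.ndarray) -> float:
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log2(p)).sum())

def mix_with_ngram(model_p, ngram_p, order,
                   base_thresh=3.0, per_order=0.25, lam=0.5):
    """Entropy-gated n-gram mixing: blend the cached n-gram distribution in
    only when the model is uncertain. Higher-order matches are trusted more,
    so they gate at a LOWER entropy threshold (i.e. mix more readily).

    base_thresh, per_order, and lam are illustrative, not the PR's values.
    """
    thresh = base_thresh - per_order * (order - 2)  # order 2 -> 3.0 bits, order 9 -> 1.25 bits
    if entropy_bits(model_p) < thresh:
        return model_p                              # model already confident: no mixing
    mixed = (1.0 - lam) * model_p + lam * ngram_p
    return mixed / mixed.sum()

model_p = np.full(4, 0.25)                # 2.0 bits of entropy
ngram_p = np.array([0.7, 0.1, 0.1, 0.1])
out2 = mix_with_ngram(model_p, ngram_p, order=2)  # 2.0 < 3.0: model kept as-is
out9 = mix_with_ngram(model_p, ngram_p, order=9)  # 2.0 >= 1.25: mixed
```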

  • 8xH100 SXM, train <=600s
  • Eval <=600s (~150s)
  • Artifact <=16MB (13.99MB)
  • 3-seed validation (std 0.0009)

Seeds: 0.9067 / 0.9059 / 0.9050 (std 0.0009).
Order-adaptive entropy gating on 2-9 gram backoff.
13.99MB artifact. Train 600s, eval ~150s.
abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request Mar 26, 2026
- Base model is ValCalib GPTQ (1.1142 BPB), not PR openai#549 (1.1194)
- Remove stale "not yet deployed" / "we estimate" for EXP-11
- Note α=0.80 (939s) exceeds 600s budget
- Fix PR openai#727 score to 0.9674, PR openai#788 to 0.9059
- Fix PR openai#596 BPB to 0.6430
- "Approved" → "Technique deemed legal" for closed PRs
- Add bucket sweep and per-token overhead proposal
- Replace "neural" with "base LM" throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 8, 2026
…s 2007)

THE biggest legal technique gap after LEGAL_TTT. Top 30 legal PRs in COMPETITION_SCOPE.md
all use multi-order n-gram backoff (openai#788/openai#802/openai#828/openai#761 = 0.91-0.96 BPB).

Implementation: at each position, use the HIGHEST-CONFIDENCE n-gram order ONLY:
- if peak(4-gram[h]) > T4: use 4-gram with weight 1.0
- elif peak(3-gram[h]) > T3: use 3-gram with weight α=0.4 (Brants 2007)
- else: use bigram with weight α²=0.16
The 'peak' is the max log-prob across the vocab; a concentrated distribution signals confident counts. Hash-collision noise in lower orders is stripped by using only the most confident order.
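The branch logic above can be sketched directly. The table layout, names, and threshold scale are assumptions (the env defaults of 1.0 suggest the real score is not a plain log-prob, so the demo thresholds below are negative):

```python
import math

def backoff_lookup(tables, ctx, alpha=0.4, t4=-0.5, t3=-0.5):
    """Use the single highest-confidence n-gram order only, with Brants
    2007 style weights: 4-gram at 1.0, else 3-gram at alpha, else bigram
    at alpha**2. `tables[k]` maps a (k-1)-token context tuple to a
    log-prob list over the vocab; this layout is illustrative.
    """
    def peak(lp):
        return max(lp) if lp is not None else -math.inf

    lp4 = tables[4].get(tuple(ctx[-3:]))
    if peak(lp4) > t4:
        return lp4, 1.0            # confident 4-gram: use it alone
    lp3 = tables[3].get(tuple(ctx[-2:]))
    if peak(lp3) > t3:
        return lp3, alpha
    return tables[2].get(tuple(ctx[-1:])), alpha * alpha

tables = {
    4: {("a", "b", "c"): [-0.1, -3.0]},  # peaked -> confident counts
    3: {("b", "c"): [-0.2, -2.0]},
    2: {("c",): [-1.0, -1.0]},
}
lp_hi, w_hi = backoff_lookup(tables, ("a", "b", "c"))
lp_lo, w_lo = backoff_lookup({**tables, 4: {}}, ("a", "b", "c"))  # no 4-gram match
```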

Marker: NGRAM_BACKOFF_MARKER. Env: USE_NGRAM_BACKOFF=1, NGRAM_BACKOFF_THRESH4=1.0,
NGRAM_BACKOFF_THRESH3=1.0, NGRAM_BACKOFF_ALPHA=0.4. Composes with NGRAM_GATE.

Smoke test in /tmp passes: marker present in patched file, syntax-valid Python.
EXPECTED_MARKERS now 46 (was 45).

Queued L09_ngram_backoff_S2_seed42/seed1337 on Pod C for n=2 cheap-pod validation.
@MatoTeziTanka

Community Review — Record: 11L + order-adaptive 9-gram backoff (mean val_bpb=0.9059)

BPB: 0.9059 | Compliance: FLAG — hashed n-gram cache with target-in-key (PR #779 family pattern)

What I found in the code (head SHA 9d25bf3678e9, file records/track_10min_16mb/2026-03-25_11L_order_adaptive_9gram/train_gpt.py):

The n-gram lookup key at line 1154 is constructed by XOR-ing the target token into the hash:

line 1154: full_key = <hash> ^ (tgt_np * ng_primes[...]) & mask

This matches the full_key = ((ctx_hash ^ (target * primes[k])) & mask) construction that @valerio-oai ruled disallowed on PR #779 (comment 4145781641, 2026-03-27). Per the mechanism explanation, hashing the target token into the lookup key reweights only the correct token: in the hash-collision limit this drives P(correct) → 1 regardless of the data, inflating the reported BPB without producing real compression.

Per Issue #1017 condition 1, p_t may depend only on the artifact and x_1...x_{t-1}. Because the lookup key at line 1154 is a function of the target token, the count read at scoring position t depends on x_t itself, which is precisely the violation the #779 ruling targets.
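To make the flagged pattern concrete, here is a toy contrast between a target-in-key lookup and a context-only one; the hash, prime, and mask width are invented, not taken from the PR's code:

```python
MASK = (1 << 20) - 1   # toy 20-bit table; the real sizes are unknown here
P = 1_000_003          # illustrative prime

def ctx_hash(ctx):
    """Hash of the context tokens x_1..x_{t-1} only."""
    h = 0
    for tok in ctx:
        h = ((h * 31) ^ (tok * P)) & 0xFFFFFFFFFFFFFFFF
    return h

def illegal_key(ctx, target):
    # Flagged pattern: the scoring-time key depends on the target token x_t,
    # so the count read at position t is a function of the answer itself.
    return (ctx_hash(ctx) ^ (target * P)) & MASK

def legal_key(ctx):
    # Context-only key: one row per context, used to reweight the FULL
    # vocabulary, never just the eventual correct token.
    return ctx_hash(ctx) & MASK
```

Two candidate targets land in different rows under `illegal_key`, which is exactly what lets collisions favor the correct token; `legal_key` is target-independent by construction.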

Cluster context: as of 2026-04-11, this same structural pattern has been closed on 15+ PRs under the #779 ruling (#779 itself, #770, #798, #808, #825, #786, #797, #909, #940, #761, #776, #788, #774, #778, #715, #758, #702 upstream, #1488). The base neural model is unaffected by this flag: in every case where the authors resubmitted without the n-gram cache, the base val_bpb has been in the ~1.10-1.15 range (standard for the SP1024 11L class).

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.23s, dim=512, layers=11, vocab=1024, code=88116 B, SMOKE_TEST_PASS

Verdict: COMPLIANCE FLAG — target-in-key hashed n-gram cache, same family as PR #779.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: CLOSE under the same ruling as the rest of the family-bug cluster. A context-only resubmission (drop the target from the lookup key and use a full-vocabulary reweighting from a single context row, per @valerio-oai's suggested legal path on #779) would be welcomed.


Reviewed by @MatoTeziTanka / The Agora. Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.
