
Record: 11L + order-adaptive 9-gram backoff (mean val_bpb=0.9059)#788

Open
hypery11 wants to merge 1 commit into openai:main from hypery11:submission/2026-03-25_champion

Conversation

@hypery11

Results

| Seed | val_bpb |
| ---- | ------- |
| 42   | 0.9067  |
| 1337 | 0.9059  |
| 2024 | 0.9050  |
| Mean | 0.9059  |
| Std  | 0.0009  |
  • Artifact: 13.99 MB
  • Train: 600s on 8xH100 SXM
  • Eval: ~150s

Method

11-layer transformer with XSA-all, LeakyReLU(0.5)^2, Value Residual, Gated Attention. GPTQ-lite int6 + zstd-22.
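The PR gives no code for the compression step. As a hedged illustration only, the "int6" half of "GPTQ-lite int6" can be sketched as symmetric per-group quantization (GPTQ-style error compensation omitted; the group size is a guess):

```python
import numpy as np

def quantize_int6(w: np.ndarray, group: int = 64):
    """Symmetric per-group int6 quantization (levels -31..31).

    Illustrative only: the PR's "GPTQ-lite" presumably adds GPTQ-style
    error compensation, which is omitted here. `group` is a guess.
    """
    w = w.reshape(-1, group)
    scale = np.abs(w).max(axis=1, keepdims=True) / 31.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero on all-zero groups
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_int6(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, s = quantize_int6(w)
max_err = float(np.abs(dequantize_int6(q, s) - w).max())
```

The int6 tensors would then be serialized and compressed with zstd at level 22 to reach the reported 13.99 MB artifact.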

Order-adaptive entropy-gated n-gram backoff cache (orders 2-9). Higher-order matches use a lower entropy threshold for mixing. Score-first, deterministic, no TTT.
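The PR does not publish its gating constants; a minimal sketch of the order-adaptive entropy gate, with the base threshold, per-order step, and mixing weight all invented for illustration:

```python
import numpy as np

def entropy_bits(p: np.ndarray) -> float:
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log2(p)).sum())

def mix_with_ngram(model_p, ngram_p, order,
                   base_thresh=3.0, per_order=0.25, lam=0.5):
    """Entropy-gated n-gram mixing: blend the cached n-gram distribution in
    only when the model is uncertain. Higher-order matches are trusted more,
    so they gate at a LOWER entropy threshold (i.e. mix more readily).

    base_thresh, per_order, and lam are illustrative, not the PR's values.
    """
    thresh = base_thresh - per_order * (order - 2)  # order 2 -> 3.0 bits, order 9 -> 1.25 bits
    if entropy_bits(model_p) < thresh:
        return model_p                              # model already confident: no mixing
    mixed = (1.0 - lam) * model_p + lam * ngram_p
    return mixed / mixed.sum()

model_p = np.full(4, 0.25)                # 2.0 bits of entropy
ngram_p = np.array([0.7, 0.1, 0.1, 0.1])
out2 = mix_with_ngram(model_p, ngram_p, order=2)  # 2.0 < 3.0: model kept as-is
out9 = mix_with_ngram(model_p, ngram_p, order=9)  # 2.0 >= 1.25: mixed
```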

  • 8xH100 SXM, train <=600s
  • Eval <=600s (~150s)
  • Artifact <=16MB (13.99MB)
  • 3-seed validation (std 0.0009)

Seeds: 0.9067 / 0.9059 / 0.9050 (std 0.0009).
Order-adaptive entropy gating on 2-9 gram backoff.
13.99MB artifact. Train 600s, eval ~150s.
abaybektursun added a commit to abaybektursun/parameter-golf that referenced this pull request Mar 26, 2026
- Base model is ValCalib GPTQ (1.1142 BPB), not PR openai#549 (1.1194)
- Remove stale "not yet deployed" / "we estimate" for EXP-11
- Note α=0.80 (939s) exceeds 600s budget
- Fix PR openai#727 score to 0.9674, PR openai#788 to 0.9059
- Fix PR openai#596 BPB to 0.6430
- "Approved" → "Technique deemed legal" for closed PRs
- Add bucket sweep and per-token overhead proposal
- Replace "neural" with "base LM" throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 8, 2026
…s 2007)

THE biggest legal technique gap after LEGAL_TTT. Top 30 legal PRs in COMPETITION_SCOPE.md
all use multi-order n-gram backoff (openai#788/openai#802/openai#828/openai#761 = 0.91-0.96 BPB).

Implementation: at each position, use the HIGHEST-CONFIDENCE n-gram order ONLY:
- if peak(4-gram[h]) > T4: use 4-gram with weight 1.0
- elif peak(3-gram[h]) > T3: use 3-gram with weight α=0.4 (Brants 2007)
- else: use bigram with weight α²=0.16
The 'peak' is the max log-prob across the vocab; a concentrated distribution signals confident counts. Hash-collision noise in lower orders is stripped by using only the most confident order.
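The branch logic above can be sketched directly. The table layout, names, and threshold scale are assumptions (the env defaults of 1.0 suggest the real score is not a plain log-prob, so the demo thresholds below are negative):

```python
import math

def backoff_lookup(tables, ctx, alpha=0.4, t4=-0.5, t3=-0.5):
    """Use the single highest-confidence n-gram order only, with Brants
    2007 style weights: 4-gram at 1.0, else 3-gram at alpha, else bigram
    at alpha**2. `tables[k]` maps a (k-1)-token context tuple to a
    log-prob list over the vocab; this layout is illustrative.
    """
    def peak(lp):
        return max(lp) if lp is not None else -math.inf

    lp4 = tables[4].get(tuple(ctx[-3:]))
    if peak(lp4) > t4:
        return lp4, 1.0            # confident 4-gram: use it alone
    lp3 = tables[3].get(tuple(ctx[-2:]))
    if peak(lp3) > t3:
        return lp3, alpha
    return tables[2].get(tuple(ctx[-1:])), alpha * alpha

tables = {
    4: {("a", "b", "c"): [-0.1, -3.0]},  # peaked -> confident counts
    3: {("b", "c"): [-0.2, -2.0]},
    2: {("c",): [-1.0, -1.0]},
}
lp_hi, w_hi = backoff_lookup(tables, ("a", "b", "c"))
lp_lo, w_lo = backoff_lookup({**tables, 4: {}}, ("a", "b", "c"))  # no 4-gram match
```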

Marker: NGRAM_BACKOFF_MARKER. Env: USE_NGRAM_BACKOFF=1, NGRAM_BACKOFF_THRESH4=1.0,
NGRAM_BACKOFF_THRESH3=1.0, NGRAM_BACKOFF_ALPHA=0.4. Composes with NGRAM_GATE.

Smoke test in /tmp passes: marker present in patched file, syntax-valid Python.
EXPECTED_MARKERS now 46 (was 45).

Queued L09_ngram_backoff_S2_seed42/seed1337 on Pod C for n=2 cheap-pod validation.
@MatoTeziTanka

Community Review — Record: 11L + order-adaptive 9-gram backoff (mean val_bpb=0.9059)

BPB: 0.9059 | Compliance: FLAG — hashed n-gram cache with target-in-key (PR #779 family pattern)

What I found in the code (head SHA 9d25bf3678e9, file records/track_10min_16mb/2026-03-25_11L_order_adaptive_9gram/train_gpt.py):

The n-gram lookup key at line 1154 is constructed by XOR-ing the target token into the hash:

line 1154: full_key = <hash> ^ (tgt_np * ng_primes[...]) & mask

This matches the full_key = ((ctx_hash ^ (target * primes[k])) & mask) construction that @valerio-oai ruled disallowed on PR #779 (comment 4145781641, 2026-03-27). Per the mechanism explanation, hashing the target token into the lookup key reweights only the correct token: in the hash-collision limit this drives P(correct) → 1 regardless of the data, inflating the reported BPB without producing real compression.

Per Issue #1017 condition 1, p_t may depend only on the artifact and x_1...x_{t-1}. Because the lookup key at line 1154 is a function of the target token, the count read at scoring position t depends on x_t itself, which is precisely the violation the #779 ruling targets.
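To make the flagged pattern concrete, here is a toy contrast between a target-in-key lookup and a context-only one; the hash, prime, and mask width are invented, not taken from the PR's code:

```python
MASK = (1 << 20) - 1   # toy 20-bit table; the real sizes are unknown here
P = 1_000_003          # illustrative prime

def ctx_hash(ctx):
    """Hash of the context tokens x_1..x_{t-1} only."""
    h = 0
    for tok in ctx:
        h = ((h * 31) ^ (tok * P)) & 0xFFFFFFFFFFFFFFFF
    return h

def illegal_key(ctx, target):
    # Flagged pattern: the scoring-time key depends on the target token x_t,
    # so the count read at position t is a function of the answer itself.
    return (ctx_hash(ctx) ^ (target * P)) & MASK

def legal_key(ctx):
    # Context-only key: one row per context, used to reweight the FULL
    # vocabulary, never just the eventual correct token.
    return ctx_hash(ctx) & MASK
```

Two candidate targets land in different rows under `illegal_key`, which is exactly what lets collisions favor the correct token; `legal_key` is target-independent by construction.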

Cluster context: as of 2026-04-11, this same structural pattern has been closed on 15+ PRs under the #779 ruling (#779 itself, #770, #798, #808, #825, #786, #797, #909, #940, #761, #776, #788, #774, #778, #715, #758, #702 upstream, #1488). The base neural model is unaffected by this flag: in every case where the authors resubmitted without the n-gram cache, the base val_bpb has been in the ~1.10-1.15 range (standard for the SP1024 11L class).

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.23s, dim=512, layers=11, vocab=1024, code=88116 B, SMOKE_TEST_PASS

Verdict: COMPLIANCE FLAG — target-in-key hashed n-gram cache, same family as PR #779.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: CLOSE under the same ruling as the rest of the family-bug cluster. A context-only resubmission (drop the target from the lookup key and use a full-vocabulary reweighting from a single context row, per @valerio-oai's suggested legal path on #779) would be welcomed.


Reviewed by @MatoTeziTanka / The Agora. Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.
