
Record: 1.0400 BPB -- Hedge Mixer + VRL + AdamW TTT + Polyak EMA #731

Open
pentxayc wants to merge 1 commit into openai:main from pentxayc:submission/hedge-mixer-vrl-1.0410

Conversation

@pentxayc

Summary

  • 1.0400 BPB (seed 42, 2 additional seeds pending)
  • 11L transformer (26.99M params) with Value Residual Learning (VRL), LeakyReLU(0.5)², XSA-4
  • 5-expert Hedge Mixer during eval: neural model + unigram + bigram + trigram (64K hashed) + entropy
  • Hedge algorithm (eta=0.1) with deferred between-chunk weight updates (legal score-first)
  • AdamW TTT (lr=0.0005) + Polyak EMA (decay=0.998) + byte-weighted loss + adaptive cosine LR
  • Freeze first 9/11 blocks during TTT, unfreeze last 2 + norms/scales
  • Int6 mixed quantization + lzma compression
  • Artifact: 15,999,919 bytes (under 16MB limit)
  • Training: 6104 steps in 600s on 8xH100 SXM
  • Eval (TTT + Hedge): 404s / 600s budget
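For readers unfamiliar with Hedge, here is a minimal pure-Python sketch of the eval-time mixing loop. The experts, chunk sizes, and probabilities below are made up for illustration; only eta=0.1 and the deferred between-chunk weight update come from the summary above.

```python
import math

ETA = 0.1  # Hedge learning rate from the PR summary

def mix(expert_probs, weights):
    """Weighted mixture of the experts' probabilities for one byte."""
    z = sum(weights)
    return sum(w * p for w, p in zip(weights, expert_probs)) / z

def hedge_update(weights, expert_losses, eta=ETA):
    """Multiplicative-weights (Hedge) step: down-weight lossy experts."""
    return [w * math.exp(-eta * l) for w, l in zip(weights, expert_losses)]

# Toy run with three hypothetical experts; each tuple holds the probability
# every expert assigned to the true byte. Weights change only BETWEEN
# chunks, so chunk N is scored under weights derived from chunks 0..N-1.
weights = [1.0, 1.0, 1.0]
chunks = [
    [(0.8, 0.5, 0.1), (0.7, 0.5, 0.2)],
    [(0.9, 0.5, 0.1)],
]
total_bits = 0.0
for chunk in chunks:
    losses = [0.0] * len(weights)
    for probs in chunk:
        total_bits += -math.log2(mix(probs, weights))   # score first
        for i, p in enumerate(probs):
            losses[i] += -math.log(p)
    weights = hedge_update(weights, losses)             # deferred update
```

After the run, the expert that assigned the highest probabilities (expert 0) carries the largest weight, which is the behavior the mixer relies on.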

Legality

All eval-time adaptations are strictly score-first:

  1. Hedge weights for chunk N computed from chunks 0..N-1 only (deferred update after all windows scored)
  2. N-gram tables updated after chunk scoring completes
  3. Polyak EMA uses fixed decay, no snapshot selection
  4. TTT trains only on already-scored chunks
  5. No validation data during training; no training data during evaluation
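The ordering constraints in items 1-4 can be sketched with stand-in scoring and update functions (a hypothetical toy model, not the PR's code). The point is purely the event order: chunk i is scored before any adaptation touches it, the EMA uses a fixed decay with no snapshot selection, and the last chunk triggers no update pass.

```python
DECAY = 0.998   # fixed Polyak EMA decay from the PR summary
LR = 0.0005     # TTT learning rate from the PR summary (plain SGD here, not AdamW)

def score(w, chunk):
    """Stand-in for scoring a chunk under frozen weights."""
    return sum((x - w) ** 2 for x in chunk)

def adapt(w, chunk, lr=LR):
    """Stand-in for one TTT step on an already-scored chunk."""
    grad = sum(2 * (w - x) for x in chunk) / len(chunk)
    return w - lr * grad

w, ema_w = 0.0, 0.0
chunks = [[0.2, 0.4], [0.3, 0.5], [0.1, 0.6]]
events = []
for i, chunk in enumerate(chunks):
    events.append(("score", i))
    _ = score(w, chunk)                    # chunk i scored BEFORE any update on it
    if i < len(chunks) - 1:                # is_last_chunk guard: no wasted final pass
        w = adapt(w, chunk)
        ema_w = DECAY * ema_w + (1 - DECAY) * w   # fixed decay, no snapshot picking
        events.append(("adapt", i))
```

The resulting event trace is score-0, adapt-0, score-1, adapt-1, score-2: no chunk is ever scored under weights that saw it.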

Test plan

  • Seed 42: 1.0400 BPB
  • Seed 1337: pending
  • Seed 2024: pending

🤖 Generated with Claude Code

5-expert Hedge Mixer (neural + unigram + bigram + trigram + entropy) with
deferred between-chunk weight updates, combined with AdamW TTT + Polyak EMA
+ byte-weighted loss + adaptive cosine LR on an 11L VRL + LeakyReLU² + XSA-4
base. Seed 42 = 1.0400 BPB. Two additional seeds pending.
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 4, 2026
… Parallel Residuals path

- PR openai#771 confirmed CLOSED/REJECTED (train-then-score TTT)
- N-gram PRs openai#727/openai#741 CLOSED (illegal); openai#758/openai#731 open but same risk
- Merged SOTA unchanged at 1.1147
- New high-EV targets: PR openai#1351 (Discriminative TTT, 1.0807) and PR openai#1334
  (SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R, 1.0897)
- SLOT still unruled in Issue openai#140 — blocked until @valerio-oai rules
- CLAUDE.md updated to v8.0 with corrected strategy and Session 5 lessons

https://claude.ai/code/session_01X5rVjJpYyqm8DuWTNy2gkt
@MatoTeziTanka

Community Review — Record: 1.0400 BPB -- Hedge Mixer + VRL + AdamW TTT + Polyak EMA

BPB: 1.0400 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1413 dexhunter pattern)

What I found in the code (head SHA 6cff4df0d716, file records/track_10min_16mb/2026-03-25_HedgeMixer_VRL_AdamWTTT_1.0400/train_gpt.py):

The TTT path at line 1017 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape of the current leaderboard's legal frontier (PR #1413 dexhunter, the 1.0828 SP8192 + QK-Gain 5 + Legal TTT entry — verified at its head SHA against the is_last_chunk + torch.no_grad() score-first accumulator pattern).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
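To make the "target-in-key" distinction concrete, here is a hypothetical sketch (not the PR's code) of a hashed trigram expert that stays on the legal side: the context key is only the two preceding bytes, never the target, and counts are updated only after the chunk has been scored. The table size and smoothing constant are placeholders.

```python
ALPHA = 1.0   # Laplace smoothing constant (assumed; the PR only says "Laplace")

# 8 hashed context rows for the demo; the PR uses a 64K-entry hashed table.
counts = [[0] * 256 for _ in range(8)]

def ctx_hash(b1, b2):
    return (b1 * 31 + b2) % len(counts)

def trigram_prob(b1, b2, target):
    """P(target | b1, b2): the hash key never contains the target byte."""
    row = counts[ctx_hash(b1, b2)]
    return (row[target] + ALPHA) / (sum(row) + ALPHA * 256)

def update_counts(chunk):
    # Called only AFTER the chunk has been scored, so no byte's own
    # identity influences the probability it was assigned.
    for i in range(2, len(chunk)):
        counts[ctx_hash(chunk[i - 2], chunk[i - 1])][chunk[i]] += 1

chunk = [10, 20, 30, 10, 20, 30]
p_before = trigram_prob(10, 20, 30)   # uniform 1/256: no counts yet
update_counts(chunk)
p_after = trigram_prob(10, 20, 30)    # context (10, 20) now favors byte 30
```

An illegal variant would fold the target into the hash key or bump the counts before scoring; either way, the probability assigned to a byte would depend on that byte itself.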

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.12s, dim=512, layers=11, vocab=1024, code=94305 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.

sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 12, 2026
…1.01710

Merged SOTA changed from 1.1147 to 1.0810 (PR openai#1493, bigbag, 2026-04-09).
Six PRs merged in 5 days (PRs openai#1334, openai#1285, openai#1394, openai#1412, openai#1413, openai#1477, openai#1493).
New target: ≤1.0760 val_bpb. 18 days to deadline.

Key findings:
- GDN-Hybrid (PR openai#1564): 1.01710 BPB, no TTT/SLOT — monitor for organizer review
- VarLen Attention + Doc-TTT (PR openai#1560): 1.07406 BPB — implement next
- TMA Megakernel + Tap-In (PR openai#1555): 1.07636 BPB — add after openai#1560
- PR openai#731 n-gram (dense count + Laplace): reviewer says LOOKS CLEAN, awaiting 3rd seed
- PR openai#758: major legality flags, do not implement

Updated CLAUDE.md: Competition Strategy, Technique Reference, Lessons Learned (Session 9).
Updated logs/daily_research.md: new 2026-04-12 entry prepended.

https://claude.ai/code/session_011WyxjcwdigLhMFQDjLL5ss