
Record: 1.0400 BPB -- Hedge Mixer + VRL + AdamW TTT + Polyak EMA #731

Open
pentxayc wants to merge 1 commit into openai:main from pentxayc:submission/hedge-mixer-vrl-1.0410

Conversation

@pentxayc

Summary

  • 1.0400 BPB (seed 42, 2 additional seeds pending)
  • 11L transformer (26.99M params) with Value Residual Learning (VRL), LeakyReLU(0.5)², XSA-4
  • 5-expert Hedge Mixer during eval: neural model + unigram + bigram + trigram (64K hashed) + entropy
  • Hedge algorithm (eta=0.1) with deferred between-chunk weight updates (legal score-first)
  • AdamW TTT (lr=0.0005) + Polyak EMA (decay=0.998) + byte-weighted loss + adaptive cosine LR
  • Freeze first 9/11 blocks during TTT, unfreeze last 2 + norms/scales
  • Int6 mixed quantization + lzma compression
  • Artifact: 15,999,919 bytes (under 16MB limit)
  • Training: 6104 steps in 600s on 8xH100 SXM
  • Eval (TTT + Hedge): 404s / 600s budget
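For readers unfamiliar with Hedge, here is a minimal pure-Python sketch of the eval-time mixing loop. The experts, chunk sizes, and probabilities below are made up for illustration; only eta=0.1 and the deferred between-chunk weight update come from the summary above.

```python
import math

ETA = 0.1  # Hedge learning rate from the PR summary

def mix(expert_probs, weights):
    """Weighted mixture of the experts' probabilities for one byte."""
    z = sum(weights)
    return sum(w * p for w, p in zip(weights, expert_probs)) / z

def hedge_update(weights, expert_losses, eta=ETA):
    """Multiplicative-weights (Hedge) step: down-weight lossy experts."""
    return [w * math.exp(-eta * l) for w, l in zip(weights, expert_losses)]

# Toy run with three hypothetical experts; each tuple holds the probability
# every expert assigned to the true byte. Weights change only BETWEEN
# chunks, so chunk N is scored under weights derived from chunks 0..N-1.
weights = [1.0, 1.0, 1.0]
chunks = [
    [(0.8, 0.5, 0.1), (0.7, 0.5, 0.2)],
    [(0.9, 0.5, 0.1)],
]
total_bits = 0.0
for chunk in chunks:
    losses = [0.0] * len(weights)
    for probs in chunk:
        total_bits += -math.log2(mix(probs, weights))   # score first
        for i, p in enumerate(probs):
            losses[i] += -math.log(p)
    weights = hedge_update(weights, losses)             # deferred update
```

After the run, the expert that assigned the highest probabilities (expert 0) carries the largest weight, which is the behavior the mixer relies on.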

Legality

All eval-time adaptations are strictly score-first:

  1. Hedge weights for chunk N computed from chunks 0..N-1 only (deferred update after all windows scored)
  2. N-gram tables updated after chunk scoring completes
  3. Polyak EMA uses fixed decay, no snapshot selection
  4. TTT trains only on already-scored chunks
  5. No validation data during training; no training data during evaluation
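The ordering constraints in items 1-4 can be sketched with stand-in scoring and update functions (a hypothetical toy model, not the PR's code). The point is purely the event order: chunk i is scored before any adaptation touches it, the EMA uses a fixed decay with no snapshot selection, and the last chunk triggers no update pass.

```python
DECAY = 0.998   # fixed Polyak EMA decay from the PR summary
LR = 0.0005     # TTT learning rate from the PR summary (plain SGD here, not AdamW)

def score(w, chunk):
    """Stand-in for scoring a chunk under frozen weights."""
    return sum((x - w) ** 2 for x in chunk)

def adapt(w, chunk, lr=LR):
    """Stand-in for one TTT step on an already-scored chunk."""
    grad = sum(2 * (w - x) for x in chunk) / len(chunk)
    return w - lr * grad

w, ema_w = 0.0, 0.0
chunks = [[0.2, 0.4], [0.3, 0.5], [0.1, 0.6]]
events = []
for i, chunk in enumerate(chunks):
    events.append(("score", i))
    _ = score(w, chunk)                    # chunk i scored BEFORE any update on it
    if i < len(chunks) - 1:                # is_last_chunk guard: no wasted final pass
        w = adapt(w, chunk)
        ema_w = DECAY * ema_w + (1 - DECAY) * w   # fixed decay, no snapshot picking
        events.append(("adapt", i))
```

The resulting event trace is score-0, adapt-0, score-1, adapt-1, score-2: no chunk is ever scored under weights that saw it.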

Test plan

  • Seed 42: 1.0400 BPB
  • Seed 1337: pending
  • Seed 2024: pending

🤖 Generated with Claude Code

5-expert Hedge Mixer (neural + unigram + bigram + trigram + entropy) with
deferred between-chunk weight updates, combined with AdamW TTT + Polyak EMA
+ byte-weighted loss + adaptive cosine LR on an 11L VRL + LeakyReLU² + XSA-4
base. Seed 42 = 1.0400 BPB. Two additional seeds pending.
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 4, 2026
… Parallel Residuals path

- PR openai#771 confirmed CLOSED/REJECTED (train-then-score TTT)
- N-gram PRs openai#727/openai#741 CLOSED (illegal); openai#758/openai#731 open but same risk
- Merged SOTA unchanged at 1.1147
- New high-EV targets: PR openai#1351 (Discriminative TTT, 1.0807) and PR openai#1334
  (SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R, 1.0897)
- SLOT still unruled in Issue openai#140 — blocked until @valerio-oai rules
- CLAUDE.md updated to v8.0 with corrected strategy and Session 5 lessons

https://claude.ai/code/session_01X5rVjJpYyqm8DuWTNy2gkt
@MatoTeziTanka

Community Review — Record: 1.0400 BPB -- Hedge Mixer + VRL + AdamW TTT + Polyak EMA

BPB: 1.0400 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1413 dexhunter pattern)

What I found in the code (head SHA 6cff4df0d716, file records/track_10min_16mb/2026-03-25_HedgeMixer_VRL_AdamWTTT_1.0400/train_gpt.py):

The TTT path at line 1017 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape of the current leaderboard's legal frontier (PR #1413 dexhunter, the 1.0828 SP8192 + QK-Gain 5 + Legal TTT entry — verified at its head SHA against the is_last_chunk + torch.no_grad() score-first accumulator pattern).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
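To make the "target-in-key" distinction concrete, here is a hypothetical sketch (not the PR's code) of a hashed trigram expert that stays on the legal side: the context key is only the two preceding bytes, never the target, and counts are updated only after the chunk has been scored. The table size and smoothing constant are placeholders.

```python
ALPHA = 1.0   # Laplace smoothing constant (assumed; the PR only says "Laplace")

# 8 hashed context rows for the demo; the PR uses a 64K-entry hashed table.
counts = [[0] * 256 for _ in range(8)]

def ctx_hash(b1, b2):
    return (b1 * 31 + b2) % len(counts)

def trigram_prob(b1, b2, target):
    """P(target | b1, b2): the hash key never contains the target byte."""
    row = counts[ctx_hash(b1, b2)]
    return (row[target] + ALPHA) / (sum(row) + ALPHA * 256)

def update_counts(chunk):
    # Called only AFTER the chunk has been scored, so no byte's own
    # identity influences the probability it was assigned.
    for i in range(2, len(chunk)):
        counts[ctx_hash(chunk[i - 2], chunk[i - 1])][chunk[i]] += 1

chunk = [10, 20, 30, 10, 20, 30]
p_before = trigram_prob(10, 20, 30)   # uniform 1/256: no counts yet
update_counts(chunk)
p_after = trigram_prob(10, 20, 30)    # context (10, 20) now favors byte 30
```

An illegal variant would fold the target into the hash key or bump the counts before scoring; either way, the probability assigned to a byte would depend on that byte itself.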

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.12s, dim=512, layers=11, vocab=1024, code=94305 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.

sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 12, 2026
…1.01710

Merged SOTA changed from 1.1147 to 1.0810 (PR openai#1493, bigbag, 2026-04-09).
Six PRs merged in 5 days (PRs openai#1334, openai#1285, openai#1394, openai#1412, openai#1413, openai#1477, openai#1493).
New target: ≤1.0760 val_bpb. 18 days to deadline.

Key findings:
- GDN-Hybrid (PR openai#1564): 1.01710 BPB, no TTT/SLOT — monitor for organizer review
- VarLen Attention + Doc-TTT (PR openai#1560): 1.07406 BPB — implement next
- TMA Megakernel + Tap-In (PR openai#1555): 1.07636 BPB — add after openai#1560
- PR openai#731 n-gram (dense count + Laplace): reviewer says LOOKS CLEAN, awaiting 3rd seed
- PR openai#758: major legality flags, do not implement

Updated CLAUDE.md: Competition Strategy, Technique Reference, Lessons Learned (Session 9).
Updated logs/daily_research.md: new 2026-04-12 entry prepended.

https://claude.ai/code/session_011WyxjcwdigLhMFQDjLL5ss