Record: 11L + Score-Every-Epoch LoRA TTT 5ep (3-seed mean val_bpb=0.8173) by minh-stakc · Pull Request #642 · openai/parameter-golf

minh-stakc · 2026-03-24T20:21:24Z

Summary

3-seed mean val_bpb: 0.8173 (std 0.0023)
Pre-TTT sliding window: ~1.126 BPB
TTT improvement: -0.309 BPB
Built on PR Record: 11L EMA + GPTQ-lite + warmdown3500 + QAT@0.15 (val_bpb=1.1233) #414 base architecture

Results

Seed	Pre-TTT BPB	Post-TTT BPB	Artifact
42	1.1264	0.8186	17.13 MB*
1337	1.1260	0.8187	17.13 MB*
7	1.1275	0.8146	17.13 MB*
Mean	1.1266	0.8173
Std	0.0008	0.0023

*Artifact exceeds 16MB on B200 without FlashAttention 3. Requires 8xH100 validation.

Key Innovation: Score-Every-Epoch Multi-Scale LoRA TTT

Per-document LoRA adaptation where each epoch re-scores all chunks with progressively better-adapted weights. Only the final epoch's scores contribute to BPB.

LM-head LoRA rank-16 (2x LR)
V projections rank-8 (1.5x LR), Q projections rank-8 (0.5x LR)
Per-block bias tuning (3x LR)
Cosine LR decay across total TTT steps
Post-TTT temperature T=0.98

Architecture (PR #414 stack)

11L, d=512, 8H/4KV GQA, MLP 3x, XSA4, Partial RoPE, LN Scale, EMA(0.997), GPTQ-lite int6 + zstd-22, SmearGate, BigramHash(2048), VE128, Muon WD=0.04

Credits

Test plan

3-seed validation (mean 0.8173, std 0.0023)
8xH100 artifact size verification
Eval time under 600s on 8xH100
8-epoch TTT experiments (in progress, targeting sub-0.7)

Score-every-epoch multi-scale LoRA TTT on PR openai#414 base architecture. Pre-TTT: 1.1264 BPP. Post-TTT: 0.8186 BPB.

valerio-oai · 2026-03-24T22:11:33Z

Per-document LoRA adaptation where each epoch re-scores all chunks with progressively better-adapted weights. Only the final epoch's scores contribute to BPB.

This is equivalent to training on test, this is an invalid submission, closing this PR for now.

…e gap closer) # Patch 45: LEGAL_TTT_MARKER Per-batch context/target test-time training at eval time. Splits each val batch sequence at 50/50, runs K=3 SGD steps on the context half, evaluates CE on the target half. Weights reset between batches → no test-data leakage across docs. Why this is THE biggest unspent leverage: - COMPETITION_SCOPE.md gap analysis: 234 PRs use TTT (best 0.3212 with SLOT) - LEGAL_TTT variant: 85 PRs (best legal score 0.7139) - Top legal open PRs (openai#642 0.8173, openai#620 0.9443, openai#512 0.9512, openai#940/761/1185 ~0.96) all use this category - WE HAD ZERO TTT until this patch - Our cheap-pod best 1.41 → projected with LEGAL_TTT: 1.0-1.2 (very speculative) - Could close the gap from 1.07 (our merged-record territory) to 0.81 (legal frontier) Architecture: - New helper `_eval_val_legal_ttt(...)` inserted before `def eval_val` - `eval_val` body modified to dispatch to helper when env var on - Inner loop: save base weights → AdamW LR=0.001 → K=3 grad steps on ctx → eval target → restore - Default OFF preserves bit-exact baseline eval Legality: - Trains on val data CONTEXT (first half of each sequence) — that's the legal precedent context for predicting the SECOND half - Reports val_bpb computed ONLY on the TARGET half - Weights reset between batches (no cross-doc leakage) - Identical to PR openai#642 (0.8173) and openai#620 (0.9443) pattern Cost: ~3-4× the eval time. Bumped MAX_WALLCLOCK_SECONDS=2400 (40 min) for tests. 2 cheap-pod tests queued at FRONT: - STACK_LEGAL_TTT_seed42: ALL 5 winners (gated_attention + norm_pct + asym_skip + asym_label + per_proj) + LEGAL_TTT on top - L04_gated_attention_LEGAL_TTT_seed42: solo L04 + LEGAL_TTT for clean baseline Both on Pod G with USE_LEGAL_TTT=1, LEGAL_TTT_STEPS=3, LEGAL_TTT_LR=0.001. EXPECTED_MARKERS now 45 in both 08_patch and gate_check.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Record: 11L + Score-Every-Epoch LoRA TTT 5ep (val_bpb=0.8186)

f7e12fd

Score-every-epoch multi-scale LoRA TTT on PR openai#414 base architecture. Pre-TTT: 1.1264 BPP. Post-TTT: 0.8186 BPB.

valerio-oai closed this Mar 24, 2026

minh-stakc changed the title ~~Record: 11L + Score-Every-Epoch LoRA TTT 5ep (val_bpb=0.8186)~~ Record: 11L + Score-Every-Epoch LoRA TTT 5ep (3-seed mean val_bpb=0.8173) Mar 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record: 11L + Score-Every-Epoch LoRA TTT 5ep (3-seed mean val_bpb=0.8173)#642

Record: 11L + Score-Every-Epoch LoRA TTT 5ep (3-seed mean val_bpb=0.8173)#642
minh-stakc wants to merge 1 commit intoopenai:mainfrom
minh-stakc:submission/score-every-epoch-ttt-0.8186

minh-stakc commented Mar 24, 2026 •

edited

Loading

Uh oh!

valerio-oai commented Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

minh-stakc commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Results

Key Innovation: Score-Every-Epoch Multi-Scale LoRA TTT

Architecture (PR #414 stack)

Credits

Test plan

Uh oh!

valerio-oai commented Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

minh-stakc commented Mar 24, 2026 •

edited

Loading