
Record: Split-LR + N-gram Agreement + Full GPTQ — val_bpb 1.1079 (3-seed mean)#1302

Open
vlivashkin wants to merge 2 commits into openai:main from vlivashkin:submission/splitlr-ngram-gptq

Conversation


@vlivashkin commented Apr 3, 2026

Summary

  • val_bpb: 1.1078 (3-seed mean, std 0.0009)
  • val_loss: 1.8752 nats (3-seed mean)
  • Artifact: ~15.86 MB (max 15,857,705 bytes)
  • Built on PR #1179 by @dexhunter (training) and PR #1145 by @AnirudhRahul (n-gram agreement eval)

SOTA (PR #1019, 3-seed mean): 1.8822 nats. This run: 1.8752 nats. Delta: -0.00697 nats. Clears the 0.005-nat threshold.

What's New vs PR #1019

Training (from PR #1179): Split-LR (early=0.025, late=0.030), BigramHash(2816×160), Sigmoid-gated U-Net, Soft-round QAT (alpha 1→16), Brotli-11 + byte-shuffle, Coprime-stride loader
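The soft-round QAT item in the list above (alpha 1→16) is not shown in this thread. Below is a minimal sketch of one common soft-rounding surrogate (the Agustsson & Theis formulation); the function name and the test values are illustrative assumptions, not the PR's code. Small alpha is close to the identity, while large alpha approaches hard rounding, which is the direction the alpha 1→16 anneal implies.

```python
import numpy as np

def soft_round(x, alpha):
    # Differentiable surrogate for round(x). For small alpha this is
    # nearly the identity; as alpha grows it approaches hard rounding,
    # so annealing alpha 1 -> 16 eases weights toward the quantized grid.
    m = np.floor(x) + 0.5          # half-integer cell centers
    r = x - m                      # offset within the cell, in [-0.5, 0.5)
    return m + 0.5 * np.tanh(alpha * r) / np.tanh(alpha / 2.0)

w = np.array([-1.8, -0.6, 0.3, 1.9])
soft_round(w, 0.1)   # approximately w itself (near-identity)
soft_round(w, 16.0)  # approximately np.round(w) (near-hard rounding)
```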

Evaluation: Online n-gram agreement — 3 causal experts (token 16-gram, within-word, word-start) with agreement boosting. Adjusts LLM probabilities via properly normalized exponential tilting. Contributes −0.0028 BPB.
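The tilting step itself is not inlined in this thread. As a minimal sketch, "properly normalized exponential tilting" means reweighting the LLM's next-token distribution by exp(lam * score) and renormalizing so the result still sums to 1 (the Z = 1.0 check in the compliance list below). The `lam` value and the score vector here are illustrative assumptions:

```python
import math

def tilt(probs, scores, lam=1.0):
    # Exponential tilting: q(x) = p(x) * exp(lam * s(x)) / Z,
    # with Z chosen so the tilted distribution sums to exactly 1.
    w = [p * math.exp(lam * s) for p, s in zip(probs, scores)]
    z = sum(w)
    return [wi / z for wi in w]

p = [0.5, 0.3, 0.2]          # base LLM next-token probabilities
s = [0.0, 1.0, 0.0]          # n-gram experts agree on token 1
q = tilt(p, s)               # token 1 boosted; q still sums to 1
```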

Results (8×H100 SXM, no TTT)

| Seed | Steps | Sliding BPB | Sliding val_loss (nats) | N-gram BPB | Artifact (bytes) |
|------|-------|-------------|-------------------------|------------|------------------|
| 1337 | ~6780 | 1.1110      | 1.8760                  | 1.1083     | 15,853,466       |
| 42   | ~6780 | 1.1095      | 1.8734                  | 1.1068     | 15,857,705       |
| 2025 | ~6780 | 1.1112      | 1.8763                  | 1.1085     | 15,846,914       |
| Mean |       | 1.1106      | 1.8752                  | 1.1078     |                  |

Compliance

  • 3-seed verification (std 0.0009)
  • Delta vs SOTA: -0.00697 nats (val_loss), exceeds 0.005-nat threshold
  • No TTT, no SLOT, no eval-time weight adaptation
  • N-gram agreement: causal (predict-then-update), score-first (inference_mode), properly normalized (exponential tilting, Z=1.0)
  • Artifact < 16,000,000 bytes (all seeds, max: 15,857,705)
  • Training ≤ 600s (~591s), eval ≤ 600s (~536s including n-gram)
  • GPTQ calibration within training budget (~7s)
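To illustrate the "causal (predict-then-update)" requirement, the toy expert below always scores a token from counts accumulated on strictly earlier tokens before updating on it. This is a sketch of the ordering constraint only, not the PR's `online_ngram_state.c` implementation; all names are illustrative.

```python
from collections import defaultdict

class CausalNgramExpert:
    """Token n-gram expert that scores each token BEFORE its counts
    are updated on that token (predict-then-update)."""
    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(lambda: defaultdict(int))
        self.context = []

    def score(self, token):
        # Probability estimate from tokens seen strictly before this one.
        ctx = tuple(self.context[-(self.n - 1):])
        dist = self.counts[ctx]
        total = sum(dist.values())
        return dist[token] / total if total else 0.0

    def update(self, token):
        # Only now does the current token enter the statistics.
        ctx = tuple(self.context[-(self.n - 1):])
        self.counts[ctx][token] += 1
        self.context.append(token)

expert = CausalNgramExpert(n=2)
scores = []
for t in [1, 2, 1, 2, 1, 2]:
    scores.append(expert.score(t))  # score first...
    expert.update(t)                # ...then update
```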

Reproduction

pip install brotli
pip install flash_attn_3 --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291

# Training (3 seeds)
for SEED in 1337 42 2025; do
  BIGRAM_DIM=160 SEED=$SEED \
  torchrun --standalone --nproc_per_node=8 train_gpt.py
  cp final_model.int6.ptz final_model_seed${SEED}.int6.ptz
done

# N-gram agreement eval
gcc -O3 -march=native -shared -fPIC -o libonline_ngram_state.so online_ngram_state.c
for SEED in 1337 42 2025; do
  BIGRAM_DIM=160 CHECKPOINT=final_model_seed${SEED}.int6.ptz \
  torchrun --standalone --nproc_per_node=8 eval_ngram_on_checkpoint.py
done

See README.md for full details.

Credits

@vlivashkin force-pushed the submission/splitlr-ngram-gptq branch from 76168cd to 3273d7f on Apr 3, 2026, 13:34
@MatoTeziTanka

Community Review — Record: Split-LR + N-gram Agreement + Full GPTQ — val_bpb 1.1079 (3-seed mean)

BPB: 1.1079 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1416/#1423 pattern)

What I found in the code (head SHA 8d1eb2335a59, file records/track_10min_16mb/2026-04-03_SplitLR_NgramAgreement_FullGPTQ/train_gpt.py):

The TTT path at line 410 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape the legal frontier uses (PRs #1416 erichroepke, #1423 aryanbhosale).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
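The chunk ordering described above can be sketched framework-free. The function and callback names here are illustrative, not taken from train_gpt.py:

```python
def score_first_per_chunk(chunks, score, adapt, params):
    # Chunk i is scored under params adapted only on chunks 0..i-1,
    # and the final chunk gets no adaptation pass (is_last_chunk guard).
    total = 0.0
    for i, chunk in enumerate(chunks):
        total += score(params, chunk)     # score BEFORE adapting
        if i < len(chunks) - 1:           # skip adaptation on last chunk
            params = adapt(params, chunk)
    return total

log = []
def score_fn(params, chunk):
    log.append(("score", chunk, params))
    return 0.0
def adapt_fn(params, chunk):
    log.append(("adapt", chunk))
    return params + 1

score_first_per_chunk([10, 20, 30], score_fn, adapt_fn, params=0)
# log records each chunk being scored before any update on it,
# with no adaptation pass after the final chunk
```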

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.03s, dim=512, layers=11, vocab=1024, code=71339 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.

