
Record: Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence — val_bpb 1.0954 (3-seed mean) #1042

Open
nothingLiva wants to merge 28 commits into openai:main from nothingLiva:main

Conversation

@nothingLiva

@nothingLiva nothingLiva commented Mar 28, 2026

Record: Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence

val_bpb: 1.0954 (3-seed mean)

Results

| Seed | val_bpb | Artifact Size |
|------|---------|---------------|
| 1337 | 1.0953  | 15.82 MB      |
| 42   | 1.0950  | 15.81 MB      |
| 2024 | 1.0958  | 15.83 MB      |
| **Mean** | **1.0954** | ~15.82 MB |
| **Std**  | 0.0004     |           |
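The reported mean and std can be reproduced from the per-seed numbers; note the std matches the sample standard deviation (n-1 denominator):

```python
import statistics

# per-seed val_bpb from the results table above
seed_bpb = {1337: 1.0953, 42: 1.0950, 2024: 1.0958}

mean = statistics.mean(seed_bpb.values())
std = statistics.stdev(seed_bpb.values())  # sample std (n-1), matching the table

print(round(mean, 4), round(std, 4))  # 1.0954 0.0004
```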

val_bpb: 1.0954 (3-seed mean) | ~15.82 MB | 8×H100 SXM

Checklist

  • Artifact < 16,000,000 bytes (all 3 seeds)
  • Training < 600s, eval < 600s
  • Causal sliding-window evaluation (stride=64)
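The stride=64 causal sliding-window evaluation in the checklist can be sketched as follows. This is a minimal illustration of the general pattern, not the submission's code; the window length of 512 is an assumption (the actual value would be in the submission's README), and `scored_spans` is a hypothetical helper name:

```python
def scored_spans(seq_len, window=512, stride=64):
    """Sketch of causal sliding-window scoring: each window sees up to
    `window` tokens of causal context, but only its last `stride` new
    tokens are scored, so every token is scored exactly once."""
    spans = []
    pos = 0
    while pos < seq_len:
        score_end = min(pos + stride, seq_len)          # score the next stride tokens
        context_start = max(0, score_end - window)      # causal context for this window
        spans.append((context_start, pos, score_end))
        pos = score_end
    return spans
```

The key invariant is that the scored ranges tile the sequence with no gaps or overlaps, while each range still gets up to a full window of left context.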

See README.md for full details.

@nothingLiva
Author

Hi! This is my first competition submission, so I'm not entirely sure about the process. Is there anything missing or needed from my side? Happy to address any feedback! 🙂

@nothingLiva nothingLiva changed the title Record: Adaptive Precision Embedding Quantization (4-seed mean val_bpb=1.1217) Record: Frequency-Weighted GPTQ Calibration + Adaptive Precision Embedding Quantization val_bpb: 1.0980 (3-seed mean) Apr 7, 2026
@nothingLiva nothingLiva changed the title Record: Frequency-Weighted GPTQ Calibration + Adaptive Precision Embedding Quantization val_bpb: 1.0980 (3-seed mean) Record: Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence — val_bpb 1.0954 (3-seed mean) Apr 11, 2026
@MatoTeziTanka

Community Review — Record: Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence — val_bpb 1.0954 (3-seed mean)

BPB: 1.0954 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1416/#1423 pattern)

What I found in the code (head SHA cc8cb03f6034, file records/track_10min_16mb/2026-04-11_FreqWeightedGPTQ_BPB1.0954/train_gpt.py):

The TTT path at line 1727 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape the legal frontier uses (PRs #1416 erichroepke, #1423 aryanbhosale).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
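The score-first-per-chunk ordering described above can be sketched in plain Python. This is an illustrative trace of the control flow only (names like `run_ttt` and `params_version` are invented for this sketch; the actual implementation scores under `torch.no_grad()` and takes SGD steps):

```python
# Sketch of the score-first-per-chunk TTT pattern: chunk i is scored under
# weights adapted only on chunks 0..i-1, and the final chunk gets no
# adaptation pass (the is_last_chunk guard).

def run_ttt(chunks):
    params_version = 0   # stands in for the adapted weights
    scored_with = []     # records which weight version scored each chunk
    for i, chunk in enumerate(chunks):
        # 1) score FIRST, under weights not yet adapted on this chunk
        scored_with.append((i, params_version))
        # 2) adapt on this chunk afterwards, unless it is the last chunk
        is_last_chunk = (i == len(chunks) - 1)
        if not is_last_chunk:
            params_version += 1   # an SGD step on chunk i would happen here
    return scored_with

trace = run_ttt(["c0", "c1", "c2", "c3"])
# each chunk i is scored with weights adapted on exactly i earlier chunks
```

The legality property is visible in the trace: no chunk's score ever depends on an update computed from that same chunk.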

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 7.72s, dim=512, layers=11, vocab=1024, code=93329 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka / The Agora. Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.

@nothingLiva
Author


Thank you @MatoTeziTanka for the thorough code review and compliance check!

You're correct that the TTT path is present in the code, but it is disabled (TTT_ENABLED=0). I kept it in because, once TTT was officially declared legal, I wanted to test whether it would improve BPB on this specific architecture (Depth Recurrence + FreqGPTQ).

However, consistent with findings from other submissions, TTT actually degraded BPB on this architecture, so I disabled it and achieved the 1.0954 score without it.

The score of 1.0954 (3-seed mean) comes purely from the components in the title: Frequency-Weighted GPTQ calibration, AdaptPrecision embedding quantization, L10-INT8, LR 1.4x, QK 6.0, and WD 0.60 on Depth Recurrence.

Thanks again for the detailed review! 👍

