
Record: Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence — val_bpb 1.0954 (3-seed mean) #1042

Open
nothingLiva wants to merge 28 commits into openai:main from nothingLiva:main

Conversation

@nothingLiva

@nothingLiva nothingLiva commented Mar 28, 2026

Record: Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence

val_bpb: 1.0954 (3-seed mean)

Results

| Seed | val_bpb | Artifact Size |
|------|---------|---------------|
| 1337 | 1.0953  | 15.82 MB      |
| 42   | 1.0950  | 15.81 MB      |
| 2024 | 1.0958  | 15.83 MB      |
| **Mean** | **1.0954** | ~15.82 MB |
| **Std**  | 0.0004     |           |
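The reported mean and std can be reproduced from the per-seed numbers; note the std matches the sample standard deviation (n-1 denominator):

```python
import statistics

# per-seed val_bpb from the results table above
seed_bpb = {1337: 1.0953, 42: 1.0950, 2024: 1.0958}

mean = statistics.mean(seed_bpb.values())
std = statistics.stdev(seed_bpb.values())  # sample std (n-1), matching the table

print(round(mean, 4), round(std, 4))  # 1.0954 0.0004
```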

val_bpb: 1.0954 (3-seed mean) | ~15.82 MB | 8×H100 SXM

Checklist

  • Artifact < 16,000,000 bytes (all 3 seeds)
  • Training < 600s, eval < 600s
  • Causal sliding-window evaluation (stride=64)
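The stride=64 causal sliding-window evaluation in the checklist can be sketched as follows. This is a minimal illustration of the general pattern, not the submission's code; the window length of 512 is an assumption (the actual value would be in the submission's README), and `scored_spans` is a hypothetical helper name:

```python
def scored_spans(seq_len, window=512, stride=64):
    """Sketch of causal sliding-window scoring: each window sees up to
    `window` tokens of causal context, but only its last `stride` new
    tokens are scored, so every token is scored exactly once."""
    spans = []
    pos = 0
    while pos < seq_len:
        score_end = min(pos + stride, seq_len)          # score the next stride tokens
        context_start = max(0, score_end - window)      # causal context for this window
        spans.append((context_start, pos, score_end))
        pos = score_end
    return spans
```

The key invariant is that the scored ranges tile the sequence with no gaps or overlaps, while each range still gets up to a full window of left context.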

See README.md for full details.

@nothingLiva
Author

Hi! This is my first competition submission, so I'm not entirely sure about the process. Is there anything missing or needed from my side? Happy to address any feedback! 🙂

@nothingLiva nothingLiva changed the title Record: Adaptive Precision Embedding Quantization (4-seed mean val_bpb=1.1217) Record: Frequency-Weighted GPTQ Calibration + Adaptive Precision Embedding Quantization val_bpb: 1.0980 (3-seed mean) Apr 7, 2026
@nothingLiva nothingLiva changed the title Record: Frequency-Weighted GPTQ Calibration + Adaptive Precision Embedding Quantization val_bpb: 1.0980 (3-seed mean) Record: Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence — val_bpb 1.0954 (3-seed mean) Apr 11, 2026
@MatoTeziTanka

Community Review — Record: Frequency-Weighted GPTQ Calibration + AdaptPrecision Embedding Quantization + L10-INT8 + LR1.4x + QK6.0 + WD0.60 on Depth Recurrence — val_bpb 1.0954 (3-seed mean)

BPB: 1.0954 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1416/#1423 pattern)

What I found in the code (head SHA cc8cb03f6034, file records/track_10min_16mb/2026-04-11_FreqWeightedGPTQ_BPB1.0954/train_gpt.py):

The TTT path at line 1727 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape the legal frontier uses (PRs #1416 erichroepke, #1423 aryanbhosale).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
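The score-first-per-chunk ordering described above can be sketched in plain Python. This is an illustrative trace of the control flow only (names like `run_ttt` and `params_version` are invented for this sketch; the actual implementation scores under `torch.no_grad()` and takes SGD steps):

```python
# Sketch of the score-first-per-chunk TTT pattern: chunk i is scored under
# weights adapted only on chunks 0..i-1, and the final chunk gets no
# adaptation pass (the is_last_chunk guard).

def run_ttt(chunks):
    params_version = 0   # stands in for the adapted weights
    scored_with = []     # records which weight version scored each chunk
    for i, chunk in enumerate(chunks):
        # 1) score FIRST, under weights not yet adapted on this chunk
        scored_with.append((i, params_version))
        # 2) adapt on this chunk afterwards, unless it is the last chunk
        is_last_chunk = (i == len(chunks) - 1)
        if not is_last_chunk:
            params_version += 1   # an SGD step on chunk i would happen here
    return scored_with

trace = run_ttt(["c0", "c1", "c2", "c3"])
# each chunk i is scored with weights adapted on exactly i earlier chunks
```

The legality property is visible in the trace: no chunk's score ever depends on an update computed from that same chunk.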

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 7.72s, dim=512, layers=11, vocab=1024, code=93329 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka / The Agora. Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.

@nothingLiva
Author


Thank you @MatoTeziTanka for the thorough code review and compliance check!

You're correct that the TTT path is present in the code, but it is disabled (TTT_ENABLED=0). I kept it in because, once TTT was officially declared legal, I wanted to test whether it would improve BPB on this specific architecture (Depth Recurrence + FreqGPTQ).

However, consistent with findings from other submissions, TTT actually degraded BPB on this architecture, so I disabled it and achieved the 1.0954 score without it.

The score of 1.0954 (3-seed mean) comes purely from the components in the title: Frequency-Weighted GPTQ calibration, AdaptPrecision embedding quantization, L10-INT8, LR 1.4x, QK 6.0, and WD 0.60 on Depth Recurrence.

Thanks again for the detailed review! 👍

