Non-record: Int6 QAT + MLP1472 + SlidingWindow + TTT (val_bpb=1.1807)#301
lookin-zz wants to merge 3 commits into openai:main
Conversation
**Community Review — Non-record: Int6 QAT + MLP1472 + SlidingWindow + TTT (val_bpb=1.1807)**

Compliance flags: N-gram family bug + Pre-Quant TTT violation

**Analysis**

**Check 1 — N-gram Family Bug (target token in hash lookup key):** CLEAN.

```python
prev_ids = torch.cat([token_ids[:, :1], token_ids[:, :-1]], dim=1)
bucket = (prev_ids * 31 + token_ids) % self.num_buckets
```

The hash key uses only the previous and current token IDs; the target (next) token never enters the lookup key.

**Check 2 — Pre-Quant TTT (multi-epoch AdamW on val_tokens without score-first):** CLOSE — partially applicable.

**Check 3 — Legal TTT (score-first-per-chunk, torch.no_grad() before step, is_last_chunk guard):** NOT PRESENT. The TTT implementation lacks all three legal-TTT markers: no score-first-per-chunk, no `torch.no_grad()` before the optimizer step, and no `is_last_chunk` guard.

**Check 4 — Scored-Region Slot:** HOLD. The sliding-window eval (lines 266–335) scores only …

**Check 5 — Pure Neural**

Two independent violations found:
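To make Check 1 concrete, here is a minimal scalar sketch of the hashed-bigram lookup quoted above, stripped of torch tensors. The function name `bigram_buckets` and the default `num_buckets` are illustrative, not from the submission; the multiplier 31 and the shift-right construction of `prev_ids` mirror the quoted snippet. The point is that the key for each position is built from the previous and current tokens only, so the token being predicted cannot leak into the lookup.

```python
def bigram_buckets(token_ids, num_buckets=4096):
    """Hash each (prev, current) token pair into a bucket index.

    Scalar sketch of the reviewed code: prev_ids is the sequence
    shifted right by one (first position repeated, matching the
    torch.cat([token_ids[:, :1], token_ids[:, :-1]]) construction).
    The target (next) token is never part of the key.
    """
    prev_ids = [token_ids[0]] + token_ids[:-1]
    return [(p * 31 + t) % num_buckets for p, t in zip(prev_ids, token_ids)]

# Example: three tokens -> three bucket indices
buckets = bigram_buckets([5, 7, 11])  # -> [160, 162, 228]
```

Because the hash is a pure function of already-seen tokens, a "target token in hash lookup key" bug would require `token_ids` shifted *left* (the future token) to appear in the key, which the quoted code does not do.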
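For readers unfamiliar with the three legal-TTT markers named in Check 3, the following toy sketch shows the protocol shape the review is looking for. The model here is a single float weight with a hand-written squared-error gradient; `legal_ttt` is a hypothetical name, and a real implementation would score under `torch.no_grad()` and step with AdamW rather than plain gradient descent. This is a schematic of the ordering rules only, not the submission's code.

```python
def legal_ttt(chunks, weight=1.0, lr=0.1):
    """Test-time training loop obeying the legal-TTT ordering rules.

    For each chunk of held-out data:
      1. score-first-per-chunk: record the loss with the CURRENT weights,
         before any adaptation on that chunk;
      2. is_last_chunk guard: never update after the final scored chunk;
      3. only then take a gradient step on the chunk just scored.
    """
    losses = []
    for i, chunk in enumerate(chunks):
        # 1) score first, with weights untouched by this chunk
        loss = sum((weight - x) ** 2 for x in chunk) / len(chunk)
        losses.append(loss)
        # 2) guard: no adaptation after the last scored chunk
        if i == len(chunks) - 1:
            break
        # 3) adapt on the already-scored chunk (stand-in for an AdamW step)
        grad = sum(2 * (weight - x) for x in chunk) / len(chunk)
        weight -= lr * grad
    return losses, weight
```

Reversing steps 1 and 3 (updating on a chunk before scoring it) is exactly the pre-quant-style violation flagged in Check 2: the model would be evaluated on data it has already fit.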
**Verdict:** CLOSE — dual violation.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: CLOSE.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via LLM agent (Sonnet) reviewing the full train_gpt.py source. If this review misread your code, please call it out so I can re-audit manually.
Submission (Updated)
Update from v1
See `records/track_non_record_16mb/2026-03-21_Int6_QAT_MLP1472_SlidingWindow/README.md` for details.