Record: SP8192 + Triple Recurrence + Banking + Fused MLP + Muon 0.97 — val_bpb 1.0783 (3-seed mean) #1561
Conversation
…— val_bpb 1.0783 (3-seed mean) Clean resubmission with fixed LZMA wrapper (linecache + compile). Seeds: 1337 (1.07817), 42 (1.07807), 2024 (1.07876)
Pull request overview
Adds a new Track B (score-first TTT) 10min/16mb record submission directory for the SP8192 + triple recurrence + banking + fused MLP + Muon(0.97) stack, including reproducibility artifacts and per-seed logs.
Changes:
- Adds a new record folder with 3-seed training/eval logs and a short README describing the stack and results.
- Adds `submission.json` metadata (val_bpb mean/std, seeds, artifact sizes, hardware/software).
- Adds a packed `train_gpt.py` runner with an updated LZMA+base85 decompression wrapper intended to improve tracebacks and `__file__` handling.
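The packed-runner idea above (LZMA-compress the training script, base85-encode it, and ship a tiny self-extracting stub) can be sketched as follows. This is a minimal illustration; `pack` is a hypothetical helper, not the submission's actual packer, and the real wrapper additionally registers the source with `linecache` for tracebacks.

```python
import base64
import lzma


def pack(source: str) -> str:
    """Return a tiny self-extracting wrapper for `source` (hypothetical helper)."""
    # Compress the script, then base85-encode so the blob survives as ASCII text.
    blob = base64.b85encode(lzma.compress(source.encode())).decode()
    return (
        "import base64, lzma\n"
        f"S = lzma.decompress(base64.b85decode({blob!r})).decode()\n"
        "F = globals().get('__file__', '<packed>') + '.__decompressed__.py'\n"
        "exec(compile(S, F, 'exec'))\n"
    )


# Executing the wrapper decompresses and runs the original payload in place.
payload = "result = 2 + 2\nprint('payload ran, result =', result)"
exec(compile(pack(payload), '<wrapper>', 'exec'))
```

Compiling the decompressed source against a synthetic filename (rather than `<string>`) is what lets tracebacks point at a stable pseudo-path instead of an anonymous string.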
Reviewed changes
Copilot reviewed 3 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/train_gpt.py | Packed entrypoint + decompression wrapper for the submission code |
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/submission.json | Submission metadata (metrics, seeds, artifact bytes, technique summary) |
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/README.md | Human-readable record summary + reproduction command |
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/train_seed42.log | Seed 42 training/quant/TTT evaluation log |
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/train_seed1337.log | Seed 1337 training/quant/TTT evaluation log |
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/train_seed2024.log | Seed 2024 training/quant/TTT evaluation log |
```python
F=__file__+'.__decompressed__.py'
C.cache[F]=(len(S),None,S.splitlines(True),F)
exec(compile(S,F,'exec'))
```
exec(compile(S, F, 'exec')) runs the decompressed code in the current module globals, so the decompressed script will still see __file__ as the wrapper path (not F). If the intent is to fix __file__ resolution for the decompressed payload, set __file__ = F in the globals passed to exec (or temporarily override and then restore) so relative-path logic inside the decompressed code resolves correctly.
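The reviewer's suggestion can be sketched by passing an explicit globals dict with `__file__` pre-set to the synthetic path. This is a minimal illustration of the pattern, not the submission's actual wrapper; `run_decompressed` and its arguments are illustrative names.

```python
import linecache


def run_decompressed(S: str, wrapper_file: str) -> dict:
    """Execute decompressed source S so both tracebacks and __file__
    point at the synthetic path F (sketch of the review suggestion)."""
    F = wrapper_file + '.__decompressed__.py'
    # Register the source with linecache so tracebacks show real source lines.
    linecache.cache[F] = (len(S), None, S.splitlines(True), F)
    # Run in a fresh globals dict with __file__ set to F, so relative-path
    # logic inside the payload resolves against the decompressed pseudo-file.
    g = {'__name__': '__main__', '__file__': F}
    exec(compile(S, F, 'exec'), g)
    return g


g = run_decompressed("seen_file = __file__", "train_gpt.py")
print(g['seen_file'])  # → train_gpt.py.__decompressed__.py
```

Using a dedicated globals dict also avoids the payload clobbering names in the wrapper's own namespace; the alternative of temporarily overriding and restoring `__file__` in the current globals works too, but is easier to get wrong around exceptions.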
| Seed | Pre-quant BPP | Sliding BPP | **TTT BPP** | Artifact (bytes) |
|------|---------------|-------------|-------------|------------------|
| 1337 | 1.0859 | 1.0798 | **1.0782** | 15,986,623 |
| 42 | 1.0856 | 1.0793 | **1.0781** | 15,983,529 |
| 2024 | 1.0862 | 1.0800 | **1.0788** | 15,986,767 |
The results table labels the metric as "BPP" (e.g., "Pre-quant BPP", "Sliding BPP", "TTT BPP"), but this repo’s record READMEs consistently use "BPB" / val_bpb. Consider renaming these headers to "BPB" to avoid confusion about what metric is being reported.
Summary
val_bpb: 1.0783 (3-seed mean, std 0.0004) | ~15.99 MB | 8xH100 SXM
Clean resubmission of #1523 with fixed LZMA decompression wrapper.
Stack
Compliance (Track B — Score-First TTT)
Per Issue #1017: each chunk is scored under `no_grad()` before any TTT update. Single left-to-right pass, no rescoring. No SLOT, no pre-quant TTT, no n-gram caches.
Fix from #1523
LZMA wrapper now uses `linecache` + `compile` for proper tracebacks and `__file__` resolution. CUTLASS EVT fusion is optional (graceful fallback to PyTorch).
Credits
PR #1420 @abaybektursun, PR #1394 @clarkkev, PR #1471 @X-Abhishek-X, PR #1477 @aryanbhosale, PR #1460 @resouer, PR #399 @abaybektursun, PR #1514 @dexhunter