
Record: SP8192 + Triple Recurrence + Banking + Fused MLP + Muon 0.97 — val_bpb 1.0783 (3-seed mean) #1561

Open
EthanYangTW wants to merge 4 commits into openai:main from EthanYangTW:submission/sp8192-legal-sota-clean

@EthanYangTW

Summary

val_bpb: 1.0783 (3-seed mean, std 0.0004) | ~15.99 MB | 8xH100 SXM

Clean resubmission of #1523 with fixed LZMA decompression wrapper.

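| Seed | Sliding BPB | TTT BPB | Artifact (bytes) |
|------|-------------|---------|------------------|
| 1337 | 1.0798 | 1.0782 | 15,986,623 |
| 42 | 1.0793 | 1.0781 | 15,983,529 |
| 2024 | 1.0800 | 1.0788 | 15,986,767 |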

Stack

  • SP8192 tokenizer, 11 physical / 17 virtual layers (triple depth recurrence)
  • Parameter banking + Parallel Muon (15x optimizer step speedup)
  • Fused MLP Triton TMA kernel + CUTLASS EVT backward fusion
  • Muon momentum 0.97, EMA 0.997, QK-Gain 5.0
  • SDClip GPTQ int6 + int8 embeddings + brotli compression
  • Score-first TTT: SGD lr=0.01, 3 epochs, 32K chunks
  • Eval-time hash embedding: 16384x512, zero-init

Compliance (Track B — Score-First TTT)

Per Issue #1017: each chunk scored under no_grad() before any TTT update. Single left-to-right pass, no rescoring. No SLOT, no pre-quant TTT, no n-gram caches.
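
The score-first ordering described above can be illustrated with a minimal sketch. This is not the submission's code: the model here is a toy unigram softmax (a single logit vector), and `score_first_ttt`, `chunk_nll`, and `sgd_step` are hypothetical names. Only the ordering (score each chunk with frozen parameters, then adapt on it) and the defaults `lr=0.01`, `epochs=3` are taken from the PR description. The result is in bits per token, not bits per byte.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def chunk_nll(logits, chunk):
    """Total negative log-likelihood (nats) of a chunk; reads parameters only."""
    probs = softmax(logits)
    return -sum(math.log(probs[t]) for t in chunk)

def sgd_step(logits, chunk, lr):
    """One SGD step on the chunk's mean cross-entropy; updates logits in place."""
    grad = softmax(logits)            # d(mean CE)/d(logit_i) = p_i - freq_i
    for t in chunk:
        grad[t] -= 1.0 / len(chunk)
    for i in range(len(logits)):
        logits[i] -= lr * grad[i]

def score_first_ttt(tokens, vocab_size, chunk_size, lr=0.01, epochs=3):
    """Single left-to-right pass: score each chunk first, then adapt on it."""
    logits = [0.0] * vocab_size       # toy model: uniform at the start
    total_nll = 0.0
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        total_nll += chunk_nll(logits, chunk)   # 1) score, never revisited
        for _ in range(epochs):                 # 2) only then update
            sgd_step(logits, chunk, lr)
    return total_nll / (len(tokens) * math.log(2))  # bits per token
```

Because every chunk is scored before the parameters see it, the reported loss never benefits from updates computed on the tokens being scored.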

Fix from #1523

LZMA wrapper now uses linecache + compile for proper tracebacks and __file__ resolution. CUTLASS EVT fusion is optional (graceful fallback to PyTorch).
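
A minimal sketch of the linecache + compile idea, assuming a base85-encoded LZMA payload (the exact encoding call used by the submission is not shown in this PR page, so `b85encode`/`b85decode` is an assumption, as is the virtual filename). Registering the decompressed source in `linecache.cache` lets tracebacks display the offending source line even though no such file exists on disk.

```python
import base64
import linecache
import lzma
import traceback

# Hypothetical payload: a tiny script that raises on its second line.
source = "def boom():\n    raise ValueError('from payload')\nboom()\n"
packed = base64.b85encode(lzma.compress(source.encode()))

# Wrapper side: decompress, register the source under a virtual filename,
# then compile with that filename so frames point at it.
S = lzma.decompress(base64.b85decode(packed)).decode()
F = "payload.__decompressed__.py"  # illustrative name, not a real file
linecache.cache[F] = (len(S), None, S.splitlines(True), F)

try:
    exec(compile(S, F, "exec"), {"__name__": "payload"})
except ValueError:
    tb_text = traceback.format_exc()  # includes the payload's source line
```

The `None` in the cache tuple's mtime slot tells `linecache.checkcache` not to invalidate the entry by stat-ing a file that does not exist.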

Credits

PR #1420 @abaybektursun, PR #1394 @clarkkev, PR #1471 @X-Abhishek-X, PR #1477 @aryanbhosale, PR #1460 @resouer, PR #399 @abaybektursun, PR #1514 @dexhunter

…— val_bpb 1.0783 (3-seed mean)

Clean resubmission with fixed LZMA wrapper (linecache + compile).
Seeds: 1337 (1.07817), 42 (1.07807), 2024 (1.07876)
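
The headline statistics can be checked from the three per-seed values listed above. A short sketch using only the standard library; reading the reported "std" as a sample standard deviation is an assumption.

```python
import statistics

# Per-seed TTT val_bpb values from the commit message above.
seed_bpb = {1337: 1.07817, 42: 1.07807, 2024: 1.07876}

mean_bpb = statistics.mean(seed_bpb.values())
std_bpb = statistics.stdev(seed_bpb.values())  # sample std (n - 1 denominator)

print(f"mean={mean_bpb:.4f} std={std_bpb:.4f}")
```

With these inputs the rounded mean matches the reported 1.0783, and the sample standard deviation rounds to the reported 0.0004.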
@EthanYangTW EthanYangTW marked this pull request as ready for review April 12, 2026 06:15
Copilot AI review requested due to automatic review settings April 12, 2026 06:15

Copilot AI left a comment

Pull request overview

Adds a new Track B (score-first TTT) 10min/16mb record submission directory for the SP8192 + triple recurrence + banking + fused MLP + Muon(0.97) stack, including reproducibility artifacts and per-seed logs.

Changes:

  • Adds a new record folder with 3-seed training/eval logs and a short README describing the stack and results.
  • Adds submission.json metadata (val_bpb mean/std, seeds, artifact sizes, hardware/software).
  • Adds a packed train_gpt.py runner with an updated LZMA+base85 decompression wrapper intended to improve tracebacks/__file__ handling.

Reviewed changes

Copilot reviewed 3 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/train_gpt.py | Packed entrypoint + decompression wrapper for the submission code |
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/submission.json | Submission metadata (metrics, seeds, artifact bytes, technique summary) |
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/README.md | Human-readable record summary + reproduction command |
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/train_seed42.log | Seed 42 training/quant/TTT evaluation log |
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/train_seed1337.log | Seed 1337 training/quant/TTT evaluation log |
| records/track_10min_16mb/2026-04-12_SP8192_LegalSOTA_Clean/train_seed2024.log | Seed 2024 training/quant/TTT evaluation log |


Comment on lines +3 to +5
F=__file__+'.__decompressed__.py'
C.cache[F]=(len(S),None,S.splitlines(True),F)
exec(compile(S,F,'exec'))

Copilot AI Apr 12, 2026

exec(compile(S, F, 'exec')) runs the decompressed code in the current module globals, so the decompressed script will still see __file__ as the wrapper path (not F). If the intent is to fix __file__ resolution for the decompressed payload, set __file__ = F in the globals passed to exec (or temporarily override and then restore) so relative-path logic inside the decompressed code resolves correctly.
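
The behaviour described in this comment can be reproduced with a small sketch. The payload string, the `wrapper_file` stand-in, and the dictionary names are all hypothetical; the point is only that the filename given to `compile` does not change what `__file__` evaluates to inside the executed code, whereas the globals passed to `exec` do.

```python
wrapper_file = "train_gpt.py"               # stand-in for the wrapper's __file__
F = wrapper_file + ".__decompressed__.py"   # virtual filename, as in the diff
S = "seen_file = __file__\n"                # toy payload that records __file__

# Sharing the wrapper's globals: the payload still sees the wrapper path.
shared = {"__name__": "__main__", "__file__": wrapper_file}
exec(compile(S, F, "exec"), shared)

# Passing explicit globals with __file__ = F: the payload sees F.
fresh = {"__name__": "__main__", "__file__": F}
exec(compile(S, F, "exec"), fresh)

print(shared["seen_file"])  # wrapper path
print(fresh["seen_file"])   # virtual decompressed path
```

Whether the payload should see `F` or the wrapper path depends on how it resolves relative paths; both names share the same directory, so `os.path.dirname(__file__)` gives the same result either way.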

Comment on lines +7 to +11
| Seed | Pre-quant BPP | Sliding BPP | **TTT BPP** | Artifact |
|------|---------------|-------------|-------------|----------|
| 1337 | 1.0859 | 1.0798 | **1.0782** | 15,986,623 |
| 42 | 1.0856 | 1.0793 | **1.0781** | 15,983,529 |
| 2024 | 1.0862 | 1.0800 | **1.0788** | 15,986,767 |

Copilot AI Apr 12, 2026

The results table labels the metric as "BPP" (e.g., "Pre-quant BPP", "Sliding BPP", "TTT BPP"), but this repo’s record READMEs consistently use "BPB" / val_bpb. Consider renaming these headers to "BPB" to avoid confusion about what metric is being reported.
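
For context on the metric name, here is a minimal sketch of the usual bits-per-byte conversion, assuming `val_bpb` follows the standard definition (total negative log-likelihood in nats, converted to bits and divided by the number of UTF-8 bytes scored). The function name and arguments are hypothetical and not taken from the repository.

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed NLL in nats into bits per byte of the scored text."""
    return total_nll_nats / (math.log(2) * total_bytes)

# Example: a model that spends exactly one bit on each of 8 bytes.
example = bits_per_byte(8 * math.log(2), 8)
```

Normalising by bytes rather than tokens keeps the number comparable across tokenizers with different vocabulary sizes.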
