Record: SP8192 + Banking + Triple Recurrence + Parallel Residuals + Muon 0.97 + TTT — val_bpb 1.0790 (5-seed mean)#1533

Open
aryanbhosale wants to merge 1 commit into openai:main from aryanbhosale:submission/sp8192-fused-banking-muon97-5seed

Conversation

@aryanbhosale
Contributor

Record: SP8192 + Banking + Triple Recurrence + Parallel Residuals + Muon 0.97 + TTT

val_bpb = 1.0790 (5-seed mean, std 0.0003) | ~15.99 MB | 8×H100 SXM

5-Seed Results

Seed   TTT BPB   val_loss (nats)
42     1.0788    2.7866
314    1.0789    2.7868
1337   1.0788    2.7867
7      1.0793    2.7880
999    1.0795    2.7884
Mean   1.0790    2.7873

Merged SOTA (PR #1493): 1.0810 BPB. Delta: −0.0020 BPB / −0.0047 nats.
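As a reader-side sanity check (not part of the submission), the per-seed BPB values in the table can be aggregated with the standard library; note the reported mean is presumably computed from unrounded per-seed values, since the 4-dp table entries average to ~1.0791:

```python
import statistics

# Per-seed TTT BPB values from the results table above
bpb = [1.0788, 1.0789, 1.0788, 1.0793, 1.0795]

mean = statistics.mean(bpb)   # ~1.0791 from the rounded table values
std = statistics.stdev(bpb)   # sample std, ~0.0003, matching the headline claim
```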

Stack

PR #1523 base (@abaybektursun) with hash embedding removed and standard MLP (no Triton fused kernel):

  1. SP8192 + GPTQ embeddings + SDClip
  2. Parameter Banking — batched Newton-Schulz
  3. Triple Depth Recurrence (L3-5, 17 virtual layers)
  4. Parallel Residuals (L7+)
  5. Muon 0.97 (PR #1514 @dexhunter, val_bpb 1.07983, 3-seed mean)
  6. QK-Gain 5.25, EMA 0.9965, WD 0.095, warmdown 0.72
  7. Score-First TTT (3 epochs, SGD lr=0.005)
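Items 2 and 5 combine naturally: "banking" stacks same-shaped weight gradients into one tensor so a single batched Newton-Schulz iteration orthogonalizes all of them at once, which is the expensive step of a Muon-style update. A minimal numpy sketch, assuming the standard quintic Muon coefficients (the actual train_gpt.py implementation may differ in details):

```python
import numpy as np

MUON_NS_COEFFS = (3.4445, -4.7750, 2.0315)  # quintic coefficients used by the Muon optimizer

def newton_schulz_batched(G, steps=5):
    """Approximately orthogonalize a BANK of matrices in one shot: G has shape
    (bank, rows, cols), and every matmul below runs batched over the bank dim."""
    a, b, c = MUON_NS_COEFFS
    # Normalize each matrix by its Frobenius norm so singular values start in [0, 1]
    X = G / (np.linalg.norm(G, axis=(-2, -1), keepdims=True) + 1e-7)
    for _ in range(steps):
        A = X @ X.transpose(0, 2, 1)
        X = a * X + (b * A + c * (A @ A)) @ X
    return X

rng = np.random.default_rng(0)
bank = rng.standard_normal((4, 64, 64))  # 4 banked 64x64 gradient matrices
ortho = newton_schulz_batched(bank)
```

The payoff is kernel efficiency: one batched iteration over the bank replaces a Python loop of per-matrix Newton-Schulz calls.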

Compliance (Track B)

Score-first TTT (PR #461). No SLOT, no hash embed, no pre-quant TTT, no n-gram, no ETLB. All conditions from Issue #1017 satisfied. All artifacts < 16MB.

Credits

PR #1523 @abaybektursun, PR #1394 @clarkkev, PR #1514 @dexhunter, PR #1493 @bigbag, PR #1204 @msisovic

@MatoTeziTanka

Community Review — SP8192 + Banking + Triple Recurrence + Parallel Residuals + Muon 0.97 + TTT

Thanks @aryanbhosale — this is the same general stack family as your other submissions today (#1540 LoRA TTT + varlen) and the #1493 lineage. Clean compliance read below.

What I found (head SHA 9304879bb01246a0f8afa11e9f54e13f7e8f246b, records/track_10min_16mb/2026-04-11_SP8192_Banking_ParResid_TripleRecur_Muon97_TTT/train_gpt.py, decoded from the `import lzma as L, base64 as B` self-extracting shim — 58,957 bytes / 622 lines of actual source):

CPU smoke test (CT2038 proteus-engine, 2026-04-11):

IMPORT_OK               seconds=6.00
HP_NUM_LAYERS           11
HP_MODEL_DIM            512
HP_VOCAB_SIZE           8192
HP_QK_GAIN_INIT         5.0
HP_CODE_BYTES           19760 (shim; decoded 58,957 B)
SMOKE_TEST_PASS
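The `HP_CODE_BYTES` line reflects the self-extracting packaging: the checked-in train_gpt.py is a ~19.8 KB shim whose lzma+base64 payload decompresses to the full 58,957-byte source. A hypothetical reconstruction of the pattern (the `source` string here is a stand-in, not the real file):

```python
import base64
import lzma

# Stand-in for the real 58,957-byte decoded source
source = "print('hello from train_gpt.py')\n"

# Packing step: compress then base64-encode, yielding the payload embedded in the shim
payload = base64.b64encode(lzma.compress(source.encode())).decode()

# The shim itself would then be roughly:
#   import lzma as L, base64 as B
#   exec(L.decompress(B.b64decode("<payload>")).decode())
decoded = lzma.decompress(base64.b64decode(payload)).decode()
```

A reviewer can recover and inspect the true source the same way, which is what the byte/line counts above refer to.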

Compliance summary:

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE. Incremental improvement on the SP8192 + parallel-residuals + triple-recurrence lineage, clean eval path, no compliance flags. The architectural delta vs #1493 / #1541 is small enough that the seed-level std should be disclosed in the PR body to support the statistical claim; on the static review nothing blocks landing.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_OK 6.00s, SMOKE_TEST_PASS. Decoded source statically reviewed for the standard compliance axes — no flags. Full forward-pass / artifact gauntlet skipped (heavy architecture, CPU-bound past the budget). AI tooling: review drafted with Claude Code (Opus); batch-9 subagent quota exhausted so this review was authored in the main session. SHA 9304879bb01246a0f8afa11e9f54e13f7e8f246b.

This was referenced Apr 11, 2026
resouer added a commit to resouer/parameter-golf that referenced this pull request Apr 11, 2026
…prototype base

Prototype round-22 lane that modulates existing loop-block scales and residual mix per pass instead of injecting additive pass embeddings. This keeps the mechanism cheap and mechanically distinct from the nearest per-pass embedding lineage while staying inside the known-runnable runtime stack.

Constraint: Single heimdall node means novelty must be cheap enough to justify serialized GPU evals
Rejected: Start from packed openai#1533 clean base | preflight failed before any remote logs were emitted on our infra
Confidence: medium
Scope-risk: moderate
Reversibility: clean
Directive: Treat this as a prototype lane only until it passes Python 3.10 packaging smoke and remote eval
Tested: python3 -m py_compile train_gpt.py
Not-tested: GPU evaluation, artifact size delta, 3-seed stability
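A hedged numpy sketch of the modulation idea this commit describes (all names hypothetical; the real lane's shared block is a transformer layer, not a tanh projection): the loop block is reused across passes, and each pass contributes only two learned scalars that rescale the residual and the block output, rather than an additive pass embedding.

```python
import numpy as np

def recurrent_block(x, weight, pass_scales, resid_mix):
    """Run the shared block for len(pass_scales) passes, modulating each pass
    with cheap per-pass scalars instead of injecting a pass embedding."""
    for p in range(len(pass_scales)):
        h = np.tanh(x @ weight)                    # stand-in for the shared loop block
        x = resid_mix[p] * x + pass_scales[p] * h  # per-pass residual/output modulation
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
w = rng.standard_normal((16, 16)) * 0.1
out = recurrent_block(x, w, pass_scales=[1.0, 0.9, 0.8], resid_mix=[1.0, 1.0, 1.0])
```

The parameter cost is two scalars per pass, which is why the commit calls the mechanism cheap relative to per-pass embedding lineages.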
@MatoTeziTanka

Community Review — Record: SP8192 + Banking + Triple Recurrence + Parallel Residuals + Muon 0.97 + TTT — val_bpb 1.0790 (5-seed mean)

BPB: 1.0790 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1413 dexhunter pattern)

What I found in the code (head SHA 9304879bb012, file records/track_10min_16mb/2026-04-11_SP8192_Banking_ParResid_TripleRecur_Muon97_TTT/train_gpt.py):

The TTT path at line 490 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape of the current leaderboard's legal frontier (PR #1413 @dexhunter, the 1.0828 SP8192 + QK-Gain 5 + Legal TTT entry, verified at its head SHA against the same is_last_chunk + torch.no_grad() score-first accumulator pattern).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
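The legality argument above reduces to an ordering constraint that a toy sketch makes explicit (pure Python; `ToyModel`, `score`, and `sgd_step` are hypothetical stand-ins for the torch code): chunk i must be scored under weights adapted only on chunks 0..i-1.

```python
class ToyModel:
    """Stand-in for the adapted model: a single scalar tracking chunk means."""
    def __init__(self):
        self.bias = 0.0

    def score(self, chunk):          # scoring pass: reads weights, never updates them
        m = sum(chunk) / len(chunk)
        return (m - self.bias) ** 2

    def sgd_step(self, chunk, lr):   # adaptation pass: one SGD step on this chunk
        m = sum(chunk) / len(chunk)
        self.bias += lr * 2 * (m - self.bias)

def score_first_ttt(model, chunks, lr=0.005, epochs=3):
    total, n = 0.0, 0
    for i, chunk in enumerate(chunks):
        total += model.score(chunk) * len(chunk)  # score BEFORE adapting on this chunk
        n += len(chunk)
        if i < len(chunks) - 1:                   # is_last_chunk guard: final chunk never trained on
            for _ in range(epochs):
                model.sgd_step(chunk, lr)
    return total / n
```

With correlated chunks, adaptation helps every later chunk while no chunk ever influences its own score — exactly the property the Issue #402 / #677 rulings require.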

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 6.00s, dim=512, layers=11, vocab=8192, code=19760 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.
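For readers unfamiliar with the classifier mentioned here, the core of an AST-based compliance pass is small. A minimal sketch (the real classify_prs.py pattern bank is far richer; the banned-name set below is illustrative only):

```python
import ast

BANNED_CALLS = {"prequant_ttt_adapt_adamw"}  # illustrative pattern-bank entry

def flag_banned_calls(source):
    """Walk every Call node in the source and report any whose callee
    name matches the banned set, with the offending line number."""
    flags = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            fn = node.func
            name = fn.id if isinstance(fn, ast.Name) else getattr(fn, "attr", None)
            if name in BANNED_CALLS:
                flags.append((name, node.lineno))
    return flags
```

Because the scan works on the parse tree rather than text, it catches calls regardless of formatting, but (as the caveat above notes) it can still misread control flow such as multi-epoch loops wrapped around an otherwise-legal pattern.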


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 6.00s, dim=512, layers=11, vocab=8192, code=19760 B, SMOKE_TEST_PASS. Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting — if the template misread your code, please call it out so I can iterate the classifier.
