Record: SP8192 + Banking + Triple Recurrence + Parallel Residuals + Muon 0.97 + TTT — val_bpb 1.0790 (5-seed mean)#1533

Open
aryanbhosale wants to merge 1 commit into openai:main from aryanbhosale:submission/sp8192-fused-banking-muon97-5seed

Record: SP8192 + Banking + Triple Recurrence + Parallel Residuals + Muon 0.97 + TTT

val_bpb = 1.0790 (5-seed mean, std 0.0003) | artifact ~15.99 MB | 8×H100 SXM

5-Seed Results

Seed | val_bpb (post-TTT) | val_loss (nats)
42   | 1.0788             | 2.7866
314  | 1.0789             | 2.7868
1337 | 1.0788             | 2.7867
7    | 1.0793             | 2.7880
999  | 1.0795             | 2.7884
Mean | 1.0790             | 2.7873
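The summary row can be checked against the per-seed rows. The sketch below recomputes the summary statistics from the table. Because the table entries are already rounded to 4 decimals, it gives a BPB mean of about 1.07906 rather than exactly 1.0790. Both the sample std and the population std round to the reported 0.0003.

```python
import statistics

# Per-seed (post-TTT val_bpb, val_loss in nats), copied from the table above.
# The entries are already rounded to 4 decimals.
results = {
    42: (1.0788, 2.7866),
    314: (1.0789, 2.7868),
    1337: (1.0788, 2.7867),
    7: (1.0793, 2.7880),
    999: (1.0795, 2.7884),
}

bpb = [b for b, _ in results.values()]
nats = [n for _, n in results.values()]

bpb_mean = statistics.mean(bpb)
bpb_std = statistics.stdev(bpb)       # sample std (n - 1); pstdev gives population std
nats_mean = statistics.mean(nats)

print(f"val_bpb  mean={bpb_mean:.5f}  std={bpb_std:.5f}")
print(f"val_loss mean={nats_mean:.4f}")
```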

Merged SOTA (PR #1493): 1.0810 BPB. Delta: −0.0020 BPB / −0.0047 nats.
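The BPB and nats columns express the same loss in different units. The PR does not restate how they are related, so the sketch below assumes the usual bits-per-byte definition: val_bpb = (val_loss / ln 2) × (tokens / bytes). Under that assumption, the mean row implies roughly 3.73 bytes of raw text per SP8192 token. This figure is approximate because both inputs are rounded.

```python
import math

LN2 = math.log(2)

def bits_per_byte(loss_nats_per_token: float, bytes_per_token: float) -> float:
    """Mean cross-entropy in nats/token -> bits per byte of raw text."""
    return loss_nats_per_token / LN2 / bytes_per_token

def implied_bytes_per_token(loss_nats_per_token: float, bpb: float) -> float:
    """Invert the relation above to recover the average bytes per token."""
    return loss_nats_per_token / (LN2 * bpb)

# Mean row of the results table (both values are rounded, so this is approximate).
bpt = implied_bytes_per_token(2.7873, 1.0790)
print(f"implied average bytes per token: {bpt:.3f}")

# Round trip: plugging the ratio back in recovers the reported val_bpb.
print(f"{bits_per_byte(2.7873, bpt):.4f}")
```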

Stack

Built on the PR #1523 base (@abaybektursun), with the hash embedding removed and a standard MLP in place of the Triton fused kernel:

  1. SP8192 + GPTQ embeddings + SDClip
  2. Parameter Banking — batched Newton-Schulz
  3. Triple Depth Recurrence (L3-5, 17 virtual layers)
  4. Parallel Residuals (L7+)
  5. Muon 0.97 (PR #1514 @dexhunter: SP8192 + Muon 0.97 + Legal Score-First TTT, val_bpb 1.07983, 3-seed mean)
  6. QK-Gain 5.25, EMA 0.9965, WD 0.095, warmdown 0.72
  7. Score-First TTT (3 epochs, SGD lr=0.005)
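The depth figures in items 3 and 4 can be checked with a little arithmetic. Getting 17 virtual layers by running L3-5 three times is consistent with an 11-layer physical stack (11 + 2 × 3 = 17).

The sketch below is illustrative only and rests on several assumptions:
- There are 11 physical layers, 0-indexed.
- The looped block runs back to back (3, 4, 5, 3, 4, 5, 3, 4, 5), reusing the same weights on each pass.
- "Parallel residuals" follows the common GPT-J/PaLM-style parallel block for physical layers 7 and up, where attention and MLP read the same normalized input. The PR's exact variant may differ.
- The scalar "layers" are hypothetical stand-ins.

```python
import math
from typing import Callable, List, Sequence

Fn = Callable[[float], float]

def virtual_layer_schedule(n_physical: int, loop_start: int, loop_end: int,
                           n_passes: int) -> List[int]:
    """Physical layer index executed at each virtual depth.

    Layers loop_start..loop_end (inclusive) run n_passes times back to back,
    reusing the same weights on every pass."""
    looped = list(range(loop_start, loop_end + 1))
    return (list(range(loop_start))
            + looped * n_passes
            + list(range(loop_end + 1, n_physical)))

def sequential_block(x: float, attn: Fn, mlp: Fn, norm: Fn) -> float:
    x = x + attn(norm(x))
    return x + mlp(norm(x))           # MLP sees the post-attention stream

def parallel_block(x: float, attn: Fn, mlp: Fn, norm: Fn) -> float:
    h = norm(x)                        # one shared input for both branches
    return x + attn(h) + mlp(h)

def forward(x: float, schedule: Sequence[int], parallel_from: int,
            attn: Sequence[Fn], mlp: Sequence[Fn], norm: Fn) -> float:
    for layer in schedule:
        block = parallel_block if layer >= parallel_from else sequential_block
        x = block(x, attn[layer], mlp[layer], norm)
    return x

schedule = virtual_layer_schedule(n_physical=11, loop_start=3, loop_end=5, n_passes=3)
print(len(schedule), schedule)
# 17 [0, 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5, 6, 7, 8, 9, 10]

# Hypothetical scalar "layers", just to exercise the control flow.
attn = [lambda h, i=i: 0.01 * (i + 1) * h for i in range(11)]
mlp = [lambda h, i=i: 0.02 * (i + 1) * h for i in range(11)]
print(forward(1.0, schedule, parallel_from=7, attn=attn, mlp=mlp, norm=math.tanh))
```

Because looped layers reuse their weights, the parameter count (and so the artifact size) tracks the 11 physical layers, while compute tracks the 17 virtual ones.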
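Items 2 and 5 both touch the optimizer. As a rough illustration only, the sketch below groups weight matrices into same-shape banks and orthogonalizes each bank's momentum update with Newton-Schulz, using momentum 0.97 with Nesterov lookahead.

It makes several assumptions and substitutions:
- "Muon 0.97" is taken to mean the Muon momentum coefficient.
- The textbook cubic Newton-Schulz iteration replaces Muon's tuned quintic polynomial.
- Per-shape LR scaling and weight decay are omitted.
- Pure-Python lists stand in for the batched tensors a real implementation would use.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]

def _ns_step(x: Matrix) -> Matrix:
    # Cubic Newton-Schulz: X <- 1.5 X - 0.5 X X^T X pushes singular values toward 1.
    xxtx = matmul(matmul(x, transpose(x)), x)
    return [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]

def newton_schulz_bank(bank: List[Matrix], steps: int = 30) -> List[Matrix]:
    """Approximately orthogonalize every matrix in a same-shape bank."""
    xs = []
    for g in bank:
        # Frobenius norm >= spectral norm, so all singular values land in (0, 1].
        norm = math.sqrt(sum(v * v for row in g for v in row)) + 1e-12
        xs.append([[v / norm for v in row] for row in g])
    for _ in range(steps):
        # With real tensors this would be one batched matmul over a (B, m, n) stack.
        xs = [_ns_step(x) for x in xs]
    return xs

def bank_by_shape(params: Dict[str, Matrix]) -> Dict[Tuple[int, int], List[str]]:
    banks: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for name, p in params.items():
        banks[(len(p), len(p[0]))].append(name)
    return dict(banks)

def muon_step(params, grads, bufs, lr, momentum=0.97):
    """One Muon-style step: Nesterov momentum, then per-bank orthogonalization."""
    updates = {}
    for name, g in grads.items():
        bufs[name] = [[momentum * b + gv for b, gv in zip(rb, rg)]
                      for rb, rg in zip(bufs[name], g)]
        updates[name] = [[gv + momentum * b for gv, b in zip(rg, rb)]
                         for rg, rb in zip(g, bufs[name])]
    for names in bank_by_shape(updates).values():
        ortho = newton_schulz_bank([updates[n] for n in names])
        for n, o in zip(names, ortho):
            params[n] = [[p - lr * ov for p, ov in zip(rp, ro)]
                         for rp, ro in zip(params[n], o)]
    return params, bufs

# Hypothetical gradients for three weight matrices (two share a shape).
grads = {"w1": [[2.0, 1.0], [1.0, 3.0]],
         "w2": [[1.0, 0.5], [0.0, 1.0]],
         "w3": [[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]]}
params = {n: [[0.0] * len(g[0]) for _ in g] for n, g in grads.items()}
bufs = {n: [[0.0] * len(g[0]) for _ in g] for n, g in grads.items()}
print(bank_by_shape(grads))   # {(2, 2): ['w1', 'w2'], (2, 3): ['w3']}
params, bufs = muon_step(params, grads, bufs, lr=0.1)
```

The design point of banking is that same-shape matrices share one Newton-Schulz call, so the iteration runs once per shape instead of once per parameter.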
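Item 6 lists several scalar hyperparameters. The sketch below shows one plausible reading of three of them, and all three readings are assumptions:
- EMA 0.9965 is the decay of an exponential moving average of the weights.
- WD 0.095 is decoupled (AdamW-style) weight decay.
- Warmdown 0.72 is the fraction of training over which the learning rate decays linearly to zero.

```python
import math
from typing import List

def lr_multiplier(step: int, total_steps: int, warmdown_frac: float = 0.72) -> float:
    """Constant LR, then linear decay to 0 over the final warmdown_frac of training."""
    warmdown_start = total_steps * (1.0 - warmdown_frac)
    if step < warmdown_start:
        return 1.0
    return max(0.0, (total_steps - step) / (total_steps - warmdown_start))

def apply_weight_decay(weights: List[float], lr: float, wd: float = 0.095) -> List[float]:
    """Decoupled decay: shrink each weight by a factor of (1 - lr * wd)."""
    return [w * (1.0 - lr * wd) for w in weights]

def ema_update(ema: List[float], weights: List[float], decay: float = 0.9965) -> List[float]:
    """Exponential moving average of the weights (decay applied every step)."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]

# With decay 0.9965, a weight's contribution to the EMA halves after about 198 steps.
half_life = math.log(0.5) / math.log(0.9965)
print(f"EMA half-life ~ {half_life:.1f} steps")
print([round(lr_multiplier(s, 1000), 3) for s in (0, 279, 640, 1000)])
```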

Compliance (Track B)

Uses score-first TTT (PR #461). No SLOT, no hash embedding, no pre-quantization TTT, no n-gram, no ETLB. All conditions from Issue #1017 are satisfied, and every artifact is under 16 MB.
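The score-first ordering is what keeps TTT legal: each chunk is scored before any update that uses it. A minimal sketch of that ordering follows. The epoch count and learning rate mirror item 7 (3 epochs, SGD lr=0.005).

The rest of the setup is hypothetical and not the submission's code:
- The validation stream is split into chunks.
- Updates are plain momentum-free SGD steps.
- The model is a toy with a single parameter.

```python
from typing import Callable, List, Sequence, Tuple

Params = List[float]
Chunk = Sequence[float]

def score_first_ttt(chunks: Sequence[Chunk], params: Params,
                    loss_fn: Callable[[Params, Chunk], float],
                    grad_fn: Callable[[Params, Chunk], Params],
                    epochs: int = 3, lr: float = 0.005) -> Tuple[List[float], Params]:
    """Score each chunk with the current weights, then adapt on it.

    A chunk's score never depends on updates computed from that same chunk."""
    scores = []
    for chunk in chunks:
        scores.append(loss_fn(params, chunk))      # 1) score first, no adaptation yet
        for _ in range(epochs):                    # 2) then SGD on the already-scored chunk
            grads = grad_fn(params, chunk)
            params = [p - lr * g for p, g in zip(params, grads)]
    return scores, params

# Toy model: one scalar w predicts every value; squared-error loss.
def toy_loss(params: Params, chunk: Chunk) -> float:
    w = params[0]
    return sum((x - w) ** 2 for x in chunk) / len(chunk)

def toy_grad(params: Params, chunk: Chunk) -> Params:
    w = params[0]
    return [-2.0 * sum(x - w for x in chunk) / len(chunk)]

scores, final = score_first_ttt([[1.0, 1.0], [1.0, 1.0]], [0.0], toy_loss, toy_grad)
print(scores, final)
```

In this toy example, the first chunk is scored at the initial weights (loss 1.0). Only the second chunk benefits from the three SGD epochs run on the first.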

Credits

PR #1523 @abaybektursun, PR #1394 @clarkkev, PR #1514 @dexhunter, PR #1493 @bigbag, PR #1204 @msisovic
