Record: MuonEq-R + 3-Layer Recurrence + WD=0.095 + MLR=0.022 + All-Int6 — val_bpb 1.0900 (3-seed mean) #1331

Open
dexhunter wants to merge 1 commit into openai:main from dexhunter:muoneqr-3layer-recurrence-wd095-mlr022

@dexhunter
Summary

  • val_bpb = 1.0900 (3-seed mean, std 0.0005) | 2.5077 nats | ~15.96 MB | 8xH100 SXM, 590s | No TTT
  • 3-layer depth recurrence (layers 3,4,5) with WD-LR synergy: WD=0.095 compresses for headroom, MLR=0.022 recovers quality
  • All seeds fit under 16 MB, each with a margin of about 36K bytes or more
  • No SLOT, no TTT, no eval-time adaptation

Key Innovation: 3-Layer Recurrence + WD-LR Synergy

Extends 2-layer recurrence (PR #1285) to 3 layers. The extra virtual layer needs more artifact budget, compensated by:

  • Higher WD (0.095 vs 0.090) → better compression → headroom for 3-layer recurrence
  • Higher MLR (0.022 vs 0.020) → recovers quality lost from WD increase
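To make the "extra virtual layer" concrete, here is a minimal sketch of how a depth-recurrence schedule could be laid out. This PR does not spell out how the loop is wired, so the replay order, the `recurrence_schedule` helper, and the 11-block model size are all assumptions made purely for illustration.

```python
def recurrence_schedule(num_layers, recurrent_layers, passes=2):
    """Return the order in which physical blocks run (hypothetical wiring).

    Assumes `recurrent_layers` is a contiguous, ascending span and that the
    whole span is replayed right after its last block finishes.
    """
    last = recurrent_layers[-1]
    schedule = []
    for layer in range(num_layers):
        schedule.append(layer)
        if layer == last:
            # Replay the recurrent span (passes - 1) more times.
            schedule.extend(list(recurrent_layers) * (passes - 1))
    return schedule


NUM_LAYERS = 11  # hypothetical model depth, not taken from the PR

two_layer = recurrence_schedule(NUM_LAYERS, [4, 5])       # PR #1285 setting
three_layer = recurrence_schedule(NUM_LAYERS, [3, 4, 5])  # this PR's setting

# Virtual layers are the executions beyond the physical block count.
virtual_two = len(two_layer) - NUM_LAYERS
virtual_three = len(three_layer) - NUM_LAYERS
print(three_layer)
print(virtual_two, virtual_three)
```

Under these assumptions, moving from layers (4,5) to (3,4,5) adds exactly one virtual layer, which matches the wording above.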

Results

| Seed | Sliding BPB | val_loss (nats) | Artifact (bytes) |
|---|---|---|---|
| 42 | 1.0898 | 2.50733 | 15,961,029 |
| 0 | 1.0895 | 2.50672 | 15,955,962 |
| 7 | 1.0905 | 2.50901 | 15,964,018 |
| Mean | 1.0900 | 2.50769 | 15,960,336 |
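The summary statistics can be re-derived from the per-seed rows with a few lines of Python. This is only a sanity-check sketch: it assumes the reported std is the sample standard deviation and that the 16MB cap means 16,000,000 bytes (`SIZE_LIMIT` is an assumption, not stated in this PR).

```python
from statistics import mean, stdev

# seed -> (sliding BPB, val_loss in nats, artifact size in bytes), from the table
runs = {
    42: (1.0898, 2.50733, 15_961_029),
    0: (1.0895, 2.50672, 15_955_962),
    7: (1.0905, 2.50901, 15_964_018),
}

SIZE_LIMIT = 16_000_000  # assumed meaning of "16MB"

bpb_mean = mean(r[0] for r in runs.values())
bpb_std = stdev(r[0] for r in runs.values())  # sample std (n - 1)
loss_mean = mean(r[1] for r in runs.values())
size_mean = mean(r[2] for r in runs.values())
margins = {seed: SIZE_LIMIT - r[2] for seed, r in runs.items()}

print(f"BPB mean {bpb_mean:.5f}, std {bpb_std:.4f}")
print(f"val_loss mean {loss_mean:.5f} nats")
print(f"artifact mean {size_mean:,.0f} bytes")
print(f"margins {margins}")
```

Because the per-seed BPB values are already rounded to four decimals, the recomputed mean agrees with the reported 1.0900 only to within rounding. Under the assumed limit, the tightest margin is seed 7 at 35,982 bytes.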

Changes from PR #1285 (1.0912)

| | PR #1285 | This |
|---|---|---|
| val_bpb | 1.09124 | 1.08995 (-0.00129) |
| Recurrence | 2-layer (4,5) | 3-layer (3,4,5) |
| WD | 0.090 | 0.095 |
| Matrix LR | 0.020 | 0.022 |

Credits

…b 1.0900 (3-seed mean)

3-layer depth recurrence (layers 3,4,5) with WD-LR synergy:
higher WD (0.095) compresses for all-int6 headroom, higher MLR (0.022)
recovers quality. All 66 layers at int6 precision.
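For readers unfamiliar with int6 storage, the following is a generic sketch of symmetric per-row 6-bit quantization. It is not the quantizer used in this PR, whose scheme is not described here; the `quantize_int6` and `dequantize` helpers, the clamp range [-31, 31], and the per-row scale are all illustrative assumptions.

```python
def quantize_int6(row):
    """Symmetric per-row quantization to signed 6-bit codes in [-31, 31]."""
    qmax = 31
    max_abs = max(abs(x) for x in row)
    # One float scale per row; fall back to 1.0 for an all-zero row.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return codes, scale


def dequantize(codes, scale):
    """Map 6-bit codes back to approximate float weights."""
    return [c * scale for c in codes]


row = [0.62, -0.30, 0.0, 0.06]  # toy weights, not from the model
codes, scale = quantize_int6(row)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(codes, scale, max_error)
```

In a scheme like this the round-trip error per weight is bounded by half the scale, so rows with a smaller dynamic range quantize more accurately. That is one plausible reading of why stronger weight decay could help at a fixed bit width, though the commit message does not state the mechanism.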

3-seed mean: 1.0900 BPB / 2.5077 nats (seeds 42, 0, 7)
All seeds under 16MB with 36K+ margins.
No TTT, no SLOT, no eval-time adaptation.

Improves on PR openai#1285 (1.0912) by 0.0013 BPB. Beats PR openai#1218 by 0.0079.
Built on PR openai#1218 by @clarkkev.