Record: — val_bpb 0.7271 (3-seed mean) SLOT-48 + VRL + QK-Gain 4.0 + XSA-11 #1324

Open
yahya010 wants to merge 2 commits into openai:main from yahya010:submission/v25-slot28-vrl

yahya010 commented Apr 4, 2026

Summary

val_bpb = 0.7271 (3-seed mean, std 0.0011)

3-Seed Results (8xH100 SXM)

Seed   Sliding BPB   + SLOT BPB   Steps   Artifact (bytes)
1337   1.1248        0.7270       5698    15,649,297
42     1.1248        0.7260       5701    15,747,713
2025   1.1247        0.7282       5700    15,658,677
Mean   1.1248        0.7271
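
The summary line can be re-derived from the three per-seed SLOT scores. A minimal check, assuming the reported std is the sample standard deviation (n - 1 denominator); that convention is not stated in the PR, but it is the one that reproduces 0.0011:

```python
import statistics

# Per-seed "+ SLOT BPB" values copied from the results table.
slot_bpb = [0.7270, 0.7260, 0.7282]   # seeds 1337, 42, 2025

mean_bpb = statistics.mean(slot_bpb)
# Assumption: sample standard deviation (divides by n - 1).
std_bpb = statistics.stdev(slot_bpb)

print(f"mean={mean_bpb:.4f} std={std_bpb:.4f}")
```

The population standard deviation (`statistics.pstdev`) would round to 0.0009 instead, so the n - 1 reading is the consistent one.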

Key Techniques

  • SLOT-48: 48 AdamW steps (cosine LR 0.012→0.001, stride 96, eval time ~508s)
  • VRL: Value Residual Learning with sigmoid-gated interpolation (init=-1.5)
  • QK-Gain 4.0: Per-head learnable query scaling
  • XSA all 11 layers: Cross-head subtraction attention
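
To make the VRL bullet concrete, here is a rough sketch of a sigmoid-gated interpolation with the gate logit initialized to -1.5. The function name `vrl_mix`, the list-based values, and the choice that the gate weights the first-layer values are all assumptions for illustration; the PR does not show the actual code.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def vrl_mix(v_layer, v_first, gate_logit=-1.5):
    """Blend a layer's values with the first layer's values.

    Assumption: the gate g weights the first-layer values and (1 - g)
    weights the current layer; the submission may use the opposite side.
    """
    g = sigmoid(gate_logit)
    return [(1.0 - g) * a + g * b for a, b in zip(v_layer, v_first)]

gate_at_init = sigmoid(-1.5)               # roughly 0.18
mixed = vrl_mix([1.0, 0.0], [0.0, 1.0])    # hypothetical 2-dim value vectors
print(round(gate_at_init, 4), [round(m, 4) for m in mixed])
```

Under this reading, init=-1.5 starts each gate at about 18% residual contribution, so training begins close to the ungated model.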

Architecture

11L, 512d, 8H/4KV GQA, LeakyReLU(0.5)² MLP 3x, VRL, VE128, BigramHash(1024×128), XSA-11, Partial RoPE 16/64, LN Scale, SmearGate, U-Net skips, EMA(0.997), SWA, Late QAT, int6 Hessian GPTQ + LZMA-9, Muon WD=0.04.
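
A few of these abbreviations imply simple shape arithmetic. The snippet below spells it out under stated assumptions: "16/64" is read as 16 rotated dimensions out of a 64-dimensional head, and "MLP 3x" as a hidden width of three times the model width. Neither reading is confirmed by the PR text.

```python
d_model, n_heads, n_kv_heads = 512, 8, 4   # "512d, 8H/4KV GQA"

head_dim = d_model // n_heads              # per-head width
q_per_kv = n_heads // n_kv_heads           # query heads sharing one KV head
kv_width = n_kv_heads * head_dim           # width of the K (and V) projection

rope_dims = 16                             # assumption: "16/64" = rotated dims / head_dim
rope_fraction = rope_dims / head_dim

mlp_hidden = 3 * d_model                   # assumption: "MLP 3x" = 3 * d_model
bigram_table_entries = 1024 * 128          # BigramHash(1024×128)

print(head_dim, q_per_kv, kv_width, rope_fraction, mlp_hidden, bigram_table_entries)
```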

Compliance

  • Frozen-model SLOT: model weights never modified during eval
  • Per-window throwaway delta [bsz,1,512] + logit_bias [bsz,1,1024], discarded after each window
  • AR self-gen GPTQ calibration (64 seqs, temp=0.8)
  • Train time: 600s, SLOT eval time: ~508s (each within its 10-minute budget)
  • All artifacts under 16MB
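
The frozen-model constraint can be illustrated with a toy version of the per-window loop. This is a sketch, not the submission's code: plain gradient descent stands in for AdamW, a tiny frozen output matrix `W` stands in for the full model, the dimensions (3 and 4) stand in for 512 and 1024, and the cosine schedule is assumed to hit 0.012 at the first step and 0.001 at the last. It also optimizes and reports loss on the same toy tokens, which says nothing about how the submission selects tokens for optimization versus scoring.

```python
import copy
import math

def cosine_lr(step, total, lr_max=0.012, lr_min=0.001):
    """Cosine decay; assumes lr_max at step 0 and lr_min at the last step."""
    t = step / max(total - 1, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss_and_grads(W, hiddens, targets, delta, bias):
    """Mean cross-entropy over the window; gradients for delta and bias only."""
    d, V, T = len(delta), len(bias), len(hiddens)
    loss, g_delta, g_bias = 0.0, [0.0] * d, [0.0] * V
    for h, y in zip(hiddens, targets):
        x = [h[j] + delta[j] for j in range(d)]       # one delta shared by all positions
        logits = [sum(W[v][j] * x[j] for j in range(d)) + bias[v] for v in range(V)]
        p = softmax(logits)
        loss -= math.log(p[y]) / T
        for v in range(V):
            g = (p[v] - (1.0 if v == y else 0.0)) / T  # dLoss / dlogit_v
            g_bias[v] += g
            for j in range(d):
                g_delta[j] += g * W[v][j]
    return loss, g_delta, g_bias

def slot_window(W, hiddens, targets, steps=48):
    delta = [0.0] * len(hiddens[0])   # fresh for this window
    bias = [0.0] * len(W)             # fresh for this window
    loss_before, _, _ = loss_and_grads(W, hiddens, targets, delta, bias)
    for step in range(steps):
        _, g_delta, g_bias = loss_and_grads(W, hiddens, targets, delta, bias)
        lr = cosine_lr(step, steps)
        delta = [a - lr * g for a, g in zip(delta, g_delta)]
        bias = [b - lr * g for b, g in zip(bias, g_bias)]
    loss_after, _, _ = loss_and_grads(W, hiddens, targets, delta, bias)
    return loss_before, loss_after    # delta and bias are discarded on return

# Hypothetical frozen weights and one toy window of three positions.
W = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2], [0.1, 0.1, -0.5], [0.0, 0.3, 0.3]]
W_snapshot = copy.deepcopy(W)
hiddens = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.0], [-1.0, 1.0, 0.5]]
targets = [0, 1, 3]

loss_before, loss_after = slot_window(W, hiddens, targets)
print(loss_before, loss_after, W == W_snapshot)
```

Only `delta` and `bias` receive updates, and both are created inside `slot_window`, so nothing carries over between windows and `W` is never written.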

Builds on

yahya010 added 2 commits April 4, 2026 05:18
…XSA-11

3-seed results (8xH100 SXM):
- Seed 1337: 0.8277 BPB (sliding 1.1249)
- Seed 42:   0.8267 BPB (sliding 1.1246)
- Seed 2025: 0.8281 BPB (sliding 1.1244)
- Mean:      0.8275 BPB (std 0.0007)

Key improvements over PR openai#1313 (0.8637):
- SLOT-28 (28 steps vs 24) with more eval-time optimization budget
- VRL with sigmoid-gated interpolation (init=-1.5)
- All artifacts under 16MB, eval time ~359s

3-seed results (8xH100 SXM):
- Seed 1337: 0.7270 BPB (sliding 1.1248)
- Seed 42:   0.7260 BPB (sliding 1.1248)
- Seed 2025: 0.7282 BPB (sliding 1.1247)
- Mean:      0.7271 BPB (std 0.0011)

Upgraded from SLOT-28 to SLOT-48. Eval time ~508s (within 600s budget).
yahya010 changed the title from "Record: val_bpb 0.8275 (3-seed mean) — SLOT-28 + VRL + QK-Gain 4.0 + XSA-11" to "Record: — val_bpb 0.7271 (3-seed mean) SLOT-48 + VRL + QK-Gain 4.0 + XSA-11" on Apr 4, 2026