Record: SLOT-48 — val_bpb 0.7406 (3-seed mean) #1321
base: main
Changes from 2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| # SLOT-48 — val_bpb 0.7406 (3-seed mean) | ||
| **val_bpb = 0.7406** (3-seed mean, std 0.0051) | 15.75-15.82 MB | 8xH100 SXM | ||
| ## 3-Seed Results | ||
| | Seed | Sliding BPB | + SLOT BPB | Steps | Artifact | | ||
| |------|------------|------------|-------|----------| | ||
| | 1337 | 1.126 | **0.7450** | 6034 | 15,815,983 | | ||
| | 42 | 1.121 | **0.7350** | 6563 | 15,751,595 | | ||
| | 2024 | 1.122 | **0.7416** | 6568 | 15,793,375 | | ||
| | **Mean** | **1.123** | **0.7406** | | | | ||
| Beats PR #1313 (0.8637) by 0.123 BPB. Beats best pending (#1229, 0.9300) by 0.189 BPB. | ||
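The headline numbers can be rechecked from the per-seed values in this PR's submission JSON. The sketch below is only an arithmetic check; it assumes the reported std is the sample standard deviation (n - 1), which the README does not state.

```python
from statistics import mean, stdev

# Final val_bpb per seed, copied from seed_results in the submission JSON of this PR.
val_bpb = {1337: 0.74502015, 42: 0.73502047, 2024: 0.74164171}
scores = list(val_bpb.values())

mean_bpb = mean(scores)
std_bpb = stdev(scores)  # sample std (n - 1); assumed, not stated in the README

# Reference scores quoted in the README for the two PRs being compared against.
pr_1313 = 0.8637
pr_1229 = 0.9300

print(f"mean = {mean_bpb:.4f}, std = {std_bpb:.4f}")
print(f"margin vs #1313 = {pr_1313 - mean_bpb:.3f}")
print(f"margin vs #1229 = {pr_1229 - mean_bpb:.3f}")
```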
| ## What Changed vs PR #1313 | ||
| Only SLOT step count — same model, same training, same LR, same stride: | ||
| | Parameter | PR #1313 | This PR | | ||
| |-----------|----------|---------| | ||
| | SLOT_STEPS | 24 | **48** | | ||
| ## Architecture | ||
| 11L, 512d, 8H/4KV GQA, LeakyReLU(0.5)^2 MLP 3x, VRL, VE128, BigramHash(1024), XSA all 11 layers, QK-Gain 4.0, Partial RoPE 16/64, LN Scale, SmearGate, U-Net skips, EMA(0.997), Late QAT, int6+lzma, FA3 Hopper, Muon WD=0.04. | ||
| ## SLOT-48 Details | ||
| - Per-sample hidden delta [bsz, 1, 512] + logit bias [bsz, 1, 1024] | ||
| - Scored-position masking (last stride=96 tokens per non-first window) | ||
| - 48 AdamW steps, cosine LR 0.012 -> 0.001 | ||
| - Model weights frozen, delta optimized through detached hidden states | ||
| - Eval time: ~409s on 8xH100 (under 10-min eval budget) | ||
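Two of the bullets above, the cosine LR schedule and the scored-position mask, can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `train_gpt.py`: the function names `slot_lr` and `scored_mask` are hypothetical, the schedule is assumed to hit 0.001 exactly on the last of the 48 steps, and the first window is assumed to score every position.

```python
import math

SLOT_STEPS = 48
LR_START, LR_END = 0.012, 0.001
STRIDE = 96

def slot_lr(step: int, total: int = SLOT_STEPS) -> float:
    """Cosine decay from LR_START at step 0 to LR_END at step total - 1 (assumed endpoints)."""
    progress = step / max(total - 1, 1)
    return LR_END + 0.5 * (LR_START - LR_END) * (1.0 + math.cos(math.pi * progress))

def scored_mask(window_len: int, is_first_window: bool, stride: int = STRIDE) -> list[bool]:
    """Return True at positions that count toward the SLOT loss and the score.

    Assumption: the first window scores all positions; every later window
    scores only its last `stride` positions, because the earlier positions
    were already scored by a previous window.
    """
    if is_first_window:
        return [True] * window_len
    return [i >= window_len - stride for i in range(window_len)]

# Throwaway parameters optimized per sample: hidden delta (512) + logit bias (1024).
params_per_sample = 512 + 1024

print([round(slot_lr(s), 4) for s in (0, 24, 47)])
print(sum(scored_mask(1024, is_first_window=False)), params_per_sample)
```

The window length of 1024 in the usage line is a placeholder; the README does not give the evaluation window size.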
| ## SLOT Scaling Behavior | ||
| | Steps | BPB (seed 1337) | Delta vs. previous row | | ||
| |-------|-----------------|-------| | ||
| | 16 | 0.949 | baseline | | ||
| | 24 | 0.868 | -0.081 | | ||
| | **48** | **0.745** | **-0.123** | | ||
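| SLOT continues to improve well beyond the 24-32 step range. BPB is still falling at 48 steps, although the gain per added step roughly halves (about 0.010 from 16 to 24 steps, about 0.005 from 24 to 48). | ||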
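The deltas in the scaling table, and the gain per added SLOT step, can be recomputed from the three seed-1337 points. This is a small arithmetic sketch over the table values only; three points from one seed are not enough to fit a scaling curve.

```python
# (SLOT steps, BPB for seed 1337), copied from the scaling table.
runs = [(16, 0.949), (24, 0.868), (48, 0.745)]

gains = []
for (s0, b0), (s1, b1) in zip(runs, runs[1:]):
    delta = b1 - b0                  # change in BPB between consecutive rows
    per_step = delta / (s1 - s0)     # average change per added SLOT step
    gains.append((s0, s1, delta, per_step))
    print(f"{s0:>2} -> {s1:>2} steps: delta = {delta:+.3f}, per added step = {per_step:+.4f}")
```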
| ## Compliance | ||
| - **Frozen-model SLOT**: model weights are never modified during evaluation. Only per-window throwaway delta and logit_bias parameters are optimized, then discarded. Same evaluation pattern as accepted PRs #1176 and #1229. | ||
| - No n-gram cache, no eval-time GPTQ | ||
| - Self-contained, no network calls | ||
| - All seeds within time and size budgets | ||
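The budget claim in the last bullet can be checked against the numbers reported in this PR. The limits below are assumptions: the track name `10min_16mb` is read as a 16,000,000-byte artifact cap and a 600 s cap each for training and evaluation; the exact byte definition of the size limit is not stated here.

```python
# Artifact sizes in bytes, copied from the 3-seed results table.
artifact_bytes = {1337: 15_815_983, 42: 15_751_595, 2024: 15_793_375}

# Assumed limits, inferred from the track name "10min_16mb".
SIZE_LIMIT_BYTES = 16_000_000
TIME_LIMIT_S = 600

# Approximate timings quoted in the README.
train_s, eval_s = 600, 409

size_ok = all(n <= SIZE_LIMIT_BYTES for n in artifact_bytes.values())
time_ok = train_s <= TIME_LIMIT_S and eval_s <= TIME_LIMIT_S
headroom = {seed: SIZE_LIMIT_BYTES - n for seed, n in artifact_bytes.items()}

print(f"size_ok={size_ok}, time_ok={time_ok}")
print(f"smallest size headroom: {min(headroom.values()):,} bytes")
```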
| ## Reproduction | ||
| ```bash | ||
| torchrun --standalone --nproc_per_node=8 train_gpt.py | ||
| ``` | ||
| Training: ~600s. Eval: ~409s. Total: ~17 min. | ||
| ## Credits | ||
| - Base: PR #175, PR #1303, PR #1313 (@anthony-maio) | ||
| - SLOT: Hu et al. arXiv:2505.12392v2, PR #1176 (@bigbag), PR #1229 (@resouer) | ||
| - QK-Gain 4.0: PR #1125 | ||
| - XSA: PR #1176 (@bigbag) | ||
| - VRL: ResFormer (arXiv:2410.17897) | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,19 @@ | ||||||||||||||
| { | ||||||||||||||
| "name": "SLOT48_LR012_Stride96", | ||||||||||||||
| "author": "Anthony Maio", | ||||||||||||||
| "github_id": "anthony-maio", | ||||||||||||||
| "date": "2026-04-03", | ||||||||||||||
| "track": "10min_16mb", | ||||||||||||||
| "num_gpus": 8, | ||||||||||||||
| "gpu_type": "H100 SXM", | ||||||||||||||
| "training_time_seconds": 600, | ||||||||||||||
| "seed_results": { | ||||||||||||||
| "1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6034, "artifact_bytes": 15815983}, | ||||||||||||||
| "42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6563, "artifact_bytes": 15751595}, | ||||||||||||||
| "2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6568, "artifact_bytes": 15793375} | ||||||||||||||
Suggested change:
| - "1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6034, "artifact_bytes": 15815983}, | |
| - "42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6563, "artifact_bytes": 15751595}, | |
| - "2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6568, "artifact_bytes": 15793375} | |
| + "1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6578, "artifact_bytes": 15815983}, | |
| + "42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6576, "artifact_bytes": 15751595}, | |
| + "2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6588, "artifact_bytes": 15793375} |
The README’s “Steps” column doesn’t match the actual training stop steps in the included logs (e.g., seed 42 stops at 6576 in `train_seed42.log`, seed 2024 at 6588, seed 1337 at 6578). Please update the table so the reported step counts are consistent with the logs.
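The mismatch described in this comment can be made explicit by comparing the two sets of step counts side by side. The sketch hard-codes the values from the README table and from the comment rather than parsing the log files, whose format is not shown in this PR view.

```python
# Step counts from the README table in this PR.
readme_steps = {1337: 6034, 42: 6563, 2024: 6568}
# Stop steps the reviewer reports from the training logs.
log_steps = {1337: 6578, 42: 6576, 2024: 6588}

# Map each seed whose README value differs from the log to (readme, log).
mismatches = {
    seed: (readme_steps[seed], log_steps[seed])
    for seed in readme_steps
    if readme_steps[seed] != log_steps[seed]
}

for seed, (readme, log) in mismatches.items():
    print(f"seed {seed}: README says {readme}, log says {log} (diff {log - readme:+d})")
```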