# SLOT-48 — val_bpb 0.7406 (3-seed mean)

**val_bpb = 0.7406** (3-seed mean, std 0.0051) | artifact 15.75-15.82 MB | 8xH100 SXM

## 3-Seed Results

| Seed | Sliding-window BPB | + SLOT BPB | Train steps | Artifact (bytes) |
|------|--------------------|------------|-------------|------------------|
| 1337 | 1.126 | **0.7450** | 6034 | 15,815,983 |
| 42 | 1.121 | **0.7350** | 6563 | 15,751,595 |
| 2024 | 1.122 | **0.7416** | 6568 | 15,793,375 |
| **Mean** | **1.123** | **0.7406** | | |
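The headline mean and std can be reproduced from the per-seed values in the submission JSON. A minimal check, assuming the reported std is the sample (n-1) standard deviation; that is the convention that yields 0.0051, whereas the population std would be about 0.0042:

```python
import statistics

# Per-seed post-SLOT results, copied from seed_results in the submission JSON.
val_bpb = [0.74502015, 0.73502047, 0.74164171]   # seeds 1337, 42, 2024
val_loss = [1.25793247, 1.24104846, 1.25222813]

mean_bpb = statistics.mean(val_bpb)
std_bpb = statistics.stdev(val_bpb)     # sample std (n-1 denominator)
pstd_bpb = statistics.pstdev(val_bpb)   # population std (n denominator), for comparison
mean_loss = statistics.mean(val_loss)

print(f"mean val_bpb:   {mean_bpb:.4f}")   # 0.7406
print(f"sample std:     {std_bpb:.4f}")    # 0.0051
print(f"population std: {pstd_bpb:.4f}")   # 0.0042
print(f"mean val_loss:  {mean_loss:.4f}")  # 1.2504
```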

Beats PR #1313 (0.8637) by 0.123 BPB and the best pending submission, PR #1229 (0.9300), by 0.189 BPB.

## What Changed vs PR #1313

Only the SLOT step count changes; the model, training, LR, and stride are identical:

| Parameter | PR #1313 | This PR |
|-----------|----------|---------|
| SLOT_STEPS | 24 | **48** |

## Architecture

11L, 512d, 8H/4KV GQA, LeakyReLU(0.5)^2 MLP 3x, VRL, VE128, BigramHash(1024), XSA all 11 layers, QK-Gain 4.0, Partial RoPE 16/64, LN Scale, SmearGate, U-Net skips, EMA(0.997), Late QAT, int6+lzma, FA3 Hopper, Muon WD=0.04.
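Several of these shorthand entries imply concrete shapes. The sketch below derives them under stated assumptions: head dim = model dim / query heads, "MLP 3x" means a hidden width of 3 × 512, "Partial RoPE 16/64" means rotary embedding on 16 of each head's 64 dims, and GQA groups consecutive query heads onto one KV head. `ArchConfig`, `kv_head_for`, and `leaky_relu_sq` are hypothetical names, not identifiers from `train_gpt.py`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArchConfig:
    # Hypothetical config mirroring the README shorthand; not the actual train_gpt.py fields.
    num_layers: int = 11
    model_dim: int = 512
    num_heads: int = 8       # query heads
    num_kv_heads: int = 4    # shared key/value heads (GQA)
    mlp_mult: int = 3
    rope_dims: int = 16      # assumed: rotary dims per head ("Partial RoPE 16/64")

    @property
    def head_dim(self) -> int:
        return self.model_dim // self.num_heads

    @property
    def mlp_hidden(self) -> int:
        return self.mlp_mult * self.model_dim

    @property
    def queries_per_kv(self) -> int:
        return self.num_heads // self.num_kv_heads

def kv_head_for(cfg: ArchConfig, q_head: int) -> int:
    """KV head read by a given query head, assuming contiguous grouping."""
    return q_head // cfg.queries_per_kv

def leaky_relu_sq(x: float, slope: float = 0.5) -> float:
    """LeakyReLU with negative slope 0.5, then squared ("LeakyReLU(0.5)^2")."""
    y = x if x >= 0 else slope * x
    return y * y

cfg = ArchConfig()
print(cfg.head_dim, cfg.mlp_hidden, cfg.queries_per_kv)     # 64 1536 2
print([kv_head_for(cfg, h) for h in range(cfg.num_heads)])  # [0, 0, 1, 1, 2, 2, 3, 3]
print(leaky_relu_sq(2.0), leaky_relu_sq(-2.0))              # 4.0 1.0
```

Note that squaring makes the negative branch positive, so the activation is not monotone: an input of -2.0 maps to 1.0.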

## SLOT-48 Details

- Per-sample hidden delta [bsz, 1, 512] + logit bias [bsz, 1, 1024]
- Scored-position masking (last stride=96 tokens per non-first window)
- 48 AdamW steps, cosine LR 0.012 -> 0.001
- Model weights frozen, delta optimized through detached hidden states
- Eval time: ~409s on 8xH100 (under 10-min eval budget)
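The SLOT-48 settings above can be illustrated with a self-contained toy in plain Python (no PyTorch). This is a sketch under several assumptions and is not the submission's code: the frozen "model" is a fixed linear head `W` applied to precomputed, detached hidden states; the cosine schedule hits 0.012 at step 0 and exactly 0.001 at step 47; AdamW is written out with weight decay 0 (so it reduces to Adam) and default betas; the first window scores every position; and the loss is mean cross-entropy over scored positions only. Dimensions are shrunk (d=4, vocab=5) from the real 512 and 1024. `cosine_lr`, `scored_mask`, `slot_loss_and_grads`, and `slot_optimize` are hypothetical names.

```python
import math
import random

def cosine_lr(step, total=48, lr_max=0.012, lr_min=0.001):
    # Assumed endpoints: lr_max at step 0, lr_min at the final step (total - 1).
    t = step / (total - 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

def scored_mask(window_len, stride, first_window):
    # Assumed: first window scores every position; later windows score only their last `stride` tokens.
    if first_window:
        return [True] * window_len
    return [i >= window_len - stride for i in range(window_len)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def slot_loss_and_grads(H, W, targets, mask, delta, bias):
    """Mean CE over scored positions for logits = W @ (h + delta) + bias, plus grads for delta and bias."""
    V, d = len(W), len(delta)
    g_delta, g_bias = [0.0] * d, [0.0] * V
    loss, n = 0.0, sum(mask)
    for h, y, keep in zip(H, targets, mask):
        if not keep:
            continue                                     # unscored positions contribute nothing
        x = [h[j] + delta[j] for j in range(d)]          # h is detached; one delta shared across positions
        logits = [sum(W[v][j] * x[j] for j in range(d)) + bias[v] for v in range(V)]
        p = softmax(logits)
        loss -= math.log(p[y]) / n
        for v in range(V):
            g = (p[v] - (1.0 if v == y else 0.0)) / n    # dCE/dlogit_v
            g_bias[v] += g
            for j in range(d):
                g_delta[j] += g * W[v][j]
    return loss, g_delta, g_bias

def slot_optimize(H, W, targets, mask, steps=48, betas=(0.9, 0.999), eps=1e-8):
    """Fit a throwaway delta and logit bias for one window; W and H are never modified."""
    d, V = len(H[0]), len(W)
    params = [0.0] * d + [0.0] * V                       # [delta | bias], fresh for every window
    m, v = [0.0] * len(params), [0.0] * len(params)
    for step in range(steps):
        _, gd, gb = slot_loss_and_grads(H, W, targets, mask, params[:d], params[d:])
        lr = cosine_lr(step, steps)
        for i, g in enumerate(gd + gb):                  # Adam update (weight decay assumed 0)
            m[i] = betas[0] * m[i] + (1 - betas[0]) * g
            v[i] = betas[1] * v[i] + (1 - betas[1]) * g * g
            m_hat = m[i] / (1 - betas[0] ** (step + 1))
            v_hat = v[i] / (1 - betas[1] ** (step + 1))
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params[:d], params[d:]

# Toy window: random frozen head, random detached hidden states and targets.
rng = random.Random(0)
d, V, T, stride = 4, 5, 12, 3
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(V)]
H = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(T)]
targets = [rng.randrange(V) for _ in range(T)]
mask = scored_mask(T, stride, first_window=False)
frozen_copy = [row[:] for row in W]                      # snapshot to confirm the head is untouched

before, _, _ = slot_loss_and_grads(H, W, targets, mask, [0.0] * d, [0.0] * V)
delta, bias = slot_optimize(H, W, targets, mask)
after, _, _ = slot_loss_and_grads(H, W, targets, mask, delta, bias)
print(f"scored-position loss: {before:.3f} -> {after:.3f}")
print("head unchanged:", W == frozen_copy)               # True
```

In this toy the logits are linear in `delta` and `bias`, so the per-window objective is convex in the SLOT parameters. The single `delta` and `bias` are shared by every position in the window, mirroring the `[bsz, 1, ·]` shapes listed above.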

## SLOT Scaling Behavior

| SLOT steps | BPB (seed 1337) | Change vs. previous row |
|------------|-----------------|-------------------------|
| 16 | 0.949 | n/a |
| 24 | 0.868 | -0.081 |
| **48** | **0.745** | **-0.123** |

SLOT keeps improving well beyond the 24-32 step range: going from 24 to 48 steps lowers seed-1337 BPB by a further 0.123, with no sign of convergence at 48 steps.
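The per-row changes in the scaling table are measured against the previous row, not against the 16-step run. A short check of that arithmetic, which also divides each gain by the number of added SLOT steps (a derived figure that the table itself does not report):

```python
# (SLOT steps, seed-1337 BPB) from the scaling table.
runs = [(16, 0.949), (24, 0.868), (48, 0.745)]

gains = []
for (s0, b0), (s1, b1) in zip(runs, runs[1:]):
    gain = b0 - b1                  # BPB reduction vs. the previous row
    per_step = gain / (s1 - s0)     # average reduction per added SLOT step
    gains.append((s1, round(gain, 3), round(per_step, 4)))
    print(f"{s0}->{s1} steps: -{gain:.3f} BPB ({per_step:.4f} per added step)")

total = runs[0][1] - runs[-1][1]
print(f"{runs[0][0]}->{runs[-1][0]} steps: -{total:.3f} BPB overall")  # -0.204
```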

## Compliance

- **Frozen-model SLOT**: model weights are never modified during evaluation. Only per-window throwaway delta and logit_bias parameters are optimized, then discarded. Same evaluation pattern as accepted PRs #1176 and #1229.
- No n-gram cache, no eval-time GPTQ
- Self-contained, no network calls
- All seeds within time and size budgets

## Reproduction

```bash
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Training: ~600s. Eval: ~409s. Total: ~17 min.
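The timing line and the "within time and size budgets" claim reduce to simple arithmetic. The sketch below checks them under assumptions inferred from the `10min_16mb` track name and the "under 10-min eval budget" note: separate 600 s caps for training and evaluation, and a decimal 16,000,000-byte artifact cap. The cap constants are assumptions, not published limits quoted from the track rules.

```python
# Figures from this README and the submission JSON.
TRAIN_S, EVAL_S = 600, 409
artifact_bytes = {"1337": 15_815_983, "42": 15_751_595, "2024": 15_793_375}

# Assumed caps (inferred, not quoted from the track rules).
TRAIN_CAP_S = EVAL_CAP_S = 600
SIZE_CAP_BYTES = 16_000_000

total_min = (TRAIN_S + EVAL_S) / 60
print(f"total wall time ~{total_min:.1f} min")        # ~16.8 min
print("train within cap:", TRAIN_S <= TRAIN_CAP_S)    # True
print("eval within cap:", EVAL_S <= EVAL_CAP_S)       # True
for seed, size in artifact_bytes.items():
    headroom = SIZE_CAP_BYTES - size
    print(f"seed {seed}: {size / 1e6:.2f} MB, headroom {headroom:,} bytes")
```

If the cap were instead 16 MiB (16,777,216 bytes), every seed would still fit, with more headroom.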

## Credits

- Base: PR #175, PR #1303, PR #1313 (@anthony-maio)
- SLOT: Hu et al. arXiv:2505.12392v2, PR #1176 (@bigbag), PR #1229 (@resouer)
- QK-Gain 4.0: PR #1125
- XSA: PR #1176 (@bigbag)
- VRL: ResFormer (arXiv:2410.17897)
{
"name": "SLOT48_LR012_Stride96",
"author": "Anthony Maio",
"github_id": "anthony-maio",
"date": "2026-04-03",
"track": "10min_16mb",
"num_gpus": 8,
"gpu_type": "H100 SXM",
"training_time_seconds": 600,
"seed_results": {
"1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6034, "artifact_bytes": 15815983},
"42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6563, "artifact_bytes": 15751595},
"2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6568, "artifact_bytes": 15793375}
},
"mean_val_loss": 1.2504,
"mean_val_bpb": 0.7406,
"std_val_bpb": 0.0051,
"blurb": "SLOT-48 with LR 0.012 (cosine to 0.001), stride=96, per-sample delta + logit bias, scored-position masked. Same architecture as PRs #1303 and #1313."
}