Skip to content

Record: SLOT-48 — val_bpb 0.7406 (3-seed mean)#1321

Open
anthony-maio wants to merge 3 commits intoopenai:mainfrom
anthony-maio:submission/slot48-aggressive
Open

Record: SLOT-48 — val_bpb 0.7406 (3-seed mean)#1321
anthony-maio wants to merge 3 commits intoopenai:mainfrom
anthony-maio:submission/slot48-aggressive

Conversation

@anthony-maio
Copy link
Copy Markdown

Summary

  • val_bpb: 0.7406 (3-seed mean, std 0.0051)
  • Artifact: 15.75-15.82 MB (all seeds < 16MB)
  • Training: 600s on 8xH100 SXM | Eval: ~409s (sliding + SLOT)

3-Seed Results

Seed Sliding BPB + SLOT BPB Artifact
1337 1.126 0.7450 15,815,983
42 1.121 0.7350 15,751,595
2024 1.122 0.7416 15,793,375
Mean 1.123 0.7406

Beats merged SOTA (1.1147) by 0.374 BPB. Beats best pending (#1229, 0.9300) by 0.190 BPB.

What Changed vs PR #1313 (0.8637)

One parameter: SLOT_STEPS increased from 24 to 48. Same model, same training, same architecture.

SLOT Scaling (same model, different step counts)

Steps BPB Delta
16 (PR #1303) 0.946
24 (PR #1313) 0.864 -0.082
48 (this PR) 0.741 -0.123

SLOT-48 Details

  • Per-sample hidden delta [bsz, 1, 512] + logit bias [bsz, 1, 1024]
  • Scored-position masking (last stride=96 tokens per non-first window)
  • 48 AdamW steps, cosine LR 0.012 -> 0.001, weight_decay=1e-8
  • Model weights frozen, delta optimized through detached hidden states
  • Eval: ~409s (under 10-min eval budget)

Compliance

Reproduction

torchrun --standalone --nproc_per_node=8 train_gpt.py

Training: ~600s. Eval: ~409s. Total: ~17 min.

Credits

3-seed: 1337=0.7450, 42=0.7350, 2024=0.7416. All under 16MB.
Same model as openai#1313, only SLOT_STEPS increased 24->48.
Eval time 409s, within 10-min budget.
Copilot AI review requested due to automatic review settings April 4, 2026 04:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new 10min/16mb record entry for SLOT-48 evaluation-time tuning, reporting a 3-seed mean val_bpb of 0.7406 with artifacts under 16MB.

Changes:

  • Introduces a new record folder with the training/eval script (train_gpt.py) configured for SLOT_STEPS=48 by default.
  • Adds per-seed training logs and a submission.json summarizing 3-seed results/metadata.
  • Adds a README documenting results, deltas vs prior SLOT-24, and reproduction instructions.

Reviewed changes

Copilot reviewed 3 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_gpt.py Training + eval script for the SLOT-48 record run (incl. SLOT eval path).
records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_seed42.log Seed 42 training/eval log used as evidence for reported metrics.
records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_seed2024.log Seed 2024 training/eval log used as evidence for reported metrics.
records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_seed1337.log Seed 1337 training/eval log used as evidence for reported metrics.
records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/submission.json Machine-readable result summary for the record submission.
records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/README.md Human-readable summary of results, changes vs prior PRs, and reproduction steps.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +11 to +13
"1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6034, "artifact_bytes": 15815983},
"42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6563, "artifact_bytes": 15751595},
"2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6568, "artifact_bytes": 15793375}
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The steps values in seed_results don’t match the actual stop steps shown in the corresponding train_seed*.log files (e.g., seed 42 stops at step 6576, seed 2024 at 6588, seed 1337 at 6578). Please update the JSON to reflect the logged training steps (or clarify what steps represents if it’s intentionally different).

Suggested change
"1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6034, "artifact_bytes": 15815983},
"42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6563, "artifact_bytes": 15751595},
"2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6568, "artifact_bytes": 15793375}
"1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6578, "artifact_bytes": 15815983},
"42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6576, "artifact_bytes": 15751595},
"2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6588, "artifact_bytes": 15793375}

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +11
| 1337 | 1.126 | **0.7450** | 6034 | 15,815,983 |
| 42 | 1.121 | **0.7350** | 6563 | 15,751,595 |
| 2024 | 1.122 | **0.7416** | 6568 | 15,793,375 |
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README’s “Steps” column doesn’t match the actual training stop steps in the included logs (e.g., seed 42 stops at 6576 in train_seed42.log, seed 2024 at 6588, seed 1337 at 6578). Please update the table so the reported step counts are consistent with the logs.

Suggested change
| 1337 | 1.126 | **0.7450** | 6034 | 15,815,983 |
| 42 | 1.121 | **0.7350** | 6563 | 15,751,595 |
| 2024 | 1.122 | **0.7416** | 6568 | 15,793,375 |
| 1337 | 1.126 | **0.7450** | 6578 | 15,815,983 |
| 42 | 1.121 | **0.7350** | 6576 | 15,751,595 |
| 2024 | 1.122 | **0.7416** | 6588 | 15,793,375 |

Copilot uses AI. Check for mistakes.
Comment on lines +948 to +952
num_layers_total = max(
(int(k.split(".")[1]) for k in state_dict if k.startswith("blocks.")),
default=0,
) + 1

Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_layers_total is computed here but never used, which makes the quantization path harder to read/maintain. Please remove it (or use it if it’s intended for validation/metadata).

Suggested change
num_layers_total = max(
(int(k.split(".")[1]) for k in state_dict if k.startswith("blocks.")),
default=0,
) + 1

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants