Record: SP8192 + Parallel Residuals + Coprime-Stride — val_bpb 1.08459 (3-seed mean) #9

Closed
resouer wants to merge 1 commit into main from submission/sp8192-parallel-coprime

resouer commented Apr 6, 2026

Summary

3-seed mean val_bpb: 1.08459 (std 0.00069) | 15.99 MB | 8xH100 SXM | ~115s eval

Merged SOTA (PR openai#1019, 3-seed mean): 2.88218 nats (1.1147 BPB). This run: 2.80160 nats. Delta: -0.0806 nats. Clears the 0.005-nat threshold.

Results (3-seed)

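| Seed | BPB | val_loss (nats) | Artifact (bytes) |
|------|---------|---------|------------|
| 1337 | 1.08414 | 2.80045 | 15,985,531 |
| 42 | 1.08424 | 2.80070 | 15,989,295 |
| 2025 | 1.08538 | 2.80365 | 15,986,932 |
| Mean | 1.08459 | 2.80160 | |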
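
The summary figures can be re-derived from the per-seed rows. The snippet below is only an arithmetic check, not part of the submission: it recomputes the means, the standard deviation, and the delta against the merged SOTA value quoted above. The variable names are illustrative, and the use of the sample (n - 1) standard deviation is an assumption, chosen because it is the variant that reproduces the reported 0.00069.

```python
from statistics import mean, stdev

# Per-seed numbers copied from the results table.
bpb = {1337: 1.08414, 42: 1.08424, 2025: 1.08538}
val_loss = {1337: 2.80045, 42: 2.80070, 2025: 2.80365}

mean_bpb = mean(bpb.values())
std_bpb = stdev(bpb.values())      # sample std (n - 1 denominator), an assumption
mean_loss = mean(val_loss.values())

merged_sota_nats = 2.88218         # PR openai#1019, as quoted in the summary
threshold_nats = 0.005             # record threshold quoted in the summary
delta = mean_loss - merged_sota_nats

print(f"mean BPB   {mean_bpb:.5f} (std {std_bpb:.5f})")
print(f"mean loss  {mean_loss:.5f} nats, delta {delta:.4f} nats")
print("clears threshold:", -delta > threshold_nats)
```

With three seeds, the sample and population standard deviations differ noticeably (roughly 0.00069 versus 0.00056), so it helps to state which one is reported.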

Changes from Base (PR openai#1394)

1. Parallel Residuals (from layer 7)

Layers 7-10 execute attention and MLP in parallel (PaLM-style), adding zero parameters. Nearest PR: openai#1334 (parallel residuals on SP4096). Difference: here they are applied to the SP8192 + depth-recurrence stack.
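
For readers unfamiliar with the PaLM-style formulation, the difference from a standard block can be sketched as follows. This is a framework-free illustration, not the code in train_gpt.py: `sequential_block`, `parallel_block`, and the toy `attn`/`mlp`/`norm` stand-ins are hypothetical names, and sharing a single normalization between the two branches is an assumption the PR does not confirm.

```python
from typing import Callable, List

Vector = List[float]
Fn = Callable[[Vector], Vector]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def sequential_block(x: Vector, norm1: Fn, attn: Fn, norm2: Fn, mlp: Fn) -> Vector:
    # Standard pre-norm block: the MLP reads the output of the attention step.
    h = add(x, attn(norm1(x)))
    return add(h, mlp(norm2(h)))


def parallel_block(x: Vector, norm: Fn, attn: Fn, mlp: Fn) -> Vector:
    # Parallel residual: both branches read the same normalized input and
    # their outputs are summed into the residual stream in one step.
    n = norm(x)
    return add(x, attn(n), mlp(n))


# Toy stand-ins so the sketch runs without a deep-learning framework.
identity: Fn = lambda v: list(v)
toy_attn: Fn = lambda v: [0.5 * a for a in v]
toy_mlp: Fn = lambda v: [a * a for a in v]

x = [1.0, 2.0]
seq_out = sequential_block(x, identity, toy_attn, identity, toy_mlp)
par_out = parallel_block(x, identity, toy_attn, toy_mlp)
print(seq_out, par_out)
```

Both forms use the same attention and MLP weights, which is consistent with the "zero additional parameters" claim; only the data dependency between the two sub-layers changes.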

2. Coprime-Stride Data Loader

Coprime-stride shard traversal for better data diversity. Not present in any SP8192 submission.
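
The PR does not describe the loader's implementation, so the following is only a generic sketch of the coprime-stride idea. Stepping through `n` shards with a stride that shares no common factor with `n` visits every shard exactly once, in a scattered rather than sequential order. The function names, the seeding scheme, and the choice of a random starting offset are all assumptions for illustration.

```python
import math
import random
from typing import List


def stride_order(num_shards: int, stride: int, start: int = 0) -> List[int]:
    """Visit order for shards using a fixed stride modulo num_shards."""
    if math.gcd(stride, num_shards) != 1:
        raise ValueError("stride must be coprime with num_shards to cover every shard")
    return [(start + i * stride) % num_shards for i in range(num_shards)]


def random_coprime_order(num_shards: int, seed: int) -> List[int]:
    """Pick a random coprime stride and start offset (hypothetical policy)."""
    if num_shards <= 1:
        return list(range(num_shards))
    rng = random.Random(seed)
    candidates = [s for s in range(1, num_shards) if math.gcd(s, num_shards) == 1]
    return stride_order(num_shards, rng.choice(candidates), rng.randrange(num_shards))


order = stride_order(10, 3)
print(order)  # [0, 3, 6, 9, 2, 5, 8, 1, 4, 7]
```

Because the stride is coprime with the shard count, the traversal is a full permutation: no shard is skipped or repeated within one pass, while neighbouring steps draw from shards that are far apart.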

Compliance

  • All training-side architecture changes. No eval-time adaptation.
  • No SLOT, no TTT, no n-gram caches.
  • Eval under torch.inference_mode(). Weights frozen. GPTQ calibration on train data only.

Reproduction

pip install brotli
pip install flash_attn_3 --no-deps --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/
torchrun --standalone --nproc_per_node=8 train_gpt.py

Credits

Base: PR openai#1394 (@clarkkev). Parallel residuals: PR openai#1334 (@aryanbhosale).

Risk

Codex independent review: compliance CLEAN, novelty CHALLENGED ("inherited ancestry plus tuning"). Submitting for internal review.

…9 (3-seed mean)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

resouer commented Apr 6, 2026

Superseded by R12 result (1.08235 with TTT). New PR incoming.

resouer closed this Apr 6, 2026