Record: SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R + QK-Gain 5.0 — val_bpb 1.0897 (3-seed mean) #1296
Open
aryanbhosale wants to merge 4 commits into openai:main
…0940 (3-seed mean) 4096-vocab + MLP 4x + WD 0.090 + depth recurrence (layers 4,5) + MuonEq-R + full GPTQ int6 + brotli + selective pruning. 3-seed mean: 1.0940 BPB, beating merged SOTA (PR openai#1019, 1.1147 BPB) by 0.0208 BPB.
…d mean) LZMA self-extracting code wrapper (24KB vs 81KB) frees 57KB for model precision. No pruning needed. 3-seed mean improves from 1.0940 to 1.0926.
Added parallel residuals from layer 7+ (separate attn/MLP lanes). 3-seed mean improves from 1.0926 to 1.0904.
QK-Gain from 4.0 to 5.0 plus parallel residuals and depth recurrence. 3-seed mean: 1.0897 BPB (std 0.0003), delta -0.0250 vs merged SOTA.
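The LZMA self-extracting wrapper mentioned in the commits above can be sketched as follows. The PR's actual wrapper is not shown on this page, so this is only a minimal illustration using the Python standard library: the function name `make_self_extracting` and the demo source are hypothetical, and the compression settings are an assumption. The idea is that shipping the training script as a compressed blob plus a tiny decoding stub shrinks the code's share of the artifact (24KB vs 81KB in the commit message), which leaves more of the size budget for model weights.

```python
import base64
import lzma


def make_self_extracting(source: str) -> str:
    """Wrap Python source in a stub that decompresses and executes it.

    Hypothetical helper: the real wrapper in the PR may differ in format.
    """
    packed = lzma.compress(source.encode("utf-8"), preset=9 | lzma.PRESET_EXTREME)
    # Base85 keeps the blob as printable ASCII with about 25% overhead.
    blob = base64.b85encode(packed).decode("ascii")
    return (
        "import base64,lzma\n"
        f"exec(lzma.decompress(base64.b85decode({blob!r})).decode('utf-8'))\n"
    )


# Illustrative payload: a repetitive script standing in for the training code.
demo_source = "\n".join(f"x_{i} = {i} * 2" for i in range(500)) + "\nRESULT = x_499\n"
wrapper = make_self_extracting(demo_source)

namespace = {}
exec(wrapper, namespace)  # running the wrapper behaves like running the original
print(len(demo_source), len(wrapper), namespace["RESULT"])
```

The saving depends entirely on how compressible the source is, so the sizes printed here say nothing about the PR's actual numbers.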
resouer pushed a commit to resouer/parameter-golf that referenced this pull request on Apr 3, 2026
Port depth recurrence from PR openai#1290 and parallel residuals from PR openai#1296.
- Depth recurrence: layers 3,4 repeated in forward pass via virtual layer mapping
- Parallel residuals: attn+mlp computed in parallel from layer 6 onward
- Configurable via RECUR_LAYERS, RECUR_START_STEP, PARALLEL_START_LAYER env vars
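The "virtual layer mapping" in this commit message can be illustrated with a short sketch. The environment variable names come from the commit message; everything else is an assumption made for illustration: the helper names, the default values, the 11-layer depth, and the choice to run the recurrent group twice back to back. The actual ordering in the referenced code is not visible here.

```python
import os


def parse_int_list(raw):
    """Parse a comma-separated string such as "3,4" into [3, 4]."""
    return [int(tok) for tok in raw.split(",") if tok.strip()]


def virtual_layer_map(num_layers, recur_layers, recur_active=True):
    """Return the physical layer index visited at each virtual depth.

    Assumption: the recurrent layers form one contiguous group that is run
    a second time immediately after its first pass.
    """
    if not recur_active or not recur_layers:
        return list(range(num_layers))
    first, last = min(recur_layers), max(recur_layers)
    order = list(range(0, last + 1))            # layers 0..last, first pass
    order += list(range(first, last + 1))       # second pass over the group
    order += list(range(last + 1, num_layers))  # remaining layers
    return order


def load_config(env=os.environ):
    """Read the three env vars named in the commit; defaults are illustrative."""
    return {
        "recur_layers": parse_int_list(env.get("RECUR_LAYERS", "3,4")),
        "recur_start_step": int(env.get("RECUR_START_STEP", "0")),
        "parallel_start_layer": int(env.get("PARALLEL_START_LAYER", "6")),
    }


def schedule_for_step(step, cfg, num_layers=11):
    """Recurrence only switches on once training reaches recur_start_step."""
    active = step >= cfg["recur_start_step"]
    return virtual_layer_map(num_layers, cfg["recur_layers"], active)


cfg = load_config({"RECUR_LAYERS": "3,4", "RECUR_START_STEP": "100"})
print(schedule_for_step(0, cfg))
print(schedule_for_step(100, cfg))
```

Because the repeated layers share weights, the virtual depth grows by two here while the parameter count stays the same, which is the appeal under a fixed artifact size limit.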
resouer pushed a commit to resouer/parameter-golf that referenced this pull request on Apr 3, 2026
Ports parallel residuals from PR openai#1296 to openai#1290 base:
- Block.__init__ accepts parallel flag
- Block.forward() computes attn+mlp in parallel when parallel=True
- GPT.__init__ passes parallel_start_layer to Block constructors
- Layers 7-10 run parallel, layers 0-6 sequential (default PARALLEL_START_LAYER=7)
- Both base_model and eval_model wired up
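The difference between the sequential and parallel paths described in this commit can be sketched without any tensor library. The class below reuses the name `Block` and the `parallel` flag from the commit message, but it is a toy: the attention and MLP sublayers are stand-in callables on plain floats, normalization is omitted, and `build_blocks` is a hypothetical helper standing in for what `GPT.__init__` is said to do.

```python
class Block:
    """Toy residual block; attn and mlp are arbitrary callables on a float."""

    def __init__(self, attn, mlp, parallel=False):
        self.attn = attn
        self.mlp = mlp
        self.parallel = parallel

    def forward(self, x):
        if self.parallel:
            # Parallel residual: both sublayers read the same input x.
            return x + self.attn(x) + self.mlp(x)
        # Sequential residual: the MLP sees the output of the attention step.
        x = x + self.attn(x)
        return x + self.mlp(x)


def build_blocks(num_layers, parallel_start_layer, attn, mlp):
    """Hypothetical stand-in for GPT.__init__: layers at or past the
    threshold get parallel=True, earlier layers stay sequential."""
    return [
        Block(attn, mlp, parallel=(i >= parallel_start_layer))
        for i in range(num_layers)
    ]


def toy_attn(x):
    return 2.0 * x


def toy_mlp(x):
    return x * x


blocks = build_blocks(11, 7, toy_attn, toy_mlp)
print([b.parallel for b in blocks])
print(Block(toy_attn, toy_mlp, parallel=False).forward(1.0))  # 1 + 2 = 3, then 3 + 9
print(Block(toy_attn, toy_mlp, parallel=True).forward(1.0))   # 1 + 2 + 1
```

With the default threshold of 7 and 11 layers, indices 0-6 take the sequential path and 7-10 the parallel one, matching the split listed in the commit.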
resouer pushed a commit to resouer/parameter-golf that referenced this pull request on Apr 4, 2026
- QK_GAIN_INIT: 1.5 -> 5.0 (matches openai#1296 proven config)
- WARMDOWN_ITERS: already 4000 (matches openai#1290 run command)
- MULTIRES_ENABLED: 1 -> 0 (multi-res failed: only 1.13x speedup)
- BIGRAM: revert to 2048x128 (3072x112 exceeded 16MB artifact limit)
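QK-Gain is not defined anywhere on this page. One common reading, assumed here, is a scalar gain applied to RMS-normalized queries before the dot product with normalized keys, so that the gain directly controls how sharp the attention distribution can become. The sketch below follows that assumption with plain Python lists; the function names and example vectors are illustrative, and the real model may apply the gain per head or at a different point.

```python
import math


def rms_norm(v, eps=1e-6):
    """Scale a vector to unit root-mean-square."""
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / rms for x in v]


def attention_logits(q, keys, gain):
    """Logits for one query against several keys, with an assumed QK gain."""
    qn = [gain * x for x in rms_norm(q)]  # gain multiplies the normalized query
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(a * b for a, b in zip(qn, rms_norm(k))) for k in keys]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up vectors, only to compare the two gain values from the commit.
q = [1.0, 0.5, -0.5, 2.0]
keys = [[1.0, 0.0, -1.0, 2.0], [0.5, 0.5, 0.5, 0.5], [-1.0, 1.0, 0.0, -2.0]]

weights_low = softmax(attention_logits(q, keys, gain=1.5))
weights_high = softmax(attention_logits(q, keys, gain=5.0))
print([round(w, 3) for w in weights_low])
print([round(w, 3) for w in weights_high])
```

Under this reading the logits scale linearly with the gain, so moving the initial value from 1.5 to 5.0 starts training with much more peaked attention weights.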
resouer pushed a commit to resouer/parameter-golf that referenced this pull request on Apr 5, 2026
Record: SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R + QK-Gain 5.0
val_bpb = 1.0897 (3-seed mean, std 0.0003) | ~15.99 MB | 8×H100 SXM
3-Seed Results
Merged SOTA (PR #1019): 1.1147 BPB. Delta: −0.0250 BPB.
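The reported delta can be checked from the rounded means quoted in the commit messages. The stage labels below are shorthand added for this sketch, not names used in the PR. Note that the first commit message reports a 0.0208 improvement where the rounded means give 0.0207, presumably because it was computed from unrounded values.

```python
MERGED_SOTA_BPB = 1.1147  # PR #1019, as stated above

# 3-seed means quoted in the four commit messages; labels are shorthand.
progression = [
    ("base stack", 1.0940),
    ("LZMA wrapper", 1.0926),
    ("parallel residuals", 1.0904),
    ("QK-Gain 5.0", 1.0897),
]


def deltas_vs(baseline, runs):
    """Difference of each mean from the baseline (negative is better)."""
    return [(label, round(bpb - baseline, 4)) for label, bpb in runs]


for label, delta in deltas_vs(MERGED_SOTA_BPB, progression):
    print(f"{label:20s} {delta:+.4f}")
```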
Key Techniques
Compliance
Reproduction
Credits