Record: Coprime-Stride Loader + Full GPTQ + XSA-all — val_bpb 1.1133 (3-seed mean)#1099
Closed
Bortlesboat wants to merge 2 commits into openai:main from
Conversation
…(3-seed mean). 3-seed results: 1.1136/1.1133/1.1139 (mean 1.1136, std 0.0003). Built on PR openai#549 + PR openai#1060 with an optimized GPTQ reserve (10s vs 14s).
Improved from 1.1136 to 1.1133 by reducing the GPTQ reserve from 10s to 9s. Seeds: 1.1133/1.1132/1.1133 (mean 1.1133, std 0.0001). All artifacts under 16MB.
theLightArchitect added a commit to theLightArchitect/parameter-golf that referenced this pull request on Mar 30, 2026
Single innovation: coprime-stride shard traversal. Instead of reading shards 0, 1, 2, ..., 79 in order, the loader reads 0, 7, 14, ..., 77, 4, 11, ..., where the stride 7 is coprime to the shard count 80. This prevents repeated token sequences across epochs. PR openai#1099 gets 1.1136 with this (vs the 1.1217 baseline). 12 lines added. Zero HP changes. Zero architecture changes. Same quantization path. Artifact unchanged.

Co-Authored-By: Kevin Tan <kft@lightarchitects.io>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
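The traversal described above can be sketched in a few lines (a minimal illustration of the idea, not the PR's actual code; `coprime_shard_order` is a hypothetical helper name):

```python
from math import gcd

def coprime_shard_order(num_shards: int, stride: int) -> list[int]:
    """Visit every shard exactly once using a stride coprime to the shard count."""
    if gcd(stride, num_shards) != 1:
        raise ValueError("stride must be coprime to num_shards to cover all shards")
    return [(i * stride) % num_shards for i in range(num_shards)]

order = coprime_shard_order(80, 7)
print(order[:6])                          # [0, 7, 14, 21, 28, 35]
assert sorted(order) == list(range(80))   # every shard visited exactly once
```

Because gcd(7, 80) = 1, the stride walk is a permutation of the 80 shards, so no shard repeats within an epoch but adjacent training steps see distant shards.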
Author
Superseded by #1169 (better score). Closing.
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request on Apr 7, 2026
…IP, I overrode to PASS

Subagent found arxiv:2505.15134 (Entropy Minimization at Inference, NeurIPS 2025) and recommended ship. I reversed to PASS after working out the math: EM-INF is equivalent to temperature sharpening, and cross-entropy for a calibrated MLE model is minimized at T=1 by definition. Moving T away from 1 in either direction strictly increases in-distribution NLL. Same class of trap as Patch 14 (entropy-adaptive, already falsified). No push.

Better directions logged for next fire: PR openai#1437 N-gram Tilt (multiplicative, not sharpening), BPE-8192 tables, Coprime-Stride from merged record openai#1099.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
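The T=1 argument above is easy to check numerically: if the data distribution equals the model's T=1 softmax (the calibrated case), cross-entropy against any temperature-scaled version of the same logits is minimized exactly at T=1 (Gibbs' inequality). A small self-contained demo with made-up logits:

```python
import math

def softmax(logits, T=1.0):
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Calibrated case: the "data" distribution IS the model's T=1 distribution.
logits = [2.0, 0.5, -1.0, 0.1]
p_data = softmax(logits, T=1.0)

for T in (0.5, 0.8, 1.0, 1.25, 2.0):
    print(f"T={T}: NLL={cross_entropy(p_data, softmax(logits, T)):.4f}")
# NLL is smallest at T=1; sharpening (T<1) and flattening (T>1) both increase it.
```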
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request on Apr 7, 2026
EL2 cycle-2 = 3.2742 (only +0.0008 above champion 3.2734) reversed the audit fire openai#1 verdict that EngramLite was falsified. Adding 4 new EL multi-seed experiments to confirm:
- EL3 (seed 1337), EL4 (seed 999), EL5 (seed 7)
- EL6 with L5 weights (0.15/0.20/0.15), a new combination

Removed 15 dead/falsified configs that wasted cycle-2 compute: EA*, BG*, NG*, TH*, MEGA, MTP0/2/3, MTP1_seed999, PR2/3, EL0.

Also captured the EMA(0.997) canonical spec from 6 merged records (openai#287, openai#315, openai#414, openai#1019, openai#1099), but DEFERRED the actual Patch 17 ship because EMA only affects final val_bpb (not loop train_loss) and training-loop anchoring is risky without reading train_gpt.py.

The queue now cycles in ~100 min (vs 185 min), leaving more compute for the EL family expansion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request on Apr 7, 2026
…PEC captured

Subagent extracted the percentile-based int6 quantization pattern from PR openai#1099, openai#1019, openai#1444 (3+ merged records). No Hessian needed, ~130 LOC, lzma-22 instead of zlib for ~0.5MB of size headroom. The direct BPB gain is only -0.0003 (within noise); the real value is the freed size budget, which could fund extra model capacity.

DEFERRED the actual Patch 23 ship: same metric problem as Tilt + EMA (loop train_loss is unaffected by serialization), plus serialization code is the highest-risk path to break before submission. The captured spec is drop-in ready for the next H100 escalation cycle.

Three specs now queued for combined H100 escalation:
- USE_NGRAM_TILT_EVAL (task openai#53)
- USE_EMA (task openai#45)
- USE_INT6_GPTQ (new)

Combined estimated gain: +0.003 to +0.008 BPB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
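The core of percentile-based quantization is simple enough to sketch. This is a generic illustration under assumptions (symmetric int6 with range [-31, 31], clip at the 99.9th percentile of |w|); the exact scheme in the referenced PRs is not shown in this thread:

```python
import numpy as np

def quantize_int6_percentile(w: np.ndarray, pct: float = 99.9):
    """Clip weights at a symmetric percentile, then quantize to 6-bit integers.

    Percentile clipping ignores rare outliers, so the scale stays tight and
    the bulk of the weights get finer resolution. No Hessian required.
    """
    clip = np.percentile(np.abs(w), pct)      # robust range estimate
    scale = clip / 31.0                       # symmetric int6 range: [-31, 31]
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
q, scale = quantize_int6_percentile(w)
err = np.abs(dequantize(q, scale) - w).mean()
```

The int6 values still occupy int8 storage here; the size win in the PRs comes from packing plus the stronger lzma entropy coding of the narrower value range.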
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request on Apr 7, 2026
…eferred (upstream stateless)

Two-subagent investigation of the coprime-stride loader from PR openai#1099/openai#1060. The first subagent confirmed 26 PRs use it, the top merged record uses it, and the estimated gain is ~0.01 BPB. The second subagent extracted the exact upstream DistributedTokenLoader code: it is COMPLETELY STATELESS (~10 lines that just slice TokenStream). PR openai#1099's implementation is NOT a small patch; it is a fundamental rewrite that adds stateful per-shard cursor management. The real implementation is 60-100 LOC and needs to interact with the TokenStream class, which I haven't read yet.

DEFERRED because the data loader is on the critical path: a buggy patch could silently corrupt training data. Better to validate the existing MS3/EL/MR cycle 2+3 results first. Spec captured for the next focused research fire.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request on Apr 7, 2026
…prime stride sampling

Inspired by PR openai#1099/openai#1060/openai#1135, which use TOKEN-level coprime stride. Token-level needs a 60+ LOC rewrite of TokenStream (no random access). Shipping the SHARD-LEVEL variant instead: modify _advance_file() to use a coprime stride instead of +1, so nearby training steps see topically different shards rather than adjacent similar ones.

Implementation: 13 LOC, two anchors in the TokenStream class (none of the existing 24 patches touch TokenStream; verified via grep). Gated by USE_COPRIME_STRIDE=1, falling back to the stride=1 default. Idempotent via COPRIME_STRIDE_MARKER. Effect: with N shards and gcd(s, N) = 1, iteration goes 0 -> s -> 2s -> ..., covering all shards before repeating. Maximum spacing diversity means better gradient noise reduction. The benefit is smaller than the full token-level variant (~25% of it, per PR openai#1099's logic), but it ships TODAY at near-zero risk vs. a 60+ LOC structural rewrite.

4 CS experiments queued: CS0_alone, CS1_seed42, CS2_L4weights, CS3_with_engram. This is the FIRST data-side patch in our 24-patch stack; it tests a completely new vector after the "neutrality plateau" of architectural/optimizer/training-time patches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
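The shard-level variant can be sketched as follows. The real TokenStream internals are not shown in this thread, so the class below is a stand-in; only the `_advance_file` name, the `USE_COPRIME_STRIDE=1` gate, and the stride-vs-+1 logic come from the description above:

```python
import os
from math import gcd

class ShardCursor:
    """Stand-in for the shard-advance logic inside a TokenStream-like loader."""

    def __init__(self, num_shards: int, stride: int = 7):
        gated = os.environ.get("USE_COPRIME_STRIDE") == "1"
        if gated and gcd(stride, num_shards) == 1:
            self.stride = stride   # coprime stride: all shards visited before any repeat
        else:
            self.stride = 1        # default sequential traversal (safe fallback)
        self.num_shards = num_shards
        self.current = 0

    def _advance_file(self) -> int:
        self.current = (self.current + self.stride) % self.num_shards
        return self.current

os.environ["USE_COPRIME_STRIDE"] = "1"
cur = ShardCursor(80)
first = [cur._advance_file() for _ in range(3)]   # 7, 14, 21 instead of 1, 2, 3
```

Keeping stride=1 as the ungated default is what makes the patch near-zero risk: with the flag off, behavior is byte-identical to the upstream +1 advance.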
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request on Apr 7, 2026
…ntified as top missing technique

Patches 15/16/21 + NEW Patch 20 USE_COPRIME_STRIDE are all uncontested across 150+ open + 20 closed PRs (7 consecutive audits for the original 3; first confirmation for Patch 20 just shipped 3h ago).

CRITICAL FINDING: XSA (Cross-Sequence Attention) is in 4+ MERGED records (PR openai#1019, openai#287, openai#315, openai#265, latest openai#1099) and we have ZERO attention-mask variants. Most-validated missing technique. ~200 LOC moderate port: too big for a single research fire, but worth a focused 30-45 min investigation if we can find a minimal variant. SLOT (Score-First TTT) is the openai#2 missing technique (PR openai#549, ~100 LOC), but it's eval-time and joins the H100 escalation bundle category.

H100 escalation candidate updated:
NEW: CHAMP_L4 + COPRIME_STRIDE + EL + (EMA + Tilt + INT6 GPTQ)
OLD: CHAMP_L4 + EL + (EMA + Tilt + INT6 GPTQ)

Need CS2 cycle 2+3 for n=3 mean confirmation before escalating. PR openai#1430 still OPEN, 0 comments, no comp-owner activity for 16h+. Spend ~$4.00/$36 (11.1%). Pod healthy at 7h 50min uptime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request on Apr 7, 2026
…port from 100+ PRs

From arxiv:2603.09078 + PR openai#1099 (latest merged) + 4+ other merged records. ~12 LOC inline insert in CausalSelfAttention.forward after the GATED_ATTENTION block. 0 new params. Removes the self-value projection from the attention output. 4 XSA experiments queued: alone, seed42, +coprime, full stack.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
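The exact XSA formulation is not shown in this thread; one plausible reading of "removes the self-value projection from the attention output" is masking each position's attention to itself, so every output is a mix of strictly earlier values only. A small single-head sketch under that assumption (not the PR's code):

```python
import numpy as np

def attention_no_self_value(q, k, v):
    """Causal attention where each position's own value is excluded.

    Uses a strictly lower-triangular mask, so position t mixes only values
    from positions < t. q, k, v: (seq_len, head_dim).
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    out = np.zeros_like(v)
    # Row 0 has no valid targets under the strict mask, so its output stays zero.
    for t in range(1, T):
        row = scores[t, :t]                # keys strictly before position t
        w = np.exp(row - row.max())        # numerically stable softmax
        w /= w.sum()
        out[t] = w @ v[:t]
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention_no_self_value(q, k, v)
```

Note the zero-parameter property matches the description: this is purely a mask change inside the attention computation, with no new projections.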
See README.md for the full summary, 3-seed results, what's new, the stack, and compliance details.