[Non Record] Fractal recurrent primitive hybrid - SP1024 1xH100 #1569
Closed
abbudjoe wants to merge 6 commits into openai:main
Updated README to clarify seed confirmation details.
Summary
Non-record research submission for a custom Fractal recurrent primitive hybrid in the Parameter Golf stack.
This is intentionally not a leaderboard claim. It packages a controlled 1xH100 ablation where a single middle transformer block was replaced with a recurrent primitive (AAAAAPAAAAA) while keeping the SP1024 tokenizer/data path, optimizer, evaluation path, and quantization machinery fixed.
The headline comparison is a single matching seed (seed=42), not a 3-seed mean. We do not yet have a 3-seed confirmation for this exact configuration.
Attribution
This work was guided by the public leaderboard meta, especially the current #1 record. The recurrent primitive is the new variable here; the surrounding stack borrows several proven ingredients from prior public work.
Headline 10-Minute Result (single matching seed)
AAAAAPAAAAA
Extended 60-Minute Probe
We also ran a non-competition-timed trajectory probe with schedule AAAPAAAAPAA, warmdown 4000, EMA 0.9965, mixed int6 clipsearch + zstd, and compile/prewarm held outside the training timer.
The hour run is not leaderboard-eligible: it is outside the 10-minute rule and slightly over 16MB. It is included as research signal because it shows the recurrent primitive continued improving substantially when given more wall-clock time.
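The two schedule strings in this PR (AAAAAPAAAAA and AAAPAAAAPAA) can be read as one character per layer. The sketch below is illustrative only: it assumes `A` marks a standard attention block and `P` marks the recurrent primitive, which is inferred from the description rather than taken from the PR's code, and `parse_schedule` and `primitive_positions` are hypothetical helper names.

```python
# Assumption: "A" = attention block, "P" = recurrent primitive block.
# These helpers are hypothetical and not taken from the PR's code.
LAYER_KINDS = {"A": "attention", "P": "primitive"}


def parse_schedule(schedule: str) -> list[str]:
    """Map each schedule character to a layer kind, rejecting unknown letters."""
    unknown = sorted(set(schedule) - set(LAYER_KINDS))
    if unknown:
        raise ValueError(f"unknown schedule characters: {unknown}")
    return [LAYER_KINDS[ch] for ch in schedule]


def primitive_positions(schedule: str) -> list[int]:
    """Return the zero-based layer indices occupied by the recurrent primitive."""
    return [i for i, ch in enumerate(schedule) if ch == "P"]


for name in ("AAAAAPAAAAA", "AAAPAAAAPAA"):
    layers = parse_schedule(name)
    print(name, "layers:", len(layers), "primitive at:", primitive_positions(name))
```

Under this reading, both schedules are 11 layers deep: the 10-minute configuration puts the primitive in the exact middle (index 5), while the 60-minute probe uses two primitive layers (indices 3 and 8).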
TL;DR Learning
The recurrent primitive is close enough to be useful research signal, but not close enough to beat pure attention as a direct block replacement. The most actionable result is that its quantization damage can be mostly removed under all-large-int8: post-minus-pre BPB improves from about +0.0198 to +0.0014 while staying under 16MB in the 10-minute checkpoint requantization.
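To put the quantization numbers above in relative terms, here is a small arithmetic sketch. It uses only the two approximate post-minus-pre BPB gaps quoted in this section; the function name `quant_gap_reduction` is hypothetical.

```python
def quant_gap_reduction(gap_before: float, gap_after: float) -> float:
    """Fraction of the post-minus-pre quantization BPB gap that was removed."""
    if gap_before <= 0:
        raise ValueError("gap_before must be positive")
    return (gap_before - gap_after) / gap_before


# Approximate gaps quoted in the text (BPB after quantization minus BPB before).
gap_default = 0.0198     # original quantization path
gap_all_int8 = 0.0014    # all-large-int8 requantization

absolute = gap_default - gap_all_int8
relative = quant_gap_reduction(gap_default, gap_all_int8)
print(f"absolute reduction: {absolute:+.4f} BPB")
print(f"relative reduction: {relative:.1%}")
```

With the quoted figures this gives an absolute reduction of about 0.0184 BPB, or roughly 93% of the original quantization gap.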
What This PR Claims
What This PR Does Not Claim
This is offered as a transparent negative/control result so others can avoid repeating the naive replacement experiment and focus on the more plausible recurrent side-channel/context path.