
[Non Record] Fractal recurrent primitive hybrid - SP1024 1xH100 #1569

Closed

abbudjoe wants to merge 6 commits into openai:main from abbudjoe:codex/p20-hybrid-nonrecord

Conversation


@abbudjoe abbudjoe commented Apr 12, 2026

Summary

Non-record research submission for a custom Fractal recurrent primitive hybrid in the Parameter Golf stack.

This is intentionally not a leaderboard claim. It packages a controlled 1xH100 ablation where a single middle transformer block was replaced with a recurrent primitive (AAAAAPAAAAA) while keeping the SP1024 tokenizer/data path, optimizer, evaluation path, and quantization machinery fixed.
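The block-schedule string can be read as a per-layer layout. As a minimal sketch (assuming `A` denotes a standard attention block and `P` the recurrent primitive, which the schedule notation suggests but the PR does not spell out):

```python
# Hypothetical sketch: expand a block-schedule string such as "AAAAAPAAAAA"
# into a per-layer layout. The letter meanings ('A' = attention block,
# 'P' = recurrent primitive) are assumptions inferred from the PR text.
def expand_schedule(schedule: str) -> list[str]:
    kinds = {"A": "attention", "P": "recurrent_primitive"}
    return [kinds[c] for c in schedule]

layers = expand_schedule("AAAAAPAAAAA")
# A single middle block (index 5 of 11) is the recurrent primitive;
# the other ten layers stay as attention blocks.
```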

The headline comparison is a single matching seed (seed=42), not a 3-seed mean. We do not yet have a 3-seed confirmation for this exact configuration.

Attribution

This work was guided by the public leaderboard meta, especially the current #1 record. The recurrent primitive is the new variable here; the surrounding stack borrows several proven ingredients from prior public work.

| Used ingredient | Credit | How it appears here |
| --- | --- | --- |
| Mixed int6/int8 quantization pressure and protected higher-precision export variants | PR #1394 @clarkkev | The source runs use mixed int6 clipsearch + zstd; the best 10-minute export is an all-large-int8/zstd protection sweep. |
| Learnable per-head QK gain machinery | #1 record stack | The transformer attention path includes learnable per-head query scaling. |
| EMA 0.9965 and warmdown-style schedules | PR #1445 @X-Abhishek-X | Both the 10-minute and 60-minute recurrent runs use EMA decay 0.9965; the 60-minute probe also uses a longer warmdown schedule. |
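The EMA ingredient credited above can be sketched as a standard exponential moving average over the weights. This is an assumption about the update form; the decay value 0.9965 is the one stated in the table:

```python
# Hedged sketch of EMA weight averaging with decay 0.9965 (the value
# credited to PR #1445). The standard EMA update shown here is an
# assumption about how that stack applies it, not confirmed by the PR.
def ema_update(ema_w: list[float], w: list[float], decay: float = 0.9965) -> list[float]:
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_w, w)]

# Starting the average at zero and holding the weights fixed, the EMA
# converges toward the weights at rate (1 - decay**n) after n steps.
ema = [0.0, 0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0, 2.0])
```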

Headline 10-Minute Result (single matching seed)

| Model | Quant/export | Pre BPB | Post BPB | Post loss | Bytes |
| --- | --- | --- | --- | --- | --- |
| Pure attention control | all-large-int8 | 1.343710 | 1.344724 | 2.270510 | 14,966,424 |
| Fractal recurrent hybrid AAAAAPAAAAA | all-large-int8 | 1.356221 | 1.357619 | 2.292283 | 14,440,584 |

Extended 60-Minute Probe

We also ran a non-competition-timed trajectory probe with schedule AAAPAAAAPAA, warmdown 4000, EMA 0.9965, mixed int6 clipsearch + zstd, and compile/prewarm held outside the training timer.

| Run | Steps | Train time | Pre BPB | Post BPB | Post loss | Bytes |
| --- | --- | --- | --- | --- | --- | --- |
| Fractal recurrent hybrid 60-minute probe | 5,342 | 3,600.034s | 1.230186 | 1.241819 | 2.701189 | 16,179,345 |

The hour run is not leaderboard-eligible: it is outside the 10-minute rule and slightly over 16MB. It is included as research signal because it shows the recurrent primitive continued improving substantially when given more wall-clock time.
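The "slightly over 16MB" note is consistent with a decimal-megabyte bound; a quick sanity check, assuming the rule uses 16,000,000 bytes (an assumption: the PR does not state the exact byte limit, but 16,179,345 is under 16 MiB, so the binary interpretation would not be "over"):

```python
# Sanity check of the 16MB eligibility bound mentioned above.
# LIMIT_BYTES is an assumption: a decimal 16MB (16,000,000 bytes) limit,
# which is the only reading consistent with 16,179,345 being "over".
LIMIT_BYTES = 16_000_000
probe_bytes = 16_179_345

over_by = probe_bytes - LIMIT_BYTES  # how far past the limit the probe lands
```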

TL;DR Learning

The recurrent primitive is close enough to be useful research signal, but not close enough to beat pure attention as a direct block replacement. The most actionable result is that its quantization damage can be mostly removed under all-large-int8: post-minus-pre BPB improves from about +0.0198 to +0.0014 while staying under 16MB in the 10-minute checkpoint requantization.

What This PR Claims

  • A reusable Fractal recurrent hybrid baseline for future recurrent/context-state experiments.
  • Evidence that direct attention replacement is not the right insertion contract for this surface.
  • Evidence that recurrent/state matrices need quantization protection.
  • A clean next direction: keep the transformer stack and use recurrent state as a side-channel, looped adapter, context-state module, or TTT/memory-efficiency mechanism.

What This PR Does Not Claim

  • It does not claim a record.
  • It does not claim the recurrent primitive currently beats pure attention.
  • It does not claim the recurrent path is exhausted.
  • It does not claim 3-seed significance for the headline result.

This is offered as a transparent negative/control result so others can avoid repeating the naive replacement experiment and focus on the more plausible recurrent side-channel/context path.

@abbudjoe abbudjoe changed the title [Non Record] P20 recurrent primitive hybrid - SP1024 1xH100 [Non Record] Fractal recurrent primitive hybrid - SP1024 1xH100 Apr 12, 2026
@abbudjoe abbudjoe closed this Apr 13, 2026