Non-record: GDN Hybrid (E2E TTT / State-Space Model) — val_bpb 1.14502 by andrewbaggio1 · Pull Request #1479 · openai/parameter-golf

andrewbaggio1 · 2026-04-08T18:29:41Z

Non-record: GDN Hybrid — Gated DeltaNet as E2E TTT / State-Space Model

val_bpb: 1.14502 (seed 1234, 8xH100, 600s)

Summary

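Replaces 8 of 10 attention layers with Gated DeltaNet (Yang et al., ICLR 2025). GDN is mathematically equivalent to E2E TTT-Linear with MSE loss: the gated delta rule update S_t = α·S_{t-1}·(I - β·k·kᵀ) + β·v·kᵀ is one step of SGD with learning rate β on L = 0.5·‖S·k - v‖², taken from the decayed state α·S_{t-1} (for α = 1 it is exactly a plain SGD step on S_{t-1}). The update rule is trained end-to-end.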

Targets bounty items: E2E TTT + State-space models.
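
The equivalence claimed in the summary can be checked numerically. The following is a minimal sketch, not code from this PR: it assumes a single head, a state S of shape (d_v, d_k), scalar gates α and β, and toy dimensions chosen only for illustration. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k, d_v = 4, 3                       # toy key/value sizes (illustrative assumption)
S = rng.normal(size=(d_v, d_k))       # recurrent state: maps a key to a value
k = rng.normal(size=d_k)
k /= np.linalg.norm(k)                # unit-norm key
v = rng.normal(size=d_v)
alpha, beta = 0.9, 0.5                # decay gate and write strength


def gated_delta_update(S, k, v, alpha, beta):
    """Gated delta rule: alpha * S (I - beta k k^T) + beta v k^T."""
    eye = np.eye(k.shape[0])
    return alpha * S @ (eye - beta * np.outer(k, k)) + beta * np.outer(v, k)


def sgd_step(S, k, v, lr):
    """One SGD step on L(S) = 0.5 * ||S k - v||^2, whose gradient is (S k - v) k^T."""
    grad = np.outer(S @ k - v, k)
    return S - lr * grad


# With alpha = 1 the delta rule is a plain SGD step with learning rate beta.
plain_ok = np.allclose(gated_delta_update(S, k, v, 1.0, beta),
                       sgd_step(S, k, v, beta))

# With alpha < 1 it is the same SGD step, started from the decayed state alpha * S.
gated_ok = np.allclose(gated_delta_update(S, k, v, alpha, beta),
                       sgd_step(alpha * S, k, v, beta))

print(plain_ok, gated_ok)
```

Both checks follow from expanding αS - β(αS·k - v)·kᵀ, which regroups into α·S·(I - β·k·kᵀ) + β·v·kᵀ.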

Architecture

8 GDN layers + 2 softmax attention (positions 4, 8)
dim=512, 8 heads, MLP 3x, SP8192, GPTQ int6/int8, SDClip, EMA
FLA v0.4.2 GatedDeltaNet with chunk-parallel Triton kernels
37.4M params, 13.83 MB artifact
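
As a rough illustration of the layer layout listed above, the sketch below builds the per-layer type list. It assumes that "positions 4, 8" are 0-based layer indices, which the description does not state; the names `NUM_LAYERS`, `ATTENTION_POSITIONS`, and `build_layer_types` are hypothetical and do not come from the PR.

```python
NUM_LAYERS = 10
# Assumption: positions are 0-based indices; the source only says "positions 4, 8".
ATTENTION_POSITIONS = {4, 8}


def build_layer_types(num_layers, attention_positions):
    """Return 'attention' at the given positions and 'gdn' everywhere else."""
    return ["attention" if i in attention_positions else "gdn"
            for i in range(num_layers)]


layer_types = build_layer_types(NUM_LAYERS, ATTENTION_POSITIONS)
for index, kind in enumerate(layer_types):
    print(index, kind)
```

Under this assumption the stack has 8 GDN layers and 2 softmax attention layers, matching the counts given above.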

Results

Not competitive with softmax attention at the 10-minute budget: 4.91M tok/s (GDN) vs 6.93M tok/s (attention), yielding 3673 vs 4624 steps. GDN's per-step learning advantage does not compensate for taking roughly 20% fewer training steps at this scale. However, training is stable, GPTQ works cleanly, and PR #1370 showed that 1.003 BPB is achievable with unlimited compute.
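
The ratios behind these figures can be reproduced with a few lines of arithmetic. This sketch uses only the numbers reported above; the variable names are illustrative.

```python
# Figures reported in the results above.
gdn_tok_per_s, attn_tok_per_s = 4.91e6, 6.93e6
gdn_steps, attn_steps = 3673, 4624

throughput_ratio = gdn_tok_per_s / attn_tok_per_s   # about 0.71
step_ratio = gdn_steps / attn_steps                 # about 0.79
step_deficit = 1.0 - step_ratio                     # about 0.21

print(f"throughput ratio: {throughput_ratio:.3f}")
print(f"step ratio:       {step_ratio:.3f}")
print(f"step deficit:     {step_deficit:.1%}")
```

The step deficit (about 21%) is smaller than the throughput gap (about 29%). The description does not say what accounts for the difference.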

Credits

Builds on @clarkkev's #1394, FLA library by @sustcsonglin, and PureGDN work by @Christopher-Lee-McClendon (#1370).

8 Gated DeltaNet layers + 2 softmax attention layers. GDN is mathematically equivalent to E2E TTT-Linear with MSE loss. First competitive GDN hybrid in the 10-min budget. Targets bounty items: E2E TTT + State-space models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

andrewbaggio1 wants to merge 1 commit into openai:main from andrewbaggio1:nonrecord/gdn-hybrid-e2e-ttt
