[Non-Record] Whirlpool v5b — Non-Euclidean Lorentzian Attention on the Hyperboloid Manifold#1239

Open
tmancino wants to merge 1 commit into openai:main from tmancino:whirlpool-v5b-submission

Conversation


@tmancino tmancino commented Apr 1, 2026

Summary

Non-record submission exploring non-Euclidean geometry in attention. To our knowledge, the first submission to replace dot-product attention with Minkowski inner products on a hyperboloid manifold.

  • val_bpb: 1.5918 (SEED=314) | 12.2 MB artifact | 8xH100 SXM
  • Key finding: Lorentzian attention is trainable with proper stabilization (scale clamping plus a warmup over the first 20% of training together improve val BPB by 0.88)
  • Custom Flash Lorentz Attention Triton kernel — fused hyperboloid projection + Minkowski inner + centroid aggregation
  • 3 shared blocks x 8 curvature-progressive orbits (23.9M stored params, 8x depth via weight sharing)
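The stabilization recipe above can be sketched as a schedule on the attention-scale clamp. This is a minimal illustration, not the submission's code: the ceiling value of 10.0 and the linear warmup shape are assumptions; only the "clamp + 20% warmup" combination comes from the PR.

```python
def attn_scale(step, total_steps, max_scale=10.0, warmup_frac=0.2):
    # Warm the clamp ceiling linearly over the first 20% of steps,
    # then hold it at max_scale for the rest of training.
    # max_scale and the linear shape are illustrative assumptions.
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return max_scale * (step + 1) / warmup_steps
    return max_scale
```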

What's Novel

  1. Lorentzian attention: Queries/keys projected onto the hyperboloid, scored via Minkowski inner product instead of dot product
  2. Flash Lorentz Attention kernel: Custom Triton kernel fusing the full non-Euclidean pipeline — registered as custom_op for torch.compile compatibility
  3. Progressive curvature orbits: Each orbit sees a different curvature (0.1→2.0), from nearly-flat local patterns to highly-curved hierarchical dependencies
  4. Parallel GPU eval: After training, each GPU independently runs a different TTT hyperparameter — best reported
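To make item 1 concrete: in the hyperboloid model, a Euclidean vector is lifted to the manifold by solving for its time coordinate, and pairs are scored with the Minkowski inner product instead of the dot product. The sketch below is a plain-Python illustration of that math, not the fused Triton kernel; the clamp bound is an assumption carried over from the stabilization note above.

```python
import math

def lift_to_hyperboloid(v, c=1.0):
    # Lift Euclidean v onto the hyperboloid <x,x>_L = -1/c by solving
    # for the time coordinate x0 = sqrt(1/c + ||v||^2).
    x0 = math.sqrt(1.0 / c + sum(vi * vi for vi in v))
    return [x0] + list(v)

def minkowski_inner(x, y):
    # Minkowski (Lorentzian) inner product: -x0*y0 + <space, space>.
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_score(q, k, c=1.0, max_scale=10.0):
    # Attention logit: Minkowski inner of the lifted q and k, clamped
    # for stability (the clamp bound is illustrative, not from the PR).
    s = minkowski_inner(lift_to_hyperboloid(q, c), lift_to_hyperboloid(k, c))
    return max(-max_scale, min(max_scale, s))
```

A sanity check on the geometry: every lifted point satisfies ⟨x, x⟩_L = -1/c, so the self-score of a token is constant regardless of its Euclidean norm, which is one reason the score scale behaves differently from dot-product attention.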

Architecture

| Component  | Setting |
| ---------- | ------- |
| d_model    | 768 |
| Attention  | Lorentzian (Minkowski inner product on hyperboloid) |
| Heads      | 12 (GQA 6:1, head_dim=64) |
| MLP        | 5x LeakyReLU(0.5)² (fused Triton kernel) |
| Depth      | 8 orbits through 3 shared blocks |
| Curvature  | 0.1 to 2.0 (progressive) |
| Optimizer  | MuonAdamW (lr=0.04, wd=0.12) |
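The Depth and Curvature rows combine as follows: the 3 stored blocks are reused on each of 8 orbits, and each orbit runs at its own curvature. A hedged sketch, assuming a linear curvature ramp (only the 0.1 and 2.0 endpoints are stated in the PR):

```python
N_ORBITS, N_BLOCKS = 8, 3

def orbit_curvatures(c_min=0.1, c_max=2.0, n=N_ORBITS):
    # Interpolate from nearly flat (0.1) to highly curved (2.0);
    # the linear shape is an assumption, only the endpoints are from the PR.
    return [c_min + (c_max - c_min) * i / (n - 1) for i in range(n)]

def run_orbits(x, blocks, curvatures):
    # blocks: the 3 weight-shared blocks, reused on every orbit and
    # called with that orbit's curvature -> 8 * 3 = 24 effective layers
    # from 3 blocks' worth of stored parameters.
    for c in curvatures:
        for block in blocks:
            x = block(x, c)
    return x
```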

Files

  • submission.json — leaderboard metadata
  • README.md — full architecture writeup, results, limitations, development journey
  • train_gpt.py — single-file submission (73KB)
  • train_seed314.log — full training + eval log

