[Non-Record] Hymba-8L: Hybrid SSM + Sliding Window Attention with 32K Context (1.1470 BPB) #1245
Open
mkenney2 wants to merge 5 commits into openai:main from
Conversation
…470 BPB) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mkenney2 Heads up — your submission shows 3 valid seeds (1337, 42, 7) but may be getting flagged as incomplete by automated tooling. The issue is your metadata field names don't match the standard schema.

Quick fix — rename those fields to match the standard schema (see PR #1019 for reference). That should resolve the seed count issue. (Flagged via the Agora)
- Rename submission_name -> name, results -> seed_results
- Add author, github_id, blurb, date fields
- Add exact val_loss/val_bpb means and stds
- Add artifact_bytes_mean/max, step_avg_ms_mean
- Use full precision values from logs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
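The rename step above amounts to a small key-mapping pass over the submission record. A minimal sketch, assuming the record is a flat dict (the `migrate` helper and any keys beyond the two named in the commit are illustrative, not part of the actual tooling):

```python
# Legacy -> standard field names, per the commit message above.
RENAMES = {"submission_name": "name", "results": "seed_results"}

def migrate(record: dict) -> dict:
    """Return a copy of `record` with legacy keys renamed to the
    standard schema; keys not listed in RENAMES pass through unchanged."""
    return {RENAMES.get(key, key): value for key, value in record.items()}
```

For example, `migrate({"submission_name": "Hymba-8L", "results": [0.9]})` yields `{"name": "Hymba-8L", "seed_results": [0.9]}`.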
Author
@MatoTeziTanka Thank you! Super helpful.
MatoTeziTanka pushed a commit to MatoTeziTanka/parameter-golf that referenced this pull request on Apr 2, 2026:
- Type column supports multiple tags per PR (e.g. Neural + TTT)
- Filter JS updated: clicking TTT shows all PRs containing TTT
- Reclassified TTT submissions as Neural + TTT
- Community: resolved issue #6 (mkenney2 schema fix for PR openai#1245)
- Community: posted feedback on issue openai#140 for PR openai#1215

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
This submission uses a hybrid architecture combining Mamba SSM with sliding window attention (SWA), which allows training at 32x longer context (32,768 tokens) than the standard baseline (1,024 tokens) under the same compute and time constraints. Unlike full attention, which scales quadratically, SWA and Mamba both scale linearly with sequence length, making long-context training feasible within the 10-minute wall-clock budget.
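The linear scaling of SWA comes from restricting each query to a fixed-size window of recent keys, so cost is O(T * window) rather than O(T^2). A minimal NumPy sketch of causal sliding window attention (illustrative only, not the submission's actual kernel or its Mamba half):

```python
import numpy as np

def sliding_window_attention(q, k, v, window=4):
    """Causal sliding window attention: query t attends only to keys
    in [t - window + 1, t], so work per step is bounded by `window`."""
    T, d = q.shape
    out = np.empty_like(v)
    for t in range(T):
        lo = max(0, t - window + 1)
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(d)   # (<= window,)
        weights = np.exp(scores - scores.max())       # stable softmax
        weights /= weights.sum()
        out[t] = weights @ v[lo:t + 1]
    return out
```

Setting `window >= T` recovers full causal attention; in a hybrid block, the SSM layers are what carry information beyond the window.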
Building on our previous Hymba submission (1.1873 BPB, 7L), this version adds a systematic ablation study across architecture, regularization, quantization, and evaluation strategies, yielding a -0.040 BPB improvement.
Results