[Non-Record] Hymba-8L: Hybrid SSM + Sliding Window Attention with 32K Context (1.1470 BPB)#1245

Open

mkenney2 wants to merge 5 commits into openai:main from mkenney2:hymba-ssm4-submission

Conversation

mkenney2 commented Apr 2, 2026

Summary

This submission uses a hybrid architecture combining Mamba SSM blocks with sliding window attention (SWA), which allows training at 32x longer context (32,768 tokens) than the standard baseline (1,024 tokens) under the same compute and time constraints. Unlike full attention, which scales quadratically with sequence length, both SWA and Mamba scale linearly, making long-context training feasible within the 10-minute wall-clock budget.
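To make the linear-scaling claim concrete, here is a minimal sketch (not the submission's actual code) of a causal sliding-window attention mask: each query token may only attend to the `window` most recent keys, so attention cost grows as O(seq_len x window) rather than O(seq_len^2).

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal sliding-window mask: query i may attend to key j
    iff i - window < j <= i (True = attention allowed)."""
    return [[(j <= i) and (j > i - window) for j in range(seq_len)]
            for i in range(seq_len)]

# Each query attends to at most `window` keys, so the per-layer attention
# cost is O(seq_len * window) instead of O(seq_len**2) for full attention.
mask = sliding_window_mask(seq_len=8, window=3)
counts = [sum(row) for row in mask]
print(counts)  # [1, 2, 3, 3, 3, 3, 3, 3]
```

At a 32,768-token context this bound is what keeps the attention term affordable; the Mamba blocks are linear in sequence length by construction.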

Building on our previous Hymba submission (1.1873 BPB, 7L), this version adds a systematic ablation study across architecture, regularization, quantization, and evaluation strategies, yielding a 0.040 BPB improvement (1.1873 → 1.1470).

Results

| Seed | val_bpb | val_loss | Steps | Artifact Size |
|------|---------|----------|-------|---------------|
| 1337 | 1.1474 | 1.9374 | 6,621 | 15.7 MB |
| 42 | 1.1469 | 1.9366 | 6,620 | 15.6 MB |
| 7 | 1.1468 | 1.9363 | 6,606 | 15.3 MB |
| **Mean** | **1.1470 ± 0.0003** | | | |

- Training: 600s on 8xH100 SXM, ~90.7 ms/step
- Evaluation: Score-first TTT (25 epochs), ~580s
- Artifact: int8 + zstd-22, under 16 MB
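The artifact pipeline (int8 quantization followed by zstd at level 22) can be sketched as below. This is an illustrative reconstruction, not the submission's code: it uses simple symmetric per-tensor quantization, and stdlib `lzma` stands in for zstd, which is a third-party package.

```python
import lzma

def quantize_int8(weights: list[float]) -> tuple[bytes, float]:
    """Symmetric per-tensor int8 quantization: store one float scale
    plus one signed byte per weight."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = bytes(round(w / scale) & 0xFF for w in weights)
    return q, scale

def dequantize_int8(q: bytes, scale: float) -> list[float]:
    # Reinterpret each byte as signed int8 and rescale.
    return [(b - 256 if b > 127 else b) * scale for b in q]

weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(weights)
# The submission compresses the int8 payload with zstd at level 22;
# lzma at its highest preset plays that role here.
blob = lzma.compress(q, preset=9)
restored = dequantize_int8(lzma.decompress(blob), scale)
```

Quantizing to one byte per weight cuts the raw payload 4x versus float32, and the entropy coder then squeezes the redundancy in the int8 distribution, which is how the artifact fits under the 16 MB cap.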

mkenney2 and others added 4 commits April 1, 2026 19:17
…470 BPB)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MatoTeziTanka

@mkenney2 Heads up — your submission shows 3 valid seeds (1337, 42, 7) but may be getting flagged as incomplete by automated tooling. The issue is your submission.json uses a non-standard schema:

  1. "results" should be "seed_results" — every merged submission uses this key
  2. "submission_name" should be "name"
  3. Missing "author" and "github_id" fields

Quick fix — rename those fields to match the standard schema (see PR #1019 for reference). That should resolve the seed count issue.
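The renames above amount to a small key rewrite of submission.json. A hypothetical sketch (field values invented for illustration; author fields assumed from the PR author):

```python
import json

# Hypothetical non-standard submission.json (values invented for illustration).
old = {
    "submission_name": "Hymba-8L",
    "results": [{"seed": 1337, "val_bpb": 1.1474}],
}

# Apply the standard schema: submission_name -> name, results -> seed_results,
# plus the missing author/github_id fields (assumed from the PR author).
fixed = {
    "name": old["submission_name"],
    "seed_results": old["results"],
    "author": "mkenney2",
    "github_id": "mkenney2",
}
print(json.dumps(fixed, indent=2))
```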

(Flagged via the Agora)

- Rename submission_name -> name, results -> seed_results
- Add author, github_id, blurb, date fields
- Add exact val_loss/val_bpb means and stds
- Add artifact_bytes_mean/max, step_avg_ms_mean
- Use full precision values from logs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mkenney2 commented Apr 2, 2026

> @mkenney2 Heads up — your submission shows 3 valid seeds (1337, 42, 7) but may be getting flagged as incomplete by automated tooling. The issue is your submission.json uses a non-standard schema:
>
> 1. "results" should be "seed_results" — every merged submission uses this key
> 2. "submission_name" should be "name"
> 3. Missing "author" and "github_id" fields
>
> Quick fix — rename those fields to match the standard schema (see PR #1019 for reference). That should resolve the seed count issue.
>
> (Flagged via the Agora)

@MatoTeziTanka Thank you! Super helpful.

MatoTeziTanka pushed a commit to MatoTeziTanka/parameter-golf that referenced this pull request Apr 2, 2026
- Type column supports multiple tags per PR (e.g. Neural + TTT)
- Filter JS updated: clicking TTT shows all PRs containing TTT
- Reclassified TTT submissions as Neural + TTT
- Community: resolved issue #6 (mkenney2 schema fix for PR openai#1245)
- Community: posted feedback on issue openai#140 for PR openai#1215

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>