
Record: dTTT + BigramHash 3072×112 — val_bpb 1.0800 (3-seed mean)#1408

Open
aamodbhatt wants to merge 1 commit into openai:main from aamodbhatt:record-2026-04-06-dttt-bh3072

Conversation

@aamodbhatt

Record Summary

Final submitted score: val_bpb 1.0800 (std 0.0002)
Reference neural roundtrip: 1.09935 (std 0.00007)

Hardware: 8×H100 SXM | Artifact: ≤15.9 MB | Training: ≤600s

What changed

3-Seed Results

| Seed | final val_bpb | roundtrip val_bpb | train_s | eval_s | bytes_total |
| --- | --- | --- | --- | --- | --- |
| 1337 | 1.08017 | 1.09941 | 600 | 102 | 15,873,363 |
| 42 | 1.07980 | 1.09926 | 600 | 102 | 15,895,227 |
| 2025 | 1.08018 | 1.09938 | 600 | 78 | 15,865,471 |
| Mean | 1.0800 | 1.09935 | - | - | - |
| Std | 0.0002 | 0.00007 | - | - | - |
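The summary rows can be reproduced from the per-seed scores above. A minimal sketch, assuming the population standard deviation (the PR does not state which std variant was used):

```python
import statistics

# Per-seed scores copied from the results table.
val_bpb = [1.08017, 1.07980, 1.08018]
roundtrip = [1.09941, 1.09926, 1.09938]

# Population std matches the reported precision here; sample std rounds
# to the same values for this data.
val_mean, val_std = statistics.mean(val_bpb), statistics.pstdev(val_bpb)
rt_mean, rt_std = statistics.mean(roundtrip), statistics.pstdev(roundtrip)

print(f"val_bpb   mean={val_mean:.4f} std={val_std:.4f}")
print(f"roundtrip mean={rt_mean:.5f} std={rt_std:.5f}")
```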

Submission Checklist

- One folder added: `records/track_10min_16mb/2026-04-06_dTTT_BH3072_11L_8xH100/`
- `README.md`, `submission.json`, `train_gpt.py`, and 3 seed logs present
- Training ≤ 600 s (all seeds stopped at the wallclock cap)
- All artifacts ≤ 16,000,000 bytes
- No tokenizer or dataset edits
- Track A (no eval-time adaptation): standard autoregressive sliding-window eval only
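The artifact-size item in the checklist above can be verified mechanically. A sketch, assuming "all artifacts" means every file under the record folder (the folder path is from this PR; the walk logic is illustrative):

```python
import os

LIMIT_BYTES = 16_000_000  # track cap stated in the checklist

def total_artifact_bytes(folder: str) -> int:
    """Sum the sizes of all files under the record folder, recursively."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def check_size(folder: str) -> bool:
    """True if the record folder fits within the 16,000,000-byte cap."""
    return total_artifact_bytes(folder) <= LIMIT_BYTES
```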

Metric Verification

- Score from `final_int6_sliding_window_exact` in each seed log
- Roundtrip from `final_int6_roundtrip_exact` in each seed log
- Artifact size from `Total submission size int6+lzma` in each seed log
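A hypothetical extractor for pulling these metrics out of a seed log. The key names come from this PR; the exact "key: value" log-line layout is an assumption, so the regex is purely illustrative:

```python
import re

def extract_metric(log_text: str, key: str) -> float:
    """Return the last numeric value logged for `key` (commas stripped).

    Assumes the value follows the key on the same line, as in
    "final_int6_sliding_window_exact: 1.08017".
    """
    pattern = re.escape(key) + r"[^0-9]*([\d,]+(?:\.\d+)?)"
    matches = re.findall(pattern, log_text)
    if not matches:
        raise KeyError(key)
    return float(matches[-1].replace(",", ""))
```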

Credits

Discriminative pre-quant AdamW TTT (per-block LR 0.3×–1.0×, 10 epochs,
freeze=0) on a BigramHash 3072×112 base. Builds on the dTTT framework from
PR openai#1351; BigramHash scaled from 2048×128 to 3072×112. 3-seed mean
val_bpb 1.0800 (std 0.0002); all artifacts under 16 MB.
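One plausible reading of "per-block LR 0.3×–1.0×" is a linear ramp of learning-rate multipliers across the 11 blocks. The PR does not specify the actual per-block assignment or the base LR, so both are assumptions here:

```python
BASE_LR = 1e-3    # hypothetical base learning rate (not stated in the PR)
NUM_BLOCKS = 11   # from the record name (11L)

def block_lr(i: int, lo: float = 0.3, hi: float = 1.0) -> float:
    """Learning rate for block i under a linear lo->hi multiplier ramp."""
    mult = lo + (hi - lo) * i / (NUM_BLOCKS - 1)
    return BASE_LR * mult

lrs = [block_lr(i) for i in range(NUM_BLOCKS)]
```

These per-block values would typically feed an optimizer's parameter groups, one group per block.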
