Record: SLOT-24 + Pre-quant TTT — val_bpb 0.7094 (3-seed mean) #1376

Open
stukenov wants to merge 1 commit into openai:main from stukenov:submission/v6-slot24-ttt

stukenov commented Apr 5, 2026

Summary

  • val_bpb: 0.7094 (3-seed mean, std 0.0031)
  • Artifact: <16 MB (max 15,930,472 bytes)
  • Training: 600s | Eval: ~580s (both within limits)

Techniques

  1. Per-Sample SLOT-24 (arXiv:2505.12392v2): per-sample delta [bsz,1,512] + logit bias [bsz,1,1024], 24 AdamW steps (cosine LR 0.024->0.001), stride=96, scored positions only. Model weights frozen.

  2. Pre-quant AdamW TTT: 6 epochs on EMA model before GPTQ. Freeze first 2 blocks, cosine LR 0.0005.

  3. QK-Gain 4.0 + XSA-all + Full Hessian GPTQ int6 + lzma, built on the PR #1019 base (Record: AR Self-Gen GPTQ + XSA-all + BigramHash 3072×112, val_bpb 1.11473, 3-seed mean).
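
The cosine learning-rate decay named in technique 1 (24 steps, 0.024 -> 0.001) can be sketched as below. This is a minimal illustration, not the code in train_gpt.py: the function name `slot_lr` is hypothetical, and it assumes the standard half-cosine form that starts at the maximum on step 0 and reaches the minimum on the final step. The actual script may define the endpoints differently.

```python
import math


def slot_lr(step: int, total_steps: int = 24,
            lr_max: float = 0.024, lr_min: float = 0.001) -> float:
    """Half-cosine decay from lr_max (step 0) to lr_min (last step).

    Assumption: the decay spans steps 0..total_steps-1 inclusive.
    """
    if total_steps <= 1:
        return lr_max
    progress = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# One learning rate per SLOT optimisation step.
schedule = [slot_lr(s) for s in range(24)]
```

With this form the per-step rate falls monotonically, so the largest updates to the per-sample delta and logit bias happen in the first few steps.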

3-Seed Results

| Seed | SLOT-24 BPB | Artifact (bytes) |
|------|-------------|------------------|
| 1337 | 0.7064      | 15,930,472       |
| 42   | 0.7093      | 15,930,124       |
| 2025 | 0.7126      | 15,916,348       |
| Mean | 0.7094      |                  |
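
The summary statistics can be re-derived from the three per-seed values as a quick sanity check. The sketch below uses only the numbers reported in this PR. It assumes that the quoted std is the sample standard deviation (n - 1 denominator) and that "16 MB" means 16,000,000 bytes; both are assumptions, not statements from the submission.

```python
import statistics

# Per-seed values copied from the 3-seed results in this PR.
bpb = {1337: 0.7064, 42: 0.7093, 2025: 0.7126}
artifact_bytes = {1337: 15_930_472, 42: 15_930_124, 2025: 15_916_348}

mean_bpb = statistics.mean(list(bpb.values()))
std_bpb = statistics.stdev(list(bpb.values()))  # sample std, n - 1 denominator

# Assumption: the size limit is 16 decimal megabytes.
SIZE_LIMIT_BYTES = 16_000_000
largest = max(artifact_bytes.values())

print(f"mean={mean_bpb:.4f} std={std_bpb:.4f} "
      f"max_artifact={largest} under_limit={largest < SIZE_LIMIT_BYTES}")
```

Under these assumptions the rounded values match the reported mean of 0.7094 and std of 0.0031.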

Reproduction

SEED=1337 TTT_ENABLED=1 TTT_EPOCHS=6 SLOT_ENABLED=1 SLOT_STEPS=24 \
SLOT_LR=0.024 SLOT_LR_MIN=0.001 SLOT_STRIDE=96 BIGRAM_VOCAB_SIZE=1536 \
torchrun --standalone --nproc_per_node=8 train_gpt.py

Credits

PR #1019 (@abaybektursun), PR #1229 (@resouer), PR #1306, PR #1263, PR #1125, arXiv:2505.12392v2

Per-sample SLOT-24 (stride=96, LR=0.024) + Pre-quant AdamW TTT (6ep).
3 seeds: 0.7064, 0.7093, 0.7126 (mean 0.7094, std 0.0031).
All artifacts under 16MB with lzma compression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Apr 5, 2026
 primary path

- CRITICAL: PR openai#1351 (Discriminative TTT, 1.0807) self-closed by author on
  2026-04-05 — pre-quant AdamW TTT ruled as pre-eval adaptation on val data.
  Removed pre-quant TTT from technique table and plan.
- Updated strategy to PR openai#1334 (Depth Recur + Parallel Residuals + MuonEq-R,
  1.0897) as primary architecture target — zero legality flags.
- Logged new PRs: openai#1379 (0.4162, n-gram mixer), openai#1376 (0.7094, SLOT-24 +
  pre-quant TTT), openai#1364 (1.1025, pre-quant TTT at risk), openai#1370 (1.003, GDN).
- SLOT and pre-quant TTT both blocked; discriminative TTT post-quant still legal.
- Updated CLAUDE.md Competition Strategy + Technique Reference + Lessons (v9.0).

https://claude.ai/code/session_01RTLvTuYBp9YMtudwrY8mYM
Abhishek8108 added a commit to Abhishek8108/parameter-golf that referenced this pull request Apr 6, 2026
Combines discriminative TTT (PR openai#1351) with SLOT-24 (PR openai#1376).
3-seed package: seeds 1337/42/2025, mean SLOT-24 BPB 0.7093 ± 0.0025.