Commit cd7405e
FiLM+SLOT implementation + SLOT24 baseline for comparison
SLOT (Scored-position Learnable Optimization at Test-time):
- Per-sample delta [bsz,1,dim] + logit_bias [bsz,1,vocab]
- 24 AdamW steps with cosine LR on frozen hidden states
- Architecture-agnostic: works on any model exposing _encode()
PR openai#1313 (SLOT-24) achieves 0.8637 BPB on 8×H100.
PR openai#1229 achieves 0.9300 BPB. Both apply SLOT to the SOTA architecture.
Running SLOT24 baseline on our 1×H100 for fair comparison.
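The per-sample SLOT update described above can be sketched as follows. This is a minimal illustration, not the repo's actual code: `slot_adapt`, the linear `head`, the learning rate, and the self-supervised cross-entropy objective are assumptions; only the parameter shapes ([bsz,1,dim] delta, [bsz,1,vocab] logit bias), the 24 AdamW steps, cosine LR, and frozen hidden states come from the commit message.

```python
import torch
import torch.nn.functional as F

def slot_adapt(hidden, head, targets, steps=24, lr=1e-2):
    """Test-time SLOT sketch: optimize a per-sample delta and logit bias
    on the sample's own tokens while all model weights stay frozen."""
    bsz, seq, dim = hidden.shape
    vocab = head.out_features
    hidden = hidden.detach()           # frozen hidden states
    head.requires_grad_(False)         # frozen output head (weights not updated)
    delta = torch.zeros(bsz, 1, dim, requires_grad=True)         # [bsz,1,dim]
    logit_bias = torch.zeros(bsz, 1, vocab, requires_grad=True)  # [bsz,1,vocab]
    opt = torch.optim.AdamW([delta, logit_bias], lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps)
    for _ in range(steps):             # 24 AdamW steps with cosine LR decay
        # delta and logit_bias broadcast across the sequence dimension
        logits = head(hidden + delta) + logit_bias
        loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()
    return delta.detach(), logit_bias.detach()
```

Because only 2 small tensors are optimized and the backbone is frozen, the per-sample cost is 24 forward/backward passes through the head alone, which is what makes the scheme architecture-agnostic.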
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent 6601b83
File tree (3 files changed: +3109 −0 lines)
- experiments
  - film_slot
  - slot24_baseline