Non-record: SLOT eval-time delta optimization + QK-Gain (val_bpb=1.1179) #1236

Open
ibarrajo wants to merge 1 commit into openai:main from ibarrajo:approach-e

@ibarrajo ibarrajo commented Apr 1, 2026

Summary

  • SLOT (Stochastic Logit Offset Tuning): eval-time delta optimization that adjusts logit biases per-token, lowering val_bpb by about 0.009 relative to the base sliding-window eval (1.1267 to 1.1179)
  • QK-Gain: scaling Q/K projections by 4.0x hurt slightly; included for documentation
  • Based on Approach B architecture (d=576, 11L, int5 GPTQ)
  • TTT disabled; improvement comes entirely from SLOT
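
For readers unfamiliar with eval-time logit offsets, here is a minimal sketch of the general idea, not the code in this PR. It assumes a single vocab-sized additive offset (`delta`, a hypothetical name) fitted by gradient descent on the cross-entropy of tokens that have already been scored, with the model weights left untouched. The function names, step count, and learning rate are illustrative assumptions; the PR's per-token scheme may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def mean_nll(logit_rows, targets, delta):
    """Mean cross-entropy (in nats) of the targets under logits + delta."""
    total = 0.0
    for row, t in zip(logit_rows, targets):
        probs = softmax([x + d for x, d in zip(row, delta)])
        total -= math.log(probs[t])
    return total / len(targets)

def fit_logit_delta(logit_rows, targets, steps=50, lr=0.5):
    """Fit an additive logit offset by plain gradient descent.

    Assumption: only the offset is optimized; the model that produced
    logit_rows is frozen. steps and lr are illustrative values.
    """
    vocab = len(logit_rows[0])
    delta = [0.0] * vocab
    for _ in range(steps):
        grad = [0.0] * vocab
        for row, t in zip(logit_rows, targets):
            probs = softmax([x + d for x, d in zip(row, delta)])
            for v in range(vocab):
                # d(NLL)/d(delta_v) = p_v - 1[v == target]
                grad[v] += probs[v] - (1.0 if v == t else 0.0)
        delta = [d - lr * g / len(targets) for d, g in zip(delta, grad)]
    return delta

# Toy data: three already-scored positions over a 3-token vocabulary.
logit_rows = [[2.0, 0.5, 0.1], [1.5, 0.2, 0.3], [1.8, 0.4, 0.0]]
targets = [1, 1, 0]

nll_before = mean_nll(logit_rows, targets, [0.0, 0.0, 0.0])
delta = fit_logit_delta(logit_rows, targets)
nll_after = mean_nll(logit_rows, targets, delta)
```

In this toy case the fitted offset lowers the loss on the tokens it was fitted to. Whether that carries over to tokens not yet scored is what the reported val_bpb difference measures.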

Results

| Metric | Value |
| --- | --- |
| val_bpb (SLOT) | 1.1179 |
| val_bpb (base sliding window) | 1.1267 |
| SLOT improvement | -0.009 BPB |
| Artifact size | 15.2 MB (762 KB headroom) |
| Eval time | 419s |
| Current SOTA | 1.1147 |
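
The arithmetic behind these numbers can be checked with a few lines. The sketch below assumes the usual definition of bits per byte (total negative log-likelihood in nats, divided by ln 2 times the number of bytes scored). The helper name `bits_per_byte` is hypothetical, and the three BPB values are copied from the results above.

```python
import math

def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed NLL in nats to bits per byte of evaluated text."""
    return total_nll_nats / (math.log(2) * total_bytes)

# Values reported in this PR.
base_bpb = 1.1267   # base sliding-window eval
slot_bpb = 1.1179   # with SLOT
sota_bpb = 1.1147   # current SOTA

slot_delta = slot_bpb - base_bpb   # about -0.0088, reported as -0.009
gap_to_sota = slot_bpb - sota_bpb  # about +0.0032, hence non-record

print(f"SLOT delta: {slot_delta:+.4f} BPB, gap to SOTA: {gap_to_sota:+.4f} BPB")
```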

Key Findings

  • SLOT works: eval-time logit delta optimization gives a reliable -0.009 BPB without any training changes
  • QK-Gain 4.0 hurts slightly: scaling Q/K projections didn't improve quality
  • Non-record: 1.1179 does not beat SOTA of 1.1147
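
To illustrate what a Q/K gain does mechanically, here is a small sketch under the assumption that a single scalar `gain` multiplies the scaled dot-product attention logits. If Q and K were each multiplied by 4.0, the logits would instead scale by 16. The function and the toy vectors are illustrative and are not taken from this PR's code.

```python
import math

def attention_weights(q, keys, gain=1.0):
    """Softmax attention weights for one query, with a scalar gain on the logits."""
    d = len(q)
    scores = [gain * sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

w_plain = attention_weights(q, keys, gain=1.0)
w_gain4 = attention_weights(q, keys, gain=4.0)
```

A larger gain sharpens the distribution toward the best-matching key, which acts like lowering the softmax temperature. That is one plausible reason a fixed 4.0 might not help, but the PR does not report an ablation that isolates the cause.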

Rule Compliance

  • Training time < 600s
  • Eval time < 600s (419s)
  • Artifact < 16MB (15.2MB)
  • No val tokens in artifact
  • No pre-eval TTT (TTT disabled, SLOT-only)
  • Single-pass evaluation
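
The numeric limits above can be expressed as a simple check. This is a hypothetical helper, not part of the challenge tooling. The limits are the ones listed in this section and the measured values are the ones reported in this PR. The training time is left out because only the bound is stated.

```python
# Limits as listed under Rule Compliance (strict upper bounds).
LIMITS = {"train_seconds": 600, "eval_seconds": 600, "artifact_mb": 16.0}

def check_limits(measured, limits=LIMITS):
    """Return {name: True/False}; True means the value is strictly below its limit."""
    return {name: value < limits[name] for name, value in measured.items()}

# Measured values reported in this PR.
report = check_limits({"eval_seconds": 419, "artifact_mb": 15.2})
```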

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>