Non-record: Does SLOT violate causal dependence? (empirical test + question) by andrewbaggio1 · Pull Request #1240 · openai/parameter-golf

andrewbaggio1 · 2026-04-02T00:33:22Z

What this is

I wrote a script with a coding agent that tests whether SLOT violates causal
dependence. I think the answer is yes.

My Test

Take a window of tokens, run SLOT, and record the NLL at each scored position.
Flip a single target token somewhere in the window and re-run SLOT.
If the NLL changes at other positions, those predictions depended on the token
we flipped.

Without SLOT, changing a target has no effect on other positions because the
model is causal. With SLOT, every single scored position is affected, and I
found a 100% violation rate across 240 tested pairs.
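The perturbation test above can be sketched on a toy model. This is a minimal illustration, not the contents of `prove_slot_causal_violation.py`: the model, the sizes `V` and `D`, and the helpers `hidden_states`, `slot_delta`, and `run_test` are all hypothetical. It also assumes that only the target sequence is flipped while the model inputs stay fixed, and it uses plain gradient descent in place of AdamW.

```python
import math
import random

V, D = 6, 4  # toy vocabulary size and hidden width (hypothetical values)
rng = random.Random(0)
EMB = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(V)]  # token embeddings
OUT = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(V)]  # output projection


def hidden_states(inputs):
    """Causal toy model: h_t is the mean embedding of inputs[0..t]."""
    states, running = [], [0.0] * D
    for t, tok in enumerate(inputs):
        running = [r + e for r, e in zip(running, EMB[tok])]
        states.append([r / (t + 1) for r in running])
    return states


def log_probs(h, delta):
    """Log-softmax of OUT @ (h + delta)."""
    z = [sum(w * (hi + di) for w, hi, di in zip(row, h, delta)) for row in OUT]
    m = max(z)
    lse = m + math.log(sum(math.exp(x - m) for x in z))
    return [x - lse for x in z]


def nll_per_position(states, targets, delta):
    return [-log_probs(h, delta)[y] for h, y in zip(states, targets)]


def slot_delta(states, targets, steps=8, lr=0.5):
    """Fit one shared delta by gradient descent on the window's own targets."""
    delta = [0.0] * D
    for _ in range(steps):
        grad = [0.0] * D
        for h, y in zip(states, targets):
            probs = [math.exp(lp) for lp in log_probs(h, delta)]
            for v in range(V):
                err = probs[v] - (1.0 if v == y else 0.0)
                for k in range(D):
                    grad[k] += err * OUT[v][k] / len(targets)
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta


def run_test(inputs, targets, flip_pos, use_slot):
    """Return positions other than flip_pos whose NLL changes after the flip."""
    states = hidden_states(inputs)
    flipped = list(targets)
    flipped[flip_pos] = (targets[flip_pos] + 1) % V
    zero = [0.0] * D
    delta_a = slot_delta(states, targets) if use_slot else zero
    delta_b = slot_delta(states, flipped) if use_slot else zero
    a = nll_per_position(states, targets, delta_a)
    b = nll_per_position(states, flipped, delta_b)
    return [t for t in range(len(targets))
            if t != flip_pos and abs(a[t] - b[t]) > 1e-9]


inputs = [0, 1, 2, 3, 4, 5, 0, 1]
targets = [1, 2, 3, 4, 5, 0, 1, 2]
print("affected without SLOT:", run_test(inputs, targets, 3, use_slot=False))
print("affected with SLOT:   ", run_test(inputs, targets, 3, use_slot=True))
```

Without the fitted delta, the two runs share identical parameters, so every position other than the flipped one scores the same. With it, the flipped target enters the gradient, changes the shared delta, and so reaches the other positions.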

I also checked self-prediction: is P(x_{t+1}) higher when x_{t+1} is actually
in the optimization targets than when it is swapped for a random token? Yes, by
+0.24 nats (shared delta) and +0.73 nats (per-sample delta + logit bias).
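A rough sketch of how a self-prediction gain like this could be measured, using only a per-window logit bias on top of a uniform base model. Everything here is an assumption for illustration: `fit_logit_bias`, `self_prediction_gain`, the step count, the learning rate, and the fixed swap token (a deterministic stand-in for a random one) are hypothetical and are not taken from the proof script.

```python
import math


def fit_logit_bias(targets, vocab, steps=8, lr=0.5):
    """Gradient descent on a shared logit bias; base logits are all zero."""
    bias = [0.0] * vocab
    freq = [targets.count(v) / len(targets) for v in range(vocab)]
    for _ in range(steps):
        m = max(bias)
        z = sum(math.exp(b - m) for b in bias)
        probs = [math.exp(b - m) / z for b in bias]
        # d(mean NLL)/d(bias) = softmax(bias) - empirical target frequencies
        bias = [b - lr * (p - f) for b, p, f in zip(bias, probs, freq)]
    return bias


def nll(bias, token):
    """Negative log-probability of token under softmax(bias)."""
    m = max(bias)
    lse = m + math.log(sum(math.exp(b - m) for b in bias))
    return lse - bias[token]


def self_prediction_gain(targets, pos, swap_token, vocab):
    """NLL of the true token when it was hidden from the optimizer,
    minus its NLL when it was among the optimization targets."""
    true_token = targets[pos]
    swapped = list(targets)
    swapped[pos] = swap_token
    with_true = nll(fit_logit_bias(targets, vocab), true_token)
    with_swap = nll(fit_logit_bias(swapped, vocab), true_token)
    return with_swap - with_true


gain = self_prediction_gain([0, 1, 2, 3, 0, 1, 2, 3], pos=0, swap_token=1, vocab=4)
print(f"self-prediction gain: {gain:.4f} nats")
```

A positive gain means the scored token is easier to predict precisely because it was in the targets the bias was fitted on.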

I'm flagging my own submission (PR #1209).

I looked at PR #1229, which pushes SLOT to its logical extreme: per-sample
delta, per-sample logit bias, and scored-position masking. It gets 0.9300 BPB,
which is the best on the non-verified leaderboard and 0.19 BPB below merged SOTA.
In its defense, the mechanism is the same as shared-delta SLOT, just with
more parameters available to memorize the evaluation targets.

Counterargument

@AnubhavBharadwaaj correctly points out that in a stride=64 sliding window,
1984/2048 tokens are already-scored context, so ~97% of the gradient comes
from known tokens. I think this is a fair point: the leakage in shared-delta
SLOT is small. But it's not zero, and "a little bit of future information"
is still future information.
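The arithmetic behind the ~97% figure, as a quick check. The variable names are illustrative, and the reading of "share of the gradient" as "share of tokens" assumes every token in the window contributes equally to the loss.

```python
window = 2048  # tokens per evaluation window
stride = 64    # new tokens scored per window

context = window - stride            # tokens that were already scored earlier
context_fraction = context / window  # share of the window that is known context
new_fraction = stride / window       # share that is being scored for the first time

print(context, context_fraction, new_fraction)
```

Under that equal-weight assumption, the remaining share of about 3% is the part of the optimization signal that comes from tokens not yet scored.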

Reproducing

# No GPU needed. ~30 seconds on CPU.
python prove_slot_causal_violation.py

Request for a decision

@0hq @valerio-oai SLOT has been debated across PRs #1084, #1128, #1172,
#1176, #1209 without a ruling. I'd really appreciate you weighing in!

The full writeup is in the README, which was generated by Claude Code.

Combines Full Hessian GPTQ, legal score-first chunked TTT (3 epochs), and SLOT delta optimization (8 AdamW steps per batch). All eval-time techniques are single-pass, score-before-update compliant. 3-seed mean: 1.1064 +/- 0.0004 BPB on 8xH100 SXM. Beats verified SOTA (openai#1019, 1.1147) by 0.0083 BPB. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Empirical test showing SLOT violates causal dependence (Issue openai#1017 Rule 1). Includes reproducible proof script, output log, and full writeup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

andrewbaggio1 and others added 2 commits March 31, 2026 23:48

Non-record: SLOT causal dependence analysis + empirical proof

26136c0

andrewbaggio1 wants to merge 2 commits into openai:main from andrewbaggio1:nonrecord/slot-causal-dependence-analysis
