Don't train on eval tokens your model hasn't scored yet!
"Proper" TTT goes like this:
- For each
1 <= t <= T:
1.1. Score on eval token t
1.2. Adapt weights based on eval token <= t
What y'all are doing is something like this:
- For each
1 <= t <= T:
1.1. Adapt weights based on eval token <= t
- For each
1 <= t <= T:
2.1. Score on eval token t
But this is equivalent to appending the eval tokens to the training tokens and switching training strategies before eval! Also see: #152 (comment)
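The difference between the two loop orders above can be sketched like this (the `score`/`adapt` callables are hypothetical, just to make the ordering concrete):

```python
# Sketch of the two TTT loop orders. score(token) evaluates the model on one
# eval token; adapt(tokens) updates the weights on the given tokens. Only the
# ordering matters here, not the API.

def proper_ttt(eval_tokens, score, adapt):
    """Score each eval token BEFORE the weights may adapt on it."""
    losses = []
    for t in range(len(eval_tokens)):
        losses.append(score(eval_tokens[t]))  # model has not trained on token t yet
        adapt(eval_tokens[:t + 1])            # now it may adapt on tokens <= t
    return losses

def improper_ttt(eval_tokens, score, adapt):
    """Adapt on all eval tokens first, then score -- i.e. training on eval."""
    for t in range(len(eval_tokens)):
        adapt(eval_tokens[:t + 1])
    return [score(eval_tokens[t]) for t in range(len(eval_tokens))]
```

With a toy "model" that just remembers which tokens it has adapted on, the proper loop never scores a token it has already trained on, while the improper loop scores every token after having trained on it.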
Potentially invalid submissions:
| PR | comment | status |
| --- | --- | --- |
| #136 | TTT on half of the batch; eval on full batch | [ ] open |
| #152 | TTT on all eval tokens before evaluation | [x] closed |
| #254 | TTT on multiple parts of eval sequence for multiple epochs | [ ] open |
| #264 | TTT before eval | [ ] open |
| #338 | TTT on multiple parts of eval sequence for multiple epochs | [ ] open |
| #398 | TTT on all eval tokens before evaluation | [ ] open |
| #421 | TTT before evals | [ ] open |
| #417 | TTT for multiple epochs before evals | [ ] open |
| #442 | TTT before evals | [ ] open |
cc @0hq
Please feel free to correct me if I'm wrong.