Don't train on eval tokens your model hasn't scored yet!
"Proper" TTT goes like this:
- For each
1 <= t <= T:
1.1. Score on eval token t
1.2. Adapt weights based on eval token <= t
What y'all are doing is something like this:
- For each
1 <= t <= T:
1.1. Adapt weights based on eval token <= t
- For each
1 <= t <= T:
2.1. Score on eval token t
But this is equivalent to appending the eval tokens to the training tokens and switching training strategies before eval! Also see: #152 (comment)
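The difference between the two loop orders above can be sketched like this (the `score`/`adapt` callables are hypothetical, just to make the ordering concrete):

```python
# Sketch of the two TTT loop orders. score(token) evaluates the model on one
# eval token; adapt(tokens) updates the weights on the given tokens. Only the
# ordering matters here, not the API.

def proper_ttt(eval_tokens, score, adapt):
    """Score each eval token BEFORE the weights may adapt on it."""
    losses = []
    for t in range(len(eval_tokens)):
        losses.append(score(eval_tokens[t]))  # model has not trained on token t yet
        adapt(eval_tokens[:t + 1])            # now it may adapt on tokens <= t
    return losses

def improper_ttt(eval_tokens, score, adapt):
    """Adapt on all eval tokens first, then score -- i.e. training on eval."""
    for t in range(len(eval_tokens)):
        adapt(eval_tokens[:t + 1])
    return [score(eval_tokens[t]) for t in range(len(eval_tokens))]
```

With a toy "model" that just remembers which tokens it has adapted on, the proper loop never scores a token it has already trained on, while the improper loop scores every token after having trained on it.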
Potentially invalid submissions:
| PR | comment | status |
| --- | --- | --- |
| #136 | TTT on half of the batch; eval on full batch | [ ] open |
| #152 | TTT on all eval tokens before evaluation | [x] closed |
| #254 | TTT on multiple parts of eval sequence for multiple epochs | [ ] open |
| #264 | TTT before eval | [ ] open |
| #338 | TTT on multiple parts of eval sequence for multiple epochs | [ ] open |
| #398 | TTT on all eval tokens before evaluation | [ ] open |
| #421 | TTT before evals | [ ] open |
| #417 | TTT for multiple epochs before evals | [ ] open |
| #442 | TTT before evals | [ ] open |
cc @0hq
Please feel free to correct me if I'm wrong.