Invalid submissions due to information leakage during TTT #402

@leloykun

Don't train on eval tokens your model hasn't scored yet!


"Proper" TTT goes like this:

  1. For each 1 <= t <= T:
    1.1. Score on eval token t
    1.2. Adapt weights based on eval tokens <= t

What y'all are doing is something like this:

  1. For each 1 <= t <= T:
    1.1. Adapt weights based on eval tokens <= t
  2. For each 1 <= t <= T:
    2.1. Score on eval token t

But this is equivalent to appending the eval tokens to the training tokens and switching training strategies before eval! Also see: #152 (comment)
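
To make the difference concrete, here is a minimal sketch of both orderings. It assumes a toy Laplace-smoothed unigram counter as a stand-in for the real model, so "adapting weights" just means adding a token to the counts. `UnigramModel`, `train`, `proper_ttt`, and `leaky_ttt` are hypothetical names made up for illustration, not taken from any submission, and the token values are arbitrary.

```python
import math
from collections import Counter


class UnigramModel:
    """Toy stand-in for a language model: Laplace-smoothed unigram counts."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = Counter()
        self.total = 0

    def score(self, token):
        """Negative log-likelihood of one token under the current counts."""
        p = (self.counts[token] + 1) / (self.total + self.vocab_size)
        return -math.log(p)

    def adapt(self, token):
        """The 'weight update' here: absorb one token into the counts."""
        self.counts[token] += 1
        self.total += 1


def train(train_tokens, vocab_size):
    """Fit a fresh model on the given tokens."""
    model = UnigramModel(vocab_size)
    for tok in train_tokens:
        model.adapt(tok)
    return model


def proper_ttt(model, eval_tokens):
    """Score token t first, and only then adapt on it."""
    total = 0.0
    for tok in eval_tokens:
        total += model.score(tok)  # token t has not been trained on yet
        model.adapt(tok)           # now it may influence later tokens
    return total / len(eval_tokens)


def leaky_ttt(model, eval_tokens):
    """Adapt on every eval token, then score all of them."""
    for tok in eval_tokens:
        model.adapt(tok)
    return sum(model.score(tok) for tok in eval_tokens) / len(eval_tokens)


VOCAB = 8
train_tokens = [0, 1, 2, 3] * 5
eval_tokens = [7] * 8  # a token the training set never contains

proper = proper_ttt(train(train_tokens, VOCAB), eval_tokens)
leaky = leaky_ttt(train(train_tokens, VOCAB), eval_tokens)

# Append the eval tokens to the training set, then score with frozen weights.
appended = train(train_tokens + eval_tokens, VOCAB)
frozen = sum(appended.score(tok) for tok in eval_tokens) / len(eval_tokens)

print(f"proper TTT loss:      {proper:.4f}")
print(f"leaky TTT loss:       {leaky:.4f}")
print(f"train-on-eval + eval: {frozen:.4f}")
```

In this toy setup the leaky loss matches the train-on-eval loss exactly, and both are lower than the proper TTT loss. With a gradient-trained model the match would be approximate rather than exact, but the leak is the same: every scored token has already been trained on.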


Potentially invalid submissions:

| PR | Comment | Status |
|---|---|---|
| #136 | TTT on half of the batch; eval on full batch | [ ] open |
| #152 | TTT on all eval tokens before evaluation | [x] closed |
| #254 | TTT on multiple parts of eval sequence for multiple epochs | [ ] open |
| #264 | TTT before eval | [ ] open |
| #338 | TTT on multiple parts of eval sequence for multiple epochs | [ ] open |
| #398 | TTT on all eval tokens before evaluation | [ ] open |
| #421 | TTT before evals | [ ] open |
| #417 | TTT for multiple epochs before evals | [ ] open |
| #442 | TTT before evals | [ ] open |

cc @0hq


Please feel free to correct me if I'm wrong.
