Commit b5835f1
Revert DyT to RMSNorm; use SGD+momentum for SLOT (novel eval improvement)
DyT hurt (1.1307 vs 1.1263 sliding), so go back to RMSNorm.
Novel: replace AdamW with SGD+momentum(0.9) for SLOT optimization.
PR openai#995 showed SGD+momentum beats AdamW for TTT by 0.036 BPB.
As far as we know, nobody has tried SGD for SLOT specifically.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent ee34ab6 commit b5835f1
1 file changed
Lines changed: 4 additions & 3 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 540 | 540 | |
| 541 | 541 | |
| 542 | 542 | |
| 543 | | - |
| 544 | | - |
| | 543 | + |
| | 544 | + |
| 545 | 545 | |
| 546 | 546 | |
| 547 | 547 | |
| … | … | |
| 874 | 874 | |
| 875 | 875 | |
| 876 | 876 | |
| 877 | | - |
| | 877 | + |
| | 878 | + |
| 878 | 879 | |
| 879 | 880 | |
| 880 | 881 | |
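
The commit message describes swapping AdamW for SGD with momentum 0.9 in the SLOT optimization step. The sketch below only illustrates that update rule; it is not the code from this commit. It assumes the PyTorch convention for momentum (buffer = momentum * buffer + grad, no dampening) and uses a hypothetical `delta` vector with a toy quadratic loss as a stand-in for whatever SLOT actually optimizes. The names `sgd_momentum_step` and `optimize_delta` and the learning rate are made up for the example.

```python
def sgd_momentum_step(params, grads, bufs, lr=0.1, momentum=0.9):
    """Apply one in-place SGD+momentum step (PyTorch convention, dampening=0)."""
    for i, g in enumerate(grads):
        # Velocity accumulates past gradients, decayed by `momentum`.
        bufs[i] = momentum * bufs[i] + g
        # Parameter moves against the accumulated velocity.
        params[i] -= lr * bufs[i]


def optimize_delta(target, steps=300, lr=0.1, momentum=0.9):
    """Minimize the toy loss 0.5 * ||delta - target||^2 starting from zeros.

    `delta` and `target` are hypothetical stand-ins for a small set of
    parameters tuned at eval time; the real loss would come from the model.
    """
    delta = [0.0] * len(target)
    bufs = [0.0] * len(target)  # one momentum buffer per parameter
    for _ in range(steps):
        # Gradient of the toy quadratic loss with respect to delta.
        grads = [d - t for d, t in zip(delta, target)]
        sgd_momentum_step(delta, grads, bufs, lr=lr, momentum=momentum)
    return delta


if __name__ == "__main__":
    print(optimize_delta([1.0, -2.0, 0.5]))
```

In a PyTorch codebase the equivalent change would typically be constructing `torch.optim.SGD(params, lr=..., momentum=0.9)` where `torch.optim.AdamW(params, ...)` was used before. Because SGD has no per-parameter step scaling, the learning rate that worked for AdamW will likely need retuning.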