Commit 01724f3
Fix TTT: use eval_model (int6 artifact) not base_model, honor EVAL_STRIDE
P1: TTT was running on the pre-quantization base_model instead of the
int6 round-tripped eval_model. This overstated TTT gains since the
artifact model has quantization noise. Now matches PR openai#473's approach.
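
To illustrate why evaluating on the pre-quantization weights can overstate results, here is a minimal sketch of an int6 round trip. The quantization scheme actually used in this repository is not shown in the commit; `quantize_int6_roundtrip` is a hypothetical symmetric per-tensor scheme chosen only for illustration.

```python
def quantize_int6_roundtrip(weights):
    """Hypothetical symmetric per-tensor int6 quantize -> dequantize.

    This is an illustrative assumption, not the repository's quantizer.
    """
    qmax = 31  # signed 6-bit integers span [-32, 31]; use +/-31 symmetrically
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    # Round to the nearest integer level, clamp, then map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


# "base" weights stand in for base_model, the round-tripped copy for eval_model.
base_weights = [0.50, -0.26, 0.013, 0.31]
eval_weights = quantize_int6_roundtrip(base_weights)

# The round trip perturbs the weights, so metrics measured on the base
# weights need not match what the shipped artifact would achieve.
max_error = max(abs(b - e) for b, e in zip(base_weights, eval_weights))
print(f"max round-trip error: {max_error:.6f}")
```

Under this assumed scheme the per-weight error is bounded by half a quantization step, which is small but nonzero, so adaptation measured on the unquantized weights is not measured on the model that ships.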
P2: TTT hardcoded stride=64 instead of using args.eval_stride. Now
honors the configured stride so TTT results stay consistent with
the sliding window eval path.
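
As a rough sketch of why the stride matters, the snippet below builds sliding-window spans for a given stride. Only `args.eval_stride` comes from the commit message; the `sliding_windows` helper, the `SimpleNamespace` stand-in for `args`, and the toy sizes are assumptions for illustration and are not the repository's code.

```python
from types import SimpleNamespace


def sliding_windows(num_tokens, seq_len, stride):
    """Return (start, end, score_from) spans for a sliding-window pass.

    Each window covers tokens [start, end); only tokens in
    [score_from, end) are newly scored, so every token is scored once.
    """
    if not 0 < stride <= seq_len:
        raise ValueError("stride must be in (0, seq_len] to avoid gaps")
    windows = []
    start = 0
    prev_end = 0
    while prev_end < num_tokens:
        end = min(start + seq_len, num_tokens)
        windows.append((start, end, prev_end))
        prev_end = end
        start += stride
    return windows


# Hypothetical stand-in for the parsed command-line arguments.
args = SimpleNamespace(eval_stride=2)

configured = sliding_windows(num_tokens=10, seq_len=4, stride=args.eval_stride)
hardcoded = sliding_windows(num_tokens=10, seq_len=4, stride=4)

print(configured)
print(hardcoded)
```

With different strides the two passes see different amounts of left context per scored token, so results from a hardcoded stride are not directly comparable with an evaluation path that uses the configured one.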
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent c3442df commit 01724f3
1 file changed
Lines changed: 3 additions & 2 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 1564 | 1564 | |
| 1565 | 1565 | |
| 1566 | 1566 | |
| 1567 | | - |
| | 1567 | + |
| 1568 | 1568 | |
| 1569 | | - |
| | 1569 | + |
| | 1570 | + |
| 1570 | 1571 | |
| 1571 | 1572 | |
| 1572 | 1573 | |