
Non-record: QAT + Neural Cache + LoRA TTT#304

Closed
Bortlesboat wants to merge 1 commit into openai:main from Bortlesboat:submission/neural-cache-ttt-qat

Conversation

@Bortlesboat

Summary

Non-record submission exploring two eval-time techniques plus quantization-aware training, stacked on the #1 training recipe (Int5-MLP + BigramHash + SWA by @thwu1):

  1. Quantization-Aware Training (QAT): STE fake-quantization during training matching int5/int6 export format
  2. Neural Cache: Hidden-state similarity lookup with logaddexp interpolation during sliding window eval
  3. LoRA Test-Time Training: Per-document rank-8 LoRA adaptation with entropy-gated updates
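The neural-cache step (item 2) scores the current hidden state against cached hidden states from earlier positions, turns the scores into a distribution over the tokens that followed those states, and mixes it with the model's distribution in log space. A minimal numpy sketch of that idea, not this PR's actual code; `theta` and `lam` are illustrative hyperparameters:

```python
import numpy as np

def cached_log_probs(h, cache_keys, cache_tokens, model_log_probs,
                     vocab_size, theta=1.0, lam=0.1):
    """Hidden-state cache lookup with logaddexp interpolation (sketch).

    h               -- current hidden state, shape (d,)
    cache_keys      -- hidden states from earlier positions, shape (n, d)
    cache_tokens    -- token id observed after each cached state, shape (n,)
    model_log_probs -- base model log-probs for the next token, shape (V,)
    """
    # Similarity between the query state and every cached state.
    scores = theta * (cache_keys @ h)                # (n,)
    log_w = scores - np.logaddexp.reduce(scores)     # normalize in log space

    # Scatter the cache's probability mass onto the vocabulary.
    cache_log_probs = np.full(vocab_size, -np.inf)
    for lw, tok in zip(log_w, cache_tokens):
        cache_log_probs[tok] = np.logaddexp(cache_log_probs[tok], lw)

    # log((1 - lam) * p_model + lam * p_cache), computed stably.
    return np.logaddexp(np.log1p(-lam) + model_log_probs,
                        np.log(lam) + cache_log_probs)
```

Doing the interpolation with `logaddexp` rather than exponentiating first avoids underflow when either distribution assigns near-zero probability to a token.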

Results

| Seed | Pre-quant val_bpb | Post-quant sliding val_bpb | Steps |
|------|-------------------|----------------------------|-------|
| 1337 | 1.1739            | 1.4245                     | 5109  |

Known Issues

The QAT implementation has a mismatch: STE uses symmetric clipping while the export uses percentile-based per-row scaling. This causes a 0.25 BPB quantization penalty instead of the expected ~0.02. Submitting as non-record to document the approach — the neural cache and LoRA TTT implementations are validated and working, and will show gains once the QAT bug is fixed.
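To make the mismatch concrete, here is a small numpy illustration (the scale functions and percentile value are hypothetical, not this PR's code): the STE fake-quantizes on a grid set by max|w|, while the export picks a percentile-based per-row scale, so in the presence of outlier weights the two grids can disagree by a large factor.

```python
import numpy as np

def ste_symmetric_scale(w, bits=5):
    # Scale implied by the training-time STE: symmetric clipping at max|w|.
    qmax = 2 ** (bits - 1) - 1
    return np.abs(w).max() / qmax

def export_percentile_scale(w, bits=5, pct=99.5):
    # Scale used at export: a percentile of |w|, which ignores outliers.
    qmax = 2 ** (bits - 1) - 1
    return np.percentile(np.abs(w), pct) / qmax

rng = np.random.default_rng(0)
w = rng.normal(size=1024)
w[:4] *= 20.0  # a handful of outlier weights in the row

s_train = ste_symmetric_scale(w)       # grid the network learned to tolerate
s_export = export_percentile_scale(w)  # grid the int5 export actually uses
# With outliers, max-based and percentile-based scales diverge sharply, so
# the dequantized weights at eval differ from anything seen during QAT.
print(s_train / s_export)
```

The penalty comes from training against one rounding grid and deploying on another, which is why aligning the STE's scale computation with the export path should recover most of the ~0.02 expected gap.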

Test plan

  - [x] Full 8xH100 training run completed (5109 steps, 600s)
  - [x] SWA applied (24 checkpoints)
  - [x] Sliding window eval completed on 1xH100
  - [x] Neural cache eval validated on H100
  - [ ] Fix QAT export mismatch
  - [ ] 3+ seeds for statistical significance

Explores stacking eval-time techniques (neural cache, LoRA TTT) and
quantization-aware training on top of the openai#1 recipe. QAT has an export
mismatch bug resulting in high quantization penalty — submitting as
non-record to document the approach for iteration.
@Bortlesboat
Author

Superseded by #1169 (better score). Closing.

@Bortlesboat Bortlesboat closed this Apr 6, 2026