Releases: Ryuketsukami/turboquant-compression
v0.1.0 - Initial Release
TurboQuant Compression v0.1.0
Python implementation of the TurboQuant algorithm (ICLR 2026) for near-optimal LLM KV cache compression.
Features
- PolarQuant (Stage 1): Random orthogonal rotation + Lloyd-Max scalar quantization
- QJL (Stage 2): 1-bit residual correction for unbiased inner products
- TurboQuantKVCache: Compressed KV cache with attention score computation
- 3-bit quantization: 10.7x compression with cosine similarity of 0.98 or higher (vector dimension d=256)
- 27 pytest tests, CI on Python 3.10-3.13
- pip-installable package
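To make the two stages listed above concrete, here is a minimal NumPy sketch of the general idea. It is not this package's API: every function name below is hypothetical, the codebook is fitted empirically on Gaussian samples rather than taken from the paper, and inputs are assumed to be unit-norm vectors.

```python
import numpy as np

def random_rotation(d, rng):
    # QR of a Gaussian matrix yields a random orthogonal matrix.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix so the rotation is uniformly distributed

def lloyd_max_codebook(samples, bits, iters=50):
    # 1-D Lloyd iterations: nearest-centroid assignment, then centroid update.
    levels = 2 ** bits
    centroids = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        idx = np.abs(samples[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(levels):
            members = samples[idx == k]
            if members.size:
                centroids[k] = members.mean()
    return np.sort(centroids)

def quantize(x, rotation, codebook):
    # Stage 1: rotate, then snap each coordinate to its nearest codebook level.
    y = rotation @ x
    return np.abs(y[:, None] - codebook[None, :]).argmin(axis=1).astype(np.uint8)

def dequantize(codes, rotation, codebook):
    # Look up the levels and undo the rotation.
    return rotation.T @ codebook[codes]

def qjl_encode(residual, proj):
    # Stage 2: keep only the sign of each random projection, plus the residual norm.
    return np.sign(proj @ residual).astype(np.int8), float(np.linalg.norm(residual))

def qjl_inner_product(query, signs, norm, proj):
    # Unbiased estimate of <query, residual> from the 1-bit sketch.
    m = proj.shape[0]
    return float(np.sqrt(np.pi / 2) * norm / m * ((proj @ query) @ signs))

rng = np.random.default_rng(0)
d, bits = 256, 3
rotation = random_rotation(d, rng)
# Coordinates of a rotated unit vector are approximately N(0, 1/d).
codebook = lloyd_max_codebook(rng.standard_normal(20000) / np.sqrt(d), bits)

x = rng.standard_normal(d)
x /= np.linalg.norm(x)
codes = quantize(x, rotation, codebook)
x_hat = dequantize(codes, rotation, codebook)
cosine = float(x @ x_hat / np.linalg.norm(x_hat))

# Correct the inner product with a 1-bit sketch of the quantization residual.
proj = rng.standard_normal((d, d))
signs, norm = qjl_encode(x - x_hat, proj)
query = rng.standard_normal(d)
query /= np.linalg.norm(query)
estimate = float(query @ x_hat) + qjl_inner_product(query, signs, norm, proj)
exact = float(query @ x)
```

Here `cosine` measures how well the stage 1 reconstruction preserves direction, and `estimate` can be compared against `exact` to see the effect of the residual correction. The release notes do not state the baseline for the 10.7x figure; it would be consistent with 32-bit floats stored at 3 bits per coordinate (32/3 ≈ 10.7).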
Paper
Zandieh et al., ICLR 2026. arXiv:2504.19874