Releases: Ryuketsukami/turboquant-compression

v0.1.0 - Initial Release

28 Mar 15:53

TurboQuant Compression v0.1.0

Python implementation of the TurboQuant algorithm (ICLR 2026) for near-optimal LLM KV cache compression.

Features

  • PolarQuant (Stage 1): Random orthogonal rotation + Lloyd-Max scalar quantization
  • QJL (Stage 2): 1-bit residual correction for unbiased inner products
  • TurboQuantKVCache: Compressed KV cache with attention score computation
  • 3-bit quantization: 10.7x compression, cosine similarity of 0.98 or higher at d=256
  • 27 pytest tests, CI on Python 3.10-3.13
  • pip-installable package
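
To make Stage 1 concrete, here is a minimal NumPy sketch of the idea named in the feature list: rotate a vector with a random orthogonal matrix, then quantize each coordinate with a Lloyd-Max codebook. It is an illustration, not this package's API. The function names (`random_rotation`, `lloyd_max_codebook`, `quantize`, `dequantize`) are hypothetical, the codebook is fitted to Gaussian samples as an approximation of the rotated coordinate distribution, and Stage 2 (QJL) and norm storage are left out. The last line assumes a 32-bit float baseline with no per-vector overhead, which is one reading consistent with the 10.7x figure; the release does not state its baseline.

```python
import numpy as np


def random_rotation(d, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flipping column signs by the sign of R's diagonal keeps Q orthogonal
    # and removes the sign ambiguity of the QR factorization.
    return q * np.sign(np.diag(r))


def lloyd_max_codebook(samples, bits, iters=50):
    """Fit 2**bits scalar centroids to 1-D samples with Lloyd's algorithm."""
    levels = 2 ** bits
    # Start from evenly spaced quantiles so every cell is populated.
    centroids = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        edges = (centroids[:-1] + centroids[1:]) / 2  # nearest-centroid boundaries
        idx = np.searchsorted(edges, samples)
        for k in range(levels):
            members = samples[idx == k]
            if members.size:
                centroids[k] = members.mean()
    return centroids


def quantize(x, rot, centroids):
    """Rotate x, then map each coordinate to the index of its nearest centroid."""
    edges = (centroids[:-1] + centroids[1:]) / 2
    return np.searchsorted(edges, rot @ x).astype(np.uint8)


def dequantize(codes, rot, centroids):
    """Look up centroids and undo the rotation (rot is orthogonal, so inverse = transpose)."""
    return rot.T @ centroids[codes]


rng = np.random.default_rng(0)
d, bits = 256, 3

rot = random_rotation(d, rng)
# Assumption: coordinates of a rotated unit vector are roughly N(0, 1/d).
centroids = lloyd_max_codebook(rng.standard_normal(100_000) / np.sqrt(d), bits)

x = rng.standard_normal(d)
x /= np.linalg.norm(x)

codes = quantize(x, rot, centroids)
x_hat = dequantize(codes, rot, centroids)

cosine = float(x @ x_hat / (np.linalg.norm(x) * np.linalg.norm(x_hat)))
ratio = 32 / bits  # assumed float32 baseline, overhead ignored

print(f"cosine similarity: {cosine:.4f}, nominal compression: {ratio:.1f}x")
```

The rotation spreads a vector's energy evenly across coordinates, so a single scalar codebook can serve every coordinate. The exact cosine similarity printed depends on the seed and the fitted codebook, so treat it as indicative rather than as a reproduction of the reported numbers.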

Paper

Zandieh et al., ICLR 2026. arXiv:2504.19874