Tag
KV cache compression
2 articles

Research/Apr 29
TurboQuant brings near-optimal online vector quantization
TurboQuant is an online, accelerator-friendly vector quantizer that targets near-optimal MSE and inner-product distortion.

Tools & Apps/Apr 3
TurboQuant, Fast Cold Starts, and Rust on GPUs
TurboQuant cuts KV cache use by 4.6x, GPU state restoration slashes cold starts, and Rust is moving deeper into CUDA work.