Tag
vector quantization
Vector quantization compresses high-dimensional embeddings into compact codes, reducing memory and bandwidth in LLM KV caches, vector search, and inference pipelines. Recent work such as TurboQuant focuses on online, accelerator-friendly schemes that balance MSE, inner-product distortion, and throughput.
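To make the compression concrete, here is a minimal sketch of the idea (plain 8-bit scalar quantization with a per-vector scale, not TurboQuant's actual scheme): embeddings are mapped to compact integer codes, then reconstructed, and the MSE measures the distortion introduced.

```python
import numpy as np

# Illustration only: simple int8 quantization of float32 embeddings,
# not TurboQuant's algorithm. Each vector keeps one float scale.

def quantize_int8(x: np.ndarray):
    """Map each vector to int8 codes plus a per-vector scale."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)     # avoid divide-by-zero
    codes = np.round(x / scale).astype(np.int8)  # 4x smaller than float32
    return codes, scale

def dequantize(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float vectors from codes and scales."""
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 128)).astype(np.float32)
codes, scale = quantize_int8(emb)
recon = dequantize(codes, scale)
mse = float(np.mean((emb - recon) ** 2))
```

Schemes like TurboQuant pursue the same memory/distortion trade-off but with codebooks and rotations designed to bound both MSE and inner-product error at high throughput.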
2 articles

Research/Apr 29
TurboQuant brings near-optimal online vector quantization
TurboQuant is an online, accelerator-friendly vector quantizer that targets near-optimal MSE and inner-product distortion.

Research/Apr 3
Google's TurboQuant Cuts LLM Memory Costs
Google says TurboQuant uses QJL and PolarQuant to shrink vector-quantization memory and speed up LLM inference by up to 8x.