Tag
vector quantization
Vector quantization compresses high-dimensional embeddings into compact codes, reducing memory and bandwidth in LLM KV caches, vector search, and inference pipelines. Recent work such as TurboQuant focuses on online, accelerator-friendly schemes that balance MSE, inner-product distortion, and throughput.
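To make the compression concrete, here is a minimal sketch of the idea (plain 8-bit scalar quantization with a per-vector scale, not TurboQuant's actual scheme): embeddings are mapped to compact integer codes, then reconstructed, and the MSE measures the distortion introduced.

```python
import numpy as np

# Illustration only: simple int8 quantization of float32 embeddings,
# not TurboQuant's algorithm. Each vector keeps one float scale.

def quantize_int8(x: np.ndarray):
    """Map each vector to int8 codes plus a per-vector scale."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)     # avoid divide-by-zero
    codes = np.round(x / scale).astype(np.int8)  # 4x smaller than float32
    return codes, scale

def dequantize(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float vectors from codes and scales."""
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 128)).astype(np.float32)
codes, scale = quantize_int8(emb)
recon = dequantize(codes, scale)
mse = float(np.mean((emb - recon) ** 2))
```

Schemes like TurboQuant pursue the same memory/distortion trade-off but with codebooks and rotations designed to bound both MSE and inner-product error at high throughput.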
2 articles

Research/Apr 29
TurboQuant brings near-optimal online vector quantization
TurboQuant is an online, accelerator-friendly vector quantizer that targets near-optimal MSE and inner-product distortion.

Research/Apr 3
Google's TurboQuant Cuts LLM Memory Costs
Google says TurboQuant uses QJL and PolarQuant to shrink vector-quantization memory and speed up LLM inference by up to 8x.