Back to home

Tag

vector quantization

Vector quantization compresses high-dimensional embeddings into compact codes, reducing memory and bandwidth in LLM KV caches, vector search, and inference pipelines. Recent work such as TurboQuant focuses on online, accelerator-friendly schemes that balance MSE, inner-product distortion, and throughput.

2 articles