Tag
PolarQuant
PolarQuant is a vector-quantization approach aimed at reducing memory overhead in LLM inference, embedding storage, and approximate nearest-neighbor (ANN) search. It matters because a lower memory footprint can translate into faster serving, cheaper hardware, and more practical deployment of retrieval and search systems.
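To make the memory argument concrete, here is a minimal sketch of generic vector quantization, the family of techniques PolarQuant belongs to. This is not PolarQuant's actual algorithm; the codebook size, dimensions, and nearest-centroid assignment are illustrative assumptions. Each float vector is replaced by the index of its nearest codebook entry, shrinking per-vector storage from hundreds of bytes to one.

```python
import numpy as np

# Generic vector-quantization sketch (NOT PolarQuant's algorithm):
# store each vector as the index of its nearest codebook centroid.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 64)).astype(np.float32)  # 256 centroids, dim 64
vectors = rng.standard_normal((1000, 64)).astype(np.float32)  # data to compress

# Assign each vector to its nearest centroid (squared Euclidean distance).
dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
codes = dists.argmin(axis=1).astype(np.uint8)  # 1 byte per vector

# Approximate reconstruction is a codebook lookup.
reconstructed = codebook[codes]

# Memory: 1000 vectors * 64 dims * 4 bytes = 256,000 bytes of float32,
# compressed to 1,000 bytes of indices (plus the shared codebook).
print(vectors.nbytes, "->", codes.nbytes)
```

The 256x reduction here ignores the fixed codebook cost, which is amortized across all stored vectors; real schemes trade codebook size against reconstruction error.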
2 articles

Research/May 6
Why TurboQuant changes the KV cache debate
TurboQuant makes KV cache compression a theoretical win, not just an engineering trick.

Research/Apr 3
Google's TurboQuant Cuts LLM Memory Costs
Google says TurboQuant uses QJL and PolarQuant to shrink vector-quantization memory and speed up LLM inference by up to 8x.