Tag: TurboQuant
TurboQuant targets the KV-cache bottleneck in LLM inference, using low-bit and vector quantization to reduce memory pressure and serving cost. The topic also connects to QJL, PolarQuant, benchmark fairness, and citation disputes.
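For context on the articles below, here is a minimal sketch of the kind of low-bit KV-cache quantization this tag covers: per-token symmetric int8 quantization of a cached key/value tensor. This is illustrative NumPy under simple assumptions, not TurboQuant's published algorithm, and the tensor shapes are made up for the example.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Quantize float32 rows to int8, keeping one float32 scale per row."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale)  # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate float32 tensor from int8 values and scales."""
    return q.astype(np.float32) * scale

# Toy KV-cache slice: (tokens, head_dim) — shapes chosen for illustration only.
rng = np.random.default_rng(0)
kv = rng.standard_normal((64, 128)).astype(np.float32)

q, scale = quantize_int8(kv)
ratio = kv.nbytes / (q.nbytes + scale.nbytes)   # fp32 vs int8 + per-row scales
err = np.abs(dequantize(q, scale) - kv).max()
print(f"compression ~{ratio:.1f}x, max abs error {err:.3f}")
```

Plain int8 with per-row scales lands near 4x savings; the larger ratios claimed in the articles below come from lower bit widths and vector-quantization codebooks layered on top of this basic idea.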
9 articles

TurboQuant and the SEO Shift for Small Sites
TurboQuant is a rumored Google search system that could widen the pool of pages ranked, giving smaller sites a better shot.

TurboQuant vs FP8: vLLM’s first broad test
In vLLM’s first broad test, FP8 KV-cache quantization beat TurboQuant on speed, and TurboQuant’s most aggressive low-bit variants cost accuracy.

Why TurboQuant changes the KV cache debate
TurboQuant makes KV cache compression a theoretical win, not just an engineering trick.

TurboQuant, EDEN, and the citation fight
TurboQuant’s KV-cache quantization claims are under fire: EDEN authors say the paper reuses older ideas, weaker scales, and shaky benchmarks.

TurboQuant cuts memory use 6x without accuracy loss
Google Research’s TurboQuant claims 6x less memory and 8x faster inference with no accuracy loss, jolting AI inference economics.

TurboQuant Explained: Why Google’s New Paper Matters
Google’s TurboQuant paper targets KV cache bottlenecks with lower-bit quantization, aiming to cut LLM memory use and inference costs.

Google’s TurboQuant Cuts LLM Memory Costs
Google says TurboQuant draws on QJL and PolarQuant, using vector quantization to shrink KV-cache memory and speed up LLM inference by up to 8x.

TurboQuant, Fast Cold Starts, and Rust on GPUs
TurboQuant cuts KV cache use 4.6x, GPU state restoration slashes cold starts, and Rust is moving deeper into CUDA work.

TurboQuant Won’t Fix the Memory Crunch
Google’s TurboQuant can cut KV-cache memory use 6x, but longer contexts may keep DRAM and NAND demand climbing.