Tag

TurboQuant

TurboQuant targets the KV-cache bottleneck in LLM inference, using low-bit and vector quantization to reduce memory pressure and server cost. The topic also connects to QJL, PolarQuant, benchmark fairness, and citation disputes.

9 articles

TurboQuant and the SEO Shift for Small Sites
Research/May 15

TurboQuant is a rumored Google search system that could widen the pool of pages ranked, giving smaller sites a better shot.

TurboQuant vs FP8: vLLM’s first broad test
Research/May 15

vLLM found FP8 KV-cache quantization beats TurboQuant on speed, while TurboQuant’s strongest variants hurt accuracy.

Why TurboQuant changes the KV cache debate
Research/May 6

TurboQuant makes KV cache compression a theoretical win, not just an engineering trick.

TurboQuant, EDEN, and the citation fight
Research/Apr 29

TurboQuant’s KV-cache quantization claims are under fire: EDEN authors say the paper reuses older ideas, weaker scales, and shaky benchmarks.

TurboQuant cuts memory use 6x without accuracy loss
Research/Apr 3

Google Research’s TurboQuant claims 6x less memory and 8x faster inference with no accuracy loss, jolting AI inference economics.

TurboQuant Explained: Why Google’s New Paper Matters
Research/Apr 3

Google’s TurboQuant paper targets KV cache bottlenecks with lower-bit quantization, aiming to cut LLM memory use and inference costs.

Google’s TurboQuant Cuts LLM Memory Costs
Research/Apr 3

Google says TurboQuant uses QJL and PolarQuant to shrink vector-quantization memory and speed up LLM inference by up to 8x.

TurboQuant, Fast Cold Starts, and Rust on GPUs
Tools & Apps/Apr 3

TurboQuant cuts KV cache use 4.6x, GPU state restoration slashes cold starts, and Rust is moving deeper into CUDA work.

TurboQuant Won’t Fix the Memory Crunch
Research/Apr 2

Google’s TurboQuant can cut KV-cache memory use 6x, but longer contexts may keep DRAM and NAND demand climbing.