[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-turboquant":3},{"tag":4,"articles":11},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"d8bd452a-7bae-471e-99b0-b081e34f288d","TurboQuant","turboquant",13,"TurboQuant 聚焦 LLM 推論時最吃記憶體的 KV cache，透過低位元量化與向量量化降低佔用，進而壓低伺服器成本並提升吞吐量；同時也牽涉到 QJL、PolarQuant、benchmark 公平性與引用爭議。","TurboQuant targets the KV-cache bottleneck in LLM inference, using low-bit and vector quantization to reduce memory pressure and server cost. The topic also connects to QJL, PolarQuant, benchmark fairness, and citation disputes.",[12,21,28,35,42,49,56,63,71],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"94994abd-e24d-4fd1-b941-942d03d19acf","turboquant-seo-shift-small-sites-en","TurboQuant and the SEO Shift for Small Sites","TurboQuant is a rumored Google search system that could widen the pool of pages ranked, giving smaller sites a better shot.","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778840455122-jfce.png","en","2026-05-15T10:20:28.134545+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":17,"image_url":26,"cover_image":26,"language":19,"created_at":27},"670a7f69-911f-41e8-a18b-7d3491253a19","turboquant-vllm-comparison-fp8-kv-cache-en","TurboQuant vs FP8: vLLM’s first broad test","vLLM found FP8 KV-cache quantization beats TurboQuant on speed, while TurboQuant’s strongest variants hurt accuracy.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778839858405-b5ao.png","2026-05-15T10:10:37.219158+00:00",{"id":29,"slug":30,"title":31,"summary":32,"category":17,"image_url":33,"cover_image":33,"language":19,"created_at":34},"a259bf3b-e800-46fa-8550-605b5b8f4115","why-turboquant-changes-kv-cache-debate-en","Why TurboQuant changes the KV cache debate","TurboQuant makes KV cache compression a theoretical win, not just an engineering trick.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778016643980-zx6u.png","2026-05-05T21:30:24.349733+00:00",{"id":36,"slug":37,"title":38,"summary":39,"category":17,"image_url":40,"cover_image":40,"language":19,"created_at":41},"d7b529f2-02b7-4d5b-bf82-490aa5fe8362","turboquant-eden-citation-fight-en","TurboQuant, EDEN, and the citation fight","TurboQuant’s KV-cache quantization claims are under fire: EDEN authors say the paper reuses older ideas, weaker scales, and shaky benchmarks.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777467061610-ug4x.png","2026-04-29T12:50:47.131528+00:00",{"id":43,"slug":44,"title":45,"summary":46,"category":17,"image_url":47,"cover_image":47,"language":19,"created_at":48},"6c80feee-7f7d-4518-bd06-3c04b8c46054","turboquant-cuts-memory-use-without-accuracy-loss-en","TurboQuant cuts memory use 6x without accuracy loss","Google Research’s TurboQuant claims 6x less memory and 8x faster inference with no accuracy loss, jolting AI inference economics.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775161136573-e0cb.png","2026-04-02T20:18:39.999171+00:00",{"id":50,"slug":51,"title":52,"summary":53,"category":17,"image_url":54,"cover_image":54,"language":19,"created_at":55},"fdb997e1-6691-46c5-bb2d-e1ca3f730c25","turboquant-google-paper-explained-en","TurboQuant Explained: Why Google’s New Paper Matters","Google’s TurboQuant paper targets KV cache bottlenecks with lower-bit quantization, aiming to cut LLM memory use and inference costs.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775160958409-7jj5.png","2026-04-02T20:15:40.601225+00:00",{"id":57,"slug":58,"title":59,"summary":60,"category":17,"image_url":61,"cover_image":61,"language":19,"created_at":62},"6fd1f021-a7ca-4fa7-9aae-6ca84b22dc6c","googles-turboquant-cuts-llm-memory-costs-en","Google's TurboQuant Cuts LLM Memory Costs","Google says TurboQuant uses QJL and PolarQuant to shrink vector-quantization memory and speed up LLM inference by up to 8x.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775160776347-4esa.png","2026-04-02T20:12:32.387326+00:00",{"id":64,"slug":65,"title":66,"summary":67,"category":68,"image_url":69,"cover_image":69,"language":19,"created_at":70},"b2de41c7-a1bf-414d-b843-97a3d0d1283b","turboquant-fast-cold-starts-rust-gpu-en","TurboQuant, Fast Cold Starts, and Rust on GPUs","TurboQuant cuts KV cache use 4.6x, GPU state restoration slashes cold starts, and Rust is moving deeper into CUDA work.","tools","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775146375773-h6or.png","2026-04-02T16:12:38.879237+00:00",{"id":72,"slug":73,"title":74,"summary":75,"category":17,"image_url":76,"cover_image":76,"language":19,"created_at":77},"d4867ede-353b-4812-aac7-aebe28ef3613","turboquant-wont-fix-memory-crunch-en","TurboQuant Won’t Fix the Memory Crunch","Google’s TurboQuant can cut KV-cache memory use 6x, but longer contexts may keep DRAM and NAND demand climbing.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775132152400-1kew.png","2026-04-02T12:15:32.095995+00:00"]