[
  {"data": 1, "prerenderedAt": -1},
  ["ShallowReactive", 2],
  {"tag-polarquant": 3},
  {"tag": 4, "articles": 11},
  {"id": 5, "name": 6, "slug": 7, "article_count": 8, "description_zh": 9, "description_en": 10},
  "a8439559-f2d5-4d07-82cd-4fe0f24132f5",
  "PolarQuant",
  "polarquant",
  3,
  "PolarQuant 是一種向量量化與記憶體壓縮方法，常見於 LLM 推論、向量檢索與資料庫索引。它的重點在於降低 embedding 與權重的儲存成本，同時盡量保留搜尋與推論品質。",
  "PolarQuant is a vector-quantization approach aimed at reducing memory overhead in LLM inference, embedding storage, and ANN search. It matters because lower footprint can translate into faster serving, cheaper hardware, and more practical deployment of retrieval and search systems.",
  [12, 21],
  {"id": 13, "slug": 14, "title": 15, "summary": 16, "category": 17, "image_url": 18, "cover_image": 18, "language": 19, "created_at": 20},
  "a259bf3b-e800-46fa-8550-605b5b8f4115",
  "why-turboquant-changes-kv-cache-debate-en",
  "Why TurboQuant changes the KV cache debate",
  "TurboQuant makes KV cache compression a theoretical win, not just an engineering trick.",
  "research",
  "https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778016643980-zx6u.png",
  "en",
  "2026-05-05T21:30:24.349733+00:00",
  {"id": 22, "slug": 23, "title": 24, "summary": 25, "category": 17, "image_url": 26, "cover_image": 26, "language": 19, "created_at": 27},
  "6fd1f021-a7ca-4fa7-9aae-6ca84b22dc6c",
  "googles-turboquant-cuts-llm-memory-costs-en",
  "Google's TurboQuant Cuts LLM Memory Costs",
  "Google says TurboQuant uses QJL and PolarQuant to shrink vector-quantization memory and speed up LLM inference by up to 8x.",
  "https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775160776347-4esa.png",
  "2026-04-02T20:12:32.387326+00:00"
]