[
  {"data": 1, "prerenderedAt": -1},
  ["ShallowReactive", 2],
  {"tag-vector-quantization": 3},
  {"tag": 4, "articles": 11},
  {"id": 5, "name": 6, "slug": 7, "article_count": 8, "description_zh": 9, "description_en": 10},
  "f9a5a73c-df99-4e43-8252-6263761e4037",
  "vector quantization",
  "vector-quantization",
  4,
  "向量量化是把高維向量壓成更小表示的核心技術，常見於 LLM KV cache、向量搜尋與推論加速。近期焦點在 TurboQuant 這類線上量化方法，強調在 MSE、inner product 失真與記憶體成本之間取得更好的平衡。",
  "Vector quantization compresses high-dimensional embeddings into compact codes, reducing memory and bandwidth in LLM KV caches, vector search, and inference pipelines. Recent work such as TurboQuant focuses on online, accelerator-friendly schemes that balance MSE, inner-product distortion, and throughput.",
  [12, 21],
  {"id": 13, "slug": 14, "title": 15, "summary": 16, "category": 17, "image_url": 18, "cover_image": 18, "language": 19, "created_at": 20},
  "bc8a4577-e218-43ae-a08b-4898abf26e2a",
  "turboquant-online-vector-quantization-near-optimal-en",
  "TurboQuant brings near-optimal online vector quantization",
  "TurboQuant is an online, accelerator-friendly vector quantizer that targets near-optimal MSE and inner-product distortion.",
  "research",
  "https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777467656845-z759.png",
  "en",
  "2026-04-29T13:00:40.593903+00:00",
  {"id": 22, "slug": 23, "title": 24, "summary": 25, "category": 17, "image_url": 26, "cover_image": 26, "language": 19, "created_at": 27},
  "6fd1f021-a7ca-4fa7-9aae-6ca84b22dc6c",
  "googles-turboquant-cuts-llm-memory-costs-en",
  "Google's TurboQuant Cuts LLM Memory Costs",
  "Google says TurboQuant uses QJL and PolarQuant to shrink vector-quantization memory and speed up LLM inference by up to 8x.",
  "https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775160776347-4esa.png",
  "2026-04-02T20:12:32.387326+00:00"
]