[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-quantization":3},{"tag":4,"articles":10},{"id":5,"name":6,"slug":6,"article_count":7,"description_zh":8,"description_en":9},"5ae33974-b603-48f8-b294-3ce82f8ee748","quantization",3,"量化是把模型權重、KV cache 或啟動值壓縮成更低位元表示的技術，目標是在記憶體、延遲與成本之間取得平衡。從 4-bit 混合格式到針對 LLM 推論的低位元方案，它直接影響部署效率與可擴充性。","Quantization compresses model weights, activations, or KV cache into lower-bit formats to reduce memory and inference cost. Recent work spans 4-bit hybrid schemes and lower-bit LLM inference methods that target bottlenecks without sacrificing too much accuracy.",[11,20,28,35,43,50,57],{"id":12,"slug":13,"title":14,"summary":15,"category":16,"image_url":17,"cover_image":17,"language":18,"created_at":19},"034b5552-6ad2-4a5f-960c-870f30d7be22","5-turboquant-lessons-for-vector-search-teams-en","5 TurboQuant lessons for vector search teams","5 takeaways on Qdrant TurboQuant: how rotation changes compression, where recall holds up, and when safer quantizers fit better.","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780157892244-w7me.png","en","2026-05-30T16:17:39.721708+00:00",{"id":21,"slug":22,"title":23,"summary":24,"category":25,"image_url":26,"cover_image":26,"language":18,"created_at":27},"68b3843b-ea46-49f5-9c1c-7364193d5dc3","shannon-scaling-law-llm-overtraining-en","Shannon Scaling Law explains LLM overtraining","A Shannon-based scaling law explains why LLMs can get worse as compute rises under noise.","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779689763312-ksv4.png","2026-05-25T06:15:32.249486+00:00",{"id":29,"slug":30,"title":31,"summary":32,"category":25,"image_url":33,"cover_image":33,"language":18,"created_at":34},"aed5cbda-77cf-4dfe-8606-c8463a64403e","turboquant-quantization-accuracy-performance-study-en","TurboQuant shows how 4-bit beats guesswork","I break down TurboQuant’s quantization study into a practical playbook for choosing 8-bit, 4-bit, PTQ, or QAT.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779287080437-oo8r.png","2026-05-20T14:24:12.033527+00:00",{"id":36,"slug":37,"title":38,"summary":39,"category":40,"image_url":41,"cover_image":41,"language":18,"created_at":42},"49dbda12-d94e-4e41-99d0-200d57eb97a9","turboquant-vllm-kv-cache-3bit-storage-en","TurboQuant turns vLLM KV cache into 3-bit storage","I break down TurboQuant’s vLLM cache compression and give you a copy-ready setup for 3-bit KV cache and fallback paths.","tools","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779286502445-214g.png","2026-05-20T14:14:37.831446+00:00",{"id":44,"slug":45,"title":46,"summary":47,"category":25,"image_url":48,"cover_image":48,"language":18,"created_at":49},"6c80feee-7f7d-4518-bd06-3c04b8c46054","turboquant-cuts-memory-use-without-accuracy-loss-en","TurboQuant cuts memory use 6x without accuracy loss","Google Research’s TurboQuant claims 6x less memory and 8x faster inference with no accuracy loss, jolting AI inference economics.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775161136573-e0cb.png","2026-04-02T20:18:39.999171+00:00",{"id":51,"slug":52,"title":53,"summary":54,"category":25,"image_url":55,"cover_image":55,"language":18,"created_at":56},"fdb997e1-6691-46c5-bb2d-e1ca3f730c25","turboquant-google-paper-explained-en","TurboQuant Explained: Why Google’s New Paper Matters","Google’s TurboQuant paper targets KV cache bottlenecks with lower-bit quantization, aiming to cut LLM memory use and inference costs.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775160958409-7jj5.png","2026-04-02T20:15:40.601225+00:00",{"id":58,"slug":59,"title":60,"summary":61,"category":25,"image_url":62,"cover_image":62,"language":18,"created_at":63},"6954fa2b-8b66-4839-884b-e46f89fa1bc3","adaptive-block-scaled-data-types-en","IF4: Smarter 4-Bit Quantization That Adapts to Your Data","MIT researchers propose a hybrid data format that switches between floating-point and integer representations, improving accuracy in 4-bit neural network quantization.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1774939577640-6i9x.png","2026-03-31T06:00:36.65963+00:00"]