[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-quantization":3},{"tag":4,"articles":10},{"id":5,"name":6,"slug":6,"article_count":7,"description_zh":8,"description_en":9},"5ae33974-b603-48f8-b294-3ce82f8ee748","quantization",3,"Quantization is a technique that compresses model weights, KV cache, or activations into lower-bit representations, with the goal of balancing memory, latency, and cost. From 4-bit hybrid formats to low-bit schemes aimed at LLM inference, it directly affects deployment efficiency and scalability.","Quantization compresses model weights, activations, or KV cache into lower-bit formats to reduce memory and inference cost. Recent work spans 4-bit hybrid schemes and lower-bit LLM inference methods that target bottlenecks without sacrificing too much accuracy.",[11,20],{"id":12,"slug":13,"title":14,"summary":15,"category":16,"image_url":17,"cover_image":17,"language":18,"created_at":19},"04d71e47-d4ad-45bf-b678-5bcbdb1de0ee","shannon-scaling-law-llm-overtraining-zh","A Shannon Scaling Law Explains LLM Overtraining","This paper treats LLM training as noisy information transmission, explaining why a model can get worse under noise as compute increases.","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779689757133-oarp.png","zh","2026-05-25T06:15:31.356036+00:00",{"id":21,"slug":22,"title":23,"summary":24,"category":16,"image_url":25,"cover_image":25,"language":18,"created_at":26},"456ad15d-693b-4a13-8896-23d26e57c4de","turboquant-quantization-accuracy-performance-study-zh","TurboQuant Takes the Guesswork Out of 4-bit","I break the TurboQuant quantization study down into a selection workflow you can copy directly, to help you choose among 8-bit, 4-bit, PTQ, and QAT.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779287080305-zb4c.png","2026-05-20T14:24:10.883063+00:00"]