[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-量化":3},{"tag":4,"articles":10},{"id":5,"name":6,"slug":6,"article_count":7,"description_zh":8,"description_en":9},"a045e526-4ad5-4abd-afac-067f4f8cfd66","量化",5,"量化在 AI 推論裡多半指把權重或 KV cache 轉成更低位元表示，以換取更少記憶體、更低延遲與更高吞吐。近期焦點集中在 TurboQuant 這類方法，及其對長上下文、伺服器成本與 benchmark 公平性的影響。","Quantization in AI inference usually means storing weights or KV cache in lower-bit formats to cut memory use, latency, and cost. Recent coverage centers on TurboQuant-style methods and their trade-offs for long-context workloads, server economics, and benchmark fairness.",[11,20,28,36,43,50,57],{"id":12,"slug":13,"title":14,"summary":15,"category":16,"image_url":17,"cover_image":17,"language":18,"created_at":19},"06774dfe-08eb-4a53-a8f7-36389b462c2b","llama-3-1-70b-specs-benchmarks-deployment-zh","Llama 3.1 70B：規格與部署","Meta 的 Llama 3.1 70B 仍是 128K 長上下文的自架文字模型，適合內部聊天、RAG 與 API 編排，重點在成本控制與部署自主性。","model-release","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780395481064-5yri.png","zh","2026-06-02T10:17:33.072306+00:00",{"id":21,"slug":22,"title":23,"summary":24,"category":25,"image_url":26,"cover_image":26,"language":18,"created_at":27},"e4150272-a31a-45c4-b63c-91095bebfb82","5-turboquant-zh","5 個 TurboQuant 向量搜尋重點","5 個重點帶你看懂 TurboQuant 如何在向量搜尋中省記憶體、保品質，並判斷 4-bit、2-bit、標量與二值量化怎麼選。","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780157886592-x73h.png","2026-05-30T16:17:39.14006+00:00",{"id":29,"slug":30,"title":31,"summary":32,"category":33,"image_url":34,"cover_image":34,"language":18,"created_at":35},"4242e1bf-4f38-488d-9f92-ccb4f5b70319","turboquant-eden-citation-fight-zh","TurboQuant、EDEN 與引用爭議","TurboQuant 主打 KV-cache 6x 壓縮，卻被指和 DRIVE、EDEN 同源，還有 scale 選擇與 benchmark 
公平性爭議。","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777467063814-l8dk.png","2026-04-29T12:50:45.096442+00:00",{"id":37,"slug":38,"title":39,"summary":40,"category":33,"image_url":41,"cover_image":41,"language":18,"created_at":42},"82766fdc-4368-445d-bb4a-03377726df02","turboquant-cuts-memory-use-without-accuracy-loss-zh","TurboQuant 省 6 倍記憶體，還不掉準確率","Google Research 發表 TurboQuant，主打記憶體用量降到 1\u002F6、推論快 8 倍，且在報告測試中沒有準確率損失。這篇看它怎麼改 AI 伺服器成本。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775161134112-ftrj.png","2026-04-02T20:18:39.266389+00:00",{"id":44,"slug":45,"title":46,"summary":47,"category":33,"image_url":48,"cover_image":48,"language":18,"created_at":49},"fdb08bdf-a3bd-4c4d-acaf-ce8035f24449","turboquant-google-paper-explained-zh","TurboQuant 是什麼？Google 新論文重點","Google 的 TurboQuant 盯上 LLM 的 KV cache 瓶頸，用低位元量化降低記憶體用量與推論成本。這篇帶你看它在解什麼問題、和其他優化法差在哪。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775160957331-6iua.png","2026-04-02T20:15:40.07166+00:00",{"id":51,"slug":52,"title":53,"summary":54,"category":33,"image_url":55,"cover_image":55,"language":18,"created_at":56},"9d1ed0f2-aace-46ce-9b0a-0c0d8655e8e8","turboquant-wont-fix-memory-crunch-zh","TurboQuant 解不了記憶體荒","Google 的 TurboQuant 可把 KV-cache 記憶體用量降到 1\u002F6，但更長上下文、更多 agent 與更高吞吐，可能把 DRAM 和 NAND 
需求繼續往上推。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775132150405-6fvw.png","2026-04-02T12:15:31.810812+00:00",{"id":58,"slug":59,"title":60,"summary":61,"category":33,"image_url":62,"cover_image":62,"language":18,"created_at":63},"cf046742-efb2-4753-aef9-caed5da5e32e","adaptive-block-scaled-data-types-zh","IF4：神經網路量化的聰明選擇","MIT 研究團隊提出混合式資料格式，可在浮點與整數表示法間動態切換，改善 4 位元量化的精度。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1774939628942-3028.png","2026-03-31T06:00:36.990273+00:00"]