[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-turboquant-eden-citation-fight-zh":3,"tags-turboquant-eden-citation-fight-zh":33,"related-lang-turboquant-eden-citation-fight-zh":44,"related-posts-turboquant-eden-citation-fight-zh":48,"series-research-4242e1bf-4f38-488d-9f92-ccb4f5b70319":85},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":21,"translated_content":10,"views":22,"is_premium":23,"created_at":24,"updated_at":24,"cover_image":11,"published_at":25,"rewrite_status":26,"rewrite_error":10,"rewritten_from_id":27,"slug":28,"category":29,"related_article_id":30,"status":31,"google_indexed_at":32,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":10,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":23},"4242e1bf-4f38-488d-9f92-ccb4f5b70319","TurboQuant、EDEN 與引用爭議","\u003Cp>TurboQuant 一開始很吸睛。它主打 KV-cache 6x 壓縮。這種數字很容易讓人停下來看。因為在 LLM 推論裡，記憶體和延遲都很貴。\u003C\u002Fp>\u003Cp>但話題很快歪掉。爭議不在壓縮比，而在引用。EDEN 團隊直接說，TurboQuant 很像舊方法的縮小版。這種說法很刺耳，但也很常見。\u003C\u002Fp>\u003Cp>講白了，這不是只有學術圈在吵。KV-cache 壓縮會影響推論成本。也會影響 token throughput。對跑服務的團隊來說，差一點點就可能差很多。\u003C\u002Fp>\u003Ch2>TurboQuant 到底在做什麼\u003C\u002Fh2>\u003Cp>先講技術本體。\u003Ca href=\"https:\u002F\u002Fdocs.vllm.ai\u002Fen\u002Flatest\u002Fapi\u002Fvllm\u002Fmodel_executor\u002Flayers\u002Fquantization\u002Fturboquant.html\" target=\"_blank\" rel=\"noopener\">TurboQuant\u003C\u002Fa> 是拿來壓縮 transformer 推論時的 KV-cache。KV-cache 會存過去 token 的 key 和 value。這樣模型不用每次重算前文。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777467063814-l8dk.png\" alt=\"TurboQuant、EDEN 與引用爭議\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>問題是，context 越長，cache 
就越大。這會吃掉更多顯存。也會讓推論成本往上跑。於是大家開始玩量化，想把記憶體壓下來。\u003C\u002Fp>\u003Cp>TurboQuant 的爭議點，在於它看起來不像全新量化器。批評者說，它比較像舊方法的組合。只是寫法更簡單，說法更好懂。\u003C\u002Fp>\u003Cul>\u003Cli>TurboQuant 主打 KV-cache 壓縮。\u003C\u002Fli>\u003Cli>批評者說它像 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.02170\" target=\"_blank\" rel=\"noopener\">DRIVE\u003C\u002Fa> 的延伸。\u003C\u002Fli>\u003Cli>也有人說它和 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.15421\" target=\"_blank\" rel=\"noopener\">EDEN\u003C\u002Fa> 很接近。\u003C\u002Fli>\u003Cli>爭點集中在 scale 與 residual 設計。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>這裡的重點很現實。方法可以舊，但應用可以新。這沒問題。問題是，你不能把舊骨架包成新發明。尤其在 AI 論文裡，這種包裝太常見了。\u003C\u002Fp>\u003Cp>更麻煩的是，這種方法很難只看標題判斷。壓縮比看起來漂亮，不代表細節也漂亮。scale 怎麼設。bit 怎麼分。誤差怎麼累積。這些都會影響最後結果。\u003C\u002Fp>\u003Cp>所以，TurboQuant 真正讓人皺眉的，不是它有沒有用。是它到底新在哪裡。這個問題沒講清楚，後面所有數字都會變得很尷尬。\u003C\u002Fp>\u003Ch2>為什麼 EDEN 團隊會不爽\u003C\u002Fh2>\u003Cp>這場爭議的核心，是引用順序。\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.02170\" target=\"_blank\" rel=\"noopener\">DRIVE\u003C\u002Fa> 早在 2021 年就做了 post-rotation 的 distribution-aware quantization。後來的 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.15421\" target=\"_blank\" rel=\"noopener\">EDEN\u003C\u002Fa> 又把這套想法往前推。\u003C\u002Fp>\u003Cp>EDEN 團隊的說法很直接。他們認為 TurboQuant 只是更受限的版本。scale 選擇也比較弱。殘差量化的處理方式，還可能讓誤差更大。\u003C\u002Fp>\u003Cp>這種爭議在 ML 圈不稀奇。但它每次都會讓人火大。因為大家都知道，citation 不是裝飾品。它決定誰被看見。\u003C\u002Fp>\u003Cblockquote>“We were the first to introduce post-rotation distribution-aware quantization in 2021.”\u003C\u002Fblockquote>\u003Cp>這句話出自 HN 討論。意思很清楚。先做的人，想要被正確記住。這很合理。你辛苦寫出來的公式，不該被後來的包裝吃掉。\u003C\u002Fp>\u003Cp>我覺得這裡最刺的是，很多人會把「能跑」和「原創」混在一起。其實兩者差很多。能跑是工程。原創是論文脈絡。\u003C\u002Fp>\u003Cp>如果 TurboQuant 真的只是 EDEN 
的變體，那它就應該老實寫成變體。這不是小氣。這是基本職業道德。\u003C\u002Fp>\u003Cp>而且這件事不只關乎名聲。還關乎後面誰會接著做。引用錯了，研究路線也會跟著歪。\u003C\u002Fp>\u003Ch2>數字怎麼看才不會被帶風向\u003C\u002Fh2>\u003Cp>爭議裡最常被拿來講的，是 6x 壓縮。這個數字很大聲。可是大聲不等於公平。你要先看測試條件，再看比較對象。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777467076981-ix4b.png\" alt=\"TurboQuant、EDEN 與引用爭議\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>在 HN 討論裡，有人指出 TurboQuant 的 benchmark 不太對等。像是某些比較用了單核心 CPU。TurboQuant 那邊卻跑在 \u003Ca href=\"https:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fdata-center\u002Fa100\u002F\" target=\"_blank\" rel=\"noopener\">A100\u003C\u002Fa> GPU 上。這種比法很容易把結果弄歪。\u003C\u002Fp>\u003Cp>另外，社群也提到 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fjy-yuan\u002FKIVI\" target=\"_blank\" rel=\"noopener\">KIVI\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.17525\" target=\"_blank\" rel=\"noopener\">HIGGS\u003C\u002Fa>，還有 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.19392\" target=\"_blank\" rel=\"noopener\">Cache Me If You Must\u003C\u002Fa>。這些方法都在不同面向處理 KV-cache 或量化問題。\u003C\u002Fp>\u003Cul>\u003Cli>TurboQuant 主打 6x 壓縮。\u003C\u002Fli>\u003Cli>有說法稱 2-bit EDEN 在某些情境贏過 3-bit TurboQuant。\u003C\u002Fli>\u003Cli>也有人指出 EDEN 的 unbiased 設計更準。\u003C\u002Fli>\u003Cli>benchmark 可能混用了 CPU 和 GPU。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>這些數字放在一起看，味道就變了。若一個方法只是在特定硬體上贏，那它的實用價值就要打折。工程師最怕這種 paper win。\u003C\u002Fp>\u003Cp>再來是 reproducibility。OpenReview 上如果有人重跑不出來，那就很麻煩。因為推論系統不是寫作文。你不能只看圖漂亮。\u003C\u002Fp>\u003Cp>我自己的判斷很簡單。若 2-bit EDEN 在你的情境裡比 3-bit TurboQuant 還穩，那就別被標題騙了。實測比較重要。論文標語不會幫你省顯存。\u003C\u002Fp>\u003Ch2>這件事其實很像 AI 圈老毛病\u003C\u002Fh2>\u003Cp>TurboQuant 不是孤例。AI 圈很常把舊點子重新包裝。換個名字。換個圖表。換個 
benchmark。然後大家又開始轉貼。\u003C\u002Fp>\u003Cp>這種現象之所以多，是因為論文和產品節奏太快。研究者想發表。工程師想上線。新創想講故事。三方需求不一樣。\u003C\u002Fp>\u003Cp>結果就是，真正重要的內容常被包裝蓋掉。原始方法可能沒那麼會講故事。可是它可能更完整，也更值得引用。\u003C\u002Fp>\u003Cp>如果你是台灣的開發者，這件事很實際。你在選 LLM 推論方案時，不能只看壓縮比。還要看硬體、延遲、吞吐量、準確率，還有實作成本。\u003C\u002Fp>\u003Cp>像 \u003Ca href=\"https:\u002F\u002Fvllm.ai\u002F\" target=\"_blank\" rel=\"noopener\">vLLM\u003C\u002Fa> 這種推論框架，會把方法放進真正的服務路徑。這時候，理論上的小差異，會變成機房裡的電費差異。\u003C\u002Fp>\u003Cp>所以我會說，TurboQuant 的價值不一定在原創。它比較像一個案例。提醒大家：論文名字很會唬人，但資料和 benchmark 不會說謊。\u003C\u002Fp>\u003Ch2>接下來該怎麼看這類論文\u003C\u002Fh2>\u003Cp>如果你在評估 KV-cache 壓縮，先回頭看舊論文。先看 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2110.02170\" target=\"_blank\" rel=\"noopener\">DRIVE\u003C\u002Fa>。再看 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2206.15421\" target=\"_blank\" rel=\"noopener\">EDEN\u003C\u002Fa>。你會更容易看出 TurboQuant 到底改了什麼。\u003C\u002Fp>\u003Cp>接著，把比較條件對齊。相同 GPU。相同 bit width。相同 accuracy target。相同 context length。少一項，結果都可能變味。\u003C\u002Fp>\u003Cp>最後，別只看壓縮比。要一起看 latency、throughput、顯存占用，還有實作複雜度。講白了，能進 production 的方法，才算真的有用。\u003C\u002Fp>\u003Cp>我的預測很直接。這類爭議只會越來越多。因為 LLM 基礎設施越來越成熟，大家開始更在意 citation、benchmark 和 reproducibility。你下次看到一個很猛的數字時，先問一句：這是新東西，還是舊東西換包裝？\u003C\u002Fp>","TurboQuant 主打 KV-cache 6x 壓縮，卻被指和 DRIVE、EDEN 同源，還有 scale 選擇與 benchmark 公平性爭議。","news.ycombinator.com","https:\u002F\u002Fnews.ycombinator.com\u002Fitem?id=47916890",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777467063814-l8dk.png",[13,14,15,16,17,18,19,20],"TurboQuant","EDEN","DRIVE","KV-cache","量化","LLM推論","benchmark","citation爭議","zh",1,false,"2026-04-29T12:50:45.096442+00:00","2026-04-29T12:50:44.936+00:00","done","e229803f-e26a-46a8-8172-e0029649c09d","turboquant-eden-citation-fight-zh","research","d7b529f2-02b7-4d5b-bf82-490aa5fe8362","published","2026-04-30T09:00:08.17+00:00",[34,36,39,40,42],{"name":14,"slug":35},"eden",{"name":37,"slug":38},"KV 
cache","kv-cache",{"name":17,"slug":17},{"name":13,"slug":41},"turboquant",{"name":15,"slug":43},"drive",{"id":30,"slug":45,"title":46,"language":47},"turboquant-eden-citation-fight-en","TurboQuant, EDEN, and the citation fight","en",[49,55,61,67,73,79],{"id":50,"slug":51,"title":52,"cover_image":53,"image_url":53,"created_at":54,"category":29},"667b72b6-e821-4d68-80a1-e03340bc85f1","turboquant-seo-shift-small-sites-zh","TurboQuant 與小站 SEO 變化","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778840440690-kcw9.png","2026-05-15T10:20:27.319472+00:00",{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":29},"381fb6c6-6da7-4444-831f-8c5eed8d685c","turboquant-vllm-comparison-fp8-kv-cache-zh","TurboQuant 與 FP8 實測結果","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778839867551-4v9g.png","2026-05-15T10:10:36.034569+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":29},"c15f45ee-a548-4dbf-8152-91de159c1a11","llmbda-calculus-agent-safety-rules-zh","LLMbda 演算替 AI 代理人立安全規則","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778825503412-mlbf.png","2026-05-15T06:10:34.832664+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":29},"0c02225c-d6ff-44f8-bc92-884c8921c4a3","low-complexity-beamspace-denoiser-mmwave-mimo-zh","更簡單的毫米波波束域去噪器","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778814650361-xtc2.png","2026-05-15T03:10:30.06639+00:00",{"id":74,"slug":75,"title":76,"cover_image":77,"image_url":77,"created_at":78,"category":29},"9d27f967-62cc-433f-8cdb-9300937ade13","ai-benchmark-wins-cyber-scare-defenders-zh","為什麼 AI 
基準賽在資安領域的勝利，應該讓防守方警醒","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778807450006-nofx.png","2026-05-15T01:10:29.379041+00:00",{"id":80,"slug":81,"title":82,"cover_image":83,"image_url":83,"created_at":84,"category":29},"bc402dc6-5da6-46fc-9d66-d09cb215f72b","why-linux-security-needs-patch-wave-mindset-zh","為什麼 Linux 安全需要「補丁浪潮」思維","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778741449813-s2wn.png","2026-05-14T06:50:24.052583+00:00",[86,91,96,101,106,111,116,121,126,131],{"id":87,"slug":88,"title":89,"created_at":90},"f18dbadb-8c59-4723-84a4-6ad22746c77a","deepmind-bets-on-continuous-learning-ai-2026-zh","DeepMind 押注 2026 連續學習 AI","2026-03-26T08:16:02.367355+00:00",{"id":92,"slug":93,"title":94,"created_at":95},"f4a106cb-02a6-4508-8f39-9720a0a93cee","ml-papers-of-the-week-github-research-desk-zh","每週 ML 論文清單，為何紅到 GitHub","2026-03-27T01:11:39.284175+00:00",{"id":97,"slug":98,"title":99,"created_at":100},"c4f807ca-4e5f-47f1-a48c-961cf3fc44dc","ai-ml-conferences-to-watch-in-2026-zh","2026 AI 研討會投稿時程整理","2026-03-27T01:51:53.874432+00:00",{"id":102,"slug":103,"title":104,"created_at":105},"9f50561b-aebd-46ba-94a8-363198aa7091","openclaw-agents-manipulated-self-sabotage-zh","OpenClaw Agent 會自己搞砸自己","2026-03-28T03:03:18.786425+00:00",{"id":107,"slug":108,"title":109,"created_at":110},"11f22e92-7066-4978-a544-31f5f2156ec6","vega-learning-to-drive-with-natural-language-instructions-zh","Vega：使用自然語言指示進行自駕車控制","2026-03-28T14:54:04.847912+00:00",{"id":112,"slug":113,"title":114,"created_at":115},"a4c7cfec-8d0e-4fec-93cf-1b9699a530b8","drive-my-way-en-zh","Drive My Way：個性化自駕車風格的實現","2026-03-28T14:54:26.207495+00:00",{"id":117,"slug":118,"title":119,"created_at":120},"dec02f89-fd39-41ba-8e4d-11ede93a536d","training-knowledge-bases-with-writeback-rag-zh","用 WriteBack-RAG 
強化知識庫提升檢索效能","2026-03-28T14:54:45.775606+00:00",{"id":122,"slug":123,"title":124,"created_at":125},"3886be5c-a137-40cc-b9e2-0bf18430c002","packforcing-efficient-long-video-generation-method-zh","PackForcing：短影片訓練也能生成長影片","2026-03-28T14:55:02.688141+00:00",{"id":127,"slug":128,"title":129,"created_at":130},"72b90667-d930-4cc9-8ced-aaa0f8968d44","pixelsmile-toward-fine-grained-facial-expression-editing-zh","PixelSmile：提升精細臉部表情編輯的新方法","2026-03-28T14:55:20.678181+00:00",{"id":132,"slug":133,"title":134,"created_at":135},"cf046742-efb2-4753-aef9-caed5da5e32e","adaptive-block-scaled-data-types-zh","IF4：神經網路量化的聰明選擇","2026-03-31T06:00:36.990273+00:00"]