<h1>TurboQuant, EDEN, and the citation fight</h1>
<p>TurboQuant entered the conversation with a bold claim: 6x compression for KV-cache quantization. But the debate around it quickly moved away from compression ratios and into a more uncomfortable question for ML research: who actually did the underlying work first?</p>
<p>On Hacker News, the authors behind <a href="https://arxiv.org/abs/2604.18555" target="_blank" rel="noopener">a new note</a> argued that TurboQuant is a restricted version of <a href="https://arxiv.org/abs/2110.02170" target="_blank" rel="noopener">DRIVE</a> and <a href="https://arxiv.org/abs/2206.15421" target="_blank" rel="noopener">EDEN</a>, with weaker scale choices and less accurate results. That is a strong accusation, and it matters because this is not a tiny corner of the field.</p>
<p>KV-cache compression affects inference cost, latency, and memory pressure in large language models.</p>
<h2>What TurboQuant actually claims</h2>
<p>TurboQuant is about compressing the KV cache used during transformer inference. That cache stores past key and value vectors so the model can attend to earlier tokens without recomputing everything. The tradeoff is simple: the longer the context, the more memory the cache consumes, and the more room there is for quantization to cut memory and cost.</p>
<figure class="my-6"><img src="https://xxdpdyhzhpamafnrdkyq.supabase.co/storage/v1/object/public/covers/inline-1777467061610-ug4x.png" alt="TurboQuant, EDEN, and the citation fight" class="rounded-xl w-full" loading="lazy" /></figure>
<p>The controversy starts because TurboQuant’s core method does not appear to be a clean new quantizer. In the HN thread, the EDEN authors say the paper uses an older quantization recipe, then fixes the scale parameter in a way that is easier to describe but worse in practice. They also say the paper mixes a biased multi-bit step with an unbiased 1-bit residual step, which creates extra error compared with using EDEN directly.</p>
<p>Here are the main claims being debated:</p>
<ul><li><strong>TurboQuant is framed as new</strong>, but critics say it is a restricted EDEN variant.</li><li><strong>The scale choice is fixed</strong>, while EDEN had derived better scale settings.</li><li><strong>The residual quantization step is weaker</strong> than the unbiased EDEN setup.</li><li><strong>The paper’s headline compression story</strong> may overstate how much is actually novel.</li></ul>
<p>That last point is where the argument gets spicy.</p>
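<p>The memory tradeoff is easy to quantify with back-of-envelope arithmetic. The sketch below estimates KV-cache size from model shape; the layer counts, head dimensions, and context length are illustrative assumptions, not figures from the TurboQuant paper.</p>

```python
# Back-of-envelope KV-cache sizing. The model shape used here (32 layers,
# 32 KV heads, head_dim 128, 32k context) is an illustrative assumption,
# not a number taken from the paper.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bits: int) -> int:
    """Bytes needed to cache keys and values for seq_len tokens."""
    per_token = 2 * layers * kv_heads * head_dim * bits / 8  # 2 = K and V
    return int(per_token * seq_len * batch)

fp16 = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                      seq_len=32_768, batch=1, bits=16)
print(f"fp16 KV cache: {fp16 / 2**30:.1f} GiB")   # grows linearly with context
print(f"at 6x compression: {fp16 / 6 / 2**30:.1f} GiB")
```

<p>The linear growth in sequence length is why the cache, not the weights, dominates memory at long context, and why even modest compression ratios translate into real capacity.</p>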
<p>If a paper presents a familiar method with a new application, that can still be useful work. But if the paper reads like a fresh algorithm while borrowing most of its machinery from earlier papers, the citation trail matters a lot more.</p>
<p>And this is where the HN discussion became unusually specific. One commenter noted that the TurboQuant paper cites <a href="https://arxiv.org/abs/2411.17525" target="_blank" rel="noopener">HIGGS</a> and <a href="https://arxiv.org/abs/2501.19392" target="_blank" rel="noopener">Cache Me If You Must</a>, while another pointed out that the older EDEN papers already cover the same post-rotation quantization ideas more directly. The dispute is no longer about whether the method works at all. It is about how much of it was already known.</p>
<h2>Why the prior work matters</h2>
<p>To understand the criticism, you need the short version of the lineage. <a href="https://arxiv.org/abs/2110.02170" target="_blank" rel="noopener">DRIVE</a> introduced post-rotation distribution-aware quantization in 2021, and <a href="https://arxiv.org/abs/2206.15421" target="_blank" rel="noopener">EDEN</a> extended that idea to more bit widths and scale settings. The HN thread says those papers already gave the right derivations for choosing scales, while TurboQuant used a simpler but weaker version.</p>
<p>That matters because quantization papers can look similar on the surface while differing a lot in the details. A fixed scale can be easier to implement. A derived optimal scale can improve mean squared error. A biased quantizer can behave differently from an unbiased one.</p>
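<p>The bias distinction is concrete. Round-to-nearest maps a given value to the same grid point every time, so its error never averages out; stochastic rounding is correct in expectation. A minimal sketch with toy scalar quantizers, standing in for the general idea rather than either paper's actual algorithm:</p>

```python
import math
import random

def quantize_nearest(x: float, step: float) -> float:
    """Deterministic round-to-nearest: any fixed off-grid x gets a
    systematic error that never averages out (biased)."""
    return round(x / step) * step

def quantize_stochastic(x: float, step: float, rng: random.Random) -> float:
    """Stochastic rounding: E[q(x)] == x (unbiased). A toy stand-in for
    the unbiased step in EDEN-style schemes, not the actual method."""
    lo = math.floor(x / step) * step
    p_up = (x - lo) / step          # probability of rounding up
    return lo + step if rng.random() < p_up else lo

rng = random.Random(0)
x, step = 0.3, 1.0
est = sum(quantize_stochastic(x, step, rng) for _ in range(100_000)) / 100_000
print("nearest:", quantize_nearest(x, step))   # always 0.0, bias -0.3
print("stochastic mean:", round(est, 3))       # converges to 0.3
```

<p>Per sample, the unbiased quantizer is noisier, but its errors cancel when many quantized values are combined, which is why mixing a biased step into the pipeline can change the end-to-end error behavior.</p>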
<p>Once you start chaining these choices together, the error budget changes quickly.</p>
<blockquote>“We were the first to introduce post-rotation distribution-aware quantization in 2021. This was later implemented in many fields, including federated learning, vector retrieval, databases, inference engines, and KV-cache.”</blockquote>
<p>That quote from the Hacker News discussion captures the real issue: credit. In research, being first is not just a vanity metric. It affects who gets cited, who gets invited to speak, and which papers become the base layer for later systems work.</p>
<p>The same thread also points to a broader pattern that ML folks know too well. A method can be rediscovered in a product blog, a benchmark repo, or an implementation note, then relabeled as if it arrived from nowhere. When that happens, the original paper often gets less attention than the derivative version with better packaging.</p>
<p>There is also a technical reason to care. If TurboQuant is really a weaker version of EDEN, then anyone choosing it for production inference may be leaving performance on the table. In a memory-sensitive system, a small quantization penalty can turn into higher latency, lower throughput, or both.</p>
<h2>Comparing the numbers and the benchmarks</h2>
<p>The strongest criticism in the thread is not philosophical. It is numerical. One commenter said TurboQuant’s “6x compression” headline is hard to compare with earlier KV-cache baselines like <a href="https://github.com/jy-yuan/KIVI" target="_blank" rel="noopener">KIVI</a>, and that the paper’s RaBitQ comparison used a single-core CPU for the baseline but an <a href="https://www.nvidia.com/en-us/data-center/a100/" target="_blank" rel="noopener">A100</a> GPU for TurboQuant.</p>
<p>That is not a fair benchmark setup.</p>
<figure class="my-6"><img src="https://xxdpdyhzhpamafnrdkyq.supabase.co/storage/v1/object/public/covers/inline-1777467061521-c3tg.png" alt="TurboQuant, EDEN, and the citation fight" class="rounded-xl w-full" loading="lazy" /></figure>
<p>Here is the comparison as described in the discussion and linked note:</p>
<ul><li><strong>TurboQuant headline:</strong> 6x compression for KV-cache.</li><li><strong>EDEN comparison:</strong> the note says 2-bit EDEN beats 3-bit TurboQuant in some settings.</li><li><strong>Accuracy claim:</strong> the note says unbiased EDEN is often better than TurboQuant by more than a full bit of precision.</li><li><strong>Benchmark setup concern:</strong> one baseline reportedly ran on a single CPU core while TurboQuant used an A100 GPU.</li></ul>
<p>Those numbers are enough to change how you read the paper. If a method wins on a benchmark because it gets a better device, a better implementation, or a friendlier comparison target, the headline result stops being useful for engineers.</p>
<p>The OpenReview thread linked in the HN comments adds another layer: reproducibility concerns. If reported accuracy cannot be reproduced cleanly, then even a nice compression ratio is only half a result. Engineers need methods they can test, not just methods that look good in a PDF.</p>
<p>This is also where the <a href="https://vllm.ai/" target="_blank" rel="noopener">vLLM</a> implementation note becomes interesting.</p>
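<p>It helps to translate the headline into a bit budget. Assuming an fp16 baseline and per-group scale metadata (illustrative numbers, not figures from TurboQuant, EDEN, or KIVI), the arithmetic looks like this:</p>

```python
# Translate a compression headline into bits per value. The fp16 baseline
# and per-group scale overhead are assumptions for illustration, not
# numbers taken from any of the papers under discussion.

def effective_bits(payload_bits: float, group_size: int,
                   scale_bits: int = 16) -> float:
    """Bits per value including amortized per-group scale metadata."""
    return payload_bits + scale_bits / group_size

def compression_vs_fp16(bits_per_value: float) -> float:
    return 16.0 / bits_per_value

bits = effective_bits(payload_bits=2.5, group_size=128)  # e.g. ~2-bit plus a residual
print(f"{bits:.3f} effective bits -> {compression_vs_fp16(bits):.1f}x vs fp16")
```

<p>Against an fp16 baseline, a 6x target leaves roughly 2.7 bits per value including metadata, which is exactly why 2-bit versus 3-bit comparisons dominate this argument.</p>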
<p>The docs for <a href="https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/turboquant.html" target="_blank" rel="noopener">TurboQuant in vLLM</a> describe the technique as a scalar case of HIGGS-style quantization applied to KV-cache compression. That framing suggests the idea is already part of a larger family of methods, which makes the “new invention” story even harder to defend.</p>
<h2>What this says about AI research right now</h2>
<p>This story is bigger than one paper. It shows how easily a method can be recast when it moves from theory papers into systems code, benchmark repos, or a blog post with a catchy name. The original math may be old, the implementation may be useful, and the packaging may be what gets attention.</p>
<p>That creates a weird incentive structure. If you are a researcher, you want your work to be cited correctly. If you are an engineer, you want the method that actually performs best under real constraints. If you are a startup or infrastructure team, you want the version that is easiest to ship. Those goals overlap, but they are not the same thing.</p>
<p>My read is simple: TurboQuant may still be useful as an implementation story, but it should be discussed as part of the EDEN/DRIVE lineage, not as a fresh break from prior work. If the HN critique holds up under review, the paper will be remembered less for its compression ratio and more as a case study in weak attribution and shaky benchmarking.</p>
<p>For teams working on KV-cache compression today, the practical move is to check the older papers first, then compare on the same hardware, same bit width, and same accuracy target. If 2-bit EDEN really beats 3-bit TurboQuant in your setup, that is the result that should drive the decision.</p>
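<p>A same-conditions comparison does not need to be elaborate. The toy harness below scores a plain round-to-nearest uniform quantizer on identical data at several bit widths, sweeping the clipping scale for each. It is a generic stand-in for whatever candidates you are evaluating, not the method from either paper, but it shows both the like-for-like setup and why a tuned scale matters per bit width:</p>

```python
import random

def uniform_quant_mse(xs, bits, clip):
    """MSE of round-to-nearest uniform quantization onto [-clip, clip].
    A generic toy quantizer, not TurboQuant or EDEN."""
    levels = 2 ** bits - 1
    step = 2 * clip / levels
    total = 0.0
    for x in xs:
        c = min(max(x, -clip), clip)                # clip to the range
        q = round((c + clip) / step) * step - clip  # snap to the grid
        total += (q - x) ** 2
    return total / len(xs)

rng = random.Random(0)
# Post-rotation coordinates look roughly Gaussian, hence the test data.
xs = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
for bits in (2, 3, 4):
    mse, clip = min((uniform_quant_mse(xs, bits, c / 4), c / 4)
                    for c in range(2, 17))          # sweep clip in 0.5..4.0
    print(f"{bits}-bit: best clip {clip:.2f}, MSE {mse:.4f}")
```

<p>The best clip shifts with bit width, which is the small-scale version of the thread's point: a scale fixed once for convenience leaves error on the table compared with a scale derived per setting.</p>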
<p>The next question is whether the community starts treating citation hygiene as part of model quality, or keeps separating the math from the paper trail.</p>
<p><em>Source: <a href="https://news.ycombinator.com/item?id=47916890" target="_blank" rel="noopener">Hacker News discussion</a>. Published 2026-04-30.</em></p>