[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-turboquant-vllm-comparison-fp8-kv-cache-zh":3,"tags-turboquant-vllm-comparison-fp8-kv-cache-zh":36,"related-lang-turboquant-vllm-comparison-fp8-kv-cache-zh":47,"related-posts-turboquant-vllm-comparison-fp8-kv-cache-zh":51,"series-research-381fb6c6-6da7-4444-831f-8c5eed8d685c":88},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":19,"translated_content":10,"views":20,"is_premium":21,"created_at":22,"updated_at":22,"cover_image":11,"published_at":23,"rewrite_status":24,"rewrite_error":10,"rewritten_from_id":25,"slug":26,"category":27,"related_article_id":28,"status":29,"google_indexed_at":10,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":30,"topic_cluster_id":34,"embedding":35,"is_canonical_seed":21},"381fb6c6-6da7-4444-831f-8c5eed8d685c","TurboQuant 與 FP8 實測結果","\u003Cp data-speakable=\"summary\">v\u003Ca href=\"\u002Fnews\u002Fllmbda-calculus-agent-safety-rules-zh\">LLM\u003C\u002Fa> 首次大規模比較 \u003Ca href=\"\u002Ftag\u002Fturboquant\">TurboQuant\u003C\u002Fa> 與 FP8 KV-cache。結果很直白：FP8 在速度上更穩，TurboQuant 的高壓縮版本則常掉準確率。\u003C\u002Fp>\u003Cp>\u003Ca href=\"https:\u002F\u002Fvllm.ai\u002Fblog\u002Fturboquant\" target=\"_blank\" rel=\"noopener\">vLLM\u003C\u002Fa> 在 2026 年 5 月 11 日發文。它把 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.08671\" target=\"_blank\" rel=\"noopener\">TurboQuant\u003C\u002Fa> 拉進真實服務場景測。測了 4 個變體、4 個模型、5 個 \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa>。還同時比了 BF16 和 \u003Ca href=\"https:\u002F\u002Fvllm.ai\u002Fblog\u002Ffp8-kv-cache\" target=\"_blank\" rel=\"noopener\">FP8 KV-cache\u003C\u002Fa>。\u003C\u002Fp>\u003Cp>講白了，這不是小 demo。這是看伺服器真的扛不扛得住。KV-cache 一旦進到\u003Ca href=\"\u002Ftag\u002F長上下文\">長上下文\u003C\u002Fa>、高併發、記憶體吃緊的場景，速度和準確率就會一起露餡。\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>方法\u003C\u002Fth>\u003Cth>KV-cache 容量\u003C\u002Fth>\u003Cth>延遲影響\u003C\u002Fth>\u003Cth>吞吐影響\u003C\u002Fth>\u003Cth>準確率訊號\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>FP8\u003C\u002Ftd>\u003Ctd>2x\u003C\u002Ftd>\u003Ctd>幾乎沒有\u003C\u002Ftd>\u003Ctd>接近 BF16\u003C\u002Ftd>\u003Ctd>接近基準\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>TurboQuant k8v4\u003C\u002Ftd>\u003Ctd>2.4x\u003C\u002Ftd>\u003Ctd>慢 10% 到 68%\u003C\u002Ftd>\u003Ctd>BF16 的 80% 到 75%\u003C\u002Ftd>\u003Ctd>接近基準\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>TurboQuant 4bit-nc\u003C\u002Ftd>\u003Ctd>2.3x 到 3.7x\u003C\u002Ftd>\u003Ctd>有明顯變慢\u003C\u002Ftd>\u003Ctd>約 BF16 的 75%\u003C\u002Ftd>\u003Ctd>有中度下降\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>TurboQuant k3v4-nc \u002F 3bit-nc\u003C\u002Ftd>\u003Ctd>高於 FP8\u003C\u002Ftd>\u003Ctd>最慢\u003C\u002Ftd>\u003Ctd>BF16 的 66% 到 73%\u003C\u002Ftd>\u003Ctd>下降明顯\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>TurboQuant 為什麼會紅\u003C\u002Fh2>\u003Cp>TurboQuant 的做法很直接。它把 KV-cache 壓到 3 到 4 bit。之後再解量化回 BF16，才能做 attention。這跟 FP8 很不一樣。FP8 是直接存 FP8，attention 也能跑 FP8 Tensor \u003Ca href=\"\u002Fnews\u002Fnvidia-backs-corning-factories-with-billions-zh\">Cor\u003C\u002Fa>e。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778839867551-4v9g.png\" alt=\"TurboQuant 與 FP8 實測結果\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>差別就在這裡。TurboQuant 
## How much accuracy actually drops

The accuracy picture is less uniform, but the trend is clear. FP8 and TurboQuant k8v4 mostly stay close to the original baseline. That means they are still usable; at least they will not warp your answers the moment they go live.

4bit-nc is where it starts to bite. It is still in a debatable range: if your server is genuinely memory-bound, this variant may be worth a try. But push down to k3v4-nc or 3bit-nc and the score drops become obvious.

On long-context retrieval, Llama-3.3-70B-Instruct at 128k context is representative. BF16 averages about 98% recovery. 4bit-nc lands around 96%. k3v4-nc and 3bit-nc lose roughly 20 points.

> "FP8 via --kv-cache-dtype fp8 remains the best default for KV-cache quantization." — vLLM blog, 2026-05-11

That is as blunt as it gets, and I think it is reasonable. If you want to protect accuracy while still saving some memory, FP8 is the safest default. TurboQuant is a special-situation tool, not a general answer.

- Long-context retrieval was tested up to each model's maximum supported length
- Accuracy is average pass@1 over 5 repeated runs
- k3v4-nc and 3bit-nc drop roughly 20 points on the hardest long-context cases
- On MiniMax-M2.7, the aggressive variants lose up to about 8 points on AIME25 and LiveCodeBench-v6
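If you want to follow that recommendation, enabling the FP8 KV-cache in vLLM is a one-flag change. A minimal sketch, assuming a recent vLLM install; the `kv_cache_dtype` keyword mirrors the `--kv-cache-dtype fp8` flag quoted above, and the model and prompt here are placeholders:

```python
from vllm import LLM, SamplingParams

# Store K and V in FP8 instead of the model dtype. Equivalent to launching
# a server with:  vllm serve <model> --kv-cache-dtype fp8
llm = LLM(
    model="Qwen/Qwen3-30B-A3B-Instruct-2507",
    kv_cache_dtype="fp8",
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Summarize KV-cache quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```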
## On speed, TurboQuant loses more clearly

The speed results are less flattering. vLLM measured latency with 1,024 input tokens and 256 output tokens, sweeping batch sizes of 1, 8, 32, and 64. FP8 carries almost no overhead. TurboQuant does not get off so lightly (a rough sketch of this sweep follows the list below).

![TurboQuant vs FP8 test results](https://xxdpdyhzhpamafnrdkyq.supabase.co/storage/v1/object/public/covers/inline-1778839865020-3hoj.png)

On Qwen3-30B-A3B-Instruct-2507, TurboQuant's latency overhead lands roughly between 10% and 60%. On Llama-3.3-70B-Instruct the range is wider, about 10% to 68%. And the overhead climbs as batch size grows, which is exactly what a serving team does not want.

Throughput tells the same story. FP8 stays close to BF16 on both models. TurboQuant sits a tier lower: 73% to 80% of BF16 on Qwen3-30B, and 66% to 75% on Llama-3.3-70B.

This means one thing: capacity saved in the KV-cache does not automatically become faster serving. Add the dequantization cost back in and the whole equation changes.

- Latency runs used 10 warmup iterations and 30 measured iterations
- Throughput runs used 200 prompts with input/output token mixes of 256/256, 1024/512, and 4096/256
- vLLM version was 0.20.2, commit [6ec9bbec3](https://github.com/vllm-project/vllm/commit/6ec9bbec3)
- FP8 tracked BF16 on both latency and throughput; TurboQuant consistently trailed
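For teams who want to sanity-check these numbers on their own hardware, here is a rough reproduction sketch of the latency sweep. This is my own scaffolding, not the blog's harness: the `"word " * 1024` prompt is only an approximate stand-in for an exactly-1,024-token input, and `ignore_eos` is used to force the full 256 output tokens.

```python
import time
from vllm import LLM, SamplingParams

# Sweep the batch sizes from the post: 1, 8, 32, 64, with
# 10 warmup iterations and 30 measured iterations each.
llm = LLM(model="Qwen/Qwen3-30B-A3B-Instruct-2507", kv_cache_dtype="fp8")
params = SamplingParams(max_tokens=256, ignore_eos=True)
prompt = "word " * 1024  # approximate 1,024-token input

for batch_size in (1, 8, 32, 64):
    prompts = [prompt] * batch_size
    for _ in range(10):                      # warmup iterations
        llm.generate(prompts, params)
    start = time.perf_counter()
    for _ in range(30):                      # measured iterations
        llm.generate(prompts, params)
    avg = (time.perf_counter() - start) / 30
    print(f"batch={batch_size}: {avg:.2f} s per generate call")
```

Swap `kv_cache_dtype` between `"fp8"` and the model dtype to see the overhead gap on your own GPUs.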
## What this means for real deployments

The most practical conclusion from this test: TurboQuant is a memory tool, not a performance tool. If your model serving is genuinely stuck on KV-cache capacity, and you can accept being slower, TurboQuant 4bit-nc is worth trying first. If you care about latency, throughput, and accuracy at the same time, FP8 is the cleaner call.

There is also a hardware angle that matters. FP8 gets native Tensor Core support on modern NVIDIA GPUs. TurboQuant has to unpack its low-bit data before attention can run, and that unpacking is usually where the slowdown lives.

So the value of this post for engineering teams is not a pretty chart; it is a smaller search space. Try FP8 first. Only reach for TurboQuant when memory still is not enough. If you plan to deploy on [H100](https://www.nvidia.com/en-us/data-center/h100/), the question is not whether TurboQuant can save space. The question is whether you are willing to trade speed and accuracy for that space (a back-of-envelope sizing sketch below shows what that space is worth).
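To check whether you are actually capacity-bound, the standard back-of-envelope formula is: bytes per token = 2 (K and V) x layers x KV heads x head dim x bytes per element. A minimal sketch with illustrative numbers; the config values below are assumptions, not any specific model's card, and the 0.56 bytes for 4-bit is an approximation that includes scale overhead:

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: float) -> float:
    # The factor of 2 accounts for storing both K and V.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

# Illustrative config (assumed, not a specific model card).
layers, kv_heads, head_dim = 80, 8, 128
for name, nbytes in [("BF16", 2.0), ("FP8", 1.0), ("4-bit (+scales, approx.)", 0.56)]:
    per_token = kv_bytes_per_token(layers, kv_heads, head_dim, nbytes)
    budget_tokens = 40e9 / per_token  # tokens that fit in a 40 GB cache budget
    print(f"{name}: {per_token / 1024:.0f} KiB/token, "
          f"~{budget_tokens / 1e6:.2f}M tokens in 40 GB")
```

The arithmetic matches the post's framing: the capacity column in the table above is a memory statement, and the latency and throughput columns are what you pay for it.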
I would also read this alongside OraCore's [FP8 KV-cache in vLLM](/news/fp8-kv-cache-vllm) and [KV-cache optimization strategies](/news/kv-cache-optimization-guide). Together they make it much faster to see which optimizations can actually ship and which only look good in the lab.

## Conclusion

vLLM's large-scale comparison is direct: TurboQuant is only worth considering under heavy memory pressure, and even then, FP8 remains the default most teams should try first.

If you run model serving, my advice is simple. Measure your KV-cache pressure first, then check your latency budget. If both are tight, do not rush toward low-bit. Get FP8 running solidly; only then have you earned the right to talk about TurboQuant.

Key takeaways:

- In vLLM's tests, FP8 is steadier than TurboQuant, and faster too.
- TurboQuant's low-bit variants save memory, but often pay for it in latency and accuracy.
- For most deployments, try FP8 first; TurboQuant fits cases where memory is truly the wall.