[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-turboquant-quantization-accuracy-performance-study-zh":3,"article-related-turboquant-quantization-accuracy-performance-study-zh":31,"series-research-456ad15d-693b-4a13-8896-23d26e57c4de":83},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":23,"views":27,"created_at":28,"published_at":29,"topic_cluster_id":30},"456ad15d-693b-4a13-8896-23d26e57c4de","turboquant-quantization-accuracy-performance-study-zh","TurboQuant takes the guesswork out of 4-bit","\u003Cp data-speakable=\"summary\">\u003Ca href=\"\u002Ftag\u002Fturboquant\">TurboQuant\u003C\u002Fa>'s quantization study turns 8-bit, 4-bit, PTQ, and QAT into a deployment decision workflow you can copy as-is.\u003C\u002Fp>\u003Cp>After years of tuning models, quantization is still the part I dread most. Everyone loves to promise smaller models and faster inference in slide decks. Once the model actually ships, though, accuracy dips a little, outputs start acting strange, and certain prompts fail outright. It feels like you just finished tidying your room and the cat immediately throws up on the rug. I have seen teams pick 8-bit first because it sounds conservative. I have also seen people jump straight to 4-bit because the memory chart looks great, only to find out later that the few cases they cared about most had been crushed.\u003C\u002Fp>\u003Cp>So when I came across \u003Ca href=\"https:\u002F\u002Fdasroot.net\u002Fposts\u002F2026\u002F05\u002Fturboquant-comprehensive-study-quantization-accuracy-performance\u002F\">TurboQuant Comprehensive Study: Quantization Accuracy and Performance\u003C\u002Fa>, it caught my eye. It isn't selling a miracle cure. Instead, it lays out post-training quantization, quantization-aware training, and static \u002F dynamic quantization, along with the implementation context around \u003Ca href=\"https:\u002F\u002Fwww.tensorflow.org\u002F\">TensorFlow\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fpytorch.org\u002F\">PyTorch\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Funslothai\u002Funsloth\">Unsloth\u003C\u002Fa>, and \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT-LLM\">TensorRT-LLM\u003C\u002Fa>. The original post gives no verifiable view, star, or bookmark counts, so I won't make any up.\u003C\u002Fp>\u003Ch2>Quantization is not a single switch\u003C\u002Fh2>\u003Cblockquote>Quantization in machine 
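learning refers to the process of reducing the precision of model weights and activations, typically from 32-bit floating point (FP32) to lower bit-width representations such as 8-bit integers (INT8) or even 4-bit.\u003C\u002Fblockquote>\u003Cp>In plain terms, quantization is not a one-size-fits-all speedup. It is a trade-off. You give up numerical precision in exchange for memory, bandwidth, and latency, and you put part of the model's behavioral stability on the line as well. What I like about the TurboQuant piece is that it doesn't package this as \"lower bits is always better.\" It makes one point clear up front: the lower the bit-width, the better the cost numbers usually look. Whether the model goes off the rails, however, depends on how far you compress and which parts of the model you compress.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779287080305-zb4c.png\" alt=\"TurboQuant takes the guesswork out of 4-bit\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>I once worked on a transformer project headed for edge deployment. The team's first question wasn't \"which task is most sensitive?\" but \"how few bits can we get away with?\" That is the wrong order. The real question is where this model breaks once it has been \u003Ca href=\"\u002Fnews\u002Fwhy-kv-cache-compression-will-decide-edge-ai-inference-zh\">compressed\u003C\u002Fa>. Some models shrug off quantization. Others start failing on specific languages, specific layers, or specific input types the moment you compress them. If you don't know that first, every nice chart that follows is just a placebo.\u003C\u002Fp>\u003Cp>In practice it's simple. First, pin down what you are optimizing for: RAM, latency, throughput, power, or cost. Don't write \"all of the above\" into the requirements doc, because that just pushes the decision onto the engineers. Then test on real workloads instead of relying only on clean benchmarks.\u003C\u002Fp>\u003Cul>\u003Cli>FP32: keep it only when you genuinely need the precision headroom.\u003C\u002Fli>\u003Cli>INT8: usually the most conservative, easiest-to-defend starting point for compression.\u003C\u002Fli>\u003Cli>4-bit: identify the sensitive layers and tasks first, then decide whether to compress.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>PTQ is the fast path, QAT is the insurance policy\u003C\u002Fh2>\u003Cblockquote>Post-training quantization involves applying quantization to a pre-trained model without retraining it.\u003C\u002Fblockquote>\u003Cp>Put simply, PTQ means you finish training the model and then compress it directly. It is fast, cheap, and clean. That makes it very tempting for teams, because you don't have to rerun the whole training pipeline. But TurboQuant is blunt about the cost: when you skip the training expense, you may also be giving up some accuracy.\u003C\u002Fp>\u003Cp>QAT, by contrast, builds quantization into training itself. It simulates low-precision rounding during the forward pass, so the model learns to adapt to low precision while it trains. That usually preserves more quality, but it costs more time, more compute, and more training hassle. I've had projects where PTQ passed on the first try with no fuss. I've also had projects where PTQ wrecked the critical tasks and we had to go back and do QAT. Neither choice is wrong. What is wrong is failing to measure how much you lose before you decide.\u003C\u002Fp>\u003Cp>This is also where many teams get stuck. When PTQ fails, they blame the model; when QAT is too much work, they blame the process. Really, they just never did the math. That is where TurboQuant is practically useful: it frames the PTQ \u002F QAT choice as a question of cost and risk, not of belief.\u003C\u002Fp>\u003Cp>In practice, run PTQ first, using a validation set that genuinely resembles production traffic. Don't fool yourself with textbook data. If the quality drop is acceptable, stop there. QAT is only worth it when the drop actually hurts the product. Don't decide on gut feeling. Decide based on how much quality you lost and how expensive retraining would be.\u003C\u002Fp>\u003Cul>\u003Cli>PTQ: suited to fast deployment and low-risk compression.\u003C\u002Fli>\u003Cli>QAT: suited to models that are highly sensitive to quality loss.\u003C\u002Fli>\u003Cli>Start with PTQ and move to QAT only if PTQ really falls short. This order saves the most effort.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>The difference between static and dynamic is when you pay for calibration\u003C\u002Fh2>\u003Cblockquote>Static quantization determines the range of values for weights and activations during the calibration phase, which is typically done using a representative dataset.\u003C\u002Fblockquote>\u003Cp>In plain terms, static quantization runs a calibration pass first to fix the value ranges, and with them the scales used to map floats to integers. It then applies those fixed parameters at inference time. Static quantization is more stable and usually more accurate, but only if your calibration dataset genuinely resembles production. Calibrate on a batch of overly clean, tidy data and everything looks fine in the notebook. Then it starts to drift once real traffic arrives.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779287081440-yh3k.png\" alt=\"TurboQuant takes the guesswork out of 4-bit\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Dynamic quantization instead defers part of the work to inference time. Weights are quantized ahead of time, but activation ranges are computed on the fly from each input. TurboQuant's description is clear: dynamic quantization is more flexible in some scenarios and can be simpler to set up, but it doesn't always deliver the stability of static quantization. Neither approach is more advanced. The real question is whether you need predictability or flexibility.\u003C\u002Fp>\u003Cp>I once fell into a silly trap. Every step of the static quantization pipeline was correct, yet the results were off. It turned out the calibration data was too clean and contained none of the messy inputs real users send. The model didn't break in production; I had miscalibrated it from the start. This kind of bug is the worst, because it looks like a model problem when it is really a data problem.\u003C\u002Fp>\u003Cp>In practice, if you choose static, treat the calibration set as a real asset rather than a few samples grabbed to fill a quota. If you choose dynamic, measure the runtime overhead, so the savings you think you got aren't eaten back by execution-time cost. Either way, test on the target hardware, not just on your training machine.\u003C\u002Fp>\u003Cp>TurboQuant also covers the TensorFlow toolchain, such as \u003Ca href=\"https:\u002F\u002Fwww.tensorflow.org\u002Flite\u002Fconvert\">TFLiteConverter\u003C\u002Fa> and the \u003Ca href=\"https:\u002F\u002Fwww.tensorflow.org\u002Fmodel_optimization\">TensorFlow Model Optimization Toolkit\u003C\u002Fa>. That matters, because quantization is not pure theory. It is a toolchain problem: if the tools are painful, teams quietly fall back to the full-size model.\u003C\u002Fp>\u003Ch2>8-bit is boring because it is so often the right answer\u003C\u002Fh2>\u003Cblockquote>Recent benchmarks from 2026 demonstrate that 8-bit quantization typically retains over 99% of the original model’s accuracy across academic 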
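and real-world tasks.\u003C\u002Fblockquote>\u003Cp>The message is straightforward: 8-bit is usually the safest middle ground. It cuts memory noticeably and delivers real inference gains, without forcing the model into the drastic compromises that lower bit-widths demand. The 2026 benchmarks TurboQuant cites show 8-bit typically retaining over 99% of the original accuracy. 4-bit loses a bit more, but it still holds up well on standard benchmarks such as MMLU and ArenaHard.\u003C\u002Fp>\u003Cp>I know \"boring\" sounds like a put-down, but in production boring is usually good. For quality-sensitive applications like a cloud chatbot, a search reranker, or an enterprise classifier, 8-bit is often the easiest answer to sell to the team. You get plenty of benefit without needing much of a firefighting plan.\u003C\u002Fp>\u003Cp>TurboQuant also flags something that is often overlooked: 8-bit is not a single format. Schemes such as W8A8-INT and W8A8-FP (8-bit weights and 8-bit activations, stored as integers or as floating point) behave differently depending on the hardware backend. In other words, you can't draw conclusions from the label \"8-bit\" alone. Results can differ across \u003Ca href=\"\u002Ftag\u002Fnvidia\">NVIDIA\u003C\u002Fa>, AMD, and Intel hardware, and even across runtimes. Plenty of people compare numbers from different blogs and then earnestly draw conclusions, and every time I see it I want to roll my eyes.\u003C\u002Fp>\u003Cp>In practice, if you deploy server-side and avoiding quality loss matters most, start with 8-bit. Measure throughput, memory, and output quality on your real workload. If the gains are enough, stop there. Don't chase 4-bit just because it sounds cooler.\u003C\u002Fp>\u003Cul>\u003Cli>A good production default.\u003C\u002Fli>\u003Cli>Usually the easiest option to align on with PMs, SREs, and the product team.\u003C\u002Fli>\u003Cli>Less likely to cause egregious quality regressions.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>4-bit is where the trade-off really starts to bite\u003C\u002Fh2>\u003Cblockquote>4-bit quantization shows slightly lower but still impressive results, often maintaining 96–98% accuracy on standardized benchmarks like MMLU and ArenaHard.\u003C\u002Fblockquote>\u003Cp>The point here is that 4-bit is not a free lunch, but it can be a very good deal. TurboQuant says 4-bit often maintains 96–98% accuracy on standard benchmarks while delivering stronger compression, and in some scenarios better latency too. On edge devices, laptops, and small servers, that gap sometimes decides whether the model fits and runs at all. For example, the weights of a 7B-parameter model take roughly 14 GB in FP16, about 7 GB at 8-bit, and about 3.5 GB at 4-bit, before any scale-factor overhead.\u003C\u002Fp>\u003Cp>I know the \"why not just go 4-bit?\" pitch well. People see the compression ratio, get excited, and forget that model behavior becomes more brittle. Sometimes that brittleness is so small that benchmarks can't detect it. But if your workload involves \u003Ca href=\"\u002Ftag\u002F長上下文\">long-context\u003C\u002Fa> conversations, keyword extraction, or strictly formatted output, a little instability can turn into a product incident. The real test of 4-bit is whether you can live with \"great most of the time, weird in a few places.\"\u003C\u002Fp>\u003Cp>TurboQuant mentions \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Funslothai\u002Funsloth\">Unsloth Dynamic v2.0\u003C\u002Fa> and the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FXilinx\u002Fbrevitas\">Brevitas\u003C\u002Fa> \u002F Qronos ecosystem. The point isn't the length of the tool list. The point is that the field has moved from brute-force compression toward smarter compression. 4-bit is no longer just about cutting weights harder. It is about preserving behavior through better calibration, per-layer precision choices, and error compensation.\u003C\u002Fp>\u003Cp>In practice, consider 4-bit only when your deployment target is genuinely constrained by memory or device limits. Calibrate with representative data first. Then use 8-bit as the control group and compare the two directly on the same hardware. If 4-bit is only slightly faster but drops output quality substantially, don't force it. The resources you save aren't worth the stream of bug fixes that follows.\u003C\u002Fp>\u003Cp>For local and edge deployment, the value of 4-bit is very concrete. Sometimes it isn't about speed at all: a model that couldn't run before can finally run. That difference matters more than any pretty chart.\u003C\u002Fp>\u003Ch2>The 2026 toolchain is finally getting respectable\u003C\u002Fh2>\u003Cblockquote>TensorFlow 2.17 introduced enhanced support for dynamic quantization and optimized the conversion process for TFLite models, achieving up to 3x speed improvements in inference on Intel Arc B580 Graphics.\u003C\u002Fblockquote>\u003Cp>The takeaway is pragmatic. Quantization now depends not only on whether the algorithm is right, but also on whether the toolchain has caught up. TurboQuant also notes improved quantization support in frameworks such as TensorFlow 2.17 and PyTorch 2.7, plus TensorRT support for FP8, FP4, INT8, and INT4. These details look like pure engineering, but they decide whether you are doing optimization or handcraft.\u003C\u002Fp>\u003Cp>I care about this part more than I used to, because deployment friction is a hidden cost. Suppose keeping the quantized version alive depends on a script that only one senior engineer can fix, or on a fragile conversion pipeline. In that case, the approach probably won't last. The team will end up back on the full-size model, because everyone understands it better and it is easier to debug when something goes wrong.\u003C\u002Fp>\u003Cp>By bringing backend support into the discussion, TurboQuant is reminding you that choosing a precision is not just choosing a numeric format. It is also choosing a runtime path. Pick a format your backend doesn't support, and every benchmark after that is purely theoretical.\u003C\u002Fp>\u003Cp>In practice, confirm that your deployment stack actually supports a bit-width before you settle on it. TensorFlow, PyTorch, TFLite, TensorRT, and mobile runtimes each support different things. Don't wait until the model fails to convert before you start blaming the framework.\u003C\u002Fp>\u003Cul>\u003Cli>Tooling support matters as much as the algorithm.\u003C\u002Fli>\u003Cli>If the backend doesn't fit, the optimization turns into a liability.\u003C\u002Fli>\u003Cli>Always run the final benchmark on the target runtime.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>A template you can copy\u003C\u002Fh2>\u003Cpre>\u003Ccode># Quantization selection checklist\n\n## Goal\n- Primary constraint: [accuracy | latency | memory | power | cost]\n- Deployment target: [server | mobile | edge | embedded]\n- Acceptable accuracy drop: [e.g. 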
&lt;1%, &lt;3%, task-specific]\n- Hardware target: [NVIDIA | AMD | Intel | Apple | ARM]\n\n## Step 1: Start with PTQ\n- Apply post-training quantization to the trained model.\n- Use a validation set that matches production inputs.\n- Measure:\n  - accuracy \u002F task quality\n  - latency\n  - throughput\n  - memory use\n  - model size\n\n## Step 2: Decide if PTQ is good enough\n- Keep PTQ if the accuracy drop is acceptable.\n- Move to QAT if the model regresses too much.\n- Do not pick QAT unless the accuracy gain justifies retraining cost.\n\n## Step 3: Choose bit-width\n- 8-bit if you want the safest production default.\n- 4-bit if memory or edge deployment is the real bottleneck.\n- 3-bit or lower only if you have specialized tooling and strong benchmark evidence.\n\n## Step 4: Pick the quantization mode\n- Static quantization if you can calibrate with representative data.\n- Dynamic quantization if you need runtime flexibility or simpler setup.\n\n## Step 5: Calibrate properly\n- Build a representative calibration set.\n- Include messy, real-world inputs.\n- Avoid using only clean or synthetic samples.\n\n## Step 6: Benchmark on target hardware\n- Run the quantized model on the exact runtime you plan to ship.\n- Compare against the unquantized baseline (FP32, or FP16 \u002F BF16 if that is what you serve today).\n- Record quality regressions by task, not just aggregate score.\n\n## Decision rule\n- Use 8-bit for server inference when accuracy matters most.\n- Use 4-bit for edge, mobile, or memory-constrained deployments.\n- Use QAT when PTQ loses too much quality.\n- Re-evaluate whenever the framework, backend, or model architecture changes.\n\n## Practical notes\n- TensorFlow users: check TFLiteConverter and Model Optimization Toolkit support.\n- PyTorch users: validate backend support before assuming parity.\n- TensorRT users: confirm precision support for your GPU generation.\n- If the benchmark looks great but production quality drops, the calibration set is probably the 
problem.\n\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>If I were turning this study into an internal SOP for my own team, I would use the template above as-is. Its value isn't that it looks nice. Its value is that it stops people from arguing in meetings based on gut feeling. Set the goal first, try PTQ, and decide whether to escalate to QAT. Only then debate 8-bit versus 4-bit. Get the order right and a lot of bad decisions disappear on their own.\u003C\u002Fp>\u003Cp>TurboQuant's most valuable contribution isn't telling you which bit-width is best. It forces you to admit that quantization is a multiple-choice question answered under deployment constraints. That sounds obvious, but I've seen too many people turn it into a matter of faith and wear themselves out.\u003C\u002Fp>\u003Cp>Source: \u003Ca href=\"https:\u002F\u002Fdasroot.net\u002Fposts\u002F2026\u002F05\u002Fturboquant-comprehensive-study-quantization-accuracy-performance\u002F\">TurboQuant Comprehensive Study: Quantization Accuracy and Performance\u003C\u002Fa>. This article is my own breakdown and rewrite of the original's arguments. The template section is my consolidated, ready-to-use version, not a verbatim transcription of the original.\u003C\u002Fp>","I break TurboQuant's quantization study down into a copy-ready selection workflow to help you decide between 8-bit, 4-bit, PTQ, and QAT.","dasroot.net","https:\u002F\u002Fdasroot.net\u002Fposts\u002F2026\u002F05\u002Fturboquant-comprehensive-study-quantization-accuracy-performance\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779287080305-zb4c.png","research","zh","aed5cbda-77cf-4dfe-8606-c8463a64403e",[17,18,19,20,21,22],"quantization","PTQ","QAT","INT8","INT4","deployment",[24,25,26],"Set deployment constraints first, then choose a bit-width, not the other way around.","Try PTQ first and add QAT only if needed. That is usually the fastest and most practical route.","8-bit is the conservative default, while 4-bit trades some quality for stronger compression.",7,"2026-05-20T14:24:10.883063+00:00","2026-05-20T14:24:10.861+00:00","0c35a120-52fc-41fc-afa3-d404eb934158",{"tags":32,"relatedLang":42,"relatedPosts":46},[33,35,37,38,40],{"name":19,"slug":34},"qat",{"name":18,"slug":36},"ptq",{"name":17,"slug":17},{"name":20,"slug":39},"int8",{"name":21,"slug":41},"int4",{"id":15,"slug":43,"title":44,"language":45},"turboquant-quantization-accuracy-performance-study-en","TurboQuant shows how 4-bit beats guesswork","en",[47,53,59,65,71,77],{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":13},"f374155a-c29e-478c-b7a5-679cad1c51e4","crdts-keep-replicas-in-sync-without-locks-zh","CRDT 
lets replicas stay in sync without locks","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781011086259-4p4k.png","2026-06-09T13:17:34.493426+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":13},"4b3b5a50-45b7-4238-a38b-160f82e323ff","post-deterministic-systems-autonomous-infra-zh","Post-deterministic distributed systems: a new framework for autonomous infrastructure","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781010194792-5ogb.png","2026-06-09T13:02:32.717551+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":13},"04e45398-9814-4907-b416-fcb5b8d69508","causal-learnability-formal-language-tasks-zh","Measuring task learnability with causal methods","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780987696075-l4g0.png","2026-06-09T06:47:34.438642+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":13},"75bcc569-5e89-45c8-b809-6f169e929f4b","rl-training-hands-off-control-gradually-zh","RL: take control first, then let go","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780986786312-03yo.png","2026-06-09T06:32:32.849589+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":13},"e3ecab4b-7cc7-4246-baf6-e1c170d86ca5","omnigamearena-vlm-game-agent-benchmark-zh","OmniGameArena makes VLM game agents easier to compare","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780985893022-70pl.png","2026-06-09T06:17:32.189729+00:00",{"id":78,"slug":79,"title":80,"cover_image":81,"image_url":81,"created_at":82,"category":13},"6f25a29c-cbb8-4f53-9af7-1656b394333a","turboquant-cuts-kv-cache-memory-6x-google-tests-zh","TurboQuant cuts KV cache memory 6x in Google tests","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780906682236-sqe2.png","2026-06-08T08:17:21.878314+00:00",[84,89,94,99,104,109,114,119,124,129],{"id":85,"slug":86,"title":87,"created_at":88},"f18dbadb-8c59-4723-84a4-6ad22746c77a","deepmind-bets-on-continuous-learning-ai-2026-zh","DeepMind bets on continual-learning AI for 2026","2026-03-26T08:16:02.367355+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"f4a106cb-02a6-4508-8f39-9720a0a93cee","ml-papers-of-the-week-github-research-desk-zh","Why a weekly ML paper list took off on GitHub","2026-03-27T01:11:39.284175+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"c4f807ca-4e5f-47f1-a48c-961cf3fc44dc","ai-ml-conferences-to-watch-in-2026-zh","2026 AI conference submission deadlines","2026-03-27T01:51:53.874432+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"cf046742-efb2-4753-aef9-caed5da5e32e","adaptive-block-scaled-data-types-zh","IF4: a smart choice for neural network quantization","2026-03-31T06:00:36.990273+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"53a0dc54-0371-4e40-8d5e-74e94a73840c","geometry-aware-similarity-metrics-for-neural-representations-zh","Beyond distance metrics: rethinking neural networks through differential geometry","2026-03-31T06:01:01.241968+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"fee7d472-a775-4b1d-bbc2-1e8bca1bbf8b","on-the-fly-repulsion-in-the-contextual-space-for-rich-divers-zh","More creative AI image generation: boosting diversity with repulsion","2026-03-31T06:01:25.439673+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"a9901203-d69b-447b-8854-15d14eab32b4","vision-aided-beam-prediction-cnn-eca-zh","Vision-aided beam prediction gets a CNN upgrade","2026-04-01T10:00:25.8073+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"b55e7dd4-0a24-4b3d-804d-b0309a03f498","triple-band-fss-mimo-antenna-sub-6-ghz-zh","Triple-band FSS MIMO antenna targets sub-6 GHz","2026-04-01T13:18:36.857305+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"f68290bd-e7f3-4b30-ba22-dcd4e0130a66","openclaw-1299-repos-eight-weeks-analysis-zh","Reading the data behind OpenClaw's 1299 repos","2026-04-02T05:03:45.208411+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"ed9f80eb-eb02-4d35-8ad4-0ddf428751dd","beam-coherence-aware-combining-mmwave-mimo-zh","Two-stage combining for mmWave MIMO","2026-04-02T05:27:26.897188+00:00"]