[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-adacodec-predictive-visual-code-video-mllms-zh":3,"article-related-adacodec-predictive-visual-code-video-mllms-zh":30,"series-research-3479bdee-21fb-4fda-9572-9394caba01b0":83},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":22,"views":26,"created_at":27,"published_at":28,"topic_cluster_id":29},"3479bdee-21fb-4fda-9572-9394caba01b0","adacodec-predictive-visual-code-video-mllms-zh","AdaCodec 用預測碼壓縮影片 token","\u003Cp data-speakable=\"summary\">AdaCodec 只編碼難預測的畫面與幀間變化，讓影片 MLLM 用更少 \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa> 還能維持表現。\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>研究機構\u003C\u002Fstrong>：arXiv 摘要未明確標註\u003C\u002Fli>\u003Cli>\u003Cstrong>核心數據\u003C\u002Fstrong>：32k tokens 對比 224k baseline\u003C\u002Fli>\u003Cli>\u003Cstrong>突破點\u003C\u002Fstrong>：預測式視覺碼\u003C\u002Fli>\u003C\u002Ful>\u003Cp>影片\u003Ca href=\"\u002Fnews\u002Fdatabricks-custom-models-aws-overview-zh\">模型\u003C\u002Fa>一直有個老問題：重複資訊太多。相鄰幀常常只是背景、物件位置或局部動作在變，但很多系統還是把每一幀都當成新的 RGB 圖像來編碼。結果就是 token 花得快，推理也變重。\u003C\u002Fp>\u003Cp>\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.02569\">AdaCodec: A Predictive Visual Code for Video MLLMs\u003C\u002Fa> 想處理的就是這種浪費。它不是把每一幀都完整送進模型，而是先判斷這一幀能不能從前文預測出來。能預測的，就只傳幀間變化；難預測的，才送 reference frame。\u003C\u002Fp>\u003Ch2>這篇在解什麼痛點\u003C\u002Fh2>\u003Cp>這篇論文鎖定的是影片 MLLM 的輸入效率問題。現在常見做法，是把抽樣出的每一幀都獨立編成視覺 token。這樣做很直觀，但也很粗暴，因為影片本來就有很強的時間冗餘。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780381988591-z2sp.png\" alt=\"AdaCodec 用預測碼壓縮影片 token\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>對開發者來說，冗餘不是抽象名詞，而是直接反映在成本上。token 預算更高、推理更慢、能塞進上下文的影片也更少。當影片內容越長，這個問題越明顯。\u003C\u002Fp>\u003Cp>作者的觀點很直接：影片的輸入方式應該跟影片本身的結構一致。也就是說，不要每次都重新描述整張圖，而是優先描述「這一幀相對前文改了\u003Ca href=\"\u002Fnews\u002Fwhy-ai-news-sections-are-failing-readers-zh\">什麼\u003C\u002Fa>」。\u003C\u002Fp>\u003Ch2>AdaCodec 的做法是什麼\u003C\u002Fh2>\u003Cp>AdaCodec 可以理解成一種預測式視覺碼。它的核心不是壓縮單張圖，而是壓縮影片序列中的可預測部分。\u003C\u002Fp>\u003Cp>具體來說，AdaCodec 會在 conditional predictive cost 高的時候，才使用完整 reference frame。若場景其實很好預測，就改用幀間變化來表示，包含 motion 和 prediction residual，並把這些資訊包成更精簡的 P-tokens。\u003C\u002Fp>\u003Cp>白話一點，它不是在重建每一幀，而是在問：下一幀到底新增了什麼。這個思路跟傳統「每幀都是獨立圖片」的管線很不一樣，因為它把編碼重心從內容本身，轉到內容之間的差異。\u003C\u002Fp>\u003Cp>這種設計對系統工程很有吸引力。只要模型能保留足夠的影片資訊，卻少吃很多重複 token，就有機會同時換到更\u003Ca href=\"\u002Ftag\u002F長上下文\">長上下文\u003C\u002Fa>、更低延遲，或更低推理成本。\u003C\u002Fp>\u003Ch2>論文實際證明了什麼\u003C\u002Fh2>\u003Cp>摘要提到，作者在 11 個 \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> 上做了評估。它的比較對象是 Qwen3-VL-8B 的 per-frame RGB baseline，而且是在 matched visual-token budget 下比較，也就是盡量把 token 條件拉齊。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780381979033-vm4m.png\" alt=\"AdaCodec 用預測碼壓縮影片 token\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>最醒目的數字是 token 效率。摘要寫到，即使只有 one-seventh 的 budget，AdaCodec 用 32k tokens 仍然能在所有 long-video benchmarks 上超過 224k baseline。這代表它不是只靠多吃 token 換分數，而是真的把冗餘壓掉了。\u003C\u002Fp>\u003Cp>在五個 general-video benchmarks 上，摘要說 AdaCodec 提升了平均分數，同時把 time-to-first-token 從 9.26 秒降到 1.62 秒。這個差距很實際，因為對互動式產品來說，使用者最先感受到的常常不是最終分數，而是模型多久開始回應。\u003C\u002Fp>\u003Cp>不過，摘要沒有公開完整 benchmark 表格，所以看不到每個任務的細部分數差，也沒有把 11 個 benchmark 的完整\u003Ca href=\"\u002Fnews\u002Fai-resist-list-global-pushback-zh\">清單\u003C\u002Fa>與設定全部列出來。就摘要能確認的範圍來看，AdaCodec 的訊號很明確：更少 token、更低延遲，而且表現沒有跟著掉下去。\u003C\u002Fp>\u003Ch2>這對開發者代表什麼\u003C\u002Fh2>\u003Cp>如果你在做影片問答、會議摘要、監控分析，或任何要吃影片的 MLLM 產品，token 效率都不是小事。它會直接影響你能處理多長的影片、一次能塞多少上下文，以及每次請求的成本。\u003C\u002Fp>\u003Cp>AdaCodec 提供的是一種更務實的輸入層設計。它不是一味擴大模型，而是重新定義影片該怎麼進模型。對重複性高的影片資料來說，這種 predictive coding 比 per-frame RGB encoding 更貼近資料本身的樣子。\u003C\u002Fp>\u003Cp>這也提醒一件事：有時候性能提升不一定只靠更大的模型，還可能來自更聰明的資料介面。AdaCodec 的重點就在這裡。它是在輸入端做優化，但摘要聲稱這個改動已經能同時帶來分數與延遲上的改善。\u003C\u002Fp>\u003Ch2>還有哪些限制要注意\u003C\u002Fh2>\u003Cp>摘要雖然給了方向，但還沒有把細節講滿。它沒有說 predictive cost 具體怎麼算，也沒有交代 P-tokens 的內部形成方式。\u003C\u002Fp>\u003Cp>另外，摘要也沒有說 AdaCodec 是否需要特殊訓練資料，是否能直接跨模型家族使用，或是在快速切鏡、鏡頭晃動、嚴重遮擋這些情境下會不會變得不穩定。這些都是真正在產品裡會遇到的狀況。\u003C\u002Fp>\u003Cp>所以現在比較適合的解讀，不是「影片 MLLM 已經被解決」，而是「這篇提出了一條更省 token 的路」。從摘要看，它至少在幾類 benchmark 上證明了這條路可行。\u003C\u002Fp>\u003Ch2>總結\u003C\u002Fh2>\u003Cp>AdaCodec 把影片理解重新包裝成一個預測問題：能預測的內容就少傳，真的變動了再補上。摘要顯示，這個做法能在較低 token 預算下維持甚至提升表現，還能明顯縮短首 token 延遲。\u003C\u002Fp>\u003Cp>對開發者來說，這個方向的價值很直接。如果後續完整論文與實作細節能站得住腳，預測式視覺碼有機會讓影片 MLLM 變得更便宜、更快，也更容易擴到長影片場景。\u003C\u002Fp>\u003Cul>\u003Cli>它把影片冗餘從輸入端先壓掉。\u003C\u002Fli>\u003Cli>它用 reference frame 搭配 P-tokens 表示變化。\u003C\u002Fli>\u003Cli>它在摘要中同時給出效率與延遲改善訊號。\u003C\u002Fli>\u003C\u002Ful>","AdaCodec 用預測式視覺碼只編碼難預測畫面與幀間變化，讓影片 MLLM 在更少 token 下維持表現，還能降低首 token 延遲。","arxiv.org","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.02569",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780381988591-z2sp.png","research","zh","a455fdc4-fe0d-41d8-a1f5-b77d7c869c6a",[17,18,19,20,21],"video MLLM","visual tokens","predictive coding","reference frame","P-tokens",[23,24,25],"AdaCodec 用預測式編碼取代逐幀 RGB 輸入，目標是減少影片冗餘 token。","摘要聲稱它在 32k token 下仍能超過 224k baseline，並把首 token 延遲從 9.26 秒降到 1.62 秒。","目前只看得到摘要資訊，完整 benchmark 細節、P-tokens 形成方式與泛化能力仍未公開。",5,"2026-06-02T06:32:28.249023+00:00","2026-06-02T06:32:28.239+00:00","0c35a120-52fc-41fc-afa3-d404eb934158",{"tags":31,"relatedLang":42,"relatedPosts":46},[32,34,36,38,40],{"name":20,"slug":33},"reference-frame",{"name":19,"slug":35},"predictive-coding",{"name":18,"slug":37},"visual-tokens",{"name":17,"slug":39},"video-mllm",{"name":21,"slug":41},"p-tokens",{"id":15,"slug":43,"title":44,"language":45},"adacodec-predictive-visual-code-video-mllms-en","AdaCodec cuts video tokens with predictive visual codes","en",[47,53,59,65,71,77],{"id":48,"slug":49,"title":50,"cover_image":51,"image_url":51,"created_at":52,"category":13},"f374155a-c29e-478c-b7a5-679cad1c51e4","crdts-keep-replicas-in-sync-without-locks-zh","CRDT 讓副本不用鎖也能同步","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781011086259-4p4k.png","2026-06-09T13:17:34.493426+00:00",{"id":54,"slug":55,"title":56,"cover_image":57,"image_url":57,"created_at":58,"category":13},"4b3b5a50-45b7-4238-a38b-160f82e323ff","post-deterministic-systems-autonomous-infra-zh","後決定性分散系：自治基礎設施新框架","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781010194792-5ogb.png","2026-06-09T13:02:32.717551+00:00",{"id":60,"slug":61,"title":62,"cover_image":63,"image_url":63,"created_at":64,"category":13},"04e45398-9814-4907-b416-fcb5b8d69508","causal-learnability-formal-language-tasks-zh","用因果法量化任務可學性","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780987696075-l4g0.png","2026-06-09T06:47:34.438642+00:00",{"id":66,"slug":67,"title":68,"cover_image":69,"image_url":69,"created_at":70,"category":13},"75bcc569-5e89-45c8-b809-6f169e929f4b","rl-training-hands-off-control-gradually-zh","RL 先接管再放手","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780986786312-03yo.png","2026-06-09T06:32:32.849589+00:00",{"id":72,"slug":73,"title":74,"cover_image":75,"image_url":75,"created_at":76,"category":13},"e3ecab4b-7cc7-4246-baf6-e1c170d86ca5","omnigamearena-vlm-game-agent-benchmark-zh","OmniGameArena 讓 VLM 遊戲代理更好比","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780985893022-70pl.png","2026-06-09T06:17:32.189729+00:00",{"id":78,"slug":79,"title":80,"cover_image":81,"image_url":81,"created_at":82,"category":13},"6f25a29c-cbb8-4f53-9af7-1656b394333a","turboquant-cuts-kv-cache-memory-6x-google-tests-zh","TurboQuant 在 Google 測試中省下 6x KV 快取","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780906682236-sqe2.png","2026-06-08T08:17:21.878314+00:00",[84,89,94,99,104,109,114,119,124,129],{"id":85,"slug":86,"title":87,"created_at":88},"f18dbadb-8c59-4723-84a4-6ad22746c77a","deepmind-bets-on-continuous-learning-ai-2026-zh","DeepMind 押注 2026 連續學習 AI","2026-03-26T08:16:02.367355+00:00",{"id":90,"slug":91,"title":92,"created_at":93},"f4a106cb-02a6-4508-8f39-9720a0a93cee","ml-papers-of-the-week-github-research-desk-zh","每週 ML 論文清單，為何紅到 GitHub","2026-03-27T01:11:39.284175+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"c4f807ca-4e5f-47f1-a48c-961cf3fc44dc","ai-ml-conferences-to-watch-in-2026-zh","2026 AI 研討會投稿時程整理","2026-03-27T01:51:53.874432+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"cf046742-efb2-4753-aef9-caed5da5e32e","adaptive-block-scaled-data-types-zh","IF4：神經網路量化的聰明選擇","2026-03-31T06:00:36.990273+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"53a0dc54-0371-4e40-8d5e-74e94a73840c","geometry-aware-similarity-metrics-for-neural-representations-zh","超越距離測量：用微分幾何重新理解神經網路","2026-03-31T06:01:01.241968+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"fee7d472-a775-4b1d-bbc2-1e8bca1bbf8b","on-the-fly-repulsion-in-the-contextual-space-for-rich-divers-zh","讓AI繪圖更有創意：用排斥力提升生成多樣性","2026-03-31T06:01:25.439673+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"a9901203-d69b-447b-8854-15d14eab32b4","vision-aided-beam-prediction-cnn-eca-zh","影像輔助波束預測升級 CNN","2026-04-01T10:00:25.8073+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"b55e7dd4-0a24-4b3d-804d-b0309a03f498","triple-band-fss-mimo-antenna-sub-6-ghz-zh","三頻 FSS MIMO 天線瞄準 sub-6 GHz","2026-04-01T13:18:36.857305+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"f68290bd-e7f3-4b30-ba22-dcd4e0130a66","openclaw-1299-repos-eight-weeks-analysis-zh","OpenClaw 1299 個 Repo 的資料解讀","2026-04-02T05:03:45.208411+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"ed9f80eb-eb02-4d35-8ad4-0ddf428751dd","beam-coherence-aware-combining-mmwave-mimo-zh","毫米波 MIMO 的雙階合併法","2026-04-02T05:27:26.897188+00:00"]