[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-multimodal-agents":3},{"tag":4,"articles":11},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"3a2f2da3-c922-4fd2-bfb4-5774d103dc9f","multimodal agents","multimodal-agents",3,"多模態代理結合文字、語音、影像與工具呼叫，讓模型能在即時互動中理解情境並採取動作。這類系統的關鍵不只在於答對，還包括何時該查工具、何時該直接推理，以及如何在低延遲下維持穩定表現。","Multimodal agents combine text, audio, video, and tool use so models can interpret context and act in real time. The hard part is not only accuracy, but deciding when to call tools, when to reason directly, and how to keep latency and reliability in balance.",[12],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"5e4f3620-9a8e-4185-84d2-fa8ef42fc058","act-wisely-tool-use-agentic-multimodal-models-zh","教代理何時別叫工具","HDPO 把「答對」和「少叫工具」分開訓練，想修正多模態代理的盲目工具使用。摘要稱它能大幅減少呼叫次數，同時提升推理正確率。","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775801029065-5n2l.png","zh","2026-04-10T06:03:34.31315+00:00"]