[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-multimodal-agents":3},{"tag":4,"articles":11},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"3a2f2da3-c922-4fd2-bfb4-5774d103dc9f","multimodal agents","multimodal-agents",3,"多模態代理結合文字、語音、影像與工具呼叫，讓模型能在即時互動中理解情境並採取動作。這類系統的關鍵不只在於答對，還包括何時該查工具、何時該直接推理，以及如何在低延遲下維持穩定表現。","Multimodal agents combine text, audio, video, and tool use so models can interpret context and act in real time. The hard part is not only accuracy, but deciding when to call tools, when to reason directly, and how to keep latency and reliability in balance.",[12,21],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"3cefc37f-e116-4597-a5cb-55bfb3fc4aa4","act-wisely-tool-use-agentic-multimodal-models-en","Act Wisely: Teaching Agents When Not to Call Tools","A new training scheme, HDPO, aims to cut blind tool use in multimodal agents by separating accuracy from tool efficiency.","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775801032138-7jih.png","en","2026-04-10T06:03:34.728615+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":26,"image_url":27,"cover_image":27,"language":19,"created_at":28},"d6233062-5791-432c-944d-02e125e4e299","googles-gemini-3-1-flash-live-real-time-voice-ai-en","Google's Gemini 3.1 Flash Live Targets Real-Time Voice AI","Gemini 3.1 Flash Live brings low-latency audio, video, and tool use to Google’s Live API, with 90.8% on ComplexFuncBench Audio.","model-release","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775168343490-yokz.png","2026-04-02T22:18:32.891652+00:00"]