Tag
multimodal AI
Multimodal AI combines text, images, audio, and video in one model or workflow, so systems can understand, generate, and edit across formats. It matters for long-context assistants, image editing, speech interfaces, video analysis, and agentic software.
4 articles

Model Releases/May 5
Why Kimi K2.5 Changes the Open-Source Agent Race
Kimi K2.5 makes open-source agents matter by pairing multimodal reasoning with tool-heavy execution.

Model Releases/May 4
Kimi K2.6 Brings 256K Context to API Users
Kimi K2.6 adds 256K context, multimodal input, and stronger coding for developers using the Kimi API Platform.

Model Releases/Apr 24
OpenAI’s ChatGPT Images 2.0 Lands With Sharper Edits
OpenAI quietly shipped ChatGPT Images 2.0, and early tests show stronger edits, cleaner text, and faster image workflows for creators.

Industry News/Mar 28
Xiaomi’s MiMo AI Push Targets Agentic Software
Xiaomi’s MiMo-V2-Pro, Omni, and TTS models pair 1T+ parameters with low pricing, aiming squarely at agentic AI workloads.