OraCore Editors

Kimi K2.5 review: stronger, still not a legend

Kimi K2.5 adds vision, coding, and multi-agent tools, but long runs, weak art direction, and paywalls keep it from elite status.

Kimi’s new K2.5 arrives with a loud promise: a trillion-parameter model that can read long documents, understand images and video, and drive agent workflows. In hands-on tests, it did raise the bar in long-context understanding and visual reasoning, but it also showed the usual scars of early agent products: slow runs, inconsistent answers, and a pricing wall that gets in the way fast.

That mix matters because Kimi is no longer just selling a chat window. It now has a model, an agent layer, and a coding product in one bundle, with open-source code on GitHub for parts of the stack and premium access for the heavier tools. The result is a more complete product line, but also a clearer split between what casual users can try and what power users have to pay for.

What Kimi K2.5 actually improved

The strongest signal from this release is simple: Kimi got better at understanding messy, high-density inputs. In tests with a 400,000-character novel, Kimi K2.5 built a more detailed character map than Qwen3-Max, tracked factions and relationships more carefully, and picked up on plot threads that a shallower model missed.

It also handled open-ended reasoning more confidently. When asked who mattered most to the protagonist, Kimi did not freeze at the question’s ambiguity. It compared multiple angles, then committed to a final answer. That sounds small, but it is exactly the kind of behavior that makes a model feel useful instead of merely fluent.

The visual side was the surprise. Kimi K2.5 parsed a three-page PDF instruction set, then analyzed a 30-second tennis clip and gave detailed form corrections. It also accepted Apple MOV files, which widens the list of inputs it can process. The current cap is still around 100MB per video, so this is practical for short clips, not full-length footage.

  • Long-text test: 400,000-character novel
  • Video input support: Apple MOV included
  • Per-video size limit: about 100MB
  • Typical clip length: roughly 3 minutes
  • Reasoning output: more detailed character and plot mapping than Qwen3-Max
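
If you plan to script video uploads against those limits, a small pre-flight check saves failed round trips. The sketch below is not Kimi’s API; it only validates a local clip against the constraints reported above (the roughly 100MB cap and MOV support, with MP4 assumed as a baseline) before you hand the file to whatever client you use.

```python
from pathlib import Path

# Limits observed in testing; treat these as assumptions, not documented API terms.
MAX_BYTES = 100 * 1024 * 1024          # ~100MB per-video cap
ALLOWED_SUFFIXES = {".mov", ".mp4"}    # MOV confirmed in testing; MP4 assumed

def preflight_clip(path: str) -> Path:
    """Reject a clip locally before wasting an upload attempt."""
    clip = Path(path)
    if clip.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported container: {clip.suffix}")
    size = clip.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"clip is {size / 1e6:.0f}MB, over the ~100MB cap")
    return clip

# clip = preflight_clip("serve_practice.mov")  # then upload via your own client
```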

The agent tools are useful, but they are not magic

Kimi is clearly betting that agents are the next product layer people will pay for. The company now offers a single-agent mode and an Agent Swarm setup, where multiple agents can work in parallel on a task. It also launched Kimi Code, which connects to a local dev environment, reads project files, inspects code structure, writes or edits code, and runs tests or commands.

That is a serious toolkit on paper. In practice, the experience is uneven. A complex agent run can take around 30 minutes before you get a result, and the system sometimes stalls mid-task. More worrying, Kimi can follow contradictory instructions without stopping to ask for clarification. If the prompt is wrong, the agent may keep going in the wrong direction instead of checking with you first.

That behavior is good enough for demos and exploratory work. It is less convincing for tasks where a bad assumption can waste half an hour or produce a polished-looking wrong answer. Kimi’s agent layer feels like a capable junior teammate, not an autonomous operator you can ignore.
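
Given the 30-minute runs and occasional stalls, it is worth wrapping any unattended agent call in a hard budget rather than trusting it to finish. A minimal sketch, assuming your agent exposes an async entry point; the coroutine you pass in is a hypothetical stand-in, not anything from Kimi’s SDK.

```python
import asyncio

async def run_with_budget(coro, budget_s=1800, heartbeat_s=300):
    """Run an agent task with a hard wall-clock cap and periodic heartbeats."""
    task = asyncio.create_task(coro)
    elapsed = 0
    while not task.done():
        try:
            # shield() keeps the inner task alive when wait_for times out
            return await asyncio.wait_for(asyncio.shield(task), heartbeat_s)
        except asyncio.TimeoutError:
            elapsed += heartbeat_s
            print(f"still running after {elapsed}s...")  # chance to intervene
            if elapsed >= budget_s:
                task.cancel()
                raise TimeoutError(f"agent run exceeded {budget_s}s budget")
    return task.result()
```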

“If you want to build a ship, don’t drum up people to collect wood and don’t assign them tasks and work, but rather teach them to long for the endless immensity of the sea.” — attributed to Antoine de Saint-Exupéry

That quote is old, but it fits the current agent moment well. The industry is moving from single-shot chat to systems that plan, split work, and stitch outputs together. The hard part is still the same: making the system know when to ask for help.
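
One way to build that “ask first” behavior yourself is a cheap pre-flight pass: before launching a long run, have the model critique the instructions and stop if it finds a contradiction. A rough sketch follows; `call_model` and `execute` are hypothetical stand-ins for your chat client and agent runner, not anything Kimi ships.

```python
PREFLIGHT = """Before doing anything, list contradictions, impossible steps,
or missing details in the task below. Reply with exactly NONE if it is
unambiguous.

Task:
{task}"""

def clarify_or_run(task: str, call_model, execute):
    """Gate a long agent run behind a cheap ambiguity check."""
    verdict = call_model(PREFLIGHT.format(task=task)).strip()
    if verdict.upper() != "NONE":
        # Surface the problem now instead of discovering it 30 minutes later.
        raise ValueError(f"needs clarification first:\n{verdict}")
    return execute(task)
```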

How Kimi K2.5 compares with rivals

In the tests described above, Kimi did a few things better than Gemini and Qwen3-Max, but it did not dominate everything. On visual instruction following, Kimi and Gemini were close. On long-form reasoning, Kimi had the edge. On design quality, the model often fell into a very familiar trap: competent layout, weak taste.

That last part matters because the model was asked to produce data journalism graphics and a news-style hero image. The results were technically complete but visually plain, with a strong “presentation deck” vibe. Qwen3-Max even refused one of the image tasks outright, which shows a different tradeoff: more caution, less output.

  • Long-text understanding: Kimi ahead of Qwen3-Max in this test
  • Visual instruction following: Kimi roughly on par with Gemini
  • Data-visual design: functional, but basic
  • Hero-image generation: too close to PPT aesthetics
  • Response style: more willing to answer than Qwen3-Max

There is also a product angle here. Kimi’s website now pushes users toward subscription more aggressively, with queue-priority prompts and premium-only access for Kimi Code and Agent Swarm. The top plan reportedly reaches 199 yuan per month. That puts Kimi in the same broad economic game as OpenAI and Anthropic: the free tier brings people in, while the expensive features pay the bills.

For developers, that means the interesting question is no longer whether Kimi can answer questions. It can. The real question is whether its agents are dependable enough to sit inside a workflow that touches code, reports, or customer-facing output.

What this release says about 2026

Kimi K2.5 feels like a marker for where the market is heading. The model is getting better at video, images, and long documents, while the product around it is shifting toward agents, coding, and paid tiers. That combination tells you where the money is: less in chat alone, more in systems that can do work.

It also hints at a bigger split in the AI market. Open-source models keep getting stronger in community adoption and integration, while premium closed products still control the most polished experiences and the highest-margin features. Kimi is trying to sit on both sides at once, which is smart, but hard to execute without confusing users.

For now, my read is straightforward: Kimi K2.5 is a real step up, especially if you care about long-context reading and multimodal input. But it is still a product you test carefully, not one you hand the keys to. If Kimi can tighten agent reliability and improve design quality, it will matter much more to developers in the next few quarters. If not, it will remain one of the better models that people like to try and then keep on a short leash.

The next thing to watch is whether Kimi’s agents start asking better questions before they act. That one change would tell us a lot about whether the company is building a flashy demo or a tool people can trust every day.