Xiaomi MiMo-V2-Pro: 1T MoE Model for Agents
Xiaomi’s MiMo-V2-Pro packs 1T parameters, 42B active, and 1M context, with SWE-bench results close to Claude Sonnet 4.6.

Xiaomi’s MiMo-V2-Pro arrived with a number that gets attention fast: over 1 trillion total parameters, with 42 billion active on each token. It also brings a 1 million token extended context window and pricing that starts at $1 per million input tokens.
That combination matters because the model is aimed at agentic coding, where cost, latency, and long context all hit the budget at the same time. On SWE-bench Verified, Xiaomi says MiMo-V2-Pro scores 78.0%, which puts it close to Claude Sonnet 4.6 at 79.6% and a bit behind Claude Opus 4.6 at 80.8%.
What Xiaomi actually shipped
Get the latest AI news in your inbox
Weekly picks of model releases, tools, and deep dives — no spam, unsubscribe anytime.
No spam. Unsubscribe at any time.
MiMo-V2-Pro is the flagship text model in Xiaomi’s second-generation MiMo family. It is built as a Mixture-of-Experts system, which means the model has a very large total parameter pool but activates only a smaller slice for each token. In this case, that active slice is 42B, up from 15B active parameters in the smaller MiMo-V2-Flash.
That design choice is the whole trick. Xiaomi gets a model with huge capacity for hard reasoning and tool use, while keeping per-token inference more manageable than a dense trillion-parameter model would be. The company also added a 7:1 hybrid attention pattern and a lightweight Multi-Token Prediction layer, both aimed at faster agent loops.
For developers, the practical details matter more than the architecture diagram. MiMo-V2-Pro is API-only, so there are no public weights to download. If you want to call it directly, Xiaomi points developers to platform.xiaomimimo.com and its OpenAI-compatible endpoint at api.xiaomimimo.com/v1. It is also listed on OpenRouter.
- Over 1 trillion total parameters
- 42B active parameters per token
- 256K standard context, 1M extended context
- 131,072-token maximum completion
- $1 input / $3 output per million tokens at standard context
- $2 input / $6 output per million tokens at 256K to 1M context
Why the Hunter Alpha mystery mattered
Before Xiaomi revealed the model, the AI community had already been arguing about an anonymous OpenRouter model called Hunter Alpha. It appeared around March 11, 2026, then started chewing through huge volumes of traffic. The mystery model reportedly handled roughly 500 billion tokens per week, which is the kind of usage that makes people assume something very large is hiding underneath.
The guess that spread fastest was DeepSeek V4, partly because Xiaomi’s lead researcher, Luo Fuli, had previously worked at DeepSeek. That made the speculation feel plausible. Once Xiaomi confirmed that Hunter Alpha was actually MiMo-V2-Pro on March 18, the story changed from “what is this?” to “how did Xiaomi get this close on price and coding performance?”
There is a direct quote from the model itself that captures the confusion. When asked who built it, Hunter Alpha replied: “I am a Chinese AI model primarily trained in Chinese.” That answer was vague, but it was enough to keep the mystery alive for another week.
“I am a Chinese AI model primarily trained in Chinese.”
Xiaomi also used the launch to give developers a reason to test the model immediately. Teams working with Cline, Blackbox, KiloCode, OpenClaw, and OpenCode got free API access during launch week. That is a smart move, because agentic models improve fastest when real developers hit them with messy repos and broken tool calls.
Benchmarks: close to Sonnet, far cheaper
The cleanest way to judge MiMo-V2-Pro is to compare it against the models developers already know. On coding benchmarks, it lands very close to Anthropic’s best general-purpose coding models while undercutting them hard on price.
On SWE-bench Verified, Xiaomi reports 78.0% for MiMo-V2-Pro, compared with 79.6% for Claude Sonnet 4.6 and 80.8% for Claude Opus 4.6. That is a small gap in raw score, but the pricing spread is much larger. At standard context, MiMo-V2-Pro costs $1/$3 per million tokens, while Sonnet 4.6 is priced at $3/$15 and Opus 4.6 at $5/$25.
On agentic tasks, Xiaomi’s ClawEval score is 61.5. That matters because ClawEval measures multi-turn tool use, recovery from errors, and long-horizon planning, which is where coding agents usually break down. Xiaomi’s number puts MiMo-V2-Pro above GPT-5.2’s reported 50.0 on that benchmark and behind Opus 4.6 at 66.3.
- SWE-bench Verified: MiMo-V2-Pro 78.0%, Sonnet 4.6 79.6%, Opus 4.6 80.8%
- ClawEval: MiMo-V2-Pro 61.5, GPT-5.2 50.0, Opus 4.6 66.3
- VentureBeat’s benchmark cost total: $348 for MiMo-V2-Pro, $2,304 for GPT-5.2, $2,486 for Claude Opus 4.6
- Terminal-Bench 2.0: 86.7 for MiMo-V2-Pro
- GPQA Diamond: 87% for MiMo-V2-Pro
The cost math may be the most important line in the whole launch. VentureBeat reported a total benchmark bill of $348 for MiMo-V2-Pro, versus $2,304 for GPT-5.2 and $2,486 for Claude Opus 4.6. If those numbers hold up in real workloads, procurement teams will care more than they care about a 1.6-point SWE-bench gap.
Where MiMo-V2-Pro fits in a real stack
MiMo-V2-Pro is not Xiaomi’s only model. The company launched three at once, and each one targets a different buyer. That matters because the Pro tier is the expensive, closed option, while the other two models give developers different tradeoffs.
MiMo-V2-Flash is the self-hostable one. It has 310B total parameters, 15B active parameters, and a MIT license on Hugging Face. Xiaomi also released MiMo-V2-Omni, a multimodal model for text, image, video, and audio. Xiaomi says Omni can process 10+ hours of continuous audio in a single request and costs $0.40 input / $2.00 output per million tokens.
That gives teams a fairly clear split. If you need local control, Flash is the obvious candidate. If you need multimodal inputs, Omni is the one to test. If you want the strongest agentic coding performance Xiaomi has right now, Pro is the model to benchmark first.
There is one catch: MiMo-V2-Pro still has a few unknowns. Xiaomi has not published public weights, exact total parameter counts, or a full apples-to-apples benchmark slate across every major knowledge test. It also has no multimodal input, so teams doing document understanding or media workflows will need a different model.
My read is simple: Xiaomi is trying to buy developer trust with price and performance, then keep the enterprise upside for later. If MiMo-V2-Pro keeps its current SWE-bench and agentic numbers under heavy real-world use, expect more teams to route coding agents through it as the default high-volume model and reserve pricier systems for edge cases.
For now, the most useful question is not whether MiMo-V2-Pro beats every rival. It is whether your own agent stack can save enough money by switching a chunk of coding traffic to Xiaomi without losing reliability. That is a test worth running this quarter, not next year.
// Related Articles
- [MODEL]
Gemini 1.5 Pro-002, Flash-002 and 2.0 Flash update Google AI
- [MODEL]
MiniMax M3 Proves Open-Weight Can Still Win on Coding
- [MODEL]
Gemini 3.5 Flash Pricing, Context, Benchmarks
- [MODEL]
Gemma 4 12B: Specs, Benchmarks & How to Run It Locally
- [MODEL]
Best Kimi Models in 2026: K2.5 vs K2 Thinking
- [MODEL]
Kimi K2.6 adds open-source coding and agent swarm