MiniMax M2 opens up cheap agentic coding
MiniMax open-sourced M2, a model for agents and code that costs $0.30 per million input tokens and is free for a limited time.

MiniMax open-sourced M2, a fast agentic model built for coding and tool use.
MiniMax just put a very specific bet on the table: if agents are going to do real work, the model behind them has to be fast, cheap, and good enough to ship code. On its announcement page, the company says MiniMax M2 is open source, priced at $0.30 per million input tokens, and available with a limited-time free trial until November 7 at 00:00 UTC.
The pitch is simple, and the numbers matter. MiniMax says M2 runs at around 100 tokens per second in online inference, costs $1.20 per million output tokens, and is offered alongside a new MiniMax Agent product that uses the model for coding, research, and longer tool-driven tasks.
| Metric | MiniMax M2 | What MiniMax says |
|---|---|---|
| Input token price | $0.30 / million | 8% of Claude Sonnet pricing |
| Output token price | $1.20 / million | 8% of Claude Sonnet pricing |
| Online inference speed | ~100 TPS | About 2x faster than Claude Sonnet |
| Free trial end | Nov. 7, 00:00 UTC | Limited-time access |
Why MiniMax built M2 around agents
Get the latest AI news in your inbox
Weekly picks of model releases, tools, and deep dives — no spam, unsubscribe anytime.
No spam. Unsubscribe at any time.
MiniMax’s announcement reads like a company that has been using its own tools hard enough to find the pain points. The team says it has been building internal agents for data analysis, technical research, programming, user feedback processing, and HR resume screening, and that those workflows exposed the trade-off most agent products still carry: good results usually cost too much, run too slowly, or both.

That matters because agent software is different from a chat demo. A model has to keep a plan together across many steps, call tools without falling apart, and recover when a shell command or browser action fails. MiniMax says M2 was designed for exactly that mix of programming, tool use, logical reasoning, and knowledge work.
There’s also a practical business angle here. If a company wants agents to do hour-long work sessions, token cost and inference speed stop being abstract benchmark numbers and become the bill at the end of the month. MiniMax is clearly trying to make the case that the economics of agents can change if the base model is cheaper to run.
- MiniMax says M2 is built for end-to-end development workflows.
- The company highlights tool use across Shell, Browser, Python, and MCP tools.
- It says the model is free for a limited time while usage is open.
- The model weights are available on Hugging Face for local deployment.
How M2 stacks up in practice
MiniMax says M2 is close to leading overseas models in tool use and deep search, while landing a bit behind the very best on programming. That is a useful framing because it avoids pretending the model is perfect. The company is basically arguing that agent performance is good enough where it counts, and that the price-speed mix makes up for the remaining gap.
“Today, we are officially open-sourcing and launching MiniMax M2, a model born for Agents and code.” — MiniMax
The company also says M2 reached the top five globally on the Artificial Analysis benchmark, which combines 10 test tasks. That benchmark claim is worth reading carefully: it is not a single coding score, but a broader measure that includes multiple tasks. In other words, MiniMax is trying to show that M2 is not a one-trick coder model.
MiniMax says the model was developed by having its own developers use it in real work, alongside algorithm engineers building environments and evaluations. That is a sensible way to train and test an agent model. If the model cannot survive the company’s own internal workflows, it probably will not hold up in outside production use either.
- Top-five placement on Artificial Analysis, which aggregates 10 tasks.
- Close-to-top overseas performance in tool use and deep search.
- Stronger domestic standing in programming, according to MiniMax.
- Internal dogfooding across business, backend, and engineering teams.
MiniMax Agent gets a faster engine
MiniMax did not stop at the model release. It also upgraded MiniMax Agent in China and rolled out an improved overseas version. The product now has two modes: Lightning Mode for quick Q&A, lightweight search, and smaller coding tasks, and Pro Mode for long-running work such as deep research, full-stack development, report writing, PPT creation, and web building.

That split is smart. A lot of agent products try to force every task through one interaction style, which makes them feel either too slow for casual use or too shallow for serious work. MiniMax is separating fast, conversational jobs from longer, more expensive ones, which is how real usage tends to break down anyway.
The company says the new agent is free for now, “until our servers can’t keep up.” That is a familiar launch tactic, but it also tells you where MiniMax thinks the demand will come from: developers who want to test coding workflows, plus users who want a cheaper agent alternative to premium subscriptions that can run into tens or hundreds of dollars per month.
For developers who want to try the model directly, MiniMax points to the MiniMax Open Platform for API access and the tool calling guide for agent-style integrations. It also recommends vLLM and SGLang for local deployment.
What the price-speed combo changes
The most interesting part of this launch is not the benchmark bragging. It is the economics. MiniMax says M2 costs 8% of Claude Sonnet pricing and runs at nearly double the speed. If those numbers hold up in real workloads, the model becomes attractive for teams that care more about throughput and task completion than about squeezing out the last bit of benchmark prestige.
That matters because agent products are often priced like luxury software even when the underlying tasks are routine. If M2 can hold its performance in coding, search, and tool use, it gives developers a cheaper option for building systems that need lots of calls, lots of retries, and lots of background work.
There is also a strategic open-source angle. By releasing the weights, MiniMax is inviting the community to test, fine-tune, and deploy the model on their own hardware. That usually speeds up adoption faster than a closed API alone, especially for teams that need control over data, latency, or inference cost.
For now, the real test is whether M2 can keep its balance outside MiniMax’s own demos. If developers start using it in Claude Code-style workflows, browser automation, and multi-step coding agents without hitting the usual reliability wall, then this release will matter far beyond one company’s product page.
What to watch next
The next useful signal is not another launch post. It is whether independent developers can reproduce MiniMax’s claims in real projects, and whether the open-source weights lead to fast community deployment on Hugging Face, vLLM, and SGLang.
If that happens, the bigger story is not just that MiniMax shipped another model. It is that agent-grade coding models may be entering a price band where normal product teams can actually afford to use them every day.
For OraCore readers, the practical question is simple: would you rather pay for a premium closed model, or run an open one that is fast enough for agents and cheap enough to keep on all day?
// Related Articles
- [MODEL]
Why MiniMax M2.7’s Self-Evolution Claim Matters More Than Its Benchma…
- [MODEL]
Copilot Studio shifts to GPT-4.1 by default
- [MODEL]
Claude API model guide gets a new top tier
- [MODEL]
Mistral Is Building a Cybersecurity Model for Banks
- [MODEL]
Kimi K2.6: What Changed in 2026
- [MODEL]
Why Kimi K2.6 Changes the Coding Model Race