MiniMax-M1 brings a 1M-token context window to open reasoning models
MiniMax released M1, an open-source reasoning model with 1M-token context, 80k output, and low-cost API pricing.

MiniMax-M1 is an open-source reasoning model with a 1 million-token context window.
MiniMax unveiled MiniMax-M1 on June 16, 2025, and the headline numbers are hard to miss: a 1 million-token context window, 80,000-token reasoning output, and training that reportedly used 512 H800 GPUs for three weeks. The company says the full reinforcement learning phase cost $534,700, which is a very specific way to say this model was built to be efficient as well as large.
For developers, the more interesting part is where M1 lands in practice. MiniMax says the model is open-source, tuned for productivity-heavy work, and already available through the MiniMax app, web product, and API. It also ships with support from the open-source inference stack around vLLM and SGLang, with weights and technical details on Hugging Face and GitHub.
| Metric | MiniMax-M1 | What it means |
|---|---|---|
| Context window | 1,000,000 tokens | Matches Gemini 2.5 Pro and exceeds DeepSeek R1 by 8x |
| Reasoning output | 80,000 tokens | Long internal reasoning traces for complex tasks |
| RL training compute | 512 H800s for 3 weeks | Reported reinforcement learning budget |
| RL rental cost | $534,700 | MiniMax’s stated training cost |
| SWE-bench Verified | 55.6% (M1-40k) / 56.0% (M1-80k) | Strong software engineering benchmark result |
| API pricing | $0.4 / $2.2 per million tokens | Input and output pricing for 0-200k tokens |
Why MiniMax built M1 this way
MiniMax is betting that long-context reasoning is becoming a practical requirement, not a demo trick. If a model can keep a million tokens in working memory, it can hold large codebases, long documents, and extended tool traces without constant truncation. That matters for real workflows where the model has to read, think, revise, and keep track of prior steps.
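To make "a million tokens" concrete, here is a rough back-of-the-envelope sketch of how much source code fits in that window. It uses tiktoken's cl100k_base encoding as a stand-in tokenizer, since M1's own tokenizer will count differently, and the repo path is hypothetical:

```python
# Rough sketch: how much of a codebase fits in a 1M-token window?
# cl100k_base is a stand-in tokenizer; M1's own tokenizer will count
# differently, so treat the result as an order-of-magnitude estimate.
from pathlib import Path
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
total = sum(
    len(enc.encode(p.read_text(errors="ignore")))
    for p in Path("my_repo").rglob("*.py")  # hypothetical repo path
)
print(f"{total:,} tokens ({total / 1_000_000:.0%} of M1's window)")
```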

The company says M1 uses a hybrid-attention design with a Lightning Attention mechanism. The pitch is simple: keep long-context computation efficient enough that the model can reason over huge inputs without burning absurd amounts of compute.
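Lightning Attention belongs to the linear-attention family, and the core trick is easy to sketch. The toy, non-causal example below shows the reassociation that makes cost grow linearly with sequence length; causal decoding keeps a running prefix of the same state. This illustrates the concept, not MiniMax's actual kernel:

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    # Feature map keeps scores positive (a common linear-attention choice)
    q, k = F.elu(q) + 1, F.elu(k) + 1
    # Reassociate (Q K^T) V as Q (K^T V): the d x d state is built once,
    # so cost grows linearly with sequence length n instead of as n^2
    kv = torch.einsum("...nd,...ne->...de", k, v)   # sum_j k_j v_j^T
    z = k.sum(dim=-2)                               # normalizer terms
    num = torch.einsum("...nd,...de->...ne", q, kv)
    den = torch.einsum("...nd,...d->...n", q, z).unsqueeze(-1)
    return num / (den + 1e-6)
```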
That design choice also explains the training story. MiniMax says the reinforcement learning phase used a faster algorithm called CISPO, which clips the importance sampling weights themselves rather than the token-level updates that PPO-style methods clip. The company claims this made convergence about twice as fast as other RL methods it compared against, including ByteDance's DAPO.
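Based on that public description, a minimal sketch of the CISPO idea looks like the following. The clip bounds are illustrative placeholders, and this is a reading of the announced method, not MiniMax's training code:

```python
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=1.0, eps_high=1.0):
    # Token-level importance ratio between current and behavior policy
    ratio = torch.exp(logp_new - logp_old)
    # Clip the IS weight itself and detach it: unlike PPO-style clipping,
    # no token's gradient is zeroed out, only its weight is bounded
    # (eps values are placeholders, not MiniMax's settings)
    weight = torch.clamp(ratio, 1 - eps_low, 1 + eps_high).detach()
    # REINFORCE-style surrogate weighted by the clipped, frozen ratio
    return -(weight * advantages * logp_new).mean()
```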
- 1 million-token context window
- 80,000-token reasoning output
- 512 H800 GPUs used in RL training
- $534,700 reported RL rental cost
What the benchmarks actually say
MiniMax is careful to frame M1 as especially strong in software engineering, long-context understanding, and tool use. On SWE-bench Verified, the company reports 55.6% for MiniMax-M1-40k and 56.0% for MiniMax-M1-80k. That trails DeepSeek-R1-0528 at 57.6%, but it still puts M1 ahead of other open-weight models in the company’s comparisons.
The more eye-catching claim is long-context performance. MiniMax says the M1 series beats all open-weight models on long-context understanding and even ranks above OpenAI o3 and Claude 4 Opus, landing second overall behind Gemini 2.5 Pro. In agent tool-use tests on TAU-bench, MiniMax says M1-40k beats every open-weight model and also tops Gemini 2.5 Pro.
"This feature gives us a substantial computational efficiency advantage in both training and inference." — MiniMax
That quote matters because it gets to the real business of the release. A giant context window is impressive, but if it costs too much to train or run, it stays a lab curiosity. MiniMax is trying to argue the opposite: that M1 is large enough for serious work and cheap enough to ship widely.
Price is part of the product here
MiniMax is not treating pricing as an afterthought. The company says M1 is free to use in the MiniMax app and on the web, and its API pricing is aimed at undercutting higher-end rivals. For inputs between 0 and 200k tokens, the price is $0.4 per million input tokens and $2.2 per million output tokens. For 200k to 1M token inputs, the price rises to $1.3 per million input tokens, while output stays at $2.2 per million tokens.

That pricing structure matters because long-context models usually get expensive fast. MiniMax is signaling that it wants developers to test huge prompts, long codebases, and extended agent traces without treating every run like a budget decision.
- 0-200k input: $0.4 per million tokens
- 0-200k output: $2.2 per million tokens
- 200k-1M input: $1.3 per million tokens
- 200k-1M output: $2.2 per million tokens
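If the tiering works the way the announcement describes, estimating a call's cost is a one-liner. The sketch below assumes the input tier is chosen per request by prompt length, which is worth verifying against MiniMax's billing docs:

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a MiniMax-M1 API call cost in USD from the published
    tiers: $0.4/M input and $2.2/M output up to 200k input tokens,
    $1.3/M input above that (output stays at $2.2/M).
    Assumes the tier is picked per request by prompt length."""
    input_rate = 0.4 if input_tokens <= 200_000 else 1.3
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * 2.2

# Example: a 150k-token prompt with a 20k-token answer
print(f"${estimate_cost(150_000, 20_000):.3f}")  # ~$0.104
```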
There is also a practical ecosystem angle. MiniMax says the model is already supported by vLLM and SGLang, which matters because teams do not want to wait months for tooling to catch up. If a model is open but hard to deploy, adoption slows down fast.
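In practice, that support means M1 should be reachable through vLLM's OpenAI-compatible server once the weights are pulled from Hugging Face. The repo ID, port, and prompt below are assumptions for illustration; check the model card for exact serving instructions:

```python
# Minimal sketch: querying M1 behind vLLM's OpenAI-compatible endpoint.
# The repo ID and port are assumptions, not confirmed by the release.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M1-80k",  # assumed Hugging Face repo ID
    messages=[{"role": "user", "content": "Explain this stack trace: ..."}],
    max_tokens=2048,
)
print(resp.choices[0].message.content)
```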
What developers should watch next
M1 is a strong signal that the open-weight race is moving from raw parameter bragging to practical constraints like context length, inference cost, and agent performance. MiniMax is trying to win on all three at once: open weights, million-token memory, and API pricing that invites experimentation.
The real test is whether teams use M1 for tasks that punish weaker models: large codebase refactors, long document analysis, multi-step tool workflows, and agent loops that need to remember what happened 50,000 tokens ago. If MiniMax’s claims hold up outside its own report, M1 could become a serious option for developers who care more about throughput and context than brand names.
MiniMax also said more updates are coming over the next four workdays, so this release may be the first move in a larger product push. The question now is simple: can M1 keep its benchmark edge once more teams run it on their own workloads, with their own prompts, costs, and failure cases?
If you want the practical takeaway, it is this: watch the open-source models that can handle very long contexts without punishing your GPU bill. That is where the next wave of useful reasoning systems is likely to get judged.