[MODEL] · OraCore Editors · 6 min read

MiniMax-M1 brings 1M-token open reasoning model

MiniMax released M1, an open-source reasoning model with 1M-token context, 80k output, and low-cost API pricing.


MiniMax-M1 is an open-source reasoning model with a 1 million-token context window.

MiniMax unveiled MiniMax-M1 on June 16, 2025, and the headline numbers are hard to miss: a 1 million-token context window, 80,000-token reasoning output, and training that reportedly used 512 H800 GPUs for three weeks. The company says the full reinforcement learning phase cost $534,700, which is a very specific way to say this model was built to be efficient as well as large.

For developers, the more interesting part is where M1 lands in practice. MiniMax says the model is open-source, tuned for productivity-heavy work, and already available through the MiniMax app, web product, and API. It also ships with support from the open-source inference stack around vLLM and SGLang, with weights and technical details on Hugging Face and GitHub.

| Metric | MiniMax-M1 | What it means |
| --- | --- | --- |
| Context window | 1,000,000 tokens | Matches Gemini 2.5 Pro and exceeds DeepSeek R1 by 8x |
| Reasoning output | 80,000 tokens | Long internal reasoning traces for complex tasks |
| RL training compute | 512 H800s for 3 weeks | Reported reinforcement learning budget |
| RL rental cost | $534,700 | MiniMax's stated training cost |
| SWE-bench Verified | 55.6% to 56.0% | Strong software engineering benchmark result |
| API pricing | $0.4 / $2.2 per million tokens | Input and output pricing for 0-200k tokens |

Why MiniMax built M1 this way


MiniMax is betting that long-context reasoning is becoming a practical requirement, not a demo trick. If a model can keep a million tokens in working memory, it can hold large codebases, long documents, and extended tool traces without constant truncation. That matters for real workflows where the model has to read, think, revise, and keep track of prior steps.


The company says M1 uses a hybrid-attention design with a Lightning Attention mechanism. The pitch is simple: keep long-context computation efficient enough that the model can reason over huge inputs without burning absurd amounts of compute.

That design choice also explains the training story. MiniMax says the reinforcement learning phase used a faster algorithm called CISPO, which clips importance-sampling weights rather than clipping token-level updates the way PPO-style methods do. The company claims this made convergence about twice as fast as other RL methods it compared against, including ByteDance’s DAPO.
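The clipping idea can be sketched in a few lines. This is a toy illustration of clipping the importance-sampling weight itself, not MiniMax's actual CISPO implementation; the function name and epsilon bounds are placeholders chosen for the example.

```python
import numpy as np

def cispo_token_weights(logp_new, logp_old, eps_low=0.1, eps_high=0.2):
    """Clip the per-token importance-sampling weight itself.

    PPO-style objectives clip the update, which zeroes gradients for
    tokens whose ratio leaves the trust region; the idea sketched here
    is to clip the IS weight instead, so every token still contributes
    a (bounded) gradient. Epsilon values are arbitrary placeholders.
    """
    ratio = np.exp(logp_new - logp_old)  # pi_new / pi_old per token
    return np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)

rng = np.random.default_rng(0)
logp_old = rng.normal(-2.0, 0.5, size=8)          # old-policy token log-probs
logp_new = logp_old + rng.normal(0.0, 0.3, size=8)  # new-policy token log-probs
weights = cispo_token_weights(logp_new, logp_old)   # all values land in [0.9, 1.2]
```

Because no token's weight is zeroed outright, all sampled tokens keep influencing the update, which is one plausible reason a method like this could converge faster.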

  • 1 million-token context window
  • 80,000-token reasoning output
  • 512 H800 GPUs used in RL training
  • $534,700 reported RL rental cost

What the benchmarks actually say

MiniMax is careful to frame M1 as especially strong in software engineering, long-context understanding, and tool use. On SWE-bench Verified, the company reports 55.6% for MiniMax-M1-40k and 56.0% for MiniMax-M1-80k. That trails DeepSeek-R1-0528 at 57.6%, but it still puts M1 ahead of other open-weight models in the company’s comparisons.

The more eye-catching claim is long-context performance. MiniMax says the M1 series beats all open-weight models on long-context understanding and even ranks above OpenAI o3 and Claude 4 Opus, landing second overall behind Gemini 2.5 Pro. In agent tool-use tests on TAU-bench, MiniMax says M1-40k beats every open-weight model and also tops Gemini 2.5 Pro.

"This feature gives us a substantial computational efficiency advantage in both training and inference." — MiniMax

That quote matters because it gets to the real business of the release. A giant context window is impressive, but if it costs too much to train or run, it stays a lab curiosity. MiniMax is trying to argue the opposite: that M1 is large enough for serious work and cheap enough to ship widely.

Price is part of the product here

MiniMax is not treating pricing as an afterthought. The company says M1 is free to use in the MiniMax app and on the web, and its API pricing is aimed at undercutting higher-end rivals. For inputs between 0 and 200k tokens, the price is $0.4 per million input tokens and $2.2 per million output tokens. For 200k to 1M token inputs, the price rises to $1.3 per million input tokens, while output stays at $2.2 per million tokens.


That pricing structure matters because long-context models usually get expensive fast. MiniMax is signaling that it wants developers to test huge prompts, long codebases, and extended agent traces without treating every run like a budget decision.

  • 0-200k input: $0.4 per million tokens
  • 0-200k output: $2.2 per million tokens
  • 200k-1M input: $1.3 per million tokens
  • 200k-1M output: $2.2 per million tokens
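The tiers above make per-request costs easy to estimate. A minimal sketch, with one stated assumption: it treats the whole input as billed at the tier its total length falls into, which MiniMax has not spelled out.

```python
def m1_api_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in USD from MiniMax's published tiers.

    Assumption (not confirmed by MiniMax): the input rate is chosen by
    the request's total input length; output is flat at $2.2/M tokens.
    """
    input_rate = 0.4 if input_tokens <= 200_000 else 1.3  # $ per M input tokens
    output_rate = 2.2                                     # $ per M output tokens
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# a 150k-token prompt with a 20k-token answer
print(round(m1_api_cost(150_000, 20_000), 4))  # 0.104
```

At these rates, even a full 1M-token prompt with a long answer costs on the order of a dollar or two, which is the point MiniMax is trying to make.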

There is also a practical ecosystem angle. MiniMax says the model is already supported by vLLM and SGLang, which matters because teams do not want to wait months for tooling to catch up. If a model is open but hard to deploy, adoption slows down fast.
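For teams that want to try self-hosting, a deployment might look like the vLLM invocation below. Treat this as a sketch: the Hugging Face repo name and flag values are assumptions based on typical vLLM usage, so check the model card and vLLM docs before running, and note that serving anywhere near the full 1M-token context requires far more GPU memory than a single card.

```shell
# install the inference server
pip install vllm

# serve the model behind an OpenAI-compatible HTTP endpoint
# (repo name assumed; verify on Hugging Face)
vllm serve MiniMaxAI/MiniMax-M1-80k \
  --tensor-parallel-size 8 \
  --max-model-len 128000
```

SGLang offers a comparable serving path; either way, day-one support in both stacks is what keeps an open release from stalling on tooling.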

What developers should watch next

M1 is a strong signal that the open-weight race is moving from raw parameter bragging to practical constraints like context length, inference cost, and agent performance. MiniMax is trying to win on all three at once: open weights, million-token memory, and API pricing that invites experimentation.

The real test is whether teams use M1 for tasks that punish weaker models: large codebase refactors, long document analysis, multi-step tool workflows, and agent loops that need to remember what happened 50,000 tokens ago. If MiniMax’s claims hold up outside its own report, M1 could become a serious option for developers who care more about throughput and context than brand names.

MiniMax also said more updates are coming over the next four workdays, so this release may be the first move in a larger product push. The question now is simple: can M1 keep its benchmark edge once more teams run it on their own workloads, with their own prompts, costs, and failure cases?

If you want the practical takeaway, it is this: watch the open-source models that can handle very long contexts without punishing your GPU bill. That is where the next wave of useful reasoning systems is likely to get judged.