MiniMax shipped M3 today. The pitch is loud: frontier coding, a 1M-token context window, native multimodality, and the ability to drive a computer desktop, all in one model. The numbers they’re quoting would be a big deal if they hold.
The part I actually care about: it’s already wired into the tools I use. You don’t have to wait for anything to try it.
The Claims
All vendor-reported, all launch-day:
- SWE-bench Pro: 59% - edges out GPT-5.5 (58.6%) and Gemini 3.1 Pro, approaching Opus 4.7
- BrowseComp: 83.5 - past Opus 4.7’s 79.3 on autonomous browsing
- 1M-token context via their MiniMax Sparse Attention (MSA), with a claimed ~20x inference-efficiency gain on long context
- Native multimodal: text, image, and video in, trained that way from the ground up
The flex demo: M3 ran ~12 hours unattended reproducing an ICLR 2025 paper, producing 18 commits and 23 figures.
The Price Is the Story
This is where it gets interesting. The Information clocked M3 at $0.12 per 1M input tokens against Opus 4.7’s $5. That figure is MiniMax’s own-platform promo rate. On OpenRouter it launched at a standard $0.60 in / $2.40 out, with a temporary 50% promo halving that.
Either way you’re looking at a frontier-ish coding model at somewhere between a tenth and a fortieth of the incumbent’s cost. That’s the number that moves a market, not the benchmark.
— A developer on X, launch dayMiniMax M3 is the first model that has shown initiative like Claude models. It has that big model smell too.
You Can Use It Today
The reason I’m writing this on launch day rather than waiting for the weights: M3 is already a routable backend.
It’s live on Ollama Cloud as minimax-m3:cloud (US-based, zero data retention) and on OpenRouter as minimax/minimax-m3. Both are OpenAI-compatible endpoints, which means Claude Code can talk to them. Ollama even published the launch line directly:
ollama launch claude --model minimax-m3:cloud
Or route through OpenRouter with claude-launcher, the wrapper I built for exactly this - swapping Claude Code’s backend without hand-editing env vars:
claude-launcher -o # OpenRouter backend
# pick minimax/minimax-m3 from the searchable model list
Same harness, same workflow, different model. This is the model-freedom loop I keep coming back to: the framework stays put, only the brain swaps. A frontier-cost-class model dropping straight into that loop on day one is the actual unlock.
One gotcha if you want the full 1M window: Claude Code hardcodes a 200k context per model and won’t auto-detect M3’s real limit through a custom backend. Set CLAUDE_CODE_MAX_CONTEXT_TOKENS=1000000 and DISABLE_COMPACT=1 to lift it. The override is coupled to disabling auto-compaction, so it’s a deliberate trade, and you only want it active on the M3 backend, not your normal Anthropic sessions.
This matters more for a Chinese model. Ollama Cloud is the clean route: all hardware is in US datacenters, with no logging, no training on your inputs, and zero data retention (content is processed transiently and never stored). OpenRouter is more nuanced - OpenRouter itself doesn’t log prompts by default, but it forwards each request to an upstream provider under that provider’s own policy, which for M3 can be MiniMax’s own infrastructure. If that matters to you, turn on OpenRouter’s Zero Data Retention setting so it only routes to ZDR endpoints. And remember :cloud means cloud - this isn’t pulling weights to your machine.
What It Won’t Do (Yet)
A few things to be clear about before you reorganize your stack around it.
- The open weights aren’t out. MiniMax promised the weights plus a technical report within ~10 days. Until then it’s API-only.
- Your laptop can’t run it. I checked the obvious question: would a 128GB machine run M3 locally? Almost certainly not. M3’s predecessor M1 was a 456B-parameter MoE, and for inference the entire model has to sit in memory, not just the active experts. At 4-bit that’s ~228GB. Even a hypothetical ~230B version fills 128GB with no headroom for the OS or the 1M-token KV cache - which is the whole selling point you’d then be unable to use. You’d want a 256GB+ box, or just use the API.
- The benchmarks are unverified. Which brings me to the real caveat.
The Caveat That Always Applies
These are vendor benchmarks, published the same day the model shipped, by the company that built it. I’ve written before about how little a launch-day leaderboard tells you. “Beats GPT-5.5 on SWE-bench Pro” is a claim, not a result, until independent runs confirm it on tasks the model wasn’t tuned for.
The launch-day social feed reinforces the point rather than settling it. X is wall-to-wall enthusiasm and “trying it now” posts: it’s already wired into opencode, Ollama Cloud, and at least one $1/month coding plan, and developers are talking about its “big model smell” and Claude-like initiative. What’s almost entirely missing is independent verification. Universal praise on day one is a vibe, not a result. The market wasn’t fully sold either: MiniMax shares spiked 5% on the news, then dropped 12%.
So treat the numbers as a reason to try it, not a reason to believe it. The good news is that trying it costs almost nothing and takes one command. Point Claude Code at minimax-m3 for a day of real work, and you’ll learn more than any benchmark table will tell you.
The leaderboard claims may or may not survive contact with reality. The pricing pressure is real regardless - and it’s the incumbents who have to answer for it now.


