December 22: Zhipu AI releases GLM-4.7. December 23: MiniMax releases M2.1. Both MIT licensed. Both explicitly optimize for Claude Code, Cline, Kilo Code, and Roo Code compatibility.

Open source isn’t chasing benchmarks anymore. It’s chasing workflows.

The Models at a Glance

                   GLM-4.7         MiniMax M2.1
Parameters         358B            230B total / 10B active (MoE)
Context            200K            128K
SWE-bench          73.8%           72.5% (multilingual)
Price vs Claude    ~7x cheaper     ~10x cheaper
License            MIT             MIT
Release            Dec 22, 2025    Dec 23, 2025

Both models target the same use case: reliable agentic coding at a fraction of closed-model costs. The 48-hour timing isn’t coincidence. It’s a race.

What They’re Actually Competing On

Not raw intelligence. Agentic reliability.

The benchmark that matters: SWE-bench Verified. Real software engineering tasks: reading codebases, understanding issues, writing fixes. GLM hits 73.8% (up from 68% in 4.6). M2.1 reaches 72.5% on multilingual repos. Both within striking distance of Claude Sonnet 4.5’s 77.2%.

GLM-4.7 supports thinking before acting, with significant improvements on complex tasks in mainstream agent frameworks.

— Z.ai announcement

The framework compatibility play is strategic. Rather than building their own tooling, both models optimize for existing infrastructure:

  • Claude Code: Full tool calling support
  • Cline / Kilo Code / Roo Code: Multi-turn agent patterns
  • Terminal operations: +16.5% improvement on Terminal Bench for GLM

This is design-for-adoption. Ship a model that drops into existing workflows, not one that requires new scaffolding.
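
To make “drops into existing workflows” concrete: both models are reachable through OpenAI-compatible endpoints (OpenRouter, covered later, is one option), so a standard tool-calling request works unchanged. A minimal sketch; the model slug and the tool definition are illustrative, not anything published by the vendors.

```python
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions protocol, so an existing
# tool-calling harness can point here instead of at a closed model.
client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key="<OPENROUTER_API_KEY>")

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool, for illustration only
        "description": "Run the project's test suite and return the failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="z-ai/glm-4.7",  # illustrative slug -- check the provider's model list
    messages=[{"role": "user", "content": "Fix the failing tests in src/parser/."}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```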

The Thinking Innovation

GLM-4.7 introduces something genuinely new: Preserved Thinking.

Standard multi-turn agents re-derive their reasoning from context on every turn. That works fine until you hit what Dex Horthy’s research calls the “dumb zone”: past 40% of context capacity, reasoning quality degrades sharply.

GLM’s approach:

  • Preserved Thinking: Retains reasoning blocks across turns, reusing existing logic instead of re-deriving
  • Turn-level Thinking: Toggle reasoning per-request. Disable for simple tool calls (faster, cheaper). Enable for complex decisions (more reliable)

Architecture implication

For long-horizon agents that chain 20+ tool calls, preserved thinking could eliminate the context degradation that currently forces “summarize and restart” patterns. The model remembers why it made decisions, not just what it decided.
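
A sketch of what the turn-level toggle looks like from the client side, assuming an OpenAI-compatible GLM-4.7 endpoint and a `thinking` request field modelled on Z.ai’s earlier GLM APIs. Both the endpoint and the exact field shape are assumptions to verify against the current docs.

```python
from openai import OpenAI

client = OpenAI(base_url="<your GLM-4.7 endpoint>", api_key="<key>")

# Simple tool-call turn: skip reasoning for speed and cost.
quick = client.chat.completions.create(
    model="glm-4.7",  # illustrative model id
    messages=[{"role": "user", "content": "List the files changed in this diff."}],
    extra_body={"thinking": {"type": "disabled"}},  # assumed field name/shape
)

# Complex decision: keep reasoning on. For Preserved Thinking, pass prior
# assistant turns back unmodified so their reasoning blocks can be reused
# instead of re-derived.
careful = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Plan the refactor across these three modules."}],
    extra_body={"thinking": {"type": "enabled"}},  # assumed field name/shape
)
```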

MiniMax takes a different approach: Mixture of Experts with 230B total but only 10B active parameters per forward pass. That’s actually closer to the swarm direction Yegge predicts: specialized experts activating as needed rather than one massive model doing everything. Lower latency, lower cost, higher throughput.
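
A toy top-k router makes the “only 10B active” point concrete: every expert’s weights exist, but only a couple are touched per token. The shapes and expert count below are illustrative, not MiniMax’s actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, k, d = 64, 2, 512                     # illustrative sizes
experts = [rng.standard_normal((d, d)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts)) * 0.02

def moe_forward(x):
    logits = x @ router                          # score all experts...
    top = np.argsort(logits)[-k:]                # ...but run only the top-k
    w = np.exp(logits[top]); w /= w.sum()        # softmax over the selected experts
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_forward(rng.standard_normal(d))          # 2 of 64 expert matrices used
```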

GLM is still building a bigger ant: 358B parameters, longer context. But Preserved Thinking hints at swarm behavior too. Instead of one diver burning through oxygen, you get reasoning that persists like handoffs between specialized agents. Not the orchestrated CNC machines yet, but a step toward how swarms actually work.

What This Doesn’t Solve

The caveats are real:

  • Self-hosting scale: 358B parameters isn’t trivial. GLM recommends 4-8 GPU tensor parallelism with vLLM or SGLang (see the serving sketch after this list). Most teams will use the API.

  • Ecosystem maturity: Debugging, observability, and support lag closed models. When your agent fails at 3am, Anthropic has a support team. Hugging Face has GitHub issues.

  • Edge-case reliability: Both models post strong benchmark scores but still trail Sonnet/Opus on the hardest tasks. The 90/90 rule applies: the first 90% comes fast, the last 10% is brutal.

  • Hallucination rates: Neither model publishes hallucination benchmarks. For agentic work where outputs get executed as code, this gap matters.
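
For the self-hosting caveat above, a minimal vLLM sketch of tensor-parallel serving from Python. The Hugging Face repo id and context length are assumptions; substitute whatever GLM-4.7 or M2.1 checkpoint and hardware you actually have.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-4.7",    # assumed repo id -- replace with the real checkpoint
    tensor_parallel_size=8,     # shard weights across 8 GPUs
    max_model_len=200_000,      # long context costs KV-cache memory on top of weights
)

out = llm.generate(
    ["Write a unit test for parse_config()."],
    SamplingParams(max_tokens=512, temperature=0.2),
)
print(out[0].outputs[0].text)
```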

When to Reach for Open Source

The use cases where open source wins:

  • Cost multiplication in agent loops: When you’re running 10 parallel agents, 10x cheaper adds up fast
  • Data sovereignty: Air-gapped environments, regulated industries, proprietary codebases that can’t touch external APIs
  • Fine-tuning: Training on your specific codebase, patterns, and conventions
  • Latency control: Self-hosted inference with no network round-trips

The PII blocker

For many enterprises, the closed-model blocker isn’t cost or capability. It’s compliance. Sending proprietary code or customer data to external APIs requires legal review, DPAs, and audit trails that can take months. Self-hostable models eliminate this entirely. Your code never leaves your infrastructure. For regulated industries (healthcare, finance, government), this is the unlock that makes agentic coding viable at all.

This is where the Claude Code compatibility play pays off. Enterprise teams don’t have to choose between great tooling and compliance. They can run Claude Code with a self-hosted GLM or M2.1 backend, getting the same workflow their unrestricted colleagues use. The framework stays the same. Only the model endpoint changes.

Want to try these models without self-hosting? Both GLM-4.7 and MiniMax M2.1 are available through OpenRouter, which integrates directly with Claude Code. Point your ANTHROPIC_BASE_URL at OpenRouter’s API, and you can swap between models mid-session. See the official setup guide for details.
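
For experimentation outside Claude Code, the same swap is one string in any OpenAI-compatible client pointed at OpenRouter. The model slugs below are illustrative; check OpenRouter’s model list for the exact ids.

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key="<OPENROUTER_API_KEY>")

# Same client, same prompt -- only the model string changes.
for model in ("z-ai/glm-4.7", "minimax/minimax-m2.1"):   # illustrative slugs
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Refactor this function to be pure."}],
    )
    print(model, "->", resp.choices[0].message.content[:80])
```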

When closed models still win:

  • Bleeding-edge reasoning: Opus 4.5 and GPT-5.2 still lead on the hardest problems
  • Priority support: When your production system breaks at 3am
  • Zero-ops deployment: Just an API key, no infrastructure to manage

What This Means

Three takeaways:

  • Speed matters more than base intelligence for agentic work. Both models prioritize fast, reliable tool calling over raw benchmark scores. The Gemini 3 Flash lesson applies: for iterative agents, faster and cheaper beats smarter and slower.

  • MIT licensing removes friction. No enterprise legal review, no usage restrictions, no vendor lock-in. For production deployments, this matters as much as the benchmarks.

  • The framework compatibility play is strategic. By targeting Claude Code and Cline, these models inherit an existing user base. Developers don’t need to learn new tools. They just swap the model.

The timing

Two major open source agentic models in 48 hours, both targeting the same frameworks, both MIT licensed. The open source ecosystem just got serious about agents.

Open source isn’t competing on who’s smartest anymore. It’s competing on who ships production-ready agents first.