On February 16, Alibaba launched Qwen 3.5. On February 17, Anthropic shipped Sonnet 4.6 and xAI dropped Grok 4.2 beta. Hours later, OpenClaw pushed a major release. Dreamer came out of stealth. All while Fortune was reporting $2 trillion wiped off software market caps.

Three days. Five major releases. One word on everyone’s lips: agentic.

This isn’t a recap post. The individual releases have been covered to death. What’s worth examining is the pace itself, what it means, and what it’s hiding.

The Scorecard

A quick inventory of what landed between Friday and Monday:

  • Anthropic - Sonnet 4.6 (Feb 17): 79.6% SWE-bench, 72.5% OSWorld. Within 1-2 points of Opus 4.6 on the benchmarks that matter. 1M token context (beta). Now the default for free and pro users.
  • Alibaba - Qwen 3.5 (Feb 16): 397B total params, 17B active per pass. 201 languages. Native multimodal. Claims to outperform GPT-5.2, Opus 4.5, and Gemini 3. Open-weight. 60% cheaper to run than its predecessor.
  • xAI - Grok 4.2 beta (Feb ~17): Four specialized agents debating in parallel before synthesizing a response. “Rapid learning” architecture with weekly improvement cycles. Opt-in beta. Benchmarked against a stock-trading simulation, which tells you something.
  • OpenClaw v2026.2.17 (Feb 17): Sonnet 4.6 support, 1M context, subagent spawning from chat, iOS share extension, Slack streaming. Shipped hours after Steinberger’s OpenAI acqui-hire closed.
  • Dreamer (Feb 17): Out of stealth. David Singleton (ex-Stripe CTO) and Hugo Barra. $50M funding. “Build agentic apps by talking.” An agent that builds agents.

That’s not counting Allen AI’s OLMo 3, Hugging Face’s AnyLanguageModel for Apple devices, or OpenAI’s Frontier enterprise agent platform from earlier in the month.

Context on the month

This comes two weeks after Opus 4.6, GPT-5.3-Codex, and Codex-Spark all landed in the same week. February 2026 has seen more frontier model releases than most entire quarters.

The Convergence

Every release this week used the same framing. Alibaba built Qwen 3.5 for “the agentic AI era.” Anthropic positioned Sonnet 4.6 around “agent planning.” xAI pitched Grok 4.2’s weekly learning as agent-grade adaptability. Dreamer’s entire product is an agent that builds other agents.

A year ago, “agentic” was a buzzword on conference slides. Now it’s the default product positioning for every major lab. The shift happened faster than most people noticed.

What’s converging isn’t just the language. It’s the architecture. Every model this week shipped with:

  • Long context windows (1M tokens for Sonnet 4.6, 1M for Qwen 3.5 hosted)
  • Tool use as a first-class feature, not an afterthought
  • Multi-step task execution as the primary design target
  • Cost efficiency as competitive positioning (Sonnet at $3/$15, Qwen open-weight, Grok bundled with X Premium)

The models are converging. The benchmarks are converging. The pricing is converging. When everything looks the same, differentiation moves somewhere else.

The Ecosystem War

Steinberger’s acqui-hire is the clearest signal of where that differentiation is going.

OpenClaw went from a solo project to 198K GitHub stars to a bidding war between OpenAI, Meta, and Microsoft in under three months. Altman called personally. Zuckerberg messaged on WhatsApp and tested the product himself. Nadella reached out directly. Steinberger was hemorrhaging $10-20K per month running it.

He chose OpenAI because they agreed to keep it open source.

What I want is to change the world, not build a large company.

— Peter Steinberger

The model underneath doesn’t matter to OpenClaw. It started on Claude, got hit with Anthropic’s C&D over the name, switched to Moltbot, became OpenClaw, and now lives inside OpenAI. It’ll run on whatever model is best. The value was never the model. It was the agent layer on top: the integrations, the personality, the memory, the messaging platform connectors that let a 198K-star community delegate real tasks through WhatsApp.
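The pattern is worth making concrete. A minimal sketch (not OpenClaw’s actual code, and every name here is invented for illustration): the durable assets are the agent layer’s memory and connectors, while the model is just a swappable backend behind an interface.

```python
# Illustrative sketch of a model-agnostic agent layer. All class and
# method names here are hypothetical, not from OpenClaw's codebase.
from typing import Protocol


class ModelBackend(Protocol):
    """Anything that can turn a prompt into a reply."""
    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend; a real one would call a provider API."""
    def complete(self, prompt: str) -> str:
        return f"[reply to: {prompt}]"


class Agent:
    def __init__(self, backend: ModelBackend):
        self.backend = backend
        self.memory: list[str] = []  # survives backend swaps

    def handle(self, message: str) -> str:
        """Route a user message through memory to whatever model is current."""
        self.memory.append(message)
        return self.backend.complete(message)

    def swap_backend(self, backend: ModelBackend) -> None:
        """Change models without losing memory or integrations."""
        self.backend = backend
```

The point of the sketch is the asymmetry: swapping `backend` is one line, while `memory` and the connectors around `handle` are where the accumulated value lives.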

This is the pattern the labs are waking up to. Models are commoditizing. The moat is the ecosystem. OpenAI bought the ecosystem. Meta acquired Manus AI for the same reason. Dreamer raised $50M to build one from scratch. The race isn’t “who has the best model” anymore. It’s “who has the best platform for agents to live on.”

The Anthropic angle

This should sting for Anthropic. OpenClaw was built on Claude, named after Claude, and had a community of users building on Claude’s ecosystem. Instead of embracing it, Anthropic sent a cease-and-desist. Steinberger renamed, moved on, and ended up at their chief rival. The most viral agent project in recent memory was pushed directly into OpenAI’s arms by a trademark dispute.

What the Smart People Are Saying

The most interesting reactions this week weren’t about individual models.

Karpathy (990K views) quote-tweeted Thomas Wolf’s thread on “shifting structures in a software world dominated by AI” and mused about programming languages and formal methods becoming critical again. His point: if AI writes the code, the value of languages that make AI-written code verifiable goes up. The models get better at generating. We need to get better at checking.

Simon Willison (156K views, 2.3K likes) coined a term that deserves more attention: cognitive debt. Technical debt is code you know is messy. Cognitive debt is code you don’t understand at all because an AI wrote it and you never reviewed it. His argument: agentic AI doesn’t eliminate technical debt. It converts it into something harder to see.

I’m seeing this in my own work, where excessive unreviewed AI-generated code creates cognitive debt.

— Simon Willison

Ethan Mollick published his updated “Guide to Which AI to Use” and had to subtitle it “in the Agentic Era.” When the professor who wrote the book on AI adoption starts framing everything around agents, the paradigm shift is done.

The $2 Trillion Question

While all these models shipped, $2 trillion was wiped off software market caps. J.P. Morgan estimates that the legal, IT, consulting, and logistics sectors all took hits. Thomson Reuters, LegalZoom, and the broader SaaS sector continue the slide that accelerated around Opus 4.6’s launch.

The market’s thesis is simple: if models can do knowledge work, the companies selling tools for knowledge work are worth less. Every “agentic” announcement reinforces that thesis.

But here’s what the market isn’t pricing in: these models still can’t do most of what they’re marketed to do. Sonnet 4.6 is impressive. It also scores 72.5% on OSWorld, meaning it fails more than a quarter of the time on computer use tasks. Qwen 3.5’s benchmarks are self-reported and unverified. Grok 4.2’s showcase benchmark is a stock-trading simulation. Dreamer is a demo video.

The gap between capability and reliability hasn’t closed. It’s just gotten noisier.

What’s Actually Different

Strip away the hype and three things genuinely changed this week:

1. The cost floor dropped. Sonnet 4.6 delivering near-Opus performance at Sonnet pricing makes sustained agentic workflows more viable. Running three Opus agents in parallel for an hour adds up. Running three Sonnet agents that score within 1-2 points of Opus on the benchmarks that matter is a different calculation entirely.
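The arithmetic is simple enough to sketch. Sonnet’s $3/$15 per million tokens (input/output) comes from the article; the Opus rates and the per-agent token volumes below are assumptions for illustration only.

```python
# Back-of-envelope cost comparison for a parallel-agent workflow.
# Sonnet pricing is from the article; Opus pricing and the token
# volumes are assumed figures for illustration.

PRICES = {  # (input, output) in USD per 1M tokens
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.6": (15.00, 75.00),  # assumption, not from the article
}


def run_cost(model: str, input_tok: int, output_tok: int) -> float:
    """Cost in USD of one agent run at the listed rates."""
    p_in, p_out = PRICES[model]
    return input_tok / 1e6 * p_in + output_tok / 1e6 * p_out


# Three agents, each churning through ~2M input and ~200K output
# tokens over a long session (illustrative numbers).
agents, in_tok, out_tok = 3, 2_000_000, 200_000

for model in PRICES:
    total = agents * run_cost(model, in_tok, out_tok)
    print(f"{model}: ${total:.2f}")
```

Under those assumed volumes the session costs $27 on Sonnet versus $135 on Opus: a 5x gap per run, which compounds quickly once agents run continuously.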

2. Open-weight models caught up. Qwen 3.5 being open-weight and genuinely competitive with frontier models means enterprises now have a real choice between API dependence and self-hosting. 140,000 derivative models. 300 named variants. Baidu integrating it into their main app. The open-source ecosystem isn’t an afterthought anymore. It’s a parallel frontier.

3. Weekly improvement cycles are new. Grok 4.2’s “rapid learning” claim is unproven, but the concept of a model that improves weekly based on real-world usage is genuinely novel. Every other lab ships discrete versions. If xAI pulls this off, the versioning model for AI changes fundamentally. No more “4.6 vs 4.5” comparisons. Just a system that’s different tomorrow than it was today.

What Hasn’t Changed

  • Context is still a budget, not infinite storage. 1M tokens doesn’t mean 1M tokens of reliable retrieval. The dumb zone didn’t disappear.
  • The skill gap is widening, not narrowing. More capable models amplify the difference between people who can scope work precisely and people who can’t. The delegation problem scales with capability.
  • Nobody is reviewing the output. Willison’s cognitive debt observation is the most important thing said this week. The models got faster. The review process didn’t.
  • Demos aren’t deployments. Enterprise adoption still hits the same friction: security, compliance, integration, institutional inertia. A $2 trillion market cap wipeout based on demo videos is the market getting ahead of itself.

The question to ask

Before you switch models, upgrade your agent, or panic about your job: can you clearly define six hours of work well enough to hand it to any of these systems? If not, the model doesn’t matter. The bottleneck is you.

Where This Leaves Us

February 2026 has been the most compressed period of AI releases in history. Opus 4.6, Codex-Spark, Sonnet 4.6, Qwen 3.5, Grok 4.2, OpenClaw’s acqui-hire, Dreamer’s launch. All in 18 days.

The pace is the product now. Not any single model. The collective acceleration. Labs are shipping faster because their competitors are shipping faster. The feedback loop has no natural brake.

For practitioners, the practical takeaway is unglamorous: the tools are getting better faster than most people’s ability to use them well. The models shipped this week are all capable enough. The limiting factor is workflow. Can you decompose problems cleanly? Can you verify output systematically? Can you resist the urge to ship unreviewed AI code?

The models won’t slow down. The question is whether we speed up.