There’s a sentence in Anthropic’s Claude Sonnet 5 announcement that a competitor would pay to have written. It’s about their own cheaper model, and it quietly hands away the premium tier’s entire pitch.
“Its performance is close to that of Opus 4.8, but at lower prices.” (Anthropic, introducing Claude Sonnet 5)
When the vendor tells you the budget option is nearly as good as the premium one, believe them. They priced it, they benchmarked it, and they said it out loud anyway. That single line is the whole story of where Opus 4.8 now sits: not dethroned by a rival, but hollowed out from the inside by the tier below it.
The numbers, briefly
The sticker gap is stark. Opus 4.8 runs $5 per million input tokens and $25 per million output tokens. Sonnet 5 launched at an introductory $2 and $10, rising to $3 and $15 after August 31. At intro pricing, Opus costs 2.5x as much; at the post-August rates the gap shrinks to about 1.7x.
One honest correction to that headline: Sonnet 5 ships with a newer tokenizer that produces roughly 30% more tokens for the same text. So the real per-task saving at intro pricing is closer to 1.9x (2.5 divided by 1.3) than 2.5x. Cheaper, just not as cheap as the price card implies.
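To make that correction concrete, here is a minimal sketch of the arithmetic. The prices are the ones quoted above. The task size is made up, and the code assumes the roughly 30% tokenizer inflation hits input and output equally, which the announcement figures don't confirm.

```python
# USD per million tokens, as quoted in the text above.
OPUS_4_8 = {"input": 5.00, "output": 25.00}
SONNET_5_INTRO = {"input": 2.00, "output": 10.00}
SONNET_5_LIST = {"input": 3.00, "output": 15.00}  # after August 31

# Assumption: the ~30% tokenizer inflation applies equally to input and output.
SONNET_TOKEN_INFLATION = 1.30


def task_cost(prices, input_tokens, output_tokens, inflation=1.0):
    """Dollar cost of one task, with token counts scaled by `inflation`."""
    return (
        input_tokens * inflation * prices["input"]
        + output_tokens * inflation * prices["output"]
    ) / 1_000_000


def saving_ratio(sonnet_prices, input_tokens=100_000, output_tokens=20_000):
    """How many times more the same task costs on Opus than on Sonnet.

    The default token counts are a hypothetical task measured in Opus tokens.
    """
    opus = task_cost(OPUS_4_8, input_tokens, output_tokens)
    sonnet = task_cost(
        sonnet_prices, input_tokens, output_tokens, SONNET_TOKEN_INFLATION
    )
    return opus / sonnet


print(f"intro pricing: {saving_ratio(SONNET_5_INTRO):.2f}x")
print(f"list pricing:  {saving_ratio(SONNET_5_LIST):.2f}x")
```

Because input and output prices differ by the same factor, the ratio doesn't depend on the task size you plug in: about 1.92x at intro pricing and about 1.28x once the intro rates expire.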
On capability, the picture is a near-tie with one exception. These figures come from Anthropic’s own published charts (relayed second-hand, so treat them as directional):
- Deep agentic coding: Opus leads. Around 69% vs 63% on SWE-bench Pro, a real six-point gap on the hardest multi-step work.
- Tool-use reasoning: a dead heat. Roughly 57.9 vs 57.4 on Humanity’s Last Exam with tools.
- Knowledge work: Sonnet actually edges ahead on GDPval.
- Terminal and CLI tasks: Sonnet wins outright, around 80%.
So Opus keeps a genuine lead on exactly one thing: the deepest, longest coding chains. Everywhere else, the cheaper model is either inside the margin of noise or ahead.
The squeeze isn’t where you think
There’s a tempting story that Anthropic quietly strangled Opus with rate limits, and that’s what herded everyone onto Sonnet. It’s mostly wrong. Through 2026 the ropes got looser, not tighter: Claude Code’s five-hour limits doubled in early May, a peak-hour throttle was removed, and weekly limits rose 50%, all riding on a fresh SpaceX compute deal. For a single conversation, Opus stayed a comfortable daily driver.
The “plan with Opus, execute with Sonnet” split wasn’t forced by scarcity either. Anthropic shipped Opus plan mode back in August 2025, and the community’s own verdict was that it “automates what everyone was already doing manually” - weeks before the weekly limits had even taken effect. It’s a capability-and-cost call, not rationing: Opus for the reasoning-heavy planning where its edge is real, Sonnet for the fast, cheap execution.
So where does the squeeze people actually feel come from? Themselves. Nobody runs one conversation anymore. You run an orchestrator that fans out a dozen long-lived subagents across three or four workstreams at once, chewing through context in the background while you get on with something else. One developer reported a single five-hour session where Opus spawned 451 Sonnet subagents and burned through 14 million tokens. The unit of work stopped being a chat turn and became a fleet.
That is what eats the budget: not a dial Anthropic turned, but a working style scaling horizontally faster than any quota can follow. And it is exactly why the price gap between Opus and Sonnet stopped being abstract. When your default move is to spin up fifty agents at once, the question is no longer “which model is smartest,” it’s “which model can I afford fifty of.” Sonnet wins that by construction. Opus gets promoted to the single brain doing the thinking, because running the whole fleet on Opus is financial nonsense.
“I’d say it sounds more like you tried to crush an ant with an excavator.” (r/ClaudeAI, on reaching for Opus by default)
So Opus doesn’t get throttled. It gets sandwiched: Sonnet undercutting it from below on price, and its own users’ appetite for fan-out squeezing it into an ever-narrower role from the inside.
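To put rough numbers on the fan-out argument, here is a back-of-envelope sketch. Only the prices and the 30% tokenizer figure come from this article. The per-agent workload and the model keys are hypothetical, picked purely for illustration.

```python
# USD per million tokens (from the pricing section) plus tokenizer inflation.
# The dictionary keys are labels for this sketch, not real API model names.
MODELS = {
    "opus-4.8": {"input": 5.00, "output": 25.00, "inflation": 1.0},
    "sonnet-5-intro": {"input": 2.00, "output": 10.00, "inflation": 1.3},
}

# Hypothetical workload per agent, measured in Opus tokens.
AGENT_INPUT_TOKENS = 200_000
AGENT_OUTPUT_TOKENS = 20_000


def agent_cost(model):
    """Dollar cost of one agent run of the hypothetical workload."""
    m = MODELS[model]
    raw = AGENT_INPUT_TOKENS * m["input"] + AGENT_OUTPUT_TOKENS * m["output"]
    return raw * m["inflation"] / 1_000_000


def fleet_cost(planner, worker, n_workers):
    """One planner run plus n_workers worker runs."""
    return agent_cost(planner) + n_workers * agent_cost(worker)


def affordable_agents(model, budget_usd):
    """How many agent runs of `model` a fixed dollar budget covers."""
    return int(budget_usd // agent_cost(model))


print(f"all-Opus fleet of 50:       ${fleet_cost('opus-4.8', 'opus-4.8', 50):.2f}")
print(f"Opus plan, 50 Sonnet agents: ${fleet_cost('opus-4.8', 'sonnet-5-intro', 50):.2f}")
print(f"agents per $50 on Opus:   {affordable_agents('opus-4.8', 50)}")
print(f"agents per $50 on Sonnet: {affordable_agents('sonnet-5-intro', 50)}")
```

Under these made-up workload numbers the all-Opus fleet comes to $76.50 against $40.50 for the split, and $50 covers 33 Opus agents or 64 Sonnet ones. Swap in your own token counts; the shape of the result matters more than the figures.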
The “just default to Sonnet” advice has a sharp edge. At maximum effort settings, Sonnet 5 is notably token-hungry. Multiple people running max-effort agent loops report it costing more than Opus 4.8 for a lower pass rate, because it burns so many reasoning tokens. As one developer put it bluntly on Hacker News: “Sonnet 5 on high costs more than Opus 4.8 at a lower pass rate.” The cheap-default story holds for routine execution. It can invert the moment you crank the effort dial.
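One way to see how the inversion can happen is to compare cost per passing task rather than cost per token. The sketch below is an illustration, not a measurement. It assumes reasoning tokens bill at the output rate, that retries are independent, and it borrows the SWE-bench Pro scores quoted earlier as stand-in pass rates. The token counts are hypothetical.

```python
def cost_per_success(output_price_per_mtok, output_tokens_per_attempt, pass_rate):
    """Expected output-token spend per passing task, retrying until success.

    Assumes independent attempts, so expected attempts = 1 / pass_rate.
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    per_attempt = output_price_per_mtok * output_tokens_per_attempt / 1_000_000
    return per_attempt / pass_rate


def breakeven_token_multiple(premium_price, budget_price, premium_pass, budget_pass):
    """How many times more tokens per attempt the budget model can burn
    before its cost per success matches the premium model's."""
    return (premium_price / budget_price) * (budget_pass / premium_pass)


# Output prices from the pricing section; pass rates are stand-ins (assumption).
OPUS_OUT, SONNET_INTRO_OUT = 25.00, 10.00
OPUS_PASS, SONNET_PASS = 0.69, 0.63

# Hypothetical: Sonnet at high effort emits 3x the output tokens per attempt.
opus = cost_per_success(OPUS_OUT, 40_000, OPUS_PASS)
sonnet = cost_per_success(SONNET_INTRO_OUT, 120_000, SONNET_PASS)
limit = breakeven_token_multiple(OPUS_OUT, SONNET_INTRO_OUT, OPUS_PASS, SONNET_PASS)

print(f"Opus per success:   ${opus:.2f}")
print(f"Sonnet per success: ${sonnet:.2f}")
print(f"break-even token multiple: {limit:.2f}x")
```

With these inputs, Sonnet stops being the cheaper option once it emits a bit more than 2.2x the tokens Opus does per attempt, and the hypothetical 3x case lands on the wrong side of that line.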
It cuts both ways
Be fair to Opus, because the case isn’t one-sided. It still owns the hardest deep-agentic coding by a real margin, it holds Anthropic’s own top scores on a handful of specialist benchmarks, and it has a few things Sonnet doesn’t: a fast mode, and the tightest instruction-following at the top of the range. If you’re doing genuinely hard, long-horizon engineering and you have the budget, it’s still the pick.
And nobody’s thrilled with either one. Sonnet 5’s own reception has been lukewarm, more consolation prize than event. Opus, for its part, gets knocked for being chatty: one popular line says it “bills by the paragraph.” The recurring gripe across Hacker News and Reddit is that the whole generation feels incremental. As one reply to a widely-read Opus 4.8 review put it, 4.7 and 4.8 are “both 4.6 that have each been RLd on top to vastly diminishing returns.” The big leaps are behind us; the fight has moved from raw capability to cost and routing.
What it means
Opus 4.8 is now the expensive middle: roughly double Sonnet’s price for a handful of benchmark points, the brain you plan with rather than the fleet you run. Its remaining job is narrow and real, but it’s a job, not a throne.
And here’s the part worth watching: the ladder is getting its top rung back. Fable 5, the tier above Opus, returns this week. The US Commerce Department just lifted the export controls that had pulled it offline, and through July 7 it’s bundled into Pro and Max plans for up to half your weekly usage limit, before moving behind separate credits. So a smarter, hungrier model is about to eat into the very quota your subagents were already draining. The middle gets pressed from every side at once: undercut from below on price, out-classed from above on capability, and drained from within by your own fan-out.
The convenient framing is “which model wins.” That was never the right question. The right one is where each tier’s job actually lands once the dust settles, and for the premium tier the answer got a lot smaller than the launch-day benchmarks suggested.
The head-to-head benchmark figures here are Anthropic’s own, relayed second-hand rather than independently reproduced, so read them as directional, not gospel. Intro pricing on Sonnet 5 expires August 31, which narrows the gap. And the usage-limit numbers are subscription-plan policy that Anthropic has changed repeatedly in 2026, so anything specific may be stale by the time you read it. Verify the limits against your own plan before you architect a workflow around them.



