Anthropic released Claude Opus 4.5 today. Here’s the TLDR:
- Token efficiency: 76% fewer output tokens at medium effort, 48% at max effort
- Automatic context summarization: Long conversations no longer hit walls
- Tool Search: Discover tools on-demand instead of loading definitions upfront
- Effort parameter: API control to balance speed/cost vs capability
- Pricing: $5/$25 per million tokens (down from $15/$75)
- Claude Code: Improved Plan Mode, multi-agent parallel sessions in desktop app
The feature that matters most for my workflow: Tool Search.
The Problem I Wrote About
Last month I published Isolating MCP Context in Claude Code. The core issue:
Chrome DevTools MCP? 20,000 tokens just to load the tool definitions. Add a few more MCP servers and you’re starting every conversation with 50% of your context window already gone.
My workaround: slash commands that spawn isolated Claude instances with separate MCP configs. /chrome for browser debugging, /db for database queries. Each runs in its own context, reports back results, keeps the main conversation clean.
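The workaround pattern boils down to "spawn a fresh Claude with a narrower config." A minimal sketch in Python, assuming a hypothetical config path and prompt, and using the CLI flags this post refers to (-p, --mcp-config, --allowed-tools):

```python
# Sketch of the isolation workaround: each slash command spawns a separate
# Claude instance with its own MCP config, so tool definitions and debugging
# output never touch the main conversation's context.
# The config path, prompt, and tool pattern are placeholder examples.

def build_isolated_command(prompt: str, mcp_config: str, allowed_tools: list[str]) -> list[str]:
    """Build the argv for an isolated, non-interactive Claude Code run."""
    return [
        "claude",
        "-p", prompt,                                 # print mode: run once, report, exit
        "--mcp-config", mcp_config,                   # load ONLY this server's tools
        "--allowed-tools", ",".join(allowed_tools),   # permission boundary
    ]

cmd = build_isolated_command(
    prompt="Open localhost:3000 and report any console errors",
    mcp_config=".claude/mcp-chrome.json",
    allowed_tools=["mcp__chrome-devtools__*"],
)
# subprocess.run(cmd, capture_output=True) would return only the final report
```

The point of the pattern: the spawned instance pays the 20K-token tool-definition cost, not your main session.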
It worked. But it was a workaround for a problem that shouldn’t exist.
The Native Solution
Opus 4.5 ships with Tool Search. Instead of loading all tool definitions upfront, you mark tools with defer_loading: true:
```json
{
  "type": "mcp_toolset",
  "mcp_server_name": "chrome-devtools",
  "default_config": {"defer_loading": true}
}
```
Claude sees only the Tool Search Tool initially. When it needs specific capabilities, it searches for relevant tools. Matching tools get expanded into full definitions on-demand.
Deferred tools aren’t loaded into context initially. Claude uses regex or BM25-based search to discover tools when needed. You can also implement custom search using embeddings.
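As a toy illustration of what regex-based discovery does (the registry and search_tools helper below are invented for this sketch; the real search runs server-side inside the API, not in your code):

```python
import re

# Hypothetical registry of deferred tools: name -> short description.
DEFERRED_TOOLS = {
    "chrome_navigate": "Navigate the browser to a URL",
    "chrome_console_logs": "Read console log messages",
    "chrome_network_trace": "Capture network requests",
    "db_query": "Run a read-only SQL query",
}

def search_tools(pattern: str) -> list[str]:
    """Return names of deferred tools whose name or description matches."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        name for name, desc in DEFERRED_TOOLS.items()
        if rx.search(name) or rx.search(desc)
    ]

# Only the matching tools get expanded into full definitions in context:
matches = search_tools("console|network")  # the other tools stay deferred
```

Everything that doesn't match stays out of context entirely, which is where the token savings come from.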
The numbers are dramatic:
- Traditional: ~72K tokens upfront for 50+ MCP tools
- With Tool Search: ~500 tokens initially, plus 3-5 tools (~3K) on-demand
- Result: 85% token reduction while maintaining full tool access
Accuracy improved too. Opus 4.5 on MCP evaluations: 79.5% → 88.1% with Tool Search enabled. Fewer tools in context means less confusion between similar tools like notification-send-user vs notification-send-channel.
> The tool search tool lets agents work with hundreds of tools by dynamically discovering and loading only what they need instead of loading all definitions upfront. — Anthropic, Advanced Tool Use
Does This Make Isolation Patterns Obsolete?
Mostly yes. For pure token savings, defer_loading wins. No slash commands, no subprocess spawning, no context juggling. Just a flag.
But the isolation pattern survives for specific cases:
- Complete context separation: Tool Search reduces token overhead. It doesn’t prevent Chrome DevTools output (console logs, network traces, DOM snapshots) from filling your conversation. If you want debugging artifacts isolated from your main coding context, spawn a separate instance.
- Specialized system prompts: My /chrome command includes a focused system prompt for browser debugging. Tool Search doesn't help with cognitive specialization.
- Permission boundaries: --allowed-tools constrains what a spawned instance can access. Tool Search is about discovery, not restriction.
- Self-contained reports: Isolated instances return concise results. No conversation history pollution.
The practical split: Use defer_loading for token efficiency. Use isolation patterns when you need cognitive or permission boundaries.
What Else Is New
The Advanced Tool Use post covers two other features worth knowing:
- Programmatic Tool Calling (PTC): Claude writes Python to orchestrate tools in a sandbox. Only final output returns to context. 37% token reduction on complex research tasks, eliminates 19+ inference passes when running 20+ tool calls.
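To see why PTC saves tokens, imagine the kind of script Claude writes in the sandbox. This sketch uses invented stand-in tool functions; in real PTC the calls would be routed to actual tools, and only the final printed summary re-enters the model's context:

```python
# Illustrative sketch of a PTC-style orchestration script.
# get_team_members and get_expenses are invented stand-ins for real tools.
# Every intermediate result stays inside the sandbox.

def get_team_members(dept: str) -> list[str]:
    return {"eng": ["ana", "bo", "chen"]}.get(dept, [])

def get_expenses(user: str) -> list[dict]:
    data = {
        "ana": [{"amount": 120.0}, {"amount": 80.0}],
        "bo": [{"amount": 450.0}],
        "chen": [{"amount": 60.0}, {"amount": 40.0}],
    }
    return data.get(user, [])

# One call plus N calls run here in code, not as N+1 inference passes.
totals = {
    member: sum(e["amount"] for e in get_expenses(member))
    for member in get_team_members("eng")
}
over_budget = {m: t for m, t in totals.items() if t > 200}

print(over_budget)  # only this summary returns to the model
```

Run as a loop of individual tool calls, each raw expense report would land in context; run as code, only the filtered dict does.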
- Tool Use Examples: the input_examples parameter provides sample tool calls. Improved accuracy from 72% → 90% on complex parameter handling.
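A tool definition with examples might look roughly like this (an illustrative sketch; the tool and its fields are invented, only the input_examples parameter itself comes from the feature):

```json
{
  "name": "create_ticket",
  "description": "Create a support ticket",
  "input_schema": {
    "type": "object",
    "properties": {
      "title": {"type": "string"},
      "priority": {"type": "string", "enum": ["low", "high"]}
    },
    "required": ["title"]
  },
  "input_examples": [
    {"title": "Login page returns 500", "priority": "high"},
    {"title": "Typo on pricing page"}
  ]
}
```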
Both require the same beta header as Tool Search.
The Bigger Picture
Tool Search is the headline feature. Combined with Opus 4.5’s other improvements:
- 76% fewer output tokens means longer runs before hitting limits
- Automatic context summarization handles conversation overflow
- Effort parameter lets you dial quality vs cost per-task
This is infrastructure for the SDLC collapse I wrote about. When agents can sustain multi-hour reasoning across planning, building, testing, and documentation, they need efficient context management. Tool Search + token efficiency + auto-summarization = longer autonomous runs with less human intervention.
Simon Willison tested Opus 4.5 and couldn’t identify meaningful capability differences from Sonnet 4.5 in practice. The improvements might be more about efficiency than raw capability. That’s still valuable, but temper expectations.
What I’m Changing
- Removing MCP isolation entirely: defer_loading handles token savings natively. No more /chrome or /db slash commands for context isolation.
- Still using libraries over MCPs: As I wrote in my MCP Code Wrapper post, native libraries (pg, Playwright) are faster and lower friction than MCP wrappers. That hasn't changed.
- Keeping multi-model slash commands: /codex, /gemini, and /nano-banana remain. These aren't about context isolation. They're about cognitive specialization: different models for different thinking modes.
Try It
Tool Search is currently API-only with a beta header:
```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    betas=["advanced-tool-use-2025-11-20"],
    model="claude-opus-4-5-20251101",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Debug the failing checkout page"}],
    tools=[
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
        # Your MCP tools with defer_loading: true
    ],
)
```
As of writing, Tool Search and defer_loading aren’t available in Claude Code. The v2.0.51 release added Opus 4.5 and improved Plan Mode, but MCP tool discovery is still API-only. Expect this to change soon.
The decision rule:
- Tool Search: you have many MCP tools and want token efficiency. Most cases.
- Isolation patterns: you need context separation, specialized prompts, or permission boundaries. Specific cases.
The workaround I built was necessary at the time. Now there’s a better way for most of it. That’s progress.


