Anthropic quietly renamed “Claude Code SDK” to “Claude Agent SDK”. The reason? Internally, they’ve been using Claude Code for deep research, video creation, and note-taking. Not just coding. The agent harness now powers almost all of Anthropic’s major agent loops.

Deep research isn’t new. GPT Researcher has been around for a while. What’s new: Claude’s extended thinking plus tool use makes multi-step verification actually work. Not just “search and summarize” but genuine claim verification across sources.

Here are three approaches, from minimal to production-ready.

1. DIY: Recursive Agent Spawning

The simplest approach is Cranot’s deep-research: ~20 lines of shell, zero dependencies.

The key trick is one flag: --allowedTools "Bash(claude:*)". This lets Claude spawn more Claude instances. That’s it. That’s the whole architecture.

Researchers run in parallel. Unlimited depth. Results flow back up into synthesis.

How it works:

  • Orchestrator receives the initial question
  • Spawns parallel researchers for different angles
  • Each researcher decides: answer directly or spawn sub-agents
  • Results aggregate upward into a final report
  • Output: Timestamped markdown with synthesis
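A minimal sketch of the pattern, in the spirit of the original but not Cranot's actual script (the prompt wording, angle count, and output path here are illustrative):

  #!/usr/bin/env bash
  # deep-research.sh: sketch of the recursive pattern. The prompt wording,
  # angle count, and output path are illustrative, not Cranot's originals.
  set -euo pipefail

  question="$1"
  out="research-$(date +%Y%m%d-%H%M%S).md"

  # One Claude Code invocation is the orchestrator. The single flag
  # --allowedTools "Bash(claude:*)" lets it shell out to more claude
  # instances, so recursion depth is decided by the model, not the script.
  claude -p "You are a research orchestrator for the question: ${question}
  Break it into 2-4 independent angles. For each angle, run a researcher
  with:  claude -p '<sub-question>'  and collect its output. A researcher
  may answer directly or spawn its own sub-agents the same way. Merge all
  findings into one markdown report with a synthesis section and sources." \
    --allowedTools "Bash(claude:*)" > "${out}"

  echo "Report written to ${out}"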

The elegance is in what’s missing. No framework. No dependencies. No configuration. Just shell scripts and Claude Code CLI.

Best for

Exploration, one-off research questions, and learning how recursive agents work. Low investment, high flexibility.

Trade-offs:

  • Token costs can spiral with deep recursion
  • No progress visibility (you wait and hope)
  • No persistence across sessions
  • Quality depends entirely on your prompts

2. MCP: Plug-and-Play Research Tools

If you’re already using Claude Code daily, MCP servers add research capabilities without changing your workflow.

Claude-Deep-Research integrates DuckDuckGo and Semantic Scholar. It provides structured prompts that guide Claude through: initial exploration, preliminary synthesis, follow-up research, comprehensive analysis, and proper citations (APA style).

SuperClaude Framework offers /sc:research with Tavily integration for deep web research.

The MCP approach slots into your existing setup. Install the server, restart Claude Code, and you have new capabilities available.
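Registration is a one-time CLI step. A hedged example, where the server name and launch command are placeholders; check each project's README for the real package and arguments:

  # Register a local (stdio) research MCP server with Claude Code.
  # "deep-research" and the npx package name below are placeholders.
  claude mcp add deep-research -- npx -y example-deep-research-mcp

  # Verify it registered, then restart Claude Code to pick up the tools.
  claude mcp list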

The MCP tool-weight trade-off

Every MCP tool definition consumes context tokens upfront. A research server with 10+ tools can eat 5-10k tokens before you ask a single question. Anthropic’s API has Tool Search with defer_loading to solve this, but it’s not in Claude Code yet. Until then, be selective about which MCP servers you enable.

Best for

Augmenting existing Claude Code workflows. You keep your current habits but gain structured research tools when needed.

Trade-offs:

  • Limited to what the MCP server exposes
  • Still single-session (no persistence)
  • Less control than DIY, more than built-in

3. Production: Building a Research Product

When you need to ship research to users, the requirements change completely.

I’ve been building a verification product that analyzes claims across multiple sources. The core pipeline: scrape sources, extract claims, verify across databases, score confidence, surface discrepancies. Not revolutionary architecture, but the production details matter.

What production requires that DIY doesn’t:

  • Progress streaming: A 30-90 second analysis feels like forever. Server-sent events show “Found company website…”, “Verifying funding data…”, “Cross-referencing claims…”. Latency becomes engagement instead of frustration.

  • Cost tracking: Real numbers matter. My pipeline runs $0.20-0.60 per research run. That sounds cheap until you’re doing 10,000 a month, which works out to $2,000-6,000 in API spend. Track per-request costs from day one.

  • Graceful fallbacks: Scraping fails constantly. Sites block bots, page structures change, rate limits kick in. When a URL scrape fails, prompt for manual text input. Never hard-fail.

  • Multi-source verification: The real value isn’t search. It’s cross-referencing. If source A claims X, does source B confirm it? Discrepancies are often more valuable than confirmations.
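A rough sketch of that cross-check, assuming two already-fetched sources on disk and a CLI context rather than the author's product pipeline (the prompts, file names, and example claim are made up):

  # Ask the same question of two independent sources, then compare.
  # source_a.txt / source_b.txt and the claim are illustrative.
  claim="Acme Corp raised a Series B in 2024"

  a=$(claude -p "Does this source support the claim '${claim}'?
  Reply SUPPORTED, CONTRADICTED, or NOT FOUND, with a supporting quote.
  Source text: $(cat source_a.txt)")

  b=$(claude -p "Same question for this source. Claim: '${claim}'.
  Reply SUPPORTED, CONTRADICTED, or NOT FOUND, with a supporting quote.
  Source text: $(cat source_b.txt)")

  # Discrepancies between the two verdicts are the interesting output.
  claude -p "Compare these verdicts about '${claim}' and flag any discrepancy:
  Source A: ${a}
  Source B: ${b}"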

The scraping problem

Every production research system fights the same battle: getting reliable data from websites that don’t want to be scraped. Budget significant time for fallback paths and manual overrides.
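A minimal sketch of that degrade-don't-die path, in a CLI setting rather than the author's web product (the timeout, retry count, and file names are arbitrary choices):

  #!/usr/bin/env bash
  # Fallback sketch: try to scrape, degrade to manual paste, never hard-fail.
  set -u
  url="$1"

  fetch_page() {
    # Polite scrape with a short timeout; fail fast instead of hanging.
    curl -sSL --max-time 15 --retry 2 "$1"
  }

  if ! page=$(fetch_page "$url"); then
    # Degrade to manual input so the research run can continue.
    echo "Could not fetch ${url}. Paste the source text, then press Ctrl-D:" >&2
    page=$(cat)
  fi

  printf '%s\n' "$page" > source.txt   # hand off to the extraction step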

Trade-offs:

  • Significant engineering investment
  • Infrastructure costs (compute, storage, APIs)
  • Maintenance burden (sources change constantly)
  • But: full control, custom domain logic, real product differentiation

Choosing Your Approach

  Need                                      Approach
  Learning how agents work                  DIY recursive spawning
  Quick research in existing workflow       MCP server
  Building a product                        Full production stack
  Just need answers (no customization)      Anthropic’s built-in Research

The approaches aren’t mutually exclusive. Start with DIY to understand the patterns. Add MCP for daily workflow. Graduate to production when you have a specific domain and users.

What Deep Research Doesn’t Solve

  • Hallucination risk: Multi-source verification reduces hallucinations but doesn’t eliminate them. Claude can still confidently synthesize incorrect conclusions.

  • Source quality: Garbage in, garbage out. If your sources are wrong, verification just confirms the wrong answer with citations.

  • Cost at scale: Recursive spawning burns tokens fast. A deep research query can easily hit $1-5 in API costs. Plan for this.

  • Latency expectations: Real research takes 30-90+ seconds. Users conditioned on instant search results need UX education or progress feedback.

  • Legal/ethical gray areas: Automated scraping at scale raises questions. Rate limiting, robots.txt compliance, and terms of service all matter in production.

The agent framework hiding in plain sight wasn’t a coding tool. It was a research tool that happened to be good at coding.

The Bigger Picture

Anthropic’s SDK rename signals where this is heading. Claude Code proved that agent harnesses work. Now those harnesses are generalizing beyond code.

The multi-agent research system Anthropic published shows the pattern: orchestrator-worker architecture, specialized subagents running in parallel, results flowing back up for synthesis. Sound familiar? It’s the same pattern as Cranot’s 20-line shell script, just productionized.

The interesting question isn’t which approach to choose. It’s what happens when deep research becomes table stakes. When every product can verify claims across sources, the differentiator shifts to domain expertise, source access, and knowing which questions matter.

We’re not there yet. But the SDK rename suggests Anthropic sees it coming.