Sometimes sensitive data can’t leave your network. GDPR data residency requirements. Enterprise compliance policies. Air-gapped environments. Or just the principle of keeping private information off third-party servers.

Ollama 0.14 added native Anthropic API compatibility. Claude Code speaks to it directly. No translation layer. No shims. Your prompts and code never leave localhost.
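Once the setup below is done, you can see what “no translation layer” means on the wire: a plain Anthropic-style Messages request served entirely by the local server. This is a sketch; it assumes the compatibility layer mirrors Anthropic’s /v1/messages route on Ollama’s default port.

# Anthropic-style Messages request answered by local Ollama
# (endpoint path and port are assumptions based on Ollama's defaults
# and the shape of Anthropic's Messages API)
curl -s http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -d '{
    "model": "glm-4.7-flash:latest",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Say hello from localhost."}]
  }'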

The Setup

You’ll need the Ollama pre-release for full tool use support. Streaming tool calls landed in recent pre-releases and aren’t in stable yet.

# Install latest pre-release
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3-rc1 sh

# Pull a tool-capable model
ollama pull glm-4.7-flash:latest
Pre-release required

Stable Ollama has issues with streaming tool calls that break Claude Code’s agentic loop. Use a recent pre-release (0.14.3-rc1 or later) until these fixes land in stable.
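A quick sanity check that the pre-release and the model are actually in place:

# Confirm the pre-release version is the one running
ollama --version

# Confirm the model is available locally
ollama list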

Why GLM-4.7-Flash

GLM-4.7-Flash from Zhipu AI hits the sweet spot for local inference. The flash variant specifically, because it has the native tool-calling support Claude Code requires; you can verify the key specs on your own machine with the command shown after this list:

  • 30B MoE, 3B active: Only activates 3 billion parameters per token. Fast inference on consumer hardware.
  • 128K context: Plenty for local work.
  • 79.5% on tool-use benchmarks: Built for agentic workflows, not just chat.
  • ~6.5GB VRAM with Q4: Runs on RTX 3060/4060 or M1/M2 Macs with 16GB.
  • Near-zero marginal cost: After hardware, every token is free.
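To confirm what your local copy actually reports for those numbers, ask Ollama directly:

# Inspect the pulled model's metadata: architecture, parameter count,
# context length, and quantization level
ollama show glm-4.7-flash:latest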

The tradeoff

Local models aren’t Claude. But for many tasks, they’re good enough. And they’re yours.

The Role Model Problem

Claude Code doesn’t just use one model. It routes to different models for different task types: Haiku for quick operations, Sonnet for standard work, Opus for complex reasoning.

If you only set ANTHROPIC_MODEL, the role model requests still try to reach Anthropic’s servers. Your main model runs local, but background calls leak to the cloud.

claude-launcher 0.3.0 fixes this. It always remaps the role models to your selected Ollama model, and saying yes at the prompt lets you pick which model each role uses:

? Configure role models? (ensures all requests stay local) Yes
? Sonnet (lighter tasks): glm-4.7-flash:latest
? Opus (complex tasks): glm-4.7-flash:latest
? Haiku (quick/cheap): glm-4.7-flash:latest

Either way, all four env vars get set. Every request stays on your machine.
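For reference, the manual equivalent looks roughly like this. Treat it as a sketch rather than claude-launcher’s exact variable list: ANTHROPIC_BASE_URL, ANTHROPIC_MODEL, and ANTHROPIC_SMALL_FAST_MODEL are standard Claude Code settings, but the full set of per-role overrides the launcher writes may differ by version.

# Rough manual equivalent of what the launcher configures (a sketch,
# not the launcher's exact output)
export ANTHROPIC_BASE_URL="http://localhost:11434"        # point Claude Code at local Ollama
export ANTHROPIC_MODEL="glm-4.7-flash:latest"             # main model
export ANTHROPIC_SMALL_FAST_MODEL="glm-4.7-flash:latest"  # background/haiku-class tasks
export ANTHROPIC_AUTH_TOKEN="local-placeholder"           # dummy value; a local Ollama server doesn't check it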

npm install -g claude-launcher
claude-launcher -l  # Launch with Ollama backend
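Once a session is running, confirm that requests are actually hitting the local server:

# While Claude Code is working, the model should show up as loaded
# locally, along with whether it's running on GPU or CPU
ollama ps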

The Compliance Angle

GDPR Article 35 mandates data protection impact assessments for large-scale, high-risk processing. Violations can cost up to 4% of global annual revenue. And the right to erasure is hard to satisfy once data sits with a third-party provider, effectively impossible if it ends up encoded in model weights.

Data residency laws in many jurisdictions require data to stay within geographic boundaries. If your LLM provider hosts in the US, you may be non-compliant by default.

Local inference sidesteps all of this. Your code, your prompts, your machine. No cross-border data transfer. No third-party processing agreements. No audit trail requirements for external providers.

Not just code

Compliance isn’t usually about proprietary code. It’s PII. Customer data. Support tasks that touch user records. Database queries with real names and emails. Local inference means none of that leaves your machine.

Limitations

Local models aren’t a free lunch:

  • Slower than cloud: Even fast local inference adds latency compared to Anthropic’s infrastructure.
  • Capability gap: GLM-4.7 is impressive, but it’s not Opus 4.5. Complex refactors and architectural decisions still benefit from frontier models.
  • Hardware requirements: You need a decent GPU or M-series Mac. CPU-only inference exists but is painful.

The practical approach: switch based on what you’re working on. Working with sensitive data (legal records, medical information, private customer details)? Go local. Refactoring a public API? Use cloud. claude-launcher -l and claude-launcher -a make switching instant.

Try It

# Install Ollama pre-release
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3-rc1 sh

# Pull a model
ollama pull glm-4.7-flash:latest

# Install launcher and go
npm install -g claude-launcher
claude-launcher -l

First run configures role models. After that, claude-launcher -l launches fully local. Switch back to cloud anytime with -a (Anthropic) or -o (OpenRouter).

Your code. Your machine. Your rules.