Sometimes sensitive data can’t leave your network. GDPR data residency requirements. Enterprise compliance policies. Air-gapped environments. Or just the principle of keeping private information off third-party servers.

Ollama 0.14 added native Anthropic API compatibility. Claude Code speaks to it directly. No translation layer. No shims. Your prompts and code never leave localhost.
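Once the setup below is done, you can see what “no translation layer” means on the wire: a plain Anthropic-style Messages request served entirely by the local server. This is a sketch; it assumes the compatibility layer mirrors Anthropic’s /v1/messages route on Ollama’s default port.

# Anthropic-style Messages request answered by local Ollama
# (endpoint path and port are assumptions based on Ollama's defaults
# and the shape of Anthropic's Messages API)
curl -s http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -d '{
    "model": "glm-4.7-flash:latest",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Say hello from localhost."}]
  }'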

The Setup

You’ll need the Ollama pre-release for full tool use support. Streaming tool calls landed in recent pre-releases and aren’t in stable yet.

# Install latest pre-release
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3-rc1 sh

# Pull a tool-capable model
ollama pull glm-4.7-flash:latest
Pre-release required

Stable Ollama has issues with streaming tool calls that break Claude Code’s agentic loop. Use a recent pre-release (0.14.3-rc1 or later) until these fixes land in stable.
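A quick sanity check that the pre-release and the model are actually in place:

# Confirm the pre-release version is the one running
ollama --version

# Confirm the model is available locally
ollama list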

Why GLM-4.7-Flash

GLM-4.7-Flash from Zhipu AI hits the sweet spot for local inference. The flash variant specifically, because it has the native tool-calling support Claude Code requires; you can verify the key specs on your own machine with the command shown after this list:

  • 30B MoE, 3B active: Only activates 3 billion parameters per token. Fast inference on consumer hardware.
  • 128K context: Plenty for local work.
  • 79.5% on tool-use benchmarks: Built for agentic workflows, not just chat.
  • ~6.5GB VRAM with Q4: Runs on RTX 3060/4060 or M1/M2 Macs with 16GB.
  • Near-zero marginal cost: After hardware, every token is free.
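To confirm what your local copy actually reports for those numbers, ask Ollama directly:

# Inspect the pulled model's metadata: architecture, parameter count,
# context length, and quantization level
ollama show glm-4.7-flash:latest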

The tradeoff

Local models aren’t Claude. But for many tasks, they’re good enough. And they’re yours.

The Role Model Problem

Claude Code doesn’t just use one model. It routes to different models for different task types: Haiku for quick operations, Sonnet for standard work, Opus for complex reasoning.

If you only set ANTHROPIC_MODEL, the role model requests still try to reach Anthropic’s servers. Your main model runs local, but background calls leak to the cloud.

claude-launcher 0.3.0 fixes this. It always remaps the role models to your selected Ollama model, and saying yes at the prompt lets you pick which model each role uses:

? Configure role models? (ensures all requests stay local) Yes
? Sonnet (lighter tasks): glm-4.7-flash:latest
? Opus (complex tasks): glm-4.7-flash:latest
? Haiku (quick/cheap): glm-4.7-flash:latest

Either way, all four env vars get set. Every request stays on your machine.
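For reference, the manual equivalent looks roughly like this. Treat it as a sketch rather than claude-launcher’s exact variable list: ANTHROPIC_BASE_URL, ANTHROPIC_MODEL, and ANTHROPIC_SMALL_FAST_MODEL are standard Claude Code settings, but the full set of per-role overrides the launcher writes may differ by version.

# Rough manual equivalent of what the launcher configures (a sketch,
# not the launcher's exact output)
export ANTHROPIC_BASE_URL="http://localhost:11434"        # point Claude Code at local Ollama
export ANTHROPIC_MODEL="glm-4.7-flash:latest"             # main model
export ANTHROPIC_SMALL_FAST_MODEL="glm-4.7-flash:latest"  # background/haiku-class tasks
export ANTHROPIC_AUTH_TOKEN="local-placeholder"           # dummy value; a local Ollama server doesn't check it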

npm install -g claude-launcher
claude-launcher -l  # Launch with Ollama backend
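Once a session is running, confirm that requests are actually hitting the local server:

# While Claude Code is working, the model should show up as loaded
# locally, along with whether it's running on GPU or CPU
ollama ps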

The Compliance Angle

GDPR Article 35 mandates data protection impact assessments for large-scale, high-risk processing. Violations can cost up to 4% of global annual revenue. And the right to erasure is hard to satisfy once data sits with a third-party provider, effectively impossible if it ends up encoded in model weights.

Data residency laws in many jurisdictions require data to stay within geographic boundaries. If your LLM provider hosts in the US, you may be non-compliant by default.

Local inference sidesteps all of this. Your code, your prompts, your machine. No cross-border data transfer. No third-party processing agreements. No audit trail requirements for external providers.

Not just code

Compliance isn’t usually about proprietary code. It’s PII. Customer data. Support tasks that touch user records. Database queries with real names and emails. Local inference means none of that leaves your machine.

Limitations

Local models aren’t a free lunch:

  • Slower than cloud: Even fast local inference adds latency compared to Anthropic’s infrastructure.
  • Capability gap: GLM-4.7 is impressive, but it’s not Opus 4.5. Complex refactors and architectural decisions still benefit from frontier models.
  • Hardware requirements: You need a decent GPU or M-series Mac. CPU-only inference exists but is painful.

The practical approach: switch based on what you’re working on. Working with sensitive data (legal records, medical information, private customer details)? Go local. Refactoring a public API? Use cloud. claude-launcher -l and claude-launcher -a make switching instant.

Try It

# Install Ollama pre-release
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3-rc1 sh

# Pull a model
ollama pull glm-4.7-flash:latest

# Install launcher and go
npm install -g claude-launcher
claude-launcher -l

First run configures role models. After that, claude-launcher -l launches fully local. Switch back to cloud anytime with -a (Anthropic) or -o (OpenRouter).

Your code. Your machine. Your rules.