Ollama added native Anthropic Messages API compatibility in early 2026. Claude Code speaks that same API. Put them together and Claude Code routes requests to a local Gemma 4 model instead of Anthropic’s servers — no API costs, no data leaving your machine, full Claude Code workflow intact (file editing, tool calling, shell execution).
This guide covers the exact setup as of April 2026: correct env vars, right model tags, context window gotcha, and what works vs what doesn’t.
:::note[TL;DR]
- Ollama now exposes an Anthropic-compatible Messages API at `http://localhost:11434` (not the `/v1` path — that's the OpenAI compat layer)
- Env vars: `ANTHROPIC_BASE_URL=http://localhost:11434`, `ANTHROPIC_AUTH_TOKEN=ollama`, `ANTHROPIC_API_KEY=""`
- Correct Gemma 4 tags: `gemma4:e4b` (default, 9.6GB), `gemma4:26b` (18GB MoE), `gemma4:31b` (20GB Dense)
- Claude Code needs at least 64K context — set `num_ctx 65536` in a Modelfile or Ollama will default to much less
- Tool calling (file edits, bash) works; forced `tool_choice` and prompt caching do not
:::
## Prerequisites

- Claude Code installed: `npm install -g @anthropic-ai/claude-code`
- Ollama installed (v0.14.0+ required for Anthropic API compat; v0.14.3+ for stable streaming tool calls)
- Hardware for Gemma 4:
  - `gemma4:e4b` — 9.6GB, runs on a 16GB RAM Mac or an 8GB VRAM GPU
  - `gemma4:26b` — 18GB, needs an M2 Pro/Max or RTX 3090+
  - `gemma4:31b` — 20GB, needs 24GB+ VRAM or an M3 Max/M4 Max
Check your Ollama version:

```bash
ollama --version
# Should be 0.14.0 or later
```
If you need to update:

```bash
# macOS
brew upgrade ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh
```
## Step 1: Pull Gemma 4
The correct model tags for Gemma 4 in Ollama (as of April 2026):
```bash
# Default — E4B edge model, 9.6GB, 128K context, best for most machines
ollama pull gemma4

# E2B — lighter, 7.2GB, 128K context
ollama pull gemma4:e2b

# 26B MoE — activates 4B params, 18GB, 256K context (best coding quality)
ollama pull gemma4:26b

# 31B Dense — 20GB, 256K context, maximum quality
ollama pull gemma4:31b
```
`gemma4` with no tag pulls `gemma4:e4b` — the 9.6GB edge model. For Claude Code coding tasks, `gemma4:26b` is the best quality-to-speed tradeoff if your machine can handle the 18GB.
Verify the download:
```bash
ollama list
# NAME          ID            SIZE    MODIFIED
# gemma4:26b    abc123def456  18 GB   2 minutes ago
```
## Step 2: Fix the context window before connecting
This is the most common failure point. Ollama defaults to a low context window (2K–4K tokens). Claude Code sends long system prompts plus file contents — it needs at least 64K tokens to function properly. Without this, Claude Code will fail mid-task with truncation errors.
Create a Modelfile that sets the context to 64K:
```bash
mkdir -p ~/.ollama/Modelfiles
cat > ~/.ollama/Modelfiles/gemma4-claude <<'EOF'
FROM gemma4:26b
PARAMETER num_ctx 65536
PARAMETER temperature 0.2
PARAMETER top_p 0.9
EOF
```
Build the custom model:
```bash
ollama create gemma4-claude -f ~/.ollama/Modelfiles/gemma4-claude
```
Verify it’s available:
```bash
ollama list
# gemma4-claude ... 18 GB
```
Use `gemma4-claude` (not `gemma4:26b`) when connecting to Claude Code. The base model falls back to Ollama's small default context; this variant enforces 64K.
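If you script this setup, it's worth failing fast when a Modelfile's context is missing or too small. A minimal sketch, assuming the Modelfile layout shown above (`check_ctx` is our own helper, not an Ollama command):

```shell
# Fail if a Modelfile has no num_ctx, or one below Claude Code's 64K minimum.
check_ctx() {
  file="$1"; min="${2:-65536}"
  ctx=$(awk '$1 == "PARAMETER" && $2 == "num_ctx" { print $3 }' "$file")
  if [ -z "$ctx" ] || [ "$ctx" -lt "$min" ]; then
    echo "num_ctx missing or below $min in $file" >&2
    return 1
  fi
  echo "num_ctx=$ctx OK"
}

# Usage: check_ctx ~/.ollama/Modelfiles/gemma4-claude && ollama create ...
```

Run it before `ollama create` so a bad Modelfile never gets built in the first place.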
## Step 3: Set the environment variables
Ollama's Anthropic-compatible endpoint is at `http://localhost:11434` — not `http://localhost:11434/v1`. The `/v1` path is Ollama's OpenAI-compatible layer. Claude Code uses the Anthropic protocol, which maps to the root endpoint.
```bash
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
```
The `ANTHROPIC_AUTH_TOKEN` value can be any non-empty string — Ollama ignores it but Claude Code requires it to be set. Setting `ANTHROPIC_API_KEY=""` prevents Claude Code from falling back to a real API key if one is set in your environment.
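If you flip between local and cloud often, a pair of shell helpers keeps the three variables consistent. A sketch; the function names are our own invention:

```shell
# Route Claude Code to the local Ollama endpoint.
use_ollama() {
  export ANTHROPIC_BASE_URL=http://localhost:11434
  export ANTHROPIC_AUTH_TOKEN=ollama
  export ANTHROPIC_API_KEY=""
}

# Undo the routing; whatever API key your shell profile sets takes over again.
use_cloud() {
  unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_API_KEY
}
```

Drop these in `~/.bashrc` or `~/.zshrc`, then run `use_ollama` before starting Claude Code.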
Start Claude Code with the model:
```bash
claude --model gemma4-claude
```
Quick test:
```bash
claude --model gemma4-claude -p "What model are you?"
```
Gemma 4 will describe itself. If you see a connection error, confirm Ollama is running:
```bash
ollama serve                           # start if not running
curl http://localhost:11434/api/tags   # confirm it responds
```
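You can also bypass Claude Code and hit the compat layer directly. A sketch, assuming the endpoint mirrors Anthropic's `POST /v1/messages` request shape (clients like Claude Code append that path to the base URL):

```shell
# Build an Anthropic-style Messages request and send it to Ollama.
payload='{
  "model": "gemma4-claude",
  "max_tokens": 64,
  "messages": [{"role": "user", "content": "Reply with one word."}]
}'

curl -s http://localhost:11434/v1/messages \
  -H 'content-type: application/json' \
  -H 'anthropic-version: 2023-06-01' \
  -d "$payload" || echo "No response; is ollama serve running?"
```

A JSON response with a `content` block means the compat layer is working end to end.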
## Step 4: Per-project config with `.claude/settings.json`
Setting env vars globally means every Claude Code session on every project routes to Ollama. Usually you want this scoped to specific projects.
In your project root:
```bash
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "model": "gemma4-claude",
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:11434",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "ANTHROPIC_API_KEY": ""
  }
}
EOF
```
Claude Code loads `.claude/settings.json` automatically when run from that directory. Projects without this file continue using the Anthropic API normally.
Add it to .gitignore if you don’t want teammates picking up local model config:
```bash
echo ".claude/settings.json" >> .gitignore
```
Or commit it if local-model-first is a team decision — just ensure everyone has Ollama running.
## Step 5: Document the setup in CLAUDE.md
Claude Code reads `CLAUDE.md` in the project root as persistent context for every session. Add your model routing strategy so it's clear when to use local vs cloud:
```markdown
## Model setup

This project defaults to Gemma 4 locally via Ollama (see `.claude/settings.json`).

To switch to Claude for a specific task:

    ANTHROPIC_BASE_URL="" ANTHROPIC_API_KEY="your-key" claude --model claude-sonnet-4-6

Use Claude API for: multi-file refactors, complex debugging, architecture decisions.
Use Gemma 4 locally for: file summaries, boilerplate generation, single-file edits, tests.
```
---
## What works and what doesn't
Ollama's Anthropic compatibility layer supports most of what Claude Code needs, but not everything. As of April 2026:
**Works:**
- Messages API (multi-turn conversation)
- Streaming responses
- System prompts
- Tool calling — file reads, file edits, bash execution
- Vision / image input (base64 only, not URL)
- Temperature, top_p, stop sequences
**Does not work:**
- `tool_choice` (forced tool selection) — Claude Code uses this occasionally; it silently falls back to auto
- Prompt caching — no performance benefit from repeated identical prompts
- Extended thinking / budget_tokens — parameter is accepted but not enforced
- URL-referenced images — only base64 works
- Token counting endpoint
In practice, basic Claude Code tasks — file reads, edits, bash commands, test generation — work correctly. The missing `tool_choice` support means Claude Code may occasionally pick the wrong tool on its first attempt, but it self-corrects.
---
## Switching back to Claude API for a single task
```bash
# One-off cloud task without changing settings.json
ANTHROPIC_BASE_URL="" ANTHROPIC_API_KEY="sk-ant-..." claude --model claude-sonnet-4-6 -p "Refactor auth module"
```
Or switch mid-session with the slash command:
```
/model claude-sonnet-4-6
```
Note: `/model` changes the model but not `ANTHROPIC_BASE_URL`. If the URL is still pointing at Ollama, the model name is passed to Ollama, which will error if it doesn't have that model. Clear the URL in the env before switching to a cloud model.
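One way to avoid that foot-gun is a tiny wrapper that strips the Ollama routing for a single command. The `with_cloud` name is our own; supply your real API key via the environment yourself:

```shell
# Run one command with the local-Ollama routing cleared for just that call.
# ANTHROPIC_API_KEY is left alone so your real key (if set) is used.
with_cloud() {
  ANTHROPIC_BASE_URL="" ANTHROPIC_AUTH_TOKEN="" "$@"
}

# Usage:
#   with_cloud claude --model claude-sonnet-4-6 -p "Refactor auth module"
```

Because the override applies only to that invocation, your shell keeps routing everything else to Ollama.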
## Performance on common Claude Code tasks
Approximate response times with `gemma4:26b` (18GB MoE, 64K context):
| Task | M2 Pro 16GB | M3 Max 36GB | RTX 4090 24GB |
|---|---|---|---|
| Explain a 200-line file | ~6s | ~3s | ~2.5s |
| Write a unit test | ~8s | ~4s | ~3s |
| 50-line code generation | ~10s | ~5s | ~4s |
| Multi-file refactor plan | ~20s | ~10s | ~8s |
`gemma4:26b` activates only 4B parameters during inference despite its 26B total size — which is why it's faster than you'd expect. The `gemma4:31b` Dense model is 2–3× slower but noticeably better on complex reasoning tasks.
## Troubleshooting
### `Error: connect ECONNREFUSED 127.0.0.1:11434`
Ollama isn’t running. Start it:
```bash
ollama serve
```
On macOS, check the menubar — Ollama runs as a menubar app after install.
### `model "gemma4-claude" not found`
The Modelfile build didn’t complete. Rebuild:
```bash
ollama create gemma4-claude -f ~/.ollama/Modelfiles/gemma4-claude
ollama list   # confirm it appears
```
### Claude Code truncates mid-response or fails with context errors
The context window isn't large enough. Edit your Modelfile and increase `num_ctx`:
```
PARAMETER num_ctx 131072   # 128K
```

Then rebuild: `ollama create gemma4-claude -f ~/.ollama/Modelfiles/gemma4-claude`
### Requests still going to the Anthropic API despite settings.json
Check that `settings.json` is in the project root (the same directory as `CLAUDE.md`, where you run `claude`), and validate the JSON:

```bash
cat .claude/settings.json | python3 -m json.tool
```
### Tool calls not working / Claude Code can't edit files
Streaming tool calls require Ollama v0.14.3+. Check your version and update if needed:
```bash
ollama --version
brew upgrade ollama   # macOS
```
## Summary
- Ollama's Anthropic API at `http://localhost:11434` (not `/v1`) is what Claude Code connects to
- Three env vars: `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN=ollama`, `ANTHROPIC_API_KEY=""`
- Always build a Modelfile with `num_ctx 65536` — the default context is too small for Claude Code
- `gemma4:26b` is the practical choice: 18GB, 256K context, fast MoE inference
- Tool calling works; `tool_choice` and prompt caching don't — expect an occasional wrong first tool pick, which self-corrects
- Scope config to `.claude/settings.json` per project rather than via global env vars
## FAQ
### Does this actually save money compared to the Anthropic API?
For light usage, Claude API costs are low enough that local setup overhead may not be worth it. Where local makes clear sense: privacy-sensitive codebases where data shouldn't leave your network, sustained heavy usage (thousands of Claude Code invocations per day), or offline work with no connectivity. If you're doing a few dozen `claude -p` calls per day, the API cost is negligible.
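To put numbers on "negligible", here is a back-of-envelope sketch using illustrative Sonnet-class rates of $3/M input and $15/M output tokens (check current pricing before relying on this; the helper is our own):

```shell
# Estimate daily API spend in dollars from call volume and token averages.
est_daily_cost() {
  # $1 = invocations/day, $2 = avg input tokens/call, $3 = avg output tokens/call
  awk -v n="$1" -v in_t="$2" -v out_t="$3" \
    'BEGIN { printf "%.2f\n", n * (in_t * 3 + out_t * 15) / 1e6 }'
}

est_daily_cost 50 8000 1000     # a few dozen light calls: ~$2/day
est_daily_cost 2000 8000 1000   # heavy automation: ~$78/day
```

At the heavy end the local setup pays for itself quickly; at the light end it mostly buys privacy, not savings.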
### Can I use `gemma4:31b` instead of `gemma4:26b`?
Yes. Create a separate Modelfile pointing to `gemma4:31b` and build it as `gemma4-claude-31b`. The 31B Dense model gives noticeably better output on complex multi-step reasoning, at 2–3× the latency and 20GB vs 18GB memory. Worth it on a machine with 32GB+ VRAM; marginal on 24GB.
### Will this work on Apple Silicon?
Yes. Ollama uses Metal automatically on Apple Silicon — no configuration needed. The `gemma4:26b` model runs well on an M2 Pro (16GB) at around 20–25 tokens/second. An M3 Max (36–48GB) handles `gemma4:31b` comfortably.
## What to read next
- How to Install Gemma 4 Locally with Ollama — hardware requirements, all model sizes, quantization options
- Claude Code Cheat Sheet — full slash command and flag reference
- MCP Explained: How Claude Connects to Any Tool — extend Claude Code further with MCP servers