
Run Gemma 4 Locally with OpenClaw

By Darsh Jariwala

You don’t need cloud credits to run a capable AI agent. You need Ollama, OpenClaw, and about 16GB of RAM. Gemma 4 26B A4B is a Mixture-of-Experts model — 26 billion total parameters, only 3.8B active per inference pass. That’s why it runs on a MacBook M2 without melting it. 256K context window. Apache 2.0 license. No usage limits. This guide gets it running end to end.

What is Gemma 4 26B A4B?

Gemma 4 is Google DeepMind’s latest open-weight model family. The “26B A4B” tag means 26 billion total parameters, with only ~4 billion active per forward pass. It’s a Mixture-of-Experts (MoE) architecture — each token gets routed through a subset of 128 experts, keeping inference fast and memory requirements low relative to a dense 26B model.
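To make the routing idea concrete, here is a minimal top-k gating sketch in Python. The expert count (128) comes from the paragraph above; the scoring function, the choice of k=2, and all names here are illustrative assumptions, not Gemma 4's actual router.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(gate_scores, k=2):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_indices, normalized_weights). Only these k experts
    run a forward pass, which is why active parameters stay far below
    the 26B total.
    """
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    return top, weights

# One token routed across 128 experts with random gate scores (illustrative)
scores = [random.random() for _ in range(128)]
experts, weights = route_token(scores, k=2)
```

The dense 31B model skips this routing and runs every parameter for every token, which is exactly why it needs roughly twice the memory headroom.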

Spec               Value
Total parameters   26B
Active parameters  ~3.8B (MoE)
Context window     256K tokens
Modality           Text + Vision
License            Apache 2.0
Ollama tag         gemma4:26b

The 31B dense model is slightly stronger but needs 32GB+ RAM. The 26B A4B hits 95% of that quality at half the memory cost. It’s the practical choice for most setups.

What hardware do I need?

Model        Minimum RAM  VRAM (GPU)  Best for
gemma4:e2b   4 GB         3.2 GB      Edge devices, Raspberry Pi
gemma4:e4b   8 GB         5 GB        Laptops, 8GB MacBook
gemma4:26b   16 GB        15.6 GB     Workstations, M2/M3 Mac
gemma4:31b   32 GB        17.4 GB     High-end workstations

A 16GB MacBook Pro (M2 or later) handles gemma4:26b fine via unified memory. On Windows or Linux with NVIDIA, you want at least an RTX 3090 (24GB) for comfortable inference at longer context lengths.
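If you script your setup, a small helper can pick the right tag from available RAM. The thresholds mirror the "Minimum RAM" column above; the function name is my own, not part of Ollama or OpenClaw.

```python
def pick_gemma_tag(ram_gb: int) -> str:
    """Map available system RAM (in GB) to a Gemma 4 tag,
    using the minimum-RAM thresholds from the hardware table."""
    if ram_gb >= 32:
        return "gemma4:31b"
    if ram_gb >= 16:
        return "gemma4:26b"
    if ram_gb >= 8:
        return "gemma4:e4b"
    if ram_gb >= 4:
        return "gemma4:e2b"
    raise ValueError("Gemma 4 needs at least 4 GB of RAM")
```

Note these are minimums: meeting the 16 GB bar for gemma4:26b leaves little headroom for other apps, so on a borderline machine the e4b variant is the safer pick.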

The Scenario: Your team’s legal department just told you that sending source code to any cloud AI provider violates the client NDA. You’ve been using Copilot for three months and now you can’t. Local Gemma 4 + OpenClaw means the code stays on your machine, and nobody from legal calls you again.


Step 1 — Install Ollama

Ollama manages local models. It handles downloads, quantization, and serves them over a local REST API on port 11434. Think of it as a local model server that speaks a clean HTTP API.

Install on macOS or Linux with one command:

curl -fsSL https://ollama.com/install.sh | sh

On macOS, there’s also a DMG at ollama.com/download if you prefer a menu bar app. On Windows, grab the .exe installer from the same page — it runs as a background service.

Confirm it’s working:

ollama --version

Ollama starts automatically and listens on http://127.0.0.1:11434 by default. You don’t need to start it manually.
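If you'd rather script the check than eyeball it, a plain-stdlib Python probe works. /api/tags is Ollama's model-list endpoint; the helper names are mine, and the network call is a sketch that assumes Ollama is already running.

```python
import json
import urllib.request

def ollama_url(path: str, host: str = "127.0.0.1", port: int = 11434) -> str:
    """Build a URL against the default local Ollama endpoint."""
    return f"http://{host}:{port}{path}"

def list_local_models() -> list[str]:
    """Return the names of locally available models, or raise if Ollama is down."""
    with urllib.request.urlopen(ollama_url("/api/tags"), timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]
```

A connection-refused error from `list_local_models()` means Ollama isn't running, which narrows the problem before you touch any OpenClaw config.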


Step 2 — Pull Gemma 4 26B

This downloads the 4-bit quantized model. It’s an 18GB download — happens once, cached locally after that.

ollama pull gemma4:26b

On a slower connection or a machine with less than 16GB RAM, pull the smaller variant instead:

# 9.6GB — runs on an 8GB MacBook Air
ollama pull gemma4:e4b

Confirm the model downloaded correctly:

ollama list

You’ll see gemma4:26b listed with its size and the date it was pulled.
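For scripting, you can also parse the ollama list output directly. The sketch below assumes the usual columnar layout (header row first, model tag in the first column); the sample IDs are made up for illustration.

```python
def parse_model_names(listing: str) -> list[str]:
    """Extract the model tags (first column) from `ollama list` output,
    skipping the header row and any blank lines."""
    lines = [ln for ln in listing.splitlines() if ln.strip()]
    return [ln.split()[0] for ln in lines[1:]]

# Illustrative sample output; real IDs and sizes will differ
sample = """NAME          ID            SIZE    MODIFIED
gemma4:26b    abc123def456  18 GB   2 days ago
gemma4:e4b    789fedcba321  9.6 GB  5 hours ago
"""
```

Checking `"gemma4:26b" in parse_model_names(...)` before launch is a cheap guard against the "model not found" failure covered in Troubleshooting.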

The Scenario: You’re at a hotel on throttled Wi-Fi and the 18GB download keeps timing out. Pull gemma4:e4b — it’s 9.6GB, still 128K context, and you can swap to 26B when you’re back on a real connection. Nothing breaks. It’s just a different model tag.


Step 3 — Install OpenClaw

OpenClaw is a local AI agent. It runs on your machine, connects to messaging platforms like WhatsApp, Telegram, and Slack, and uses Ollama as its model backend when you’re running local-only.

Install on macOS or Linux:

curl -fsSL https://openclaw.ai/install.sh | bash

On Windows, open PowerShell and run:

iwr -useb https://openclaw.ai/install.ps1 | iex

Or install globally via npm if you prefer:

npm install -g openclaw

Check the install worked:

openclaw --version

Step 4 — Configure OpenClaw with Ollama

Run the onboarding wizard. It asks a few questions and writes your config file.

openclaw onboard

Two things matter here:

1. The Ollama base URL. Enter http://127.0.0.1:11434 — exactly that. Do not add /v1 at the end. If you use http://127.0.0.1:11434/v1, tool calling breaks completely: OpenClaw starts outputting raw JSON tool descriptions as plain text instead of executing them. It’s a silent failure that can easily eat an hour of debugging.

2. Choose local-only mode. No cloud provider needed.
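Because the /v1 failure is silent, it's worth normalizing the URL before it ever reaches a config file. This is a defensive sketch of my own, not an OpenClaw API:

```python
def normalize_ollama_base_url(url: str) -> str:
    """Strip a trailing /v1 (and any trailing slash) from an Ollama base URL,
    since the /v1 suffix silently breaks OpenClaw's tool calling."""
    url = url.rstrip("/")
    if url.endswith("/v1"):
        url = url[: -len("/v1")]
    return url
```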

Then set the environment variable that tells OpenClaw to auto-discover your local Ollama models:

# Add to ~/.zshrc or ~/.bashrc so it persists
export OLLAMA_API_KEY="ollama-local"

# Apply immediately without restarting the terminal
source ~/.zshrc

On Windows, set OLLAMA_API_KEY=ollama-local via System Properties → Environment Variables.

Once that’s set, OpenClaw finds every model in ollama list automatically. No manual model registration needed.


Step 5 — Launch OpenClaw with Gemma 4

The one-liner that does everything — installs OpenClaw if it isn’t there yet, and starts it with Gemma 4 26B pinned as the backend:

ollama launch openclaw --model gemma4:26b

What each part does:

  • ollama launch — Ollama’s built-in app launcher. Handles install, config wiring, and startup in one step.
  • openclaw — the app to launch. Ollama fetches and installs it automatically if it’s not on your system.
  • --model gemma4:26b — pins Gemma 4 26B A4B as the backend explicitly. Skip this flag and OpenClaw prompts you to pick a model interactively. Pass it and you skip the prompt entirely.

If you’ve already pulled gemma4:26b, launch is near-instant. If you haven’t, Ollama pulls it first.

Want to try the smaller model on a laptop?

ollama launch openclaw --model gemma4:e4b

Same setup. 9.6GB instead of 18GB. Swap back to 26B anytime by re-running with the different tag.

If you’d rather start both processes separately for more control:

# Terminal 1 — keep Ollama serving
ollama serve

# Terminal 2 — OpenClaw picks up OLLAMA_API_KEY automatically
openclaw start

OpenClaw opens a local web interface at http://localhost:3000 where you can chat with Gemma 4.


Step 6 — Verify it’s working

Run the health check. It tests Ollama connectivity, model availability, tool calling, and gateway status:

openclaw doctor

Then check gateway status separately:

openclaw gateway status

If anything fails, the output tells you exactly what’s misconfigured — it’s not cryptic. Fix what it says and re-run.

For a direct sanity check, hit Ollama’s API yourself. This confirms the model is responding before you blame OpenClaw for anything:

curl http://localhost:11434/api/chat \
  -d '{
    "model": "gemma4:26b",
    "messages": [{"role": "user", "content": "What model are you?"}],
    "stream": false
  }'

A valid JSON response means Ollama is serving correctly.
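The same sanity check works from Python with only the standard library. The /api/chat route and payload shape mirror the curl call above; the function names are mine, and actually running `chat_once` assumes Ollama is up.

```python
import json
import urllib.request

def build_chat_payload(prompt: str, model: str = "gemma4:26b") -> dict:
    """Assemble a non-streaming chat request body for Ollama's /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat_once(prompt: str, base: str = "http://127.0.0.1:11434") -> str:
    """Send one chat request and return the model's reply text."""
    req = urllib.request.Request(
        f"{base}/api/chat",
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["message"]["content"]
```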


Troubleshooting

Why is tool calling outputting raw JSON instead of running?

You used /v1 in the base URL. Go back, re-run openclaw onboard, and set the URL to http://127.0.0.1:11434 without the suffix.

Why does OpenClaw say “model not found”?

The model isn’t pulled yet. Run ollama pull gemma4:26b and wait for the full download before starting OpenClaw. Partial downloads don’t count.

Why does the model keep crashing or running out of memory?

Your machine doesn’t have enough RAM for the 26B variant. Switch to gemma4:e4b:

ollama pull gemma4:e4b
ollama launch openclaw --model gemma4:e4b

Why won’t Ollama start on Windows?

Check Task Manager — the Ollama process might have exited. Launch the Ollama app from the Start menu. It runs as a system tray service and needs to be running before OpenClaw can connect.

Why can’t OpenClaw reach Ollama?

Check if OLLAMA_HOST is set to a non-default address. By default Ollama binds to 127.0.0.1:11434. If you changed it, update your OpenClaw config to match.
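A quick way to see which address OpenClaw should be pointing at is to resolve it the way Ollama does: honor OLLAMA_HOST if set, otherwise fall back to the default. This precedence is a sketch based on the behavior described above; verify against your Ollama version if it differs.

```python
import os

def effective_ollama_host(env=None) -> str:
    """Return the address Ollama binds to: OLLAMA_HOST if set,
    else the default 127.0.0.1:11434."""
    env = os.environ if env is None else env
    return env.get("OLLAMA_HOST", "127.0.0.1:11434")
```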


What’s next?

  • Connect messaging platforms — WhatsApp, Telegram, Slack, Discord integrations are built into OpenClaw
  • Build your first agent — wire up tool calling so Gemma 4 can read files, run code, and search the web
  • Multi-agent workflows — chain agents for tasks that need multiple steps and model handoffs

The rest of the OpenClaw series covers each of these.