Zero API costs. Your data never leaves your machine. And since Ollama 0.17 shipped in February 2026, the setup takes one command. That's the pitch for running OpenClaw with a local LLM, and it's mostly true.
Mostly. There are two gotchas that will waste your afternoon if nobody warns you first.
One Command to Start
Ollama 0.17 introduced native OpenClaw support. If you've got Ollama installed, this is it:
ollama launch openclaw --model qwen3-coder:32b
That pulls the model, configures the connection, and starts OpenClaw pointed at your local Ollama instance. No API key. No account. No cloud.
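Before pointing an agent at it, it's worth a ten-second sanity check that the model pulled and the server is answering. These are standard Ollama commands (assuming the default port, 11434):

```bash
ollama list                               # should list qwen3-coder:32b
curl http://localhost:11434/api/version   # confirms the server is up
```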
For users who want more control, the Ollama integration docs cover manual configuration with JSON config files and Docker setups.
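As a rough sketch of what the manual route looks like, you point OpenClaw's model config at your local Ollama instance. The "primary" and "api" keys below are the ones this post uses elsewhere; the "baseUrl" key and the overall nesting are illustrative, so check the integration docs for your version's exact schema:

```json
{
  "model": {
    "primary": "ollama/qwen3-coder:32b",
    "api": "ollama",
    "baseUrl": "http://localhost:11434"
  }
}
```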
Pick the Right Model (This Actually Matters)
Not every model in the Ollama library works with OpenClaw. The reason? Tool calling. OpenClaw agents don't just chat. They read files, run shell commands, and call APIs. Models without reliable tool calling support turn your agent into a chatbot that can't do anything.
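A quick way to check whether a model can actually call tools is to hit Ollama's standard /api/chat endpoint with a tools array and see what comes back (the list_files function here is a made-up example):

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3-coder:32b",
  "stream": false,
  "messages": [{"role": "user", "content": "List the files in /tmp"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "list_files",
      "description": "List files in a directory",
      "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"]
      }
    }
  }]
}'
```

A tool-capable model answers with a message.tool_calls block naming list_files; a chat-only model just writes prose about /tmp.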
Here's what actually works, based on community benchmarks:
| Memory (VRAM or unified) | Model | What to Expect |
|---|---|---|
| 8GB | qwen3:8b | Barely usable. Simple tasks only. |
| 16GB | qwen2.5-coder:14b | Decent for routine work |
| 24GB | qwen3-coder:32b | The sweet spot. Recommended. |
| 48GB+ | llama3.3:70b | Near-cloud quality |
| 32GB unified (Apple Silicon) | qwen3-coder:32b | Excellent on M-series hardware |
I'd skip the 8B models unless you're just testing the waters. Start at 14B minimum, and if you can run 32B, do that.
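Pulling ahead of time saves you the wait on first launch. Download sizes are approximate, for the default quantizations:

```bash
ollama pull qwen2.5-coder:14b   # roughly 9 GB
ollama pull qwen3-coder:32b     # roughly 20 GB
```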
Gotcha #1: The Context Window Trap
This one catches almost everyone.
Ollama defaults to a 4,096-token context window. OpenClaw needs at least 16,000 tokens, and the official docs recommend 64,000. Without the fix below, your agent silently loses context: it responds to your messages and looks like it's working, but it has no memory of what happened ten minutes ago.
The fix: create a Modelfile that raises num_ctx. The 32,768 below is a pragmatic middle ground; context memory grows with num_ctx, so push toward the recommended 64,000 only if you have the VRAM headroom.
FROM qwen3-coder:32b
PARAMETER num_ctx 32768
Then build it:
ollama create qwen3-coder-32k -f Modelfile
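You can confirm the setting stuck with ollama show, which prints the model's metadata:

```bash
ollama show qwen3-coder-32k
# look for num_ctx 32768 under Parameters
```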
Or use the native Ollama API ("api": "ollama") instead of the OpenAI-compatible endpoint. The native API handles context settings correctly. The OpenAI-compatible endpoint at /v1 has documented issues with context truncation.
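In config terms, that's one key in the model entry. A minimal sketch, assuming the same config shape as the routing example later in this post:

```json
{
  "model": {
    "primary": "ollama/qwen3-coder-32k",
    "api": "ollama"
  }
}
```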
Gotcha #2: Tool Calls Disappearing
OpenClaw sends stream: true to all models by default. Ollama's streaming implementation doesn't properly return tool call chunks. So the model decides to read a file or run a command, but that decision vanishes. You get a text response and nothing happens.
The latest OpenClaw versions auto-detect Ollama and disable streaming for tool calls. If you're on an older version, add this to your model config:
"params": { "streaming": false }
Problem gone. GitHub issue #5769 has the full technical details if you're curious why streaming and tool calling don't play nice with Ollama.
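Putting both gotcha fixes together, a local model entry ends up looking something like this. The nesting is a sketch built from the keys named in this post; check your OpenClaw version's config reference:

```json
{
  "model": {
    "primary": "ollama/qwen3-coder-32k",
    "api": "ollama",
    "params": { "streaming": false }
  }
}
```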
Performance: What to Honestly Expect
An RTX 4090 running a 32B model generates around 55 tokens per second. A Mac M3 Max with 32GB unified memory hits roughly 35 tokens per second, according to independent benchmarks by Till Freitag. That's fast enough for most agent tasks, but noticeably slower than cloud models for long, complex operations.
Hardware break-even versus cloud API costs? Somewhere between 7 and 15 months depending on your setup and usage. If you're running agents heavily, local pays for itself. If you use it a few times a week, cloud APIs through free LLM tiers or a managed host are probably cheaper.
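The arithmetic is simple enough to run on your own numbers; the figures below are purely illustrative:

```
break-even (months) = hardware cost / monthly cloud API spend

$1,900 GPU build / $180 per month ≈ 10.6 months   (heavy use)
$1,900 GPU build /  $40 per month ≈ 47.5 months   (light use)
```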
The Hybrid Approach
The smartest setup I've seen? Local for the 80% of tasks that are routine, cloud for the 20% that need serious reasoning. OpenClaw's config supports model routing:
{
  "model": {
    "primary": "ollama/qwen3-coder:32b",
    "fallbacks": ["anthropic/claude-sonnet-4-20250514"]
  }
}
Analysis by LaoZhang AI shows this hybrid approach cuts costs by 55-67% compared to running everything through a cloud provider. That's real money if you're a heavy user.
For more ways to reduce your API spend, check our token cost optimization guide.
Or Just Skip the Setup
All of the above assumes you want to manage models, configure context windows, and debug tool calling issues yourself. Some people enjoy that. Others would rather their AI agent just work.
That's what ClawHosters does. We handle the hosting, model selection, and configuration. Plans start at $19/month. No hardware required, no debugging context windows, and you can always connect your own Ollama instance later if you want the hybrid approach. See the self-hosted vs managed comparison if you're weighing the options.