OpenClaw + Ollama: How to Run Your AI Agent With Free Local LLMs
Guides

ClawHosters by Daniel Samer
6 min read

Zero API costs. Your data never leaves your machine. And since Ollama 0.17 shipped in February 2026, the setup takes one command. That's the pitch for running OpenClaw with a local LLM, and it's mostly true.

Mostly. There are two gotchas that will waste your afternoon if nobody warns you first.

One Command to Start

Ollama 0.17 introduced native OpenClaw support. If you've got Ollama installed, this is it:

```shell
ollama launch openclaw --model qwen3-coder:32b
```

That pulls the model, configures the connection, and starts OpenClaw pointed at your local Ollama instance. No API key. No account. No cloud.

For users who want more control, the Ollama integration docs cover manual configuration with JSON config files and Docker setups.

Pick the Right Model (This Actually Matters)

Not every model in the Ollama library works with OpenClaw. The reason? Tool calling. OpenClaw agents don't just chat. They read files, run shell commands, and call APIs. Models without reliable tool calling support turn your agent into a chatbot that can't do anything.

Here's what actually works, based on community benchmarks:

| VRAM | Model | What to expect |
| --- | --- | --- |
| 8GB | qwen3:8b | Barely usable. Simple tasks only. |
| 16GB | qwen2.5-coder:14b | Decent for routine work. |
| 24GB | qwen3-coder:32b | The sweet spot. Recommended. |
| 48GB+ | llama3.3:70b | Near cloud quality. |
| Mac, 32GB unified | qwen3-coder:32b | Excellent on Apple Silicon. |

I'd skip the 8B models unless you're just testing the waters. Start at 14B minimum, and if you can run 32B, do that.
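The table above maps cleanly to a small helper. This is just a sketch of that mapping, using the same thresholds and model names as the community-benchmark table; `pick_model` is a hypothetical function name, not part of any tool:

```python
def pick_model(vram_gb: float, unified_memory: bool = False) -> str:
    """Suggest an Ollama model for OpenClaw based on the VRAM table above."""
    if unified_memory and vram_gb >= 32:
        return "qwen3-coder:32b"    # excellent on Apple Silicon
    if vram_gb >= 48:
        return "llama3.3:70b"       # near cloud quality
    if vram_gb >= 24:
        return "qwen3-coder:32b"    # the sweet spot
    if vram_gb >= 16:
        return "qwen2.5-coder:14b"  # decent for routine work
    if vram_gb >= 8:
        return "qwen3:8b"           # barely usable; simple tasks only
    raise ValueError("Under 8GB VRAM: local agent use isn't practical")

print(pick_model(24))  # qwen3-coder:32b
```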

Gotcha #1: The Context Window Trap

This one catches almost everyone.

Ollama defaults to a 4,096-token context window. OpenClaw needs at least 16,000 tokens, and the official docs recommend 64,000. Without fixing this, your agent silently loses context: it looks like it's working and responds to your messages, but has no memory of what happened ten minutes ago.

The fix: create a Modelfile.

```
FROM qwen3-coder:32b
PARAMETER num_ctx 32768
```

Then build it:

```shell
ollama create qwen3-coder-32k -f Modelfile
```
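If you'd rather script both steps, here's the same thing as a copy-paste block. The `ollama` commands are left as comments since they need a running Ollama daemon:

```shell
# Write the Modelfile from the section above
cat > Modelfile <<'EOF'
FROM qwen3-coder:32b
PARAMETER num_ctx 32768
EOF

# With Ollama running, build the custom model:
#   ollama create qwen3-coder-32k -f Modelfile
# and confirm the parameter took:
#   ollama show qwen3-coder-32k
```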

Or use the native Ollama API ("api": "ollama") instead of the OpenAI-compatible endpoint. The native API handles context settings correctly. The OpenAI-compatible endpoint at /v1 has documented issues with context truncation.
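As a sketch of the native-API route: the `"api": "ollama"` key comes from the docs, but the surrounding structure and the `baseUrl` value below are assumptions that may differ by OpenClaw version (11434 is Ollama's default port):

```json
{
  "models": {
    "qwen3-coder-32k": {
      "api": "ollama",
      "baseUrl": "http://localhost:11434"
    }
  }
}
```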

Gotcha #2: Tool Calls Disappearing

OpenClaw sends stream: true to all models by default. Ollama's streaming implementation doesn't properly return tool call chunks. So the model decides to read a file or run a command, but that decision vanishes. You get a text response and nothing happens.

The latest OpenClaw versions auto-detect Ollama and disable streaming for tool calls. If you're on an older version, add this to your model config:

```
"params": { "streaming": false }
```
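Where exactly that block lives depends on your OpenClaw version's config layout. As a sketch, assuming a per-model entry keyed by the model name:

```json
{
  "models": {
    "ollama/qwen3-coder:32b": {
      "params": { "streaming": false }
    }
  }
}
```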

Problem gone. GitHub issue #5769 has the full technical details if you're curious why streaming and tool calling don't play nice with Ollama.

Performance: What to Honestly Expect

An RTX 4090 running a 32B model generates around 55 tokens per second. A Mac M3 Max with 32GB unified memory hits roughly 35 tokens per second, according to independent benchmarks by Till Freitag. That's fast enough for most agent tasks, but noticeably slower than cloud models for long, complex operations.
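To put those throughput numbers in wall-clock terms, here's the simple arithmetic on the figures above (the 500-token response size is an illustrative assumption for a short agent step):

```python
def generation_time(tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate a response at a given throughput."""
    return tokens / tokens_per_second

# A ~500-token response:
print(round(generation_time(500, 55), 1))  # RTX 4090 @ 55 tok/s -> 9.1s
print(round(generation_time(500, 35), 1))  # M3 Max  @ 35 tok/s -> 14.3s
```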

Hardware break-even versus cloud API costs? Somewhere between 7 and 15 months depending on your setup and usage. If you're running agents heavily, local pays for itself. If you use it a few times a week, cloud APIs through free LLM tiers or a managed host are probably cheaper.
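The break-even math itself is simple division. The hardware and API prices below are illustrative placeholders (not quotes), chosen to show how the 7-to-15-month range falls out:

```python
def breakeven_months(hardware_cost: float, monthly_api_spend: float) -> float:
    """Months until local hardware pays for itself vs. cloud API spend."""
    return hardware_cost / monthly_api_spend

# e.g. a $2,000 GPU vs. heavy usage ($280/mo) or moderate usage ($135/mo)
print(round(breakeven_months(2000, 280), 1))  # 7.1 months
print(round(breakeven_months(2000, 135), 1))  # 14.8 months
```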

The Hybrid Approach

The smartest setup I've seen? Local for the 80% of tasks that are routine, cloud for the 20% that need serious reasoning. OpenClaw's config supports model routing:

```json
{
  "model": {
    "primary": "ollama/qwen3-coder:32b",
    "fallbacks": ["anthropic/claude-sonnet-4-20250514"]
  }
}
```

Analysis by LaoZhang AI shows this hybrid approach cuts costs by 55-67% compared to running everything through a cloud provider. That's real money if you're a heavy user.
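Applied to a concrete bill, the savings range looks like this. The $200/month cloud-only baseline is an illustrative assumption; the 55-67% range is from the analysis cited above:

```python
def hybrid_cost(cloud_only_monthly: float, savings_pct: float) -> float:
    """Monthly spend under the hybrid setup, given a savings percentage."""
    return cloud_only_monthly * (1 - savings_pct / 100)

# On an illustrative $200/month cloud-only baseline:
print(round(hybrid_cost(200, 55), 2))  # 90.0  (55% savings)
print(round(hybrid_cost(200, 67), 2))  # 66.0  (67% savings)
```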

For more ways to reduce your API spend, check our token cost optimization guide.

Or Just Skip the Setup

All of the above assumes you want to manage models, configure context windows, and debug tool calling issues yourself. Some people enjoy that. Others would rather their AI agent just work.

That's what ClawHosters does. We handle the hosting, model selection, and configuration. Plans start at $19/month. No hardware required, no debugging context windows, and you can always connect your own Ollama instance later if you want the hybrid approach. See the self-hosted vs managed comparison if you're weighing the options.

Frequently Asked Questions

**Is running OpenClaw with Ollama actually free?**

Yes. Ollama is free, OpenClaw is open source, and local models have no per-token charges. Your only cost is the hardware you already own. The `ollama launch openclaw` command handles the full setup.

**Which local model works best with OpenClaw?**

qwen3-coder:32b if you have 24GB+ VRAM or 32GB unified memory on a Mac. It handles tool calling reliably and generates code at roughly 92% on the HumanEval benchmark. For tighter hardware, qwen2.5-coder:14b is the minimum I'd recommend.

**Why do I need to change Ollama's context window?**

Ollama defaults to a 4,096-token context window. OpenClaw needs 16,000+ tokens minimum. Create a custom Modelfile with `PARAMETER num_ctx 32768` or switch to the native Ollama API (`"api": "ollama"`) instead of the OpenAI-compatible endpoint.

**Can a local model fully replace cloud models?**

For routine tasks like file operations, message handling, and simple coding, a 32B local model is surprisingly capable. For complex reasoning, multi-step debugging, or architectural decisions, cloud models still win. The hybrid approach gives you both.

**How much VRAM do I need?**

8GB is the bare minimum (with an 8B model), but you won't get reliable tool calling. 24GB is where it gets genuinely useful with 32B models. Mac users with 32GB unified memory are in an excellent spot since Apple Silicon handles these models efficiently.
*Last updated: March 2026*

Sources

  1. Ollama integration docs
  2. Community benchmarks
  3. Documented issues with context truncation
  4. GitHub issue #5769
  5. Independent benchmarks by Till Freitag
  6. Free LLM tiers
  7. Analysis by LaoZhang AI
  8. Token cost optimization guide
  9. ClawHosters
  10. Self-hosted vs managed comparison