LLM Observability for OpenClaw: Monitoring with OpenTelemetry, Prometheus, and Grafana
Guides

ClawHosters by Daniel Samer
6 min read

A stuck OpenClaw session ran overnight on a test server we manage. Nobody noticed until the morning. By then, it had burned through $47 in API tokens doing absolutely nothing useful. That's when we stopped treating monitoring as a nice-to-have.

If you're running an AI agent in production, LLM observability isn't optional. OpenClaw agents act autonomously. They don't wait for you to click something. Sessions can stall, rate limits can silently drop messages, and context windows can fill up until your costs spike with zero warning. There are five documented silent failure modes that won't show up in your logs unless you're actively looking.

So here's how to set up proper OpenClaw monitoring in about 15 minutes.

Enable the Built-in OpenTelemetry Exporter

OpenClaw ships with a plugin called diagnostics-otel. It's disabled by default. To turn it on, add this to your ~/.openclaw/openclaw.json:

{
  "diagnostics": {
    "otel": {
      "enabled": true,
      "endpoint": "http://127.0.0.1:4318",
      "serviceName": "openclaw-prod",
      "traces": true,
      "metrics": true,
      "logs": true,
      "sampleRate": 1.0,
      "flushIntervalMs": 5000
    }
  }
}

Two things to note. Set sampleRate to 1.0 for single-instance deployments so you don't lose traces. And drop flushIntervalMs to 5000 (five seconds) instead of the default 60000. As the SigNoz engineering team found, the default 60-second interval makes your dashboard nearly useless for real-time debugging.

One gotcha that'll cost you time: only http/protobuf works. If your collector expects gRPC, the plugin silently sends nothing. No error, no warning. Just silence. Check the official logging docs if you run into this.

The Four Metrics That Actually Matter

You'll get a wall of telemetry data. Ignore most of it at first. These four tell you whether your agent is healthy:

openclaw.cost.usd tracks spend per session. Set an alert for anything over your expected daily budget. This catches runaway sessions before they drain your API credits.

openclaw.run.duration_ms measures LLM response latency. A p95 above 5 seconds usually means something is wrong, either model overload or a context window that's gotten too large.

openclaw.context.tokens shows how much of the model's context window is consumed. When this creeps toward the limit, response quality drops and costs climb.

openclaw.queue.depth reveals message backlog. If depth keeps growing, your agent can't keep up with incoming requests. Messages may get dropped depending on your queueOverflow setting.
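To make these four metrics queryable at a glance, you can precompute them with Prometheus recording rules. This is a sketch, not an official config: the underscored metric names assume the OTel Collector's Prometheus exporter converts dots to underscores (`openclaw.cost.usd` becomes `openclaw_cost_usd`), the `_bucket` suffix assumes duration is exported as a histogram, and the 200,000-token context limit is a placeholder you should replace with your model's actual window.

```yaml
# Hypothetical recording rules for the four core metrics.
groups:
  - name: openclaw-core
    rules:
      # Spend over the last 24 hours, per session labels if present
      - record: openclaw:cost_usd:daily
        expr: increase(openclaw_cost_usd[24h])
      # p95 LLM response latency over a 5-minute window
      - record: openclaw:run_duration_ms:p95
        expr: histogram_quantile(0.95, rate(openclaw_run_duration_ms_bucket[5m]))
      # Fraction of the context window consumed (200k is a placeholder)
      - record: openclaw:context_utilization
        expr: openclaw_context_tokens / 200000
      # Smoothed queue depth; a rising trend means the agent is falling behind
      - record: openclaw:queue_depth:avg5m
        expr: avg_over_time(openclaw_queue_depth[5m])
```

Recording rules keep dashboard queries cheap and give your alert rules stable names to reference.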

Architecture: How the Pieces Fit

The data pipeline looks like this:

OpenClaw gateway sends OTLP/HTTP to an OTel Collector on port 4318. The collector exposes a Prometheus scrape endpoint on 127.0.0.1:9464. Prometheus scrapes that endpoint. Grafana queries Prometheus and renders dashboards.
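That pipeline maps onto an OTel Collector config roughly like this. A minimal sketch under the assumptions above, not an official OpenClaw example: OTLP over HTTP in on 4318, a Prometheus scrape endpoint out on 9464, both bound to loopback.

```yaml
# Minimal OTel Collector pipeline: OTLP/HTTP in, Prometheus endpoint out.
receivers:
  otlp:
    protocols:
      http:
        endpoint: 127.0.0.1:4318   # matches the diagnostics-otel endpoint above

exporters:
  prometheus:
    endpoint: 127.0.0.1:9464       # loopback only; Prometheus scrapes this

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```

Note there is no `grpc:` block under `protocols` here, which matches the plugin's http/protobuf-only behavior described earlier.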

Keep the collector endpoint on loopback only. You don't want telemetry data exposed to the internet.

The LumaDock VPS monitoring guide recommends running node_exporter alongside the OpenClaw metrics on the same Grafana dashboard. That way you can tell whether a latency spike is the LLM provider being slow or your VPS running out of RAM.
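Scraping both sources into the same Prometheus instance is one small config block. A sketch: the `openclaw` job targets the collector endpoint from the pipeline above, and 9100 is node_exporter's default port.

```yaml
# prometheus.yml fragment: scrape agent metrics and host metrics together.
scrape_configs:
  - job_name: openclaw          # metrics exposed by the OTel Collector
    static_configs:
      - targets: ['127.0.0.1:9464']
  - job_name: node              # host metrics from node_exporter
    static_configs:
      - targets: ['127.0.0.1:9100']
```

With both jobs in one Prometheus, a single Grafana dashboard can overlay LLM latency against CPU, RAM, and disk on the same time axis.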

Health Endpoints: /health vs /readyz

OpenClaw exposes two types of probes on port 18789. /health (or /healthz) is a shallow liveness check. It returns {"ok": true} if the process is running. /ready (or /readyz) is deeper. It checks whether your messaging channels (Telegram, Discord, etc.) are actually connected. If a channel drops, /readyz returns 503.

For Docker Compose or Kubernetes health checks, use /readyz. Using /health for readiness probes means your container reports healthy even when your Telegram bot is disconnected. The health endpoint docs cover this in detail.
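Here's what that looks like in practice as a Docker Compose healthcheck. The service name and image are placeholders; the probe itself hits `/readyz` on port 18789 as described above, and assumes `curl` exists inside the container.

```yaml
# Compose healthcheck sketch using the deeper /readyz probe.
services:
  openclaw:
    image: openclaw:latest                # placeholder image name
    healthcheck:
      test: ["CMD", "curl", "-f", "http://127.0.0.1:18789/readyz"]
      interval: 30s
      timeout: 5s
      retries: 3                          # 3 failures -> container marked unhealthy
    restart: unless-stopped
```

Because `/readyz` returns 503 when a channel drops, this marks the container unhealthy on a disconnected Telegram bot, not just on a dead process.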

Setting Up Alerts

Three Prometheus alert rules that'll save you from most production surprises:

High error rate: rate(openclaw_gateway_errors_total[5m]) > 0.1 fires when the gateway averages more than 0.1 errors per second (roughly one every ten seconds) over a five-minute window. Catches gateway crashes and webhook failures.

Slow responses: Alert when p95 latency stays above 5 seconds for more than two minutes. This is usually a provider issue or a bloated context window.

Agent down: openclaw_agent_status == 0 fires when the agent process stops responding entirely. Pair this with an automatic restart policy in your systemd unit or Docker config.

These thresholds are starting points. Tune them after a week of baseline observation.
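The three rules above can be collected into a Prometheus rules file. A sketch with assumptions: `histogram_quantile` presumes the duration metric is exported as a histogram with `_bucket` series, and `openclaw_agent_status` is taken from the article as-is.

```yaml
# Prometheus alerting rules file sketch for the three alerts above.
groups:
  - name: openclaw-alerts
    rules:
      - alert: OpenClawHighErrorRate
        expr: rate(openclaw_gateway_errors_total[5m]) > 0.1
        for: 5m                  # must persist before firing
      - alert: OpenClawSlowResponses
        expr: histogram_quantile(0.95, rate(openclaw_run_duration_ms_bucket[5m])) > 5000
        for: 2m                  # p95 above 5s for two minutes
      - alert: OpenClawAgentDown
        expr: openclaw_agent_status == 0
        for: 1m
```

The `for:` durations keep transient blips from paging you; widen them once you have a week of baseline data.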

The Lightweight Alternative: ClawMetry

If Prometheus and Grafana feel like too much infrastructure for a single instance, look at ClawMetry. It's an open-source Python dashboard with 23,000+ installations that auto-detects your OpenClaw workspace. One command to install: pip install clawmetry.

ClawMetry understands OpenClaw concepts natively: channels, sub-agents, memory files, cron jobs. For a single VPS on ClawHosters, it's probably the right starting point.

For teams already running Grafana, or anyone who needs to correlate AI agent metrics with host performance data, the full Prometheus stack is worth the setup time.

What ClawHosters Handles for You

If you're on a ClawHosters managed instance, host-level monitoring (uptime, disk space, restarts) is already covered. What you still need to configure yourself is application-level observability: token cost tracking, response latency, and channel readiness. The diagnostics-otel config above works on any ClawHosters plan. Our setup guide walks through the full process.

For optimizing token spend once you have visibility, check our guide on OpenClaw token cost optimization.

Frequently Asked Questions

What is LLM observability?

LLM observability means tracking the behavior and performance of large language model agents in production. For OpenClaw, that includes token costs, response latency, context window usage, and queue depth. Without it, failures happen silently.

Does OpenClaw have built-in observability support?

Yes. The `diagnostics-otel` plugin ships with OpenClaw but is disabled by default. It exports traces, metrics, and logs over OTLP/HTTP. Only the `http/protobuf` protocol is supported. gRPC is silently ignored.

What tools can I use to monitor OpenClaw?

The main options are the Prometheus and Grafana stack (via the OTel Collector), ClawMetry (a purpose-built Python dashboard), SigNoz Cloud, and Grafana Cloud. Henrik Rexed's observability plugin adds support for Dynatrace and direct Grafana Cloud export.

Should I use ClawMetry or the full Prometheus stack?

It depends on your setup. ClawMetry is simpler for single-instance deployments and understands OpenClaw concepts like channels and sub-agents natively. Prometheus is better when you need long-term retention, custom alerting, or correlation with host metrics.

Can I track token costs per individual API call?

No. Token usage and cost are aggregated per agent turn, not per individual API call to Claude or OpenAI. You get per-turn visibility, which is enough for cost attribution and latency debugging in most cases.

*Last updated: March 2026*

Sources

  1. Five documented silent failure modes
  2. SigNoz engineering team found
  3. official logging docs
  4. LumaDock VPS monitoring guide
  5. health endpoint docs
  6. ClawMetry
  7. ClawHosters
  8. setup guide
  9. OpenClaw token cost optimization
  10. observability plugin