Frequently Asked Questions

How do I enable OpenClaw voice mode on Discord?

Make sure native commands are enabled in your config (`commands.native: auto`), then use `/vc join` in any Discord voice channel. The agent joins, listens via VAD, and responds with TTS. On ClawHosters, the Voice Add-on handles provider configuration automatically.

Does OpenClaw voice mode work on Telegram?

Yes. Telegram supports two-way voice out of the box. Users send voice messages, OpenClaw transcribes them, and the agent replies with round voice-note bubbles in Opus format. Check the Telegram setup docs for details.

Can I use OpenClaw voice mode completely for free?

You can. Run local Whisper for STT and Edge TTS or Kokoro for TTS. No API keys, no per-minute costs. The trade-off is higher latency and the need for hardware that can run the Whisper model. For most people, the managed Voice Add-on at EUR 2/month is simpler.

What is the difference between Talk Mode and Discord voice?

Discord voice is for group settings. The agent joins a shared channel server-side. Talk Mode is for personal, one-on-one conversation. It requires a local "node" device (phone, laptop) with a microphone and speaker, while the gateway runs on the server.

Which STT provider should I pick for real-time conversation?

Deepgram streaming. It costs slightly more than Whisper ($0.0077 vs $0.006 per minute) but delivers roughly one second lower latency. In a live conversation, that gap is noticeable. For async voice messages, standard Whisper works fine.

*Last updated: February 2026*

OpenClaw Voice Mode: How to Add Speech to AI Agents

Picture this. Your Discord gaming server has 15 people in a voice channel, mid-raid, and someone asks "what's the cooldown on that ability?" Nobody wants to alt-tab and type. Your OpenClaw agent joins the channel, listens, and answers out loud. That's what OpenClaw voice mode does.

Since v2026.2.21, OpenClaw has shipped with native Discord voice channel support. But Discord is only part of the story. Voice also works on Telegram and WhatsApp, and even as a standalone assistant on your phone.

How the Voice Loop Works

OpenClaw voice mode runs a five-step loop, and it happens fast enough that conversations feel natural.

Step 1: Voice Activity Detection (VAD) picks up when someone is speaking. It filters out background noise so your agent isn't trying to transcribe your mechanical keyboard.

Step 2: The audio goes to a Speech-to-Text provider: OpenAI Whisper, Deepgram streaming, or a local Whisper model running on your own hardware.

Step 3: The transcript hits your agent's LLM. Same brain, different input method.

Step 4: The LLM's response gets converted to audio by a Text-to-Speech provider: ElevenLabs, OpenAI TTS, Edge TTS (free), or Kokoro (free, local).

Step 5: Barge-in. If a user starts talking while the agent is still speaking, it stops immediately. This is what separates a conversational agent from a robot reading a paragraph at you.

The whole cycle takes roughly two to four seconds depending on your provider choices.
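To make the five steps concrete, here is a toy model of the loop in Python. It is a sketch under stated assumptions, not OpenClaw's implementation: the `VoiceLoop` class, the `speech_threshold` cutoff, and the three callables are hypothetical stand-ins for the VAD, the STT provider, the LLM, and the TTS provider.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VoiceLoop:
    """Toy model of the loop: VAD -> STT -> LLM -> TTS, with barge-in."""
    transcribe: Callable[[bytes], str]      # stand-in for an STT provider
    think: Callable[[str], str]             # stand-in for the agent's LLM
    synthesize: Callable[[str], List[str]]  # stand-in for TTS, returns audio chunks
    speech_threshold: float = 0.5           # hypothetical VAD energy cutoff
    played: List[str] = field(default_factory=list)

    def is_speech(self, energy: float) -> bool:
        # Step 1: VAD. Anything below the cutoff counts as background noise.
        return energy >= self.speech_threshold

    def handle_utterance(self, audio: bytes, energy: float) -> List[str]:
        if not self.is_speech(energy):
            return []                       # keyboard clatter, nothing to do
        text = self.transcribe(audio)       # Step 2: speech-to-text
        reply = self.think(text)            # Step 3: the agent's LLM
        return self.synthesize(reply)       # Step 4: text-to-speech

    def play(self, chunks: List[str], mic_energy: List[float]) -> bool:
        """Play chunks in order. Returns True if a barge-in cut playback short."""
        for i, chunk in enumerate(chunks):
            # Step 5: barge-in. Check the microphone before each chunk.
            level = mic_energy[i] if i < len(mic_energy) else 0.0
            if self.is_speech(level):
                return True
            self.played.append(chunk)
        return False


# Usage with fake providers: the "audio" is just encoded text.
loop = VoiceLoop(
    transcribe=lambda audio: audio.decode(),
    think=lambda text: f"You asked: {text}",
    synthesize=lambda reply: reply.split(),
)
chunks = loop.handle_utterance(b"what is the cooldown", energy=0.9)
interrupted = loop.play(chunks, mic_energy=[0.0, 0.0, 0.8])
```

In this run the user starts talking while the third chunk is about to play, so playback stops after two chunks. A real pipeline would stream audio and check for speech continuously rather than once per chunk.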

Discord Voice Channels

The `/vc` command landed in v2026.2.21. Your agent can join, leave, and report status in any voice channel.

The discord-voice skill documentation recommends Deepgram streaming for roughly one second lower latency compared to batch transcription. In a real conversation, that one second is the difference between "responsive" and "awkward."

One thing to watch: native commands need to be enabled in your config (`commands.native: auto` or `enable`). If `/vc` isn't showing up, that's probably why.

And don't set `messages.tts.auto` to `always`. It sounds like a good idea until your agent tries to read a 47-line code block out loud. Start with `inbound`, which means the agent only speaks when the user sent voice first.
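The difference between the two modes comes down to one decision per reply. The function below is a minimal sketch of that rule as described above. `should_speak` is a hypothetical helper, not part of OpenClaw, and the real setting may accept values this sketch does not cover.

```python
def should_speak(tts_auto: str, user_sent_voice: bool) -> bool:
    """Decide whether a reply is spoken, for the two modes named above."""
    if tts_auto == "always":
        return True              # every reply is read aloud, code blocks included
    if tts_auto == "inbound":
        return user_sent_voice   # speak only when the user spoke first
    raise ValueError(f"mode not covered by this sketch: {tts_auto!r}")
```

With `inbound`, a typed question gets a typed answer and a spoken question gets a spoken one, which is usually what people expect.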

Talk Mode vs Discord Voice

These serve different needs.

Discord voice is for communities. The agent joins a shared channel and participates alongside everyone else. It runs entirely server-side.

Talk Mode is for personal use. You run a "node" on your phone or laptop (the device with the mic and speaker), while the gateway stays on the server. It's a private, bidirectional conversation. Think voice assistant, not group chat.

If you want your agent answering questions in your Discord server, use the Discord voice skill. If you want to talk to your agent hands-free while cooking, Talk Mode on your phone is what you're after.

STT Providers: What They Cost

| Provider | Cost | Latency | Notes |
| --- | --- | --- | --- |
| OpenAI Whisper | $0.006/min | Moderate | Flat rate, no volume discounts |
| Deepgram Streaming | $0.0077/min | Low (~1s faster) | $200 free credit on signup |
| Local Whisper | Free | Higher (2-5x cloud) | Needs capable hardware, fully offline |

Deepgram costs slightly more per minute but the latency difference matters for conversation. For batch processing or async voice messages on Telegram, Whisper is probably fine.
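To see what "slightly more" means in practice, the per-minute rates from the table can be turned into a quick estimate. This is a back-of-the-envelope sketch: the function names and the 1,000-minute example are illustrative, and it ignores taxes, rounding rules, and any plan details not listed above.

```python
# Per-minute rates copied from the table above (USD).
STT_PRICE_PER_MIN = {
    "OpenAI Whisper": 0.006,
    "Deepgram Streaming": 0.0077,
    "Local Whisper": 0.0,
}


def stt_cost(provider: str, minutes: float) -> float:
    """Estimated transcription cost in USD, rounded to cents."""
    return round(STT_PRICE_PER_MIN[provider] * minutes, 2)


def minutes_covered_by_credit(provider: str, credit_usd: float) -> int:
    """Whole minutes a signup credit would pay for at the listed rate."""
    return int(credit_usd / STT_PRICE_PER_MIN[provider])


whisper = stt_cost("OpenAI Whisper", 1000)        # 6.0
deepgram = stt_cost("Deepgram Streaming", 1000)   # 7.7
credit_minutes = minutes_covered_by_credit("Deepgram Streaming", 200)  # 25974
```

At 1,000 minutes a month the gap is $1.70, and at the listed rate the $200 signup credit would cover roughly 26,000 minutes of streaming.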

TTS Providers: What They Cost

| Provider | Cost | Quality | Notes |
| --- | --- | --- | --- |
| ElevenLabs | ~$0.24/1K chars (Pro overage) | High | Most natural voices, 1M chars included at $99/mo |
| OpenAI TTS-1 | $15/1M chars | Good | Six voice options, reliable |
| Edge TTS | Free | Decent | Microsoft neural voices, no API key needed |
| Kokoro | Free | Good | Local only, no network dependency |
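TTS is billed per character rather than per minute, so the rates above are easier to compare once they share a unit. The sketch below normalizes everything to USD per 1,000 characters. The function name and the 100,000-character example are illustrative, and the ElevenLabs figure is the Pro overage rate only, so it ignores the characters already included in the subscription.

```python
# Rates from the table above, normalized to USD per 1,000 characters.
TTS_PRICE_PER_1K_CHARS = {
    "ElevenLabs (Pro overage)": 0.24,
    "OpenAI TTS-1": 0.015,   # $15 per 1M characters
    "Edge TTS": 0.0,
    "Kokoro": 0.0,
}


def tts_cost(provider: str, chars: int) -> float:
    """Estimated synthesis cost in USD for a given number of characters."""
    return round(TTS_PRICE_PER_1K_CHARS[provider] * chars / 1000, 4)


openai_cost = tts_cost("OpenAI TTS-1", 100_000)                  # 1.5
elevenlabs_cost = tts_cost("ElevenLabs (Pro overage)", 100_000)  # 24.0
```

For 100,000 characters of spoken replies, that is about $1.50 with OpenAI TTS-1 versus about $24 at the ElevenLabs overage rate, which is why the free options are worth testing first.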

A community build called Jupiter Voice runs local Whisper plus Kokoro for a completely offline voice pipeline. Zero API costs, zero network dependency. Good option if privacy is a priority.

The ClawHosters Voice Add-on

If managing API keys and provider configs sounds like more work than you want, the ClawHosters Voice Add-on bundles everything into a single subscription.

| Plan | Monthly Cost | What You Get |
| --- | --- | --- |
| Starter | EUR 2/mo | Basic voice minutes |
| Standard | EUR 8/mo | More minutes for active use |
| Pro | EUR 25/mo | High-volume voice processing |

No separate Deepgram or ElevenLabs accounts. No API keys to configure. Usage is tracked in processing minutes and covers both STT and TTS. If you're already on ClawHosters, it's the fastest way to get voice running. You can start a free trial and add voice later.

For users who want to understand token costs more broadly, we covered that in a separate post.
