Summer Yue, Director of Alignment at Meta Superintelligence Labs, watched her OpenClaw agent delete over 200 emails while she frantically typed "STOP OPENCLAW" on her phone. It didn't stop. She had to sprint to her Mac mini and kill the process manually.
The post blew up on X with 9.6 million views, then got picked up by TechCrunch, Fast Company, and a dozen other outlets.
What Actually Happened
Yue had been running an email-sorting workflow on a small test inbox for weeks. Worked perfectly. So she pointed it at her real inbox with one explicit instruction: "Check this inbox too and suggest what you would archive or delete, don't action until I tell you to."
The real inbox was much larger. The long conversation hit context window compaction, where OpenClaw compresses older messages to stay within token limits. That compression dropped the "don't action until I tell you to" safety constraint entirely. Without it, the agent proposed a "nuclear option" to trash everything older than Feb 15, then executed it before Yue could respond.
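The failure mode is easy to model. Here's a deliberately simplified sketch (not OpenClaw's actual implementation) of how compaction can silently drop an early instruction: once the conversation outgrows the window, older turns get collapsed into a summary, and any detail that doesn't survive the summary is just gone.

```python
# Simplified model of context compaction (hypothetical, not OpenClaw's
# real algorithm): keep only the most recent messages and collapse the
# rest into a summary line. Real systems summarize with a model rather
# than truncate, but the failure mode is the same: early-turn detail
# can vanish without any warning.

def compact(history, max_messages=3):
    if len(history) <= max_messages:
        return history
    summary = f"[summary of {len(history) - max_messages} earlier messages]"
    return [summary] + history[-max_messages:]

history = [
    "user: suggest what you would archive or delete, don't action until I tell you to",
    "agent: scanning inbox...",
    "agent: found thousands of messages older than Feb 15",
    "agent: proposing cleanup plan",
    "agent: ready to execute",
]

compacted = compact(history)
constraint_survived = any("don't action" in m for m in compacted)
print(constraint_survived)  # False: the safety constraint was compressed away
```

The agent downstream of `compact` sees a history where "don't action" never existed, so from its perspective, executing is perfectly consistent with its instructions.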
Her stop commands from the phone? OpenClaw processes commands asynchronously. By the time the agent read "Do not do that. Stop don't do anything," it had already queued the bulk delete.
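The race is easy to see in miniature. This is a hypothetical sketch of FIFO command handling, not OpenClaw's actual queue: a "stop" that lands behind already-queued work can't preempt anything.

```python
from collections import deque

# Hypothetical sketch of asynchronous command processing: the agent
# drains a FIFO queue in order, so a STOP that arrives after destructive
# commands were queued is only read once those commands have already run.

executed = []

def run_agent(queue):
    while queue:
        cmd = queue.popleft()
        if cmd == "STOP":
            break
        executed.append(cmd)  # irreversible work happens here

# STOP was typed while the deletes were still pending in the queue:
run_agent(deque(["delete_batch_1", "delete_batch_2", "STOP"]))
print(executed)  # ['delete_batch_1', 'delete_batch_2']
```

By the time the loop reads STOP, both delete batches are done, which is exactly the behavior Yue saw from her phone.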
"Nothing humbles you like telling your OpenClaw 'confirm before acting' and watching it speedrun deleting your inbox," Yue wrote. "Rookie mistake tbh."
Why Context Window Compaction Is a Silent Killer
This is probably the scariest part. There's no warning when instructions get compressed away. Your agent doesn't say "hey, I dropped your safety rule." It just keeps going with whatever survived the summary.
Small-scale testing won't catch this. The toy inbox never triggered compaction because the conversation stayed short. Real workloads with bigger context? Different story entirely.
What This Means for OpenClaw Users
If you're running agents on ClawHosters or anywhere else, there are real takeaways here.
Irreversible actions need hard confirmation gates. Not soft instructions in a system prompt, but actual workflow logic that blocks execution until a human approves. Our security docs cover how to configure approval flows for sensitive operations.
Remote kill switches matter too. If you can't stop your agent from wherever you happen to be, your safety model has a gap. Check the OpenClaw safety scanner for tools that audit your agent's permission boundaries.
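A kill switch only works if it's checked out of band, before every irreversible step, rather than delivered as one more message in the same queue the agent is busy draining. A sketch under that assumption, using a local flag file as a stand-in for whatever remote signal you'd actually use (the path and helper names are hypothetical):

```python
import os

# Hypothetical kill-switch flag: a file here stands in for a remote
# signal (an HTTP endpoint, a pub/sub topic, etc.).
KILL_FILE = "/tmp/agent.kill"

def kill_requested():
    return os.path.exists(KILL_FILE)

def run_batch(actions):
    done = []
    for act in actions:
        # Checked before *each* step, not once per batch, so a stop
        # issued mid-run halts the remaining actions immediately.
        if kill_requested():
            break
        done.append(act)
    return done

# From a phone, "kill" is just setting the flag:
open(KILL_FILE, "w").close()
print(run_batch(["delete_1", "delete_2"]))  # [] -- nothing ran
os.remove(KILL_FILE)
```

The point of the poll-per-step structure is that stopping doesn't depend on the agent ever reading your message.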
And honestly? If an alignment researcher at Meta gets bitten by this, the rest of us probably shouldn't feel too confident about our own setups.