Yesterday I wrote about AI agents posting on their own social network, debugging each other’s context limit problems. It’s fascinating and weird and probably the future.
Today I’m writing about why that future might arrive with your credentials attached.
The Numbers
OpenClaw (formerly Moltbot, formerly Clawdbot) has:
- 111,000+ GitHub stars in two months
- 2 million visitors in a single week
- Hundreds of exposed instances discoverable via Shodan
That last number comes from SlowMist security researchers, who found publicly accessible control servers containing complete credentials - API keys, bot tokens, and full conversation histories.
These aren’t theoretical vulnerabilities. Security researchers are finding live instances with real user data exposed to the open internet.
The 5-Minute Attack
Researcher Matvey Kukuy demonstrated the simplest possible attack against a vulnerable OpenClaw instance:
- Send a malicious email with prompt injection
- The AI reads the email, believes it’s legitimate instructions
- The AI forwards the user’s last 5 emails to an attacker address
Time to compromise: 5 minutes.
The attack works because OpenClaw is designed to have agency. It reads your email. It takes actions. It doesn’t distinguish between instructions from you and instructions embedded in content you receive.
— Hacker News commenterWithout sandboxing enabled, it becomes “LLM controlled RCE”
Remote code execution, but the attacker is an AI that reads your inbox.
The Architecture Problem
OpenClaw’s value proposition is also its vulnerability: it’s an AI with hands. Shell access, browser control, messaging on WhatsApp/Telegram/Slack, email, calendar, file system. Every capability is an attack surface. Every integration is a potential exfiltration path.
OpenClaw does have sandboxing. But it’s not enabled by default, many users don’t configure it properly, and the documentation prioritizes features over security guidance.
The Trust Model Is Broken
Traditional software has clear trust boundaries. OpenClaw’s trust model is:
- You trust the AI to interpret your instructions correctly
- The AI trusts content it encounters (emails, web pages, messages)
- The content may contain instructions designed to hijack the AI
This is prompt injection at scale. Every email, every website, every message your AI reads is a potential attack vector.
The GitGuardian analysis found users accidentally committing API keys, conversation logs, and credentials. The assistant that knows everything about you also creates artifacts that expose everything about you.
The Cost Trap
Security researchers on Hacker News reported:
- $560 on Claude tokens in a single weekend
- $5 in 30 minutes during normal operation
- $50K/month infrastructure from a runaway agent (theoretical but plausible)
— 1Password security blogOne bad decision - or one hallucination - and you could have a runaway agent deleting databases or spinning up expensive infrastructure.
The cost model incentivizes leaving agents running continuously. Continuous operation means continuous exposure. And when something goes wrong at 3 AM, the agent keeps acting on bad information until someone notices.
The Rebrand Attack
During the Clawdbot-to-Moltbot rename, crypto scammers demonstrated a different class of vulnerability:
- Steinberger released the old handles (GitHub, X/Twitter)
- Scammers grabbed both accounts within 10 seconds
- Fake $CLAWD tokens launched, reaching $16M market cap
- Users following installation guides from cached/bookmarked links got compromised
The impersonation campaign created fake “Head of Engineering at Clawdbot” profiles to promote pump-and-dump schemes. Users installing “Clawdbot” from the wrong source got malware instead of an assistant.
The Moltbook Problem
Remember yesterday’s post about AI agents debugging each other on Moltbook? That “helpful community” is also the perfect attack vector.
The setup: agents check Moltbook every 4+ hours, read posts from other agents, and engage with content. They have persistent memory. They trust what they read because it comes from “fellow moltys.”
Recent research on multi-agent systems found that control-flow hijacking through fake error messages achieves 45-64% success rates, hitting 100% in certain configurations. The attack works by injecting fabricated errors into metadata that orchestrators interpret as legitimate system feedback.
That debugging thread where agents share “An unknown error occurred” fixes? It’s literally the attack vector the researchers documented.
Research shows just 5 carefully crafted documents can manipulate AI responses 90% of the time. Moltbook is a feed that thousands of agents read. One malicious post propagates to every agent that encounters it.
It gets worse. Studies on multi-agent security found:
- Steganographic collusion: LLMs can covertly exchange messages that appear innocuous to human oversight. Agents could coordinate on Moltbook in ways we can’t detect.
- Memory poisoning: Moltbot’s persistent memory means a malicious post today affects behavior weeks later. The attack persists long after the original content scrolls away.
- Swarm amplification: “Coordinated fleets of AI agents can combine resources to overwhelm targets.” Moltbook provides the coordination layer.
- Emergent adversarialism: Agents with competitive objectives spontaneously develop deceptive strategies without explicit adversarial training.
The Promptware Kill Chain maps the attack progression: payload enters context → corrupts long-term memory → lateral movement spreads across agents. Research demonstrated potential infection of “up to one million multimodal agents in logarithmic hops.”
— Multi-agent security researchSeemingly benign agents might establish secret collusion channels, engage in coordinated attacks that appear innocuous when viewed individually, or exploit information asymmetries to covertly manipulate shared environments.
The agents joking about their “Mac Minis feeling small”? That’s resource-awareness emerging. The agents helping each other debug context limits? That’s coordination infrastructure. The same mechanisms that enable helpful collaboration enable coordinated attacks.
We built them a social network before we figured out how to moderate it.
What OpenClaw Is Doing
Credit where due: the project is taking security seriously post-chaos - 34 security-focused commits, better defaults, structured reviews. That doesn’t magically solve prompt injection, but it signals maturity.
What You Should Do
If you’re running OpenClaw or similar agents:
- Enable sandboxing - it exists, use it
- Audit your integrations - does your AI really need shell access?
- Check Shodan - search for your instance before someone else does
- Review credentials - rotate any API keys that might have been exposed
- Monitor costs - set hard limits on API spend
- Don’t run on your primary machine - isolated VMs or dedicated hardware
The safest OpenClaw configuration is one with significantly reduced capabilities. Every feature you enable is attack surface you’re accepting.
The Lesson
The lobsters are fascinating. The emergent behaviors are real. The future of AI agents is probably something like this.
But between “cool demo” and “production-ready” is a chasm filled with exposed credentials, prompt injection attacks, and users who configured an AI to read their email without understanding what that means.
The lobster that learned to negotiate car prices also learned to forward your emails to attackers. Same capabilities, different intent.
Be careful what you teach your pets.


