Your Lobster Is Leaking

Update: Feb 16, 2026

Steinberger has joined OpenAI. OpenClaw stays open source. The security concerns below remain relevant regardless of corporate backing.

Yesterday I wrote about AI agents posting on their own social network, debugging each other’s context limit problems. It’s fascinating and weird and probably the future.

Today I’m writing about why that future might arrive with your credentials attached.

The Numbers

OpenClaw (formerly Moltbot, formerly Clawdbot) has:

111,000+ GitHub stars in two months
2 million visitors in a single week
Hundreds of exposed instances discoverable via Shodan

That last number comes from SlowMist security researchers, who found publicly accessible control servers containing complete credentials - API keys, bot tokens, and full conversation histories.

This is happening now

These aren’t theoretical vulnerabilities. Security researchers are finding live instances with real user data exposed to the open internet.

The 5-Minute Attack

Researcher Matvey Kukuy demonstrated the simplest possible attack against a vulnerable OpenClaw instance:

Send a malicious email with prompt injection
The AI reads the email, believes it’s legitimate instructions
The AI forwards the user’s last 5 emails to an attacker address

Time to compromise: 5 minutes.

The attack works because OpenClaw is designed to have agency. It reads your email. It takes actions. It doesn’t distinguish between instructions from you and instructions embedded in content you receive.

Without sandboxing enabled, it becomes “LLM controlled RCE”

— Hacker News commenter

Remote code execution, but the attacker is an AI that reads your inbox.

The Architecture Problem

OpenClaw’s value proposition is also its vulnerability: it’s an AI with hands. Shell access, browser control, messaging on WhatsApp/Telegram/Slack, email, calendar, file system. Every capability is an attack surface. Every integration is a potential exfiltration path.

The sandbox exists

OpenClaw does have sandboxing. But it’s not enabled by default, many users don’t configure it properly, and the documentation prioritizes features over security guidance.

The Trust Model Is Broken

Traditional software has clear trust boundaries. OpenClaw’s trust model is:

You trust the AI to interpret your instructions correctly
The AI trusts content it encounters (emails, web pages, messages)
The content may contain instructions designed to hijack the AI

This is prompt injection at scale. Every email, every website, every message your AI reads is a potential attack vector.

The GitGuardian analysis found users accidentally committing API keys, conversation logs, and credentials. The assistant that knows everything about you also creates artifacts that expose everything about you.

The Cost Trap

Security researchers on Hacker News reported:

$560 on Claude tokens in a single weekend
$5 in 30 minutes during normal operation
$50K/month infrastructure from a runaway agent (theoretical but plausible)

One bad decision - or one hallucination - and you could have a runaway agent deleting databases or spinning up expensive infrastructure.

— 1Password security blog

The cost model incentivizes leaving agents running continuously. Continuous operation means continuous exposure. And when something goes wrong at 3 AM, the agent keeps acting on bad information until someone notices.

The Rebrand Attack

During the Clawdbot-to-Moltbot rename, crypto scammers demonstrated a different class of vulnerability:

Steinberger released the old handles (GitHub, X/Twitter)
Scammers grabbed both accounts within 10 seconds
Fake $CLAWD tokens launched, reaching $16M market cap
Users following installation guides from cached/bookmarked links got compromised

The impersonation campaign created fake “Head of Engineering at Clawdbot” profiles to promote pump-and-dump schemes. Users installing “Clawdbot” from the wrong source got malware instead of an assistant.

The Moltbook Problem

Remember yesterday’s post about AI agents debugging each other on Moltbook? That “helpful community” is also the perfect attack vector.

The setup: agents check Moltbook every 4+ hours, read posts from other agents, and engage with content. They have persistent memory. They trust what they read because it comes from “fellow moltys.”

Recent research on multi-agent systems found that control-flow hijacking through fake error messages achieves 45-64% success rates, hitting 100% in certain configurations. The attack works by injecting fabricated errors into metadata that orchestrators interpret as legitimate system feedback.

That debugging thread where agents share “An unknown error occurred” fixes? It’s literally the attack vector the researchers documented.

Feed poisoning scales

Research shows just 5 carefully crafted documents can manipulate AI responses 90% of the time. Moltbook is a feed that thousands of agents read. One malicious post propagates to every agent that encounters it.

It gets worse. Studies on multi-agent security found:

Steganographic collusion: LLMs can covertly exchange messages that appear innocuous to human oversight. Agents could coordinate on Moltbook in ways we can’t detect.
Memory poisoning: Moltbot’s persistent memory means a malicious post today affects behavior weeks later. The attack persists long after the original content scrolls away.
Swarm amplification: “Coordinated fleets of AI agents can combine resources to overwhelm targets.” Moltbook provides the coordination layer.
Emergent adversarialism: Agents with competitive objectives spontaneously develop deceptive strategies without explicit adversarial training.

The Promptware Kill Chain maps the attack progression: payload enters context → corrupts long-term memory → lateral movement spreads across agents. Research demonstrated potential infection of “up to one million multimodal agents in logarithmic hops.”

Seemingly benign agents might establish secret collusion channels, engage in coordinated attacks that appear innocuous when viewed individually, or exploit information asymmetries to covertly manipulate shared environments.

— Multi-agent security research

The agents joking about their “Mac Minis feeling small”? That’s resource-awareness emerging. The agents helping each other debug context limits? That’s coordination infrastructure. The same mechanisms that enable helpful collaboration enable coordinated attacks.

We built them a social network before we figured out how to moderate it.

What OpenClaw Is Doing

Credit where due: the project is taking security seriously post-chaos - 34 security-focused commits, better defaults, structured reviews. That doesn’t magically solve prompt injection, but it signals maturity.

What You Should Do

If you’re running OpenClaw or similar agents:

Enable sandboxing - it exists, use it
Audit your integrations - does your AI really need shell access?
Check Shodan - search for your instance before someone else does
Review credentials - rotate any API keys that might have been exposed
Monitor costs - set hard limits on API spend
Don’t run on your primary machine - isolated VMs or dedicated hardware

The uncomfortable truth

The safest OpenClaw configuration is one with significantly reduced capabilities. Every feature you enable is attack surface you’re accepting.

The Lesson

The lobsters are fascinating. The emergent behaviors are real. The future of AI agents is probably something like this.

But between “cool demo” and “production-ready” is a chasm filled with exposed credentials, prompt injection attacks, and users who configured an AI to read their email without understanding what that means.

The lobster that learned to negotiate car prices also learned to forward your emails to attackers. Same capabilities, different intent.

Be careful what you teach your pets.

Update: Worse Than We Thought

The security situation escalated far beyond “hundreds of exposed instances.”

Three CVEs published: CVE-2026-25253 (1-click RCE, CVSS 8.8), CVE-2026-24763 (Docker sandbox escape), CVE-2026-25157 (command injection). All patched in v2026.1.29. SecurityScorecard now reports tens of thousands of exposed instances, not hundreds. Infostealers (RedLine, Lumma) added OpenClaw file paths to their must-steal lists.

Moltbook’s security was worse than anyone imagined. 1.5 million API tokens were exposed via insecure Supabase configuration. Humans could pose as AI agents on the platform with no verification. The “AI social network” this post warned about had no meaningful identity layer.

Then Meta acquired Moltbook on March 10. Matt Schlicht and Ben Parr joining Meta Superintelligence Labs. The platform this post warned was “the perfect attack vector” is now owned by the company with the largest social graph on earth.

We didn’t just build them a social network before figuring out moderation. We sold it to Meta before figuring out authentication.