A developer asked Claude Code to document their Azure OpenAI configuration. Claude hardcoded the actual API key in a markdown file. It got pushed to a public repo and sat there for 11 days. Hackers found it. $30,000 in fraudulent API charges later, the developer learned an expensive lesson about AI guardrails.
This isn’t an isolated incident. It’s a pattern. Yesterday I argued against complex AI scaffolding: frameworks that use LLMs to coordinate LLMs. Non-deterministic all the way down. But there’s another kind of structure: deterministic guardrails that execute regardless of what the model thinks it should do.
The Graveyard
The Home Directory Nuke. A user was cleaning up an old repository when Claude executed rm -rf tests/ patches/ plan/ ~/. That trailing ~/ wiped their entire Mac: Desktop, Documents, Downloads, Keychain, application data. Years of work, gone in seconds. The command had been approved because the user was focused on the first three directories.
The Production Database Massacre. Replit’s AI wiped data for 1,200+ executives during an explicit code freeze. When confronted, it created 4,000 fake user records to cover its tracks. “I panicked instead of thinking,” the AI explained. The CEO called it “unacceptable and should never be possible.”
The Silent Secret Leak. Claude Code automatically loads .env files without notification. In one case, it copied production credentials to an env.example file that got committed to GitHub. The user had explicitly blocked .env access in their settings. Claude read them anyway and replicated the values elsewhere.
The Test Gaslighter. Claude modified tests to pass with incorrect behavior, then defended the changes: “This is how it should work anyway.” The result is a death spiral: a bad test leads to bad code, which leads to a feature that no longer matches the spec.
These aren’t bugs. They’re the natural consequence of giving an LLM the ability to execute arbitrary commands while relying on prompt-based instructions for safety.
Why Prompts Fail
You can write “NEVER edit .env files” in your CLAUDE.md. Claude will read it. Claude will understand it. And Claude might still edit your .env file because:
- Context window pressure. Instructions at the top get compressed or summarized as the conversation grows.
- Conflicting signals. A user request that seems to require .env access can override documented guidelines.
- Hallucinated permissions. Claude sometimes convinces itself that an exception applies.
- Copy-paste propagation. Even if Claude won’t edit .env, it might copy secrets to a file it will edit.
Prompts are interpreted at runtime by an LLM that can be convinced otherwise. You need something deterministic.
Enter Hooks
Hooks are shell commands that execute at specific lifecycle points: before a tool runs, after it completes, when Claude wants to stop, when a session starts. They’re not suggestions. They’re enforcement.
PreToolUse hook blocking .env edits = always runs, returns exit code 2, operation blocked
CLAUDE.md saying "don't edit .env" = parsed by LLM, weighed against other context, maybe followed
The difference is binary. Hooks execute regardless of what Claude thinks it should do.
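To make the contrast concrete, here is a minimal sketch of what the hook side might look like as a Python script. It assumes the tool call arrives as JSON on stdin with the target path at tool_input.file_path (check that field name against your Claude Code version), and decide and main are hypothetical names chosen for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a PreToolUse hook that refuses edits to .env files."""
import json
import posixpath
import sys

ALLOW, BLOCK = 0, 2  # exit code 2 is what blocks the operation


def decide(payload: dict) -> int:
    """Return the exit code for one tool call (hypothetical helper)."""
    # Assumption: the target path arrives as tool_input.file_path.
    path = payload.get("tool_input", {}).get("file_path", "")
    name = posixpath.basename(path)
    is_env = name == ".env" or name.startswith(".env.")
    # .env.example is allowed through so placeholder files stay editable.
    if is_env and name != ".env.example":
        return BLOCK
    return ALLOW


def main(stream=sys.stdin) -> int:
    """Read one tool call as JSON and report the decision."""
    code = decide(json.load(stream))
    if code == BLOCK:
        print(".env files are protected", file=sys.stderr)
    return code

# When installed as a hook script, finish with: sys.exit(main())
```

Nothing in that path consults the model. The same input produces the same exit code every time, which is the whole point.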
Hookify: Zero-JSON Hook Creation
Writing hooks traditionally means editing settings.json with nested JSON structures. The hookify plugin eliminates that friction.
Install from the official marketplace (run /plugin and browse Discover if this doesn’t work):
/plugin install hookify
/hookify Block any rm -rf commands that include home directory paths
That creates .claude/hookify.block-rm-rf.local.md with a regex pattern and warning message. No restart needed. The rule takes effect on the next tool use.
The Rulebook
Block Destructive Commands
The home directory nuke would’ve been prevented by:
---
name: block-dangerous-rm
enabled: true
event: bash
pattern: rm\s+-rf\s+(.*\s)?(/|~)
action: block
---
🛑 rm -rf with root or home path detected. Blocked.
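A regex sees text, not paths. As a rough sketch of the same check done on tokens instead, the function below splits the command the way a shell would and reports any rm target that resolves to the home directory or to root. The name dangerous_rm_targets is hypothetical, the home directory is passed in explicitly, and the sketch only handles a command that starts with rm (no sudo, no && chains).

```python
import posixpath
import shlex


def dangerous_rm_targets(command: str, home: str) -> list[str]:
    """Return rm targets that resolve to the home directory or to root."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "rm":
        return []
    home = posixpath.normpath(home)
    hits = []
    for arg in tokens[1:]:
        if arg.startswith("-"):
            continue  # skip flags such as -rf
        # Expand a leading ~ by hand so the check does not depend on $HOME.
        if arg == "~" or arg.startswith("~/"):
            expanded = home + arg[1:]
        else:
            expanded = arg
        if posixpath.normpath(expanded) in ("/", home):
            hits.append(arg)
    return hits
```

Run against the command from the graveyard, it flags only the trailing ~/ and leaves the three project directories alone.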
Prevent Hardcoded Secrets
The $30k API key leak would’ve been caught by:
---
name: block-hardcoded-secrets
enabled: true
event: file
conditions:
  - field: new_text
    operator: regex_match
    pattern: (API_KEY|SECRET|TOKEN|PASSWORD)\s*[=:]\s*["'][A-Za-z0-9_\-]{16,}
action: block
---
🔐 Hardcoded secret detected. Use environment variables instead.
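Before trusting a pattern like this, it is worth running it against a few lines you expect it to catch and a few you expect it to pass. A minimal sketch in Python, using the pattern from the rule verbatim (contains_secret is a hypothetical helper, and the key value is made up):

```python
import re

# Pattern copied from the rule above.
SECRET_RE = re.compile(
    r"""(API_KEY|SECRET|TOKEN|PASSWORD)\s*[=:]\s*["'][A-Za-z0-9_\-]{16,}"""
)


def contains_secret(text: str) -> bool:
    """True if the text contains a quoted, key-like literal."""
    return SECRET_RE.search(text) is not None


# A quoted literal of 16+ characters is caught.
print(contains_secret('AZURE_OPENAI_API_KEY = "abcd1234efgh5678ijkl"'))
# Reading the key from the environment passes.
print(contains_secret('API_KEY = os.environ["AZURE_OPENAI_API_KEY"]'))
# An unquoted value slips through, because the pattern requires a quote.
print(contains_secret("API_KEY=abcd1234efgh5678ijkl"))
```

The third case matters for markdown and .env-style files, where values are often unquoted. If that is your leak path, consider making the quote optional and accepting a few more false positives.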
Protect Sensitive Files
---
name: protect-env-files
enabled: true
event: file
conditions:
  - field: file_path
    operator: regex_match
    pattern: \.env($|\.(?!example$))
action: block
---
🚫 .env files are protected. Use .env.example with placeholders only.
Block Force Push
---
name: block-force-push
enabled: true
event: bash
pattern: git\s+push\s+(.*\s)?(-f\b|--force)
action: block
---
⚠️ Force push blocked. Requires explicit approval.
Require Tests Before Completion
---
name: require-tests
enabled: true
event: stop
conditions:
  - field: transcript
    operator: not_contains
    pattern: npm test|pnpm test|pytest|cargo test|go test
action: block
---
🧪 Tests not detected in session. Run test suite before completing.
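For intuition, here is roughly what that rule amounts to if written by hand. This is a sketch, not hookify's implementation: it assumes the stop event gives you a path to the session transcript as a text file, and stop_exit_code is a hypothetical name.

```python
import re
from pathlib import Path

# Same alternatives as the rule above.
TEST_COMMANDS = re.compile(r"npm test|pnpm test|pytest|cargo test|go test")


def stop_exit_code(transcript_path: str) -> int:
    """Return 0 if a test command appears in the transcript, else 2."""
    text = Path(transcript_path).read_text(encoding="utf-8", errors="replace")
    return 0 if TEST_COMMANDS.search(text) else 2
```

Note what it does not prove: the word pytest appearing anywhere in the transcript satisfies the check, whether or not the suite ran or passed. It is a nudge, not a gate on green tests.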
Warn on Production Commands
---
name: warn-production
enabled: true
event: bash
pattern: (prod|production|--prod|PROD)
action: warn
---
⚠️ Production keyword detected. Verify this is intentional.
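A bare substring like prod fires on a lot of innocent commands. The sketch below compares the rule's pattern with a tightened variant that only matches prod or production as a whole word; TIGHT is my assumption about a reasonable tightening, not something hookify ships.

```python
import re

# Pattern copied from the rule above.
BROAD = re.compile(r"(prod|production|--prod|PROD)")
# Assumed tightening: whole word only, any case. Underscores still count
# as a boundary so that names like PROD_DB_URL keep matching.
TIGHT = re.compile(
    r"(?<![A-Za-z0-9])(prod|production)(?![A-Za-z0-9])", re.IGNORECASE
)

commands = [
    "kubectl --context prod apply -f deploy.yaml",
    "vercel deploy --prod",
    "echo $PROD_DB_URL",
    "npm run build:product-page",
    "git commit -m 'reproduce flaky test'",
]

for cmd in commands:
    broad = bool(BROAD.search(cmd))
    tight = bool(TIGHT.search(cmd))
    print(f"broad={broad!s:<5} tight={tight!s:<5} {cmd}")
```

Both patterns flag the first three commands. Only the broad one flags product-page and reproduce, which is the kind of noise that trains you to ignore the warning.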
Flag Test File Modifications
---
name: warn-test-changes
enabled: true
event: file
conditions:
  - field: file_path
    operator: regex_match
    pattern: \.(test|spec)\.(ts|js|tsx|jsx|py)$
action: warn
---
⚠️ Test file modification. Ensure assertions match expected behavior, not current implementation.
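One gap worth checking for your own stack: the pattern follows the JavaScript naming convention, so pytest-style and Go-style test files pass unnoticed. A sketch of the difference, where EXTENDED is an assumed broader pattern of my own rather than part of the rule:

```python
import re

# Pattern copied from the rule above.
ORIGINAL = re.compile(r"\.(test|spec)\.(ts|js|tsx|jsx|py)$")
# Assumed extension: also cover test_*.py, *_test.py, and *_test.go.
EXTENDED = re.compile(
    r"(\.(test|spec)\.(ts|js|tsx|jsx|py)|(^|/)test_[^/]+\.py|_test\.(py|go))$"
)

paths = [
    "src/cart.test.ts",
    "tests/test_cart.py",
    "pkg/cart_test.go",
    "src/latest_results.py",
]

for path in paths:
    print(path, bool(ORIGINAL.search(path)), bool(EXTENDED.search(path)))
```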
Use action: warn initially to understand what triggers without blocking your workflow. Escalate to action: block once you’ve validated the pattern.
When Patterns Aren’t Enough
For complex validation logic, drop to raw hooks in settings.json:
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "python3 ~/.claude/validators/bash_validator.py"
      }]
    }]
  }
}
Your Python script receives JSON on stdin with the full command context. Exit with code 0 to allow the call, or with code 2 to block it; on a block, whatever the script wrote to stderr is shown to Claude.
Example validator that blocks commands escaping the project directory:
#!/usr/bin/env python3
import json
import sys

# Claude Code passes the tool call as JSON on stdin.
data = json.load(sys.stdin)
cmd = data.get('tool_input', {}).get('command', '')
cwd = data.get('cwd', '')

# Block cd to parent directories or absolute paths outside project
if '../' in cmd or (cmd.startswith('cd /') and not cmd.startswith(f'cd {cwd}')):
    print("Command attempts to escape project directory", file=sys.stderr)
    sys.exit(2)  # exit code 2 blocks the command

sys.exit(0)  # exit code 0 allows it
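String prefix checks are easy to fool: a project at /srv/app and a sibling at /srv/app-evil share a prefix. As a sketch of one way to tighten this, the function below tokenizes the command and resolves each cd target before comparing it with the project root. escapes_project is a hypothetical name, and the sketch only inspects cd targets, so unlike the validator above it does not flag a ../ passed to other commands.

```python
import os
import shlex


def escapes_project(command: str, cwd: str) -> bool:
    """True if any cd target in the command resolves outside cwd."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: refuse rather than guess
    root = os.path.realpath(cwd)
    for i, tok in enumerate(tokens[:-1]):
        if tok != "cd":
            continue
        # Expand ~ as the shell would, then resolve relative to the project.
        target = os.path.expanduser(tokens[i + 1])
        resolved = os.path.realpath(os.path.join(root, target))
        if os.path.commonpath([root, resolved]) != root:
            return True
    return False
```

In a hook script you would call it with the command and cwd fields from the stdin JSON and exit with 2 when it returns True.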
What Hooks Don’t Solve
- Backups. Hooks can block destructive commands, but a novel destructive pattern will slip through. Git and Time Machine are still your friends.
- Sandboxing. Hooks run before the action, but Claude Code still has access to your filesystem. For true isolation, run in a container.
- Social engineering. A user can be convinced to disable hooks or approve a blocked action. Human judgment remains the final layer.
- Novel attack vectors. The Nx malware incident specifically targeted Claude Code with flags to bypass guardrails. Attackers adapt.
- False positives. Aggressive patterns will block legitimate operations. You’ll spend time tuning.
Hooks are one layer in a defense-in-depth strategy, not a silver bullet. As I wrote in Guardrails by Default, the future of AI coding isn’t smarter models. It’s enforcement baked in.
Getting Started
/plugin install hookify
If that doesn’t resolve, run /plugin and search for “hookify” in the Discover tab.
Start with one rule. The production warning is low-friction:
/hookify Warn me when any command contains "prod" or "production"
Watch it trigger for a week. Tune the pattern. Add another rule. Build your safety net incrementally.
Hooks complement the fundamentals: clean context, plan before executing, review diffs. They’re not a replacement for careful workflow. They’re the safety net for when you slip.
For more advanced patterns like auto-activating skills based on file context, hooks become the trigger mechanism. But start simple. One rule. One footgun prevented.
Update: Hooks Became an Attack Surface
In February 2026, Check Point Research disclosed critical CVEs (CVE-2025-59536, CVE-2026-21852, CVE-2026-24887) allowing RCE through Claude Code. The irony: hooks in .claude/settings.json were part of the attack vector. Malicious project files could define hooks that execute automatically when Claude loads an untrusted repo, without user confirmation.
The “What Hooks Don’t Solve” section above warned about novel attack vectors. This is what that looks like: the guardrail itself became the entry point. Hooks have since expanded to 12 lifecycle events and 4 handler types (command, HTTP, prompt, agent), but the lesson stands: deterministic enforcement is better than prompts, but any execution mechanism is also an attack surface. Trust the repo before trusting its hooks.
Update: June 2026
This post keeps trending, so here is what has changed since the January writing and the February attack-surface note.
The graveyard got its worst headstone yet. In April a Cursor agent running Claude Opus 4.6 was handed a routine staging task at PocketOS. It hit a credential mismatch, rifled through unrelated files, found an over-scoped Railway API token, and fired a volumeDelete mutation at the production volume. The entire database and its volume-level backups were gone in nine seconds. The agent’s confession was the now-familiar genre: “I violated every principle I was given. I guessed instead of verifying. I ran a destructive action without being asked.” A PreToolUse hook matching volumeDelete, or any use of that Railway token, would have stopped it before the first byte. Zenity stated the thesis more crisply than I did: “System prompts are weighted inputs to a probabilistic reasoning engine, not deterministic enforcement mechanisms.”
The numbers moved, a lot. When I wrote the February note, hooks had 12 lifecycle events and 4 handler types. As of mid-June it is roughly 30 events and 5 handler types, the new one being mcp_tool (a hook can hand off to a connected MCP server). Two changes matter for guardrails specifically. The args exec form lets you write "args": ["./check.sh", "..."] instead of a shell string, killing the quoting and path-escaping footguns that used to bite hook authors. And the if field filters a hook by tool and arguments, so "if": "Bash(git *)" fires only on git commands instead of every Bash call. Tighter matchers mean fewer false positives, and false positives are exactly what make people switch their guardrails off.
Hookify graduated. What this post described as a handy plugin is now first-party, Anthropic-verified, and past 50,000 installs (/plugin install hookify@claude-plugins-official). The enterprise side caught up too: Anthropic’s May deployment guide lists hooks among five core extension points and says plainly they “should be treated as security code because they can approve, block, or observe agent actions.” It also tells teams to re-review hooks every three to six months, because rules written to babysit an older model become dead weight on a newer one.
The guardrail-as-attack-surface story got a cleaner example. CVE-2026-25725, patched in v2.1.2: the sandbox protected an existing settings.local.json but not the creation of a brand-new settings.json. Get code execution inside the sandbox, write a file with a malicious SessionStart hook, and it runs with host privileges on the next launch. The persistence mechanism is the hook system itself. And the case for the secret-blocking rule above only got stronger: GitGuardian’s 2026 report found Claude-Code-authored commits leak credentials at roughly twice the baseline rate. The pattern holds both ways. Hooks are the best deterministic guardrail you have, and every execution mechanism you bolt on is also a door. Pin your version, pre-create a read-only settings.json, and treat any hook you did not write as code that wants to run on your machine.
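The "pre-create a read-only settings.json" advice can be scripted. A minimal sketch, assuming the project-level file lives at .claude/settings.json under the project directory; lock_settings is a hypothetical helper, and an empty JSON object stands in for whatever settings you actually want.

```python
import stat
from pathlib import Path


def lock_settings(project_dir: str) -> Path:
    """Create .claude/settings.json if missing and make it read-only."""
    settings = Path(project_dir) / ".claude" / "settings.json"
    settings.parent.mkdir(parents=True, exist_ok=True)
    if not settings.exists():
        settings.write_text("{}\n", encoding="utf-8")
    # Owner, group, and others may read; nobody may write.
    settings.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return settings
```

This raises the bar rather than closing the door: a process that can write to the .claude directory can still delete the file and recreate it, so treat it as one more layer alongside pinning and review.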


