The Promise
Claude Code Skills are reusable capabilities that Claude automatically invokes when relevant. You write a SKILL.md file with a description, Claude reads it, and autonomously decides when to activate it based on your request.
The pitch is compelling:
- Simpler than MCP: Just Markdown with YAML frontmatter, not a whole protocol
- More powerful than slash commands: Context-aware auto-invocation
- Token-efficient when idle: Only ~50 tokens for name + description until activated
Skills are positioned as ambient intelligence - Claude magically knows what tools are available and reaches for them when appropriate.
Skills are model-invoked capabilities stored in `~/.claude/skills/` (user-level) or `.claude/skills/` (project-level); the model, not you, decides when they activate.
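For orientation, here's a minimal sketch of a SKILL.md (the frontmatter fields follow the documented format; the skill itself is a hypothetical example):

```markdown
# .claude/skills/arch-review/SKILL.md
---
name: arch-review
description: Review software architecture decisions and compare design patterns. Use when the user asks about architecture tradeoffs or pattern choices.
---

## Instructions

1. Identify the architectural question in the request
2. Survey relevant code with Glob/Grep before recommending anything
3. Present tradeoffs, not a single "right" answer
```

Until Claude decides a request matches that description, only the name and description (the ~50 tokens mentioned above) sit in context; the instructions load on activation.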
The Problem: No Control Over Invocation
Here’s the catch: you can’t control when skills activate. Claude decides using its own semantic understanding of your request - there’s no algorithmic matching, it’s LLM reasoning about whether your intent matches a skill description. You have no override mechanism. It’s not predictable code logic - it’s non-deterministic model behavior.
This creates friction for engineering workflows:
- Can’t force-invoke: You need architectural analysis right now, but Claude doesn’t think your question matches the skill description
- Can’t prevent invocation: You’re debugging a caching issue, but Claude triggers web search to look up “cache strategies” when you just wanted to grep the codebase
- No visibility into decisions: Claude silently chooses when to load skill instructions; you don’t see the token cost until it happens
- Unpredictable context consumption: A skill might consume 10k tokens of context for instructions you didn’t need right now
- Breaks context engineering: You’re carefully managing what’s in the context window, then Claude auto-loads skill instructions you didn’t ask for
The control principle: Context engineering means controlling what Claude processes. Skills that auto-activate break that discipline - you need to decide what’s relevant, not let the model guess.
Context isn’t just about cost. It’s about signal-to-noise ratio. Every token in the window competes for Claude’s attention. Auto-invoked skills add noise when you need signal. Control over your context is control over quality.
Real Example: Web Search
In the Claude.ai web interface, web search has a toggle - you explicitly enable it. In Claude Code, web search is available as a tool that Claude invokes based on context judgment.
The problem surfaces when:
- You’re asking about a framework concept that exists in your codebase
- Claude decides “this sounds like a web search query”
- Burns tokens searching the web instead of grepping your code
- You wanted local answers, you got web results
What you need: “Search the codebase for authentication patterns”
What Claude might do: trigger web search for “authentication best practices 2025”
You didn’t want web search. Claude thought you did. No way to prevent it.
The Ultrathink Illusion
“Ultrathink” sounds like a skill - some kind of deep reasoning mode Claude activates. It’s not. Unlike skills, which use LLM semantic reasoning, ultrathink is just hardcoded string matching that sets token budgets:
- “think”: 4,000 tokens
- “megathink”: 10,000 tokens (also responds to “think hard”, “think deeply”, “think more”)
- “ultrathink”: 31,999 tokens (oddly specific, but that’s the hardcoded value)
When Claude Code sees these strings, it sets `max_thinking_length` to the corresponding value. This only works in the CLI, not in the web interface or API. It feels like magic, but it’s just a config flag triggered by string matching.
Breaking the illusion: This isn’t Claude choosing to think harder based on problem complexity. It’s you typing a keyword to change a parameter. The “intelligence” is marketing; the reality is `if (prompt.includes("ultrathink")) { max_thinking = 31999; }`.
The irony: Skills use unpredictable LLM reasoning to decide invocation. Ultrathink uses predictable string matching. Neither gives you explicit control, but for opposite reasons - one’s too fuzzy, the other’s just hidden.
Ultrathink only works in Claude Code’s terminal interface. In claude.ai or the API, it’s just regular text. The “skill” is actually hardcoded keyword detection that sets a thinking budget parameter.
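Here’s what that detection amounts to - not Claude Code’s actual source, just the reported behavior reconstructed as a TypeScript sketch with made-up names:

```typescript
// Reported keyword-to-budget mapping, ordered most-specific-first so
// "ultrathink" isn't caught by the bare "think" pattern.
const THINKING_BUDGETS: Array<[RegExp, number]> = [
  [/\bultrathink\b/i, 31999],
  [/\bmegathink\b|\bthink (hard|deeply|more)\b/i, 10000],
  [/\bthink\b/i, 4000],
];

// Returns a thinking budget if the prompt contains a trigger keyword,
// otherwise undefined (default behavior).
function maxThinkingLength(prompt: string): number | undefined {
  for (const [pattern, budget] of THINKING_BUDGETS) {
    if (pattern.test(prompt)) return budget; // first (most specific) match wins
  }
  return undefined;
}

// maxThinkingLength("ultrathink: design the cache layer") -> 31999
// maxThinkingLength("think hard about this bug")          -> 10000
// maxThinkingLength("think about this bug")               -> 4000
```

There is no model judgment anywhere in that path - just string matching on your prompt.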
What Skills Don’t Solve
Skills are positioned as ambient intelligence, but once you see past the automation, the gaps become obvious. They can’t deliver on several critical needs for engineering workflows:
- Explicit invocation: No `claude --skill=analysis` flag to force a skill when you need it
- Prevention controls: No `--disable-skill=web-search` flag to block auto-invocation
- Context visibility: No token budget shown before skill instructions load into your context
- Invocation logging: No way to see why Claude chose to activate a skill
- Debugging: Can’t debug LLM reasoning - you can’t deterministically reproduce why Claude matched (or didn’t match) a skill
- Conditional logic: Can’t say “only use this skill in X directory” or “only after Y tool fails”
The fundamental issue: skills use LLM reasoning for invocation, which is inherently non-deterministic. For exploratory conversations that’s fine. For engineering workflows where context engineering and predictable behavior matter, it’s a problem.
The Alternative: Slash Commands
Slash commands give you what skills don’t: explicit control over invocation.
Here’s how they differ:
Skills (Auto-Invoked)
- Claude decides when to use based on semantic reasoning
- No explicit invocation syntax
- Context loaded when Claude thinks it’s relevant
- Non-deterministic (LLM behavior, not code logic)
- Good for: Ambient capabilities, exploratory work where unpredictability is acceptable
Slash Commands (User-Invoked)
- You type `/command` to trigger it
- Explicit invocation syntax
- Context loaded only when you call it
- Deterministic (runs when you say, not when Claude guesses)
- Good for: Engineering workflows, context engineering, predictable behavior
Create slash commands that wrap skill-like logic. You get reusable capabilities with explicit invocation control. Best of both worlds.
Real Example: /codex for Architectural Analysis
Here’s a working pattern that gives you skill-like reusability with command-like control:
```markdown
# ~/.claude/commands/codex.md
---
allowed-tools:
  - Bash(codex:*)
  - Read(~/projects/**)
  - Glob
  - Grep
description: Launch Codex for software architecture analysis and research
---

Invoke Codex (OpenAI) to analyze software architecture, research design patterns,
or provide senior-level technical insights.

## Steps:
1. Gather context (what's the user working on?)
2. Build prompt with project context
3. Execute: codex exec --search -C [project] -s read-only --full-auto "[prompt]"
4. Return filtered output
```
When you type `/codex should we use BLoC or Hooks?`, you get:
- Explicit invocation: You decided to pay for Codex analysis
- Clear boundaries: Codex runs, returns results, exits
- Isolated context: Codex analysis doesn’t pollute main conversation
- Predictable cost: You know you’re invoking an external tool
This pattern works for any specialized capability:
- `/chrome` - Chrome DevTools debugging (isolated MCP context)
- `/db` - Database queries and schema inspection
- `/perf` - Performance analysis and profiling
- `/codex` - Senior-level architectural analysis
Each one is skill-like (reusable, documented) but command-like (explicit, controlled, predictable).
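For example, a `/db` command could follow the same shape as the codex one above - a hypothetical sketch, where the psql invocation and tool permissions are assumptions to adapt for your stack:

```markdown
# ~/.claude/commands/db.md
---
allowed-tools:
  - Bash(psql:*)
  - Read(./migrations/**)
description: Inspect the database schema and run read-only queries
---

Answer schema and data questions against the project database.

## Steps:
1. Read ./migrations/** to understand the current schema
2. For live data questions, run read-only queries: psql "$DATABASE_URL" -c "[query]"
3. Summarize results; never run INSERT/UPDATE/DELETE or DDL
```

Each command stays inert until you type it - no metadata cost, no surprise activation.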
Having both a skill and a slash command doesn’t solve the control problem: if a skill exists with execution logic, Claude can still auto-invoke it, and the slash command doesn’t prevent that. You have to choose: skills (auto-invoke) or slash commands (explicit control).
What I Want
Skills could work for engineering if they exposed the controls instead of hiding behind automation. Here’s what that looks like:
- Explicit invocation: `claude --skill=analysis "question"` to force a skill when needed
- Disable toggles: `--disable-skill=web-search` to prevent auto-invocation
- Invocation logging: Show which skills activated and why
- Context visibility: Show token cost before loading skill instructions
- Conditional logic: Activate skills based on directory, previous tool results, or user intent patterns
Until these controls exist, slash commands are the better choice for predictable engineering workflows.
Where Skills Might Make Sense
Skills do solve a real problem: progressive disclosure for teams with many specialized patterns.
In a very large codebase (think ~1M LOC monorepo), you might have dozens of domain-specific patterns that don’t fit cleanly in CLAUDE.md. Skills let you:
- Load ~50 tokens of metadata initially (“Backend service design pattern”)
- Load the full instructions (~500+ tokens) only when Claude judges the skill relevant
- Avoid bloating your system prompt with every possible pattern upfront
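Concretely, that might look like a skills directory where each SKILL.md contributes only its description line until activated (hypothetical skill names):

```text
.claude/skills/
├── backend-service/SKILL.md   # idle cost: "Backend service design pattern"
├── event-sourcing/SKILL.md    # idle cost: "Event sourcing conventions for orders"
└── mobile-widgets/SKILL.md    # idle cost: "Widget composition rules for the app"
```

Three skills idle at roughly 150 tokens; the full instruction bodies stay out of the window until a match.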
This is a legitimate use case. The question is whether the tradeoffs are worth it:
- 50 tokens per skill adds up: 10 skills = 500 tokens of metadata before anything loads. For comparison, “follow the established pattern in the codebase” costs ~10 tokens and often works just as well.
- Non-deterministic matching is a risk for critical workflows: If you’re building production automation that needs reliability, LLM semantic matching means you can’t guarantee Claude will load the right skill at the right time.
- The use case is specific: Most valuable for large teams with many specialized patterns in complex codebases. For smaller projects, the overhead may not be worth it.
If you’re in that large monorepo scenario, Skills could genuinely help. For most teams, slash commands give you similar reusability with explicit control.
The Bottom Line
Skills are great for:
- Large codebases (~1M LOC monorepos) with many domain-specific patterns
- Teams that need progressive disclosure to avoid bloating CLAUDE.md
- Exploratory work where unpredictability is acceptable
Skills are problematic for:
- Engineering workflows that need predictability
- Context engineering (you want to control what’s in the window)
- Scenarios where you know better than Claude when to use a tool
Slash commands are better when:
- You want explicit control over invocation
- You’re practicing context engineering (managing signal-to-noise)
- You’re building specialized workflows that should run on-demand
The ideal future: skills with explicit invocation controls and disable flags. Until then, treat Claude as a tool that needs explicit controls, not magic that guesses your intent.
Engineering maturity means seeing past the magic and demanding control. Choose explicit over ambient when context matters.
Control is a recurring theme in effective Claude Code usage:
Slash commands give you explicit invocation control - see Isolating MCP Context with Slash Commands for Chrome DevTools isolation and When Claude Needs a Second Opinion for architectural analysis.
Plan mode gives you approval control - Claude shows you what it will do before executing. You get to review the plan, not discover 500 lines of unwanted code after the fact. See Stop Speedrunning Claude Code for why mastering this core loop matters.
Both patterns solve the same problem: treating Claude as a tool that needs explicit controls, not magic that guesses your intent.