Heroku’s 12-factor apps shaped how a generation of engineers built cloud software. Stateless processes, config in environment variables, disposable instances. The principles became so embedded in how we think about deployment that we stopped naming them.
12 Factor Agents from HumanLayer aims to do the same for AI. The core insight is disarmingly practical: most successful AI products aren’t purely agentic loops. They combine deterministic code with strategically placed LLM decision points.
Most of the products out there billing themselves as ‘AI Agents’ are not all that agentic. The fastest path to quality involves incorporating modular agent concepts into existing products rather than adopting full agent frameworks.
— HumanLayer, 12 Factor Agents
The 12 Factors
Here’s the full list, organized by what they govern:
Control
- Own your prompts - Direct control over prompt engineering, not framework abstractions
- Own your control flow - Explicit execution paths, not delegated loops
- Stateless reducer - Agents as pure functions: input state → output state
Context
- Own your context window - Curate what enters the LLM’s attention
- Compact errors - Distill failures into concise context, not verbose logs
- Pre-fetch context - Retrieve information upfront, not mid-execution
State
- Unify execution and business state - No parallel state systems to synchronize
- Launch/pause/resume - Suspension points for human intervention
Interface
- Natural language to tool calls - LLM outputs decisions, not final execution
- Tools are structured outputs - Tool-calling is structured output generation
- Trigger from anywhere - Webhooks, cron, user actions, external events
Architecture
- Small, focused agents - Narrow responsibilities over monolithic systems
- Contact humans with tool calls - Human-in-the-loop as first-class operation
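Factors 1 and 4 are two halves of one idea: the LLM emits a structured decision, and deterministic code parses and executes it. A minimal sketch of that split, where the `ToolCall` shape and the tool names are illustrative, not part of the spec:

```python
import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

def parse_tool_call(llm_output: str) -> ToolCall:
    """The model's job ends at producing structured output;
    parsing and execution stay in deterministic code."""
    payload = json.loads(llm_output)
    return ToolCall(name=payload["tool"], args=payload.get("args", {}))

# Deterministic dispatch table: the LLM decides *which* tool, code decides *how*.
TOOLS = {
    "create_ticket": lambda args: f"ticket created: {args['title']}",
    "send_reply":    lambda args: f"reply sent: {args['body']}",
}

def execute(call: ToolCall) -> str:
    return TOOLS[call.name](call.args)

# Raw model output -> structured call -> deterministic execution
raw = '{"tool": "create_ticket", "args": {"title": "payment webhook failing"}}'
call = parse_tool_call(raw)
print(execute(call))  # ticket created: payment webhook failing
```

The same shape covers "trigger from anywhere" (factor 11): a webhook or cron job can construct the same `ToolCall` without an LLM in the loop.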
Factor 13 (pre-fetch context) appears in the appendix but deserves equal weight. Retrieving context upfront reduces mid-execution lookups and keeps agents deterministic.
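The pre-fetch pattern can be sketched as one deterministic pass that gathers everything the agent will plausibly need before the first prompt is built. The fetchers below are stand-ins for real API or database calls:

```python
# Stand-in fetchers: replace with real API / DB calls.
def fetch_ticket(ticket_id: str) -> str:
    return f"ticket {ticket_id}: checkout fails with 502"

def fetch_error_logs(ticket_id: str, limit: int) -> list[str]:
    return ["502 upstream timeout"][:limit]

def prefetch_context(ticket_id: str) -> dict:
    """One deterministic, testable pass up front, instead of
    mid-run lookup tool calls the model has to ask for."""
    return {
        "ticket": fetch_ticket(ticket_id),
        "recent_errors": fetch_error_logs(ticket_id, limit=5),
    }

def build_prompt(ctx: dict) -> str:
    # Everything the model needs is in the first prompt; the loop that
    # follows never pauses for a retrieval round-trip.
    errors = "\n".join(ctx["recent_errors"])
    return f"Ticket:\n{ctx['ticket']}\n\nRecent errors:\n{errors}\n\nPropose a fix."

print(build_prompt(prefetch_context("T-1")))
```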
The Dumb Zone
Factor 3 - own your context window - is the linchpin. The others depend on it.
Context windows aren’t just storage. They’re attention budgets. Dex Horthy (HumanLayer founder) analyzed 100,000 developer sessions and identified the “dumb zone”: the middle 40-60% of a large context window where model recall degrades and reasoning falters. Fill past 40% and diminishing returns kick in. The more you use the context window, the worse the outcomes.
This aligns with research on the “lost in the middle” problem: LLMs perform best when relevant information is at the beginning or end of context, with significant degradation for information buried in long sequences.
As agents become more capable, they naturally accumulate more tools. Your heavily armed agent gets dumber.
— Manus AI, Context Engineering for Agents
Manus rebuilt their agent framework four times learning this. Anthropic’s context engineering guidance frames it as finding “the smallest set of high-signal tokens that maximize the likelihood of your desired outcome.”
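Compacting errors (factor 9) is one concrete way to chase high-signal tokens. A sketch of the idea: keep the exception type, message, and innermost frame, and drop the rest of the traceback before it enters the context window:

```python
import traceback

def compact_error(exc: Exception) -> str:
    """Distill a failure into one high-signal line for the model,
    instead of feeding it a multi-line traceback."""
    tb = traceback.extract_tb(exc.__traceback__)
    frame = tb[-1] if tb else None
    location = f"{frame.filename}:{frame.lineno} in {frame.name}" if frame else "unknown"
    return f"{type(exc).__name__}: {exc} (at {location})"

try:
    {}["missing_key"]
except KeyError as e:
    print(compact_error(e))  # one line instead of a full traceback
```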
In Claude Code, type /context to see what’s loaded. You’ll often find MCP definitions, bloated CLAUDE.md files, and conversation history consuming tokens before you’ve started real work. Clean context = sharp agent.
Factors in Practice
These principles aren’t theoretical. Anthropic’s tools embody them:
Plan Mode = Factors 2, 3, 8
Plan Mode blocks write tools at the system level. You own the prompt (factor 2), the context stays focused on planning rather than execution artifacts (factor 3), and you control when execution begins (factor 8). No framework abstractions hiding the decision points.
Parallel subagents = Factor 10
Plan Mode spawns lightweight Haiku agents to explore your codebase simultaneously. Each gets an isolated context window, returns condensed findings, and dies. Small, focused, disposable. Factor 10 in action.
CLAUDE.md = Factor 3
Project- and user-level instruction files are curated context. Short, specific, opinionated. Not documentation for you - training for Claude. Every token in CLAUDE.md is a token not available for understanding your actual code.
Agent harnesses = Factors 5, 6
Progress files and feature lists unify execution state with business state. The agent reads claude-progress.txt before touching code. Launch/pause/resume happens through git commits and human review, not magic framework hooks.
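The harness pattern reduces to two functions over a progress file. A sketch assuming a JSON file (the file name and fields here are illustrative stand-ins for something like claude-progress.txt):

```python
import json
import os
import tempfile

def load_state(path: str) -> dict:
    """Resume: if a progress file exists, a fresh agent process picks up
    exactly where the last one paused (factors 5 and 6)."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"features": ["auth", "billing", "emails"], "done": []}

def save_state(path: str, state: dict) -> None:
    """Pause point: persist state, stop, wait for human review."""
    with open(path, "w") as f:
        json.dump(state, f, indent=2)

# Demo in a throwaway directory so repeated runs start clean.
path = os.path.join(tempfile.mkdtemp(), "claude-progress.json")
state = load_state(path)
nxt = next(f for f in state["features"] if f not in state["done"])
# ... the agent would work on `nxt` here, then record completion and pause ...
state["done"].append(nxt)
save_state(path, state)
```

Because execution state and business state are the same file, "pause" is just a process exit and "resume" is just a restart - no parallel state systems to synchronize.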
Small Over Monolithic
Factor 10 - small, focused agents - is the antidote to framework bloat.
The community built elaborate scaffolding: BMAD with 19 specialized agents, Spec-Kit with multi-stage workflows, external orchestration layers. These existed because the tools lacked native structure.
Now native features absorb the patterns. Plan Mode does what manual plan/act splits did. Parallel subagents do what external orchestration did. The scaffolding becomes friction.
The 12-factor philosophy aligns: don’t build monolithic agent systems. Build small agents with clear interfaces. Let them compose. The complexity lives in the composition, not the individual agents.
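The composition idea fits in a few lines. The two "agents" below are stubs standing in for focused LLM calls; the names and logic are illustrative:

```python
# Each "agent" has one narrow job; in a real system each function would
# wrap a small, focused LLM prompt.
def classify(ticket: str) -> str:
    return "bug" if "error" in ticket else "question"

def draft_reply(category: str) -> str:
    return {"bug": "Filed a bug, fix incoming.", "question": "Here's how it works:"}[category]

def pipeline(ticket: str) -> str:
    # Deterministic composition: the complexity lives here,
    # not inside any single monolithic agent.
    return draft_reply(classify(ticket))

print(pipeline("checkout throws an error on submit"))
```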
What Factors Don’t Solve
Factor 7 (contact humans with tool calls) enables human-in-the-loop workflows. It doesn’t replace the need for human judgment.
What stays human:
- Strategic vision - What problem to solve, what market to enter
- Novel architecture - Cross-cutting decisions that require deep system intuition
- Ambiguous requirements - When the spec is unclear, agents can’t resolve it themselves
- Final accountability - Engineers own what ships
The SDLC collapse pattern holds: delegate mechanical work, review for correctness, own the judgment calls. The factors optimize the delegation. They don’t automate the ownership.
The first 90% of the code takes 90% of the time. The remaining 10% takes the other 90%. Agents accelerate the first pass. The long tail of edge cases, integration issues, and iterative refinement remains long.
Applying the Factors
If you’re building with AI agents:
- Treat context as a scarce resource - Every MCP, every tool definition, every line of CLAUDE.md consumes attention budget. Keep it lean.
- Own the control flow - Don’t hand execution to framework loops. Know exactly when and why the LLM makes decisions.
- Build small agents - One agent, one job. Compose them rather than building monoliths.
- Design pause points - Factor 6 isn’t optional. Agents need suspension points for human review, especially for irreversible operations.
- Pre-fetch aggressively - Factor 13 reduces mid-execution surprises. Gather context upfront, not when the agent stumbles.
- Stay out of the dumb zone - Keep context under 40% capacity. /clear between tasks. Let auto-compact handle overflow.
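The 40% guard can be made mechanical. A crude sketch using a characters-per-token heuristic - the window size, ratio, and heuristic are all illustrative; a real implementation would use the model's tokenizer:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def should_compact(messages: list[str], window: int = 200_000, budget: float = 0.40) -> bool:
    """Flag when accumulated context passes 40% of the window,
    before the dumb zone starts costing recall."""
    used = sum(approx_tokens(m) for m in messages)
    return used > window * budget

history = ["x" * 400_000]  # ~100k tokens of accumulated conversation
if should_compact(history):
    print("compact or /clear before continuing")
```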
The 12 factors aren’t new ideas. They’re the patterns that emerged from building agents that actually work in production. Now they have names.
The full 12 Factor Agents repo includes implementation examples and deeper explanations for each factor. Worth reading alongside Anthropic’s context engineering guidance.


