Heroku’s 12-factor apps shaped how a generation of engineers built cloud software. Stateless processes, config in environment variables, disposable instances. The principles became so embedded in how we think about deployment that we stopped naming them.
12 Factor Agents from HumanLayer aims to do the same for AI. The core insight is disarmingly practical: most successful AI products aren’t purely agentic loops. They combine deterministic code with strategically placed LLM decision points.
Most of the products out there billing themselves as ‘AI Agents’ are not all that agentic. The fastest path to quality involves incorporating modular agent concepts into existing products rather than adopting full agent frameworks.
— HumanLayer, 12 Factor Agents
The 12 Factors
Here’s the full list, organized by what they govern:
Control
- Own your prompts - Direct control over prompt engineering, not framework abstractions
- Own your control flow - Explicit execution paths, not delegated loops
- Stateless reducer - Agents as pure functions: input state → output state
Context
- Own your context window - Curate what enters the LLM’s attention
- Compact errors - Distill failures into concise context, not verbose logs
- Pre-fetch context - Retrieve information upfront, not mid-execution
State
- Unify execution and business state - No parallel state systems to synchronize
- Launch/pause/resume - Suspension points for human intervention
Interface
- Natural language to tool calls - LLM outputs decisions, not final execution
- Tools are structured outputs - Tool-calling is structured output generation
- Trigger from anywhere - Webhooks, cron, user actions, external events
Architecture
- Small, focused agents - Narrow responsibilities over monolithic systems
- Contact humans with tool calls - Human-in-the-loop as first-class operation
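Factors 1 and 4 are two halves of one idea: the LLM emits a structured decision, and deterministic code parses and executes it. A minimal sketch of that split, where the `ToolCall` shape and the tool names are illustrative, not part of the spec:

```python
import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

def parse_tool_call(llm_output: str) -> ToolCall:
    """The model's job ends at producing structured output;
    parsing and execution stay in deterministic code."""
    payload = json.loads(llm_output)
    return ToolCall(name=payload["tool"], args=payload.get("args", {}))

# Deterministic dispatch table: the LLM decides *which* tool, code decides *how*.
TOOLS = {
    "create_ticket": lambda args: f"ticket created: {args['title']}",
    "send_reply":    lambda args: f"reply sent: {args['body']}",
}

def execute(call: ToolCall) -> str:
    return TOOLS[call.name](call.args)

# Raw model output -> structured call -> deterministic execution
raw = '{"tool": "create_ticket", "args": {"title": "payment webhook failing"}}'
call = parse_tool_call(raw)
print(execute(call))  # ticket created: payment webhook failing
```

The same shape covers "trigger from anywhere" (factor 11): a webhook or cron job can construct the same `ToolCall` without an LLM in the loop.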
Factor 13 (pre-fetch context) appears in the appendix but deserves equal weight. Retrieving context upfront reduces mid-execution lookups and keeps agents deterministic.
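The pre-fetch pattern can be sketched as one deterministic pass that gathers everything the agent will plausibly need before the first prompt is built. The fetchers below are stand-ins for real API or database calls:

```python
# Stand-in fetchers: replace with real API / DB calls.
def fetch_ticket(ticket_id: str) -> str:
    return f"ticket {ticket_id}: checkout fails with 502"

def fetch_error_logs(ticket_id: str, limit: int) -> list[str]:
    return ["502 upstream timeout"][:limit]

def prefetch_context(ticket_id: str) -> dict:
    """One deterministic, testable pass up front, instead of
    mid-run lookup tool calls the model has to ask for."""
    return {
        "ticket": fetch_ticket(ticket_id),
        "recent_errors": fetch_error_logs(ticket_id, limit=5),
    }

def build_prompt(ctx: dict) -> str:
    # Everything the model needs is in the first prompt; the loop that
    # follows never pauses for a retrieval round-trip.
    errors = "\n".join(ctx["recent_errors"])
    return f"Ticket:\n{ctx['ticket']}\n\nRecent errors:\n{errors}\n\nPropose a fix."

print(build_prompt(prefetch_context("T-1")))
```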
The Dumb Zone
Factor 3 - own your context window - is the linchpin. The others depend on it.
Context windows aren’t just storage. They’re attention budgets. Dex Horthy (HumanLayer founder) analyzed 100,000 developer sessions and identified the “dumb zone”: the middle 40-60% of a large context window where model recall degrades and reasoning falters. Fill past 40% and diminishing returns kick in. The more you use the context window, the worse the outcomes.
This aligns with research on the “lost in the middle” problem: LLMs perform best when relevant information is at the beginning or end of context, with significant degradation for information buried in long sequences.
As agents become more capable, they naturally accumulate more tools. Your heavily armed agent gets dumber.
— Manus AI, Context Engineering for Agents
Manus rebuilt their agent framework four times learning this. Anthropic’s context engineering guidance frames it as finding “the smallest set of high-signal tokens that maximize the likelihood of your desired outcome.”
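Compacting errors (factor 9) is one concrete way to chase high-signal tokens. A sketch of the idea: keep the exception type, message, and innermost frame, and drop the rest of the traceback before it enters the context window:

```python
import traceback

def compact_error(exc: Exception) -> str:
    """Distill a failure into one high-signal line for the model,
    instead of feeding it a multi-line traceback."""
    tb = traceback.extract_tb(exc.__traceback__)
    frame = tb[-1] if tb else None
    location = f"{frame.filename}:{frame.lineno} in {frame.name}" if frame else "unknown"
    return f"{type(exc).__name__}: {exc} (at {location})"

try:
    {}["missing_key"]
except KeyError as e:
    print(compact_error(e))  # one line instead of a full traceback
```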
In Claude Code, type /context to see what’s loaded. You’ll often find MCP definitions, bloated CLAUDE.md files, and conversation history consuming tokens before you’ve started real work. Clean context = sharp agent.
Factors in Practice
These principles aren’t theoretical. Anthropic’s tools embody them:
Plan Mode = Factors 2, 3, 8
Plan Mode blocks write tools at the system level. You own the prompt (factor 2), the context stays focused on planning rather than execution artifacts (factor 3), and you control when execution begins (factor 8). No framework abstractions hiding the decision points.
Parallel subagents = Factor 10
Plan Mode spawns lightweight Haiku agents to explore your codebase simultaneously. Each gets an isolated context window, returns condensed findings, and dies. Small, focused, disposable. Factor 10 in action.
CLAUDE.md = Factor 3
Project- and user-level instruction files are curated context. Short, specific, opinionated. Not documentation for you - training for Claude. Every token in CLAUDE.md is a token not available for understanding your actual code.
Agent harnesses = Factors 5, 6
Progress files and feature lists unify execution state with business state. The agent reads claude-progress.txt before touching code. Launch/pause/resume happens through git commits and human review, not magic framework hooks.
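The harness pattern reduces to two functions over a progress file. A sketch assuming a JSON file (the file name and fields here are illustrative stand-ins for something like claude-progress.txt):

```python
import json
import os
import tempfile

def load_state(path: str) -> dict:
    """Resume: if a progress file exists, a fresh agent process picks up
    exactly where the last one paused (factors 5 and 6)."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"features": ["auth", "billing", "emails"], "done": []}

def save_state(path: str, state: dict) -> None:
    """Pause point: persist state, stop, wait for human review."""
    with open(path, "w") as f:
        json.dump(state, f, indent=2)

# Demo in a throwaway directory so repeated runs start clean.
path = os.path.join(tempfile.mkdtemp(), "claude-progress.json")
state = load_state(path)
nxt = next(f for f in state["features"] if f not in state["done"])
# ... the agent would work on `nxt` here, then record completion and pause ...
state["done"].append(nxt)
save_state(path, state)
```

Because execution state and business state are the same file, "pause" is just a process exit and "resume" is just a restart - no parallel state systems to synchronize.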
Small Over Monolithic
Factor 10 - small, focused agents - is the antidote to framework bloat.
The community built elaborate scaffolding: BMAD with 19 specialized agents, Spec-Kit with multi-stage workflows, external orchestration layers. These existed because the tools lacked native structure.
Now native features absorb the patterns. Plan Mode does what manual plan/act splits did. Parallel subagents do what external orchestration did. The scaffolding becomes friction.
The 12-factor philosophy aligns: don’t build monolithic agent systems. Build small agents with clear interfaces. Let them compose. The complexity lives in the composition, not the individual agents.
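The composition idea fits in a few lines. The two "agents" below are stubs standing in for focused LLM calls; the names and logic are illustrative:

```python
# Each "agent" has one narrow job; in a real system each function would
# wrap a small, focused LLM prompt.
def classify(ticket: str) -> str:
    return "bug" if "error" in ticket else "question"

def draft_reply(category: str) -> str:
    return {"bug": "Filed a bug, fix incoming.", "question": "Here's how it works:"}[category]

def pipeline(ticket: str) -> str:
    # Deterministic composition: the complexity lives here,
    # not inside any single monolithic agent.
    return draft_reply(classify(ticket))

print(pipeline("checkout throws an error on submit"))
```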
What Factors Don’t Solve
Factor 7 (contact humans with tool calls) enables human-in-the-loop workflows. It doesn’t replace the need for human judgment.
What stays human:
- Strategic vision - What problem to solve, what market to enter
- Novel architecture - Cross-cutting decisions that require deep system intuition
- Ambiguous requirements - When the spec is unclear, agents can’t resolve it themselves
- Final accountability - Engineers own what ships
The SDLC collapse pattern holds: delegate mechanical work, review for correctness, own the judgment calls. The factors optimize the delegation. They don’t automate the ownership.
The first 90% of the code takes 90% of the time. The remaining 10% takes the other 90%. Agents accelerate the first pass. The long tail of edge cases, integration issues, and iterative refinement remains long.
Applying the Factors
If you’re building with AI agents:
- Treat context as a scarce resource - Every MCP, every tool definition, every line of CLAUDE.md consumes attention budget. Keep it lean.
- Own the control flow - Don’t hand execution to framework loops. Know exactly when and why the LLM makes decisions.
- Build small agents - One agent, one job. Compose them rather than building monoliths.
- Design pause points - Factor 6 isn’t optional. Agents need suspension points for human review, especially for irreversible operations.
- Pre-fetch aggressively - Factor 13 reduces mid-execution surprises. Gather context upfront, not when the agent stumbles.
- Stay out of the dumb zone - Keep context under 40% capacity. /clear between tasks. Let auto-compact handle overflow.
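The 40% guard can be made mechanical. A crude sketch using a characters-per-token heuristic - the window size, ratio, and heuristic are all illustrative; a real implementation would use the model's tokenizer:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def should_compact(messages: list[str], window: int = 200_000, budget: float = 0.40) -> bool:
    """Flag when accumulated context passes 40% of the window,
    before the dumb zone starts costing recall."""
    used = sum(approx_tokens(m) for m in messages)
    return used > window * budget

history = ["x" * 400_000]  # ~100k tokens of accumulated conversation
if should_compact(history):
    print("compact or /clear before continuing")
```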
The 12 factors aren’t new ideas. They’re the patterns that emerged from building agents that actually work in production. Now they have names.
The full 12 Factor Agents repo includes implementation examples and deeper explanations for each factor. Worth reading alongside Anthropic’s context engineering guidance.


