Agents Don't Refactor

Traditionally, when you touched a file you tidied it. You renamed the variable you’d been tripping over. You collapsed the two functions that did almost the same thing. You deleted the dead comment. Uncle Bob called it the Boy Scout Rule, and for a stretch of about twenty years it was the ambient hygiene of any half-decent engineering team:

Leave the campground cleaner than you found it.

— Robert C. Martin, Clean Code

In 2026, nobody is doing this anymore.

What Made the Rule Work

The Boy Scout Rule was load-bearing because the person writing the code was usually the person who had to read it again in six weeks. Skin in the tidiness. If you left the mess, you paid the mess tax yourself. The rule was half moral, half self-interested.

Yesterday I argued that Clean Code’s economics start to wobble when the reader is no longer human. The Boy Scout Rule looks, on the surface, like more of the same Clean Code canon due for retirement. It isn’t. It’s the opposite. The Boy Scout Rule gets more important when humans stop reading the code, not less. Here’s the flip.

Agents Add. They Don’t Subtract.

LLMs optimise for the path of least resistance, and under pressure that path is additive. Given a request, an agent will almost always bolt on a new path rather than refactor the existing one. This is documented behaviour, not folklore. Research out of Kiro and others shows LLMs consistently ignore most of the code around them, even when reading it would tell them what could be reused, so you end up with copy-pasta code. Under real pressure the model will even take the easiest valid output of all: delete everything and replace it with a placeholder.

Fallbacks are the specific variant worth naming. An agent writing a function reaches for defensive scaffolding every time. Try/catch wrapped around operations that can’t fail. Null checks on values the caller already validated. Log-and-swallow blocks that silently eat real errors. “Just in case” branches for scenarios this codebase will never actually hit. Each one looks like prudence in isolation. Stacked ten deep across a service, they’re noise obscuring the actual logic, and they compound on the next pass when the next agent treats the fallback as load-bearing and wraps it in its own fallback.
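To make the pattern concrete, here is a minimal sketch of the same helper written twice. Everything in it is hypothetical (the `User` shape, the function names, the `logLines` stand-in for a logger, and the "legacy rows" scenario in the comment); it is an illustration of the smell, not code from any real service.

```typescript
// All names here are hypothetical, for illustration only.
interface User {
  id: string;
  email: string;
}

const logLines: string[] = []; // stand-in for a real logger

// Agent-style version: every step gets its own "just in case".
function emailDomainDefensive(user: User | null | undefined): string {
  try {
    if (!user) return ""; // the caller already validated `user`
    if (!user.email) return ""; // `email` is a required field
    const parts = user.email.split("@");
    if (parts.length === 0) return ""; // split() never returns an empty array here
    return parts[parts.length - 1] ?? "";
  } catch (err) {
    logLines.push(`emailDomain failed: ${String(err)}`); // log and swallow
    return "";
  }
}

// Trimmed version: one guard, tied to a scenario someone can name.
function emailDomain(user: User): string {
  const at = user.email.lastIndexOf("@");
  if (at === -1) {
    // Assumed scenario: legacy rows that were imported without validation.
    throw new Error(`malformed email for user ${user.id}`);
  }
  return user.email.slice(at + 1);
}
```

Feed both a malformed address such as `not-an-email` and the difference shows. The defensive version returns the whole string as if it were a domain, so the bad data keeps travelling. The trimmed version throws at the one place where the problem can actually be diagnosed.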

Two weeks of this and your codebase has three login flows, two date helpers, a utils.ts next to a helpers.ts next to a misc.ts, and four call sites that import the same function under different names. Nobody who approved the PRs noticed, because nobody read them in the old sense of reading them. The diff was green. The tests passed. Merge.

Net-positive line counts, forever.

Why This Matters (It’s Hygiene, Not Craft)

I want to be specific about what this post is not. This is not a return to the hand-stitched Clean Code worship I argued against yesterday. Nobody needs to care about your function length or your file structure. Agents don’t care. You shouldn’t either.

What agents do care about, because it measurably degrades their performance, is signal-to-noise in the codebase itself. The cost of accretion shows up in three concrete places:

Context budget. Every byte of stale code the agent reads to do its job is a byte it can’t spend reasoning. Context windows are large in 2026 and still not infinite.
The grep problem. Ask an agent “how do we do authentication?” and if there are four answers in the codebase, it picks one roughly at random (often the worst one) or averages them into a fifth, worse option. Having one way to do things is now an alignment tool.
Pattern propagation. Agents pattern-match on what they see. Five examples of a deprecated pattern teach the agent to write a sixth. Old code isn’t inert. It’s a teacher.
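The grep problem is cheap to measure. The sketch below is a crude heuristic, assuming you have already loaded source files into a path-to-text record: it flags any exported function name that is defined in more than one file. The function name `findParallelPaths` and the regex are my own assumptions. A real tool would go through the TypeScript compiler API rather than pattern-matching text, and would also need to catch arrow-function exports, which this one ignores.

```typescript
// Returns exported function names that are defined in more than one file.
// `files` maps a file path to that file's source text.
function findParallelPaths(files: Record<string, string>): Map<string, string[]> {
  const definedIn = new Map<string, string[]>();

  for (const path of Object.keys(files)) {
    // Fresh regex per file so the global match position starts at zero.
    const pattern = /export\s+(?:async\s+)?function\s+([A-Za-z_$][\w$]*)/g;
    const source = files[path] ?? "";
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source)) !== null) {
      const name = match[1];
      if (name === undefined) continue; // the capture group is always set on a match
      const paths = definedIn.get(name) ?? [];
      if (paths.indexOf(path) === -1) paths.push(path);
      definedIn.set(name, paths);
    }
  }

  // Keep only the names with more than one home.
  const duplicates = new Map<string, string[]>();
  definedIn.forEach((paths, name) => {
    if (paths.length > 1) duplicates.set(name, paths);
  });
  return duplicates;
}
```

Run against a utils.ts and a helpers.ts that both export `formatDate`, it returns a single entry for `formatDate` listing both paths. Every entry in the result is a question the agent would otherwise answer by coin flip.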

Bad precedent compounds weekly

Before agents, a deprecated pattern in the codebase survived because nobody got around to replacing it. After agents, a deprecated pattern in the codebase multiplies. The next ten features generated by the next ten agents will look like it, because it was the nearest example they saw. Bloat that used to be a slow leak is now exponential.

The Technique: Cleanup as a Gate

The move is to stop hoping the agent will do the Boy Scout thing unprompted, because it won’t, and make cleanup an explicit gate in every task. A few patterns that actually work:

Survey before you build. First instruction of any non-trivial task: “before writing any code, list existing files and functions related to this task. For each, say whether it should be extended, replaced, or left alone. Report before implementing.” This one habit catches half the parallel-path cases at the entry point.
End-of-task diff pass. Last instruction before commit: “review your own diff. What’s now duplicated, superseded, or unused? Propose deletions as a separate cleanup commit.” Agents are surprisingly good at this when asked. They just never do it unless asked.
Justify every fallback. Ask the agent to name the concrete scenario each try/catch, null check, and defensive branch is guarding against. Delete anything that can’t answer. Unexamined defensive code is padding the agent added so its output looked diligent.
Net-negative days. One session a week with exactly one goal: reduce line count without changing behaviour. Let the agent propose what to cut. Review, merge, move on.
Skin in the CLAUDE.md. A short rule near the top: “prefer extending existing code over adding parallel paths. If you’re adding a second way to do something, stop and propose removing the first.” The agent will argue with itself about it. That’s the point.
Language server, not LLM reasoning. When a rename or move is structural, drive it through the IDE or language server, not through the model’s imagination of the dependency graph. Refactoring is a constraint satisfaction problem, not a creative one.
Wrap it in a skill. Claude Code ships a /simplify slash command that reviews recent changes for reuse, quality, and efficiency. Custom skills like impeccable:distill (strip to essence) or impeccable:harden (clean up error handling) do the same for specific flavours of cleanup. The best version of this gate is one your agent can invoke by name. The technique stops being discipline and becomes a button.
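A net-negative day is easy to check mechanically. The sketch below assumes you pass in the text produced by `git diff --numstat`, which prints one row per file in the form added, tab, deleted, tab, path, and uses `-` in both count columns for binary files. The function names are hypothetical, and the check only counts lines: "without changing behaviour" is still your test suite's job.

```typescript
interface DiffTotals {
  added: number;
  deleted: number;
  net: number; // added minus deleted; negative means the codebase shrank
}

// Sums the added/deleted columns of `git diff --numstat` output.
function summariseNumstat(numstat: string): DiffTotals {
  let added = 0;
  let deleted = 0;
  for (const line of numstat.split("\n")) {
    const row = /^(\d+)\t(\d+)\t/.exec(line);
    // Blank lines and binary files ("-\t-\tpath") do not match and are skipped.
    if (row === null) continue;
    added += Number(row[1]);
    deleted += Number(row[2]);
  }
  return { added, deleted, net: added - deleted };
}

// The gate for a cleanup session: the diff must remove more than it adds.
function passesNetNegativeGate(numstat: string): boolean {
  return summariseNumstat(numstat).net < 0;
}
```

For a diff that adds 10 lines to one file and deletes 30 from another, `summariseNumstat` reports a net of -20 and the gate passes. Wire it into whatever runs at the end of the session and the goal stops being a matter of opinion.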

The meta-technique: the agent will do the hygiene work when it’s written into the task. It will not do it as an ambient virtue. There is no ambient virtue in an LLM. There is only the prompt.
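If there is only the prompt, the gates should travel with it. Here is one way that could look, as a sketch: a small wrapper that puts the survey step in front of any task and the fallback and diff-pass steps behind it. The constant and function names are hypothetical, and the wording is adapted from the prompts earlier in this section, so tune it for your own agent.

```typescript
// Prompt wording is adapted from the patterns in this post; the names are hypothetical.
const SURVEY_STEP =
  "Before writing any code, list existing files and functions related to this task. " +
  "For each, say whether it should be extended, replaced, or left alone. " +
  "Report before implementing.";

const FALLBACK_STEP =
  "For every try/catch, null check, and defensive branch you add, " +
  "name the concrete scenario it guards against. Remove any you cannot justify.";

const DIFF_PASS_STEP =
  "Before committing, review your own diff. What is now duplicated, superseded, or unused? " +
  "Propose deletions as a separate cleanup commit.";

// Wraps a raw task description so the cleanup gates are part of the request itself.
function gatedTask(task: string): string {
  return [SURVEY_STEP, `Task: ${task.trim()}`, FALLBACK_STEP, DIFF_PASS_STEP].join("\n\n");
}
```

Calling `gatedTask("Add password reset")` yields a four-part prompt that opens with the survey and closes with the diff pass. Nobody has to remember to ask.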

What This Doesn’t Fix

This is maintenance, not salvation. Honest limits:

Agents still miss things. They lack the whole-graph view of the codebase. An end-of-task pass catches local duplication, not architectural drift. That still wants a human eye at review time.
Real renames need real tools. Anything that touches type definitions or import graphs still wants a language server behind the agent, not model-guessed find-and-replace.
Some smells are ineffable. The pattern you can feel is off but can’t articulate doesn’t fit into a cleanup prompt. That’s still the senior engineer’s job, and probably always will be.
The rule is a floor, not a ceiling. The Boy Scout Rule was never meant to replace architecture. It’s the baseline hygiene underneath the architecture. Don’t expect it to do the structural work.

Bring It Back

The Boy Scout Rule survived the last era because humans had skin in the tidiness. It survives this era because agents inherit everyone’s mess and multiply it. Bad precedent that used to sit quietly in a dark corner now colonises the codebase one generated PR at a time.

Leave the campground cleaner than you found it. Make the agent do it.
