Boris Cherny, head of Claude Code, said recently that the tool’s own codebase is rewritten every few months: at any given time, no code from six months prior remains in production, and perhaps 80% of the code is less than a couple of months old.
I installed Claude Code on launch day: Feb 24, 2025. One year later, the tool I use shares almost nothing with the tool I installed. Five distinct eras. Eighty-plus posts on this blog documenting the journey. And here’s the thing that only becomes visible at the twelve-month mark:
The tool changed completely. Five times. The discipline required to use it didn’t change once.
If you’re new here, An Alien Tool With No Manual is where the philosophy starts.
The Arc
The year moved in eras, each one making the previous approach look primitive. What felt like mastery in March was table stakes by August.
- The Core Loop (Feb - Oct 2025): Clean context, plan mode, CLAUDE.md discipline, slowing down. The community invented ROADMAP.md, ULTRATHINK, SCRATCHPAD.md. All obsolete overnight when Anthropic shipped native alternatives.
- Context Wars (Oct - Nov 2025): The bottleneck shifted from capability to context efficiency. The community discovered a hidden MCP flag, then the deeper insight: context isolation was architecture, not optimization.
- Multi-Model Routing (Nov - Dec 2025): From “Claude does everything” to “Claude orchestrates everything.” Codex for architecture, Gemini for vision, Haiku for fast subtasks. No model is best at everything.
- Controllability (Dec 2025 - Jan 2026): The framework trap became clear as Anthropic shipped the scaffolding natively. Hooks replaced prompt-based guardrails with deterministic triggers.
- Agent Teams (Jan - Feb 2026): kieranklaassen found TeammateTool hidden in the binary. Two weeks later, Anthropic flipped the switch. Beads became Tasks. The community isn’t just using the tool. It’s designing the next version.
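For readers who haven’t touched hooks: they’re shell commands Claude Code fires deterministically on tool events, configured in `.claude/settings.json`, instead of guardrail prompts the model may or may not honour. Roughly, an entry like this runs a formatter after every file edit - treat it as a sketch, since the exact schema has drifted over the year and may drift again:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs -r npx prettier --write"
          }
        ]
      }
    ]
  }
}
```

The point isn’t the formatter. It’s that the trigger fires every time, which no prompt can promise.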
Every era felt like the final form. None of them were.
The tool maintains the thing that documents the tool. That’s either poetic or a warning.
What I Got Wrong
This is the section that matters. The wins are easy to list. The wrong assumptions are where the learning actually lives.
“Autonomous loops will replace most human oversight”
I believed this longer than I should have. The Ralph Wiggum technique worked so well for mechanical tasks (test-fix cycles, lint passes, migration scripts) that I started assuming the same pattern would scale to judgment-heavy work. Architecture decisions. API design. User-facing copy.
It didn’t. Unsupervised loops produce confident garbage when the task requires taste. Not because the model lacks capability, but because “good enough to pass tests” and “good enough to ship” are different bars. The model can’t tell the difference. That’s what judgment means.
I wrote about this when it clicked. The lesson wasn’t “AI can’t do hard things.” It was “the hard things aren’t the ones you think.”
"One model is enough”
This was partly ego. I’d invested heavily in understanding Claude’s strengths and weaknesses, and admitting I needed other models felt like admitting Claude wasn’t good enough. It is good enough. It’s just not good enough at everything.
Codex sees architecture patterns Claude misses. Gemini catches visual issues Claude can’t perceive. Haiku handles bulk operations where Opus is overkill. The workflow that works routes to the right model, not the favourite model. That shift took months of resistance before it clicked.
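Mechanically, the routing is less exotic than it sounds - most days it amounts to a dispatch table. A sketch of the shape (the model names match my setup above, but the task categories and the helper are illustrative, not any real API):

```python
# Hypothetical task-to-model routing table. Keys are task categories I
# made up for illustration; values name the model each category goes to.
ROUTES = {
    "architecture_review": "codex",   # sees patterns Claude misses
    "visual_check": "gemini",         # catches issues Claude can't perceive
    "bulk_edit": "haiku",             # fast and cheap where Opus is overkill
}

def pick_model(task_type: str) -> str:
    """Route a task to the model suited to it; default to the generalist."""
    return ROUTES.get(task_type, "opus")
```

The interesting part isn’t the table. It’s the default: the generalist handles everything you haven’t learned to classify yet, and the table grows as your judgment does.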
“CLAUDE.md solves the knowledge problem”
Partially true. CLAUDE.md gives the model project context, and that matters enormously. But there’s a gap between “knows about this project” and “knows about me.” Session-level memory, cross-project preferences, learned patterns from past mistakes. CLAUDE.md was the first layer. The memory system that followed filled gaps I didn’t know existed until they were filled.
“Quality is monotonically improving”
It’s not. Regressions happen. Model updates break workflows that were working yesterday. A prompt that produced perfect output with Sonnet 4.5 generates garbage with 4.6. The emotional cycle is predictable now: denial, frustration, adaptation. Knowing the cycle doesn’t make it painless. It just makes the adaptation phase shorter.
This one stings because it challenges the narrative. AI progress is supposed to be a line that goes up. In practice, it’s a sawtooth. The peaks get higher, but the dips between them can wreck a week’s productivity if you’re not watching.
Every time a model improves, you build workflows that depend on the new capability. When the next update regresses in that area, you don’t just lose the improvement. You lose the workflows you built on top of it. The blast radius of a regression is larger than the footprint of the original gain.
What Changed in Me
This is harder to write than the technical retrospective. Tools change features. People change habits, instincts, standards.
I stopped defaulting to code. A year ago, my first instinct for any problem was to start typing. Now my first instinct is to describe the problem clearly and let the model propose the approach. The shift sounds small. It changed everything. I spend more time thinking about what to build and less time thinking about how. The “how” got cheaper. The “what” got more valuable.
I got more skeptical, not less. You’d expect a year of AI-assisted development to make me more trusting. The opposite happened. I’ve seen too many confident, well-structured, completely wrong outputs to take anything at face value. I read the code more carefully now than I did before AI wrote it. The model’s fluency makes its errors harder to spot, which makes verification more important, not less.
I stopped asking “can the AI do this?” and started asking “should the AI do this?” That question doesn’t get easier with better models. It gets harder. More capability means more surface area where delegation is possible but not wise. The boundary between “automate this” and “think about this yourself” is the most important judgment call in the workflow. No tool helps you make it.
I learned to hold tools loosely. Every workaround I built was eventually replaced by a native feature. Every “definitive guide” I wrote had a shelf life measured in weeks. Boris’s team deletes code every time a new model ships. I wrote about this when Beads became Tasks: the CLI tooling layer has no moat. Any good pattern gets absorbed into the product. The lesson for users is the same: don’t get attached to the scaffolding. The tool underneath is moving faster than your adaptations to it.
The Constant
Across five eras, two complete model generations, and a tool that rewrote itself every few months, one thing didn’t change: the fundamentals.
Clean context. Explicit goals. Plan before executing. Read before editing. Verify before trusting.
That was the advice in month one. It’s still the advice in month twelve with agent teams, multi-model routing, hooks, memory, and everything else layered on top. The surface complexity exploded. The core discipline held.
That’s the real takeaway from a year of this. The tool will keep changing. It’ll be rewritten again before this post is six months old. The part that matters is yours to develop: the judgment, the discipline, the decision about when to use it and when not to. No model update ships that.
80+ posts. Five eras. One constant. The tool changed faster than anyone could document. I tried anyway. I’ll keep trying.


