The vibe coding problem is real. Ship features fast, but no structure. No tests. Code scattered everywhere. It’s the dirty secret of AI-assisted development: velocity without quality.
Luke from Factory AI thinks the solution isn’t better prompting or more powerful models. It’s AI that enforces software engineering best practices by default.
I would expect, especially using tools like Droid, that the output you get resembles that of a software organization a lot more than that of a vibe coder. — Luke, Factory AI
Linting. Testing. CI/CD. Proper architecture. All baked in. Not as optional features you configure, but as the default behavior.
Factory AI and the Droid Approach
Factory AI builds Droid, an AI coding agent that currently tops Terminal-Bench with a 58.75% score - beating Claude Code with Opus (43.2%) and Codex CLI (42.8%). But the interesting part isn’t the benchmark. It’s their philosophy.
Droids work across the entire SDLC: autonomous coding, incident resolution, codebase research, spec creation, code review. The key insight is their “DroidShield” layer - real-time static analysis that catches security vulnerabilities, bugs, and IP issues before code is committed.
This is enforcement as default, not enforcement as afterthought.
Three Predictions That Stuck
Luke made three predictions about where this is heading:
Monorepos Become Standard
It has never been more important for your CI to go green altogether. If you can guarantee that, you know that whatever changes an agent has made are working together in concert. — Luke, Factory AI
When agents can make changes across multiple services simultaneously, you need confidence that everything still works together. Separate repos make that orchestration harder. Monorepos let CI be the single source of truth.
I’ve built tooling for exactly this problem - managing 20+ repos with unified CLI commands, parallel git operations, and shared development stacks. The experience mimics a monorepo: one command to pull all repos, one dashboard to see status, one workflow regardless of which service you’re changing.
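The core of that tooling is simple: run the same git operation across every repo concurrently instead of one at a time. A minimal sketch of the parallel-pull idea (the function names and the `--ff-only` choice are mine, not the actual tool):

```python
import concurrent.futures
import subprocess

def pull_repo(path):
    """Run `git pull` in one repo directory; return (path, exit_code)."""
    result = subprocess.run(
        ["git", "-C", path, "pull", "--ff-only"],
        capture_output=True, text=True,
    )
    return path, result.returncode

def pull_all(paths, max_workers=8):
    """Pull every repo in parallel; one failing repo doesn't block the rest."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(pull_repo, paths))
```

The `--ff-only` flag keeps the operation safe to run unattended: a repo with diverged history fails loudly instead of producing a surprise merge commit.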
But Luke’s right. This is friction I’ve engineered around, not eliminated. True monorepos make CI the arbiter. My tooling makes humans the arbiter. As agents get more capable, the former scales better.
Migrations Die
This idea of building a product that is a prototype and then goes straight into a product you want to release - I think that’s the new way of thinking about migrations. — Luke, Factory AI
Luke pointed to Cursor 2.0 as an example: they didn’t migrate their app, they rebuilt it from scratch. Factory is doing the same with their web app - three weeks to replicate a year’s worth of work.
I’ve seen this in my own projects. ParseIt started as PocketBase + Go + Next.js. Now it’s Convex + React + Vite. I didn’t migrate - I rebuilt. The old code sits in a legacy/ folder, not as dead weight, but as context. When I prompt “rebuild the auth flow,” the agent can reference what worked before without inheriting the implementation decisions that didn’t.
Your previous project becomes context for your next project. That’s the new migration strategy: rebuild with memory.
The same applies to third-party integrations. Instead of wrestling with someone else’s SDK, describe what the integration does and build it yourself. The function you needed is the context. The implementation is fresh, fits your architecture, and has no dependency baggage.
The weight of legacy code that large companies carry - Luke mentioned working at a major tech company with 30-year-old file systems - becomes optional. Not for everything. But for more things than we currently assume.
Self-Hostable Models Make This Affordable
Luke sees open-weight, self-hostable models as the next frontier:
If you can make the pricing a lot more approachable for developers, I think you’ll see a lot more adoption of these tools. — Luke, Factory AI
Frontier models are great for complex tasks. But for enforcement - linting, pattern checking, basic code review - you don’t need Opus. You need something fast, cheap, and good enough. Self-hostable models running locally could handle the continuous enforcement layer while cloud models handle the complex reasoning.
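The split falls naturally out of task type. A hypothetical router (the task names and tiers are illustrative, not any real product's API) makes the idea concrete: cheap local models handle the continuous, high-volume enforcement checks, and only open-ended work escalates to a frontier model.

```python
# Enforcement tasks run constantly and need speed, not depth -
# route them to a self-hosted model. Everything else goes to the cloud.
ENFORCEMENT_TASKS = {"lint", "pattern_check", "style_review", "secret_scan"}

def route(task_type: str) -> str:
    """Return which model tier should handle a task."""
    return "local" if task_type in ENFORCEMENT_TASKS else "cloud"
```

The economics follow from call volume: enforcement fires on every save and every commit, while complex reasoning is invoked a few times per feature, so moving the former off metered APIs is where the cost drops.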
What This Actually Changes
The shift from “AI writes code” to “AI enforces quality” has structural implications:
- Code review changes. If AI catches 80% of issues before PR, human review focuses on architecture and business logic, not style and obvious bugs.
- Hiring changes. If tooling enforces patterns by default, you’re not selecting for “knows our stack” anymore. You’re selecting for judgment, taste, systems thinking. (This ties directly to why companies reject AI-fluent engineers - they’re testing for skills that enforcement will make obsolete.)
- Onboarding changes. New engineers learn patterns through guardrails. The tooling rejects requests that violate architecture, teaching conventions implicitly.
Guardrails can’t replace judgment. Novel architecture, spec ambiguity, tradeoff decisions - these stay human. Over-enforcement kills productivity (linters that reject valid code are worse than no linters). Early-stage teams need flexibility; guardrails work at scale. The goal is encoding solved problems so humans can focus on unsolved ones.
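"Encoding solved problems" can be as small as a check that rejects imports crossing an architectural boundary. A toy sketch of such a guardrail, assuming a hypothetical layering rule (core code must never import from the UI layer) - real tools like import-linter do this more robustly:

```python
import re

# Hypothetical layering rule: modules in "core" may not import from "ui".
FORBIDDEN = {"core": ["ui"]}

def violations(module_layer, source):
    """Return the import lines in `source` that break the layering rule."""
    bad = []
    for line in source.splitlines():
        m = re.match(r"\s*(?:from|import)\s+([\w.]+)", line)
        if m:
            top_package = m.group(1).split(".")[0]
            if top_package in FORBIDDEN.get(module_layer, []):
                bad.append(line.strip())
    return bad
```

Wired into a pre-commit hook, a check like this teaches the convention by refusing to let anyone violate it - which is exactly the onboarding-through-guardrails mechanism described above.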
The Boring Win
This is the Prettier pattern at scale. Prettier solved formatting wars by making them impossible. Everyone runs it, nobody argues about semicolons anymore.
The next version: AI agents that solve architecture wars. Not by being smarter about architecture, but by enforcing whatever patterns you’ve established. Standards move from implicit (“we all just write it this way”) to explicit (“the tool won’t let you write it differently”).
That’s not glamorous. It’s not “AI achieves AGI.” It’s boring infrastructure work that compounds over time. Which is exactly why it’s likely to happen.
Factory offers a free tier to experiment with Droid’s approach to enforcement-first AI coding. The interesting part isn’t whether it writes better code than Claude Code - it’s whether the enforcement layer changes your workflow.
The conversation with Luke from Factory AI goes deeper on monorepos, migrations, and where AI coding tools are headed.
The predictions aren’t radical. They’re logical extrapolations of what’s already working. Enforcement by default. Monorepos for CI confidence. Rebuilding over migrating. Affordable local models for the enforcement layer.
If Luke’s right, the next year isn’t about models getting smarter. It’s about the scaffolding around them doing more.


