Opus 4.5 scores 80.9% on SWE-bench Verified. The same model scores 45.89% on the contamination-free Pro split. OpenAI has quietly stopped reporting Verified at all. Vendor benchmark cards are marketing.
Anthropic markets MCP as the universal AI tooling standard, but a 200,000-server RCE class is 'expected behavior.' You can't be both.
Uncle Bob, one of TDD's best-known advocates, posted on X that TDD is 'very inefficient for AIs' and that the agent is best thought of as 'a highly focused idiot savant.' Testing didn't die. It got more important. And the review target flipped.
Traditional coders touched a file and tidied it. The Boy Scout Rule. Now nobody does. Agents add, they don't subtract, and the codebase accretes faster than ever. Here's a technique for putting cleanup back in as an explicit gate, not a virtue you hope for.
Vercel got breached through Context.ai, an AI tool an employee installed with OAuth scopes into Google Workspace. It's the latest in a pattern: Trivy into litellm, axios maintainer hijack, now this. The safest AI tool is the one you didn't install.
Your exec summaries, delivery plans, and Gantt charts belong in git. AI agents can synthesize planning docs from scattered sources and produce polished, print-ready briefs. The repo is the PM tool.
Anthropic accidentally published Claude Code's full source via npm. Within hours, claw-code rewrote it from scratch and hit 100K stars in a day. The interesting part isn't the leak - it's what the architecture reveals.
Enterprise architecture patterns were designed for a world where code was expensive to write and expensive to change. That world ended. The patterns didn't get the memo.
FameCake's AI journey: from 15 style transforms as the headline feature to content moderation and outpainting as the survivors. What five months taught us about AI in products.
Cloudflare rebuilt Next.js in a week with one engineer and 800 Claude sessions. The real story isn't the speed - it's what happens when test suites become machine-readable specs.
35% of enterprises have already replaced SaaS with custom builds. The cost of building collapsed. The cost of buying didn't. And corporate procurement hasn't caught up.
AI collapsed the cost of rebuilding. Corporate decision-makers haven't caught up. The reasoning behind 'but we already built it' no longer holds.
Tokens are nouns. Patterns are verbs. The missing layer is grammar: a shared vocabulary that spans Figma, web, and native without breaking when someone ships a 'small' change.
Two weeks ago we found TeammateTool hiding in Claude Code's binary. Now it's official. Here's what changed, what didn't, and what the docs reveal about where multi-agent is heading.
Salesforce quietly walked back autonomous AI agents to deterministic scripting. The pattern reveals when LLMs work - and when they don't.
Anthropic built a full multi-agent orchestration system into Claude Code. It's feature-flagged off. The community found it anyway.
Factory AI's Luke predicts the future isn't more powerful models - it's AI that enforces software engineering best practices by default. Here's why that matters more than you think.
Convex, Vite, Clerk, shadcn, Cloudflare, Resend. A modern stack where every component has a generous free tier, agents do the heavy lifting, and you don't touch infrastructure until you have paying customers.
From 20 lines of shell to production apps. Anthropic renamed Claude Code SDK to Agent SDK because deep research is now a first-class use case.
Claude Code loves to jump straight into implementation. Sometimes you need a model that thinks first. Here's how I use Codex for systems thinking and architecture decisions.
Converting text to images for 20x token compression. Interesting research or production-ready breakthrough? A critical look at the trade-offs.
How I built a self-improving document parser that learns from corrections without fine-tuning. The pragmatic alternative to model training.
Real-time AI generation vs curated libraries: lessons from building the same product twice with radically different architectures.
When expensive SSO was just a symptom of deeper architectural problems, we redesigned our multi-tenant system from first principles and cut costs significantly in the process.
Building a multi-stage AI content pipeline where each generation depends on the last. Lessons from generating thousands of hybrid creatures with resilient error handling.
Real lessons from shipping multiplayer games with Firebase: what works for small groups, where it breaks down, and the scalability limits you need to know upfront.
Flutter state management without the boilerplate using Hooks, RxDart, and Functional Widgets for reactive, testable code.
Part 2: Adding inter-service communication to .NET Core microservices using HttpClient and custom logging services.
Building lightweight microservices with .NET Core and Nancy framework before ASP.NET Core MVC reached maturity.
Enabling frontend developers to work on ASP.NET Core views without writing C# by building custom middleware for Razor-only routing.