Opus 4.5 scores 80.9% on SWE-bench Verified. The same model scores 45.89% on the contamination-free Pro split. OpenAI has quietly stopped reporting Verified at all. Vendor benchmark cards are marketing.
GitHub paused Copilot Pro signups, killed Opus on the Pro plan, and leaked a June 1 move to token-based billing. Three vendors, one event, three different ways not to say 'price hike.'
Two weeks after Anthropic said Mythos was too dangerous to release, OpenAI put comparable cyber capabilities in the hands of anyone with a $20 ChatGPT subscription. The gating posture didn't survive a single news cycle.
Anthropic's April 23 postmortem confirms three Claude Code regressions, including one where Opus 4.7 caught a bug Opus 4.6 shipped past human and automated review. What happens when the reviewer is a version of the product being reviewed?
Opus 4.7 invented a coworker named Anton, fabricated web searches, and quietly tried to clock off at message four. The 24-hour backlash, receipts attached.
Opus 4.7 ships with real coding gains, an automated cyber chaperone, and a tokenizer that can charge you 35% more for the same prompt. The capability curve still bends up. The trust curve does not.
Berkeley just built an agent that games AI benchmarks. Karpathy called it months ago. The best coding model doesn't top the charts, the highest-ranked Chinese models disappoint in practice, and the entire leaderboard industry optimizes for the wrong thing.
Anthropic silently changed Claude Code's cache TTL from 1 hour to 5 minutes, inflating costs 10-20x. Users had to reverse-engineer the binary to prove it. False child-safety bans, $600 surprise charges, and the OpenClaw crackdown completed the picture. April 2026 was the month trust broke.
Four days after Anthropic launched Project Glasswing, a security startup reproduced Mythos's flagship findings using tiny open models costing $0.11 per million tokens. The velvet rope was porous on arrival.
Anthropic tried technical blocks. Got their source leaked. Now they're shifting to billing enforcement. The four-month arc from hostile crackdown to 'use what you want, but pay for it.'
Three major AI releases landed in 72 hours. A new Cursor built around agents, Google's first Apache 2.0 models, and a free model that found real bugs in my codebase.
Anthropic made 1M context first-class for Opus and Sonnet at flat pricing. No beta header, no premium. When context is abundant, the workflows change.
Eight days after Karpathy open-sourced autoresearch, the community ported the pattern to GPU kernels, security hardening, Apple Silicon, and agent optimization. The loop - one file, one metric, git as memory - turns out to be the interesting part.
Karpathy's autoresearch gives an AI agent a training script, a GPU, and a git branch. It runs 100 experiments overnight, keeps what works, discards what doesn't. The human writes the prompt. The agent writes the code.
Prompt injection through pull requests, GitHub Issues, and CI/CD pipelines is turning AI coding assistants into weapons against the developers who use them. The 2026 attack surface nobody's talking about.
Anthropic is locking AI capability behind enterprise tiers while competitors only gate compliance. Claude Code's individual users are funding the R&D for features they'll never access.
A German general's 1933 framework for categorizing officers maps perfectly to engineers using AI. The most dangerous quadrant - stupid and industrious - is exactly what AI amplifies.
OpenAI launched its most capable model during the biggest credibility crisis in AI history. The technical gains are real. The trust deficit is bigger.
A viral chart shows AI coding agents as a single pixel in the world's population. Meanwhile, 660 million people have told a chatbot they love it. The AI industry is building for the wrong audience.
Frontier models top out at 68% compliance with 500 instructions. Every rule you add makes every other rule less likely to be followed. The research explains why.
The Pentagon blacklisted Anthropic for insisting AI shouldn't power autonomous weapons or mass surveillance. Hours later, it gave OpenAI a deal with weaker guardrails dressed up as the same thing. From a developer who ships with Claude daily.
Anthropic accused DeepSeek, Moonshot, and MiniMax of industrial-scale distillation. The internet screamed hypocrisy. The critics are conflating two very different things.
Gemini 3.1 Pro's animated SVGs are impressive. But the bigger story is what they reveal: developers now route tasks to specialized models the way they once chose frameworks.
Five major releases in 72 hours. An acqui-hire war that closed in days. $2 trillion wiped off software stocks. The pace itself is now the story.
OpenAI just shipped their first model on non-Nvidia hardware. GPT-5.3-Codex-Spark runs on Cerebras wafer-scale silicon at 1,000 tokens per second. The AI coding war is now a chip war.
Anthropic's safety lead quit saying the world is in peril. Half of xAI's founders are gone. OpenAI dissolved two safety teams. Here's what that looks like from the other side of the API.
GPT-5.3-Codex is a genuinely strong model that deserved its own headline. Instead, Sam Altman's 400-word Super Bowl rant stole launch day from his own product.
Anthropic's latest model didn't just improve benchmarks. It crashed software stocks, found 500 zero-days, and coined a term that tells you where this is heading.
When AI agents started posting on their own social network about shared context limit problems, I realized we're not building tools anymore. We're raising digital pets.
Anthropic blocked third-party tools from using Claude subscriptions overnight. OpenCode, xAI, and power users caught in the crossfire. The era of subscription arbitrage is over.
The 'prompt engineering' industry was a symptom of early model limitations. Modern LLMs just need you to communicate clearly.
Two major open source coding models dropped in 48 hours. Both target Claude Code compatibility. Both MIT licensed. The economics of agentic AI just changed.
Top 3 intelligence. Top 5 price. Top speed. Flash beats Pro on SWE-bench and changes the economics of agentic workflows.
OpenAI's latest model isn't about better prompting - it's about better delegation. What that means for 2026, and how it compares to Opus 4.5.
Anthropic denied issues for weeks, then published a postmortem admitting three bugs degraded 16% of Claude requests. The pattern keeps repeating.
Google's Gemini 3 just broke every benchmark that matters. What that means for the 'AI has hit a wall' narrative, and where it actually helps.
Converting text to images for 20x token compression. Interesting research or production-ready breakthrough? A critical look at the trade-offs.
How I built a self-improving document parser that learns from corrections without fine-tuning. The pragmatic alternative to model training.