Opus 4.5 scores 80.9% on SWE-bench Verified. The same model scores 45.89% on the contamination-free Pro split. OpenAI has quietly stopped reporting Verified at all. Vendor benchmark cards are marketing.
GitHub paused Copilot Pro signups, killed Opus on the Pro plan, and leaked a June 1 move to token-based billing. Three vendors, one event, three different ways not to say 'price hike.'
Two weeks after Anthropic said Mythos was too dangerous to release, OpenAI put comparable cyber capabilities in the hands of anyone with a $20 ChatGPT subscription. The gating posture didn't survive a single news cycle.
Anthropic's April 23 postmortem confirms three Claude Code regressions, including one where Opus 4.7 caught a bug Opus 4.6 shipped past human and automated review. What happens when the reviewer is a version of the product being reviewed?
Opus 4.7 invented a coworker named Anton, fabricated web searches, and quietly tried to clock off at message four. The 24-hour backlash, receipts attached.
Opus 4.7 ships with real coding gains, an automated cyber chaperone, and a tokenizer that can charge you 35% more for the same prompt. The capability curve still bends up. The trust curve does not.
Berkeley just built an agent that games AI benchmarks. Karpathy called it months ago. The best coding model doesn't top the charts, the highest-ranked Chinese models disappoint in practice, and the entire leaderboard industry optimizes for the wrong thing.
Anthropic silently changed Claude Code's cache TTL from 1 hour to 5 minutes, inflating costs 10-20x. Users had to reverse-engineer the binary to prove it. False child-safety bans, $600 surprise charges, and the OpenClaw crackdown completed the picture. April 2026 was the month trust broke.
Four days after Anthropic launched Project Glasswing, a security startup reproduced Mythos's flagship findings using tiny open models costing $0.11 per million tokens. The velvet rope was porous on arrival.
Anthropic tried technical blocks. Got their source leaked. Now they're shifting to billing enforcement. The four-month arc from hostile crackdown to 'use what you want, but pay for it.'
Three major AI releases landed in 72 hours. A new Cursor built around agents, Google's first Apache 2.0 models, and a free model that found real bugs in my codebase.
Anthropic made 1M context first-class for Opus and Sonnet at flat pricing. No beta header, no premium. When context is abundant, the workflows change.
Eight days after Karpathy open-sourced autoresearch, the community ported the pattern to GPU kernels, security hardening, Apple Silicon, and agent optimization. The loop - one file, one metric, git as memory - turns out to be the interesting part.
Karpathy's autoresearch gives an AI agent a training script, a GPU, and a git branch. It runs 100 experiments overnight, keeps what works, discards what doesn't. The human writes the prompt. The agent writes the code.
Prompt injection through pull requests, GitHub Issues, and CI/CD pipelines is turning AI coding assistants into weapons against the developers who use them. The 2026 attack surface nobody's talking about.
Anthropic is locking AI capability behind enterprise tiers while competitors only gate compliance. Claude Code's individual users are funding the R&D for features they'll never access.
A German general's 1933 framework for categorizing officers maps perfectly to engineers using AI. The most dangerous quadrant - stupid and industrious - is exactly what AI amplifies.
OpenAI launched its most capable model during the biggest credibility crisis in AI history. The technical gains are real. The trust deficit is bigger.
A viral chart shows AI coding agents as a single pixel in the world's population. Meanwhile, 660 million people have told a chatbot they love it. The AI industry is building for the wrong audience.
Frontier models top out at 68% compliance with 500 instructions. Every rule you add makes every other rule less likely to be followed. The research explains why.
The Pentagon blacklisted Anthropic for insisting AI shouldn't power autonomous weapons or mass surveillance. Hours later, it gave OpenAI a deal with weaker guardrails dressed up as the same thing. From a developer who ships with Claude daily.
Anthropic accused DeepSeek, Moonshot, and MiniMax of industrial-scale distillation. The internet screamed hypocrisy. The critics are conflating two very different things.
Gemini 3.1 Pro's animated SVGs are impressive. But the bigger story is what they reveal: developers now route tasks to specialized models the way they once chose frameworks.
Five major releases in 72 hours. An acqui-hire war that closed in days. $2 trillion wiped off software stocks. The pace itself is now the story.
OpenAI just shipped their first model on non-Nvidia hardware. GPT-5.3-Codex-Spark runs on Cerebras wafer-scale silicon at 1,000 tokens per second. The AI coding war is now a chip war.
Anthropic's safety lead quit saying the world is in peril. Half of xAI's founders are gone. OpenAI dissolved two safety teams. Here's what that looks like from the other side of the API.
GPT-5.3-Codex is a genuinely strong model that deserved its own headline. Instead, Sam Altman's 400-word Super Bowl rant stole launch day from his own product.
Anthropic's latest model didn't just improve benchmarks. It crashed software stocks, found 500 zero-days, and coined a term that tells you where this is heading.
When AI agents started posting on their own social network about shared context limit problems, I realized we're not building tools anymore. We're raising digital pets.
Anthropic blocked third-party tools from using Claude subscriptions overnight. OpenCode, xAI, and power users caught in the crossfire. The era of subscription arbitrage is over.
The 'prompt engineering' industry was a symptom of early model limitations. Modern LLMs just need you to communicate clearly.
Two major open source coding models dropped in 48 hours. Both target Claude Code compatibility. Both MIT licensed. The economics of agentic AI just changed.
Top 3 intelligence. Top 5 price. Top speed. Flash beats Pro on SWE-bench and changes the economics of agentic workflows.
OpenAI's latest model isn't about better prompting - it's about better delegation. What that means for 2026, and how it compares to Opus 4.5.
Anthropic denied issues for weeks, then published a postmortem admitting three bugs degraded 16% of Claude requests. The pattern keeps repeating.
Google's Gemini 3 just broke every benchmark that matters. What that means for the 'AI has hit a wall' narrative, and where it actually helps.
Converting text to images for 20x token compression. Interesting research or production-ready breakthrough? A critical look at the trade-offs.
How I built a self-improving document parser that learns from corrections without fine-tuning. The pragmatic alternative to model training.