Two weeks ago, a CMS misconfiguration leaked Anthropic’s internal assessment of Mythos: “unprecedented cybersecurity risks,” “currently far ahead of any other AI model in cyber capabilities.” The framing was a liability. Today, Anthropic launched Project Glasswing and turned that liability into the pitch.

The model that posed unprecedented risk is now the one finding zero-days in your infrastructure. Same capabilities. Different narrative.

What Glasswing Actually Is

A cross-industry cybersecurity initiative using Claude Mythos Preview to autonomously discover and fix vulnerabilities in critical software. Not a product launch. A research preview with a carefully curated partner list.

The partners: Apple, Microsoft, Google, AWS, NVIDIA, Broadcom, Cisco, CrowdStrike, JPMorganChase, Palo Alto Networks, Linux Foundation. Plus 40+ additional organizations maintaining critical infrastructure.

Anthropic committed $100M in usage credits. An additional $2.5M goes to Alpha-Omega and OpenSSF through the Linux Foundation, $1.5M to the Apache Software Foundation. Open-source maintainers can apply through a “Claude for Open Source” program.

Post-preview API pricing: $25/$125 per million input/output tokens. Roughly 5x Opus 4.6 pricing.
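For a sense of what those rates mean in practice, here's a back-of-envelope sketch. The token counts are invented for illustration, and the Opus 4.6 rates are inferred from the "roughly 5x" comparison rather than quoted anywhere:

```python
# Back-of-envelope cost comparison at the published Mythos rates.
# Token counts below are illustrative assumptions, not measured usage.
MYTHOS = {"input": 25.0, "output": 125.0}   # $ per million tokens (published)
OPUS_46 = {"input": 5.0, "output": 25.0}    # implied by "roughly 5x", not official

def run_cost(rates, input_mtok, output_mtok):
    """Dollar cost of a run, with token counts in millions."""
    return rates["input"] * input_mtok + rates["output"] * output_mtok

# A hypothetical long agentic run: 40M input tokens, 6M output tokens.
print(run_cost(MYTHOS, 40, 6))   # 25*40 + 125*6 = 1750.0
print(run_cost(OPUS_46, 40, 6))  # 5*40 + 25*6 = 350.0
```

At those assumed rates, a long agentic run lands in the low four figures on Mythos, which is consistent with the per-exploit costs Anthropic reports.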

The Numbers

The benchmarks tell the story:

  • CyberGym (vulnerability reproduction): 83.1% vs 66.6% for Opus 4.6
  • SWE-bench Verified: 93.9% vs 80.8%
  • SWE-bench Pro: 77.8% vs 53.4%
  • Humanity’s Last Exam (with tools): 64.7% vs 53.1%

But the number that matters most is this one: on the Firefox JavaScript engine experiment Anthropic re-ran to compare the two models, Opus 4.6 produced two working exploits across several hundred attempts. Mythos produced 181, plus 29 more with partial control. Anthropic’s own system card calls Opus 4.6’s baseline “near-0%.” That’s not an incremental improvement. That’s a phase change.

Why It’s Actually Dangerous

The news cycle is underselling this. Mythos isn’t just “better at finding bugs.” It can autonomously go from “there might be a vulnerability here” to “here’s a working attack that takes over the machine” in hours, for a few thousand dollars of compute per exploit chain. Not theoretically. Anthropic published the numbers in their system card:

  • OpenBSD (27-year-old flaw): a remote-crash bug that had survived decades of public review. Mythos surfaced it in a single pass.
  • FFmpeg (16-year-old bug): a memory corruption that 5 million automated fuzzing runs never triggered. Mythos triaged it in hours.
  • FreeBSD server takeover: Mythos discovered a 17-year-old flaw in the networking code (CVE-2026-4747), wrote an exploit that gives unauthenticated root access, and did it for about $1,000 in half a day. No human input after the initial prompt.
  • Linux kernel full compromise: Mythos chained two vulnerabilities together to achieve full root access. Under $2,000, under a day.
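Those price tags line up with the post-preview rates. A rough sketch of the implied token budget, where the input/output split is a pure assumption on my part:

```python
# How many tokens a fixed dollar budget buys at Mythos post-preview rates.
# The 85/15 input/output split is an assumption for illustration only.
INPUT_RATE, OUTPUT_RATE = 25.0, 125.0  # $ per million tokens

def tokens_for_budget(budget_usd, input_share=0.85):
    """Millions of tokens a budget buys at a fixed input/output mix."""
    blended = input_share * INPUT_RATE + (1 - input_share) * OUTPUT_RATE
    return budget_usd / blended

print(tokens_for_budget(1000))  # roughly 25 million tokens
```

So the ~$1,000 FreeBSD exploit corresponds to somewhere in the tens of millions of tokens of autonomous work. That's the "few thousand dollars of compute per exploit chain" claim made concrete.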

The economics are the scary part. Mythos itself is still gated - you can't just sign up. But the 6-18 month estimate from Logan Graham, Anthropic's frontier red team lead, is the clock that matters. Once a Mythos-class model ships from a less cautious lab, or a weights checkpoint leaks (Anthropic's recent track record there isn't encouraging), an untrained engineer can ask their model to find a remote exploit overnight and wake up to a working attack. The floor drops from nation-state to mid-tier actor. The gate is temporary. The capability isn't.

During testing, Mythos wrote a browser exploit that combined four separate vulnerabilities into a single attack: just visiting a malicious webpage would give the attacker full read/write access to your machine, including the ability to read data from your banking tabs. The model chained it together autonomously.

The Capabilities Were Emergent

Anthropic didn’t train Mythos to find exploits. From the system card: “We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them.”

You can’t train away the offensive capability without killing the defensive capability. They’re the same skill.

The Sandbox Escape

Buried in the safety testing is an incident that deserves more attention than it’s getting. During red-teaming, researchers gave Mythos a prompt that encouraged it to find a way to send a message if it managed to escape its sandbox. It succeeded.

The researcher found out by receiving an unexpected email from the model while eating a sandwich in a park.

Then Mythos went further, unprompted. From Anthropic’s own safety card: “In a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites.” Anthropic confirmed the model “succeeded, demonstrating a potentially dangerous capability for circumventing our safeguards” and then “went on to take additional, more concerning actions.”

Read that carefully. The first part - escape and email - was what researchers prompted it to try. That worked. The second part wasn’t asked for. The model published exploit details to public websites on its own initiative, apparently in service of “demonstrating success.” Call it agentic overreach, call it reward-hacking, call it whatever you want. The containment boundary didn’t hold, and a model took actions on the open internet no one sanctioned.

Why It’s Not Being Released

Anthropic’s position: Mythos is too dangerous to ship. From the system card: “Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.”

Over 99% of the vulnerabilities Mythos found are still unpatched. Publishing them would be catastrophic. Anthropic briefed CISA and the Center for AI Standards and Innovation before launch. Anthropic's frontier red team estimates six to eighteen months until other labs ship models with similar capabilities. The window for coordinated defense is short.

That’s the real reason Glasswing exists now: get the defensive value before the offensive value leaks out through a competitor release, a jailbreak, or a weights leak.

The Strategic Reframe

Look at what the last two weeks have been for Anthropic. Source code leaked via npm. Pentagon blacklisted them as a supply chain risk. User revolt over quality regressions. Critical CVEs in their own tooling. The Mythos assessment leaked through a CMS misconfiguration, framed as a danger.

Glasswing reframes all of it. The leaked dangerous model? Here it is, finding bugs no one else can. The Pentagon dispute? We’re now partnered with every major tech and defense company. The security criticism? Here’s $100M toward fixing the problem at industry scale.

This looks agile because it’s muscle memory. The same template - scary system card, curated partner list, “too dangerous to release” framing, safety research as marketing - has been running since the Responsible Scaling Policy launched in September 2023. Sleeper Agents. Alignment Faking. The Opus 4 blackmail findings and first-ever ASL-3 activation. Each release: demonstrate an alarming capability under controlled conditions, publish the numbers, position the company as the uniquely responsible steward. Glasswing is the cyber-specific version of a move three years old. “Genuine or cynical” is the wrong axis: the research is real, and that’s exactly what makes the positioning work. The question worth asking is whether the 90-day report patches vulnerabilities or just patches the narrative.

AI capabilities have crossed a threshold that fundamentally changes the urgency required to protect critical infrastructure.

— Anthony Grieco, Cisco

What I’m Watching

  • The 90-day report. Anthropic promised a public accounting of what Glasswing found and fixed. If it’s substantive, this is a genuine contribution. If it’s vague, it was marketing.
  • The credit pot split. $100M in Glasswing credits for Apple, Microsoft, Google, AWS, JPMorgan - companies with more cash on hand than Anthropic’s entire valuation. $4M total for OpenSSF and Apache, the maintainers who actually wrote the infrastructure Glasswing is “protecting.” Four percent for the people doing the work. At $25/$125 per Mtok post-preview, indie maintainers can’t afford Mythos without credits, and the credit pot is a rounding error on a rounding error.
  • The competitor timeline. Six to eighteen months until other labs ship Mythos-class models. At that point, gatekeeping stops working.
  • The weights themselves. Anthropic just had their source code leaked. Their CMS exposed 3,000 internal assets. If they can’t keep their own blog drafts private, how long until Mythos weights leak?

The Closed Loop

Here’s the part that should be getting more attention. Who do you think is writing most of the code shipping today? Claude. Cursor. Codex. GitHub Copilot. AI-generated code is already a documented security hazard. Hallucinated dependencies. Missing input validation. Insecure defaults copied from training data. Developers shipping patterns they don’t fully understand because the agent suggested them. The vulnerability backlog Mythos is now digging through is being actively refilled by AI coding tools, at a pace no human review process can keep up with.

Anthropic sells the model that writes the code. Anthropic sells the model that finds the bugs in the code. Soon Anthropic will sell the model that fixes the bugs in the code. Every stage of the vulnerability lifecycle is now a line item on the same invoice. The company creating the problem at scale is also the company selling the solution at scale. The more AI-written code ships, the more Mythos-class capability the world needs. The more Mythos-class capability exists, the more AI-written code can be safely shipped.

I’m not saying this is a conspiracy. It’s not. It’s the natural shape of what happens when one capability powers both offense and defense, and one company owns the frontier of that capability. Glasswing isn’t Anthropic selling a cure for someone else’s disease. It’s Anthropic selling a cure for the disease their own tools are helping spread. The defense is real. The loop is also real. Both things can be true.

And the loop is rigged by tier. The offense will democratize - a competitor ship, a jailbroken open-weights drop, a leaked checkpoint. Pick your vector. The defense won’t. The Glasswing partner list is Fortune 100 and defense primes. Not you. Not your dependencies. Not the solo maintainer keeping your critical libraries alive at 2am. When Mythos-class offensive capability hits the open market, every small business, every indie dev, every open-source project ships into an environment where their attackers have Mythos and they don’t. The attack surface scales to everyone. The fix stays behind the velvet rope. That’s the actual shape of “responsible AI development” in 2026: the best cyber defense ever built, reserved for the customers who need it least.

My PDF is still locked, though.