Project Glasswing puts AI to work on software bugs

OraCore Editors

April 9, 2026 · 8 min read

Anthropic’s Project Glasswing gives 40+ groups access to Claude Mythos Preview after it found thousands of zero-days across major systems.

Anthropic says its new Project Glasswing brings together 12 major partners, more than 40 additional organizations, and up to $100M in usage credits to hunt software flaws with AI. Those are big numbers, but the more interesting claim is this: the company says its unreleased Claude Mythos Preview model has already found thousands of high-severity vulnerabilities across major operating systems and browsers.

This is one of those announcements that sounds abstract until you read the examples. A 27-year-old bug in OpenBSD, a 16-year-old flaw in FFmpeg, and a multi-step Linux kernel escalation were all found by the model, then patched after disclosure. If those claims hold up under outside scrutiny, the message is simple: AI is now good enough to help break critical software at scale, which means defenders need to move faster than they ever have.

What Project Glasswing actually is

Glasswing is Anthropic’s attempt to turn frontier-model cyber skills into a defensive program instead of an arms race. The launch group includes Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.

Anthropic says these partners will use Mythos Preview in defensive security work and share what they learn. The company also opened access to more than 40 other organizations that build or maintain critical infrastructure software, so they can scan first-party code and open-source projects. The funding piece matters too: Anthropic is putting up to $100M in usage credits behind the effort, plus $4M in direct donations to open-source security groups.

12 launch partners joined on day one
40+ additional organizations got access for defensive work
Up to $100M in model usage credits is committed to the program
$4M goes directly to open-source security organizations

That mix tells you what Anthropic is betting on. This is not a demo or a one-off red-team stunt. It is an attempt to make AI part of the normal security workflow for companies that run operating systems, browsers, cloud infrastructure, and the open-source libraries behind them.

Why the timing matters

Anthropic’s core warning is that the cost of finding and exploiting software bugs has dropped. That matters because software bugs are everywhere, and the ones that survive for years are often the hardest to spot. If a model can inspect code, reason through edge cases, and generate exploit paths without much human steering, then defenders gain speed, but attackers do too.

The company puts the global annual cost of cybercrime at around $500B. Estimates like that are always messy, but the figure gives a sense of scale. A single flaw in a browser, kernel, or media library can cascade across millions of machines. In Anthropic’s telling, AI has moved from helping with code review to something closer to autonomous vulnerability research.

“The window between a vulnerability being discovered and being exploited by an adversary has collapsed—what once took months now happens in minutes with AI.” — Elia Zaitsev, Chief Technology Officer, CrowdStrike

That quote is useful because it captures the operational problem better than the marketing copy does. Security teams are no longer just racing against human attackers. They are racing against automated systems that can inspect huge codebases much faster than a human team can.

Anthropic also says Mythos Preview found vulnerabilities that had survived decades of human review and millions of automated tests. If that is accurate, it suggests current tooling misses entire classes of bugs, especially in mature codebases that everyone assumes are already well understood.

The numbers behind the claim

Anthropic shared a benchmark comparison that gives some shape to the performance gap. On CyberGym, a vulnerability reproduction benchmark, Mythos Preview scored 83.1%, while Claude Opus 4.6 scored 66.6%. That 16.5-point spread is large for a task where small gains can mean the difference between a missed bug and a patched one.
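To put the two CyberGym scores in perspective, here is a rough back-of-the-envelope calculation in Python. Only the two percentages come from Anthropic's announcement. The function name `compare_scores` and the choice of derived metrics (point gap, relative gain, and reduction in missed cases) are assumptions made for illustration, and they say nothing about how CyberGym itself is scored.

```python
def compare_scores(new_pct: float, old_pct: float) -> dict:
    """Compare two benchmark pass rates given as percentages (0-100).

    Hypothetical helper for illustration; not part of any benchmark tooling.
    """
    point_gap = new_pct - old_pct                   # absolute gap in percentage points
    relative_gain = point_gap / old_pct * 100       # gain relative to the older score
    old_miss = 100 - old_pct                        # share of tasks the older model missed
    new_miss = 100 - new_pct                        # share of tasks the newer model missed
    miss_reduction = (old_miss - new_miss) / old_miss * 100
    return {
        "point_gap": round(point_gap, 1),
        "relative_gain_pct": round(relative_gain, 1),
        "miss_reduction_pct": round(miss_reduction, 1),
    }


# Scores as reported: Mythos Preview 83.1%, Claude Opus 4.6 66.6%
result = compare_scores(83.1, 66.6)
print(result)
```

On these numbers the gap is 16.5 points, which works out to roughly a 25% relative gain and about half as many missed cases. The last figure is arguably the most telling one for defenders, since every missed case is a bug that stays in the code.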

The company also says the model identified thousands of zero-day vulnerabilities across every major operating system and every major web browser. It did not disclose all of them yet, but it did publish technical details for a subset that have already been fixed. In several cases, the model also generated related exploits without human direction.

CyberGym: 83.1% for Mythos Preview vs 66.6% for Opus 4.6
Thousands of zero-days were found across major operating systems and browsers
OpenBSD bug: 27 years old, could remotely crash a machine
FFmpeg bug: 16 years old, missed by tests run 5 million times
Linux kernel chain: moved from user access to full machine control

Those examples matter because they cover different kinds of software failure. OpenBSD shows that even security-focused systems can hide old bugs. FFmpeg shows that automated testing is still blind to some defects. The Linux kernel example shows how multiple small issues can combine into a serious compromise.

Anthropic says it reported the bugs to maintainers and that they are now patched. For other flaws, the company is only publishing cryptographic hashes for now, with details to follow after fixes land. That is the right call if the goal is defense first, because it reduces the chance that the same findings get reused before patches ship.
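Publishing a hash now and the details later is a commit-then-reveal pattern. The sketch below is a minimal Python illustration of that idea, not Anthropic's actual procedure. The source does not say which hash function or report format is used, so SHA-256, the random nonce, and the function names `commit` and `verify` are all assumptions.

```python
import hashlib
import hmac
import secrets


def commit(report: bytes, nonce: bytes) -> str:
    """Return a hex digest that commits to the report without revealing it."""
    # Assumption: SHA-256 over nonce + report. The nonce keeps short or
    # guessable reports from being recovered by brute force.
    return hashlib.sha256(nonce + report).hexdigest()


def verify(report: bytes, nonce: bytes, published_digest: str) -> bool:
    """Check that a later-revealed report matches the digest published earlier."""
    return hmac.compare_digest(commit(report, nonce), published_digest)


# Step 1 (before the patch ships): publish only the digest.
report = b"Hypothetical write-up of an unpatched vulnerability"
nonce = secrets.token_bytes(32)
digest = commit(report, nonce)

# Step 2 (after the patch ships): reveal the report and nonce so anyone can check.
print(verify(report, nonce, digest))             # matches the commitment
print(verify(b"edited report", nonce, digest))   # does not match
```

The point of the pattern is that the digest proves the finding existed at publication time without giving attackers anything to work from, and any later edit to the revealed report would fail verification.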

How this compares with the rest of the field

Glasswing also reveals how quickly the AI security market is splitting into two tracks: tools that help defenders write better code, and tools that can independently discover flaws. Anthropic is clearly aiming at the second track. That puts it in a different category from general coding assistants such as OpenAI’s Codex or GitHub Copilot, which are useful for productivity but are not being marketed as autonomous vulnerability hunters.

It also puts pressure on the security vendors in the room. CrowdStrike, Microsoft, and Palo Alto Networks are all part of the effort, which suggests the big players think the model is good enough to be worth testing in real workflows. That is a stronger signal than a lab benchmark alone.

Anthropic is pitching autonomous bug discovery, not just code suggestions
GitHub Copilot targets developer productivity, not vulnerability research
Microsoft says Mythos Preview improved on its CTI-REALM benchmark
The Linux Foundation is involved because open source carries much of modern infrastructure

Jim Zemlin of the Linux Foundation put the open-source angle plainly: “By giving the maintainers of these critical open source codebases access to a new generation of AI models that can proactively identify and fix vulnerabilities at scale, Project Glasswing offers a credible path to changing that equation.” That is the key practical point. Most critical software is maintained by small teams, often with limited time and budget, and AI can help fill that gap if the output is accurate enough.

What to watch next

Glasswing is important because it turns a scary capability into a coordinated defensive program. The real test is whether the disclosures lead to fewer shipped bugs, faster patch cycles, and better secure-by-default code in the systems we all depend on. The next few months should show whether the model’s findings are repeatable by outside teams and whether partners can fold the results into everyday security operations.

My guess: the biggest near-term impact will not be dramatic new attack chains. It will be a steady increase in patch volume for old, embarrassing bugs that have lived in mature code for years. If Anthropic’s numbers keep holding up, the question for every infrastructure team becomes simple: do you want to wait until AI finds your worst bugs first, or do you want the model pointed at your code before attackers get there?
