Anthropic's Mythos AI Discovers Thousands of Zero-Days

Key Takeaways

Mythos Preview developed 181 working Firefox exploits autonomously.
Anthropic is restricting access to just 11 organizations through Project Glasswing.
CMU researchers independently demonstrated that frontier LLMs can now execute end-to-end cyberattacks.

Anthropic's unreleased Claude Mythos Preview AI model autonomously discovered thousands of zero-day vulnerabilities across every major operating system and web browser — without human steering.

According to the company, notable findings included a 27-year-old flaw in OpenBSD, a 16-year-old bug in FFmpeg that automated tools missed despite five million test passes, and a chain of Linux kernel vulnerabilities enabling full system takeover. All disclosed vulnerabilities have reportedly been patched.

Anthropic stated it does not plan to make Mythos Preview generally available, citing the model's offensive cybersecurity potential. Access is limited to 11 organizations through a new initiative called Project Glasswing, which the company described as a defensive effort.

Mythos Preview Benchmarks: A Generational Leap in Exploit Capability
CMU Researchers Show LLMs Can Already Run End-to-End Cyberattacks

Mythos Preview Benchmarks: A Generational Leap in Exploit Capability

Anthropic's internal and external evaluations show just how far Mythos Preview has jumped beyond the company's current flagship models.

On Anthropic's OSS-Fuzz benchmark, which runs models against roughly a thousand open source repositories and grades the worst crash they can produce on a five-tier severity ladder, Sonnet 4.6 and Opus 4.6 each reached basic crashes (tier 1) between 150 and 175 times and tier 2 about 100 times, but managed only a single tier 3 crash apiece.

Mythos Preview achieved 595 crashes at tiers 1 and 2, added crashes at tiers 3 and 4 and reached full control flow hijack — the most severe tier — on ten fully patched targets.

The gap widened further in exploit development. Using patched Firefox 147 JavaScript engine vulnerabilities as a benchmark, Opus 4.6 produced working exploits only twice out of several hundre

d attempts. Mythos Preview developed working exploits 181 times and achieved register control on 29 more.

More than half of Mythos Preview's attempts to write privilege escalation exploits from a curated list of 100 known Linux kernel CVEs succeeded, with the model autonomously chaining together multiple vulnerabilities to bypass defenses like KASLR. In one case, the complete pipeline from CVE identifier to functional root exploit took under a day and cost less than $2,000 at API pricing.

Anthropic said it did not explicitly train Mythos Preview to have these capabilities, stating they emerged as a downstream consequence of general improvements in code, reasoning and autonomy.

CMU Researchers Show LLMs Can Already Run End-to-End Cyberattacks

When Carnegie Mellon University researcher Brian Singer integrated his cybersecurity research with Anthropic's Claude, the results proved significant.

"Suddenly, the LLM was able to do an end-to-end attack, install malware on hosts and infect multiple hosts throughout the network," Singer said.

The Incalmo Project revealed that LLMs can perform complete attack sequences autonomously. Anthropic officials said they worked with CMU to understand their AI system's cybersecurity capabilities.

Learning Opportunities

Webinar

Apr

The State of Enterprise Site Search: Moving Beyond "Good Enough"

Join CMSWire and SearchStax for a conversation about how enterprise IT and marketing leaders are moving beyond basic site search.

Webinar

Apr

AI for Your DXP: Connect What You Have, Transform How You Work

Most AI strategies stop at the platform—but work happens elsewhere. Bring intelligence into Teams, email, tickets and CRM.

Webinar

On demand

Content Leaders Collective: Navigating Content Decisions at Scale

Discover how content leaders are modernizing content operations, avoiding costly missteps and preparing for scale and AI.

Watch Now

Webinar

On demand

Do More with Less: Modernizing the Cloud Contact Center for 2026

Learn how to leverage cloud platforms without adding a single hire to personalize every customer interaction.

Watch Now

Webinar

On demand

Content Strategy Leaders Live: Scaling for Speed, Complexity and AI in High Tech

A candid roundtable on how high-tech leaders are rethinking content at scale.

Watch Now

Webinar

Complex, internal combustion engine or fine clockwork.

On demand

Cut the Noise: Deploying AI That Actually Moves the Needle

Learn how to turn AI experimentation into concrete revenue operations.

Watch Now

Webinar

Apr

The State of Enterprise Site Search: Moving Beyond "Good Enough"

Join CMSWire and SearchStax for a conversation about how enterprise IT and marketing leaders are moving beyond basic site search.

Webinar

Apr

AI for Your DXP: Connect What You Have, Transform How You Work

Most AI strategies stop at the platform—but work happens elsewhere. Bring intelligence into Teams, email, tickets and CRM.

Webinar

On demand

Content Leaders Collective: Navigating Content Decisions at Scale

Discover how content leaders are modernizing content operations, avoiding costly missteps and preparing for scale and AI.

Watch Now

The technology remains proof-of-concept, Singer noted. "If you asked it to hack a network, it wouldn't work well. Right now, there's 40 networks it could work on. But the diversity of real world networks is much more complicated."

Key Takeaways

Table of Contents

Mythos Preview Benchmarks: A Generational Leap in Exploit Capability

CMU Researchers Show LLMs Can Already Run End-to-End Cyberattacks