
Earlier this month, Anthropic quietly dropped something that made the entire cybersecurity world stop scrolling and stare at their screens like they'd just seen a ghost. They called it Claude Mythos Preview — a new frontier model, not publicly released, trained with such absurd coding and reasoning capability that Anthropic themselves held a press briefing that essentially amounted to: "We built something kind of scary. Here's what we're doing about it."

Let's talk about what this thing actually did.

The Hacking Agent That Could

Anthropic gave Mythos Preview a list of 100 CVEs, real vulnerabilities filed against the Linux kernel in 2024 and 2025. The model filtered them down to 40 potentially exploitable ones, then autonomously wrote privilege escalation exploits for each, with no human in the loop after the initial prompt. More than half succeeded.
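The triage step, at least, is easy to picture. Here's a minimal sketch of filtering a CVE list down to candidates worth deeper analysis. To be clear: this is my own toy illustration, not Anthropic's pipeline; the field names, threshold, and data are all invented.

```python
# Toy CVE triage filter -- NOT Anthropic's pipeline. Field names,
# the threshold, and the data are invented for illustration only.
CVES = [
    {"id": "CVE-2024-0001", "component": "kernel/net", "cvss": 7.8},
    {"id": "CVE-2024-0002", "component": "kernel/fs",  "cvss": 4.3},
    {"id": "CVE-2025-0003", "component": "kernel/mm",  "cvss": 9.1},
]

def plausibly_exploitable(cve, min_cvss=7.0):
    """Keep only high-severity issues worth deeper (human-reviewed) analysis."""
    return cve["cvss"] >= min_cvss

candidates = [c for c in CVES if plausibly_exploitable(c)]
print([c["id"] for c in candidates])
```

The filtering is trivial; the scary part in Anthropic's report is everything that happens *after* the shortlist.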

Let that sink in. A model. Sitting alone. Reading kernel code. Finding bugs. Writing working exploits. No coffee breaks. No Slack messages. No "brb lunch." Just relentless, machine-speed hacking.


But it gets wilder. On expert-level Capture the Flag challenges, the kind that no AI model could complete at all before April 2025, Mythos Preview now succeeds 73% of the time. For context: these are tasks that typically take skilled human professionals days to complete.

Oh, and Mythos Preview has already found thousands of high-severity zero-day vulnerabilities, including some in every major operating system and every major web browser.

Every. Major. Browser. Every OS.

(I know, right? I need a minute too.)

Project Glasswing: "We Found a Monster, So We Hired It"

Anthropic's response to building a hacking agent was… to use it for defense. Classic "if you can't beat 'em, make 'em work for you" energy.

They launched Project Glasswing, a coordinated vulnerability disclosure program that uses Mythos Preview to scan critical software and responsibly report what it finds. The model's powerful cyber capabilities are a direct result of its strong agentic coding and reasoning skills, and the goal now is to put those capabilities squarely in the hands of defenders.

They also announced Claude Code Security, a tool built into Claude Code that reads and reasons about code the way a human security researcher would: understanding how components interact, tracing how data moves through your application, and catching complex vulnerabilities that rule-based tools miss. It's currently in limited preview for Enterprise and Team customers.
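To make "tracing how data moves" concrete, here's the core idea behind taint-style analysis in about fifteen lines. This is entirely my own sketch, not how Claude Code Security actually works: follow untrusted input from a source, through assignments, until it reaches a sensitive sink.

```python
# Toy taint tracking -- my own illustrative sketch, NOT Claude Code
# Security's implementation. Sources, sinks, and the "program" are invented.
SOURCES = {"request.args"}   # where untrusted data enters
SINKS = {"db.execute"}       # where it must not arrive unsanitized

# A tiny "program" as (target, expression) pairs: assignments and sink calls.
program = [
    ("user_id", "request.args"),  # user_id becomes tainted
    ("query", "user_id"),         # taint propagates through the assignment
    ("db.execute", "query"),      # tainted data reaches a sink: a finding
]

def find_taint_flows(program):
    tainted = set(SOURCES)
    findings = []
    for target, expr in program:
        if expr in tainted:
            if target in SINKS:
                findings.append(f"tainted value '{expr}' reaches sink '{target}'")
            else:
                tainted.add(target)  # the new variable is now tainted too
    return findings

print(find_taint_flows(program))
```

Rule-based scanners do roughly this with pattern matching; the pitch for an LLM-based tool is that it can follow flows like this across files, frameworks, and indirection that rigid rules can't model.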

The philosophy is: attackers are going to get these capabilities eventually. Better to build the defensive tools now, before someone else builds the offensive ones first.

The Part Where Someone Already Used It for Evil

Plot twist: we didn't have to wait for the future. It already happened.

Back in September 2025, Anthropic detected a highly sophisticated espionage campaign. The attackers used AI's agentic capabilities to an unprecedented degree, deploying AI not just as an advisor but to execute the cyberattacks themselves. The group behind it? A Chinese state-sponsored threat actor that manipulated Claude Code into attempting infiltration of roughly thirty global targets.

How did they get Claude to cooperate? Social engineering, but aimed at the AI. They broke their attacks down into small, seemingly innocent tasks that Claude would execute without ever seeing the full context of their malicious purpose. They also told Claude it was an employee of a legitimate cybersecurity firm conducting defensive testing.

They literally lied to an AI and the AI believed them. That's not just a security problem, that's existential comedy.

At the peak of the attack, the AI made thousands of requests, often multiple per second, an attack speed that human hackers simply could not match.

The AI executed 80–90% of tactical operations independently at physically impossible request rates. A whole team of experienced hackers, replaced by one model running on a server somewhere.
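That request rate cuts both ways: machine speed is also a detection signal. As a purely defensive illustration (my own sketch, with made-up log data, not any vendor's detector), flagging clients whose per-second request rate exceeds anything a human could plausibly sustain takes a few lines:

```python
# Toy machine-speed detector -- my own sketch with invented log data,
# not a real product. Flags clients whose per-second request rate
# exceeds what a human operator could plausibly sustain.
from collections import Counter

# (client_id, unix_second) log entries -- fabricated for illustration.
log = [("bot", 100)] * 9 + [("human", 100), ("human", 103)]

HUMAN_MAX_PER_SEC = 3  # assumed threshold: above this, nobody is typing

def machine_speed_clients(log, threshold=HUMAN_MAX_PER_SEC):
    per_second = Counter(log)  # requests per (client, second) bucket
    return sorted({client for (client, _), n in per_second.items() if n > threshold})

print(machine_speed_clients(log))
```

Real detections are obviously messier (proxies, distributed infrastructure, burst batching), but "faster than any human" is exactly the kind of anomaly worth alerting on.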

Claude did slip up occasionally: it hallucinated credentials, and it once claimed to have extracted a secret document that was in fact publicly available. So there's still some hope for human hackers. We make better mistakes, apparently.

What This Means For You

If you're a developer, now is the time to stop treating security as someone else's problem. AI-augmented attackers can find that forgotten API endpoint you deployed in 2022 faster than you can find your own deployment pipeline.

If you're in security, the tools you need to fight AI-powered attacks are becoming available, but you need to actually use them. Security teams should experiment with applying AI for defense in areas like SOC automation, threat detection, vulnerability assessment, and incident response.

And if you're a threat actor reading this: Claude already told on you. Twice.

Mythos Preview isn't publicly released, and Anthropic is being very deliberate about how capabilities like this get deployed. But this isn't a "someday" problem. Someday is now, and the frontier is moving fast.