
Anthropic Limits Claude Mythos Following 83% Exploit Success

Anthropic has restricted its new Claude Mythos model to select partners after pre-release testing revealed autonomous cyberattack capabilities.

Anthropic has restricted access to its new Claude Mythos Preview model after pre-release testing revealed autonomous vulnerability discovery and exploitation capabilities. Following a closed-door briefing to the House Homeland Security Committee on May 13, the U.S. administration is considering mandatory pre-release vetting for frontier AI models. For security researchers and enterprise defenders, the model’s capabilities represent a shift from static code analysis to automated threat discovery.

Autonomous Vulnerability Discovery

During pre-release evaluation, Mythos identified thousands of previously unknown vulnerabilities across Windows, macOS, Linux, and all major web browsers. The model demonstrated deep reasoning across legacy codebases, successfully uncovering a 27-year-old vulnerability in OpenBSD, an operating system architected specifically for security hardening.

The UK AI Safety Institute verified the model’s autonomous agency during a simulated 32-step corporate network attack. Mythos completed the intrusion chain in minutes, a process that typically requires roughly 20 hours for a skilled human red team. If you work in threat intelligence, this performance compresses the time-to-exploit window for novel vulnerabilities.

Exploit Generation and Evasion

Mythos extends beyond identification by actively writing and deploying functional exploits. In testing, the model bypassed standard defensive measures, including sandboxing and memory protection, to execute payloads.

Capability Metric              Mythos Preview Result
First-Attempt Exploit Success  >83%
Attack Chain Execution         32 steps (completed in minutes)
OS Coverage                    Windows, macOS, Linux, OpenBSD

Organizations that already use AI for code review will need to adjust their risk models to account for this capability. The ability to automatically string together complex exploit chains forces security operations to evaluate and test AI agents as a component of their adversarial simulation routines.

Project Glasswing Access and Pricing

Anthropic deemed the model too dangerous for general availability, instead restricting access to a consortium called Project Glasswing. The group consists of approximately 50 organizations, including AWS, Apple, Microsoft, Google, CrowdStrike, and Palo Alto Networks. The National Security Agency is currently utilizing the model to assess and patch vulnerabilities in government software.

Access to the API requires strict authorization and is priced at five times the rate of the previous Claude Opus 4.6 model. To incentivize defensive research, Anthropic committed $100 million in usage credits for Glasswing partners to discover and remediate bugs in foundational software. This deployment strategy mirrors other recent specialized models that have been restricted entirely to trusted defense partners.

The transition from static code scanning to autonomous exploit generation changes how security teams prioritize software updates. You should assume threat actors will eventually replicate these capabilities using open-source alternatives, making zero-day patching speed your primary operational metric.
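If patching speed becomes the primary operational metric, teams need a way to measure it. A minimal sketch of a mean-time-to-patch calculation over vulnerability records (the record fields and sample data here are hypothetical, for illustration only):

```python
from datetime import datetime

# Hypothetical records: when each vulnerability was disclosed vs. patched.
vulns = [
    {"id": "VULN-A", "disclosed": "2025-05-01", "patched": "2025-05-03"},
    {"id": "VULN-B", "disclosed": "2025-05-02", "patched": "2025-05-10"},
    {"id": "VULN-C", "disclosed": "2025-05-05", "patched": "2025-05-06"},
]

def mean_time_to_patch(records):
    """Average number of days between public disclosure and deployed patch."""
    deltas = [
        (datetime.fromisoformat(r["patched"])
         - datetime.fromisoformat(r["disclosed"])).days
        for r in records
    ]
    return sum(deltas) / len(deltas)

print(f"Mean time to patch: {mean_time_to_patch(vulns):.1f} days")
```

Tracking this number over time, and alerting when it drifts upward, is one simple way to operationalize the shift the article describes.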
