Anthropic’s Mythos AI: Why The Most Dangerous AI Model Was Never Released

Anthropic trained a model so capable of cybersecurity exploitation that they quietly shelved it. Here’s what we know about Mythos, Project Glasswing, and what this decision reveals about the future of AI development.

By ExpertHubAI Editorial · April 2025 · ~2,800 words · 11 min read · AI Safety · Cybersecurity AI

What happens when a company builds an AI so powerful it frightens its own creators?

That’s not a hypothetical question. In early 2025, Anthropic — the safety-focused AI company behind Claude — trained a model internally known as Mythos (also referred to as Claude Mythos Preview or, by its codename, Capybara). The model demonstrated unprecedented capability in one particular domain: identifying and exploiting zero-day cybersecurity vulnerabilities in real-world software systems.

And then Anthropic chose not to release it.

This article breaks down what Mythos AI is, what it can do, why Anthropic made the extraordinary decision to withhold it — and what this moment means for the broader AI industry.

Quick Facts: Mythos AI

  • Internal codename: Capybara; formal name: Claude Mythos Preview
  • Primary capability: Advanced cybersecurity — finding and exploiting software vulnerabilities
  • Anthropic’s decision: Trained but not released to the public
  • Alternative strategy: Limited access via Project Glasswing
  • Classification: Meets Anthropic’s internal “critical risk” threshold for cybersecurity

What Is Anthropic Mythos AI?

Mythos is a large language model (LLM) developed by Anthropic as part of its ongoing model research. Unlike Claude — the consumer-facing assistant that most people interact with — Mythos was never intended to be a general-purpose chatbot. It was built to push the boundaries of what AI can do in highly technical domains, specifically cybersecurity.

The model was reportedly trained on a large corpus of security research, vulnerability databases, software codebases, and exploit documentation. The result, according to internal assessments that have been reported in the AI research community, was a model with expert-level capability to reason about software vulnerabilities — not just theoretically, but in ways that could be operationally dangerous if misused.

To understand why this matters, consider that most current AI models (including Claude, GPT-4, and Gemini) already have some ability to discuss cybersecurity concepts and even write basic exploits. But Mythos reportedly crossed a qualitative threshold: the difference between explaining how a lock pick works and actually picking any lock put in front of you.

Anthropic has long held that some capabilities are too dangerous to release until safeguards exist that can reliably contain them, and for cybersecurity at this level, those safeguards don't yet exist. Mythos appears to be the clearest test of that belief to date.

— ExpertHubAI Analysis

What Can Mythos AI Do? The Three Core Capabilities

Based on what has been reported and leaked from internal discussions, Mythos demonstrated three distinct cybersecurity capabilities that placed it in a different category from existing AI models:

1. Zero-Day Vulnerability Discovery

Mythos could analyze software codebases and identify previously unknown security flaws — so-called “zero-day” vulnerabilities that haven’t been patched because no one knew they existed. This is significant: finding zero-days typically requires years of expert experience and is the domain of elite security researchers and nation-state hackers alike.
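
For contrast, the crudest form of automated flaw-finding is decades old and nothing like what Mythos reportedly does. The toy Python sketch below flags classically unsafe C library calls by simple pattern-matching; it illustrates the category of work (scanning code for known-dangerous constructs), while genuine zero-day discovery layers program analysis, fuzzing, and expert reasoning on top. Everything here is an illustration, not a description of Mythos.

    # Toy illustration only: real vulnerability discovery involves program
    # analysis, fuzzing, and symbolic execution, not keyword matching.
    import re
    import sys
    from pathlib import Path

    # Classically unsafe C calls and the standard advice about each.
    UNSAFE_CALLS = {
        "strcpy":  "no bounds check; prefer strncpy or strlcpy",
        "gets":    "unbounded read; removed in C11, use fgets",
        "sprintf": "no bounds check; prefer snprintf",
    }

    PATTERN = re.compile(r"\b(" + "|".join(UNSAFE_CALLS) + r")\s*\(")

    def scan(root: str) -> None:
        """Walk a source tree and report lines using known-unsafe calls."""
        for path in Path(root).rglob("*.c"):
            for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                match = PATTERN.search(line)
                if match:
                    call = match.group(1)
                    print(f"{path}:{lineno}: {call}() -- {UNSAFE_CALLS[call]}")

    if __name__ == "__main__":
        scan(sys.argv[1] if len(sys.argv) > 1 else ".")

The distance between this kind of keyword matching and a model that reasons about previously unknown flaws in arbitrary codebases is exactly the qualitative threshold described above.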

2. Automated Exploit Development

Beyond identifying vulnerabilities, Mythos could reportedly generate working exploit code tailored to specific vulnerabilities. This moves the model from advisory (explaining what’s wrong) to operational (giving an attacker a ready-to-use tool). This is the capability that most alarmed Anthropic’s safety team.

3. Multi-Step Attack Planning

Perhaps most concerning: Mythos showed the ability to reason across multiple steps of a cyberattack — from initial access to lateral movement to data exfiltration — in a way that simulates how a sophisticated human threat actor would operate. This kind of end-to-end attack reasoning is rare even among advanced AI systems.

⚡ Key Insight: The combination of these three capabilities — discovery, exploitation, and planning — is what separates Mythos from ordinary AI tools. Any one of them alone would be notable. All three together in a single model represents a genuine escalation in AI capability.
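
To picture what end-to-end attack reasoning means structurally, it helps to see how defenders already model an intrusion: as an ordered sequence of tactic stages. The sketch below is descriptive taxonomy only, loosely following the public MITRE ATT&CK categories; the stage names are the same ones this article uses, and there is no operational content in it.

    # Descriptive taxonomy only: these stage names come from public defensive
    # frameworks (loosely after MITRE ATT&CK) and carry no operational detail.
    from enum import Enum

    class Tactic(Enum):
        RECONNAISSANCE = "gathering information about the target"
        INITIAL_ACCESS = "gaining a first foothold"
        PRIVILEGE_ESCALATION = "obtaining higher-level permissions"
        LATERAL_MOVEMENT = "moving between systems inside the network"
        EXFILTRATION = "moving data out of the environment"

    # "Multi-step attack planning" means reasoning across this entire
    # sequence, rather than answering isolated questions about one stage.
    for step, tactic in enumerate(Tactic, 1):
        print(f"{step}. {tactic.name.replace('_', ' ').title()}: {tactic.value}")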

Why Did Anthropic Choose Not to Release Mythos?

Anthropic’s decision not to release Mythos wasn’t a surprise to those who follow the company closely. Anthropic operates one of the most rigorous AI safety evaluation frameworks in the industry, its published Responsible Scaling Policy (RSP). The RSP defines capability thresholds, known as AI Safety Levels (ASLs), that trigger mandatory safety responses.

According to reports, Mythos’s cybersecurity capabilities caused it to meet or approach Anthropic’s internal threshold for what they classify as “critical risk” in the cybersecurity domain, one of the catastrophic-risk categories the RSP tracks alongside CBRN (Chemical, Biological, Radiological, and Nuclear).
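
What does a capability threshold that triggers mandatory safety responses look like in practice? Below is a deliberately oversimplified, hypothetical sketch of the “policy as code” idea; the domain names, scores, and thresholds are invented for illustration and are not Anthropic’s actual ASL criteria.

    # Hypothetical sketch only: these names, scores, and thresholds are
    # invented for illustration and are not Anthropic's real ASL criteria.
    from dataclasses import dataclass

    @dataclass
    class EvalResult:
        domain: str   # e.g. "cybersecurity", "cbrn"
        score: float  # capability score from a red-team evaluation, 0.0-1.0

    # Invented thresholds: at or above these, release is blocked unless
    # matching safeguards already exist and have been verified.
    CRITICAL_THRESHOLDS = {"cybersecurity": 0.8, "cbrn": 0.7}

    def release_decision(results: list[EvalResult], safeguards_verified: bool) -> str:
        for r in results:
            threshold = CRITICAL_THRESHOLDS.get(r.domain)
            if threshold is not None and r.score >= threshold and not safeguards_verified:
                return f"WITHHOLD: {r.domain} score {r.score} meets critical threshold {threshold}"
        return "RELEASE: below critical thresholds, or safeguards verified"

    print(release_decision([EvalResult("cybersecurity", 0.92)], safeguards_verified=False))

The property this models is the one the RSP formalizes: the decision is made against pre-committed thresholds, not against the commercial appeal of the model.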

The Logic Behind the Decision

Anthropic’s core argument is straightforward: the potential for harm from releasing Mythos outweighs the benefits of making it publicly available. Specifically:

Asymmetric risk. While legitimate security researchers could use Mythos defensively — to find and patch vulnerabilities before attackers do — the same capability in the hands of malicious actors could enable attacks on critical infrastructure, financial systems, or government networks at a scale and speed not previously possible.

No adequate safeguards. Anthropic’s safety team determined that existing jailbreak mitigations and access controls were insufficient to reliably prevent misuse of a model this capable. The concern wasn’t just that someone might trick the model with a clever prompt; it was that even careful access controls can’t prevent every misuse scenario at scale.

Precedent-setting. Perhaps most importantly, Anthropic appears to have decided that withholding Mythos sends a signal: capability alone is not sufficient reason to release an AI model. Safety readiness must match capability. This is a meaningful departure from the default posture of most AI labs, which generally release models unless there is a specific, demonstrated harm.

⚡ Why This Matters: This is one of the first known cases where a major AI company trained a frontier model and then chose not to release it specifically due to safety concerns — not because the model failed, but because it succeeded too well.

Project Glasswing: The Alternative Strategy

Not releasing Mythos publicly doesn’t mean Anthropic is shelving the capability entirely. The company has reportedly established Project Glasswing — a controlled access program designed to make Mythos’s capabilities available in a highly restricted way.

Who Gets Access Under Project Glasswing?

Rather than open API access or a public release, Project Glasswing appears to operate as a vetting-based access program for specific institutional partners. Reports suggest access is being considered for:

Government and defense agencies — primarily for defensive cybersecurity operations, vulnerability assessment of national infrastructure, and intelligence applications. The U.S. government’s interest in AI-powered cybersecurity tools is well documented, and Anthropic reportedly views national security partnerships as a legitimate use case.

Elite security research firms — vetted penetration testing companies and security researchers who can use Mythos under contractual controls that restrict how outputs can be used or shared.

Critical infrastructure operators — utilities, financial institutions, and other operators of systems that represent high-value targets for state and non-state actors.

⚡ Project Glasswing is essentially Anthropic’s attempt to extract the defensive value of Mythos without the offensive risk — a walled garden rather than an open field.
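
Anthropic hasn’t disclosed how Glasswing access works mechanically. As a purely hypothetical illustration of the general pattern behind any vetting-based access program (an allowlist of contracted organizations plus an audit trail for every request), a minimal sketch might look like this; every name in it is invented.

    # Purely hypothetical: Anthropic has not described Glasswing's mechanics.
    # Generic pattern: allowlisted organizations, audited on every request.
    from datetime import datetime, timezone

    VETTED_ORGS = {"org-defense-agency", "org-pentest-firm", "org-grid-operator"}
    AUDIT_LOG: list[tuple[str, str, str]] = []

    def authorize(org_id: str, purpose: str) -> bool:
        """Grant access only to vetted organizations; log every attempt."""
        allowed = org_id in VETTED_ORGS
        stamp = datetime.now(timezone.utc).isoformat()
        AUDIT_LOG.append((stamp, org_id, f"{'GRANTED' if allowed else 'DENIED'}: {purpose}"))
        return allowed

    print(authorize("org-pentest-firm", "defensive vulnerability assessment"))  # True
    print(authorize("unknown-org", "unvetted request"))                         # False

Real programs add contractual controls, output restrictions, and human review on top, but the core idea is the same: access is an exception granted per organization, not a default.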

Does This Actually Work?

This is where the strategy gets complicated. Controlled access programs have a mixed track record. Information has a tendency to escape institutional boundaries, whether through insider leaks, contractual violations, or independent replication by other actors.

Critics have noted that if Mythos-level cybersecurity capability is achievable by Anthropic today, it is likely achievable by other labs, including those in countries with fewer safety constraints, within a relatively short timeframe. This raises the question of whether withholding the model is a durable safety measure or only a temporary delay.

Criticisms: Is This Really a Safety Decision — Or Marketing?

Not everyone in the AI community accepts Anthropic’s framing at face value. Several legitimate criticisms have been raised:

The “Safety Washing” Argument

Some observers have argued that announcing you’ve built a dangerous model you’re not releasing is itself a form of reputation management. It simultaneously signals capability leadership (“we built the most powerful cybersecurity AI”) and moral seriousness (“we’re responsible enough not to release it”). The cynical reading: this generates exactly the kind of press that makes Anthropic look both technically impressive and trustworthy — without the liability of an actual release.

The Distillation Problem

Even if Mythos itself isn’t released, knowledge about the model’s architecture, training approach, and capability profile still enters the public sphere through leaks, research publications, and the movement of employees between companies. Competitors can use this information to accelerate their own development of similar models. There’s a real argument that withholding the model while publicizing its existence achieves the worst of both worlds.

Who Decides What’s “Too Dangerous”?

Perhaps the most fundamental critique: Anthropic is making a unilateral decision about what AI capabilities are safe for the world to access, based on its own internal risk assessments, without a transparent public process or regulatory framework. Whether you trust that judgment depends heavily on how much you trust Anthropic’s team, values, and incentive structure — none of which are subject to external verification.

⚡ Fair question: If Anthropic’s safety team were wrong — if Mythos poses less risk than they believe, or if the defensive benefits outweigh the offensive ones — who would hold them accountable for the cost of non-release?

The Government and National Security Angle

Underlying the Mythos decision is a geopolitical dimension that rarely gets enough attention in mainstream AI coverage. Cyberattacks are now considered a primary tool of geopolitical conflict. Nation-states use offensive cyber operations to steal intellectual property, disrupt infrastructure, influence elections, and project power — all below the threshold of conventional military action.

An AI model capable of Mythos-level vulnerability discovery and exploitation would represent a significant force multiplier for any actor that possesses it. The concern at Anthropic and in national security circles isn’t primarily about individual hackers. It’s about what happens if a hostile state actor gets access to Mythos-class capabilities — or builds their own version first.

This is precisely why Project Glasswing prioritizes government and defense access. The argument is that Western governments and their allies need to be inside the capability frontier, even if the general public isn’t. Whether this framing is sound strategic thinking or rationalization for government partnership revenue is a debate that will continue.

What This Means for the Future of AI Development

Mythos and the decisions around it are a preview of harder conversations to come. As AI models become more capable across a wider range of domains, the question of which capabilities should be released — and to whom — will become more urgent and more contested.

The Capability-Release Gap Is Growing

For most of AI’s history, the question was simply “does this model work?” Now, leading labs are operating in a new paradigm: models that work are not automatically safe to release. The gap between what can be built and what should be released is widening — and there’s no consensus on how to manage it.

Regulation May Be Inevitable

If private companies are making unilateral decisions about which AI capabilities the world can access, governments will eventually demand a seat at the table. The EU AI Act, executive orders in the U.S., and emerging frameworks in India and elsewhere all reflect a growing political consensus that AI development is too consequential to be self-regulated by industry alone.

⚡ Bottom Line: Mythos may be remembered less for what it could do and more for what it triggered — a serious conversation about whether the AI industry’s self-governance model is adequate for the capabilities now being built.

Other Labs Are Watching

Anthropic’s decision sets an implicit standard that other AI labs will be measured against. If OpenAI, Google DeepMind, or Meta builds a model with comparable cybersecurity capability, the question “why didn’t you follow Anthropic’s lead?” becomes a legitimate one. Mythos has, intentionally or not, raised the floor for what responsible AI development looks like in the cybersecurity domain.

· · ·

Conclusion: The Model That Proved the Point

Anthropic’s Mythos AI is significant not because of what it will do — it may never see broad release — but because of what it reveals.

It reveals that frontier AI capabilities are advancing faster than safety frameworks designed to contain them. It reveals that the most safety-conscious AI labs are already confronting scenarios that most people haven’t begun to think about. And it reveals that the decisions being made in AI research labs today — quietly, often without public input — will shape the technological landscape for decades.

Whether Anthropic made the right call on Mythos is genuinely debatable. Reasonable people can disagree on whether withholding the model serves the public interest or merely delays an inevitable diffusion of capability. What’s not debatable is that this kind of decision — a major AI company choosing not to release a frontier model — is something we will see more of, not less.

For non-technical readers trying to understand AI: Mythos is your clearest signal yet that the AI safety debate is no longer theoretical. The models that test its boundaries are here. The question of what to do with them is now.

⚡ ExpertHubAI Verdict: Anthropic’s Mythos decision is the most consequential AI safety call of 2025. Right or wrong, it marks a turning point in how the industry thinks about the relationship between capability and responsibility. Watch this space — this story is far from over.

· · ·

Disclosure: This article is based on publicly available reporting, leaks, and analysis of Anthropic’s Mythos AI model. ExpertHubAI is an independent AI analysis publication. Some details about Mythos’s capabilities and Project Glasswing remain officially unconfirmed by Anthropic. We will update this piece as new information becomes available.

© 2026 ExpertHubAI  ·  experthubai.com  ·  AI Explained Simply for Everyone