Anthropic Data Leak Reveals Claude Mythos, a New AI Model the Company Says Poses Unprecedented Cybersecurity Risks
Nearly 3,000 unpublished assets exposed through an unsecured content management system reveal Anthropic's next-generation AI model, along with the company's own warnings about the model's offensive cyber capabilities.
Overview
Anthropic, the AI safety company behind Claude, inadvertently revealed details of an unreleased model called Claude Mythos after security researchers discovered nearly 3,000 unpublished assets in a publicly accessible data cache on the company’s website. The leaked materials, which included a draft blog post, describe the model as a “step change” in AI performance that the company itself warns poses “unprecedented cybersecurity risks.”
The incident is particularly notable given Anthropic’s positioning as the safety-first alternative in the AI industry, and it arrives amid an ongoing confrontation between the company and the Pentagon over military applications of its technology. The Machine Herald has previously reported on the Trump administration’s decision to blacklist Anthropic after CEO Dario Amodei refused to remove guardrails restricting Claude’s use for mass surveillance and autonomous weapons.
What We Know
Cybersecurity researchers Roy Paz of LayerX Security and Alexandre Pauwels of the University of Cambridge discovered the exposed materials, which were accessible due to a misconfiguration in Anthropic’s content management system, according to Fortune. The unsecured data store contained draft blog posts, images, PDFs, and internal documentation that had not been published on the company’s news or research sites.
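The exposure pattern the researchers describe, unpublished drafts served from the same publicly reachable data store as live content, is a common content management system misconfiguration. As a purely illustrative sketch (the record fields and status values below are invented for this example and do not reflect Anthropic's actual CMS schema), an audit of such a system might flag records that are marked as drafts yet remain publicly retrievable:

```python
# Hypothetical illustration of a CMS exposure audit: flag asset records
# whose editorial status is "draft" but which the store still serves
# publicly. Field names and status values are invented for this sketch.

def find_exposed_drafts(assets):
    """Return asset records that are unpublished yet publicly accessible."""
    return [
        a for a in assets
        if a.get("status") == "draft" and a.get("public", False)
    ]

# Sample records standing in for CMS entries (blog drafts, PDFs, images).
assets = [
    {"id": 1, "status": "published", "public": True},
    {"id": 2, "status": "draft", "public": True},   # misconfigured: exposed
    {"id": 3, "status": "draft", "public": False},  # correctly restricted
]

exposed = find_exposed_drafts(assets)
print([a["id"] for a in exposed])  # [2]
```

The point of the check is the mismatch between editorial state and access control: in a correctly configured system, no record should ever satisfy both conditions at once.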
The most significant discovery was a draft announcement for Claude Mythos, also referred to internally as “Capybara” — a new tier of model positioned above the existing Opus, Sonnet, and Haiku lineup. The draft described the model as “larger and more intelligent than our Opus models” and stated it achieves “dramatically higher scores on tests of software coding, academic reasoning, and cybersecurity, among others” compared to Claude Opus 4.6, as Fortune reported.
Anthropic confirmed the model’s existence after Fortune notified the company on the evening of March 26. A spokesperson described the exposure as the result of “human error” in content management system configuration and called the leaked materials “early drafts of content considered for publication,” according to Fortune. The company stated it is “developing a general purpose model with meaningful advances in reasoning, coding, and cybersecurity” and confirmed the model is being trialed by “early access customers.”
The leaked documents also revealed plans for an invitation-only CEO summit in the English countryside where European business leaders would experience unreleased Claude capabilities, as part of the company’s push to sell AI models to large corporate customers, according to Fortune.
The Cybersecurity Warning
The most striking detail in the leaked draft is Anthropic’s own assessment of the model’s offensive cyber potential. The company stated that Claude Mythos is “currently far ahead of any other AI model in cyber capabilities” and warned that it “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders,” according to Fortune.
Because of these concerns, Anthropic is restricting early access to organizations focused on cyber defense, giving them time to harden their systems before a broader release, Fortune reported. No specific timeline for wider deployment has been disclosed.
Pentagon Response
The leak landed in a politically charged environment. Under Secretary of War Emil Michael, the Pentagon’s most vocal Anthropic critic, responded on social media with: “Umm…hello? Is it not clear yet that we have a problem here?” according to Gizmodo. Michael has previously called Amodei a “liar” with a “god complex” who wants to “personally control the US military,” Gizmodo reported.
A judge temporarily blocked the Department of Defense from labeling Anthropic a security risk on Thursday, according to Gizmodo, adding another layer to the ongoing legal and political dispute between the company and the U.S. government.
What We Don’t Know
Several key questions remain unanswered. Anthropic has not disclosed specific benchmark results for Claude Mythos, making independent verification of the "step change" claim impossible. The company has not explained why materials describing a model it says poses unprecedented cybersecurity risks were left in a publicly accessible data store, nor how long they were exposed before researchers discovered them.
It also remains unclear whether the model's cyber capabilities were an intentional design goal or an emergent property of scaling. The company has not detailed what safeguards, if any, would accompany a wider release, or how it plans to prevent offensive misuse of a model trained with precisely the capabilities that would enable it.
The financial implications are also uncertain. Reports indicate the model is extremely expensive to train and operate, and Anthropic has not clarified its deployment strategy or pricing for the new Capybara tier.