- Anthropic will not publicly release its Claude Mythos Preview AI model, citing security risks.
- The model can identify and exploit software vulnerabilities with unprecedented accuracy.
- Access will be restricted to vetted researchers under controlled conditions for defensive study.
- The decision reflects Anthropic's safety-first principles and the dual-use dilemma of advanced AI.
## 📖 Full Retelling
Artificial intelligence company Anthropic announced on Tuesday that it will not release its new Claude Mythos Preview AI model to the public, citing the model's unprecedented capability to identify and exploit software vulnerabilities as a significant security risk. The announcement came from the company's San Francisco headquarters, following internal safety evaluations that revealed the model's potential for misuse in cyberattacks.
Anthropic, known for its focus on developing safe and controllable AI, stated that Claude Mythos Preview demonstrated a level of precision in vulnerability discovery and exploitation that far exceeds current public tools. During testing, the model autonomously analyzed codebases, identified zero-day vulnerabilities, and generated functional exploit code. This capability, while a technical breakthrough, raised immediate red flags within the company's safety and policy teams, who determined that the risks of public release outweighed any potential benefits for security research.
The company emphasized that this is a proactive, safety-first decision aligned with its constitutional AI principles, which prioritize preventing harm. Anthropic will instead restrict access to a small group of vetted security researchers under strictly controlled conditions for further study of defensive applications. The move highlights a growing industry dilemma: balancing AI advancement with security ethics as models become increasingly capable of dual-use tasks that can be weaponized.
Anthropic's stance contrasts with the industry practice of releasing powerful models broadly, setting a precedent for more cautious deployment of advanced AI in sensitive domains. The company plans to publish a detailed technical paper on the model's architecture and safety findings while keeping the code and weights private, aiming to contribute to academic understanding without enabling malicious use.
# Anthropic PBC
**Anthropic PBC** is an American artificial intelligence (AI) safety and research company headquartered in San Francisco, California. Established as a public-benefit corporation, the organization focuses on the development of frontier artificial intelligence systems.
Claude is a series of large language models developed by Anthropic. The first model was released in March 2023, and the latest, Claude Opus 4.6, in February 2026.