Earlier this month, Anthropic launched Claude Mythos, restricting access to a select group of tech firms and warning that the model posed significant public risks. The move prompted Treasury Secretary Scott Bessent and Fed Chair Jerome Powell to urgently consult with Wall Street executives, reigniting the security-community concerns dubbed the ‘vulnpocalypse’.
However, recent findings by researchers have added complexity to this narrative.
A team from Vidoc Security successfully replicated Anthropic’s Mythos results using the publicly available models GPT-5.4 and Claude Opus 4.6 inside an open-source coding platform named opencode—without any exclusive access or internal resources from Anthropic. “We replicated Mythos findings in opencode with public models, not relying on Anthropic’s private setup,” stated Dawid Moczadło, a researcher involved in the study, on X after the results were released. He argued that Anthropic’s Mythos announcement should be read as a signal of the changing economics of vulnerability discovery, not as evidence of a uniquely capable model.
The cases examined by Vidoc mirrored those publicized by Anthropic: vulnerabilities within server file-sharing protocols, networking stacks on security-focused operating systems, embedded video-processing software across media platforms, and two cryptographic libraries crucial for digital identity verification online. Both AI models reproduced two of the bug scenarios in every trial. Claude Opus 4.6 independently rediscovered an OpenBSD bug in three separate runs, while GPT-5.4 never found it. Other bugs were only partially identified: the models pinpointed the relevant code but missed the exact root cause.
Each per-file analysis cost under $30, letting the researchers uncover vulnerabilities comparable to those found by Anthropic at minimal expense.
“AI models are already competent in narrowing down the search area and identifying substantial leads, sometimes even determining the complete root cause within established code,” Moczadło added on X.
Their method mirrored what Anthropic publicly described: feed a model a codebase, let it explore, parallelize attempts, and filter the results. Vidoc replicated this architecture with open-source tools: a planning agent divided files into segments, a detection agent assessed each segment, and findings were then validated against the rest of the repository.
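The pipeline described above (a planner that splits files, parallel detection agents, then a filtering pass) can be sketched roughly as follows. Every name here, including the stubbed `call_model`, the segment sizes, and the deduplication rule, is an illustrative assumption rather than Vidoc’s or Anthropic’s actual code; in practice the stub would be a real GPT-5.4 or Claude Opus 4.6 request issued through opencode.

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> list[str]:
    # Placeholder for a real LLM call. For this sketch it just flags
    # segments containing a classically unsafe C call.
    return ["possible buffer overflow"] if "strcpy" in prompt else []

def plan_segments(source: str, max_lines: int = 40) -> list[str]:
    # "Planning agent": split a file into overlapping line-based segments
    # so each model call sees a bounded amount of code.
    lines = source.splitlines()
    step = max_lines // 2
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, max(len(lines), 1), step)]

def detect(segment: str) -> list[tuple[str, str]]:
    # "Detection agent": ask the model about one segment, pair each
    # lead with the code it came from.
    return [(segment, lead) for lead in call_model(segment)]

def scan_file(source: str) -> list[tuple[str, str]]:
    # Parallelize detection across segments, then filter duplicate leads
    # before any deeper cross-repository validation.
    segments = plan_segments(source)
    with ThreadPoolExecutor(max_workers=4) as pool:
        raw = [f for found in pool.map(detect, segments) for f in found]
    seen: set[str] = set()
    findings = []
    for seg, lead in raw:
        if lead not in seen:  # crude filtering/dedup stage
            seen.add(lead)
            findings.append((seg, lead))
    return findings
```

The overlapping segments are a design choice to avoid a bug straddling a hard chunk boundary; the real systems presumably use far more sophisticated planning, validation, and ranking stages than this dedup pass.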
The study does not claim public models match Mythos in every respect. Anthropic’s model, for instance, not only detected a FreeBSD bug but also produced a blueprint for exploiting it remotely by chaining code fragments; Vidoc’s models identified the flaw without constructing such an attack plan. The remaining gap lies in knowing how to exploit the vulnerabilities, not in finding them.
Moczadło emphasized that while public models are less powerful, the early stages of vulnerability discovery have become cheap to run; validation remains the hard part. Anthropic’s safety report acknowledged that the Cybench benchmarks no longer reflect current AI capabilities after Mythos cleared them, and the lab predicted that other labs could reach similar capabilities within six to 18 months.
The Vidoc study suggests the discovery aspect is already accessible outside exclusive programs, with detailed findings and methodologies available on their official site.