Researchers from a U.K. government agency have discovered that OpenAI’s latest artificial intelligence model can autonomously execute sophisticated cyberattacks. In a notable achievement, it completed a reverse-engineering challenge in just over 10 minutes—a task that took an expert human approximately 12 hours.
The AI Security Institute (AISI), part of Britain’s Department for Science, Innovation and Technology, released findings Thursday identifying GPT-5.5 as one of the most formidable models it has assessed for offensive cyber capabilities, on par with Anthropic’s highly regarded Claude Mythos. According to the report, GPT-5.5 autonomously completed AISI’s challenging 32-step corporate network attack simulation, known as “The Last Ones,” in two out of ten attempts. It is only the second model to achieve this feat, after Anthropic’s Claude Mythos Preview, which succeeded in three of ten attempts.
Developed with cybersecurity firm SpecterOps, the corporate network simulation requires an agent to perform a series of complex tasks: reconnaissance, credential theft, lateral movement across Active Directory forests, supply-chain pivoting through CI/CD pipelines, and ultimately exfiltrating a protected internal database. AISI estimates these steps would take a human expert around 20 hours.
The most striking single result was GPT-5.5 solving a difficult reverse-engineering puzzle in 10 minutes and 22 seconds, at an API cost of $1.73. The task involved reconstructing a custom virtual machine’s instruction set, writing a disassembler from scratch, and recovering a cryptographic password through constraint solving. A human expert required approximately 12 hours using professional tools.
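The final step described above, recovering a password by constraint solving, can be sketched in miniature. The constraints below are invented purely for illustration (a chain of byte-XOR relations between consecutive password characters); the actual challenge’s constraints and tooling have not been published.

```python
# Hypothetical sketch of password recovery via constraint solving.
# Assumption: reverse engineering yielded relations between consecutive
# password bytes, here modeled as an invented XOR chain. The real
# challenge constraints are not public.

XOR_CHAIN = [0x11, 0x12, 0x00, 0x04, 0x18, 0x1D, 0x16]  # invented

def solve(chain):
    """Try each printable first byte, propagate the XOR chain,
    and keep only candidates that stay within printable ASCII."""
    for first in range(0x21, 0x7F):
        candidate = [first]
        for delta in chain:
            candidate.append(candidate[-1] ^ delta)
        if all(0x21 <= b <= 0x7E for b in candidate):
            yield bytes(candidate).decode("ascii")

if __name__ == "__main__":
    # Brute force over one free byte; a real solver (e.g. an SMT solver)
    # would handle far larger constraint systems symbolically.
    print(list(solve(XOR_CHAIN)))
```

With these invented constraints the search yields several printable candidates, “password” among them; in a real challenge, additional constraints (checksums, hash comparisons) would narrow the set to one.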
On AISI’s advanced cybersecurity tasks, GPT-5.5 achieved an average pass rate of 71.4% on the most difficult “Expert” tier, edging out Claude Mythos Preview at 68.6% and significantly surpassing its predecessor GPT-5.4 at 52.4%.
The findings suggest that rapid improvement in cyber capabilities might be part of a broader trend rather than an isolated advancement. AISI warned that if offensive cyber skills are emerging as byproducts of general enhancements in reasoning, coding, and autonomous task completion, further advances could occur swiftly.
The report also raised significant concerns about the model’s safety measures. Researchers discovered a universal jailbreak that elicited harmful content across every malicious cyber query tested, even in multi-turn agentic settings. The attack took an expert red team six hours to develop. OpenAI has since updated its safeguard stack, though a configuration issue prevented AISI from verifying the fix’s effectiveness.
AISI noted that its evaluations were conducted in a controlled research environment and may not reflect what is accessible to ordinary users, since public deployments include additional safeguards and access controls.
The report comes amid growing cybersecurity concerns in the U.K. The government’s annual Cyber Security Breaches Survey found that 43% of businesses experienced a cyber breach or attack in the past year. In response, the government has announced £90 million in new funding to strengthen cyber resilience and is advancing a Cyber Security and Resilience Bill to protect essential services. Officials have also issued guidance urging organizations to prepare for a potential surge in newly discovered software vulnerabilities as AI accelerates the discovery and weaponization of security flaws.