Today, Anthropic released Claude Opus 4.7, touting it as the most advanced version in its series to date. Our tests confirmed that the model’s capabilities align with Anthropic’s promotional claims.
“Claude Opus 4.7 is now widely accessible,” stated Anthropic in their announcement. “Users are finding they can delegate challenging coding tasks—previously requiring close oversight—to Opus 4.7 confidently.”
This release comes after a series of complaints about the previous model, 4.6, which users believed had lost its effectiveness. Across platforms like GitHub, Reddit, and X, developers expressed concerns about “AI shrinkflation,” feeling that their investment was yielding diminishing returns.
As reported earlier, Anthropic has been developing Opus 4.7 alongside a more powerful model, Claude Mythos, which is not yet available to the public. When the announcement was made, users who had criticized 4.6’s performance responded with sarcasm, likening it to an early version of 4.6 before perceived downgrades.
Anthropic has consistently denied any intentional reduction in Opus model capabilities for computational efficiency.
Performance benchmarks back up Anthropic's assertions. On the SWE-bench Multilingual coding test, Opus 4.7 scored 80.5%, up from 77.8% for its predecessor. In the GDPVal-AA assessment, which evaluates knowledge work in the finance and legal sectors, it reached 1,753 Elo versus 1,674 for GPT-5.4.
On OfficeQA Pro, which tests document reasoning, Opus 4.7 jumped from 57.1% to 80.6%, outperforming both GPT-5.4 and Gemini 3.1 Pro. On Vending-Bench 2, which measures long-term coherence over extended tasks, it finished with a money balance of $10,937, compared with $8,018 for 4.6.
In cybersecurity, Opus 4.7 includes new safeguards to detect and block risky requests, and Anthropic confirmed that certain capabilities were deliberately reduced during training. Security experts can join the Cyber Verification Program to access these features, a precursor to the wider safeguards needed for Mythos-class models.
While not as advanced as Mythos Preview, which excelled in complex security simulations, Opus 4.7 serves as Anthropic's public testbed for safety measures before more powerful versions are released.
The updated tokenizer of Opus 4.7 maps inputs to roughly 1.0x–1.35x as many tokens as before, depending on the content. The model also shows stronger reasoning at higher effort levels, especially in extended workflows. A migration guide is available for developers upgrading from 4.6.
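As a rough illustration of what that token expansion means for billing, the sketch below combines the reported 1.0x–1.35x range with the list prices quoted elsewhere in this article ($5 per million input tokens, $25 per million output tokens). The workload sizes are hypothetical, and actual billing may of course differ.

```python
# Rough cost estimate for the Opus 4.7 tokenizer change.
# Assumes the article's 1.0x-1.35x token expansion range and its
# list prices; the 50M/10M token workload below is purely illustrative.

INPUT_PRICE = 5.00 / 1_000_000    # USD per input token
OUTPUT_PRICE = 25.00 / 1_000_000  # USD per output token

def workload_cost(input_tokens, output_tokens, expansion=1.0):
    """Cost of a workload whose token counts are inflated by `expansion`."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) * expansion

baseline = workload_cost(50_000_000, 10_000_000)          # 4.6-equivalent counts
worst_case = workload_cost(50_000_000, 10_000_000, 1.35)  # full 1.35x expansion
print(f"baseline: ${baseline:,.2f}, worst case: ${worst_case:,.2f}")
```

At the top of the expansion range, the same workload costs 35% more, which is consistent with the quota exhaustion described later in our testing.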
Our evaluation using a game-building prompt showed Opus 4.7 producing the most polished and challenging game yet, with procedural level generation that stayed balanced across difficulties. Although it required one round of bug fixes, unlike its predecessor, it then identified additional bugs on its own without guidance.
Previously, Xiaomi MiMo v2 Pro had delivered the best results, offering visually appealing games at lower cost than Anthropic's models. After the initial bugs were fixed, however, Opus 4.7 outperformed it in game logic and physics.
The model now integrates its reasoning directly into the main text output, enhancing transparency compared to 4.6’s separate reasoning format.
We also observed a notable token usage pattern: a single session exhausted our entire token quota as the model drafted multiple versions of the game under various improvement labels.
For developers running intensive coding tasks, that means either upgrading to a higher plan or absorbing higher API costs, unless Anthropic adjusts quotas. Switching to a less expensive model is another option.
Anthropic has warned that Opus 4.7’s increased token usage is expected during complex tasks at higher effort levels.
Opus 4.7 is available today on Claude.ai, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, with pricing identical to its predecessor: $5 per million input tokens and $25 per million output tokens. It can be accessed using the string claude-opus-4-7.
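For developers wiring the model into their own tools, a minimal sketch of a request is shown below. The model string claude-opus-4-7 is the one given above; the prompt and max_tokens value are illustrative, and the request shape assumes the standard Anthropic Messages API (sending it for real requires the anthropic package and an API key).

```python
# Minimal sketch of a request to Opus 4.7 via the Anthropic Messages API.
# The model string is from the article; the prompt is illustrative.

MODEL = "claude-opus-4-7"

def build_request(prompt, max_tokens=1024):
    """Assemble keyword arguments for a Messages API call."""
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Write a unit test for a binary search function.")
print(req["model"])  # claude-opus-4-7

# To actually send the request (assumes `pip install anthropic` and an
# ANTHROPIC_API_KEY in the environment):
# import anthropic
# client = anthropic.Anthropic()
# reply = client.messages.create(**req)
# print(reply.content[0].text)
```

Keeping the request assembly separate from the network call makes it easy to swap in a cheaper model string when, as noted above, token costs become a concern.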