Xiaomi Unveils MiMo 2.5 Pro AI: Integrated Vision, Hearing, and Action in One Model

Xiaomi has once again expanded its family of AI models with the introduction of MiMo-V2.5 and MiMo-V2.5-Pro. The company's previous flagship, MiMo-V2-Pro, was a trillion-parameter model formerly known as 'Hunter Alpha' on OpenRouter; when our team tested it, it demonstrated impressive capabilities.

In this latest release, Xiaomi integrates multimodal capabilities (vision, hearing, and action) into a single model for the first time in its AI suite. Previously, V2-Pro focused solely on text and code, while MiMo-V2-Omni handled multimodal tasks separately and with lower efficiency. The MiMo-V2.5 series consolidates these capabilities into one more powerful model.

This advancement holds significant implications for everyday users. Upload a photo of your fridge and the AI can suggest dinner recipes; upload a video tutorial and it can produce a step-by-step summary; record a meeting and it can extract the action items, all within a single model.

Xiaomi asserts that MiMo-V2.5-Pro marks a substantial step forward in agentic capabilities, complex software engineering, and long-horizon tasks, claiming parity with leading models such as Claude Opus 4.6 and GPT-5.4 across various benchmarks. It performs strongly on most coding and agent evaluations, though gaps persist in more challenging reasoning scenarios.

The base model, MiMo-V2.5, offers everyday utility at a faster rate (100–150 tokens per second) and lower cost ($0.40 input / $2.00 output), supporting all modalities except those exclusive to the Pro version. Both models support a 1M-token context window.
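At those rates, per-request cost is simple arithmetic. A minimal sketch, assuming the quoted prices are in USD per million tokens (the article does not state the unit):

```python
# Hypothetical cost estimate for MiMo-V2.5 usage.
# Assumption: the quoted prices are USD per 1M tokens (unit not stated above).
INPUT_PRICE = 0.40   # $ per 1M input tokens (assumed unit)
OUTPUT_PRICE = 2.00  # $ per 1M output tokens (assumed unit)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one request."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# Example: a 20k-token prompt with a 2k-token reply.
print(round(request_cost(20_000, 2_000), 4))  # → 0.012
```

Even at that price point, high-volume workloads add up quickly, which is why the token-efficiency numbers below matter.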

Performance on SWE-bench Pro, a coding evaluation involving real-world startup codebases, shows MiMo-V2.5-Pro resolving 57.2% of tasks, near the top in its category. It performs comparably on τ3-bench and ClawEval, yet lags behind GPT-5.4 by 10 points on Humanity’s Last Exam, a test covering graduate-level problems across various fields.

MiMo-V2.5-Pro excels in token efficiency, reportedly using 42% fewer tokens than Kimi K2.6 for similar scores, and MiMo-V2.5 requires nearly half the tokens of Muse Spark at equivalent performance levels—a critical factor for developers processing vast numbers of requests daily.

On multimodal tasks, MiMo-V2.5's results align closely with GPT-5.4 and Gemini 3.1 Pro, approaching Opus 4.6 standards.

Since December 2025, Xiaomi has released three major AI models: starting with MiMo-V2-Flash, followed by the V2-Pro/Omni/TTS trio in March, and now the V2.5 series. The company announced a commitment of $8.7 billion in AI investment over the next three years.

The rapid release pace is partly driven by demand: Xiaomi's models accounted for roughly 21% of OpenRouter traffic as of early April, a 42% increase over seven days, indicating significant resource allocation and pressure to ship quickly. This momentum was likely fueled by the popular agentic AI tool Hermes, which partnered with Xiaomi to offer users temporary free access to MiMo-V2-Pro.

Xiaomi plans to open source these models soon. While immediate access through its AI Studio was unavailable upon launch, the models are accessible via the Xiaomi MiMo API for developers.
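For developers, access is API-only for now. A minimal sketch of a chat request, assuming the MiMo API exposes an OpenAI-compatible chat-completions endpoint; the endpoint URL, model identifier, and environment-variable name below are illustrative assumptions, not documented values:

```python
import json
import os
import urllib.request

# Assumption: an OpenAI-compatible chat-completions endpoint. The URL,
# model identifier, and env var are illustrative, not official values.
API_URL = "https://api.example-mimo-endpoint.com/v1/chat/completions"
MODEL = "mimo-v2.5-pro"  # hypothetical identifier

def build_payload(prompt: str) -> dict:
    """Build a chat-completions request body for a single user message."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send one prompt and return the model's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['MIMO_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

If the API follows the OpenAI wire format, existing client libraries should work by pointing their base URL at Xiaomi's endpoint; that remains to be confirmed against the official documentation.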

The company is already developing the next generation of models with enhanced reasoning, tool integration, and real-world grounding, suggesting that future announcements may not be far off.
