On Wednesday, Meta unveiled Muse Spark, its latest AI model developed by the newly established Meta Superintelligence Labs. This team was formed nine months ago under Chief AI Officer Alexandr Wang following Meta’s $14 billion acquisition of Scale AI. The model is currently accessible via meta.ai and the Meta AI app, with plans to integrate it into Facebook, Instagram, and WhatsApp in the coming weeks.
Muse Spark represents a significant step beyond Meta's previous chatbots and Llama iterations: it is natively multimodal, handling text, images, and voice from the ground up. It features visual chain-of-thought reasoning, tool-use capabilities, and Meta's "Contemplating mode," which runs multiple AI agents concurrently on a single complex problem. That capability positions Muse Spark as a competitor to Google's Gemini Deep Think and OpenAI's GPT Pro extended thinking modes.
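Meta has not published how Contemplating mode works internally, but the general pattern of running several agents in parallel and keeping the best result can be sketched roughly as follows. Everything here is hypothetical: `agent_answer` and `score` are placeholder stand-ins for a real model call and a real answer-grading step, not Meta's API.

```python
from concurrent.futures import ThreadPoolExecutor

def agent_answer(question: str, seed: int) -> str:
    # Placeholder for a real model call; the seed stands in for
    # independent agents exploring different reasoning paths.
    return f"answer-{seed}-to-{question}"

def score(answer: str) -> float:
    # Placeholder verifier/reward model; here, just answer length.
    return float(len(answer))

def contemplate(question: str, n_agents: int = 4) -> str:
    """Fan out n_agents concurrent attempts and keep the best-scored one."""
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(lambda s: agent_answer(question, s),
                                range(n_agents)))
    return max(answers, key=score)
```

The design trade-off such systems face is cost: each extra agent multiplies inference compute, which is why parallel-reasoning modes like Gemini Deep Think and GPT Pro are typically reserved for the hardest queries.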
“Muse Spark is the initial stage in our scaling journey and results from an extensive overhaul of our AI initiatives,” Meta stated in a formal announcement. “We are investing strategically across all areas, from research and model training to infrastructure, including the Hyperion data center.”
Meta collaborated with over 1,000 doctors to refine Muse Spark’s medical reasoning capabilities. On HealthBench Hard—an open-ended health query benchmark—Muse Spark achieved a score of 42.8, outperforming GPT 5.4’s 40.1 and Gemini 3.1 Pro’s 20.6.
In agentic search (DeepSearchQA), Muse Spark leads with 74.8, surpassing Gemini at 69.7 and GPT 5.4 at 73.6. It also excelled in CharXiv Reasoning—understanding figures from scientific papers—with a score of 86.4.
However, within minutes of its release, users exposed the model's system prompt:
🚰 SYSTEM PROMPT LEAK 🚰
“Who are you?
You are a friendly, intelligent, and agentic AI assistant. You are warm and a bit playful…”
— @elder_plinius (April 8, 2026)
Despite these achievements, Gemini 3.1 Pro maintains its lead in several categories. This is most evident on ARC AGI 2, an abstract reasoning puzzle benchmark, where Gemini scored 76.5 compared to Muse Spark’s 42.5.
On LiveCodeBench Pro, a coding benchmark, Gemini edged out Muse Spark 82.9 to 80.0. On MMMU Pro, which tests multimodal understanding, Gemini achieved 83.9 versus 80.4. Meta acknowledges remaining performance gaps in long-horizon agentic systems and coding workflows.
Muse Spark marks a strategic shift: it is closed-source, unlike Llama, whose open weights bolstered Meta's standing in open AI communities. After the lukewarm response to Llama 4 earlier this year, Meta has opted for a different approach, keeping Muse Spark's architecture and weights proprietary.
Meta says future Muse releases may be open-sourced, but this one stays closed for now. Following the announcement, Meta's stock surged nearly 9% on Wednesday before closing up 6.5% at $612.42.
“Contemplating mode” leverages parallel agent orchestration to elevate performance. In this setup, Muse Spark achieved 58% on Humanity’s Last Exam and 38% on FrontierScience Research, placing it in competitive territory with the most advanced versions of Gemini and GPT.
Additionally, Meta is launching a shopping assistant that can compare products and facilitate direct purchases. The plan includes expanding Muse Spark to Facebook, Instagram, and WhatsApp soon, potentially reaching over 3.5 billion users. A private API preview will be available for select developers shortly.
Developed in nine months under the internal codename Avocado, the model relies on a new pretraining stack that Meta claims can match Llama 4 Maverick's capabilities with significantly less computational power. Muse Spark is billed as a "small and fast" first step within the broader Muse family, with more advanced versions already underway.