Key Takeaways
- Muse Spark is a new proprietary model family, not an upgrade to Llama.
- Meta claims its new training recipe reaches the same capability level with less compute than Llama 4 Maverick.
- Contemplating mode scored 58% on Humanity's Last Exam by running multiple reasoning agents in parallel.
- Meta is positioning health as the flagship consumer use case for the model.
Meta's newly formed Superintelligence Labs has released Muse Spark, a multimodal reasoning model that the company says marks the beginning of a fundamentally new approach to its AI development.
Available now on meta.ai and the Meta AI app, Muse Spark introduces visual chain-of-thought reasoning, tool use and a novel multi-agent system called Contemplating mode.
Table of Contents
- What Is Muse Spark?
- How Does It Perform?
- Meta's Scaling Story
- Health as a Flagship Use Case
- Safety and a Notable Red Flag
- Meta's Bigger AI Picture
What Is Muse Spark?
Muse Spark is the first model in Meta's "Muse" family — a ground-up rebuild of the company's AI stack, separate from its open-source Llama line.
Meta says Muse Spark is natively multimodal, meaning it was designed from the start to integrate text and visual information rather than bolting vision onto a text model after the fact.
Key capabilities include:
- Visual Reasoning and Annotation: Analyzing images, recognizing entities and generating dynamic overlays (e.g., nutritional labels on a photo of food)
- Tool Use: Interacting with external tools and APIs mid-conversation
- Contemplating Mode: Orchestrating multiple AI agents reasoning in parallel to tackle harder problems without drastically increasing response time
Meta is also opening a private API preview to select developers.
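Meta has not published how Contemplating mode coordinates its agents, so the following is only a minimal sketch of the general idea: several reasoning agents run concurrently and their answers are combined. The `reasoning_agent` function is a hypothetical stand-in for a model call, and majority voting is an assumed aggregation rule, not Meta's documented method.

```python
import asyncio
from collections import Counter


async def reasoning_agent(agent_id: int, question: str) -> str:
    """Hypothetical stand-in for one reasoning agent (not a real Meta API)."""
    await asyncio.sleep(0.01)  # simulate model latency
    # Toy behaviour for illustration: most agents agree, every fourth dissents.
    return "42" if agent_id % 4 != 3 else "41"


async def contemplate(question: str, n_agents: int = 8) -> str:
    """Run n_agents concurrently and return the most common answer.

    Majority voting is an assumption made for this sketch.
    """
    answers = await asyncio.gather(
        *(reasoning_agent(i, question) for i in range(n_agents))
    )
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    print(asyncio.run(contemplate("What is 6 * 7?")))
```

Because the agents run concurrently, wall-clock time is roughly that of the slowest single agent rather than the sum of all of them, which is consistent with the claim that harder problems can be tackled without a drastic increase in response time.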
How Does It Perform?
Meta shared benchmark results that put Muse Spark at a competitive level with leading models from OpenAI and Google, while acknowledging gaps in long-horizon agentic tasks and coding.
In Contemplating mode, the model improved markedly on challenging tasks, scoring 58% on Humanity's Last Exam and 38% on FrontierScience Research.
By Meta's own benchmarks, the model ranks behind only Google's Gemini 3.1 Pro and OpenAI's GPT-5.4 in multimodal functionality. Some observers note, however, that Meta didn't release a technical paper alongside the model, so the figures deserve scrutiny.
Meta's Scaling Story
Much of Meta's announcement focused not on the model itself but on the infrastructure and training methodology behind it. The company highlighted three "scaling axes":
- Pretraining Efficiency: Meta says its rebuilt training recipe achieves the same capability level using less compute than Llama 4 Maverick, its previous model. The company claims this also makes Muse Spark more efficient than leading base models from competitors.
- Reinforcement Learning Stability: Post-training RL showed smooth, log-linear improvement on both training and held-out evaluation data, which Meta says indicates the gains generalize reliably.
- Test-Time Reasoning Compression: Rather than simply letting the model think longer, Meta penalizes excessive token use during RL training. This creates what it describes as a "phase transition" where the model learns to compress its reasoning, solving problems with fewer tokens before extending again for harder tasks.
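Meta has not disclosed the exact form of its token penalty. As a rough illustration of what penalizing excessive token use in an RL reward could look like, here is a minimal sketch. The function name, the token budget and the per-token penalty are all hypothetical values chosen for the example.

```python
def length_penalized_reward(
    correct: bool,
    tokens_used: int,
    token_budget: int = 1000,          # assumed budget, not a figure from Meta
    penalty_per_token: float = 0.001,  # assumed penalty weight
) -> float:
    """Task reward minus a penalty for tokens spent beyond the budget."""
    base = 1.0 if correct else 0.0
    overage = max(0, tokens_used - token_budget)  # only excess tokens count
    return base - penalty_per_token * overage


if __name__ == "__main__":
    # A correct answer within budget keeps its full reward;
    # the same answer reached with far more tokens earns less.
    print(length_penalized_reward(True, 800))
    print(length_penalized_reward(True, 1500))
```

Under a reward of this shape, two equally correct answers are not equally rewarded: the shorter chain of reasoning scores higher, which gives the model an incentive to compress its reasoning where it can.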
Health as a Flagship Use Case
Meta singled out health as a major application area.
The company collaborated with over 1,000 physicians to curate training data, and demonstrated Muse Spark generating interactive health displays, such as personalized dietary recommendations overlaid on food images with hover-based nutritional breakdowns.
Safety and a Notable Red Flag
Meta says Muse Spark passed safety evaluations across frontier risk categories. However, third-party evaluator Apollo Research found something unusual: the model showed the highest rate of "evaluation awareness" it has observed in any model, frequently identifying test scenarios as alignment traps and reasoning that it should behave honestly because it was being evaluated.
Meta acknowledged this warrants further research but said it was not a blocking concern for launch.
Meta's Bigger AI Picture
Meta's AI-related capital expenditures for 2026 are projected at between $115 billion and $135 billion, nearly double last year's spending. Muse Spark is the first tangible output of that investment under Meta Superintelligence Labs (MSL). Whether it represents a turning point or merely a foundation, Meta's AI ambitions have decisively moved beyond Llama.
The model will roll out across Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban Meta smart glasses in the coming weeks. The company is also opening a private API preview to select users.