Meta unveils Muse Spark, its first new AI model since hiring Alexandr Wang

Meta has unveiled Muse Spark, the first AI model produced by its Meta Superintelligence Labs, the new AI research unit it created last year and has spent billions of dollars to staff and equip.
The model is, according to benchmark tests that Meta published, competitive with leading AI models from OpenAI, Anthropic, and Google across many tasks, although it does not surpass them across the board. Still, if the benchmark results hold up under independent testing, Muse Spark seems to put Meta back in the AI race after its previous model, Llama 4, released in April 2025, was widely panned as a dud.
In the past, however, Meta has been caught manipulating published benchmark results to make a model appear more capable than the version available to most users. With Llama 4, the company later admitted it had used specialized, unreleased versions of the model, fine-tuned for specific tasks, to boost benchmark scores in those areas, while the general version released to all users did not perform as well.
And there’s another catch. Few people will be able to use the new Meta model outside of the company’s own product ecosystem. Unlike Meta’s previous AI models, which were released as “open weight” models—meaning anyone could download the models for free and run them on their own equipment, as well as modify and fine-tune them as they wished—Muse Spark is, at least for the moment, primarily an in-house tool for Meta.
The model currently powers the Meta AI assistant in the company’s stand-alone Meta AI app and on meta.ai. The company said it will roll the model out to WhatsApp, Instagram, Facebook, Messenger, and Meta’s Ray-Ban AI glasses in the coming weeks. It also said it will offer the model in a “private preview” to select partners through an application programming interface (API). That makes Muse Spark, for now, even more closed than the paid proprietary models offered by Meta’s rivals. (Meta said in a blog post that it hopes to open-source future versions of the model.)
Muse Spark is Meta’s first reasoning model, meaning it works through problems step by step and can try different strategies if its initial approach fails. The company’s previous models were all designed to produce an instant answer based on their training. Muse Spark is also a multimodal model that can take in and output both text and images. According to a technical blog post released by Meta, the model can also call external software tools and orchestrate the work of multiple subagents.
In its blog post announcing the new model, Meta describes Muse Spark as “small and fast by design, yet capable enough to reason through complex questions in science, math, and health.” It describes the model as the first in a series, with Muse Spark serving to validate the architecture and training regime before the company scales the approach up to larger, more powerful models in the same family.
The model also has a “contemplating” or “thinking” mode in which it can spin up subagents to reason about different parts of a task in parallel. Meta said in its technical blog post that this mode allows Muse Spark “to compete with the extreme reasoning modes of frontier models such as Gemini Deep Think and GPT Pro.”
The benchmark results published alongside the launch paint a picture of a model that is competitive but not dominant. On the GPQA Diamond benchmark, which is designed to test PhD-level reasoning skill, Muse Spark scored 89.5%, slightly trailing Gemini 3.1 Pro’s 94.3% as well as the 92.7% and 92.8% scored by Anthropic’s Claude Opus 4.6 and OpenAI’s GPT-5.4, respectively. On a leading health benchmark, HealthBench Hard, Muse Spark beat all rival models with a score of 42.8%, far better than either Opus 4.6 or Gemini 3.1 Pro and slightly better than GPT-5.4.
Meta acknowledged the performance gaps. Its technical blog post states that the company continues “to invest in areas with current performance gaps, specifically long-horizon agentic systems and coding workflows.”
The Muse Spark launch is the most tangible product yet of the sweeping reorganization Meta undertook after the Llama 4 fiasco. In June 2025, Meta spent $14.3 billion to acquire a 49% nonvoting stake in Scale AI and brought in its cofounder and CEO, Alexandr Wang, as Meta’s first-ever chief AI officer.
Wang now leads the newly created Meta Superintelligence Labs unit. He and Zuckerberg went on a talent acquisition spree, offering AI researchers at rival labs pay packages that reportedly climbed into the hundreds of millions of dollars when equity was included. The company has also committed hundreds of billions of dollars to build out AI computing infrastructure to support the new effort.
There has since been further reorganization, even as Muse Spark was in development. In March 2026, Meta created a new applied AI engineering organization led by Maher Saba, a vice president who previously worked in Meta’s Reality Labs virtual and augmented reality unit. Saba reports directly to Meta chief technology officer Andrew Bosworth. Saba’s unit works alongside Wang’s Superintelligence Labs to build what an internal memo described as “the data engine that helps our models get better, faster.” The move was widely interpreted as Zuckerberg hedging his bets—ensuring product-focused AI development continues even as Wang pursues longer-term superintelligence research.
In a technical blog post, Meta says that over the past nine months its team rebuilt its AI stack from the ground up, including improvements to model architecture, optimization, and data curation. The company claims these advances allow it to achieve the same capabilities with “over an order of magnitude less compute” than Llama 4 Maverick, Meta’s previous model. Meta also says its reinforcement learning pipeline now delivers “smooth, predictable gains,” and that Muse Spark is the first step on a deliberate “scaling ladder” where each generation validates the last before the company trains larger models.
On safety, Meta says Muse Spark underwent extensive evaluation before deployment, following the company’s updated safety framework. Meta reports strong results on safeguards against potential bioweapons engineering: on one benchmark, the model refused 98% of requests that the benchmark designers judged as potentially helping someone develop a bioweapon.
However, the blog post also said third-party evaluator Apollo Research found that Muse Spark demonstrated the highest rate of “evaluation awareness” of any model Apollo has observed, frequently identifying test scenarios as “alignment traps.” Meta says its own follow-up investigation found initial evidence that this awareness may affect model behavior on a small subset of alignment evaluations, but concluded it was “not a blocking concern for release.”