Full evaluation results across reasoning, science, medical, coding, and retrieval benchmarks
| Category | Benchmark | Muse Spark | GPT-5.4 | Gemini 3.1 Pro | Notes |
|---|---|---|---|---|---|
| Overall | AA v4.0 | 52 | 57 | 57 | Comprehensive index |
| Chart Understanding | CharXiv | 86.4 | 82.8 | 80.2 | Chart comprehension |
| Medical | HealthBench Hard | 42.8 | 40.1 | 20.6 | Medical QA |
| Deep Search | DeepSearchQA | 74.8 | — | 69.7 | Research retrieval |
| Reasoning | HLE (Fast) | 36.5 | 43.9 | 48.4 | — |
| Reasoning | HLE (Contemplating) | 50.2 | 43.9 | 48.4 | Extended reasoning |
| Science | FrontierScience | 38.3 | 36.7 | 23.3 | Research frontier |
| Abstract | ARC AGI 2 | 42.5 | 76.1 | 76.5 | Pattern reasoning |
| Coding | Terminal-Bench 2.0 | 59.0 | 75.1 | 68.5 | Agentic coding |
Source: Meta AI / Artificial Analysis, April 2026
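Muse Spark posts the top score on chart understanding (CharXiv), medical QA (HealthBench Hard), deep search (DeepSearchQA, where no GPT-5.4 score is reported), science (FrontierScience), and extended reasoning (HLE Contemplating). It trails both competitors on the AA v4.0 overall index, HLE (Fast), ARC AGI 2, and Terminal-Bench 2.0.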
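The summary can be checked directly against the table. The snippet below is a minimal sketch rather than an official tool: the scores are copied by hand from the table, the missing GPT-5.4 DeepSearchQA entry is stored as `None`, and the helper names `leaders` and `benchmarks_led_by` are hypothetical. A model counts as leading a benchmark only when it alone holds the highest reported score.

```python
# Scores copied from the table above; None marks a score that was not reported.
SCORES = {
    "AA v4.0": {"Muse Spark": 52, "GPT-5.4": 57, "Gemini 3.1 Pro": 57},
    "CharXiv": {"Muse Spark": 86.4, "GPT-5.4": 82.8, "Gemini 3.1 Pro": 80.2},
    "HealthBench Hard": {"Muse Spark": 42.8, "GPT-5.4": 40.1, "Gemini 3.1 Pro": 20.6},
    "DeepSearchQA": {"Muse Spark": 74.8, "GPT-5.4": None, "Gemini 3.1 Pro": 69.7},
    "HLE (Fast)": {"Muse Spark": 36.5, "GPT-5.4": 43.9, "Gemini 3.1 Pro": 48.4},
    "HLE (Contemplating)": {"Muse Spark": 50.2, "GPT-5.4": 43.9, "Gemini 3.1 Pro": 48.4},
    "FrontierScience": {"Muse Spark": 38.3, "GPT-5.4": 36.7, "Gemini 3.1 Pro": 23.3},
    "ARC AGI 2": {"Muse Spark": 42.5, "GPT-5.4": 76.1, "Gemini 3.1 Pro": 76.5},
    "Terminal-Bench 2.0": {"Muse Spark": 59.0, "GPT-5.4": 75.1, "Gemini 3.1 Pro": 68.5},
}


def leaders(row):
    """Return the model(s) with the highest reported score, sorted by name."""
    reported = {model: score for model, score in row.items() if score is not None}
    best = max(reported.values())
    return sorted(model for model, score in reported.items() if score == best)


def benchmarks_led_by(model, table=SCORES):
    """List benchmarks (in table order) where `model` alone has the top score."""
    return [name for name, row in table.items() if leaders(row) == [model]]


if __name__ == "__main__":
    for name in benchmarks_led_by("Muse Spark"):
        print(name)
```

Requiring a sole leader means the tied AA v4.0 row is credited to neither GPT-5.4 nor Gemini 3.1 Pro. The DeepSearchQA comparison covers only the two models with reported scores.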