April 2026

Muse Spark Benchmark Results

Full evaluation results across reasoning, science, medical, coding, and retrieval benchmarks

| Category | Benchmark | Muse Spark | GPT-5.4 | Gemini 3.1 Pro | Notes |
|---|---|---|---|---|---|
| Overall | AA v4.0 | 52 | 57 | 57 | Comprehensive index |
| Chart Understanding | CharXiv | 86.4 | 82.8 | 80.2 | Chart comprehension |
| Medical | HealthBench Hard | 42.8 | 40.1 | 20.6 | Medical QA |
| Deep Search | DeepSearchQA | 74.8 | 69.7 | | Research retrieval |
| Reasoning | HLE (Fast) | 36.5 | 43.9 | 48.4 | |
| Reasoning | HLE (Contemplating) | 50.2 | 43.9 | 48.4 | Extended reasoning |
| Science | FrontierScience | 38.3 | 36.7 | 23.3 | Research frontier |
| Abstract | ARC AGI 2 | 42.5 | 76.1 | 76.5 | Pattern reasoning |
| Coding | Terminal-Bench 2.0 | 59.0 | 75.1 | 68.5 | Agentic coding |

Source: Meta AI / Artificial Analysis, April 2026

CharXiv: 86.4 (#1 Chart Understanding)
HealthBench Hard: 42.8 (#1 Medical QA)
HLE (Contemplating): 50.2 (#1 Extended Reasoning)

Muse Spark leads on chart understanding, medical reasoning, deep search, science, and extended reasoning.
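
The summary above can be checked against the published scores with a few lines of arithmetic. The following is a minimal sketch, not part of the original evaluation: it assumes higher is better on every benchmark, and the names `SCORES`, `margin`, and `split_by_lead` are illustrative. Competitor scores are kept as an unlabeled list because DeepSearchQA reports only one comparison score.

```python
# Scores transcribed from the results table: (Muse Spark, [competitor scores]).
# Assumption: higher is better on every benchmark.
# DeepSearchQA lists a single competitor score in the table.
SCORES = {
    "AA v4.0": (52, [57, 57]),
    "CharXiv": (86.4, [82.8, 80.2]),
    "HealthBench Hard": (42.8, [40.1, 20.6]),
    "DeepSearchQA": (74.8, [69.7]),
    "HLE (Fast)": (36.5, [43.9, 48.4]),
    "HLE (Contemplating)": (50.2, [43.9, 48.4]),
    "FrontierScience": (38.3, [36.7, 23.3]),
    "ARC AGI 2": (42.5, [76.1, 76.5]),
    "Terminal-Bench 2.0": (59.0, [75.1, 68.5]),
}


def margin(muse, others):
    """Muse Spark's score minus the best competitor score, rounded to 0.1."""
    return round(muse - max(others), 1)


def split_by_lead(scores):
    """Split benchmarks into those Muse Spark leads and those it trails or ties."""
    leads, trails = {}, {}
    for name, (muse, others) in scores.items():
        m = margin(muse, others)
        (leads if m > 0 else trails)[name] = m
    return leads, trails


leads, trails = split_by_lead(SCORES)

if __name__ == "__main__":
    for name, m in leads.items():
        print(f"leads  {name}: +{m}")
    for name, m in trails.items():
        print(f"trails {name}: {m}")
```

Under these assumptions, the five benchmarks in `leads` correspond to the five areas named in the summary, while the overall index, HLE (Fast), ARC AGI 2, and Terminal-Bench 2.0 land in `trails`.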
