[LABS]
Where multimodal reasoning is built.
Reka Labs is our core research engine — developing world language action models and the foundational science behind the next generation of multimodal AI.
New · PhysicalRealismBench-U
Attributable Physical Realism Evaluation for Video World Models
Reka AI Research · April 2026 · Track U: Physical Understanding
We release PhysicalRealismBench — a physics-first benchmark for evaluating physical understanding in VLMs — built on 550 synthetic videos with known ground truth. Even state-of-the-art models fail at fundamental physical reasoning tasks: the best-performing model (Gemini 3.1 Pro) achieves an F1 of only 39.5%. Our findings show that VLMs skip frames, rely on spatial heuristics, and miss violations any human would catch immediately.
[BENCHMARKS]
Dataset at a glance
Total videos: 550 (synthetic, physics-correct)
Matched pairs: 250 (valid + violation per scene)
Violation categories: 5 (grounded in fundamental physics)
Best model F1: 39.5% (Gemini 3.1 Pro, joint metric)
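For readers who want to see how an F1 figure like the one above is computed, here is a minimal Python sketch. It assumes the simplest reading: binary violation detection scored over valid and violation videos. This page does not define the "joint metric", so the benchmark's actual scoring may differ (for example, by also requiring the correct violation category). The function name `f1_score` and the toy labels are illustrative and not part of the release.

```python
from typing import Sequence


def f1_score(labels: Sequence[bool], preds: Sequence[bool]) -> float:
    """Binary F1 where True means "this video contains a physics violation".

    Assumption: one label and one prediction per video. This is a plain
    detection F1, not necessarily the benchmark's joint metric.
    """
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")

    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y and p)        # violation, flagged
    fp = sum(1 for y, p in pairs if not y and p)    # valid video, flagged
    fn = sum(1 for y, p in pairs if y and not p)    # violation, missed

    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: two matched pairs, each a (violation, valid) video.
toy_labels = [True, False, True, False]
toy_preds = [True, True, False, False]
print(f1_score(toy_labels, toy_preds))  # 0.5
```

In the toy example the model catches one of two violations and wrongly flags one valid video, so precision and recall are both 0.5 and F1 is 0.5.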
[PHYSICS ONTOLOGY]
Five violation categories tested
Spontaneous impulse
Motion without cause — violates Newton's first law.
e.g. object accelerates with no applied force
Random disappearance
Objects vanish unexpectedly — violates conservation and permanence.
e.g. object present in frame N, absent in N+1
Pass-through
Objects interpenetrate solid surfaces — violates impenetrability.
e.g. ball passes through a wall
Gravity violation
Objects hover or fall inconsistently — violates gravity and support.
e.g. unsupported object remains suspended
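The "present in frame N, absent in N+1" example for random disappearance can be sketched as a check over per-frame object sets. This assumes ground-truth object IDs are available for every frame, which is plausible for synthetic videos but is not stated on this page. `find_disappearances` is a hypothetical helper. A real check would also need to rule out objects that legitimately leave the field of view or become occluded.

```python
from typing import List, Sequence, Set, Tuple


def find_disappearances(frames: Sequence[Set[str]]) -> List[Tuple[int, str]]:
    """Return (N, object_id) for each object present in frame N but absent in N+1.

    Assumption: `frames[i]` is the set of object IDs visible in frame i.
    The result lists candidate events only; occlusion and objects exiting
    the frame are not handled here.
    """
    events = []
    for n in range(len(frames) - 1):
        vanished = frames[n] - frames[n + 1]  # set difference
        for obj in sorted(vanished):          # sorted for stable output
            events.append((n, obj))
    return events


# Toy example: the ball is visible in frames 0 and 1, then gone in frame 2.
toy_frames = [{"ball", "box"}, {"ball", "box"}, {"box"}]
print(find_disappearances(toy_frames))  # [(1, 'ball')]
```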
[KEY FINDINGS]
Why VLMs fail at physical reasoning
[PEOPLE]
Globally recognized founding researchers
A world-class roster of 50+ multimodal AI researchers and engineers, including pioneers behind models like Gemini and Veo.

Mikel Artetxe
Founding researcher
Formerly DeepMind, Meta

Mik Binkowski
Founding researcher
Formerly DeepMind

Mateusz Malinowski
Founding researcher
Formerly DeepMind

Qi Liu
Founding researcher
Formerly DeepMind, Meta

Chris Burgess
Founding researcher
Formerly DeepMind, Runway

Dani Yogatama
Founding researcher
Formerly DeepMind

Yu-Chuan Su
Founding researcher
Formerly Google
+43 more researchers and engineers
Build with our research.
Access Reka models through Reka Infer — purpose-built for enterprise scale.