Reka

[REKA LABS]

AI that reasons, simulates,
and acts in the physical world.

50+ multimodal AI researchers and engineers — including pioneers behind Gemini and VEO.

New · PhysicalRealismBench-U

Attributable Physical Realism Evaluation for Video World Models

Reka AI Research · April 2026 · Track U: Physical Understanding

We release PhysicalRealismBench — a physics-first benchmark for evaluating physical understanding in VLMs — built on 550 synthetic videos with known ground truth. Even state-of-the-art models fail at fundamental physical reasoning tasks: the best-performing model (Gemini 3.1 Pro) achieves an F1 of only 39.5%. Our findings show that VLMs skip frames, rely on spatial heuristics, and miss violations any human would catch immediately.

[BENCHMARKS]

Dataset at a glance

Total videos: 550 (synthetic, physics-correct)

Matched pairs: 250 (valid + violation per scene)

Violation categories: 5 (grounded in fundamental physics)

Best model F1: 39.5% (Gemini 3.1 Pro · joint metric)
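
For readers unfamiliar with the headline number, here is a minimal sketch of how an F1 score is computed for violation detection. This page does not define the "joint metric", so the code below is plain binary F1 and not necessarily the benchmark's scoring. It assumes each video carries one ground-truth label (True = contains a violation) and one model prediction. The function name f1_score is hypothetical.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (True = video contains a violation).

    Assumption: one boolean label and one boolean prediction per video.
    This is standard F1, not the benchmark's own "joint metric".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # violation correctly flagged
    fp = sum(1 for t, p in pairs if not t and p)    # valid video wrongly flagged
    fn = sum(1 for t, p in pairs if t and not p)    # violation missed
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two matched pairs: each scene has one valid and one violation video.
labels = [True, False, True, False]
preds = [True, True, False, False]
print(f1_score(labels, preds))
```

Because F1 combines precision and recall, a model that flags every video as a violation, or none, cannot score well on matched pairs.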

[PHYSICS ONTOLOGY]

Five violation categories tested

Spontaneous impulse

Motion without cause — violates Newton's first law.

e.g. object accelerates with no applied force

Random disappearance

Objects vanish unexpectedly — violates conservation and permanence.

e.g. object present in frame N, absent in N+1

Pass-through

Objects interpenetrate solid surfaces — violates impenetrability.

e.g. ball passes through a wall

Gravity violation

Objects hover or fall inconsistently — violates gravity and support.

e.g. unsupported object remains suspended
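
Because the videos are synthetic with known ground truth, a violation such as "object present in frame N, absent in N+1" can in principle be read straight from per-frame scene state. The sketch below illustrates that idea only. It assumes a hypothetical representation in which each frame is a set of object IDs, and it ignores legitimate exits and occlusion; it is not the benchmark's actual tooling.

```python
def find_disappearances(frames):
    """Return (frame_index, object_id) pairs where an object that was present
    in frame N is absent in frame N+1.

    Assumption: `frames` is a list of sets of object IDs, one set per frame.
    The reported index is that of the first frame where the object is missing.
    """
    events = []
    for n in range(len(frames) - 1):
        vanished = frames[n] - frames[n + 1]  # present in N, absent in N+1
        for obj in sorted(vanished):          # sorted for deterministic output
            events.append((n + 1, obj))
    return events


scene = [{"ball", "box"}, {"ball", "box"}, {"box"}]
print(find_disappearances(scene))  # [(2, 'ball')]
```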

[KEY FINDINGS]

Why VLMs fail at physical reasoning

[PEOPLE]

Globally recognised
founding researchers

A superstar roster of 50+ multimodal AI researchers and engineers — including pioneers behind models like Gemini and VEO.

Dani Yogatama

Founding researcher

Ex. DeepMind

Mik Binkowski

Founding researcher

Ex. DeepMind

Mateusz Malinowski

Founding researcher

Ex. DeepMind

Qi Liu

Founding researcher

Ex. DeepMind, Meta

Chris Burgess

Founding researcher

Ex. DeepMind, Runway

Mikel Artetxe

Founding researcher

Ex. DeepMind, Meta

Yu-Chuan Su

Founding researcher

Ex. Google

+43 more researchers and engineers

Build with
our research.

Access Reka models through Reka Infer — purpose-built for enterprise scale.

Reka Edge