Training interactive world models requires data that is hard to find: egocentric video sequences with densely aligned action signals (keyboard inputs, camera motion, and ego state), all synchronized to the visual stream. Real-world embodied data is costly to collect, while synthetic data often lacks the visual richness or behavioral diversity needed for generalization.
Counter-Strike 2 demos offer a compelling middle ground. Each match is recorded as a deterministic replay: a compact file that encodes the full state trajectory of every player across every tick of the game. From a single demo file, we can reconstruct clean first-person video for any player at any point in the match and extract the precise control inputs behind each visual change.
Today we release CS2-10k, a large-scale egocentric gameplay dataset built from professional CS2 matches. It contains 646,578 player-round videos spanning 11,072 hours of first-person footage, paired with per-frame annotations covering keyboard state, mouse movement, and 3D player trajectory. We are also releasing cs2-dem-renderer, the open-source pipeline used to produce it.
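Those headline figures imply roughly a minute of footage per player-round and close to two billion frames overall. A quick check of the arithmetic, using only the numbers above (the hour total is rounded, so per-clip values are averages, not exact clip lengths):

```python
# Back-of-the-envelope scale of CS2-10k from the released figures.
NUM_CLIPS = 646_578    # player-round videos
TOTAL_HOURS = 11_072   # rounded total of first-person footage
FPS = 48               # render frame rate

total_seconds = TOTAL_HOURS * 3600
total_frames = total_seconds * FPS           # every second holds 48 frames
avg_clip_seconds = total_seconds / NUM_CLIPS # mean clip duration
avg_clip_frames = avg_clip_seconds * FPS     # mean frames per clip

print(f"total frames: {total_frames:,}")                                   # 1,913,241,600
print(f"avg clip: {avg_clip_seconds:.1f} s (~{avg_clip_frames:.0f} frames)")  # 61.6 s (~2959 frames)
```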
Dataset Overview
CS2-10k is built from public professional match demos sourced from HLTV. For each demo, we render clean first-person video at 1280×720 resolution and 48 fps using the demo replay tool inside CS2, producing one video per player per round. Alongside each video, we store a parquet file containing per-frame annotations synchronized to the video timeline.
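Because every clip is rendered at a fixed 48 fps and the parquet rows are synchronized to the video timeline, moving between a video timestamp and an annotation row is plain arithmetic. A minimal sketch, assuming row i of the parquet corresponds to decoded frame i; the helper names are hypothetical and not part of the release:

```python
FPS = 48  # render rate stated for CS2-10k

def frame_to_seconds(frame_idx: int, fps: int = FPS) -> float:
    """Presentation time of a frame, in seconds from the start of the clip."""
    if frame_idx < 0:
        raise ValueError("frame index must be non-negative")
    return frame_idx / fps

def seconds_to_frame(t: float, fps: int = FPS) -> int:
    """Index of the frame on screen at time t (floored, so it never looks ahead).

    The tiny epsilon absorbs float rounding so that
    seconds_to_frame(frame_to_seconds(i)) == i for every frame index i.
    """
    if t < 0:
        raise ValueError("time must be non-negative")
    return int(t * fps + 1e-9)

# Example: the annotation row for the frame shown 2.5 s into a clip.
print(seconds_to_frame(2.5))   # 120
print(frame_to_seconds(120))   # 2.5
```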
Annotation Schema
Each video clip has a corresponding .parquet file containing its annotations.

Per-Frame Annotations
Each entry in frame_data contains:

The combination of video and per-frame control signals creates a tight action-observation loop.
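To feed that loop to a model, each frame's controls are typically flattened into a fixed-size action vector. The sketch below assumes a hypothetical record layout: the field names `keys`, `mouse_dx`, `mouse_dy`, and `position` and the key vocabulary are illustrative only, so check the released parquet schema for the real columns. It encodes held keys as a multi-hot vector followed by the two mouse deltas:

```python
from dataclasses import dataclass, field

# Hypothetical key vocabulary; the dataset may name or order keys differently.
KEYS = ("W", "A", "S", "D", "SPACE", "CTRL", "SHIFT", "MOUSE1", "MOUSE2")

@dataclass
class FrameAnnotation:
    """One per-frame annotation row (illustrative field names)."""
    frame: int
    keys: frozenset[str] = field(default_factory=frozenset)  # keys held this frame
    mouse_dx: float = 0.0  # horizontal mouse delta
    mouse_dy: float = 0.0  # vertical mouse delta
    position: tuple[float, float, float] = (0.0, 0.0, 0.0)  # 3D player position

def action_vector(ann: FrameAnnotation) -> list[float]:
    """Multi-hot key state in KEYS order, followed by the two mouse deltas."""
    unknown = ann.keys - set(KEYS)
    if unknown:
        raise ValueError(f"keys outside vocabulary: {sorted(unknown)}")
    return [1.0 if k in ann.keys else 0.0 for k in KEYS] + [ann.mouse_dx, ann.mouse_dy]

ann = FrameAnnotation(frame=0, keys=frozenset({"W", "SHIFT"}), mouse_dx=1.5, mouse_dy=-0.25)
print(action_vector(ann))
# [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.5, -0.25]
```

Rejecting unknown keys, rather than silently dropping them, surfaces schema mismatches early instead of letting them corrupt the conditioning signal.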
No Abrupt Visual Changes
Each clip is a contiguous segment of a single round from a single player's perspective. There are no mid-round cuts, no editing transitions, and no HUD overlay. Camera motion stays physically plausible within the game world, and the player's weapon model is hidden to eliminate sudden visual changes caused by recoil, reloads, and weapon switches.
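Since clips are contiguous, a loader can cheaply confirm that nothing was dropped before training: annotation frame indices should run 0, 1, 2, and so on without gaps or repeats, and their count should match the number of decoded video frames. A sketch of that check, assuming each annotation row carries a zero-based frame index (the function name and inputs are hypothetical):

```python
def check_contiguous(frame_indices: list[int], num_video_frames: int) -> None:
    """Raise ValueError if annotations skip, repeat, or reorder frames,
    or if their count disagrees with the decoded video length."""
    if len(frame_indices) != num_video_frames:
        raise ValueError(
            f"{len(frame_indices)} annotation rows vs {num_video_frames} video frames"
        )
    for expected, actual in enumerate(frame_indices):
        if actual != expected:
            raise ValueError(f"expected frame {expected}, found {actual}")

check_contiguous([0, 1, 2, 3], num_video_frames=4)      # passes silently
try:
    check_contiguous([0, 1, 3, 4], num_video_frames=4)  # frame 2 is missing
except ValueError as err:
    print(err)  # expected frame 2, found 3
```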
Use Cases for World Models
CS2-10k is designed specifically as a training substrate for interactive world models: models that must predict how first-person visual observations evolve in response to control inputs. Below are representative use cases:
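Whatever the downstream task, the basic training example for such a model is a transition: a window of frames, the action recorded on each of them, and the frame that follows. A minimal sketch of slicing one clip into such windows for next-frame prediction; the function name and windowing scheme are illustrative assumptions, not part of the release:

```python
from collections.abc import Sequence
from typing import TypeVar

F = TypeVar("F")  # frame type, e.g. a decoded image array
A = TypeVar("A")  # action type, e.g. an action vector

def transition_windows(
    frames: Sequence[F], actions: Sequence[A], context: int, stride: int = 1
) -> list[tuple[list[F], list[A], F]]:
    """Return (context_frames, context_actions, target_frame) tuples.

    The model sees `context` consecutive frames plus the action recorded on
    each, and must predict the frame immediately after the window.
    """
    if len(frames) != len(actions):
        raise ValueError("frames and actions must be aligned one-to-one")
    if context < 1 or stride < 1:
        raise ValueError("context and stride must be positive")
    windows = []
    # Stop early enough that a target frame always exists after the window.
    for start in range(0, len(frames) - context, stride):
        end = start + context
        windows.append((list(frames[start:end]), list(actions[start:end]), frames[end]))
    return windows

# Toy clip: frames labelled by index, actions by the key held on that frame.
frames = ["f0", "f1", "f2", "f3", "f4"]
actions = ["W", "W", "A", "A", "D"]
for ctx_frames, ctx_actions, target in transition_windows(frames, actions, context=2):
    print(ctx_frames, ctx_actions, "->", target)
```

The action on the final frame of a clip never appears in a window, since no later frame exists to show its effect.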
Rendering Pipeline
The pipeline that produced CS2-10k is open-source at github.com/reka-ai/cs2-dem-renderer. Given a .dem file, it performs a two-pass parse to extract per-player spawn/death intervals and per-frame button inputs, then drives CS2's built-in demo replay system via a lightweight server plugin to render clean first-person video for each player round. Frames are streamed in real time from CS2's movie output to ffmpeg (VAAPI HEVC), producing .mp4 clips alongside synchronized .parquet annotation files. A worker mode processes entire directories of demos with automatic deduplication, making it straightforward to run at the scale of CS2-10k.
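Conceptually, the first pass of that parse pairs each player's spawn and death events into (start, end) tick intervals that become clip boundaries. A self-contained sketch of the pairing step, assuming a flat, tick-ordered event list and a 64-tick demo; the event format, function names, and tick rate are assumptions for illustration, and cs2-dem-renderer's actual data structures may differ:

```python
from collections import defaultdict
from collections.abc import Iterable
from typing import Optional

TICK_RATE = 64  # assumed demo tick rate (ticks per second)

def life_intervals(
    events: Iterable[tuple[str, Optional[str], int]],
) -> dict[str, list[tuple[int, int]]]:
    """Pair spawn/death events into per-player (start_tick, end_tick) intervals.

    `events` holds (kind, player, tick) tuples in tick order, where kind is
    "spawn", "death", or "round_end" (player is None for round_end).
    Players still alive at round_end get an interval closed at that tick.
    """
    open_since: dict[str, int] = {}  # player -> tick of current spawn
    intervals: defaultdict[str, list[tuple[int, int]]] = defaultdict(list)
    for kind, player, tick in events:
        if kind == "spawn":
            open_since[player] = tick
        elif kind == "death" and player in open_since:  # stray deaths are ignored
            intervals[player].append((open_since.pop(player), tick))
        elif kind == "round_end":
            for p, start in open_since.items():
                intervals[p].append((start, tick))
            open_since.clear()
    return dict(intervals)

def interval_seconds(start: int, end: int, tick_rate: int = TICK_RATE) -> float:
    """Duration of a tick interval in seconds."""
    return (end - start) / tick_rate

events = [
    ("spawn", "alice", 0), ("spawn", "bob", 0),
    ("death", "bob", 1920),     # bob dies 30 s in at 64 ticks/s
    ("round_end", None, 6400),
]
print(life_intervals(events))  # {'bob': [(0, 1920)], 'alice': [(0, 6400)]}
```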


