Mar 11, 2026
Reka Edge: Frontier-Level Edge Intelligence for Physical AI
We are excited to introduce the latest version of Reka Edge, our next-generation vision language model. At Reka, we are continually advancing visual intelligence to build powerful models optimized for practical, real-world deployment. We believe that frontier intelligence should be fast, lean, and deployable anywhere. Reka Edge is specifically engineered for physical AI applications on the edge, or as a far more cost-effective alternative to cloud-based models. It excels at deep visual reasoning and physical grounding. This extremely efficient 7B parameter model is optimized to deliver industry-leading performance across key capabilities, including image understanding, video analysis, object detection, and agentic tool use.

Streamlining Vision Encoding from the Ground Up
At the heart of Reka Edge is an architectural choice: a VLM built around a convolutional vision encoder based on ConvNeXt V2. The architecture has two main components:
A 657M ConvNeXt V2 vision encoder for efficient processing of streaming videos.
A 6.4B transformer backbone, trained from scratch for reasoning and generation.
Reka Edge produces only 64 tokens per image tile, a design decision that makes it highly token-efficient: it can process high-definition visual data without filling up the context window or slowing down inference. This token efficiency translates directly into faster response times and lower inference costs.
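To make the token budget concrete, here is a minimal sketch of how a fixed per-tile token count keeps high-resolution inputs cheap. The 64-tokens-per-tile figure is from the post; the tile size below is purely an illustrative assumption, since the post does not specify the tiling scheme.

```python
import math

TOKENS_PER_TILE = 64  # stated in the post

def visual_tokens(width: int, height: int, tile: int = 512) -> int:
    """Estimate the visual token count for an image split into tile x tile
    patches, assuming (hypothetically) each tile yields TOKENS_PER_TILE tokens."""
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return tiles * TOKENS_PER_TILE

# Under a hypothetical 512px tiling, a 1024x1024 image becomes 4 tiles,
# i.e. 256 visual tokens.
print(visual_tokens(1024, 1024))  # 256
```

The point of the sketch is that token count grows with tile count rather than raw pixel count, so a compact per-tile representation bounds the context cost of high-definition inputs.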
We trained the model on petabytes of multimodal synthetic and real data to provide robust physical grounding and precise spatial awareness. For object detection and localization, Reka Edge is designed to process detection instructions using the following prompt format:
Detect: {expression}
The expression can refer to a single object (e.g., green banana or cups on the table) or multiple objects (e.g., red car, man with a hat).
The expected target output maps the expression to bounding box coordinates along with the object labels:
<ref>object1</ref><bbox>x1,y1,x2,y2;x3,y3,x4,y4;...</bbox><ref>object2</ref><bbox>x1,y1,x2,y2;...</bbox>...
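A small parser for this output format can be sketched with a regular expression; note that the coordinate type (integer pixel coordinates, rather than normalized floats) is an assumption here, as the post does not specify it.

```python
import re

# Matches each <ref>label</ref><bbox>...</bbox> pair in the model output.
DETECTION_RE = re.compile(r"<ref>(.*?)</ref><bbox>(.*?)</bbox>")

def parse_detections(output: str) -> dict[str, list[tuple[int, int, int, int]]]:
    """Map each detected label to a list of (x1, y1, x2, y2) boxes.
    Multiple boxes for one label are separated by ';' inside <bbox>."""
    detections: dict[str, list[tuple[int, int, int, int]]] = {}
    for label, boxes in DETECTION_RE.findall(output):
        for box in boxes.split(";"):
            box = box.strip()
            if not box:
                continue  # tolerate a trailing ';'
            x1, y1, x2, y2 = (int(v) for v in box.split(","))
            detections.setdefault(label, []).append((x1, y1, x2, y2))
    return detections

result = parse_detections(
    "<ref>red car</ref><bbox>10,20,110,90</bbox>"
    "<ref>man with a hat</ref><bbox>120,30,180,160;200,35,260,170</bbox>"
)
print(result["red car"])  # [(10, 20, 110, 90)]
```

Non-greedy matching (`.*?`) keeps each match confined to one `<ref>`/`<bbox>` pair even when several detections are concatenated in a single response.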
High Performance Across the Board
We conducted an extensive evaluation across a wide range of physical AI benchmarks, including visual understanding, object detection, robustness to hallucinations, and agentic tool use. We benchmarked Reka Edge against top-tier open-weight models of similar sizes, such as Qwen3.5 9B and Cosmos Reason2 8B, as well as a much larger frontier model, Gemini 3 Pro. The results demonstrate that Reka Edge achieves state-of-the-art performance for its size in video understanding, exhibits a lower hallucination rate, and is highly suitable for agentic tool use. Notably, it approaches the performance of Gemini 3 Pro despite having a fraction of the parameters.

Key Performance Highlights
Exceptional Video and Image Understanding. Reka Edge excels at comprehending video and images, significantly outperforming the next-best model of comparable size on MLVU Test and MMVU. This makes it an ideal choice for applications requiring advanced temporal reasoning, multi-image analysis, and comprehensive video understanding.
Precise Object Detection and Grounding. We engineered the model for precise localization and grounding capabilities. Consequently, Reka Edge demonstrates strong object detection performance on RefCOCO benchmarks, a critical feature for robotics, automotive, and visual search applications.
Best-in-Class Tool Use. Reka Edge demonstrates frontier-level tool calling abilities on Mobile Actions. This is essential for building robust on-device autonomous systems that seamlessly interact with external tools and APIs.
Robust Reliability and Truthfulness. Strong performance across leading hallucination benchmarks confirms Reka Edge’s reliable factual grounding and resistance to challenging adversarial examples.
Industry-Leading Latency and Token Efficiency
We put Reka Edge's inference performance to the test, comparing it against leading open-weight models in both streaming and high-volume batch processing scenarios. To ensure a fair comparison, all open-weight models were hosted on our inference cluster and served with vLLM.
We focused on key metrics: throughput, average latency, Time to First Token (TTFT), and the number of input tokens required to process a 1024x1024 image under a concurrent workload of 32 requests. The combination of our fully convolutional vision encoder and compact visual representations delivers substantial improvements across the board.
Token Efficiency, Throughput, and Latency
Reka Edge's efficiency starts with its compact visual representations. This design allows it to process visual data with a fraction of the typical context overhead while maintaining top-tier performance. When analyzing a standard 1024x1024 image, Reka Edge requires approximately 3x fewer tokens compared to other models of comparable size. This efficiency is a foundation for its speed.
By operating on significantly fewer input tokens, Reka Edge achieves massive gains in throughput and speed. In our experiments, it processes 5.46 images per second*. That's more than 2x faster than Cosmos Reason2 8B and Qwen3.5 9B.
This combination of low overall latency and high throughput makes Reka Edge an ideal model for demanding agentic workflows, where fast, end-to-end execution is critical. Furthermore, for interactive applications where users expect instant feedback, Reka Edge shines by delivering a Time to First Token (TTFT) of just 0.522 seconds.

*Note: Gemini is evaluated via API. Other models are hosted locally for fair comparison.
Deploy Anywhere
At 7B parameters, Reka Edge is a powerful yet efficient model, perfectly suited for deployment across a variety of environments: on-device, on-premise, and in the cloud. The applications are extensive and diverse, empowering you to build:
Real-time Physical AI: Powering robots, drones, vehicles, and advanced camera systems that demand instant object detection and deep scene understanding across multiple video streams.
Media Intelligence: Delivering real-time video captioning, logo detection, highlight generation, and automated metadata tagging for both video archives and live content platforms.
Extended Reality Experiences: Enabling low-latency applications like real-time visual assistance, contextual awareness, and cutting-edge augmented reality on smartphones and smartglasses.
Visual-Driven Automation: Creating sophisticated agentic workflows that can orchestrate tools and APIs based on seamless, instant visual input.
Reka Edge not only delivers the frontier-level performance your applications require but does so with the infrastructure efficiency your business demands.
Reka Edge is optimized for memory efficiency through quantization, enabling real-world deployment even on devices with strict power and VRAM constraints. Using 4-bit quantization, it cuts memory consumption significantly, from 13GB to just 5GB (a 62% reduction), while maintaining the model's high-quality multimodal capabilities across text, image, and video understanding. This allows teams to bring powerful on-device AI into production without compromising user experience: Reka Edge retains over 98% of its original performance and delivers up to 2.3x higher throughput. Further memory savings are available with our signature Reka Quant method, which supports a 3.5-bit quantization option. Please contact us for access to these quantized models.
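A back-of-the-envelope calculation shows where figures in this range plausibly come from. This is an illustrative estimate only, not Reka's published memory breakdown: it counts weight storage alone, and the gap to the reported totals would be filled by activations, KV cache, and quantization metadata such as scales and zero-points.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: parameter count times bits per
    weight, converted to bytes. Ignores activations, KV cache, and
    quantization scales/zero-points."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

fp16 = weight_memory_gb(7, 16)  # ~14 GB of weights at 16-bit precision
int4 = weight_memory_gb(7, 4)   # ~3.5 GB of weights at 4-bit precision
print(f"fp16 weights: {fp16:.1f} GB, 4-bit weights: {int4:.1f} GB")
```

The 4x reduction in raw weight storage is what drives the overall footprint down; the remaining runtime memory shrinks less, which is why the end-to-end reduction reported above is 62% rather than 75%.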

Reka Edge is engineered for broad compatibility, deploying easily on platforms including NVIDIA Jetson, Apple Mac devices with Apple Silicon, Linux and Windows PCs, as well as cellphones and wearable devices running Qualcomm Snapdragon.
Get Started Today
Reka Edge is now available through multiple access options, making it easy to explore and deploy in the environment that best fits your needs.
Get started immediately in the Reka Playground for a hands-on introduction to Reka Edge. For developers seeking to integrate the model into applications, our API documentation provides comprehensive guidance.
For teams requiring self-hosted deployment, Reka Edge can be run locally using your own devices. Please consult the model card for instructions on accessing the model via HuggingFace or deploying it with vLLM.
To discuss how Reka Edge can power your specific multimodal use cases, please contact our team.
