
Jul 10, 2025

Reka Flash 3.1 and Reka Quant

At Reka, we build intelligence from the ground up to power our multimodal solutions such as Reka Research and Reka Vision. Today, we are excited to open source a few of our building blocks:

  • Reka Flash 3.1, an improved version of Reka Flash 3 built on significant advances in our reinforcement learning stack. It is particularly strong at coding and as a base model for finetuning on agentic tasks.

  • A 3.5-bit quantized version of Reka Flash 3.1 that delivers state-of-the-art performance at low bitwidths using calibrated error reduction and self-distillation.

  • Reka Quant, our quantization library, which supports self-distillation, fast distributed proxy-Hessian computation for LDLQ, and export to popular llama.cpp datatypes such as Q3_K and Q4_K (a sketch of the LDLQ idea follows this list).
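
For intuition, the sketch below shows the core error-feedback loop that LDLQ (equivalently, GPTQ) performs with a proxy Hessian. This is an illustrative NumPy reimplementation of the published algorithm, not Reka Quant's actual code: the toy quantizer, damping constant, and shapes are arbitrary stand-ins.

```python
import numpy as np

def round_to_grid(w, scale=0.05, levels=7):
    """Toy symmetric uniform quantizer (stand-in for a Q3_K-style grid)."""
    return np.clip(np.round(w / scale), -levels, levels) * scale

def ldlq_quantize(W, H, quant=round_to_grid):
    """Column-by-column error-feedback quantization (LDLQ / GPTQ).

    W: (rows, cols) weight matrix.
    H: (cols, cols) proxy Hessian, e.g. E[x x^T] over calibration data.
    Greedily reduces the proxy loss tr((W - Q) H (W - Q)^T).
    """
    d = W.shape[1]
    # small damping keeps the inverse well conditioned
    Hd = H + 1e-4 * np.mean(np.diag(H)) * np.eye(d)
    Hinv = np.linalg.inv(Hd)
    W = W.astype(np.float64).copy()
    Q = np.zeros_like(W)
    for j in range(d):
        Q[:, j] = quant(W[:, j])
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        # push this column's rounding error onto the not-yet-quantized columns
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return Q

# toy usage: random weights and a calibration-derived proxy Hessian
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 64))   # calibration activations
H = X.T @ X / len(X)              # proxy Hessian E[x x^T]
W = rng.normal(scale=0.1, size=(32, 64))
Q = ldlq_quantize(W, H)
print("proxy loss:", np.trace((W - Q) @ H @ (W - Q).T))
```

Because each column's rounding error is compensated by the columns quantized after it, weighted by the inverse proxy Hessian, this calibration-aware rounding typically beats naive round-to-nearest at the same bitwidth.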

Reka Flash 3.1 improves on Reka Flash 3 by 10 points on LiveCodeBench v5. On coding-related tasks, Reka Flash 3.1 is competitive with models such as Qwen3-32B and o3-mini. These advances come from major upgrades to our RL stack, including a new RL algorithm and significant scalability improvements. If you want to learn more about how we do reinforcement learning for Reka Flash 3.1, please check out this post.

While Reka Flash 3.1 is already compact at 21 billion parameters, quantization reduces its memory footprint even further, letting it run in resource-constrained settings and be served cost-efficiently. Reka Quant achieves near-lossless quantization to 3.5 bits when quantizing Reka Flash 3.1 to the Q3_K_S datatype in llama.cpp, incurring only a 1.6-point average performance degradation. In contrast, llama.cpp's native Q3_K_S quantization routine results in a 6.8-point average degradation. We provide a more detailed discussion of our quantization approach in this post.
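
As rough back-of-envelope arithmetic, the footprint savings look like this (weights only; bits-per-weight figures are approximate, real GGUF files add metadata, and serving also needs KV-cache memory):

```python
# Approximate weights-only memory footprint of a 21B-parameter model.
# Bits-per-weight values are rough averages for each format.
PARAMS = 21e9

for name, bits_per_weight in [
    ("bf16", 16.0),
    ("Q4_K (~4.5 bpw)", 4.5),
    ("Q3_K_S (~3.5 bpw)", 3.5),
]:
    gib = PARAMS * bits_per_weight / 8 / 2**30
    print(f"{name:>18}: ~{gib:.1f} GiB")

# bf16   : ~39.1 GiB
# Q3_K_S : ~8.6 GiB  -> roughly a 4.5x smaller weight footprint
```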

How Reka Flash 3.1 Powers our Solutions

Strong reasoning and coding skills are essential for multimodal agentic use cases, and near-lossless quantization lets us deploy our models anywhere. A multimodal version of Reka Flash 3.1 serves as the base model for our core products, Reka Research and Reka Vision.

Reka Research is a state-of-the-art agentic AI that can browse the web and private documents to answer complex questions. A customized version of Reka Flash 3.1 is used as the orchestrator for Reka Research. We optimize it end-to-end with the same reinforcement learning infrastructure that we use to train Reka Flash 3.1. Learn more about it here.

Reka Vision is our platform for visual understanding and search. It is powered by the multimodal understanding capabilities of a multimodal version of Reka Flash 3.1. See this post for more details about Reka Vision.

Try It Now

Reka Flash 3.1 and Reka Quant are available for free from our Hugging Face and GitHub accounts. Reka Flash 3.1 is also available via our API under the model name reka-flash-3.1 and on our Playground.
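
As an illustrative quick start, the open weights can be loaded with Hugging Face transformers along the following lines. The repo id below is an assumption based on our earlier release naming; see our Hugging Face page for the exact identifier.

```python
# Illustrative loading sketch; the repo id is assumed, check the
# Hugging Face page for the exact identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RekaAI/reka-flash-3.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a palindrome checker in Python."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```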

Please contact us for more information about how you can customize them for your organization. We are hiring! If you are excited to advance the frontier of multimodal AI research, apply to join us here.