Reka Flash Updates: Advanced Multimodal Understanding, Improved Reasoning, Better Agent Building Blocks, and Speech Input-Output

At Reka, we develop multimodal AI models that enable next-generation AI products—including AI agents that can see, hear, and speak. Today, we are excited to announce the latest update of Reka Flash.

Reka Flash is one of the few models publicly available that supports interleaved multimodal inputs (text, image, video, audio) in a single model with a context length of 128K tokens. It delivers outsize performance at 21 billion parameters, making it a cost-effective solution.

We have significantly improved existing capabilities and added new ones to Reka Flash as well as our other models. These capabilities address the most prevalent needs of both consumers and enterprises, forming a solid foundation for developing robust and powerful applications.

The new Reka Flash is available today on Reka Chat and the Reka API. It can also be deployed on-premises or in a private cloud (contact us for more details). In addition, we are partnering with NVIDIA to package this model as an NVIDIA NIM, which optimizes the model for higher throughput and lower latency. The NIM microservice will soon be available on ai.nvidia.com.

1. Image: documents, tables, diagrams, structured output

Reka Flash supports arbitrary image resolution and aspect ratio, allowing users to upload images of varying sizes. This update delivers enhanced OCR capability and improved understanding of documents, tables, charts, and diagrams.

In addition, the latest version is more capable at following complex multimodal instructions and generating structured outputs.

Reka Flash reads pages in a PDF document as images. Unlike typical RAG systems that operate solely in the text modality by retrieving textual chunks, our version understands both the text and the images in the documents. This is a powerful solution for accessing background knowledge stored in large private document collections. For on-premises deployments, we provide the necessary orchestration to couple this with a retrieval system.
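As an illustration, a multimodal RAG request can interleave rendered PDF pages with a question in a single chat payload. This is a minimal sketch in the style of an OpenAI-compatible chat API; the `reka-flash` model name, the message field names, and the idea of pre-rendering pages to PNG bytes (e.g., with a tool such as pdf2image) are assumptions for illustration, not Reka's exact API schema.

```python
import base64

def pages_to_messages(question, page_images):
    """Interleave rendered PDF pages (PNG bytes) with a text question
    in a single OpenAI-style chat message. Field names are illustrative."""
    content = []
    for png in page_images:
        b64 = base64.b64encode(png).decode("ascii")
        content.append({"type": "image_url",
                        "image_url": f"data:image/png;base64,{b64}"})
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]

# Hypothetical request body; the page bytes would come from a PDF renderer.
payload = {
    "model": "reka-flash",
    "messages": pages_to_messages("Summarize the table on page 2.",
                                  [b"fake-png-1", b"fake-png-2"]),
}
```

The retrieval system's only job in this setup is deciding which rendered pages to include in the payload; the model handles both the text and the visual layout of each page.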

See how our partners use Reka models to enhance their digital library with descriptive metadata or understand multimodal business data.

2. Video: temporal grounding, native audio understanding

Reka Flash now has high-level temporal understanding: it can ground timestamps (refer to specific moments) and natively captures audio conversations and environmental sounds in a video.

The new Reka Flash supports longer videos of up to 3-5 minutes at a time (even longer videos can be processed by streaming them in segments), a major upgrade from the first version, which only supported short videos of up to 1 minute.
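A minimal sketch of the streaming idea for longer videos: split the video into fixed-length windows with a small overlap, so that events near a boundary appear whole in at least one window, then send each window as its own request. The 3-minute window and 10-second overlap here are illustrative choices, not documented limits.

```python
def chunk_spans(duration_s, window_s=180, overlap_s=10):
    """Split a video of `duration_s` seconds into (start, end) windows
    of at most `window_s` seconds, overlapping by `overlap_s` so events
    on a boundary are fully contained in at least one window."""
    spans, start = [], 0
    while start < duration_s:
        spans.append((start, min(start + window_s, duration_s)))
        start += window_s - overlap_s
    return spans

# A 10-minute video becomes four overlapping windows of <= 3 minutes.
spans = chunk_spans(600)
```

Each span would then be cut from the source video (e.g., with a tool like ffmpeg) and processed separately, with the per-window results merged afterwards.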

These capabilities unlock new use cases such as video segment retrieval. For example, our customers can refer to and summarize specific parts of a video or identify relevant clips based on textual descriptions to create video trailers.

3. General reasoning

The updated Reka Flash has improved reasoning and general chat capabilities, which primarily come from advances in our post-training stack. In addition to core algorithmic improvements, we have significantly improved our human and synthetic data workflows, including extensive distillation from our larger and more powerful internal models. In our internal evaluation, Reka Flash's ELO score improved by 43 points, and the overall quality score increased from 66.1% to 72.2%. This is also validated on the LMSYS Chatbot Arena, where Reka Flash's ELO has gradually climbed from 1148 to 1204 (+56) over the past few months.

4. Multilingual

Reka Flash was pretrained on 32 languages. While the new version is better across all languages, we focus on supporting 12 major languages: English, Indonesian, Thai, Vietnamese, Tamil, Hindi, Arabic, Japanese, Korean, Spanish, French, and Italian. The focus on these languages reflects the massive international opportunity for customers and developers in our ecosystem.

5. Instruction following and instruction chaining

Reka Flash has significantly improved at instruction following, including multi-step instructions, making it suitable as a building block for agentic tasks that often require correctly executing a series of instructions.

We observed that the first version of Reka Flash performed poorly on long lists of instructions, often mixing instructions with the input data for a given task, skipping steps, or hallucinating outputs. We believe this capability is key to making agentic workflows feasible, so we created an internal benchmark, MegaTask, to measure progress in this area.

MegaTask is a challenging benchmark: leading turbo-size models such as Gemini Flash (Sep 2024) and GPT-4o-mini (2024-07-18) achieved only 40.4% and 25.9%, respectively. Reka Flash improved to 51.8% from an earlier version that was not able to perform this task at all.
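One practical way to mitigate the instruction/data mixing failure described above is to keep the step list and the input clearly separated in the prompt. This is a hypothetical prompt-construction helper for illustration only; it is not the MegaTask prompt format.

```python
def build_prompt(instructions, data):
    """Lay out a multi-step task so the model cannot confuse the step
    list with the input it should operate on. Purely illustrative."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(instructions, 1))
    return (
        "Follow every numbered step below, in order, and label each "
        "step's output.\n\n"
        f"Steps:\n{steps}\n\n"
        f"Input data:\n{data}"
    )

prompt = build_prompt(
    ["Translate the input to French.", "Count the words in the translation."],
    "The weather is nice today.",
)
```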

6. Function calling, tool use, structured output

Reka Flash is a proficient coder, an important foundational capability for many agentic tasks. With this update, Reka Flash now supports function calling, enabling seamless interaction with user-defined functions. Function calls are returned as structured output, streamlining their use in agentic applications. See our API documentation for how to use function calling.
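A sketch of the structured function-call round trip: the application registers a JSON-schema tool description, and the model's call comes back as structured JSON that the client parses and dispatches to a local function. The schema layout and the `dispatch` helper are illustrative assumptions in the style common to chat-completion APIs, not Reka's exact wire format.

```python
import json

# Hypothetical JSON-schema tool description (illustrative layout).
weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(call_json, registry):
    """Parse a structured function call returned by the model and
    invoke the matching local function with its arguments."""
    call = json.loads(call_json)
    return registry[call["name"]](**call["arguments"])

# Simulated model output; a real call would come back from the API.
model_call = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(model_call, {"get_weather": lambda city: f"Sunny in {city}"})
```

Because the call arrives as structured JSON rather than free text, the dispatch step needs no fragile string parsing, which is what makes this pattern a reliable building block for agents.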

On following general output-format instructions (e.g., asking the model to reply with a single word, or in JSON, XML, or any other format that the user can easily parse), Reka Flash improved from 40.4% to 83.6% on our internal benchmark.
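Structured output is only useful if the client can parse it reliably. A common defensive pattern is to extract the first JSON object from the reply even when the model wraps it in prose or a Markdown code fence; this helper is a generic sketch, not part of Reka's SDK.

```python
import json
import re

def extract_json(reply):
    """Pull the first JSON object out of a model reply, tolerating
    surrounding prose or Markdown code fences."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group())

parsed = extract_json(
    'Sure, here it is:\n```json\n{"label": "positive", "score": 0.9}\n```'
)
```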

7. Speech input and speech output

Reka Flash now understands audio natively, without a separate automatic speech recognition system, and generates speech tokens that can be decoded into audio (our public API does not yet support speech output; contact us for access to this experimental feature). For now, speech input and output work best in English. We are adding support for more languages and voice cloning in the coming weeks.


8. Public Benchmark Results

These updates greatly enhance Reka Flash's capabilities, making it even more valuable for developers, researchers, and businesses alike. Reka Flash is one of the most powerful models of its size, with rich multimodal support that can be deployed flexibly.

We look forward to seeing how you use this new model and the innovative solutions you create with it. Thank you for being a part of the Reka community!
