Introducing Parallel Thinking for Reka Research

At Reka, research is at the heart of what we do. When we approach a challenging problem, we brainstorm possible approaches together, weigh them against the evidence, and converge on the strongest solution.

We designed Parallel Thinking to bring this same spirit of exploration into AI reasoning. Rather than following a single line of thought, the model generates several trajectories in parallel. A separate system, trained on verifiable problems, then resolves these candidates into a single final answer.

We have integrated Parallel Thinking into Reka Research, our research agent that can browse the web and private documents to answer complex questions, and found that it brings substantial improvements on benchmarks such as SimpleQA and Research-Eval.

How Parallel Thinking Works

At a high level, Parallel Thinking works in two stages:

Parallel generation. Given an input prompt, we use the standard version of Reka Research to independently generate N candidate responses in parallel. Because the generations are produced concurrently, latency barely increases with N, making it suitable for practical applications where response time is important.
Resolution. A separate model resolves these candidates into the most reliable final answer. This model is initialized from a post-trained version of Reka Flash 3 and trained on verifiable problems to prefer correct solutions over incorrect ones, using candidate trajectories generated by Reka Research.
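The two stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Reka's implementation: `generate_candidate` is a hypothetical stub that simulates one call to the base agent with a short sleep, and `majority_vote` is a simple placeholder resolver swapped in for the trained resolution model described above. Launching the N calls concurrently with `asyncio.gather` is what keeps wall-clock latency close to that of a single call.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, List


# Hypothetical stand-in for one call to the base research agent.
# A real call would be a network request; here we sleep to simulate
# latency and return a canned answer that varies with the index.
async def generate_candidate(prompt: str, index: int) -> str:
    await asyncio.sleep(0.1)  # simulated per-call latency
    return "Paris" if index % 3 else "Lyon"


# Placeholder resolver: majority vote over normalized answers.
# The resolver described in the text is a trained model; majority
# voting is used here only to keep the sketch self-contained.
def majority_vote(candidates: List[str]) -> str:
    counts = Counter(c.strip().lower() for c in candidates)
    answer, _ = counts.most_common(1)[0]
    return answer


async def parallel_thinking(
    prompt: str,
    n: int,
    generate: Callable[[str, int], Awaitable[str]] = generate_candidate,
    resolve: Callable[[List[str]], str] = majority_vote,
) -> str:
    # Stage 1: run all N generations concurrently, so wall-clock time is
    # roughly that of the slowest call rather than N sequential calls.
    candidates = await asyncio.gather(*(generate(prompt, i) for i in range(n)))
    # Stage 2: resolve the candidates into a single final answer.
    return resolve(list(candidates))


if __name__ == "__main__":
    print(asyncio.run(parallel_thinking("What is the capital of France?", n=5)))
```

With `n=5`, three of the stub candidates say "Paris" and two say "Lyon", so the placeholder vote returns the normalized answer `paris`. Because `resolve` is a parameter, the vote could be replaced by a call to a learned resolver without changing the surrounding control flow.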

By adjusting N, developers can balance accuracy and cost. The Reka Research API exposes two thinking modes, low and high, that make this tradeoff easy to control.
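To build intuition for why a larger N helps, and why the gains taper off, consider an idealized model. This is our own simplification, not a description of Reka's system: each candidate is assumed to be independently correct with probability p, and a perfect resolver always picks a correct candidate when one exists. Under those assumptions, accuracy is bounded by `1 - (1 - p) ** N`, while generation cost grows linearly with N.

```python
def oracle_accuracy(p: float, n: int) -> float:
    """Upper bound on accuracy with n independent candidates.

    Assumes each candidate is correct with probability p, independently,
    and that the resolver always selects a correct candidate if any exists.
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 - (1.0 - p) ** n


# Generation cost grows linearly in n, while the bound saturates toward 1.
# p = 0.6 is an arbitrary illustrative value, not a measured Reka figure.
for n in (1, 2, 4, 8):
    print(f"N={n}: accuracy bound {oracle_accuracy(0.6, n):.3f}, cost {n}x")
```

Real candidates are correlated (they come from the same base model and often share mistakes) and real resolvers are imperfect, so measured gains will fall short of this bound. The shape still shows the tradeoff: each doubling of N doubles cost but buys a smaller accuracy increment.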

Results

We find that Parallel Thinking yields strong and consistent improvements across multiple benchmarks. We report results on two of them:

On Research-Eval, the improvement ranges from over 3 to over 4 points depending on the thinking level; in high thinking mode, the score rises from 59.1 to 63.3.
On SimpleQA, we observe a 3-point improvement in low thinking mode, and high thinking mode adds a further half point, reaching a final accuracy of 94.8.

Pricing and Availability

Parallel Thinking is now available in our Research API. At a fixed cost of $35 per 1,000 requests (only $10 more than standard requests), the low thinking mode is an affordable way to improve performance in most practical scenarios. For applications where accuracy is critical, the high thinking mode is available at $60 per 1,000 requests.
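Because pricing is fixed per request, estimating a bill is a simple linear calculation. The sketch below uses the prices stated above; the standard price of $25 per 1,000 requests is inferred from the stated $10 difference, and the mode labels are illustrative dictionary keys, not API parameter values.

```python
# Prices in USD per 1,000 requests, as stated in the text. The "standard"
# price is inferred from "$10 above standard requests" ($35 - $10 = $25).
# The keys are illustrative labels, not actual API parameter values.
PRICE_PER_1K = {"standard": 25.0, "low": 35.0, "high": 60.0}


def estimate_cost(requests: int, mode: str) -> float:
    """Return the estimated cost in USD for a number of requests in a mode."""
    if mode not in PRICE_PER_1K:
        raise ValueError(f"unknown mode: {mode!r}")
    # Fixed per-request pricing: cost scales linearly with request count.
    return requests / 1000 * PRICE_PER_1K[mode]


for mode in ("standard", "low", "high"):
    print(f"{mode:>8}: ${estimate_cost(50_000, mode):,.2f} for 50,000 requests")
```

For example, 50,000 requests cost $1,750.00 in low thinking mode and $3,000.00 in high thinking mode.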

Final Remarks

Parallel Thinking brings a simple but powerful idea into practice: when faced with a challenging problem, it's often better to explore multiple possibilities before committing to an answer. By enabling Reka Research to think in parallel, we obtain substantial improvements in accuracy with minimal latency overhead. Developers can now benefit from these improvements directly in the Research API, with flexible options to balance cost and accuracy.

At Reka, our progress is never static. We continue to iterate, refine, and advance our research and product efforts to make our systems smarter every day. We are hiring across all roles. If you are interested in joining us, please apply here.