NVIDIA AI-Q leads DeepResearch Bench I and II | Keryc
NVIDIA AI-Q reached first place in DeepResearch Bench I (55.95) and DeepResearch Bench II (54.50). Why does this matter? Because it shows that an open, configurable, and reproducible stack can compete in complex automated-research tasks: retrieving evidence, synthesizing analysis, and producing high-quality cited reports.
What NVIDIA AI-Q achieved
AI-Q is not just a model: it's an open blueprint to build research agents that work over enterprise and web data and deliver answers with verifiable citations. With a single configurable stack, NVIDIA achieved top performance on two complementary benchmarks that measure both narrative quality and fine-grained factual correctness.
DeepResearch Bench I rewards final-report quality: comprehensiveness, depth of insight, instruction-following, and readability. DeepResearch Bench II uses more than 70 binary rubrics per task to evaluate information retrieval, analysis, and presentation. Leading both means AI-Q doesn't just write well: it also finds and analyzes the right evidence.
Core architecture: multi-agent and modular
The AI-Q deep researcher architecture is made of three main roles: Orchestrator, Planner, and Researcher. Each can use a different LLM and operate in its own context window, which prevents long, noisy responses from degrading planning.
Orchestrator: coordinates the research loop, calls the Planner, dispatches tasks to the Researcher, fills gaps and produces the long report.
Planner: in two phases (Scout and Architect) it maps the information landscape and designs a research plan with queries and quality constraints.
Researcher: launches specialist subagents in parallel (Evidence Gatherer, Mechanism Explorer, Comparator, Critic and Horizon Scanner) and synthesizes findings into a cited brief.
Optionally, an ensemble layer runs multiple pipelines in parallel and a post-hoc refiner polishes the final report.
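The role separation above can be pictured as plain Python objects, each with its own context. All class and method names here are illustrative stand-ins, not the actual AI-Q API:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch of the Orchestrator/Planner/Researcher split.
# None of these names come from the AI-Q codebase.

SUBAGENTS = ["evidence_gatherer", "mechanism_explorer",
             "comparator", "critic", "horizon_scanner"]

class Planner:
    def plan(self, question):
        # Phase 1 (Scout): map the information landscape.
        # Phase 2 (Architect): emit queries plus quality constraints.
        return {"queries": [f"{question} overview", f"{question} evidence"],
                "constraints": {"min_citations": 3}}

class Researcher:
    def run_subagent(self, role, query):
        return f"[{role}] finding for '{query}'"

    def research(self, plan):
        # Specialist subagents run in parallel, then merge into a brief.
        with ThreadPoolExecutor() as pool:
            jobs = [pool.submit(self.run_subagent, role, q)
                    for q in plan["queries"] for role in SUBAGENTS]
            return [j.result() for j in jobs]

class Orchestrator:
    def __init__(self):
        self.planner, self.researcher = Planner(), Researcher()

    def answer(self, question):
        plan = self.planner.plan(question)          # Planner's own context
        findings = self.researcher.research(plan)   # Researcher's own context
        return "\n".join(findings)                  # gap-filling + long report

report = Orchestrator().answer("solid-state batteries")
```

Because each role is a separate object with its own inputs and outputs, a long, noisy research transcript never enters the Planner's context, which is the isolation property the article describes.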
Open and reproducible stack
The competition implementation relies on available, configurable components:
NeMo Agent Toolkit for wiring workflows, function logging and evaluation. It enables composition via YAML.
LangChain DeepAgents for the planner–researcher–orchestrator flow with middleware for subagents.
NVIDIA Nemotron 3 models fine-tuned for synthesis and tool calling.
Search tools: Tavily for the web and Serper for academic papers.
That flexibility means you can swap LLMs, tools, and agent graphs depending on your use case.
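That swap-ability can be sketched as a registry keyed by role. The component identifiers below are placeholders, not NeMo Agent Toolkit's actual YAML schema:

```python
# Hypothetical component registry illustrating the swappable stack.
# Real AI-Q wiring happens via NeMo Agent Toolkit YAML; this only
# shows the idea of replacing one component without touching the rest.

DEFAULT_STACK = {
    "planner_llm": "nemotron-3",      # placeholder model ids
    "researcher_llm": "nemotron-3",
    "web_search": "tavily",
    "paper_search": "serper",
}

def build_stack(**overrides):
    """Return a stack config with any component swapped out."""
    unknown = set(overrides) - set(DEFAULT_STACK)
    if unknown:
        raise ValueError(f"unknown components: {unknown}")
    return {**DEFAULT_STACK, **overrides}

# Swap in an internal search engine in place of the web search tool.
stack = build_stack(web_search="internal-search")
```

The point of the dict-merge pattern is that every unlisted component keeps its default, so a single override is a one-line change.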
Data, trajectory generation and fine-tuning
NVIDIA built the training base with these stages:
Question collection: ~17k from OpenScholar, 21k from ResearchQA and 2,457 from Fathom-DeepResearch-SFT.
Trajectory generation: ~80k full-workflow trajectories using GPT-OSS-120B as generator. These trajectories include real search results via Tavily and Serper.
Principle-based filtering: completed trajectories were judged with nvidia/Qwen3-Nemotron-32B-GenRM-Principle and ~67k high-quality ones were retained.
Supervised fine-tuning (SFT): Nemotron-3-Super-120B-A12B, 1 epoch, 5,615 steps, ~25 hours on 16×8 NVIDIA H100 GPUs.
That trajectory dataset teaches the model to plan, perform multi-step searches and synthesize with real citations.
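The filtering stage above can be sketched as a score-and-threshold pass. The judge here is a random stub standing in for the real reward model (nvidia/Qwen3-Nemotron-32B-GenRM-Principle), and the threshold is an assumed value chosen to mirror the reported ~80k → ~67k reduction:

```python
import random

# Sketch of principle-based trajectory filtering. judge_score is a
# stub; AI-Q used a 32B generative reward model as the actual judge.

def judge_score(trajectory):
    """Stand-in for a principle-based reward model (returns 0..1)."""
    return random.random()

def filter_trajectories(trajectories, threshold=0.2):
    # Keep only trajectories the judge rates above the quality bar.
    return [t for t in trajectories if judge_score(t) >= threshold]

random.seed(0)
raw = [f"trajectory_{i}" for i in range(1000)]   # stands in for ~80k
kept = filter_trajectories(raw)
# With a uniform stub judge and threshold 0.2, roughly 80% survive,
# similar in spirit to the ~80k -> ~67k cut described above.
```

In the real pipeline the judge evaluates full workflow trajectories (plans, searches, syntheses) against quality principles, not a random score, but the keep/drop structure is the same.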
Middleware for reliability on long horizons
Long runs (32+ tool calls) expose failures that don't show up in short interactions. NVIDIA added specific middleware:
Tool-name sanitization: cleaning, alias resolution and fuzzy matching when the LLM invents names.
Reasoning-aware retry: detects 'thoughts' without a final answer and preserves context for retries.
Budget enforcement: per-agent limits that force synthesis when tool calls run out.
Report validation: minimum checks on length and structure; if they fail, a continuation prompt triggers another attempt.
Each component addresses real failure patterns observed in agent traces.
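Tool-name sanitization is the most self-contained of these to illustrate. A minimal sketch, assuming a small tool registry and alias table (both invented here for illustration), with `difflib` providing the fuzzy match:

```python
import difflib

# Sketch of tool-name sanitization middleware: clean the raw string,
# resolve known aliases, and fuzzy-match when the LLM emits a
# near-miss name. Tool names and aliases are illustrative.

TOOLS = {"web_search", "paper_search", "fetch_page"}
ALIASES = {"search_web": "web_search", "google": "web_search"}

def sanitize_tool_name(raw):
    name = raw.strip().lower().replace("-", "_")    # cleaning
    if name in TOOLS:
        return name
    if name in ALIASES:                             # alias resolution
        return ALIASES[name]
    close = difflib.get_close_matches(name, TOOLS, n=1, cutoff=0.7)
    if close:                                       # fuzzy matching
        return close[0]
    raise ValueError(f"unknown tool: {raw!r}")

# " Web-Search " -> cleaning; "google" -> alias; "web_serch" -> fuzzy.
resolved = [sanitize_tool_name(n)
            for n in (" Web-Search ", "google", "web_serch")]
```

A middleware like this sits between the LLM's tool call and the executor, so a hallucinated name degrades into a recoverable correction instead of a failed run.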
Ensemble and refiner: boosting coverage and polish
The ensemble runs N independent pipelines; an LLM then merges outputs, choosing structure and adding unique content to broaden evidence coverage. The refiner pass rewrites to quantify vague claims, improve entity coverage, cut scaffolding and strengthen causal reasoning.
Practical result: higher information recall and better coherence in the final report without sacrificing readability.
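The ensemble step can be sketched as N independent runs merged with deduplication. In AI-Q an LLM performs the merge, choosing structure and integrating unique findings; the set-based merge below is a deliberately simplified stand-in:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the ensemble layer: run N independent pipelines in
# parallel, then merge drafts, keeping each unique claim once.
# pipeline() is a stub standing in for a full research run.

def pipeline(seed):
    shared = ["finding A", "finding B"]       # claims every run recovers
    unique = [f"finding from run {seed}"]     # run-specific coverage
    return shared + unique

def ensemble(n):
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(pipeline, range(n)))
    merged, seen = [], set()
    for draft in drafts:
        for claim in draft:
            if claim not in seen:             # deduplicate across drafts
                seen.add(claim)
                merged.append(claim)
    return merged

report = ensemble(3)
```

The broadened coverage comes from the run-specific claims: overlapping findings collapse to one copy, while each run's unique evidence survives into the merged report.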
Why this approach matters for companies and developers
Transparency and control: the stack is open and configurable, so companies can inspect, audit and adapt every component.
Modularity: you can plug your own LLM into the Planner or Researcher, or connect internal search engines instead of Tavily/Serper.
Reliability for real tasks: the middleware and multi-agent strategy are designed for long, complex runs typical in deep research.
If you work in applied AI or product, this isn't just a future promise: it's a reproducible pattern that already performs well on demanding benchmarks.
NVIDIA AI-Q shows that the path to robust research agents goes through combining a multi-agent architecture, fine-tuning on real trajectories, practical middleware and optional ensemble and refiner steps. What's the takeaway for you? It's not always a single bigger LLM: orchestration, trajectory quality and robustness engineering make the difference when tasks are long and demanding.