Data agent that thinks like a scientist in DABStep

NVIDIA KGMON (NeMo Agent Toolkit) Data Explorer reached first place in the DABStep benchmark with a strategy that separates heavy learning from fast inference. The key idea: build reusable tools during a learning phase, then answer questions with a small, fast agent that orchestrates those tools.

What problem it solves

Agents that rely on text search fail when the information lives in tables and requires multi-step reasoning. Complex questions about tabular data can't be answered from a web snippet. Have you ever seen a model answer one question well, then get lost when it has to join two CSV files and apply a set of business rules?

This project was built for that: multi-step questions, stateful tools, and strict validation.
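To make the kind of question concrete, here is a toy sketch of a multi-step query that joins two CSV tables and applies a business rule. The data, the column names, the `total_fees` helper, and the fee formula (fixed fee plus a rate times the amount) are all hypothetical illustrations, not taken from the benchmark.

```python
import csv
import io

# Hypothetical payments table: one row per transaction.
PAYMENTS_CSV = """merchant,card_scheme,amount
shop_a,visa,100.00
shop_a,mastercard,50.00
shop_b,visa,200.00
"""

# Hypothetical fee rules table: one row per card scheme.
FEES_CSV = """card_scheme,fixed_fee,rate
visa,0.10,0.02
mastercard,0.20,0.03
"""


def total_fees(merchant: str) -> float:
    """Answer 'what total fee did this merchant pay?' in three steps."""
    # Step 1: load the business rules, keyed by card scheme.
    fee_rules = {
        row["card_scheme"]: (float(row["fixed_fee"]), float(row["rate"]))
        for row in csv.DictReader(io.StringIO(FEES_CSV))
    }
    total = 0.0
    for row in csv.DictReader(io.StringIO(PAYMENTS_CSV)):
        # Step 2: keep only the rows for the requested merchant.
        if row["merchant"] != merchant:
            continue
        # Step 3: join each payment to its rule and apply the (assumed) formula.
        fixed_fee, rate = fee_rules[row["card_scheme"]]
        total += fixed_fee + rate * float(row["amount"])
    return round(total, 2)


if __name__ == "__main__":
    print(total_fees("shop_a"))
```

None of the three steps can be answered from a text snippet alone: the answer only exists after both tables are read and the rule is applied row by row.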

Architecture in three phases

The central idea is to split responsibilities: spend compute once to produce robust tools, then use those tools many times efficiently.
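The split can be pictured with a minimal sketch: a tool library that is built once, and a lightweight dispatcher that reuses it for every question. The names `TOOL_LIBRARY`, `register_tool`, and `light_agent` are assumptions made for illustration, and the keyword matching is only a stand-in for the small model's tool selection; this is not the project's actual implementation.

```python
from typing import Callable, Dict, List

ToolFn = Callable[[List[float]], float]

# Hypothetical tool library. In the design described above it would be
# produced once by the expensive learning phase; here it is written by hand.
TOOL_LIBRARY: Dict[str, ToolFn] = {}


def register_tool(name: str) -> Callable[[ToolFn], ToolFn]:
    """Decorator that stores a function in the library under a name."""
    def decorator(fn: ToolFn) -> ToolFn:
        TOOL_LIBRARY[name] = fn
        return fn
    return decorator


@register_tool("total")
def total(values: List[float]) -> float:
    return sum(values)


@register_tool("average")
def average(values: List[float]) -> float:
    return sum(values) / len(values)


def light_agent(question: str, values: List[float]) -> float:
    """Stand-in for the small inference agent: choose a tool, then call it.

    A real agent would let a small model choose; keyword matching is used
    here only to keep the sketch runnable without any model.
    """
    lowered = question.lower()
    for name, tool in TOOL_LIBRARY.items():
        if name in lowered:
            return tool(values)
    raise LookupError(f"no tool matches question: {question!r}")


if __name__ == "__main__":
    print(light_agent("What is the total fee?", [1.0, 2.0, 3.5]))
```

The cost of writing and checking each tool is paid once; every later question only pays for the selection step and a function call.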

Learning phase: a heavy model (for example Opus 4.5/4.6) runs in a multi-step loop with a full set of tools (stateful Python interpreter, bash, file-structure detector, retriever). The agent solves several representative cases, validates its answers against ground truth, and synthesizes the solutions into a reusable library and few-shot examples.
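The validate-then-keep step of the learning phase could look roughly like the sketch below. It assumes the heavy model's proposed solutions arrive as plain Python callables (the `candidates` dictionary stands in for them), and the names `Task`, `FewShotExample`, and `learn` are hypothetical. Only candidates whose answer matches the ground truth are promoted into the library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Solution = Callable[[List[float]], float]


@dataclass
class Task:
    question: str
    data: List[float]
    ground_truth: float


@dataclass
class FewShotExample:
    question: str
    tool_name: str


def learn(
    tasks: List[Task], candidates: Dict[str, Solution]
) -> Tuple[Dict[str, Solution], List[FewShotExample]]:
    """Keep only solutions that reproduce the ground truth on some task."""
    library: Dict[str, Solution] = {}
    examples: List[FewShotExample] = []
    for task in tasks:
        for name, solution in candidates.items():
            try:
                answer = solution(task.data)
            except Exception:
                continue  # a crashing candidate is simply not promoted
            if abs(answer - task.ground_truth) < 1e-9:
                library[name] = solution  # validated: reusable tool
                examples.append(FewShotExample(task.question, name))
                break  # first validated candidate wins for this task
    return library, examples


if __name__ == "__main__":
    tasks = [
        Task("What is the largest amount?", [1.0, 5.0, 3.0], 5.0),
        Task("What is the mean amount?", [2.0, 4.0], 3.0),
    ]
    candidates: Dict[str, Solution] = {
        "first": lambda v: v[0],
        "maximum": max,
        "mean": lambda v: sum(v) / len(v),
    }
    library, examples = learn(tasks, candidates)
    print(sorted(library), [e.tool_name for e in examples])
```

The returned examples pair each solved question with the tool that answered it, which is the shape a few-shot prompt for the smaller agent would need.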

System	Easy	Hard	Time/Task	Code Length
NVIDIA KGMON (NeMo Agent Toolkit) Data Explorer + Haiku 4.5	87.5	89.95	20s	1870
Claude Code + Opus 4.5	90.2	66.93	10min	5011
DataPilot from AntGroup	86.11	87.57	unknown	unknown
DS-STAR from Google AI	87.5	45.24	unknown	unknown
