AstaBench updates results and gains industrial adoption
AstaBench publishes a new batch of results after evaluating frontier models on more than 2.4K scientific research problems. What do the numbers show about AI's real ability to do cutting‑edge science, and how useful is it today for you as a researcher or developer?
What is AstaBench?
AstaBench is an open benchmark designed to measure whether AI agents can do scientific research with substance and rigor. It's not just a list of tests: it's an evaluation framework, a set of problems, and a collection of baseline agents anyone can use and extend.
The benchmark evaluates four major categories:
literature search and understanding,
writing and running code,
dataset analysis,
and end-to-end discovery workflows.
All code, tools, and baseline agents are open source. The first release shipped alongside Asta, and the paper was presented as an oral at ICLR 2026. The goal is a shared, reproducible measure of whether AI can do science, not just isolated tasks.
New results: key numbers and technical reading
Models were evaluated using the ReAct agent framework; the runs included Claude Opus 4.7, Opus 4.6, Sonnet 4.6, GPT-5.5, GPT-5.4, and Gemini 3.1 Pro Preview.
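For context, ReAct interleaves model reasoning with tool calls: the agent alternates a written thought, an action (search, run code, read a paper), and the resulting observation until it commits to an answer. Here is a minimal sketch of that loop in Python; `call_model` and `run_tool` are hypothetical stubs, not AstaBench's actual harness.

```python
# Minimal ReAct-style loop, for illustration only: the model alternates
# "thought -> action -> observation" until it emits a final answer.
# call_model() and run_tool() are hypothetical stubs, not AstaBench's harness.

def call_model(transcript: str) -> dict:
    # Stub: a real implementation would send the transcript to an LLM.
    return {"thought": "The task is trivial in this stub.", "final": "stub answer"}

def run_tool(action: str) -> str:
    # Stub: a real implementation would dispatch to search/code/analysis tools.
    return f"(observation for {action!r})"

def react_loop(task: str, max_steps: int = 20) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = call_model(transcript)
        transcript += f"Thought: {step['thought']}\n"
        if step.get("final") is not None:
            return step["final"]  # the agent decided it is done
        observation = run_tool(step["action"])
        transcript += f"Action: {step['action']}\nObservation: {observation}\n"
    return "No answer within the step budget."
```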
Aggregated results (overall score, and average cost per problem when reported):
Claude Opus 4.7: 58.0% (average cost $3.54/problem)
Claude Opus 4.6: 55.3%
Claude Sonnet 4.6: 54.5%
Asta v0 (baseline): 53.0%
GPT-5.5: 52.9% (average cost $1.61/problem)
Gemini 3.1 Pro Preview: 49.6%
GPT-5.4: 46.5%
A few important takeaways:
Top scores improved since the last round, but the benchmark is still far from solved.
Improvements aren't uniform across categories: the biggest gains are in Code & Execution and End-to-End Discovery; Data Analysis and Literature Understanding only improved moderately.
Costs rose noticeably, especially for the higher-performance Claude configurations.
GPT-5.5 raises the ceiling for non-Claude models on component tasks (code and analysis), but it still struggles with the harder end-to-end research workflows.
A technical detail worth noting: among the Claude runs, Opus 4.7 gains points at a steep cost. It improves 2.7 points over Opus 4.6 in the overall score but costs roughly 62% more per problem. In End-to-End Discovery the advantage is 10.2 points, but it takes 54% more steps and 65% more cost. Part of the token increase is explained by a new tokenizer in Opus 4.7 that scales token counts by 1.0–1.35x for the same text.
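To put the tokenizer caveat in perspective, here is a back-of-the-envelope adjustment. The 1.0–1.35x scaling range comes from the results above; the raw token count is an invented example, not a measured figure.

```python
# Back-of-the-envelope: how much of Opus 4.7's token growth could be
# explained by the new tokenizer alone. The 1.0-1.35x scaling range comes
# from the report; the raw token count below is an invented example.

raw_tokens_47 = 1_000_000  # hypothetical tokens billed under Opus 4.7

for scale in (1.0, 1.35):
    equivalent_46 = raw_tokens_47 / scale  # same text counted the old way
    inflation = (1 - equivalent_46 / raw_tokens_47) * 100
    print(f"scale {scale:.2f}: {equivalent_46:,.0f} old-tokenizer tokens "
          f"({inflation:.0f}% of the new count is tokenizer inflation)")
```

At the top of the range, about a quarter of the apparent token growth is an accounting effect rather than extra work by the agent.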
GPT-5.5 offers a different quality-to-cost profile: it's 5.1 points behind Opus 4.7 in the aggregate, but at less than half the cost per problem. In other words, it may be the most efficient choice when budget matters more than peak quality. Still, its performance in End-to-End Discovery shows that mastering the components (code, literature, analysis) doesn't guarantee an agent can reliably complete full research workflows.
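One way to frame that choice is incremental score per dollar, computed from the leaderboard numbers above. The metric itself is our framing for illustration, not something AstaBench reports.

```python
# Score-per-dollar comparison using the reported aggregate numbers.
# "Points per dollar" is our framing, not an official AstaBench metric.

runs = {
    "Claude Opus 4.7": (58.0, 3.54),  # (overall %, avg $/problem)
    "GPT-5.5":         (52.9, 1.61),
}

for name, (score, cost) in runs.items():
    print(f"{name}: {score / cost:.1f} points per dollar")

# Opus 4.7 -> ~16.4 pts/$, GPT-5.5 -> ~32.9 pts/$: the cheaper model
# delivers roughly twice the score per dollar, at 5.1 points lower quality.
```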
In Data Analysis tasks the cost per problem stayed low—between $0.18 and $0.44 in frontier runs; End‑to‑End flows remain the most expensive.
A practical example: on E2E-Bench-Hard (take a research idea all the way to working code and a report, without scaffolding), the best run in the original release fully completed only 3% of tasks end to end. The new models raise that percentage, but the result still shows that the intermediate steps (searching, writing code, analyzing, documenting) can each work in isolation without the full chain closing reliably.
Update to the scoring model and transparency
AstaBench updated the models it uses to score ScholarQA-CS2 and End-to-End Discovery, following vendor-recommended upgrade paths. The new End-to-End Discovery scorer is stricter and penalizes fabricated results and placeholder code more consistently. This keeps comparisons fair on the public leaderboard; historical scores were recalibrated where necessary.
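As a toy illustration of the kind of check a stricter scorer can layer in, the sketch below deducts points for placeholder code. The patterns and penalty weights are invented for illustration; the real End-to-End Discovery scorer is model-based and far more nuanced.

```python
import re

# Toy illustration of penalizing placeholder code in a submission.
# Patterns and weights are invented; AstaBench's actual scorer is
# model-based and far more nuanced than this keyword scan.

PLACEHOLDER_PATTERNS = [
    r"\bTODO\b",
    r"\bFIXME\b",
    r"raise NotImplementedError",
]

def placeholder_penalty(code: str, per_hit: float = 0.1, cap: float = 0.5) -> float:
    hits = sum(len(re.findall(p, code)) for p in PLACEHOLDER_PATTERNS)
    return min(hits * per_hit, cap)  # bounded deduction from the task score

submission = "def train_model():\n    raise NotImplementedError  # TODO: implement\n"
print(placeholder_penalty(submission))  # 0.2: two placeholder signals found
```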
Note on costs: the reported figures are the average cost per problem measured by the benchmark under each agent configuration. They reflect differences in harness, tool usage, and number of model calls, so they are not a direct comparison of API prices between providers.
Industrial adoption: who is integrating AstaBench
AstaBench is already moving out of the lab. Recent adoptions and collaborations include:
UK AI Security Institute (UK AISI) and Arcadia Impact, working to incorporate AstaBench into Inspect Evals so that security researchers and developers can use it more easily.
General Reasoning integrated an AstaBench task (SUPER-Expert) as an environment in OpenReward, their platform for RL environments at scale.
Organizations that have submitted agents or shown interest: Elicit, SciSpace, Distyl AI and EvoScientist.
Its openness and reproducibility make AstaBench a contender to become the de facto standard for evaluating agents' scientific capabilities.
Want to try your agent?
Everything you need is in the AstaBench and agent-baselines repositories. AstaBench accepts external submissions to the leaderboard and is working to make the process easier. If you build agents aimed at scientific research, this is a practical, public way to measure progress and compare approaches.
AstaBench doesn't claim AI can already replace scientists. Rather, it offers clear metrics to see which parts of the process AI does well today, which improve quickly, and where the biggest challenges remain to build agents truly capable of end-to-end research.