AutoDS is Ai2's proposal for machines that not only answer questions but also discover which questions are worth asking. Ai2 presented the prototype on July 18, 2025 and released it as open source so that researchers and other curious people can try it. (allenai.org)
What is AutoDS
AutoDS stands for Autonomous Discovery via Surprisal. It's an experimental engine that aims to carry out open-ended scientific discovery continuously, without a human handing it the initial hypothesis. Instead of waiting for a question, AutoDS generates hypotheses, tests them with statistical experiments, and uses those results to propose new hypotheses in an iterative cycle. (allenai.org)
Sounds like science fiction? Not really. The idea is to reproduce part of a scientist’s workflow: think, test, learn, and reformulate questions. That turns tedious exploration tasks into automated processes that can suggest unexpected directions.
How it works (in plain terms)
AutoDS relies on two key ideas: Bayesian surprise and a tree-search style of exploration called Monte Carlo Tree Search (MCTS).
- Bayesian surprise: measures how much a model's beliefs change when it sees new evidence. If a result shifts your prior a lot, it's surprising, and that surprise signals something worth investigating. Think of it like noticing an ingredient suddenly altering the taste of a familiar recipe; your reaction tells you to dig deeper. (allenai.org)
- MCTS with progressive widening: a technique to explore many possible hypotheses without getting lost. MCTS helps pick which hypotheses to develop, and progressive widening controls how the search expands when options are effectively infinite. The system uses the surprise computed by a language model as a reward signal to guide that search. (ar5iv.org)
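To make the idea of Bayesian surprise concrete, here is a minimal Python sketch. It is not AutoDS's code: it assumes the belief is a single probability that a hypothesis is true, updates it with Bayes' rule, and scores surprise as the KL divergence from prior to posterior. The function names and the likelihood numbers are hypothetical and chosen only for illustration.

```python
import math

def bayes_update(prior: float, lik_if_true: float, lik_if_false: float) -> float:
    """Posterior probability that the hypothesis is true after seeing the evidence."""
    evidence = prior * lik_if_true + (1 - prior) * lik_if_false
    return prior * lik_if_true / evidence

def bernoulli_kl(posterior: float, prior: float) -> float:
    """KL(posterior || prior) between two Bernoulli beliefs, in nats."""
    eps = 1e-12  # clamp to avoid log(0) at the extremes
    q = min(max(posterior, eps), 1 - eps)
    p = min(max(prior, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

# Illustrative numbers only (assumptions, not taken from the paper):
prior = 0.2  # belief before the experiment
posterior = bayes_update(prior, lik_if_true=0.9, lik_if_false=0.1)
surprise = bernoulli_kl(posterior, prior)

print(f"posterior={posterior:.3f}, surprise={surprise:.3f} nats")
```

A result that barely moves the belief yields a surprise near zero, while one that flips a confident prior yields a large value, which is the property the search needs from its reward.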
Imagine AutoDS as someone using their own astonishment to decide where to look next.
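The search side can be sketched in a few lines of Python as well. This is a toy under stated assumptions, not the AutoDS implementation: `propose` and `reward` are hypothetical stand-ins for the language-model calls that would generate a hypothesis and score its surprise, the widening rule (allow a new child while the child count is below k * visits^alpha) is one common formulation, and whether the repository's `exploration_weight` maps onto the UCB constant used here is also an assumption.

```python
import math
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    hypothesis: str
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0

def may_widen(node: Node, k: float = 1.0, alpha: float = 0.5) -> bool:
    # Progressive widening: the allowed number of children grows
    # slowly with the number of visits instead of being unbounded.
    return len(node.children) < k * max(node.visits, 1) ** alpha

def ucb(child: Node, parent_visits: int, exploration_weight: float) -> float:
    if child.visits == 0:
        return math.inf
    mean = child.total_reward / child.visits
    bonus = exploration_weight * math.sqrt(math.log(parent_visits) / child.visits)
    return mean + bonus

def search(root: Node, propose, reward, n_iterations: int = 50,
           exploration_weight: float = 1.0) -> Node:
    for _ in range(n_iterations):
        node = root
        # Selection: descend while the current node may not add a new child.
        while node.children and not may_widen(node):
            parent = node
            node = max(parent.children,
                       key=lambda c: ucb(c, parent.visits, exploration_weight))
        # Expansion: attach one new hypothesis under the selected node.
        child = Node(propose(node), parent=node)
        node.children.append(child)
        # Evaluation: score the new hypothesis (surprise, in AutoDS's case).
        r = reward(child.hypothesis)
        # Backpropagation: update statistics up to the root.
        current = child
        while current is not None:
            current.visits += 1
            current.total_reward += r
            current = current.parent
    return root

# Hypothetical stand-ins for the language-model calls:
rng = random.Random(0)
propose = lambda node: f"{node.hypothesis}/h{len(node.children)}"
reward = lambda hypothesis: rng.random()

root = search(Node("root"), propose, reward, n_iterations=50)
print(root.visits, len(root.children))
```

With alpha set to 0.5, a node visited 50 times holds only a handful of children, so the budget goes toward deepening promising branches rather than endlessly proposing siblings.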
Early results that catch attention
The authors tested AutoDS on 21 real datasets from fields like biology, economics, and finance. Under a fixed budget, AutoDS produced 5–29% more discoveries judged surprising by the model compared to competing methods. In a human evaluation of over 500 hypotheses, about two-thirds of AutoDS’s findings were also surprising to experts with master’s or PhD-level training. (ar5iv.org)
The team acknowledges limits: AutoDS isn't always fast, and its discoveries require rigorous academic review before being accepted as valid. (allenai.org)
How you can try it today
Ai2 released the code on GitHub and the paper on arXiv, so researchers, students, and curious entrepreneurs can try it. The repo includes setup instructions, example datasets, and commands to run MCTS-based explorations with models like gpt-4o. (github.com)
Practical quick steps:
- Clone the repository and create the environment exactly as shown in the README.
- Use example datasets (DiscoveryBench, BLADE) or bring your own with a JSON metadata file.
- Run run.py, tweaking parameters like n_experiments, exploration_weight, or the belief_model to see what hypotheses the system generates. (github.com)
Concrete example: a small biomedical team could use AutoDS to explore unexpected correlations in a public dataset, generate surprising hypotheses, and prioritize which ones to validate in the lab. It doesn’t replace human judgment, but it speeds up initial exploration.
Risks, limits and good practices
Automating curiosity does not remove the need for human scrutiny.
AutoDS can point you toward unexpected findings, but those findings may still contain biases, statistical mistakes, or data artifacts. That’s why you must:
- Verify results with independent tests.
- Review model assumptions and data quality.
- Avoid blindly trusting surprise signals without human interpretation.
Ai2 itself calls for caution and peer review before celebrating discoveries. (allenai.org)
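As one way to act on the first item in that list, a surprising group difference can be re-checked on held-out data with a permutation test. The sketch below is a generic statistical check using only the Python standard library, not part of AutoDS; the function name and the sample numbers are made up for illustration.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle labels: under the null, group membership is arbitrary.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_permutations + 1)

# Made-up held-out measurements, for illustration only:
treated = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5]
control = [4.2, 4.0, 4.5, 4.1, 3.9, 4.4]
p_value = permutation_p_value(treated, control)
print(f"p = {p_value:.4f}")
```

A small p-value on data the system never saw is evidence that the effect is not an artifact of the original sample, though it says nothing about confounders or data quality, which still need human review.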
Who it’s useful for and why it matters
- Researchers with limited resources who need to prioritize experiments.
- Product teams that want to explore user or market hypotheses from data.
- Educators looking for practical examples to teach the scientific method and critical thinking.
The promise is clear: if a machine can propose valuable questions, you can spend more time designing real experiments, interpreting results, and applying domain intuition. But this isn’t a magic wand; it’s a tool to amplify human curiosity.
Where to read more
- GitHub repo: allenai/autods. (github.com)
- Technical paper: Open-ended Scientific Discovery via Bayesian Surprise (arXiv). (ar5iv.org)
- Official Ai2 announcement: July 18, 2025 blog post. (allenai.org)