Scientists have urgent questions and piles of structured files, but only a few hours and few reliable tools to analyze them. Sound familiar? The Allen Institute for AI has introduced Asta DataVoyager, a tool that lets you query datasets in natural language and get reproducible answers with visuals and ready-to-use code. (allenai.org)
What is Asta DataVoyager
Asta DataVoyager is a feature inside the Asta ecosystem designed to make discovery and analysis of structured data easy. You upload a file in a common format such as CSV, Excel (.xlsx), JSON/JSONL, HDF5, TSV or Parquet, ask your question in plain language, and the tool returns a full package: a concise answer, copyable visuals, reproducible code and a methods section that documents assumptions and statistical tests. Everything is built so the results are easy to share and audit. (allenai.org)
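To make the format list concrete, here is a minimal sketch of reading the text-based formats (CSV, TSV, JSON, JSONL) into one uniform list of records using only the Python standard library. It is an illustration, not how Asta DataVoyager ingests files: the function name `load_records` and the sample data are assumptions made for this example, and the binary formats (Excel, HDF5, Parquet) are left out because they need third-party libraries.

```python
import csv
import io
import json


def load_records(text: str, fmt: str) -> list[dict]:
    """Parse text in one of a few structured formats into a list of row dicts.

    Hypothetical helper for illustration only; not part of Asta DataVoyager.
    """
    fmt = fmt.lower()
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        # DictReader uses the first row as column names; values stay strings.
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    if fmt == "json":
        data = json.loads(text)
        # Accept either a top-level list of objects or a single object.
        return data if isinstance(data, list) else [data]
    if fmt == "jsonl":
        # JSONL holds one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")


# Made-up sample data in two of the formats.
csv_rows = load_records("id,weight\n1,70.5\n2,82.0\n", "csv")
jsonl_rows = load_records(
    '{"id": 1, "weight": 70.5}\n{"id": 2, "weight": 82.0}\n', "jsonl"
)
```

Note that the CSV path yields strings while the JSON paths yield typed values, which is one reason a documented methods section about type assumptions is useful.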
Want to dig deeper? You can request follow-ups like "Control for baseline weight" or "Use non-parametric tests", and Asta adds new cells to the output so that every analytical step stays traceable. That makes the interaction feel like working in a Python notebook, but without starting from scratch. (allenai.org)
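For readers unfamiliar with the second follow-up, here is a rough sketch of what a non-parametric comparison involves: the Mann-Whitney U statistic, computed by direct pair counting. This is an assumption about the kind of test such a request might lead to, not the code Asta DataVoyager generates; the function name and the two samples are made up for illustration.

```python
def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """Return the Mann-Whitney U statistic for sample x versus sample y.

    U counts the pairs (xi, yj) where xi > yj; tied pairs count as 0.5.
    It depends only on the ordering of values, not on their distribution.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Made-up measurements for two groups.
treated = [5.1, 6.3, 7.0, 8.2]
control = [4.0, 4.8, 5.1, 6.0]

u_treated = mann_whitney_u(treated, control)  # 14.5
u_control = mann_whitney_u(control, treated)  # 1.5

# The two statistics always sum to the number of pairs.
assert u_treated + u_control == len(treated) * len(control)
```

The sketch stops at the statistic. In practice a p-value would come from a statistics library, for example SciPy's `scipy.stats.mannwhitneyu`.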
Why it matters for clinical research
One of the first prototypes was developed with the Cancer AI Alliance, which set up a federated instance of Asta DataVoyager. In this setup, the models travel to clinical centers to learn locally from deidentified data so records never leave institutional firewalls. That architecture enables multicenter analyses while protecting patient privacy. (allenai.org)
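The principle of sending the computation to the data can be illustrated with a much simpler case than model training. In the sketch below, each site reduces its own records to three aggregate numbers, and a coordinator pools those aggregates into a global mean and sample variance without ever seeing an individual record. This is a simplified illustration under stated assumptions, not the Cancer AI Alliance architecture: the names `SiteSummary`, `summarize_locally` and `pool`, and the values, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates that leave a site; no individual records are included."""
    n: int
    total: float
    total_sq: float


def summarize_locally(values: list[float]) -> SiteSummary:
    """Runs inside a site's firewall and returns only aggregate numbers."""
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))


def pool(summaries: list[SiteSummary]) -> tuple[float, float]:
    """Combine per-site aggregates into a global mean and sample variance."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from pooled sums: (sum(x^2) - n * mean^2) / (n - 1).
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Made-up values (for example, days from therapy to surgery) at two sites.
site_a = [30.0, 42.0, 36.0]
site_b = [28.0, 44.0]

mean, variance = pool([summarize_locally(site_a), summarize_locally(site_b)])
# mean is 36.0 and variance is 50.0, the same as if the data had been merged.
```

Real federated systems add safeguards that this sketch omits, such as minimum cell sizes, since aggregates from very small sites can still reveal information about individuals.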
As a concrete example, researchers are preparing a federated study on lung cancer to explore topics like time to surgery after neoadjuvant chemo‑immunotherapy, the effect of adding immunotherapy after radiation, and comparisons between targeted drugs and standard chemotherapy. If the prototype works, this could generate real‑world findings that help improve care. (allenai.org)
"We’re excited about the possibility of offering powerful, secure analytical tools to oncology researchers who may not have AI expertise," said Jeff Leek, PhD, VP and Chief Data Officer at Fred Hutch Cancer Center. (allenai.org)
Security, control, and reproducibility
Asta DataVoyager was built to keep teams fully in control of their data. You can delete datasets from the hosted console or deploy the tool on-premises, in private data centers, or private clouds. That flexibility is key for groups handling sensitive data or needing to meet clinical and regulatory requirements. (allenai.org)
Plus, the output includes a methods section that documents decisions and tests, making it easier to reproduce and review analyses. In practice, this helps a collaborator, reviewer or auditor understand not only the finding but how it was reached. (allenai.org)
Practical uses for different profiles
Are you a researcher at a small university, a product manager with user data, or a journalist working with public datasets? Asta DataVoyager can shorten the path from question to result without forcing you to program everything from scratch. Imagine turning a CSV of survey responses into presentation-ready visuals in minutes, or reproducing a statistical analysis for a report with the exact code that produced the conclusion.
If you work in health, model federation opens the door to collaborations between institutions without moving sensitive data. If you’re at a startup, receiving ready-made code and visuals cuts development time and improves transparency for investors or customers.
How to access and next steps
Asta DataVoyager is already available as part of the Asta ecosystem, and the Allen Institute invites teams to request access for pilots and secure deployments. You can also install it on your own infrastructure, and there’s a form to request early access or discuss projects with the team. (allenai.org)
If you’re interested in using AI to speed up reproducible analyses without giving up control of your data, this tool is worth a try. What questions could you answer today if you had visuals, code and methods ready in minutes?