AutoDiscovery arrives in AstaLabs to change how you interact with your data: instead of waiting for you to ask the right question, the AI explores on its own, generates hypotheses and runs reproducible experiments with code and statistical analysis. Can you imagine spending less time on manual exploration and finding signals hidden in rows and columns? That’s exactly what this experimental tool promises.
What AutoDiscovery is and why it matters
AutoDiscovery is an automated scientific-discovery pipeline integrated into AstaLabs. Rather than starting from a question, it starts from your structured dataset (CSV, JSON, Parquet, etc.) and does the heavy lifting: it generates hypotheses in natural language, proposes experimental plans, writes and runs Python code, interprets statistical results and then formulates new hypotheses.
What does research gain? Speed that stays reproducible, and a systematic search for the unexpected. Teams in marine ecology, oncology and social sciences are already reporting useful findings, some of them verified and even published after independent audit.
AutoDiscovery avoids two common failures in open exploration: wandering aimlessly or replicating training biases. It does this with two clear technical ideas:
Uses Bayesian surprise to prioritize findings that change the system’s beliefs. Before each experiment it holds a prior (a probability distribution elicited by querying the language model). It then observes the data, computes the posterior, and measures how far the belief moved. That magnitude is the surprise.
Navigates the infinite space of questions with MCTS (Monte Carlo Tree Search). MCTS balances exploring new branches and exploiting promising ones, allocating compute resources to the most informative nodes.
Important: AutoDiscovery not only measures how large the surprise is, it also measures its direction. A negative shift (evidence that decreases belief in a hypothesis) can be as valuable as a positive one, because it contradicts prevalent assumptions.
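The prior-to-posterior comparison described above can be illustrated with a minimal sketch. It assumes beliefs are modeled as Beta distributions and that the magnitude of the change is measured with KL divergence; the article does not specify either choice, so `BetaBelief`, `kl_divergence` and `belief_shift` are hypothetical names, and the example parameters are illustrative values only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief that a hypothesis is true, modeled (by assumption) as Beta(alpha, beta)."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def log_pdf(self, x: float) -> float:
        # Log density of the Beta distribution at x, for 0 < x < 1.
        log_norm = (math.lgamma(self.alpha + self.beta)
                    - math.lgamma(self.alpha) - math.lgamma(self.beta))
        return (log_norm + (self.alpha - 1) * math.log(x)
                + (self.beta - 1) * math.log(1 - x))


def kl_divergence(p: BetaBelief, q: BetaBelief, steps: int = 20_000) -> float:
    """Approximate KL(p || q) in nats with a midpoint sum over (0, 1)."""
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) / steps  # midpoints never touch 0 or 1
        log_p = p.log_pdf(x)
        total += math.exp(log_p) * (log_p - q.log_pdf(x))
    return total / steps


def belief_shift(prior: BetaBelief, posterior: BetaBelief) -> tuple[float, float]:
    """Return (magnitude, direction): KL(posterior || prior) and the change in mean."""
    magnitude = kl_divergence(posterior, prior)
    direction = posterior.mean - prior.mean  # negative means belief decreased
    return magnitude, direction


# Illustrative values only: a neutral prior and a posterior that moved upward.
prior = BetaBelief(5.0, 5.0)
posterior = BetaBelief(8.2, 1.8)
magnitude, direction = belief_shift(prior, posterior)
print(f"magnitude={magnitude:.3f} nats, direction={direction:+.2f}")
```

Keeping magnitude and direction as separate values mirrors the point above: a large shift is worth flagging whether the belief went up or down.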
Results that change our expectations are often more interesting than those that simply confirm the obvious. That’s why chasing surprise makes scientific sense.
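To make the tree-search idea concrete, here is a minimal sketch of generic MCTS with UCT selection, where each node is a hypothesis and the reward is a surprise score. It is a textbook variant under stated assumptions, not AutoDiscovery's actual policy: `propose` (which generates follow-up hypotheses) and `evaluate` (which runs an experiment and returns a reward) are hypothetical callables, and the exploration constant `c = 1.4` is an arbitrary choice.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    hypothesis: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0

    def uct(self, c: float = 1.4) -> float:
        """Upper confidence bound: mean reward plus an exploration bonus."""
        if self.visits == 0:
            return math.inf  # always try unvisited hypotheses first
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def search(root_hypothesis: str,
           propose: Callable[[str], List[str]],
           evaluate: Callable[[str], float],
           budget: int) -> Node:
    """Run `budget` experiments, one per iteration, and return the search tree."""
    root = Node(root_hypothesis)
    for _ in range(budget):
        node = root
        # Selection: follow the child with the best UCT score down to a leaf.
        while node.children:
            node = max(node.children, key=Node.uct)
        # Expansion: a leaf that was already tested gets follow-up hypotheses.
        if node.visits > 0:
            node.children = [Node(h, parent=node) for h in propose(node.hypothesis)]
            if node.children:
                node = node.children[0]
        # Evaluation: run the (hypothetical) experiment for this hypothesis.
        reward = evaluate(node.hypothesis)
        # Backpropagation: credit the result to every ancestor.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return root
```

With a reward that reflects surprise, branches that keep producing belief shifts accumulate visits, while the exploration bonus still sends occasional experiments to neglected branches.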
What you’ll see in AstaLabs: interface and traceability
In AstaLabs the execution is transparent. As experiments run, a table appears where each row is a tested hypothesis. You’ll see columns like "Before", "After" and the Surprisal score to understand how much the belief changed.
Also:
The search tree shows the sequence of hypotheses explored.
Clicking a row opens the Inspector Panel with the full hypothesis, the statistical analysis and the actual Python code that was run: everything reproducible.
You can iterate: pass learnings from one run as context for the next.
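As a sketch of that iteration loop, the snippet below keeps table rows in a local structure and turns the most surprising ones into a context string for the next run. Everything here is an assumption for illustration: the article does not describe an export format or an API, so `HypothesisRow`, `top_findings` and `build_context` are hypothetical helpers you would fill in by hand from the table, and the sketch assumes the Surprisal score may be signed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HypothesisRow:
    """One row of the results table (hypothetical local representation)."""
    hypothesis: str
    before: float     # belief shown in the "Before" column
    after: float      # belief shown in the "After" column
    surprisal: float  # Surprisal score; assumed here to carry a sign


def top_findings(rows: List[HypothesisRow], k: int = 3) -> List[HypothesisRow]:
    """Return the k rows with the largest absolute surprisal."""
    return sorted(rows, key=lambda r: abs(r.surprisal), reverse=True)[:k]


def build_context(rows: List[HypothesisRow], k: int = 3) -> str:
    """Summarize the top findings as plain text to paste into the next run's context."""
    lines = ["Learnings from the previous run:"]
    for row in top_findings(rows, k):
        if row.after > row.before:
            direction = "rose"
        elif row.after < row.before:
            direction = "fell"
        else:
            direction = "stayed"
        lines.append(f"- {row.hypothesis} (belief {direction} "
                     f"from {row.before:.2f} to {row.after:.2f})")
    return "\n".join(lines)
```

Ranking by absolute value keeps negative shifts in view, since evidence against a hypothesis can be as informative as evidence for it.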
Concrete case: mutual exclusivity in cancer mutations
A practical example with oncologists showed how AutoDiscovery can navigate huge search spaces. Starting from co-occurrence patterns in breast cancer mutations, the system found a branch suggesting mutual exclusivity between PIK3CA and TP53.
Prior: mean 0.50 (neutral uncertainty)
Posterior after analysis: mean 0.82
Result: strong increase in belief and a high surprisal score, so it was flagged for follow-up.
Researchers appreciated that the signal emerged from an exploration that would be infeasible by hand and that AutoDiscovery proposed concrete validation steps.
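The article does not say which statistical test AutoDiscovery ran for this hypothesis. One common way to check mutual exclusivity by hand is a one-sided Fisher's exact test on a 2×2 table of mutation counts, sketched below with the standard library only. The function name and the counts in the usage lines are hypothetical placeholders, not data from the study.

```python
from math import comb


def mutual_exclusivity_p_value(both: int, only_a: int, only_b: int, neither: int) -> float:
    """One-sided Fisher's exact test: probability of seeing this few (or fewer)
    co-mutated samples if the two genes mutated independently."""
    n = both + only_a + only_b + neither
    a_total = both + only_a   # samples with gene A mutated
    b_total = both + only_b   # samples with gene B mutated
    a_wild = n - a_total      # samples with gene A not mutated
    # Smallest overlap that is possible given the margins of the table.
    lowest = max(0, b_total - a_wild)
    # Hypergeometric tail: sum the probabilities of every overlap <= observed.
    favourable = sum(comb(a_total, k) * comb(a_wild, b_total - k)
                     for k in range(lowest, both + 1))
    return favourable / comb(n, b_total)


# Placeholder counts for illustration only (not real patient data).
p = mutual_exclusivity_p_value(both=2, only_a=30, only_b=28, neither=40)
print(f"one-sided p-value: {p:.4g}")
```

A small p-value says the two mutations co-occur less often than independence would predict, which is the pattern a mutual-exclusivity hypothesis claims; it is a check on the signal, not proof of a biological mechanism.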
How to try it in AstaLabs (step by step)
Log in to AstaLabs and try the Example Sessions dataset to see the full flow.
Click + New exploration: upload your file (CSV, JSON, Parquet), describe the context to seed the system’s beliefs and adjust the experiment budget.
Start the run with Start Run. The table and the tree populate in real time. You can navigate away; results are saved.
Inspect any row to see the hypothesis, the analysis and the reproducible code.
Practical tips: start small (<10 hypotheses) as a test drive; then scale to 50–100 hypotheses for deeper analysis. Runs are limited to 500 hypotheses per session.
Costs, privacy and operational limits
Early access includes a free allocation of 1,000 Hypothesis Credits (1 hypothesis = 1 credit). Credits are available until February 28, 2026.
Runs tend to be compute-intensive and can take hours.
Confirm your data isn’t confidential before uploading. Source datasets are deleted automatically 7 days after analysis completes; AutoDiscovery retains the outputs needed to reproduce and extend findings (hypotheses, plans, code, results).
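The credit arithmetic above is easy to check before you start. This small sketch uses the figures stated in the article (1,000 free credits, 1 credit per hypothesis, at most 500 hypotheses in a run); the function and constant names are hypothetical, and it assumes the 500 cap applies to each run separately.

```python
FREE_CREDITS = 1_000           # early-access allocation stated in the article
MAX_HYPOTHESES_PER_RUN = 500   # stated cap; assumed here to apply per run


def remaining_credits(planned_runs: list[int], credits: int = FREE_CREDITS) -> int:
    """Return the credits left after the planned runs (1 hypothesis = 1 credit)."""
    for size in planned_runs:
        if not 1 <= size <= MAX_HYPOTHESES_PER_RUN:
            raise ValueError(f"run of {size} hypotheses is outside 1..{MAX_HYPOTHESES_PER_RUN}")
    spent = sum(planned_runs)
    if spent > credits:
        raise ValueError(f"plan needs {spent} credits but only {credits} are available")
    return credits - spent


# A test drive followed by two deeper runs.
print(remaining_credits([10, 50, 100]))
```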
Risks, validation and best practices
AutoDiscovery is powerful, but not infallible. Some recommendations to use it rigorously:
Treat its findings as starting points, not definitive proof. Always validate with additional analysis and domain review.
Review the code and statistical tests AutoDiscovery generates. Transparency makes human audit easier.
Consider biases in your data; surprise can result from sampling artifacts or data-cleaning quirks.
Use a small budget to explore, iterate on the intent/context, and then scale if results look promising.
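One way to act on the first recommendation is to re-test a flagged finding yourself with a method that makes few assumptions. The sketch below is a simple permutation test for a difference in means between two groups. It is an example of an independent check, not part of AutoDiscovery; the function name, the fixed seed and the number of permutations are arbitrary choices.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(group_a: Sequence[float], group_b: Sequence[float],
                        n_permutations: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the labels: under the null hypothesis, group membership is arbitrary.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)
```

If a finding survives a check like this on data that was held out or cleaned independently, it is less likely to be a sampling artifact or a data-cleaning quirk.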
Closing thoughts
AutoDiscovery changes the relationship between scientist and dataset: datasets go from static repositories to interactive research artifacts. If you work with structured data, it lets you explore questions you might not even know how to ask. It doesn’t replace expert intuition; its value is the speed and reach with which it suggests novel directions.