Asta publishes 258k queries: real lessons about AI
Asta releases a treasure trove of data: 258,935 queries and 432,059 interaction events from researchers who used AI tools integrated with Semantic Scholar. What do scientists actually do when they have powerful research assistants at hand? The answer is quite different from what developers expected.
What Asta is and how it works
Asta is a research-assistant platform integrated with S2 (Semantic Scholar). It offers two main interfaces:
PaperFinder (PF): improved literature search with ranking and light synthesis generated by LLMs.
ScholarQA (SQA): generation of structured reports with sections and inline citations — in other words, scientific summaries with referenced evidence.
Both use retrieval-augmented generation (RAG) over an academic corpus, so claims are anchored to retrieved papers. For comparison, the study also looks at traditional keyword searches in S2.
Data came only from users who opted in to share interactions; information was anonymized with hashed identifiers, and queries containing PII were filtered out.
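A minimal sketch of that kind of pipeline, assuming illustrative PII patterns and a salted hash; the function names and regexes here are hypothetical, not the authors' actual code:

```python
import hashlib
import re

# Hypothetical PII patterns (emails, phone-like digit runs); illustrative only
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),            # email addresses
    re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),  # phone-like numbers
]

def anonymize_user(user_id: str, salt: str = "aid-salt") -> str:
    """Replace a raw user ID with a salted hash identifier."""
    return hashlib.sha256((salt + user_id).encode()).hexdigest()[:16]

def contains_pii(query: str) -> bool:
    """True if the query matches any PII pattern (those queries get dropped)."""
    return any(p.search(query) for p in PII_PATTERNS)

def clean_log(events):
    """Keep only PII-free queries, with hashed user identifiers."""
    return [
        {"user": anonymize_user(e["user"]), "query": e["query"]}
        for e in events
        if not contains_pii(e["query"])
    ]
```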
Technical findings and usage patterns
The headline numbers are striking: queries to SQA are seven times longer than traditional searches in S2. But it’s not just length: queries contain more entities, relationships and explicit constraints. Between 2022 and 2025, even traditional searches grew in complexity (from 4.8 to over 6 words on average, and from 7% to 10% with at least one constraint).
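The two surface metrics cited above (average word count and the share of queries with at least one explicit constraint) are easy to reproduce on any query log. A sketch, where the constraint-marker list is an assumption of mine, not the paper's definition:

```python
def avg_word_count(queries):
    """Mean number of whitespace-separated tokens per query."""
    return sum(len(q.split()) for q in queries) / len(queries)

# Hypothetical constraint markers (date cutoffs, scope filters); illustrative only
CONSTRAINT_MARKERS = ("since", "before", "after", "published in", "only", "excluding")

def constraint_share(queries):
    """Fraction of queries containing at least one explicit constraint marker."""
    hits = sum(any(m in q.lower() for m in CONSTRAINT_MARKERS) for q in queries)
    return hits / len(queries)
```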
Does that mean users are only typing longer sentences? No. Researchers bring tactics from general-purpose chatbots: prompt engineering, role assignment, templates and even collaborative writing strategies. In some cases they tried to evade plagiarism detectors; the dataset documents that so the community can understand real behavior, not just the ideal.
Key behaviors detected
Richer queries: more entities (e.g., genes, methods, datasets), relationships and filters.
Non-linear reading: users jump sections, reopen previous parts and navigate non-sequentially.
Persistent results: more than 50% of SQA users and 42% of PF users revisit reports hours or days later.
Low duplication: the rate of near-duplicate queries is ~19% for SQA and ~15% for PF, lower than the revisit rates — suggesting users store outputs as reusable artifacts instead of re-running the same search.
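One way to estimate a near-duplicate rate like the one above is to match queries after light normalization; this is a sketch under that assumption, not necessarily how the study measured it:

```python
import re

def normalize(query: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", query.lower())).strip()

def near_duplicate_rate(queries):
    """Share of queries whose normalized form was already seen earlier in the log."""
    seen, dups = set(), 0
    for q in queries:
        key = normalize(q)
        if key in seen:
            dups += 1
        else:
            seen.add(key)
    return dups / len(queries)
```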
Implications for developers and designers
These findings aren’t curiosities; they change how you should design tools:
Artifact management: if users return to reports, you need versioning, history handling and easy ways to update results as new literature appears.
Support for non-linear reading: interfaces should prioritize collapsible sections, TL;DRs and quick routes to relevant subsections instead of assuming sequential consumption.
Capability expectations: users treat these tools as collaborators. That requires making limits explicit, improving evidence traceability and offering controls to prevent misuse (for example, avoiding features that facilitate plagiarism).
Rich and ethical telemetry: clickstreams (what expands, which citations are followed) are essential to understand real utility — but must be collected with consent and robust anonymization.
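The consent requirement can be enforced at the logging layer itself. A minimal sketch (class and event names are hypothetical): events from users who never opted in are simply never recorded.

```python
from dataclasses import dataclass, field
import time

@dataclass
class TelemetryLogger:
    """Records clickstream events only for users who opted in (illustrative sketch)."""
    consented: set = field(default_factory=set)
    events: list = field(default_factory=list)

    def opt_in(self, user_hash: str):
        self.consented.add(user_hash)

    def log(self, user_hash: str, action: str, target: str):
        if user_hash not in self.consented:
            return  # drop events from non-consenting users at the source
        self.events.append({"user": user_hash, "action": action,
                            "target": target, "ts": time.time()})
```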
Technical details of the dataset (Asta Interaction Dataset - AID)
AID is, according to its authors, the largest open dataset on researchers' interactions with AI-powered scientific tools. What it includes:
Scale: 258,935 queries and 432,059 clickstream interactions over six months (February–August 2025).
Rich signals: full text of queries, section expansions, clicks on S2 links, citations consulted, report section titles, positions shown in results, and more.
Taxonomy: a reusable taxonomy of query intents, writing styles and types of search criteria, built with a human + LLM iterative process.
Format: six Parquet files (queries, section expansions, S2 link clicks, report section titles, report corpus IDs, and PF shown results), ready for large-scale analysis.
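Parquet files of this kind load directly into pandas. A sketch of joining queries to their section expansions; the file and column names here are guesses from the descriptions above, so check the AID release for the actual schema:

```python
import pandas as pd

# Hypothetical paths and column names, inferred from the dataset description
def load_aid(base="aid"):
    queries = pd.read_parquet(f"{base}/queries.parquet")
    expansions = pd.read_parquet(f"{base}/section_expansions.parquet")
    return queries, expansions

def expansions_per_query(queries: pd.DataFrame, expansions: pd.DataFrame) -> pd.DataFrame:
    """Join clickstream expansions back to queries and count them per query."""
    counts = expansions.groupby("query_id").size().rename("n_expansions")
    merged = queries.merge(counts, left_on="query_id", right_index=True, how="left")
    return merged.fillna({"n_expansions": 0})
```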
If you work on tools for researchers, this data will probably surprise you as much as it surprised the authors. It revealed notable mismatches between intended design and real use.
What you can do with AID (practical ideas)
Analyze domain-specific prompt engineering patterns to improve models.
Evaluate the usefulness of non-linear designs and measure which sections generate the most reproducible actions.
Train reranking or summarization models that optimize for persistence and reuse, not just immediate impressions.
Use the taxonomy as a label set to classify queries in reproducible studies.
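The last idea can start as simply as a rule-based labeler seeded from the taxonomy. The labels and markers below are hypothetical stand-ins — the released taxonomy is richer — but they show the shape of a reproducible classification baseline:

```python
# Hypothetical intent labels loosely inspired by the taxonomy description;
# replace with the actual AID label set for real studies.
RULES = {
    "comparison": ("versus", " vs ", "compare"),
    "survey": ("survey", "review", "overview"),
    "method_lookup": ("how to", "method for", "technique"),
}

def label_query(query: str) -> str:
    """Return the first taxonomy label whose markers appear in the query."""
    q = query.lower()
    for label, markers in RULES.items():
        if any(m in q for m in markers):
            return label
    return "other"
```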
Final reflection
Researchers don’t just use AI to search: they reshape how they search. They make queries longer and more structured, treat results as artifacts and bring habits learned from general chatbots. For tool builders this is a clear invitation: adapt interfaces, safeguard evidence and think about the lifecycle of results, not only the instant answer.