The number of papers being published is growing at an overwhelming pace: how do you filter what's relevant when you need grounded, cited answers? SciArena aims to do exactly that — to test language models on real scientific literature tasks with help from the research community.
What is SciArena
SciArena is an open platform for comparing foundation models' answers to questions about scientific literature. Researchers submit queries, see side-by-side answers generated by different models, and vote for the output that best answers the question. The goal isn't to judge chatbots on style, but to measure real ability to reason over and synthesize academic work. (allenai.org)
How it works
When you submit a question, SciArena uses a retrieval pipeline adapted from the Scholar QA system to fetch relevant passages from papers. The question and the retrieved contexts are sent to two randomly selected models, which generate long-form answers with citations. Outputs are standardized into plain text to reduce style bias, and users then vote in a blind pairwise comparison. In parallel, an Elo-style rating system maintains a dynamic leaderboard of model performance.
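To make the ranking step concrete, here is a minimal sketch of an Elo-style update, the kind of scheme arena-style leaderboards commonly use to turn pairwise votes into ratings. The K-factor and the starting rating of 1000 are illustrative assumptions, not SciArena's actual parameters.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return new ratings after one vote.

    score_a is 1.0 (A wins the vote), 0.0 (B wins), or 0.5 (tie).
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# One blind vote: two models start at the same rating, A wins.
ra, rb = update(1000.0, 1000.0, score_a=1.0)
print(round(ra), round(rb))  # → 1016 984
```

Because the update is symmetric, the winner gains exactly what the loser gives up, and an upset against a higher-rated model moves the ratings more than an expected win.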
