Pushing the Frontier of AI for Science
Discover benchmarks and track model performance across scientific domains
Leaderboards
| Rank | Model | Org | Spearman's Correlation | Mean Absolute Error (MAE) | Price (per 1M input tokens) | Distribution Plot | Date |
|---|---|---|---|---|---|---|---|
About ICLR2026-1K Benchmark
This benchmark evaluates AI models' ability to review scientific papers, using submissions to the International Conference on Learning Representations (ICLR) 2026. It consists of a random sample of 1,000 papers that received human reviews during the actual conference review process.
Each model is tasked with generating review scores for these papers. We then compare the AI-generated scores against the actual human reviewer scores to measure how well AI models can assess scientific quality and provide meaningful feedback. Importantly, all models in this benchmark were released before the ICLR 2026 reviews were made publicly available, ensuring there is no possibility of data leakage or contamination.
Evaluation Metrics (see the computation sketch after this list):
- Mean Absolute Error (MAE): Measures the average absolute difference between AI-generated and human scores (lower is better)
- Spearman's Correlation: Measures how well the AI's ranking of papers matches human reviewers' rankings (higher is better)
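The sketch below shows how these two metrics can be computed, assuming each paper is reduced to a single aggregated human score (for example, the mean of its reviewer scores) paired with one model-generated score. The function and variable names are illustrative, not part of the benchmark's actual pipeline.

```python
# Minimal sketch of the two leaderboard metrics. Assumes one aggregated human
# score and one model-generated score per paper; names are illustrative only.
import numpy as np
from scipy.stats import spearmanr

def evaluate_scores(model_scores, human_scores):
    """Return (MAE, Spearman's rho) for paired per-paper scores."""
    model = np.asarray(model_scores, dtype=float)
    human = np.asarray(human_scores, dtype=float)

    # Mean Absolute Error: average absolute gap between AI and human scores.
    mae = np.mean(np.abs(model - human))

    # Spearman's correlation: agreement between the two rankings of papers.
    rho, _p_value = spearmanr(model, human)
    return mae, rho

# Example with hypothetical review scores (1-10 scale) for five papers.
mae, rho = evaluate_scores([6, 3, 8, 5, 7], [5, 4, 8, 6, 6])
print(f"MAE = {mae:.2f}, Spearman's rho = {rho:.2f}")
```

Because Spearman's correlation is computed on ranks, it rewards a model for getting the relative ordering of papers right even when its absolute scores are systematically shifted, which MAE would penalize.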
AI Reviewer Challenge
We're launching the first dynamic AI reviewer challenge to continuously evaluate AI's capabilities in reviewing scientific papers. The challenge will run multiple times per year, creating fresh, real-world evaluation sets to push the boundaries of AI for science. We partner with conference organizers and run our own review processes to ensure that the selected papers do not appear in LLMs' training data and that all reviews are written by human experts. Regular challenges throughout the year keep our benchmarks current and prevent overfitting, so models are always evaluated on genuinely new, unseen scientific content.
Join the Challenge
We are looking for reviewers, participants, collaborators and funders to help with the challenge. Please fill out the form below to submit your interest.
Sign Up
About SciBench.ai
SciBench.ai is a comprehensive platform for tracking and evaluating AI model performance across scientific domains. We curate high-quality benchmarks that push the boundaries of AI capabilities in science, from paper review to complex problem solving.
Our mission is to accelerate scientific AI research by providing transparent, reproducible evaluations and fostering innovation in the field.
Team
Tom Cohen
MIT
Jiaxin Pei
Stanford & UT-Austin
Chenglei Si
Stanford
Jing Yang
PaperCopilot
Affiliated Institutions