OpenAI launches GDPval: measures AI on real work tasks

OpenAI has introduced GDPval, an evaluation designed to measure how well AI models perform real, economically valuable work tasks rather than only academic tests. Why does this matter now? Because it helps shift the conversation from what models could do to what they already do in everyday work.

What GDPval measures

GDPval evaluates representative knowledge-work tasks across 44 occupations chosen from the industries that contribute most to the United States' Gross Domestic Product. The initial release contains 1,320 specialized tasks, of which 220 form an open-sourced gold subset.

Each task is built from real deliverables — a legal brief, a presentation, or an engineering drawing — which makes the evaluation more like actual work than a classroom exam. (openai.com)

How they chose occupations and built the dataset

OpenAI selected 9 industries that each contribute more than 5% to U.S. GDP and, within each of them, the 5 highest-wage occupations that are predominantly knowledge work. For each occupation, the team worked with experienced professionals (about 14 years of experience on average), who wrote and reviewed the tasks through multiple quality-control cycles.
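
The two-step selection described above (filter industries by GDP share, then rank knowledge-work occupations by wage) can be sketched in a few lines of Python. This is only an illustration of the logic as summarized here, not OpenAI's actual pipeline: the `Occupation` class, the `select_occupations` function, and every industry name, role name, share, and wage below are hypothetical placeholders, not real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occupation:
    name: str
    industry: str
    wage: float           # placeholder wage figure, not real data
    knowledge_work: bool  # True if the role is predominantly knowledge work


def select_occupations(gdp_share, occupations, min_share=0.05, per_industry=5):
    """Return {industry: [occupation names]} using the two-step filter."""
    # Step 1: keep industries whose share of GDP exceeds the threshold.
    industries = [name for name, share in gdp_share.items() if share > min_share]

    selected = {}
    for industry in industries:
        # Step 2: among knowledge-work occupations in that industry,
        # keep the highest-wage ones.
        candidates = [
            o for o in occupations
            if o.industry == industry and o.knowledge_work
        ]
        candidates.sort(key=lambda o: o.wage, reverse=True)
        selected[industry] = [o.name for o in candidates[:per_industry]]
    return selected


# Hypothetical inputs, invented purely to exercise the function.
gdp_share = {"Industry A": 0.12, "Industry B": 0.03}
occupations = [
    Occupation("Role A1", "Industry A", 150_000, True),
    Occupation("Role A2", "Industry A", 90_000, True),
    Occupation("Role A3", "Industry A", 200_000, False),  # not knowledge work
    Occupation("Role B1", "Industry B", 120_000, True),   # industry below 5%
]

result = select_occupations(gdp_share, occupations)
print(result)
```

With these placeholder inputs, "Industry B" is dropped for falling below the 5% threshold and "Role A3" is dropped for not being knowledge work, even though it has the highest wage.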
