AI increases labor productivity by 1.8% according to Claude
The way people talk with models like Claude can tell you a lot about how AI is changing real work. Anthropic analyzed 100,000 real conversations, estimated how long each task would take with and without AI, and proposes a technical but practical way to think about the effects on aggregate productivity.
Curious how they turned chat transcripts into macroeconomic numbers? Let’s walk through it.
Quick methodological summary
They analyzed 100,000 anonymized transcripts from Claude.ai (Free, Pro, and Max) and asked Claude itself for two estimates per conversation: (1) how long a competent professional would take to complete the task without AI, and (2) how long the task actually took with the model's assistance. Then they mapped each task to the O*NET catalog and used OEWS May 2024 wage data to convert hours into labor costs.
How do you go from a per-task estimate to a macro effect? They used a standard approach based on Hulten's theorem and Domar weights: each productivity improvement in a task is weighted by its economic importance (hours spent on the task and its share of the wage bill) to obtain an implied change in labor productivity and total factor productivity (TFP).
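To make that weighting concrete, here is a minimal sketch with invented tasks, hours, and wages; nothing below is Anthropic's actual pipeline, which maps real conversations to O*NET tasks and OEWS wages:

```python
# A toy wage-weighted (Domar-style) aggregation: each task's time saving is
# weighted by its share of the total wage bill. All values are illustrative.
tasks = [
    # (task, hours_without_ai, hours_with_ai, hourly_wage, annual_task_hours)
    ("write code",       1.5, 0.3, 60.0, 2_000),
    ("draft legal memo", 1.8, 0.5, 75.0,   500),
    ("prep food order",  0.4, 0.3, 18.0, 3_000),
]

# The economic weight of a task is its share of the total wage bill.
wage_bill = sum(wage * hours for _, _, _, wage, hours in tasks)

aggregate_saving = 0.0
for name, t_without, t_with, wage, hours in tasks:
    weight = (wage * hours) / wage_bill   # economic importance of the task
    saving = 1 - t_with / t_without       # per-task time reduction
    aggregate_saving += weight * saving

print(f"wage-bill-weighted time saving: {aggregate_saving:.1%}")
```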
Key results and technical metrics
Average without AI: the tasks seen in the conversations would have taken a human roughly 90 minutes on average.
Speedup with AI: Claude's estimates imply a median task-time reduction of about 80%; the typical conversation shows an estimated saving of 84%.
Macro extrapolation: assuming universal adoption over 10 years and holding current capabilities and uses constant, the estimates imply U.S. labor productivity growing an additional 1.8% per year over the next decade. With a labor share of 0.64, that corresponds to roughly 1.1% annual TFP growth (see the arithmetic check after this list).
Cost per task: the median human-equivalent cost of the tasks handled by Claude is about $54. Management and legal tasks tend to be longer and pricier (e.g., management averages ~2.0 h, legal ~1.8 h), while food prep or installation/maintenance tasks sit in the 0.3–0.5 h range.
Contribution by occupation: software developers account for the largest share of the estimated impact (19% of the total), followed by managers, market analysts, customer support, and high school teachers.
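The growth-accounting link behind those two headline numbers is simple enough to verify in a couple of lines, using the labor-share relation implied by the article's own figures:

```python
# Back-of-the-envelope check: g_TFP = labor_share * g_labor_productivity.
labor_productivity_growth = 0.018  # +1.8%/yr, assuming universal adoption
labor_share = 0.64                 # labor's share of income

tfp_growth = labor_share * labor_productivity_growth
print(f"implied TFP growth: {tfp_growth:.2%} per year")  # ~1.15%, i.e. the article's ~1.1%
```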
Technical validation of the estimates
Anthropic ran checks to see whether the model's time estimates are informative:
Self-consistency: across 1,800 conversations, prompt variations show strong correlations in log space, r = 0.89–0.93 between variants (see the sketch after this list).
External benchmark (JIRA): they compared Claude Sonnet 4.5's time estimates against human estimates and actual completion times on development tickets.
Interpretation: the model's predictions carry directional information close to that of human developers, although they tend to be compressed (overestimating short tasks and underestimating long ones), which understates the real variance across tasks.
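To make both checks concrete, here is a toy illustration on simulated data (nothing here uses Anthropic's transcripts, and the resulting numbers are illustrative): a high correlation between prompt variants in log space can coexist with a log-log slope below 1, the signature of compressed estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
true_minutes = rng.lognormal(mean=4.0, sigma=1.0, size=1_800)  # ~55 min median

# Simulated estimates from two prompt variants: both compressed toward the
# middle (log-log slope 0.7 < 1) plus independent noise.
est_a = np.exp(0.7 * np.log(true_minutes) + 1.3 + rng.normal(0, 0.25, 1_800))
est_b = np.exp(0.7 * np.log(true_minutes) + 1.3 + rng.normal(0, 0.25, 1_800))

# Check 1 (self-consistency): correlation between variants, in log space.
r = np.corrcoef(np.log(est_a), np.log(est_b))[0, 1]

# Check 2 (compression): OLS slope of log(estimate) on log(truth).
slope = np.polyfit(np.log(true_minutes), np.log(est_a), 1)[0]

print(f"log-space r between variants: {r:.2f}")  # high despite the bias
print(f"log-log slope vs truth: {slope:.2f}")    # < 1 => compressed estimates
```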
Concrete examples that help make the numbers real
Curriculum development: Claude estimates that a task that would take a human 4.5 hours gets done in 11 minutes with AI, a saving of about 96%.
Reviewing diagnostic images: only ~20% savings in the case shown, suggesting that tasks which are already optimized or depend on expert judgment see less net benefit.
Compiling information from multiple reports: up to ~95% savings, because reading, extracting, and citing are exactly what current models do fastest.
From task to aggregate: critical assumptions
To move from tasks to a macro effect you need several important assumptions:
Broad and relatively homogeneous adoption of the same ways of using AI that they observed.
That the model’s time estimates represent reasonable averages for all instances of each task.
That efficiency gains carry over to a similar production structure (they assume gains are reinvested in capital, consistent with the Acemoglu and Hulten frameworks).
If any of these assumptions fails (uneven adoption, jobs requiring more human verification after the chat, slow reorganization effects), the real effect could be smaller or differently distributed across sectors.
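Adoption is the easiest assumption to stress-test. As a toy sensitivity check (my simplification, not a calculation from the article), the aggregate gain scales roughly with the adopted share of qualifying work, at least to first order:

```python
# Toy sensitivity check on the adoption assumption: partial adoption scales
# the aggregate gain proportionally (a first-order approximation).
full_gain = 0.018  # +1.8%/yr under universal adoption
for adoption in (1.0, 0.5, 0.25):
    print(f"adoption {adoption:.0%} -> ~{adoption * full_gain:.2%}/yr extra productivity growth")
```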
Limits and biases to keep in mind (yes, there are many)
Lack of full validation: the method depends on the model’s own estimates and doesn’t capture human work after the conversation (review, correction, integration into processes).
Selection bias: the conversations come from users who already chose to use Claude, so the sample contains tasks where users expected the tool to be useful.
Compression of estimates: Claude tends to squeeze the range of its time estimates, missing extremes in both directions and flattening the true spread across tasks.
Limited taxonomy: O*NET doesn’t capture tacit knowledge, relationships, supervision or coordination between tasks; many gains can be illusory if those interdependencies are ignored.
Structural assumptions: calculating an aggregate productivity gain requires assumptions about how firms restructure and how capital and labor are reallocated. That’s uncertain.
Practical and technical implications you should consider
Which tasks should you speed up first? Tasks that involve reading, synthesis and writing get a lot of benefit right now. Physical tasks or those requiring presence don’t show the same improvement.
Measuring real productivity matters: controlled trials and post-use tracking are needed to compare model estimates against actual time spent. Prior studies have found smaller savings (56%, 40%, 26%, 14%, or even negative results, depending on the application and model generation).
Better models and more context mean better estimates: features like memory, tool integrations and access to operational metrics will reduce noise in time forecasts.
Risk of bottlenecks: when some subtasks speed up a lot and others don’t, the slow ones can become flow limiters; you need to map processes, not just tasks.
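That bottleneck risk is essentially Amdahl's law applied to workflows. A short check (with illustrative shares and speedups, not figures from the article) shows how fast an unaccelerated step caps the overall gain:

```python
# Amdahl's law for a workflow: accelerate a share p of total time by a
# factor s while the rest stays fixed; the overall speedup is capped.
def overall_speedup(p: float, s: float) -> float:
    """Overall speedup when share p of total time is accelerated s-fold."""
    return 1.0 / ((1.0 - p) + p / s)

print(f"{overall_speedup(0.80, 5.0):.2f}x")    # ~2.78x, not 5x
print(f"{overall_speedup(0.80, 100.0):.2f}x")  # ~4.81x: the slow 20% dominates
```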
Final reflection
Anthropic’s proposal is interesting because it moves productivity measurement from lab experiments and isolated cases to a massive analysis of real use. Does that mean AI will immediately double economic growth? Not necessarily. It means that, if the assumptions hold and adoption spreads, current models show potential for a meaningful increase in productivity growth rates. But it also reminds you that historical transformations didn't come just from speeding tasks up; they came from reorganizing production.
The practical takeaway for companies and product leaders is clear: measure, validate and restructure. If you only apply AI to speed up tasks without changing workflows or responsibilities, you’ll run into bottlenecks that limit the gains. If instead you integrate AI into processes and reorganize work, effects could be much larger than the estimates here.