The question is simple and urgent: when you use AI to program, do you learn less? Anthropic published a randomized controlled trial that tested exactly that with Python developers. The results aren't an absolute verdict, but they are a wake-up call for tool designers, team leads, and anyone who uses AI to learn on the job.
What they did and why it matters
The study recruited 52 engineers (mostly junior) who use Python regularly, and asked them to learn and apply Trio, an asynchronous programming library most of them didn't know. Participants were split into two groups: one with access to an AI assistant in the sidebar and one without. Everyone knew there would be a quiz at the end about what they learned.
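For readers who haven't used it: Trio structures concurrency around async/await and "nurseries" that own and supervise child tasks. A minimal sketch of the kind of code participants had to get working (an illustration for context, not a task from the study) looks like this:

```python
import trio

async def fetch(name: str, delay: float) -> None:
    # Stand-in for an I/O-bound operation; await gives the scheduler a checkpoint.
    await trio.sleep(delay)
    print(f"{name} finished after {delay}s")

async def main() -> None:
    # A nursery owns its child tasks and only exits once all of them are done.
    async with trio.open_nursery() as nursery:
        nursery.start_soon(fetch, "task-a", 1.0)
        nursery.start_soon(fetch, "task-b", 0.5)

trio.run(main)
```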
They measured two key things:
- How fast they completed the task (productivity).
- How well they'd learned the concepts, evaluated with debugging questions, code reading, and conceptual understanding.
What makes this relevant is a bigger question: AI automation can speed up work, but at what cost to the human ability to supervise and understand critical systems?
Main results (numbers and practical meaning)
The central findings are clear and statistically significant:
- Average quiz score was 67% for the no-AI group and 50% for the AI group. That gap is nearly two letter grades on an academic scale (Cohen's d = 0.738, p = 0.01).
- On task time, the AI group finished about 2 minutes faster on average, but that difference wasn't statistically significant.
- The largest gap on the quiz came from debugging questions — identifying why code fails.
Interpretation: using AI during immediate learning reduced mastery of concepts relevant to supervising and fixing code, even though it didn't produce a clear productivity gain in this learning scenario.
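To make the debugging gap concrete, those questions tend to hinge on conceptual errors rather than syntax. A hypothetical example (not an actual quiz item from the study) is calling an async function without awaiting it:

```python
import trio

async def save_record(record: str) -> None:
    await trio.sleep(0.1)  # stand-in for a real asynchronous write
    print(f"saved {record}")

async def main() -> None:
    # Bug: calling an async function without `await` only creates a coroutine
    # object; nothing runs, and Python warns that it was "never awaited".
    save_record("user-42")

    # Fix: await the call so it actually executes.
    await save_record("user-42")

trio.run(main)
```

Spotting that the first call silently does nothing is the kind of understanding the debugging questions probe.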
How people interacted with the AI (modes that matter)
Not all reliance on AI is the same. The study annotated screen recordings and classified usage patterns that correlated with different outcomes:
- Low-score patterns (scores < 40%):
  - AI delegation: the participant delegated all the writing to the assistant. Faster, but little learning.
  - Progressive AI reliance: started by asking for something and ended up delegating everything.
  - Iterative AI debugging: relied on the assistant to fix errors instead of understanding why they happened.
- High-score patterns (scores >= 65%):
  - Generation-then-comprehension: first asked for code and then asked for explanations to understand it.
  - Hybrid code-explanation: requested generation together with explanations, actively reading the responses.
  - Conceptual inquiry: asked only conceptual questions and fixed errors on their own. This group was efficient and showed good understanding.
Practical takeaway: using AI to generate code without probing or asking for explanations tends to reduce retention; using it as a tutor or questioning partner tends to preserve or improve learning.
Product design and team policy: technical and practical recommendations
If you design AI tools or lead teams, here are actionable points:
- Design learning modes: incorporate a Study Mode or Explanatory Mode that encourages or requires asking for explanations, not just code.
- Force reflection: ask users to document or explain the generated snippet before accepting it (small frictions that foster understanding).
- Integrate quizzes and checkpoints: short assessments right after a session to consolidate learning.
- Intentional human reviews: code reviews focused on conceptual understanding, not just style.
- Rotate tasks and use pair programming: assign problems that require investigation and debugging, not just assembling automated solutions.
- Educational telemetry: measure types of queries (generation vs conceptual) to detect risks of cognitive offloading; see the sketch after this list.
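As an illustration of that last point, a naive heuristic over prompt logs could flag sessions that skew heavily toward generation requests. The keyword lists and the threshold below are assumptions for the sketch, not something the study defines:

```python
from collections import Counter

# Hypothetical cue lists; a real system would use a proper intent classifier.
GENERATION_CUES = ("write", "generate", "implement", "fix this", "refactor")
CONCEPTUAL_CUES = ("why", "explain", "what does", "how does", "difference between")

def classify_prompt(prompt: str) -> str:
    text = prompt.lower()
    if any(cue in text for cue in CONCEPTUAL_CUES):
        return "conceptual"
    if any(cue in text for cue in GENERATION_CUES):
        return "generation"
    return "other"

def offloading_risk(prompts: list[str], threshold: float = 0.7) -> bool:
    """Flag a session in which most queries just ask for code."""
    counts = Counter(classify_prompt(p) for p in prompts)
    total = sum(counts.values()) or 1
    return counts["generation"] / total >= threshold

session = [
    "Write the async downloader for me",
    "Fix this traceback",
    "Why does the nursery cancel both tasks?",
]
print(offloading_risk(session))  # False: the conceptual question keeps the ratio below 0.7
```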
Technically, you can instrument prompts that request the assistant reply in an explanatory format, or provide pedagogical scaffolding—e.g., asking first for the logic in pseudocode and then for the implementation.
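A minimal sketch of that scaffolding, assuming a generic `ask_assistant(prompt)` wrapper around whatever model API the tool uses (the function and the prompt wording are illustrative, not from the study):

```python
def ask_assistant(prompt: str) -> str:
    """Placeholder for a call to your assistant's API."""
    raise NotImplementedError

def scaffolded_request(task_description: str) -> dict[str, str]:
    # Step 1: force a conceptual pass before any code is produced.
    plan = ask_assistant(
        "Explain, in pseudocode and plain language, how you would solve this task "
        "and which Trio primitives it needs. Do not write Python yet.\n\n"
        + task_description
    )
    # Step 2: only then request the implementation, tied back to the explanation.
    code = ask_assistant(
        "Now implement the plan below in Python with Trio, adding comments that "
        "map each block back to a step of the plan.\n\nPlan:\n" + plan
    )
    return {"plan": plan, "code": code}
```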
Limitations and open questions (yes, there are many)
The study is solid but has clear limits:
- Relatively small sample (n = 52).
- Immediate evaluation: measures short-term retention, not longitudinal learning.
- Specific task: learning Trio and async concepts. Other domains might behave differently.
- The assistant used wasn't a fully autonomous agent; more agentive tools like Claude Code or others could have different, possibly larger, impacts.
Future research should explore long-term effects, compare AI vs human assistance during learning, and test whether the gap narrows as engineers gain fluency.
What this means for you if you code or lead teams
If you're junior or learning a new library, asking the AI to write everything for you is tempting but risky for your development. Do you want to save 2 or 10 minutes now, or do you want to be able to diagnose and fix production failures tomorrow?
If you lead teams, giving access to AI isn't enough. You need processes that force people to understand what the AI suggests: focused reviews, educational modes in tools, and continuous evaluation practices.
In short: AI can speed up tasks you already master and boost productivity, but when the goal is deep learning, how you interact with the AI matters as much as the tool itself.
Original source
https://www.anthropic.com/research/AI-assistance-coding-skills
