Anthropic's AI Fluency Index reveals key usage patterns
The adoption of AI tools is no longer just a statistic: it's part of everyday practice. But does using AI mean you're using it well?
Which behaviors do people who actually become more competent exhibit? Anthropic published a technical report that maps those habits and proposes a way to measure "fluency" with AI. Here I explain the essentials, with data and practical tips you can try today.
What they measured and how
Anthropic builds on the 4D AI Fluency Framework, developed by Rick Dakan and Joe Feller, and defines 24 behaviors that exemplify safe, effective human-AI collaboration. Of those 24, 11 are directly observable in conversations inside Claude.ai and Claude Code.
For this study they analyzed 9,830 multi-turn conversations from a 7-day window in January (an anonymized sample processed with a privacy-preserving analysis tool). To identify behaviors they ran 11 binary classifiers: behavioral classification was done by Claude Sonnet 4 and language detection by Claude Haiku 3.5.
Important: the results reflect what is visible inside the chat; there are 13 other behaviors (for example, assigning the AI a role or assessing risks when sharing outputs) that happen outside the conversation and were not measured here.
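The in-chat measurement above amounts to running a battery of binary classifiers over each conversation and recording which behaviors are present. Here's a minimal sketch of that pipeline, with a toy keyword heuristic standing in for the Claude-based classifiers (the behavior names and cue words are hypothetical illustrations, not the study's actual 11 indicators):

```python
# Sketch of a per-conversation behavior-flagging pipeline.
# Hypothetical behaviors and cue words; the real study used
# Claude-based classifiers, not keyword matching.

BEHAVIORS = {
    "iterates_on_output": ["refine", "instead", "change that"],
    "verifies_facts": ["source", "citation", "is that correct"],
    "specifies_format": ["as a table", "in json", "bullet points"],
}

def classify(conversation: str, cues: list[str]) -> bool:
    """Toy binary classifier: behavior is 'present' if any cue appears."""
    text = conversation.lower()
    return any(cue in text for cue in cues)

def behavior_flags(conversation: str) -> dict[str, bool]:
    """Run every binary classifier over one conversation."""
    return {name: classify(conversation, cues) for name, cues in BEHAVIORS.items()}

convo = "Looks good, but refine the tone and give it in JSON. Is that correct? Cite a source."
flags = behavior_flags(convo)
print(flags)
# → {'iterates_on_output': True, 'verifies_facts': True, 'specifies_format': True}
```

Aggregating these per-conversation flags across the sample is what yields the prevalence percentages reported below.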
Key findings
Anthropic highlights two main patterns: the power of iteration/refinement and a behavior shift when the AI is producing artifacts (code, documents, apps).
Iteration and refinement
85.7% of conversations showed iteration and refinement: that is, building on previous responses rather than accepting the first answer.
Iterative conversations show on average 2.67 additional fluency behaviors, versus 1.33 in non-iterative conversations.
For evaluation-related behaviors: iterative conversations are 5.6x more likely to challenge the model's reasoning and 4x more likely to identify missing context.
What does this tell you? Iterating isn't wasting time: it's the most consistent way to trigger other critical fluency behaviors.
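The comparisons above are simple aggregates: mean behavior counts and likelihood ratios between the iterative and non-iterative groups. A sketch with made-up records (the toy data is chosen for illustration and does not come from the study's dataset):

```python
# Illustrative comparison of fluency behaviors in iterative vs.
# non-iterative conversations (toy data, not the study's sample).

# Each record: whether the conversation iterated, how many other
# fluency behaviors it showed, and whether it challenged reasoning.
conversations = [
    {"iterative": True,  "other_behaviors": 3, "challenged": True},
    {"iterative": True,  "other_behaviors": 2, "challenged": False},
    {"iterative": True,  "other_behaviors": 3, "challenged": True},
    {"iterative": False, "other_behaviors": 1, "challenged": False},
    {"iterative": False, "other_behaviors": 2, "challenged": False},
    {"iterative": False, "other_behaviors": 1, "challenged": True},
]

def mean(xs):
    return sum(xs) / len(xs)

def rate(records, key):
    """Fraction of records where the behavior was present."""
    return sum(r[key] for r in records) / len(records)

iterative = [c for c in conversations if c["iterative"]]
non_iterative = [c for c in conversations if not c["iterative"]]

avg_iter = mean([c["other_behaviors"] for c in iterative])     # mean count, iterative group
avg_non = mean([c["other_behaviors"] for c in non_iterative])  # mean count, non-iterative group
lift = rate(iterative, "challenged") / rate(non_iterative, "challenged")

print(f"avg behaviors: {avg_iter:.2f} vs {avg_non:.2f}, challenge lift: {lift:.1f}x")
```

The report's 5.6x and 4x figures are likelihood ratios of this kind, computed over the full 9,830-conversation sample.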
Artifact production: more directive, less critical
12.3% of conversations involved creating artifacts (code, documents, tools). In these chats users were more directive: they clarified goals (+14.7 percentage points), specified formats (+14.5 pp), gave examples (+13.4 pp), and iterated more (+9.7 pp).
Paradoxically, in those same conversations users were less critical: they identified missing context less often (-5.2 pp), verified facts less (-3.7 pp), and questioned the model's reasoning less (-3.1 pp).
Possible explanations? The AI may deliver polished outputs that invite trust; testing and validation may happen outside the chat; or the task may prioritize aesthetics/functionality over factual precision.
How to develop your own AI fluency (practical recommendations)
Based on the observed patterns, here are three concrete practices that improve fluency:
Stay in the conversation. Treat the first answer as a draft: ask about assumptions, request refinements, and challenge details. Iteration is the most powerful lever to increase other fluency behaviors.
Question polished outputs. "It looks good" isn't the same as "it's correct." Do reality checks: ask for sources, request step-by-step explanations, and contrast with your own knowledge or simple practical tests.
Set collaboration terms from the start. Only 30% of conversations specified how the user wanted the AI to interact. Try clear instructions like:
"Explain your reasoning before giving the final answer."
"If there is uncertainty, mention it and quantify it."
"Propose alternatives and prioritize clarity over brevity."
Example of a direct prompt:
Act as a critical reviewer: first give me step-by-step reasoning; then a 3-point summary; finally indicate uncertainties or assumptions.
Important limitations (and why they matter)
Biased sample: Claude.ai users who held multi-turn conversations during a single week. They're likely early adopters and don't represent the broader population.
Partial coverage: only 11 of 24 indicators were available in-chat. Ethical and responsibility behaviors that occur outside the dialogue weren't measured.
Binary classification: each behavior was marked as present or absent, which loses nuance and degrees of demonstration.
Implicit behaviors: people may be verifying or testing outside the chat, so absence of a signal in the conversation doesn't mean absence of evaluation.
Correlations, not causation: we can't claim that iterating causes more critical judgment; both might relate to task complexity or user experience.
Looking ahead
Anthropic proposes using this index as a baseline to study fluency evolution over time. Next steps announced include cohort analyses (new vs experienced users), qualitative methods for non-observable behaviors, and experiments to explore causality (for example, whether encouraging iteration increases critical evaluation).
For developers and educators, this suggests two priorities: design interfaces that encourage iteration and create exercises that teach how to validate polished outputs.
The practical lesson is simple: AI can speed up work, but the human skill of asking, verifying, and iterating is still what makes the difference.