Why AIs Seem Human, According to Anthropic

Conversational AIs like Claude often feel surprisingly human: they celebrate when they fix a bug, apologize if they get stuck, and even paint almost cinematic scenes about how they would make an in-person delivery. Why do they act like that? Anthropic offers a technical but simple explanation: the humanlike behavior of AIs largely comes from them learning to interpret and represent “people” during their training.

What is the persona-selection model?

Anthropic calls their theory the persona-selection model. The central idea is that during the initial training phase, called pretraining, the model learns to predict the next token across huge amounts of text. That’s not just grammar: to predict well, the model must recreate dialogues, characters, and styles. In that sense, training turns the model into a very sophisticated autocomplete engine that simulates human characters, fictional ones, and everything in between.

These simulations are the personas: patterns of behavior, goals, and traits that appear in the texts the model saw. Important: personas are not the AI itself; they’re characters the model can play, like Hamlet or a kind assistant in a chat.

What is the persona-selection model?

Evidence and concrete examples

Why does this happen technically?

Practical consequences for development and safety

Open questions and research directions

Final reflection

Original source

Stay up to date!

Why AIs Seem Human, According to Anthropic