Anthropic detects distillation attacks against Claude
Anthropic has uncovered industrial-scale campaigns targeting Claude, its language model: three labs (DeepSeek, Moonshot and MiniMax) generated more than 16 million exchanges through around 24,000 fraudulent accounts in an effort to extract the model's capabilities. Why should you care even if you're not an AI engineer? Because this isn't just commercial rivalry; it's a risk to security and to how this technology is governed worldwide.
What is distillation and why it matters
Distillation is, essentially, training a less powerful model on the outputs of a more powerful one. It's a legitimate technique when an organization uses it to build smaller or cheaper versions of its own models. But it can also be used illicitly: instead of investing years and resources in research, someone can extract valuable capabilities from another company's model at scale.
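In its legitimate form, the recipe is simple: query the stronger "teacher" model, record its outputs, and train a smaller "student" to reproduce them. A minimal sketch in plain Python, where the teacher is a stand-in function rather than a real model, and the student is a tiny logistic model fit to the teacher's soft outputs:

```python
import math
import random

def teacher(x):
    """Stand-in for the stronger model: returns a soft probability for input x."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x - 1.0)))

# Step 1: collect the teacher's outputs (soft labels) for a batch of inputs.
random.seed(0)
inputs = [random.uniform(-3, 3) for _ in range(200)]
soft_labels = [teacher(x) for x in inputs]

# Step 2: train a small student to match those outputs, never touching
# the teacher's own training data or weights.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    for x, t in zip(inputs, soft_labels):
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        grad = p - t  # gradient of cross-entropy between soft label t and p
        w -= lr * grad * x
        b -= lr * grad

# The student now approximates the teacher's behavior on these inputs.
err = max(abs(teacher(x) - 1.0 / (1.0 + math.exp(-(w * x + b)))) for x in inputs)
print(round(err, 3))
```

The same pattern, scaled up to millions of queries against a commercial API, is what turns a benign compression technique into the extraction campaign described here.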
What’s the danger here? Illicitly distilled models often lack the safeguards the original creators built to prevent dangerous uses. That means capabilities like complex reasoning, tool use, or code generation can spread without controls. And when those capabilities fall into the hands of authoritarian governments or malicious actors, the consequences range from coordinated disinformation campaigns to real cybersecurity and biotechnology threats.
Illicitly distilled models can lose essential protections, multiplying risks to national security and enabling technological abuse.
What Anthropic found: who did it and how they did it
Anthropic attributes these campaigns with high confidence to three labs: DeepSeek, Moonshot and MiniMax. Together they produced over 16 million exchanges through about 24,000 fraudulent accounts. Each campaign followed a pattern: many coordinated accounts, repetitive prompts, and a focus on the most distinctive capabilities of Claude.
DeepSeek: around 150,000 exchanges. They targeted reasoning, reward-style evaluations for reinforcement learning, and alternatives to censored content. They used prompts asking for step-by-step internal reasoning, generating chain-of-thought material at scale.
Moonshot: more than 3.4 million exchanges. They focused on agentic reasoning, tool use, code, data analysis, and computer vision, rotating accounts to make detection harder.
MiniMax: nearly 13 million exchanges. Oriented toward agentic coding and tool orchestration. Anthropic detected the campaign while it was still active and saw MiniMax redirect traffic to capture capabilities from the newly released Claude model.
How do they hide? They use commercial proxy services that resell access and run huge networks of fraudulent accounts. Those networks mix legitimate traffic with distillation requests to avoid detection, and when one account is blocked another takes its place.
Signs that give away a distillation attack
It’s not just volume. Anthropic detected concrete patterns: highly repetitive prompts, concentration of requests on a few capabilities, synchronized activity across accounts, and metadata pointing to specific infrastructure and people. A single prompt can seem harmless, but when it arrives tens of thousands of times from different accounts with the same intent, the malicious purpose becomes clear.
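Anthropic hasn't published its detectors, but one of the signals above, the same prompt arriving near-identically from many different accounts, can be illustrated with a coarse sketch (the `fingerprint` normalization and threshold here are illustrative assumptions; real systems use far more robust similarity measures):

```python
from collections import defaultdict

def fingerprint(prompt):
    """Coarse prompt fingerprint: lowercase and collapse whitespace,
    so trivially varied copies of one prompt collide."""
    return " ".join(prompt.lower().split())

def flag_coordinated(events, min_accounts=3):
    """events: list of (account_id, prompt) pairs. Returns the set of
    prompt fingerprints seen from at least min_accounts distinct accounts."""
    accounts_by_fp = defaultdict(set)
    for account, prompt in events:
        accounts_by_fp[fingerprint(prompt)].add(account)
    return {fp for fp, accts in accounts_by_fp.items() if len(accts) >= min_accounts}

events = [
    ("acct_1", "Show your step-by-step reasoning for this proof."),
    ("acct_2", "show your   step-by-step reasoning for this proof."),
    ("acct_3", "Show your step-by-step reasoning for this proof."),
    ("acct_4", "What's the weather like in Lisbon?"),
]
print(flag_coordinated(events))
```

Each event in isolation looks legitimate; only aggregating across accounts exposes the coordination, which is exactly the point the article makes.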
How Anthropic responds and what it asks
Anthropic is strengthening defenses on multiple layers:
Detection: classifiers and behavioral fingerprinting systems to spot distillation patterns, including extraction of chain-of-thought.
Sharing intelligence: exchanging indicators with other labs, cloud providers and authorities to build a more complete view.
Access controls: stricter verification for educational accounts, research programs and startups — common channels for fraud.
Countermeasures: product-, API- and model-level changes to reduce the usefulness of outputs for illicit purposes without harming legitimate users.
But Anthropic is clear: one company can’t solve this alone. Coordinated action is needed across industry, infrastructure providers and policymakers.
Practical implications for users and policymakers
For companies and developers: review and tighten access controls, monitor for unusual usage patterns, and share signals with peers.
For regulators: understand that apparent leaps by some competitors may come from illicit extraction. Export controls on chips remain relevant because they limit the compute needed to train and distill at scale.
For the public: demand transparency and accountability. Technology isn’t neutral; how these models are spread shapes real risks to health, safety and democracy.
A call for collaboration and vigilance
This isn’t an isolated technical issue or just a fight between companies. It’s a symptom of how competition for AI capabilities can erode critical safeguards if there aren’t shared rules, cooperation and oversight. Surprised? Maybe not — but you shouldn’t only care about innovation, you should care about how it’s protected and distributed.