Anthropic donates Petri 3.0, an open alignment tool

Anthropic has just handed Petri, its open toolkit for evaluating alignment in large language models, over to Meridian Labs and released Petri 3.0.

If you work evaluating models — or you're simply curious about how we measure risks like deception or sycophancy — this changes how we can audit models openly and reproducibly. Want to know why it matters for your audits or deployments?

What is Petri and how it works

Petri started as a suite of alignment tests you can run against any large language model. Its basic flow separates three components: an auditor that creates scenarios, the target model that responds to those scenarios, and a judge that scores the transcripts for problematic behaviors like deception, sycophancy, or cooperating with harmful requests.

That separation makes it easy to automate large-scale evaluations and compare different versions of the same model. Since Claude Sonnet 4.5, Anthropic has used Petri as part of its internal evaluations, and external organizations like the UK’s AI Security Institute already include it in their processes.

What is Petri and how it works

Key changes in Petri 3.0

Relevant technical aspects

Why donating Petri to Meridian Labs matters

Practical implications for developers and evaluators

What's next and quick recommendations

Original source

Stay up to date!

Anthropic donates Petri 3.0, an open alignment tool