Anthropic has just handed Petri, its open toolkit for evaluating alignment in large language models, over to Meridian Labs and released Petri 3.0.
If you work evaluating models — or you're simply curious about how we measure risks like deception or sycophancy — this changes how we can audit models openly and reproducibly. Want to know why it matters for your audits or deployments?
What is Petri and how it works
Petri started as a suite of alignment tests you can run against any large language model. Its basic flow separates three components: an auditor that creates scenarios, the target model that responds to those scenarios, and a judge that scores the transcripts for problematic behaviors like deception, sycophancy, or cooperating with harmful requests.
That separation makes it easy to automate large-scale evaluations and compare different versions of the same model. Since Claude Sonnet 4.5, Anthropic has used Petri as part of its internal evaluations, and external organizations like the UK’s AI Security Institute already include it in their processes.
