Anthropic has published the third version of its Responsible Scaling Policy (RSP). Why should you care? Because the RSP is one of the clearest attempts by any company to define when an AI model demands stronger safety measures, and this new version updates that map in light of what the company has learned over the past two years.
What is the RSP and why it matters
The RSP started as a conditional rule: if a model reaches certain dangerous capabilities, then specific safeguards apply. These tiers are called AI Safety Levels (ASL). The lower levels, ASL-2 and ASL-3, were defined in detail; the higher levels were deliberately left open, to be specified as the technology evolved.
Why does that logic make sense? Because AI moves fast: three years ago, big models were basically chatbots; today they browse, write and run code, and can perform chained actions. That evolution creates new risks that fixed rules can miss. The RSP tried to be a flexible mechanism to anticipate and mitigate those risks.
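The conditional "if capability, then safeguard" structure described above can be pictured in a few lines of code. This is purely an illustrative sketch: the ASL names are real, but the capability checks, safeguard names, and mapping below are hypothetical placeholders, not Anthropic's actual evaluation process.

```python
# Illustrative sketch of the RSP's conditional logic. The ASL tiers are
# real; the specific capability flags and safeguard lists are invented
# here for illustration only.

def required_asl(capabilities: set) -> int:
    """Map a set of (hypothetical) evaluated capabilities to a minimum ASL."""
    if "meaningful_weapons_uplift" in capabilities:
        return 3  # ASL-3: stronger deployment and security safeguards
    return 2      # ASL-2: baseline safeguards for current models

# Hypothetical safeguards per level; higher levels add to lower ones.
SAFEGUARDS = {
    2: ["acceptable-use policy", "basic misuse filtering"],
    3: ["input/output classifiers", "hardened model-weight security"],
}

def deploy_checklist(capabilities: set) -> list:
    """Return the cumulative safeguards required before deployment."""
    level = required_asl(capabilities)
    return [m for lvl in range(2, level + 1) for m in SAFEGUARDS[lvl]]
```

The point of the structure is that safeguards are cumulative and triggered by evaluations, not by calendar dates: a model that crosses a capability threshold inherits every requirement of the levels below its own.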
What worked and what didn't
The positives:
Internally, the RSP forced Anthropic to prioritize safeguards. For example, to meet ASL-3 they developed input and output classifiers to block dangerous content.
ASL-3 safeguards were activated for the relevant models in May 2025 and have been strengthened since then.
Anthropic's example inspired similar frameworks at other companies (they mention OpenAI and Google DeepMind) and helped shape early rules and codes (for instance, references to SB 53 in California, the RAISE Act in New York, and work on the EU AI Act).
What didn’t go as expected:
Predefined thresholds turned out to be more ambiguous than expected. In many cases there were genuine doubts about whether a model had truly crossed a threshold; the science of capability evaluation is not yet mature enough to give conclusive answers.
The ambiguity was most obvious with biological risks: models display enough knowledge to trigger alerts on many tests, but those tests cannot prove real-world risk unequivocally.
Government action has been slower than needed. Political debate prioritizes competitiveness and growth, which complicates fast multilateral controls.
For higher risk levels, many mitigations would require international cooperation or national-security support; for a single company acting alone, they could be impossible, as a RAND report on securing model weights suggests.
This assessment led Anthropic to restructure the RSP before reaching the higher levels that would be hard to manage on its own.
Key changes in version 3.0
Anthropic presents three main changes in RSP 3.0:
Separate what the company will do from what it recommends to the industry
There are now two mitigation maps: one with the measures Anthropic plans to implement no matter what, and another, more ambitious, that details what the industry as a whole should adopt to manage advanced risks.
Frontier Safety Roadmap
The new RSP requires publishing a Frontier Safety Roadmap: public objectives in Safety, Alignment, Safeguards and Policy.
These aren’t legally binding promises, but public goals Anthropic commits to report on openly. The idea is to create the pressure and clarity that worked in earlier versions.
Some concrete examples from that roadmap:
Moonshot R&D projects to reach unprecedented levels of information security.
Automated red‑teaming to complement bug bounty programs.
Systematic measures to ensure Claude behaves according to its constitution.
Centralized registries of critical developments, analyzed by AI to detect internal risks.
A “regulatory ladder”: policy proposals that scale with risk.
Risk Reports and external reviews
They will publish Risk Reports every 3 to 6 months with a broader analysis: capabilities, threat scenarios, active mitigations and an assessment of the risk level.
In certain cases there will be external review by independent experts with minimally redacted access to assess the reasonableness of Anthropic’s judgments.
Anthropic is already piloting this process.
The intent: more transparency, more accountability, and a clear line between what a company can do alone and what requires collective action.
What changes for companies, regulators and you?
For companies and developers: more public pressure to document goals and progress. That makes external audits easier and sends clear signals to investors and partners that the company takes risk management seriously.
For governments and regulators: RSP 3.0 provides technical material and practical examples that can help design laws and standards. But Anthropic acknowledges that international coordination and political will remain the bottleneck.
For society: more public reports and external reviews make it easier to understand how dangerous certain models are and when controls make sense. It’s not a perfect solution, but it improves traceability and public conversation.
Final thoughts
RSP 3.0 is a mix of realism and ambition: realistic in recognizing practical and political limits, ambitious in demanding transparency and public plans. It doesn’t solve the core problem—the need for collective action against very advanced capabilities—but it takes concrete steps so that collective action becomes more visible and accountable.
AI doesn’t have to be a black box that only labs understand. When companies publish roadmaps and risk reports, everyone wins: public conversation improves, regulators have something to work with, and companies become assessable. Will it be enough? Probably not by itself. Is it worth doing? Yes, because without transparency there’s no control, and without control there’s no trust.