OpenAI strengthens safety with external evaluations
OpenAI announces it's inviting trusted external evaluators to test its cutting-edge models. Why open the black box to third parties? Because safety isn't just an internal claim: it needs independent verification, transparency and a diversity of methods to spot blind spots and improve deployment decisions.
What external evaluations are and why they matter
Third-party evaluations are independent reviews done by organizations and experts outside the lab that built the model. They don't replace internal testing — they complement it: validating safety claims, uncovering blind spots and boosting public confidence in how these systems are assessed and deployed.
Why should you care? Because these tests help answer critical questions: Can the model plan dangerous actions in a lab? Can it evade oversight or self-improve? Does it have offensive cyber capabilities? Bringing in third parties reduces the risk of confirmation bias and improves the quality of deployment decisions.
Main forms of external collaboration
OpenAI describes three main ways of working with third parties:
Independent evaluations: external labs apply their own methods to assess risk areas like biosecurity, cybersecurity, self-improvement and long-horizon planning behavior.
Methodological reviews: experts review how OpenAI designs and runs its internal tests, giving feedback on methodology and evidence without necessarily repeating costly experiments.
Probing by subject-matter experts (SMEs): specialists perform real tasks and assess how far the model can lift a novice's skills toward competent performance, providing qualitative judgment and practical context.
Concrete examples and access controls
Since GPT-4, OpenAI has collaborated with various partners. For GPT-5, they mention evaluations in risk areas such as long-term autonomy, deception, subverting oversight, lab planning and offensive cybersecurity.
To enable these tests, OpenAI offered:
Secure access to early model checkpoints.
Models with fewer mitigations or 'helpful-only' versions when needed.
Access to chain-of-thought, so evaluators can inspect reasoning traces where they may reveal behaviors like sandbagging or scheming.
Conditions like zero data retention when required.
These measures came with updated safety controls tuned to the model's capability and the needs of each evaluation.
External evaluations add an independent layer of assessment that complements internal work and helps prevent confirmation bias.
When methodological review is the best option
When reproducing an experiment requires heavy infrastructure (for example, adversarial fine-tuning to estimate worst-case behavior in open-weight models), OpenAI invited third parties to review the methods and results rather than replicate the experiments. This produced valuable recommendations without duplicating costs and showed how methodological review can strengthen a process without repeating the work.
Transparency, confidentiality and publication
OpenAI lays out the rules clearly:
Evaluators sign confidentiality agreements that allow sharing non-public information necessary for the evaluation.
The goal is to enable publication and transparency, but with review steps to protect sensitive information and verify facts before release.
Many evaluations and summaries are included in system cards, and several organizations have published their work after joint review.
Incentives and sustainability of the ecosystem
OpenAI pays or subsidizes evaluators to foster a sustainable ecosystem, though some organizations decline payment on principle. Important: payments don't depend on the outcome of the evaluation.
Building credible external capacity requires steady funding, methodological rigor and security measures for sensitive access. Without that, model progress will outpace independent evaluation capacity.
Impact on governance and deployment
Third-party evaluations directly influence responsible deployment decisions. They serve to:
Inform mitigation changes before release.
Add evidence to system cards that explain capabilities and risks.
Strengthen sustained trust and learning between labs and evaluators.
Does this mean safety is solved? No. But it changes the game: moving from internal claims to external evidence improves governance and gives regulators, researchers and the public a stronger basis for judging risk.
OpenAI emphasizes that these evaluations are just one piece of the puzzle: collaborations with red teams, collective alignment projects and advisory groups complement this work.
Think of this as a collective effort to get evaluations that are more robust, more replicable and more useful for making responsible decisions about technologies that affect everyone.