OpenAI has launched two open-weight reasoning models, gpt-oss-safeguard-120b and gpt-oss-safeguard-20b. What makes them different? They're designed to classify content according to a provided policy, they're customizable, they produce full chain-of-thought reasoning, and they're released under the Apache 2.0 license. They're not meant for direct user interaction; rather, they serve as a policy-driven evaluation and moderation layer.
What they are and what they're for
These variants are post-trained versions of the original gpt-oss models, fine-tuned specifically to reason from a given policy and label content accordingly. They're ready to use with the Responses API and offer reasoning effort options (low, medium, high) as well as structured outputs.
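As a rough sketch of what such a call might look like, the snippet below assembles parameters for a policy-based classification request. The helper function, the policy text, and the labels are illustrative assumptions; only the model names and the reasoning-effort options come from the description above.

```python
# Sketch: building parameters for a policy-based classification request.
# The helper, policy text, and labels are illustrative assumptions; the
# model name and reasoning-effort values follow the description above.

def build_classification_request(policy: str, content: str, effort: str = "low") -> dict:
    """Assemble keyword arguments for a Responses API call (hypothetical shape)."""
    if effort not in ("low", "medium", "high"):
        raise ValueError("effort must be 'low', 'medium', or 'high'")
    return {
        "model": "gpt-oss-safeguard-20b",
        "reasoning": {"effort": effort},
        "input": [
            {"role": "developer", "content": policy},  # the policy to reason from
            {"role": "user", "content": content},      # the content to classify
        ],
    }

policy = (
    "Label the message ALLOW or FLAG. "
    "FLAG anything that shares personal contact details."
)
request = build_classification_request(policy, "Call me at 555-0100!", effort="medium")
# e.g. client.responses.create(**request) with the openai SDK
print(request["model"], request["reasoning"]["effort"])
```

Because the policy travels with every request rather than being baked into the weights, swapping or tuning a policy is just a prompt change, no retraining required.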
And why does that matter? Because they let you automate classification decisions with greater traceability: since the models provide their full chain of thought, you can inspect the reasoning behind each label, which is useful for auditing decisions and tuning policies.
