OpenAI has launched two open-weight reasoning models, gpt-oss-safeguard-120b and gpt-oss-safeguard-20b. What makes them different? They're designed to classify content according to a provided policy, they're customizable, they produce full chain-of-thought reasoning, and they're released under the Apache 2.0 license. They're not meant for direct user interaction; rather, they serve as a policy-driven evaluation and moderation layer.
What they are and what they're for
These variants are post-trained versions of the original gpt-oss models, fine-tuned specifically to reason from a given policy and label content accordingly. They're ready to use with the Responses API and offer reasoning effort options (low, medium, high) as well as structured outputs.
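As a rough sketch of what such a call might look like, the snippet below assembles parameters for a policy-based classification request. The helper function, the policy text, and the labels are illustrative assumptions; only the model names and the reasoning-effort options come from the description above.

```python
# Sketch: building parameters for a policy-based classification request.
# The helper, policy text, and labels are illustrative assumptions; the
# model name and reasoning-effort values follow the description above.

def build_classification_request(policy: str, content: str, effort: str = "low") -> dict:
    """Assemble keyword arguments for a Responses API call (hypothetical shape)."""
    if effort not in ("low", "medium", "high"):
        raise ValueError("effort must be 'low', 'medium', or 'high'")
    return {
        "model": "gpt-oss-safeguard-20b",
        "reasoning": {"effort": effort},
        "input": [
            {"role": "developer", "content": policy},  # the policy to reason from
            {"role": "user", "content": content},      # the content to classify
        ],
    }

policy = (
    "Label the message ALLOW or FLAG. "
    "FLAG anything that shares personal contact details."
)
request = build_classification_request(policy, "Call me at 555-0100!", effort="medium")
# e.g. client.responses.create(**request) with the openai SDK
print(request["model"], request["reasoning"]["effort"])
```

Because the policy travels with every request rather than being baked into the weights, swapping or tuning a policy is just a prompt change, no retraining required.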
And why does that matter? Because they let you automate classification decisions with greater traceability: since the models provide their full chain of thought, you can inspect the reasoning behind each label, which is useful for auditing decisions and tuning policies.
