OpenAI introduces GPT‑OSS: open models 120B and 20B

OpenAI takes a big step toward open AI with GPT‑OSS, a family of reasoning models you can download, run, and fine‑tune to your needs. What's the promise? Performance close to its commercial models, but with open weights and a permissive license.

The essentials in 30 seconds

  • Two models: gpt‑oss‑120b and gpt‑oss‑20b, with open weights under the Apache 2.0 license.
  • In reasoning tasks, the 120b approaches o4‑mini and the 20b is comparable to o3‑mini across several benchmarks.
  • Built for practical use: 120b can run on a single 80 GB GPU; 20b targets environments with ~16 GB memory.
  • 128k context, tool use (web search, Python), Structured Outputs, and a chain of thought (CoT) exposed for research and debugging.
  • Compatible with the Responses API and designed for agentic flows. Release: August 5, 2025. (openai.com)

What changes for you?

If you're an independent developer, you no longer need a GPU farm to prototype agents that call functions or run code. Can you iterate locally, cut latency, and keep control of your stack? Yes.

If you lead a team or startup, you get more options to balance cost, performance, and privacy. Need on‑prem for sensitive data? This opens the door without giving up solid reasoning metrics.

In enterprises and the public sector, the risk equation shifts: open weights mean auditability and data sovereignty, but also responsibility to apply your own safeguards.

How they're built

Both models use a Transformer architecture with Mixture‑of‑Experts (MoE) layers. The 120b has 36 layers and activates ~5.1B parameters per token (128 experts per MoE layer, top‑4 active per token); the 20b uses 24 layers and activates ~3.6B. They alternate dense and locally windowed attention, use grouped‑query attention (GQA) and RoPE, and support native context up to 128k. They also ship with MXFP4 quantization to balance memory and speed, plus a new open tokenizer: o200k_harmony. (cdn.openai.com)
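To make the MoE idea concrete, here is a minimal, hypothetical Python sketch of top‑k expert routing (toy sizes, not OpenAI's implementation): a router scores all 128 experts for each token, only the 4 best run, and their outputs are mixed with softmax gates.

```python
# Illustrative top-k MoE routing sketch; the expert counts mirror the reported
# config (128 experts, 4 active per token), everything else is a toy assumption.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 128   # experts per MoE layer (per the model card)
TOP_K = 4           # experts activated per token
D_MODEL = 64        # toy hidden size, just for the demo

def moe_forward(x, router_w, expert_w):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router_w                              # (tokens, experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -TOP_K:]     # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, topk[t]]
        gates = np.exp(scores - scores.max())
        gates /= gates.sum()                           # softmax over the selected experts only
        for gate, e in zip(gates, topk[t]):
            out[t] += gate * (x[t] @ expert_w[e])      # weighted sum of expert outputs
    return out

tokens = rng.normal(size=(8, D_MODEL))
router = rng.normal(size=(D_MODEL, NUM_EXPERTS))
experts = rng.normal(size=(NUM_EXPERTS, D_MODEL, D_MODEL)) * 0.02
print(moe_forward(tokens, router, experts).shape)  # (8, 64): same shape, but only 4 of 128 experts ran per token
```

Only the selected experts do any work for a given token, which is why the active parameter count stays far below the total.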

Practical translation: more brain when needed, less compute when it's not; and enough efficiency to run on accessible hardware.

Performance in evaluations

On AIME (2024 and 2025), MMLU, HLE and GPQA, the 120b outperforms o3‑mini and approaches o4‑mini; the 20b competes surprisingly well despite its size. On HealthBench, the gpt‑oss models even beat some proprietary models in certain cases. As always, they don't replace professional judgement. (openai.com, cdn.openai.com)

Safety and responsible use

OpenAI subjected an adversarially fine‑tuned version of the 120b to its Preparedness Framework (bio, cyber, and self‑improvement). Result: it doesn't reach the "High" capability threshold in those categories; still, open weights bring different risks and require additional controls from implementers. Also, while the CoT is available for research and monitoring, showing chains of thought to end users is not recommended. (openai.com, cdn.openai.com)

How to try them today

  1. Read the announcement and the model card to understand licenses, limits, and best practices.
  2. Set the "reasoning effort" in your system message (low, medium, high) based on the task: favor speed or accuracy (see the sketch after this list).
  3. Integrate with the Responses API for agentic flows (tool calls, code execution, and structured outputs).
  4. Evaluate locally with your own data and metrics; if you go on‑prem, prepare security safeguards and monitoring. (openai.com)
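As a starting point, here is a hypothetical sketch of steps 2 and 3 using the OpenAI Python SDK against a locally served gpt‑oss‑20b. The base URL, port, and model name are assumptions that depend on your serving stack (vLLM, Ollama, or another OpenAI‑compatible server), and the exact convention for requesting reasoning effort depends on how that server exposes it.

```python
# Hypothetical sketch: calling a locally served gpt-oss-20b through an
# OpenAI-compatible endpoint. Endpoint, model name, and the system-message
# convention for reasoning effort are assumptions tied to your serving stack.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[
        # Reasoning effort is requested up front; swap "high" for "low" or
        # "medium" when latency matters more than accuracy.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Plan the steps to parse a CSV and summarize it."},
    ],
)
print(response.choices[0].message.content)
```

The same client can point at OpenAI's hosted Responses API instead if you prefer managed infrastructure for agentic flows.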

Quick questions

  • Is it open source? It's “open‑weights” under Apache 2.0: you can download, use, and fine‑tune, but remember the usage policies. (openai.com)
  • Will it run on my machine? The 20b is aimed at environments with ~16 GB of memory; the 120b needs an 80 GB GPU. Adjust expectations for your hardware and latency. (openai.com, cdn.openai.com)
  • And the tokenizer? o200k_harmony is released with the models to ease compatibility and efficiency; a short usage sketch follows this list. (cdn.openai.com)
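A minimal sketch of loading the tokenizer, assuming your tiktoken version already ships the o200k_harmony encoding; if it doesn't, OpenAI's harmony tooling released alongside the models is the reference implementation.

```python
# Hypothetical sketch: tokenizing text with o200k_harmony via tiktoken.
# Assumes a tiktoken release recent enough to include this encoding.
import tiktoken

enc = tiktoken.get_encoding("o200k_harmony")
tokens = enc.encode("GPT-OSS runs on accessible hardware.")
print(len(tokens), tokens[:8])   # token count and a peek at the first ids
print(enc.decode(tokens))        # round-trips back to the original text
```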

In closing

In the end, GPT‑OSS lands a simple but powerful idea: more people experimenting with high‑level AI, on their own terms. The key? Understand the power it gives you… and the responsibility that comes with it.
