Olmo 3: the open model flow that drives AI
Olmo 3 arrives to change one fundamental assumption: it's not only the final weights that matter, but the entire flow that produces them. Why is that relevant to you as a researcher or developer? Because opening the model flow means you can intervene at any stage, reproduce results and, above all, understand why a model does what it does.
What Olmo 3 is and why it matters
Olmo 3 is a family of open models (7B and 32B parameters) and, more importantly, the release of the complete development path: data, checkpoints, code and traceability down to the data point that caused a behavior. It’s not just publishing weights; it’s publishing the whole process so you can audit, reproduce and improve.
Here are two key novelties: first, Olmo 3-Think (32B) exposes intermediate traces of reasoning. Second, the whole flow comes with checkpoints at every training milestone, so you can pause, fork or mix stages at will.
The Olmo 3 family: Base, Think, Instruct and RL Zero
Olmo 3-Base (7B, 32B): the solid base. Designed to keep performance in long contexts (up to ~65K tokens) and serve as a platform for further pretraining or fine-tuning.
Olmo 3-Think (7B, 32B): post-trained for deep reasoning. It exposes its reasoning traces and competes with open thinking models at a similar scale, reaching scores close to Qwen 3 on several benchmarks while being trained on fewer tokens in some cases.
Olmo 3-Instruct (7B): made for chat, quick replies and tool use. Optimized for inference efficiency and strong performance in function calling and instruction following.
Olmo 3-RL Zero (7B): an open path for RL experiments. It ships series of checkpoints by domain (math, code, instruction following and general chat) so you can study RL with verifiable rewards.
What's the practical idea? You start with Olmo 3-Base, pick a route (Instruct, Think or RL Zero) and apply your data or objectives at concrete points in the flow.
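As a minimal sketch of what that looks like in practice, the snippet below loads a Base checkpoint at a specific training stage with Hugging Face transformers. The repo id and revision name are assumed placeholders, not confirmed identifiers from the release.

```python
# Minimal sketch, assuming the models are published on the Hugging Face Hub
# under the allenai organization; the repo id and revision below are
# illustrative placeholders, not confirmed identifiers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "allenai/Olmo-3-7B"   # hypothetical repo id for Olmo 3-Base 7B
revision = "mid-training"        # hypothetical branch for a mid-training checkpoint

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)

# From here you would continue pretraining on your own data, or branch into
# the Instruct, Think or RL Zero post-training routes.
```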
Architecture and training pipeline
Olmo 3 uses a decoder-only transformer and a multi-stage pipeline:
Initial pretraining for broad coverage.
Mid-training targeted at hard material: math, code and reading comprehension.
Long-context extension to handle very long documents.
Then comes post-training with the recipe SFT -> DPO -> RLVR, documented and replaceable. The important part is that at each step there are checkpoints available: base, mid-trained, long-context and post-trained for each route.
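To make the middle step of that recipe concrete, here is a minimal sketch of a generic DPO objective in PyTorch. This is the standard direct preference optimization formulation, not the exact Olmo 3 implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Generic DPO objective: push the policy to prefer the chosen response
    over the rejected one, relative to a frozen reference model."""
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    # -log(sigmoid(beta * (policy_logratio - ref_logratio))), averaged over the batch
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()
```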
Data: Dolma 3, Dolci and the mixes
Olmo 3 presents a fully open data curriculum:
Dolma 3: a ~9.3-trillion-token (9.3T) corpus mixing web text, scientific PDFs processed with olmOCR, code repositories, math problems and encyclopedic text.
Dolma 3 Mix: a ~5.9T-token pretraining mix with a higher share of code and math, plus strong decontamination via deduplication and filtering.
Dolma 3 Dolmino: the mid-training mix, ~100B tokens sampled from a ~2.2T-token pool focused on math, science, code, instruction following and deep reading.
Dolma 3 Longmino: the long-context mix, ~50B tokens drawn from a 639B-token pool to teach the model to track information across very long documents.
Dolci: the post-training suite for SFT, DPO and RLVR, with high-value data for reasoning, tool use and instruction following.
All datasets are released with mixes and tools to replicate the same preprocessing, tokenization and deduplication.
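As a rough mental model of how such a curriculum is consumed, the sketch below samples training sources according to stage-specific token budgets. The budgets echo the sizes mentioned above, but the per-source proportions are made up for the example; the real mixes live in the released Dolma 3 configurations.

```python
import random

# Illustrative curriculum: budgets follow the stage sizes quoted above,
# per-source weights are invented for the example.
CURRICULUM = {
    "pretraining":  {"budget_tokens": 5.9e12, "sources": {"web": 0.7, "code": 0.2, "math": 0.1}},
    "mid_training": {"budget_tokens": 100e9,  "sources": {"math": 0.4, "code": 0.3, "science": 0.3}},
    "long_context": {"budget_tokens": 50e9,   "sources": {"long_docs": 1.0}},
}

def pick_source(stage: str) -> str:
    """Sample a data source for the given stage according to its mix weights."""
    sources = CURRICULUM[stage]["sources"]
    return random.choices(list(sources), weights=list(sources.values()), k=1)[0]

print(pick_source("mid_training"))
```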
Infrastructure and efficiency
They trained Olmo 3 on up to 1024 H100 GPUs. For Olmo 3-Base (7B) they report throughput of 7.7K tokens per device per second. In post-training, they moved SFT from Open Instruct to Olmo Core and increased tokens-per-second by 8x. In RL, improvements like in-flight weight updates and continuous batching made that phase 4x more efficient.
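For a sense of scale, a quick back-of-the-envelope calculation from those figures (assuming all 1024 GPUs sustain the reported per-device rate, and ignoring restarts and overhead):

```python
gpus = 1024
tokens_per_gpu_per_s = 7_700                 # reported ~7.7K tokens/device/s for Olmo 3-Base 7B
cluster_tokens_per_s = gpus * tokens_per_gpu_per_s
print(f"{cluster_tokens_per_s:,.0f} tokens/s")  # ~7.9M tokens/s across the cluster

# At that rate, a ~5.9T-token pretraining mix would take roughly:
seconds = 5.9e12 / cluster_tokens_per_s
print(f"{seconds / 86_400:.0f} days")           # on the order of 9 days, before overhead
```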
They also point out that 32B is a practical sweet spot: capable enough for serious research and accessible enough for mid-sized teams to fine-tune and deploy.
Performance and technical benchmarks
Olmo 3 was evaluated with a broad, up-to-date suite, grouping standard tasks and some new ones. Highlights:
Olmo 3-Base 32B leads among fully open models in programming, reading comprehension, math and long-context benchmarks like RULER.
Olmo 3-Think 32B is one of the strongest open thinking models; it ties with or gets very close to the best open-weight models (e.g. Qwen 3 32B) on MATH, OMEGA, BigBench Hard, HumanEvalPlus and PopQA.
Olmo 3-Instruct 7B delivers competitive, efficient performance for chat and function calling, matching or outperforming other open-weight models at its scale.
In short: Olmo 3 closes the gap in reasoning while keeping excellent practical-task capabilities.
Tools, traceability and reproducibility
Olmo 3 includes a set of tools to make the flow truly actionable:
OlmoTrace to map model outputs to training examples in real time.
olmo-core for distributed training.
Open Instruct for flexible post-training.
datamap-rs for large-scale cleaning in Rust.
duplodocus for efficient fuzzy deduplication.
OLMES for reproducible evals and OlmoBaseEval as a benchmark collection.
decon to remove test sets from training data.
With these utilities you can reproduce training curves, run ablations or instrument intermediate traces to understand why the model fails or succeeds.
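As a conceptual illustration of what decontamination involves (not decon's actual algorithm), the sketch below drops training documents that share long word-level n-grams with an evaluation set:

```python
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a document."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def decontaminate(train_docs: list[str], eval_docs: list[str], n: int = 13) -> list[str]:
    """Keep only training documents that share no n-gram with the eval set."""
    eval_grams: set[tuple[str, ...]] = set()
    for doc in eval_docs:
        eval_grams |= ngrams(doc, n)
    return [doc for doc in train_docs if not (ngrams(doc, n) & eval_grams)]
```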
How you can use Olmo 3 today
If you research reasoning, use Olmo 3-Think and explore the traces to design better objectives or rewards.
If you build agents or assistants, try Olmo 3-Instruct for chat and efficient function calling.
If you want to experiment with RL, the Olmo 3-RL Zero track gives you checkpoints and a reproducible pipeline.
Need to specialize it? Insert your data in mid-training or fork a checkpoint that has the data mix you care about.
Everything is designed so you can repeat the exact steps of the team that developed Olmo 3 or create your own variants, whether in a local notebook or on a research cluster.
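For instance, a minimal chat call to Olmo 3-Instruct through transformers might look like the sketch below. The repo id is an assumed placeholder, and the chat template is whatever the released tokenizer defines.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "allenai/Olmo-3-7B-Instruct"   # hypothetical repo id for the Instruct variant

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize what RLVR means in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```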
Olmo 3 bets on a practical notion of openness: sharing weights is not enough; you also have to share the knowledge and tools that explain those weights. Want to audit a model, mitigate bias in its data, or simply understand how complex skills emerge? Here's a complete flow to do it.