Granite 4.1: architecture, training and benchmarks
Granite 4.1 is IBM's new family of dense LLMs (3B, 8B and 30B) trained on ~15T tokens with a five-stage pretraining pipeline and context extension up to 512K tokens. The interesting part: a dense 8B model matches or outperforms a 32B MoE on many benchmarks, and everything is released under Apache 2.0.
What is Granite 4.1 and why it matters
What is this advancement good for? Granite 4.1 shows that training quality and data strategy can compensate for raw parameter count. Instead of simply scaling the model up, the team prioritized progressively curated data mixes, rigorous supervised fine-tuning and a staged RL pipeline.
This matters if you are an engineer looking for efficient models for production, an entrepreneur who wants to deploy tool-enabled assistants, or a researcher studying alternatives to expensive MoE models.
Design and architecture (technical summary)
Granite 4.1 uses a dense decoder-only transformer with these key design decisions:
Grouped Query Attention (GQA)
Rotary Position Embeddings (RoPE)
SwiGLU activation
RMSNorm normalization
Shared input/output embeddings
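Two of these components are worth seeing in code. Here is a minimal PyTorch sketch of RMSNorm and a SwiGLU MLP in their standard reference form (this is not Granite's actual implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Normalizes by the root-mean-square of the features; cheaper than
    # LayerNorm because it skips the mean subtraction and has no bias.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    # Gated MLP: SiLU(gate(x)) * up(x), projected back down. Three weight
    # matrices instead of the two in a classic GELU MLP.
    def __init__(self, d_model: int, d_mlp: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_mlp, bias=False)
        self.up = nn.Linear(d_model, d_mlp, bias=False)
        self.down = nn.Linear(d_mlp, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))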
Main dimensions by variant:
3B: 40 layers, embedding 2560, MLP 8192
8B: 40 layers, embedding 4096, MLP 12800
30B: 64 layers, embedding 4096, MLP 32768
All variants share the same training pipeline and data strategy; only internal dimensions change.
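A quick sanity check: plugging those dimensions into a back-of-the-envelope parameter count roughly reproduces the 3B/8B/30B names. The factor of 3 SwiGLU matrices per MLP and the ~2.25·d² attention approximation (Q and O full-size, K/V shrunk by GQA) are my assumptions, and shared embeddings are ignored:

def approx_params(n_layers: int, d_model: int, d_mlp: int) -> float:
    attn = 2.25 * d_model ** 2   # Q + O projections plus a reduced GQA K/V block (assumed ratio)
    mlp = 3 * d_model * d_mlp    # gate, up and down matrices of the SwiGLU MLP
    return n_layers * (attn + mlp)

for name, dims in {"3B": (40, 2560, 8192),
                   "8B": (40, 4096, 12800),
                   "30B": (64, 4096, 32768)}.items():
    print(f"{name}: ~{approx_params(*dims) / 1e9:.1f}B non-embedding params")

This prints roughly 3.1B, 7.8B and 28.2B, consistent with the variant names once the shared input/output embeddings are added back.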
Five-stage pretraining pipeline
They trained from scratch on ~15T tokens using five phases:
Phase 1 - Foundational: broad data mix (CommonCrawl dominant), power learning-rate scheduler with warmup.
Phase 3 - Initial mid-training: a higher share of high-quality data, long reasoning chains and synthetic instruction data.
Balanced data: CommonCrawl-HQ, Math, Code, Long Chain-of-Thought, Language & Code Instructions, etc.
Phase 4 - Final mid-training: linear LR decay toward zero and focus on highest-quality data.
Data: CommonCrawl-HQ ~40%, Code ~20%, Math ~20%, instructions and CoT present.
Phase 5 - Long-context extension (LCE): gradual extension of context from 4K up to 512K.
Stages: 32K, 128K and 512K. For 512K (8B and 30B) the mix was ~80% books + 20% code repositories.
After each LCE stage, a model merge is applied to preserve short-context performance, as sketched below.
Goal: handle very long sequences without sacrificing quality on short context windows.
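The article does not say how the merge is performed; a common technique is linear weight interpolation between the checkpoints before and after extension. A minimal sketch under that assumption (alpha and the function name are illustrative):

import torch

def merge_checkpoints(pre_lce: dict, post_lce: dict, alpha: float = 0.5) -> dict:
    # Linear interpolation of matching weight tensors. alpha=1.0 keeps only the
    # long-context model; smaller values pull back toward the short-context one.
    return {name: torch.lerp(pre_lce[name], post_lce[name], alpha) for name in post_lce}

# Hypothetical usage:
# merged = merge_checkpoints(base.state_dict(), extended.state_dict(), alpha=0.5)
# model.load_state_dict(merged)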
Supervised fine-tuning and LLM-as-Judge
SFT was done on ~4.1M curated samples. To ensure quality they applied an automated judge (LLM-as-Judge) that evaluates only the assistant's responses across multiple dimensions: instruction following, correctness, completeness, conciseness, naturalness and calibration.
They also implemented deterministic rules for normalization, schema validation, leak detection and global deduplication. The pipeline labels samples as accept/borderline/reject, and uses hard-reject for severe defects (hallucinations, false premises, incorrect calculations).
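A minimal sketch of what such a gate could look like. Only the judged dimensions, the accept/borderline/reject labels and the hard-reject defects come from the article; the scoring scale, thresholds and function names are illustrative assumptions:

DIMENSIONS = ["instruction_following", "correctness", "completeness",
              "conciseness", "naturalness", "calibration"]
HARD_REJECT = {"hallucination", "false_premise", "incorrect_calculation"}

def label_sample(judge_scores: dict[str, float], defects: set[str]) -> str:
    # Severe defects bypass scoring entirely (hard-reject).
    if defects & HARD_REJECT:
        return "reject"
    # Otherwise average the per-dimension judge scores (assumed 0-1 scale).
    mean = sum(judge_scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    if mean >= 0.8:
        return "accept"
    return "borderline" if mean >= 0.6 else "reject"

# Example: a fluent but miscalculated answer is rejected outright.
print(label_sample({d: 0.9 for d in DIMENSIONS}, {"incorrect_calculation"}))  # reject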
The same SFT configuration applies to all three model sizes.
Staged reinforcement learning
Instead of a single RL pass, Granite 4.1 applies multiple focused stages:
Algorithm: on-policy GRPO with DAPO loss (dynamic sampling disabled during training to reduce cost)
Stack: SkyRL
Samples per prompt: 16
Train batch size: 1024
Context length in RL: 8192
The RL stages, in order:
Multi-domain RL: avoids forgetting by training on a diverse mix (math, science, logic, instruction following, Text2SQL, chat, etc.). Effective LR: 5e-7, KL coefficient beta = 0.05.
RLHF / multicultural chat: improves helpfulness and chat quality using a multilingual reward model; raises Alpaca-Eval by ~18.9 points on average.
Identity & knowledge calibration: a short run (~40 steps) to improve self-description and calibration.
Math RL: recovers the drop on math benchmarks caused by the earlier stages and ends above the previous scores.
Learning rates and KL coefficients are adjusted conservatively between stages to avoid policy drift. A sketch of the group-relative advantage computation all stages share follows below.
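The reward source and the DAPO token-level loss on top of the advantages are omitted; the group normalization itself, using the configuration above (16 samples per prompt, so a batch of 1024 spans 64 prompt groups), looks like this:

import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: (num_prompts, group_size) scalar rewards, one per sampled response.
    # GRPO needs no value network: each reward is normalized against the mean
    # and std of its own group of samples for the same prompt.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

rewards = torch.rand(64, 16)           # placeholder rewards from a verifier or reward model
advantages = grpo_advantages(rewards)  # fed into the (DAPO) policy-gradient loss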
Performance and key benchmarks
Granite 4.1 scales predictably with size, and the dense 8B holds its own against the previous-generation 32B-A9B MoE.
Some highlighted results (summary):
RULER long-context scores (8B base): 83.6 at 32K, 79.1 at 64K, 73.0 at 128K.
Practical takeaway: the dense 8B competes with much larger models on many reasoning, code and alignment tasks.
Quantization, deployment and quick example
They released fp8 quantized variants optimized for vLLM, cutting the disk and GPU memory footprint by ~50%. Quantization is applied with LLM Compressor only to the weights and activations of linear operators, keeping all other layers in their original precision.
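The article does not include the recipe, but to the best of my knowledge of the LLM Compressor API, FP8 quantization of linear operators only looks roughly like this (model name and output path are placeholders):

from transformers import AutoModelForCausalLM
from llmcompressor import oneshot  # older versions expose this as llmcompressor.transformers.oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-4.1-30b", torch_dtype="auto")

# FP8 for weights and activations of Linear layers only; lm_head (and any
# non-linear layers) stay in their original precision, as described above.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

model.save_pretrained("granite-4.1-30b-fp8")  # checkpoint ready to serve with vLLM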
Minimal example to load the instruct 30B model (adapted):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model_path = "ibm-granite/granite-4.1-30b"  # adjust to the exact checkpoint name on Hugging Face

# Load the tokenizer and place the model weights on the GPU
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# Tool definition (tool-calling): a JSON schema for a function the model may call
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a specified city.",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
    }
}]

# Render the conversation with the chat template, passing the tool schemas
chat = [{"role": "user", "content": "What's the weather like in London right now?"}]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, tools=tools, add_generation_prompt=True)

# Tokenize and generate; the model can answer with a structured tool call
input_tokens = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output)[0])
This flow shows how Granite 4.1 integrates tool-calling practically for production assistants.
Infrastructure and license
Training ran on an NVIDIA GB200 NVL72 cluster (CoreWeave), with intra-rack NVLink domains and NDR 400 Gb/s InfiniBand between racks. Training at this scale demands high bandwidth and tight synchronization to push 15T+ tokens through the cluster.
Granite 4.1 is released under Apache 2.0, which eases adoption in research and companies.
When to use Granite 4.1?
If you need an open-source model effective for production with latency and cost constraints, the dense 8B is a powerful option.
If you work with very long documents, the context extension up to 512K is a real differentiator.
If you care about answer quality, the SFT + LLM-as-Judge pipeline and RL stages show a serious commitment to safety and calibration.
Granite 4.1 is not magic: it's data engineering, well-designed training stages and pragmatic architectural choices. Want a model that works in practice and is easy to deploy? This is a clear example that the right mix of data and stages can beat pure parameter scaling.