Gemma 4: open multimodal AI that runs on-device
Gemma 4 arrives as a complete package: an open model under the Apache 2.0 license, multimodal (text, image, audio, video), available in sizes that scale from your laptop up to a server, and with results that in many cases are excellent without any fine-tuning.
What's new with Gemma 4
Gemma 4 combines proven ideas and focused improvements to offer a practical, efficient model family:
Apache 2.0 license and open checkpoints, free to use and deploy.
Multimodal: text + image + video; the smaller variants also handle audio.
Designed to run on many infrastructures: Transformers, llama.cpp, MLX, WebGPU, Rust, ONNX and more.
Four base sizes, all with a base checkpoint and instruction-tuned checkpoint: E2B (2.3B effective), E4B (4.5B effective), 31B dense and 26B A4B (MoE with 4B active).
Long context: 128k for E2B/E4B and 256k for the large models.
Quick takeaway? Models you can try today, even on-device, and designed to be efficient when quantized.
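To get a feel for what "runs on-device" means, weight memory scales roughly with parameter count times bits per weight. A quick back-of-envelope using the sizes above (this sketch ignores KV cache and activation overhead, which are real costs on top):

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB: params * bits / 8 bytes."""
    return params * bits_per_weight / 8 / 1e9

# Parameter counts from the release notes. Note that for the 26B A4B MoE,
# all 26B weights must be stored even though only 4B are active per token
# (active parameters drive compute, not storage).
models = {"E2B": 2.3e9, "E4B": 4.5e9, "26B A4B": 26e9, "31B dense": 31e9}
for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params, 4):.1f} GB at 4-bit")
```

At 4-bit quantization, E2B fits comfortably in a couple of GB, which is why the smaller variants are laptop-friendly.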
Architecture and technical details
Gemma 4 brings together known components tuned for multimodality and long context. Here are the essentials, straight to the point:
Mixed attention: alternating layers of local window (sliding-window) and full-context global attention. Typical local window: 512 tokens on smaller models, 1024 on large ones.
Dual RoPE: standard RoPE in window layers and proportional RoPE in global layers to extend context in a stable way.
Per-Layer Embeddings (PLE): a second embedding table that creates a reduced vector per token for each layer. This lets each layer receive token-specific information when it needs it, instead of forcing the initial embedding to hold everything. It’s a low-parameter cost specialization per layer.
Shared KV Cache: the last N layers reuse keys and values from a previous layer of the same type, saving memory and compute during long-context inference.
Vision encoder: learned 2D positions and multidimensional RoPE; it preserves aspect ratios and supports several visual token budgets (70, 140, 280, 560, 1120) to trade latency against quality.
Audio encoder: USM-style conformer, sharing the same base as Gemma-3n for compatibility.
These pieces make Gemma 4 ideal for quantization and for running with very long contexts without breaking the user experience.
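The mixed-attention pattern above can be sketched with simple boolean masks. The window sizes (512/1024) come from the post, but the 3-local-to-1-global alternation below is an assumption for illustration; the real layer ratio is not stated:

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    # Full-context global attention: token i sees every token j <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n: int, window: int) -> np.ndarray:
    # Local attention: causal, but only the last `window` tokens are visible.
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

def layer_masks(n_layers: int, seq_len: int, window: int = 512) -> list:
    # Hypothetical pattern: every 4th layer is global, the rest are local.
    return [causal_mask(seq_len) if (l + 1) % 4 == 0
            else sliding_window_mask(seq_len, window)
            for l in range(n_layers)]
```

The point of the alternation: local layers keep compute and KV memory bounded by the window, while the occasional global layer propagates information across the full 128k-256k context.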
Performance and metrics
LMArena (text): 31B dense ≈ 1452; 26B MoE (4B active) ≈ 1441. That places the models in a league similar to GLM-5 or Kimi K2.5, but with a much lower effective parameter count.
In informal tests, multimodal operation (image/audio + text) approaches the quality of pure-text for practical tasks like captioning, OCR and detection.
Important: the numbers come from the release report and are estimates for text context; interpretation always needs nuance depending on the task.
Multimodal capabilities and practical examples
Gemma 4 works well out-of-the-box for real tasks:
OCR and structured extraction (returns JSON with bounding boxes without requiring rigid prompt instructions).
Detection and pointing in GUI interfaces (natively generates coordinates relative to the image).
Captioning and description of complex scenes.
Transcription and description of spoken audio (it is not trained to interpret music or non-verbal sounds).
For more advanced cases (video with audio, tool-calling, or fine-tuning) Hugging Face publishes examples using AutoModelForMultimodalLM, AutoProcessor and the integrated chat template.
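The JSON-with-bounding-boxes output mentioned above is easy to consume downstream. A minimal sketch, where the schema (a `label` plus a `[x0, y0, x1, y1]` box) is an assumed example and not a documented Gemma 4 format:

```python
import json
from dataclasses import dataclass

@dataclass
class Box:
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

def parse_boxes(model_output: str) -> list[Box]:
    """Parse a JSON list of detections into typed boxes.
    Assumes each entry looks like {"label": ..., "box": [x0, y0, x1, y1]}."""
    return [Box(d["label"], *d["box"]) for d in json.loads(model_output)]

sample = '[{"label": "button", "box": [0.12, 0.30, 0.25, 0.38]}]'
boxes = parse_boxes(sample)
print(boxes[0].label)  # button
```

In practice you would validate the model's JSON before parsing, since generation can occasionally produce malformed output.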
Deployment: where and how to run it
Gemma 4 has day-0 support on many infrastructures:
Transformers (with bitsandbytes, PEFT, TRL).
llama.cpp / llama-server and GGUF formats for local apps and agents like Pi, openclaw or hermes.
transformers.js and WebGPU for browser execution.
ONNX and checkpoints for hardware backends.
mistral.rs for a Rust engine with agentic features.
MLX for optimized multimodal pipelines.
Practical tips:
To reduce KV-cache memory on Apple Silicon, use TurboQuant (example: --kv-bits 3.5 --kv-quant-scheme turboquant).
The E2B/E4B variants are ideal for prototypes on a laptop or a Raspberry Pi; reserve 26B A4B or 31B for servers or large GPUs.
Quick install of llama.cpp server (example):
# macOS
brew install llama.cpp
# Windows
winget install llama.cpp
# start server with a GGUF
llama-server -hf ggml-org/gemma-4-E2B-it-GGUF:Q4_K_M
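Once the server is up, llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint (by default on localhost:8080; the port is configurable). A minimal sketch of the request payload you would POST to it:

```python
import json

def build_chat_request(prompt: str, model: str = "gemma-4-E2B-it") -> dict:
    """Build an OpenAI-style chat payload for llama-server's
    /v1/chat/completions endpoint. Model name here is illustrative;
    llama-server serves whatever GGUF it was launched with."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

payload = build_chat_request("Describe this screenshot in one sentence.")
print(json.dumps(payload, indent=2))
```

Any OpenAI-compatible client library will also work by pointing its base URL at the local server.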
Fine-tuning, training and demos
Gemma 4 is built to be extended:
TRL now supports multimodal tool responses during training, opening the door to training agents that receive images from the environment in real time.
Practical example: training with CARLA where the model learns to drive by watching the camera and acting; after training the agent reliably avoids pedestrians and changes lanes.
Integration with Vertex AI: Hugging Face documents how to build containers and launch training jobs with H100 GPUs.
Short snippet to launch a job on Vertex AI (skeleton):
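A hedged sketch of what that skeleton could look like with the google-cloud-aiplatform SDK; project, region, container image, and machine shape below are all placeholders you would replace with your own:

```python
from google.cloud import aiplatform

# Placeholder project/region; requires valid GCP credentials to actually run.
aiplatform.init(project="my-project", location="us-central1")

# Training logic lives inside the container image you built beforehand.
job = aiplatform.CustomContainerTrainingJob(
    display_name="gemma4-finetune",
    container_uri="us-docker.pkg.dev/my-project/trainers/gemma4-trl:latest",
)

job.run(
    replica_count=1,
    machine_type="a3-highgpu-8g",          # H100 machine family
    accelerator_type="NVIDIA_H100_80GB",
    accelerator_count=8,
)
```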
If you prefer a UI to experiment, Unsloth Studio lets you load models from the hub and try local fine-tuning or in Colab.
Practical reflection: what can you do today?
If you're a developer or researcher, Gemma 4 lets you iterate fast: local tests, quantization and deploying multimodal agents without relying on proprietary APIs. If you're a product maker, try the smaller variants for app features that need vision and speech. And if you're just curious, try the demos in-browser or spin up a local server and see how well it understands your own images and audio.
Limitations? Yes. We don't know the exact data mix or full training recipe, and interpreting musical audio or non-verbal sounds isn't guaranteed. Always validate with your dataset and test robustness in production.
Gemma 4 is a strong demonstration that powerful multimodal AI can be open and usable across many environments. Ready to try it on your laptop or project? Share your results with the community.