Gemma 4 has been evolving fast over the past couple of months. Now Google is releasing checkpoints trained with Quantization-Aware Training (QAT) so you can run powerful models locally — on your phone or laptop — using much less memory and without losing the quality you expect.
What's in this update
The core idea is simple: instead of compressing the model after training (known as Post-Training Quantization, or PTQ), the compression is simulated during training. Because the model learns weights that tolerate the rounding, accuracy holds up better when it is converted to lower-precision formats.
QAT simulates quantization during training to minimize quality loss when the model is compressed.
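To make "simulated during training" concrete, here is a minimal sketch of the quantize-then-dequantize step (often called fake quantization) that a QAT forward pass typically applies to weights. It assumes a simplified symmetric 4-bit scheme with one scale per block of 32 weights, loosely inspired by block-wise formats such as Q4_0. The function name `fake_quantize` and its parameters are illustrative, and this is not Google's actual training recipe or the exact Q4_0 encoding.

```python
def fake_quantize(weights, block_size=32, bits=4):
    """Round each block of weights to a signed `bits`-bit grid, then map back to floats.

    Simplified illustration: one scale per block, symmetric around zero.
    """
    qmin = -(2 ** (bits - 1))        # -8 for 4 bits
    qmax = 2 ** (bits - 1) - 1       #  7 for 4 bits
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Avoid dividing by zero when a block is all zeros.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for w in block:
            q = max(qmin, min(qmax, round(w / scale)))  # integer code
            out.append(q * scale)                        # value the model "sees"
    return out


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.26, 0.0]
    simulated = fake_quantize(weights, block_size=4)
    for original, rounded in zip(weights, simulated):
        print(f"{original:+.3f} -> {rounded:+.3f}")
```

In a PTQ workflow this rounding happens once, after training is finished. In a QAT workflow the forward pass uses the rounded values while the optimizer keeps updating the full-precision weights, commonly by treating the rounding as an identity function when computing gradients (a straight-through estimator). Whether Gemma's QAT checkpoints use that exact estimator is an assumption here.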
Google now offers QAT checkpoints in the popular Q4_0 format and in a new format designed specifically for mobile. With that mobile format, the text-only Gemma 4 E2B model has a much smaller memory footprint, making long conversations possible on consumer devices.
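As a rough illustration of where the memory savings come from, the sketch below estimates weight storage for a block-quantized format. It assumes the layout commonly used for Q4_0 in GGML-style tooling (32 weights per block, 4 bits each, plus one 16-bit scale), which works out to about 4.5 bits per weight. The parameter count is a hypothetical placeholder, not the actual size of Gemma 4 E2B, and the new mobile format is not modeled because its layout is not described here.

```python
def quantized_size_bytes(num_weights, bits_per_weight=4, block_size=32, scale_bits=16):
    """Approximate storage for block-quantized weights: packed values plus one scale per block."""
    num_blocks = -(-num_weights // block_size)  # ceiling division
    total_bits = num_weights * bits_per_weight + num_blocks * scale_bits
    return total_bits / 8


def full_precision_size_bytes(num_weights, bits_per_weight=16):
    """Storage for the same weights kept in a 16-bit floating-point format."""
    return num_weights * bits_per_weight / 8


if __name__ == "__main__":
    hypothetical_params = 1_000_000_000  # placeholder, NOT a real Gemma 4 figure
    q = quantized_size_bytes(hypothetical_params)
    f = full_precision_size_bytes(hypothetical_params)
    print(f"16-bit weights: {f / 1e9:.2f} GB")
    print(f"4-bit blocks:   {q / 1e9:.2f} GB")
    print(f"reduction:      {f / q:.2f}x")
```

This counts weights only. Actual memory use during a conversation also includes the attention cache and activations, which grow with context length, so the weight savings are what leave room for longer conversations on a device with a fixed memory budget.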
