Today Google presents DiffusionGemma, an experimental model that rethinks how text is generated to gain a lot of speed without relying on the old “typing” process word by word.
What is DiffusionGemma and why it matters
DiffusionGemma is an open model under the Apache 2.0 license: a Mixture of Experts (MoE) of 26B that activates only 3.8B parameters during inference. Instead of producing tokens sequentially, it generates whole blocks of text at once, which lets it reach up to 4x the speed on dedicated GPUs.
So what does that mean for you as a developer or creator? Less latency when you run models locally, snappier responses in interactive editors, and better experiences in flows where every millisecond counts.
Practical advantages (and their limits)
-
Speed: Up to 1000+ tokens per second on an NVIDIA H100 and 700+ tokens per second on a GeForce RTX 5090. That turns tasks that used to feel slow into almost instant interactions.
