Modular Diffusers: modular blocks for diffusion pipelines
Hugging Face introduces Modular Diffusers, a more flexible and composable way to build diffusion pipelines. Tired of rewriting entire pipelines every time you want to swap a component? Modular Diffusers lets you mix and match reusable blocks: text encoders, image encoders, denoise steps, decoders, and custom blocks you can inspect and run separately.
What Modular Diffusers is and why it matters
Modular Diffusers complements the DiffusionPipeline class with a block-based alternative in which each block encapsulates its own inputs, outputs, and logic. Think of a pipeline as a row of sockets: you can pull a component out, test it in isolation, and plug it back in without breaking the rest.
Key advantage: you can compose custom workflows, reuse components across pipelines, and manage memory and weight loading more granularly.
This changes the ergonomics of model development: less duplicated code, more experimentation, and compatibility with visual interfaces like Mellon.
How it works — practical example and technical concepts
The API keeps the simplicity you know, but the internal objects are composable blocks. Here’s a short example with FLUX.2 Klein 4B that shows the separation between defining the workflow and loading weights:
import torch
from diffusers import ModularPipeline

# Define the workflow first; weights are loaded in a separate, explicit step
pipe = ModularPipeline.from_pretrained("black-forest-labs/FLUX.2-klein-4B")
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(prompt="a serene landscape at sunset", num_inference_steps=4).images[0]
image.save("output.png")
Under the hood, pipe.blocks lists sub-blocks such as text_encoder, vae_encoder, denoise, and decode. Each block is self-contained: you can extract it, run it on its own, and recompose the pipeline.
Block-by-block workflow
blocks.init_pipeline() turns a collection of blocks into an executable pipeline.
load_components() downloads and prepares weights (it supports device_map, quantization, and dtypes such as bfloat16).
ComponentsManager helps manage memory and automatic offloading across multiple pipelines.
Example of separating the text encoder to reuse embeddings:
# extract the text block and run it separately
text_blocks = pipe.blocks.sub_blocks.pop("text_encoder")
text_pipe = text_blocks.init_pipeline("black-forest-labs/FLUX.2-klein-4B")
text_pipe.load_components(torch_dtype=torch.bfloat16)
text_pipe.to("cuda")
prompt_embeds = text_pipe(prompt="a serene landscape at sunset").prompt_embeds

# pipe.blocks no longer contains the text encoder, so this pipeline
# starts directly from the precomputed embeddings
remaining_pipe = pipe.blocks.init_pipeline("black-forest-labs/FLUX.2-klein-4B")
remaining_pipe.load_components(torch_dtype=torch.bfloat16)
remaining_pipe.to("cuda")
image = remaining_pipe(prompt_embeds=prompt_embeds, num_inference_steps=4).images[0]
This enables optimizations: for example, generate prompt_embeds on one machine and decode on another, or use different precisions per component.
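The cross-machine hand-off reduces to serializing the embeddings. A minimal sketch with a stand-in tensor (the real shape and dtype come from text_pipe's output, not from this snippet):

```python
import torch

# On machine A: serialize the embeddings. The tensor here is a stand-in
# for text_pipe(prompt=...).prompt_embeds.
prompt_embeds = torch.randn(1, 16, 64, dtype=torch.bfloat16)
torch.save(prompt_embeds.cpu(), "prompt_embeds.pt")

# On machine B: load them and feed the remaining blocks, e.g.
# remaining_pipe(prompt_embeds=loaded, num_inference_steps=4)
loaded = torch.load("prompt_embeds.pt")
```

Because the hand-off is just a tensor, the two halves of the pipeline are free to run with different precisions or on different hardware.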
Creating custom blocks
A custom block is a Python class that defines its expected_components, inputs, intermediate_outputs, and its logic in __call__. This encapsulates a specific operation that you can then publish to the Hub.
A short example of a block that extracts depth maps:
expected_components declares which models the block needs. If you provide a pretrained_model_name_or_path, load_components() will download that model automatically, unless you override it with update_components().
Modular repos and publishing to the Hub
Modular Diffusers also introduces the notion of a modular repository. A modular_model_index.json can point to components distributed across several repos. That lets you, for example, quantize only the transformer and load the original VAE from another repo.
Also, a modular repo can contain custom block code and Mellon configurations together. Publishing a block is as simple as pushing your repo and using trust_remote_code=True when loading it.
Visual integration: Mellon
Mellon is a node-based interface that leverages the consistency of the block API. Some key differences compared to other node UIs:
Dynamic nodes: the UI adapts to the model you load into the node.
Full-pipeline nodes: you can collapse a whole pipeline into one node to keep the canvas tidy.
Hub integration: published blocks show up in Mellon without extra UI code.
Quick usage example: drag in a Dynamic Block node, set repo_id to diffusers/gemini-prompt-expander-mellon, load the block, and connect its prompt output to the encode node. Gemini then expands the prompt automatically.
Note: Mellon is early-stage, great for prototyping and visualization, but not yet recommended for critical production.
Use cases and community examples
The community is already publishing modular pipelines with impressive results:
Krea Realtime Video: a 14B-parameter pipeline achieving 11 fps on B200 for real-time video generation, with blocks for text-to-video, video-to-video and streaming.
Waypoint-1: a 2.3B autoregressive world model that generates interactive worlds from control inputs and prompts.
These examples show the power of packaging novel architectures as reusable blocks and sharing them on the Hub so anyone can load them with ModularPipeline.from_pretrained.
Practical considerations and recommendations
Memory handling: use ComponentsManager and device_map for offloading and selective component loading.
Types and quantization: load_components() accepts per-component configurations; you can mix bfloat16 and float16 as needed.
Security: if you load remote code, evaluate trust_remote_code carefully and review the repo before running in sensitive environments.
Experimentation: blocks make A/B testing components easy (for example, different ControlNets or VAEs) without rewriting entire pipelines.
Final thought
Modular Diffusers is a practical evolution: it brings composition, reuse and visual tools to the diffusion model workflow. Want to iterate fast, share components, or connect pipelines in visual interfaces? This approach gives you the pieces to do it with fine control over weight loading and memory.
Try it, publish a block and tell the community what worked and what didn’t. The flexibility is there; now it’s your turn to build and collaborate.