3D rendering often sounds like technical magic: models, lights, shadows, and a bunch of formulas. What if I told you that a neural network can now learn that entire process without you having to program the rules by hand?
RenderFormer is the name of the new approach from Microsoft Research that does exactly that: a transformer network able to learn a complete rendering pipeline and produce images with global illumination without relying on traditional ray tracing or rasterization. (microsoft.com)
What is RenderFormer and why does it matter?
In short, RenderFormer shows that rendering can move from being centered on explicit physical rules to being learned from data. Why does that matter to you? Because it opens up concrete possibilities: task-specific rendering, less dependence on conventional graphics engines, and a path to integrating rendering with video generation or embodied agents.
The work was presented by the Microsoft Research team and accepted to SIGGRAPH 2025, and it's available as open source. (microsoft.com)
How does it work without the classic tricks?
RenderFormer represents the scene as a collection of triangle tokens. Each token encodes position, normals and material properties (for example diffuse color, specularity, roughness). The camera is described by ray tokens, one per pixel or per pixel-block.
It then uses two transformer stages: one for view-independent effects (shadows, diffuse light transport) and another for view-dependent effects (visibility, reflections, specular highlights). This way, the network learns to combine geometric and camera information to produce the final image. (microsoft.com)
RenderFormer puts into the model’s hands what used to be done with hand-written rules: everything from soft shadows to specular reflections.
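To make the idea concrete, here is a minimal sketch of what such a tokenization and two-stage pipeline could look like in PyTorch. Everything here is an assumption made for illustration: the class name, feature layout, sizes, and the encoder/decoder split are mine, not RenderFormer’s actual code.

```python
import torch
import torch.nn as nn

class RenderFormerSketch(nn.Module):
    """Minimal sketch of the two-stage idea; all names, feature layouts
    and sizes are illustrative assumptions, not RenderFormer's code."""
    def __init__(self, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        # Triangle token: 3 vertices (9) + 3 vertex normals (9) +
        # material (diffuse RGB, specularity, roughness = 5) -> 23 features.
        self.tri_embed = nn.Linear(23, d_model)
        # Ray token: origin (3) + direction (3) for a pixel or pixel block.
        self.ray_embed = nn.Linear(6, d_model)
        # Stage 1: view-independent transport (shadows, diffuse bounce)
        # via self-attention over triangle tokens only.
        self.view_independent = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers)
        # Stage 2: view-dependent effects (visibility, reflections,
        # highlights) via cross-attention from ray tokens to the scene.
        self.view_dependent = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers)
        self.to_rgb = nn.Linear(d_model, 3)

    def forward(self, triangles, rays):
        # triangles: (B, T, 23), rays: (B, P, 6) -> (B, P, 3) pixel colors
        scene = self.view_independent(self.tri_embed(triangles))
        pixels = self.view_dependent(self.ray_embed(rays), scene)
        return self.to_rgb(pixels)
```

Cross-attention from ray tokens to scene tokens is just one plausible way to let each pixel query the whole scene; the published architecture may wire this differently.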
Data and training: where does it learn from?
The team trained the model on Objaverse, a large collection of 3D models with annotations. They built scene templates and generated HDR renders with Blender to teach the model to handle varied lighting and materials.
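As a flavor of what that data generation looks like, here is a minimal Blender Python (bpy) snippet that renders the current scene to a 32-bit EXR, the kind of HDR target such training needs. Paths, resolution, and engine settings are placeholders, not the team’s actual pipeline.

```python
import bpy

# Render the current scene to a float EXR so the training target
# keeps its full HDR range. All values below are placeholders.
scene = bpy.context.scene
scene.render.engine = 'CYCLES'                     # path-traced ground truth
scene.render.resolution_x = 512
scene.render.resolution_y = 512
scene.render.image_settings.file_format = 'OPEN_EXR'
scene.render.image_settings.color_depth = '32'     # 32-bit float HDR output
scene.render.filepath = '/tmp/renderformer_gt_0001.exr'
bpy.ops.render.render(write_still=True)
```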
The base model has around 205 million parameters and was trained in two phases: first at 256×256 resolution for 500,000 steps and then at 512×512 for 100,000 steps, while scaling up the number of triangles the model can handle. These details help explain why it generalizes to scenes with arbitrary geometry. (microsoft.com)
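That schedule is easy to picture as a small resolution curriculum. The sketch below is a hypothetical way to encode it: only the resolutions and step counts come from the post; the layout, loop, and helper names are mine, and the exact triangle budgets are not quoted here.

```python
# Hypothetical encoding of the two-phase schedule described above.
training_phases = [
    {"resolution": (256, 256), "steps": 500_000},  # phase 1
    {"resolution": (512, 512), "steps": 100_000},  # phase 2, larger triangle budget
]

def train(model, phases):
    for phase in phases:
        h, w = phase["resolution"]
        for step in range(phase["steps"]):
            # Sample a scene, build ray tokens at (h, w), render,
            # compare against the Blender ground truth, take a step.
            ...
```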
What does it achieve in practice?
Visual results show that RenderFormer reproduces shadows, diffuse shading and reflections with high fidelity across diverse scenes. It can also generate video sequences by controlling viewpoint changes frame by frame, which is useful for animation and immersive experiences. (microsoft.com)
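Since the camera enters the model only as ray tokens, making a video amounts to rebuilding those tokens each frame and re-rendering. Here is a minimal sketch of a camera orbit: look_at_rays is a standard pinhole-camera construction, not RenderFormer-specific code, and render() is a hypothetical stand-in for the model call.

```python
import numpy as np

def look_at_rays(eye, target, up, h, w, fov_deg=60.0):
    """Build one (origin, direction) ray per pixel for a pinhole camera."""
    fwd = target - eye
    fwd = fwd / np.linalg.norm(fwd)
    right = np.cross(fwd, up); right = right / np.linalg.norm(right)
    cam_up = np.cross(right, fwd)
    half = np.tan(np.radians(fov_deg) / 2)
    ys, xs = np.mgrid[0:h, 0:w]
    u = (2 * (xs + 0.5) / w - 1) * half * (w / h)   # horizontal screen coord
    v = (1 - 2 * (ys + 0.5) / h) * half             # vertical screen coord
    dirs = u[..., None] * right + v[..., None] * cam_up + fwd
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    origins = np.broadcast_to(eye, dirs.shape)
    return np.concatenate([origins, dirs], axis=-1)  # (h, w, 6) ray tokens

# Orbit the camera; each frame is just a fresh batch of ray tokens.
frames = []
for t in np.linspace(0, 2 * np.pi, 120, endpoint=False):
    eye = np.array([3 * np.cos(t), 1.5, 3 * np.sin(t)])
    rays = look_at_rays(eye, target=np.zeros(3),
                        up=np.array([0.0, 1.0, 0.0]), h=512, w=512)
    # frames.append(render(scene_tokens, rays))  # hypothetical model call
```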
Imagine this applied to rapid scene generation for architectural prototypes, AI-guided visual effects in games, or cloud rendering engines that learn specific styles without manual parameter tuning.
Limits and challenges that remain ahead
Not everything is perfect: scaling to very large scenes, handling complex materials, and coping with extreme lighting conditions all remain open challenges. The transformer architecture helps, but computational efficiency and fidelity in edge cases still need more research.
There's also a practical question: how do you integrate this with existing pipelines in animation studios or game engines? The technical and cultural transition will be as important as the pure technical improvements. (microsoft.com)
What does this mean for people who use or create 3D?
If you work in visualization, design, or games, RenderFormer won’t replace traditional tools overnight, but it points in a clear direction: learned renderers can reduce manual steps, speed up iteration, and personalize outputs.
For entrepreneurs and developers, the opportunity is to build layers that marry the flexibility of machine learning with the robustness of industrial pipelines. For artists, it’s another creative tool that can free up time for what really matters: the visual narrative.
Reading and resources
The team’s original article contains diagrams, ablation studies and links to the code and the SIGGRAPH 2025 paper. To dive into implementation and results, check the official source. (microsoft.com)
To finish: RenderFormer isn’t just an elegant experiment. It’s a sign that 3D rendering is entering a phase where AI not only helps, but can redefine how we create images and scenes. Ready to try a way of making graphics that learns as you work?