DeepMind recently introduced a new generation of models designed so that robots not only see, but also think, plan, and act in the real world.
Sound like science fiction? Not really: these are practical advances aimed at solving multi-step tasks with context—something many robots still struggle with today.
What DeepMind announced
DeepMind introduced two key pieces: Gemini Robotics 1.5, a vision-language-action (VLA) model that turns images and instructions into motor commands for robots, and Gemini Robotics-ER 1.5, an embodied reasoning model (VLM) that creates detailed plans and can call digital tools. The goal is to get robots to think before they act and to reveal part of their decision process. (deepmind.google)
Also, Gemini Robotics-ER 1.5 is already available to developers via the Gemini API in Google AI Studio, while Gemini Robotics 1.5 is available to selected partners. That means the reasoning layer for robots arrives first, so creators can test and integrate it. (deepmind.google)
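A quick way to get a feel for that reasoning layer is to query it directly through the Gemini API. The minimal sketch below uses the google-genai Python SDK with an API key from Google AI Studio; the model identifier is an assumption, so check AI Studio for the current name.

```python
# Minimal sketch: asking Gemini Robotics-ER 1.5 to plan a task from an image.
# Assumes the google-genai SDK and an API key from Google AI Studio; the model
# identifier below is an assumption -- check AI Studio for the current name.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("workbench.jpg", "rb") as f:
    image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview identifier
    contents=[image, "List the steps needed to clear this workbench, one per line."],
)
print(response.text)
```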
How these models work, in simple terms
Think of two roles: one that understands and reasons about the scene and rules, and another that converts those decisions into real movements.
Gemini Robotics-ER 1.5 acts like the high-level brain: it analyzes images, plans steps, estimates when something is completed, and can call online tools or other modules to perform concrete actions. Gemini Robotics 1.5 (VLA) takes that plan and generates the motor commands. Together they enable more coherent cycles of perception, planning and action. (developers.googleblog.com)
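To make that split concrete, here is an illustrative Python sketch of the two-layer loop. The `plan_task` and `execute_step` helpers are hypothetical stand-ins (the real VLA model runs with partner hardware), so treat this as the shape of the control flow, not a working integration.

```python
# Illustrative two-layer loop: a reasoning "brain" produces an ordered plan and
# a VLA "executor" turns each step into motion. Both helpers are hypothetical
# placeholders, not real Gemini Robotics API calls.

def plan_task(instruction: str, scene_image: bytes) -> list[str]:
    """Ask the reasoning layer (e.g. Gemini Robotics-ER 1.5) for ordered steps."""
    # In practice: a Gemini API call that returns a step list for the scene.
    return ["locate the bottle", "grasp the bottle", "place it in the blue bin"]

def execute_step(step: str, scene_image: bytes) -> bool:
    """Hand one step to the VLA layer (e.g. Gemini Robotics 1.5); report success."""
    # In practice: motor commands streamed to the robot controller.
    print(f"executing: {step}")
    return True

def run(instruction: str, get_image, max_replans: int = 2) -> None:
    plan = plan_task(instruction, get_image())
    replans = 0
    while plan:
        step = plan.pop(0)
        if not execute_step(step, get_image()):
            if replans >= max_replans:
                break  # hand control back to the operator
            replans += 1
            # Re-plan from the current scene instead of continuing blindly.
            plan = plan_task(instruction, get_image())
```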
The ER model improves spatial and temporal understanding: it can generate precise 2D points to locate objects in an image and reason about what happens between different moments in a video. That helps a plan be more than a checklist—it becomes a sequence anchored in real perception. (developers.googleblog.com)
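As a rough illustration of that pointing capability, the sketch below asks the model for 2D points as JSON and parses them. The prompt wording, the JSON schema, and the [y, x] coordinates normalized to 0-1000 are assumptions borrowed from Gemini's general spatial-understanding guidance; verify the exact format in the developer docs.

```python
# Sketch: requesting 2D points for named objects and parsing the result.
# The JSON schema and 0-1000 coordinate convention are assumptions to verify
# against the Gemini Robotics-ER 1.5 developer docs.
import json
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("table.jpg", "rb") as f:
    image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

prompt = (
    "Point to the bottle and the mug. Answer only with JSON: a list of "
    '{"label": str, "point": [y, x]} entries, coordinates normalized to 0-1000.'
)

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview identifier
    contents=[image, prompt],
)

# Assumes the model returns bare JSON; real responses may need markdown
# code fences stripped before parsing.
for item in json.loads(response.text):
    y, x = item["point"]
    print(f"{item['label']}: y={y}, x={x}")
```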
Practical examples that go beyond theory
Want a concrete example? Ask a robot to sort waste: the model can look up local recycling rules, identify the objects in front of it, and plan the sequence to deposit them in the correct bin. It’s not just recognizing a bottle; it’s understanding the rule and executing the steps. (developers.googleblog.com)
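That waste-sorting flow could be expressed as a single planning prompt where the local rules are supplied as context. The snippet below is only an illustrative prompt shape; the rule text and step format are invented for the example.

```python
# Illustrative prompt for the waste-sorting case: local rules go in as context,
# and the model returns an ordered plan. Rule text and format are invented here.
recycling_rules = (
    "Local rules: plastic bottles -> yellow bin; glass -> green bin; "
    "food waste -> brown bin."
)

prompt = (
    f"{recycling_rules}\n"
    "Look at the objects in the image and return a numbered plan, one step per "
    "line, describing which object goes into which bin and in what order."
)
# `prompt` would then be sent alongside a camera image, as in the earlier sketch.
```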
Other plausible cases: organizing boxes in a warehouse by priority, assembling a workstation step by step, or assisting with lab tasks that require controlled sequences. That said, we're not talking about full autonomy; these are tools for designers and operators to build more capable robotic systems.
Limitations and safety
DeepMind includes safety filters and improvements so the models can recognize and refuse to generate plans that break physical constraints—like exceeding a robot’s payload. Still, these systems need controlled testing: the physical world throws up unexpected variables and human responsibility remains crucial. (developers.googleblog.com)
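Those model-level filters don't remove the need for checks in your own stack. A minimal, illustrative guard might validate each planned step against known physical limits, such as the robot's payload, before anything is executed; the `MAX_PAYLOAD_KG` value and the step format here are assumptions, not part of the Gemini Robotics API.

```python
# Illustrative safety validation layered on top of model output: reject any
# planned step that exceeds the robot's payload. Limits and step format are
# assumptions for the example.
MAX_PAYLOAD_KG = 3.0  # hypothetical limit for this robot

def validate_step(step: dict) -> bool:
    """Return True only if the step respects known physical constraints."""
    weight = step.get("estimated_weight_kg")
    if weight is not None and weight > MAX_PAYLOAD_KG:
        return False
    return True

plan = [
    {"action": "pick up the toolbox", "estimated_weight_kg": 8.0},
    {"action": "pick up the bottle", "estimated_weight_kg": 0.5},
]

safe_plan = [s for s in plan if validate_step(s)]
# Only `safe_plan` goes to the execution layer; rejected steps should be
# reported back to the operator rather than silently dropped.
```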
It’s also important to remember that Gemini Robotics 1.5 is still in limited deployment with partners, while Gemini Robotics-ER 1.5 is offered as a developer preview, which means practical adoption will be gradual. (deepmind.google)
If you're a developer or entrepreneur, what can you do now?
- Try Gemini Robotics-ER 1.5 in Google AI Studio and the Gemini API to explore its embodied reasoning capabilities. It’s a direct way to experiment with planning and spatial understanding. (developers.googleblog.com)
- Start with limited, simulated cases before moving to real hardware: integrate the reasoning layer with your control stack and add explicit safety validations.
- Design practical success metrics: pointing accuracy, robustness in multi-step sequences, and the ability to interrupt or correct plans when something goes wrong (see the sketch after this list).
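As a starting point for those metrics, a small logging helper can already capture per-step outcomes and pointing error during tests. Everything here (field names, thresholds, the error formula) is an assumption to adapt to your own setup.

```python
# Sketch of simple evaluation metrics for a plan run: pointing error in pixels,
# per-step success, and whether the run had to be interrupted.
# Field names and formulas are assumptions for illustration.
from dataclasses import dataclass, field
import math

@dataclass
class RunMetrics:
    pointing_errors_px: list[float] = field(default_factory=list)
    steps_attempted: int = 0
    steps_succeeded: int = 0
    interrupted: bool = False

    def record_point(self, predicted: tuple[float, float], actual: tuple[float, float]) -> None:
        self.pointing_errors_px.append(math.dist(predicted, actual))

    def record_step(self, success: bool) -> None:
        self.steps_attempted += 1
        self.steps_succeeded += int(success)

    def summary(self) -> dict:
        n = len(self.pointing_errors_px)
        return {
            "mean_pointing_error_px": sum(self.pointing_errors_px) / n if n else None,
            "step_success_rate": self.steps_succeeded / self.steps_attempted
            if self.steps_attempted else None,
            "interrupted": self.interrupted,
        }
```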
Final reflection
This announcement brings text-and-image AI closer to the physical world than many expect. Does that mean robots will do everything for us tomorrow? No.
It does mean that today we have models that understand space better, plan more clearly, and can be integrated into software and hardware chains to solve real tasks. If you’re building with robots, it’s worth seeing how these pieces change what’s possible—always keeping safety and human control as top priorities.
Main source: DeepMind article and developer technical note. More details in DeepMind’s original blog and the developer guide on the Google Developers Blog. (deepmind.google)
Useful links: