Google DeepMind presents a version of Gemini that doesn't just reply with text: it can interact with interfaces the way you do, clicking, typing into forms, and navigating pages to complete tasks. Does that sound like a real automated assistant? It is, but with limits and safety measures built in from the start. (blog.google)
What DeepMind announced
The bet is called Gemini 2.5 Computer Use, a specialized variant built on top of Gemini 2.5 Pro. It's available in preview through the Gemini API and Google developer environments like Google AI Studio and Vertex AI. This means companies and developers can start building agents that operate user interfaces directly, controlling browsers and, to a lesser degree, mobile apps. (blog.google)
How it works in plain words
The key piece is the new computer_use tool inside the API. The flow is a loop: the model receives your instruction, a screenshot of the environment, and the recent action history; it decides on an action (for example click, type, or drag) and returns it as a function-call style response. The application then executes that action, sends back a new screenshot, and the model continues until the task is done, a safety check stops it, or the user decides to stop. (blog.google)
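To make that loop concrete, here is a minimal sketch in Python. The helper names (send_to_model, execute_action, capture_screenshot) and the response fields are hypothetical placeholders standing in for the real Gemini API calls; the shape of the loop is what matters.

```python
# Minimal sketch of the agent loop described above. The helpers
# send_to_model, execute_action and capture_screenshot are hypothetical
# placeholders, not real Gemini API calls.

def run_agent(instruction: str, max_steps: int = 20) -> None:
    history = []                       # recent actions, resent each turn
    screenshot = capture_screenshot()  # current state of the browser

    for _ in range(max_steps):
        # 1. The model sees the instruction, the latest screenshot
        #    and the action history.
        response = send_to_model(instruction, screenshot, history)

        # 2. It either declares the task done or proposes one UI action
        #    (click, type, drag, ...) as a function-call style response.
        if response.is_done:
            break

        # 3. The application executes the action, captures the new
        #    state and loops back to the model.
        execute_action(response.action)
        history.append(response.action)
        screenshot = capture_screenshot()
```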
And what is that loop good for in real life? For everything that doesn't have a clean API: filling forms on sites behind a login, organizing items in web apps, automated UI testing and more. DeepMind showed demos ranging from entering data in a CRM to moving notes on a virtual whiteboard. (blog.google)
Concrete use cases
- UI testing: automate tests that used to require highly specific scripts or human intervention.
- Personal assistants: ask your agent to manage bookings or fill forms for you, while respecting confirmations for sensitive actions.
- Workflow automation: migrate data between tools that don't have official integrations.
Companies like Browserbase already offer demo environments where you can see the model in action, and Google says internal teams are already using versions of this model in projects such as Project Mariner and in testing agents. (blog.google)
Performance and comparisons
According to evaluations published by Google and third-party tests, Gemini 2.5 Computer Use outperforms leading alternatives on several web and mobile control benchmarks, while also offering lower latency when executing actions. That positions it as a strong option for tasks that require visual interaction with interfaces. Keep in mind that this information comes from tests Google and its partners have disclosed publicly. (blog.google)
Safety, limits and practical recommendations
Controlling a browser exposes unique risks: abuse attempts by malicious users, prompt injection via web content, and unexpected errors when interacting with interfaces designed for humans. Google built in safeguards such as a per-step evaluation service that reviews each proposed action, and system instructions that enforce confirmations on high-risk actions. Also, the model isn't meant for OS-level control; it's optimized for browsers and, to a lesser extent, mobile UIs. (blog.google)
The key recommendations for developers: test exhaustively in controlled environments, require confirmations for sensitive actions, and use the API's security controls.
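As one illustration of that last recommendation, a confirmation gate can sit between the model's proposed action and its execution. The sketch below is an assumption about how you might structure this in your own loop; the action names and the SENSITIVE set are invented for the example and are not part of the API.

```python
# Illustrative confirmation gate. The action names and the SENSITIVE
# set are invented for this example, not part of the Gemini API.
SENSITIVE = {"submit_payment", "delete_item", "send_message"}

def approved(action: dict) -> bool:
    """Ask a human before executing a high-risk action; allow the rest."""
    if action["name"] in SENSITIVE:
        answer = input(f"Agent wants to run {action['name']!r}. Allow? [y/N] ")
        return answer.strip().lower() == "y"
    return True  # low-risk actions proceed without a prompt
```

In a loop like the one sketched earlier, you would call approved(response.action) before executing it and stop the run if the user declines.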
How you can get started today
If you're a developer, the entry point is the public preview in the Gemini API via Google AI Studio or Vertex AI. You can also try demos in environments like Browserbase to better understand capabilities and limits before integrating the model into production. Google publishes docs and guides to build the interaction loop with Playwright or on virtual machines for testing. (blog.google)
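If you go the Playwright route, the client side of the loop boils down to translating each proposed action into real browser calls. The sketch below uses Playwright's actual Python API, but the action dictionary format is a simplified assumption rather than the exact schema the model returns.

```python
# Client-side executor using Playwright's real Python API. The `action`
# dict format is a simplified assumption, not the exact Gemini schema.
from playwright.sync_api import Page, sync_playwright

def execute(page: Page, action: dict) -> bytes:
    """Apply one model-proposed action and return a fresh screenshot."""
    if action["name"] == "click_at":
        page.mouse.click(action["x"], action["y"])
    elif action["name"] == "type_text":
        page.keyboard.type(action["text"])
    elif action["name"] == "navigate":
        page.goto(action["url"])
    return page.screenshot()  # sent back to the model on the next turn

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    shot = execute(page, {"name": "click_at", "x": 100, "y": 200})
    browser.close()
```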
A practical and responsible look
Can you imagine asking an agent to fill out applications, organize a team whiteboard, or run complex tests while you focus on business decisions? It's plausible today. At the same time, this demands responsibility: testing, supervision, and guardrails designed to prevent dangerous automation.
Technology moves fast, but you decide how to use it. If you build something with this, start in a closed environment, document the flows, and from day one design how the agent asks for confirmations when it could affect sensitive data or money.