Google's Gemini 2.0: new agentic AI with multimodality

Today Google DeepMind introduces Gemini 2.0, its bet on what it calls the "agentic era": models that don't just understand but can plan steps and execute actions under your supervision. The official announcement was published on December 11, 2024 and describes an experimental release already in the hands of developers and trusted testers. (deepmind.google)

What Gemini 2.0 is and why it matters

This isn't just a more powerful version. Gemini 2.0 aims to change how we interact with AI: it doesn't only reply, it also acts through tools, natively generates audio and images, and handles long context and multimodality for complex tasks. What does that mean for you? Assistants that can carry out real workflows, not just offer suggestions. (deepmind.google)

Key new features

  • Gemini 2.0 Flash: the first experimental variant, optimized for low latency and performance. According to Google, it outperforms previous versions on benchmarks and is up to twice as fast in some cases. (deepmind.google)

  • Native multimodal output: the model can now generate images combined with text and multilingual audio via text-to-speech you can steer (steerable TTS). It also processes multimodal inputs like video, audio and images. (deepmind.google)

  • Integrated tool use: it can call Google Search, run code and use third-party functions natively, which makes it more practical for real applications. (deepmind.google)

  • Multimodal Live API: for real-time audio and video, streaming inputs and tool composition for interactive apps. (deepmind.google)
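The integrated tool use above is exposed through function calling in the Gemini API: you declare functions in the request, the model replies with a structured call instead of prose, and your code executes it and feeds the result back. Below is a minimal sketch of the request shape, assuming the public `generateContent` REST schema; the `get_weather` function is a hypothetical example, not part of the API.

```python
def build_tool_request(prompt: str) -> dict:
    """Build a generateContent body that declares one callable function.

    Instead of plain text, the model can answer with a functionCall part
    naming get_weather; your code would run it and send the result back
    as a functionResponse part in a follow-up request.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [{
            "functionDeclarations": [{
                "name": "get_weather",  # hypothetical example function
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "OBJECT",
                    "properties": {"city": {"type": "STRING"}},
                    "required": ["city"],
                },
            }]
        }],
    }

payload = build_tool_request("Do I need an umbrella in Madrid today?")
```

The confirmation step the article describes for sensitive actions lives in your own code, between receiving the model's functionCall and actually executing it.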

Prototypes and experiments: how they plan to use it

Google shares several experiments that show concrete possibilities:

  • Project Astra: an evolution of the universal assistant on Android, with better multilingual dialogue, use of Search, Lens and Maps, and improved memory (for example, 10 minutes of session memory). They're also testing the system on prototype glasses. (deepmind.google)

  • Project Mariner: an agent that acts inside the browser via an experimental extension. It can read pixels and web elements (forms, images, code) and complete tasks for you, asking for confirmation on sensitive actions. In tests it reached 83.5% on the WebVoyager benchmark. (deepmind.google)

  • Jules: an agent designed for developers that integrates flows in GitHub, able to plan and execute changes under supervision. Imagine asking it to fix a bug, propose and run a test, and deliver a preliminary PR. (deepmind.google)

Where and how to try it

As of the announcement (December 11, 2024) Gemini 2.0 Flash is available experimentally to developers through the Gemini API, Google AI Studio and Vertex AI, with some capabilities (like TTS and native image generation) open only to early-access partners. In the Gemini app it appears as an experimental model in the selector and wider rollouts are planned for early 2025. You can see more on the Gemini site. (deepmind.google)
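For developers, a first call might look like the sketch below, which POSTs to the public `generateContent` REST endpoint. Treat it as a sketch under assumptions: `gemini-2.0-flash-exp` was the experimental model id at launch (adjust it to whatever your account exposes), and the response parsing assumes the standard candidates structure.

```python
import json
import os
import urllib.request

# Experimental model id at launch; swap in your available model.
MODEL = "gemini-2.0-flash-exp"
API_KEY = os.environ.get("GEMINI_API_KEY", "")  # set your own key

def build_payload(prompt: str) -> dict:
    """generateContent request body for a text-only prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

def generate(prompt: str) -> str:
    """POST to the REST endpoint and return the first text part."""
    url = (
        "https://generativelanguage.googleapis.com/v1beta/"
        f"models/{MODEL}:generateContent?key={API_KEY}"
    )
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["candidates"][0]["content"]["parts"][0]["text"]

# Example (needs a valid GEMINI_API_KEY and network access):
# print(generate("Explain 'agentic AI' in one sentence."))
```

Google AI Studio can generate equivalent snippets for the official SDKs; the raw REST form is shown only because it makes the request and response shapes explicit.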

Safety, privacy and risks

Google emphasizes that the advancement comes with safety processes: internal reviews with their Responsibility and Safety Committee, AI-assisted red teaming to generate assessments and training data, privacy controls in prototypes like Astra (session deletion) and mechanisms to mitigate prompt injection in Mariner. It's not a perfect solution; it's an iterative approach tested with trusted groups. (deepmind.google)

"The only way to build AI is to do it responsibly from the start," the note says, which is why they're evaluating specific mitigations for each prototype. (deepmind.google)

Concrete usage examples (so you can picture it)

  • In complex searches: AI Overviews powered by Gemini 2.0 will handle multi-step questions, include advanced calculations and consult images or code in context. Result: fewer back-and-forths when researching a topic. (deepmind.google)

  • In the browser: want to buy a camera? The agent compares specs across tabs, fills the form and asks you to confirm the purchase at the last step. That's how Project Mariner works in tests, with current limits in speed and accuracy. (deepmind.google)

  • In development: Jules detects a CI failure, proposes a plan and creates a PR with changes and basic tests for you to review. You supervise and approve. (deepmind.google)

What now? A practical look

This isn't instant magic, but it is a usability leap. If you're a developer, it's time to explore the API and think about how to integrate agents into real flows. If you use digital tools daily, get ready for interfaces that do more for you — as long as companies get security and transparency right.

Gemini 2.0 also relies on in-house infrastructure: Google used its sixth-generation TPUs, called Trillium, to train and run 100% of the model, which highlights the hardware investment behind these advances. (deepmind.google)

Curious or worried? That's normal. These technologies bring assistants that perform tasks for us, but the difference will be made by how safeguards are designed and how you choose to interact with them. A good next step is to map the implications for your own situation (work, product or personal project) and sketch concrete steps to start testing Gemini 2.0 today.
