On June 17, 2025, Google published an update to the Gemini 2.5 family covering new versions, price adjustments and transition dates that matter if you run these models in production.
What Google announced about Gemini 2.5
Google announced changes to the Gemini 2.5 family: general availability of Gemini 2.5 Pro and Flash, and a preview launch of Gemini 2.5 Flash‑Lite. The goal is to give you faster and cheaper options for different use cases. (deepmind.google)
Main updates that matter
- Gemini 2.5 Flash‑Lite arrives in preview as the lowest-latency, lowest-cost option in the family. It's designed for high-volume tasks like classification or large-scale summaries. By default the 'thinking' capability (internal reasoning) is turned off to prioritize speed and cost. (deepmind.google)
- Gemini 2.5 Flash becomes stable with updated pricing: input rises to 0.30 USD per 1M tokens, output drops to 2.50 USD per 1M tokens, and the price difference between modes with and without 'thinking' is removed. This simplifies billing for developers. (deepmind.google)
- Gemini 2.5 Pro is declared stable and remains the choice for tasks that need stronger reasoning, like coding or decision-making agents. Google notes very high demand for Pro and keeps it at the same price point. (deepmind.google)
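To get a feel for the new Flash pricing, here is a minimal sketch that estimates a bill from token counts. The two prices come from the figures quoted above; the workload (10,000 requests of 2,000 input and 300 output tokens each) is a made-up example, so substitute your own numbers.

```python
# Prices per 1M tokens for stable Gemini 2.5 Flash, as quoted in this article.
FLASH_INPUT_USD_PER_M = 0.30
FLASH_OUTPUT_USD_PER_M = 2.50


def flash_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of a workload at the quoted Flash prices."""
    return (
        input_tokens / 1_000_000 * FLASH_INPUT_USD_PER_M
        + output_tokens / 1_000_000 * FLASH_OUTPUT_USD_PER_M
    )


# Hypothetical workload: 10,000 requests, each 2,000 input and 300 output tokens.
requests = 10_000
total = flash_cost_usd(requests * 2_000, requests * 300)
print(f"Estimated cost: {total:.2f} USD")  # prints "Estimated cost: 13.50 USD"
```

Because the price no longer depends on whether 'thinking' is on, a single pair of constants is enough. Note that any tokens spent on reasoning still count toward your output total.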
Operational and deprecation dates
If you're using preview endpoints you need to plan your migration: the preview version of Gemini 2.5 Flash with the old prices will be kept only until July 15, 2025, when that endpoint will be shut down. There are also windows for Pro previews that require migration before the listed dates. Check your model strings and update to gemini-2.5-flash or gemini-2.5-pro as appropriate. (deepmind.google)
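One way to audit your configuration is a small helper that rewrites preview identifiers to their stable names. This is a sketch under an assumption: that your preview model strings start with gemini-2.5-flash-preview or gemini-2.5-pro-preview. The identifiers in the usage lines are illustrative, so check them against the strings you actually deploy. Flash‑Lite is left untouched because it is still in preview.

```python
from datetime import date

# Shutdown date for the Gemini 2.5 Flash preview endpoint, per this article.
FLASH_PREVIEW_SHUTDOWN = date(2025, 7, 15)

# Assumed naming pattern for preview identifiers; verify against your own config.
PREVIEW_PREFIX_TO_STABLE = {
    "gemini-2.5-flash-preview": "gemini-2.5-flash",
    "gemini-2.5-pro-preview": "gemini-2.5-pro",
}


def stable_model_id(model_id: str) -> str:
    """Return the stable identifier for a preview model string, else the input."""
    for prefix, stable in PREVIEW_PREFIX_TO_STABLE.items():
        if model_id.startswith(prefix):
            return stable
    return model_id


def days_until_flash_shutdown(today: date) -> int:
    """Days left before the Flash preview endpoint is shut down."""
    return (FLASH_PREVIEW_SHUTDOWN - today).days


# Illustrative identifiers only.
for model in ["gemini-2.5-flash-preview-05-20", "gemini-2.5-flash-lite-preview-06-17"]:
    print(model, "->", stable_model_id(model))
```

Running a helper like this over your configuration files in CI is a cheap way to catch a stale preview string before the endpoint disappears.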
What this means for your project
Do you need fast, cheap responses or do you prefer stronger reasoning? Now you have clearer options:
- For throughput and low cost choose Flash‑Lite and enable 'thinking' only when necessary.
- For complex programming tasks, agents or deep analysis, Pro remains the safest bet.
Concrete example: if you build a daily-summary service for thousands of users, Flash‑Lite can cut latency and your bill. If you build an assistant that writes code or coordinates multi-step actions, Pro is the safer pick.
Practical recommendations
- Review API calls and replace old model identifiers with gemini-2.5-flash, gemini-2.5-flash-lite (preview) or gemini-2.5-pro depending on your use case. (deepmind.google)
- Test 'thinking' parameters in staging to measure cost vs. benefit. Flash‑Lite has thinking off by default, but you can enable it if you need more accuracy.
- Monitor token usage and latency after migrating to adjust budget and settings.
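To compare settings in staging, a small harness that records latency and token counts per configuration can help. The sketch below is deliberately SDK-agnostic: call_model, CallResult and fake_call are hypothetical names, and you would replace fake_call with a wrapper around your real client that returns the token counts reported by the API.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class CallResult:
    text: str
    input_tokens: int
    output_tokens: int


@dataclass
class RunSummary:
    label: str
    mean_latency_s: float
    total_input_tokens: int
    total_output_tokens: int


def benchmark(
    label: str, call_model: Callable[[str], CallResult], prompts: list[str]
) -> RunSummary:
    """Time each call and sum token usage. Expects a non-empty prompt list."""
    latencies = []
    results = []
    for prompt in prompts:
        start = time.perf_counter()
        result = call_model(prompt)
        latencies.append(time.perf_counter() - start)
        results.append(result)
    return RunSummary(
        label=label,
        mean_latency_s=mean(latencies),
        total_input_tokens=sum(r.input_tokens for r in results),
        total_output_tokens=sum(r.output_tokens for r in results),
    )


def fake_call(prompt: str) -> CallResult:
    """Stand-in for a real model call; word count is a placeholder for tokens."""
    return CallResult(text="ok", input_tokens=len(prompt.split()), output_tokens=1)


summary = benchmark("thinking-off", fake_call, ["summarize this text", "classify this ticket"])
print(summary)
```

Run the same prompt set once per configuration (for example, thinking off and thinking on) and compare the summaries side by side before changing production settings.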
Where to read the official note
The official Gemini 2.5 post covers the technical details, images and migration guides. (deepmind.google)
Final thought
These updates show a practical evolution: more cost and latency options for different usage profiles, with Pro keeping the lead for complex tasks. If you develop with Gemini, try the Flash‑Lite preview to evaluate cost savings, and plan migrations according to deprecation dates to avoid interruptions.