Google launches Gemini 3 Flash: fast, affordable AI | Keryc
Google introduces Gemini 3 Flash, a version of the Gemini 3 family designed to deliver cutting‑edge intelligence at much higher speed and lower cost. The idea is simple: bring Gemini 3's multimodal reasoning to more people and more use cases where latency and price matter as much as accuracy. The global rollout begins today across the Gemini app, Search, the API and enterprise platforms.
What Gemini 3 Flash offers
Gemini 3 Flash keeps the advanced reasoning core of Gemini 3 (complex reasoning, multimodal understanding and agentic capabilities) while optimizing for speed and efficiency. On headline benchmarks it posts competitive scores on demanding tests like GPQA Diamond (90.4%) and MMMU Pro (81.2%), and rivals larger models on PhD‑level tasks.
On top of that, it's built to be frugal with tokens: in typical traffic it uses on average 30% fewer tokens than Gemini 2.5 Pro, which lowers costs. According to Google, it's also 3x faster than 2.5 Pro and offers an optimized quality/cost ratio, which makes it a good fit when you need quick answers without giving up reasoning.
Gemini 3 Flash pushes the boundary between speed, cost and quality.
What is a "token" and why does it matter?
A token is a unit of text the model processes. Fewer tokens means lower cost and faster responses, as long as quality holds. Gemini 3 Flash aims for that balance: think well and fast without overspending.
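To make the token/cost relationship concrete, here is a minimal sketch. The request sizes are made up, and the only real figures are taken from this article: the advertised $0.50 per 1M input tokens rate and the claimed ~30% token reduction versus Gemini 2.5 Pro.

```python
# Illustrative only: how token count drives cost.
# Rate from the article: $0.50 per 1M input tokens.
INPUT_RATE_PER_TOKEN = 0.50 / 1_000_000

def input_cost(tokens: int) -> float:
    """Dollar cost for a given number of input tokens."""
    return tokens * INPUT_RATE_PER_TOKEN

baseline_tokens = 10_000                     # hypothetical request size
flash_tokens = int(baseline_tokens * 0.70)   # ~30% fewer tokens, per the article

print(f"baseline: ${input_cost(baseline_tokens):.6f}")
print(f"flash:    ${input_cost(flash_tokens):.6f}")
```

At the same per-token rate, 30% fewer tokens means 30% lower spend on that portion of traffic, plus faster responses since there is simply less to process.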
For developers: performance that keeps up with fast cycles
If you build things, this will catch your attention. Gemini 3 Flash is aimed at iterative flows: low latency and strong performance on code and agent tasks. In the SWE‑bench Verified test it reaches 78%, outperforming not only the 2.5 series but also Gemini 3 Pro on that benchmark.
That makes it useful for:
Programming assistants and agents that run tasks in real time.
Video analysis, data extraction and complex visual question answering.
Interactive applications (for example, assistants inside games or A/B experiments that require fast responses).
Tools and access for developers include the Gemini API in Google AI Studio, Gemini CLI, Google Antigravity (agentic platform) and support in Android Studio.
For everyone: integrated in the app and in Search AI Mode
Gemini 3 Flash will be the default model in the Gemini app, replacing 2.5 Flash, and is starting to roll out to AI Mode in Search. That means free access for everyday users, with faster multimodal abilities: it can analyze videos and images, summarize content and turn it into actionable steps in seconds.
Want to dictate ideas and have Gemini turn them into simple prototypes or app features without knowing how to code? You’ll be able to. It’s a clear example of AI becoming practical for daily tasks, not just for specialists.
Enterprises and pricing
Large companies can access Gemini 3 Flash via Vertex AI and Gemini Enterprise. Google shows early customers like JetBrains, Bridgewater Associates and Figma already using it to reduce latency and cost without losing reasoning power.
Advertised pricing:
$0.50 per 1M input tokens
$3 per 1M output tokens
Audio input: $1 per 1M input tokens
These numbers make Gemini 3 Flash attractive when speed and cost matter, for example in products with high query frequency.
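A quick way to reason about those rates is a small cost estimator. This is a sketch based only on the pricing advertised above; the function and dictionary names are my own for illustration, and real billing may include other factors (caching, context tiers) not covered here.

```python
# Minimal cost estimator using the advertised per-1M-token rates.
RATES = {
    "text_input": 0.50,   # $ per 1M text input tokens
    "output": 3.00,       # $ per 1M output tokens
    "audio_input": 1.00,  # $ per 1M audio input tokens
}

def estimate_cost(text_input: int = 0, output: int = 0, audio_input: int = 0) -> float:
    """Estimated dollar cost for a workload, given token counts per category."""
    usage = {"text_input": text_input, "output": output, "audio_input": audio_input}
    return sum(RATES[kind] * tokens / 1_000_000 for kind, tokens in usage.items())

# Hypothetical daily workload: 2M text input tokens and 500k output tokens.
daily = estimate_cost(text_input=2_000_000, output=500_000)
print(f"${daily:.2f} per day")  # → $2.50 per day
```

Note that output tokens cost 6x more than text input tokens, so for high-frequency products the size of the model's responses, not the prompts, tends to dominate the bill.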
Brief reflection
It’s not just about speed: it’s about bringing frontier reasoning to situations where latency and budget determine whether an idea can reach production. Gemini 3 Flash targets exactly that: faster answers, lower spend and powerful reasoning. If you’re a developer, product maker or curious user, it’s a good time to explore how AI can fit into real workflows without expecting miracles or complex infrastructure.