Google is expanding the File Search tool in the Gemini API beyond text: it now processes images and text together, lets you attach custom tags, and includes page citations so you can check exactly where each answer came from. The result is RAG systems that are more useful, faster, and easier to verify, in both prototypes and production.
What changes
The main change is that File Search is now multimodal. Besides indexing text, it understands visual data thanks to the Gemini Embedding 2 model, so your app can search inside files by combining natural language descriptions with visual content.
You can also attach custom metadata to each file as key-value pairs. Think tags like department: Legal or status: Final. By filtering on those tags you cut down noise and make searches more precise and faster.
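To make the idea of tag filtering concrete, here is a minimal local sketch in Python. It does not call the Gemini API: the `IndexedFile` class, the `filter_by_metadata` helper, and the file names are all hypothetical and exist only to illustrate how key-value tags can shrink the set of candidate files before any semantic search runs.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedFile:
    """Hypothetical stand-in for a file stored in a search index."""
    name: str
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(files, required):
    """Keep only files whose metadata matches every required key-value pair."""
    return [
        f for f in files
        if all(f.metadata.get(key) == value for key, value in required.items())
    ]


# Illustrative data only; these files are made up for the example.
files = [
    IndexedFile("contract_v3.pdf", {"department": "Legal", "status": "Final"}),
    IndexedFile("contract_v2.pdf", {"department": "Legal", "status": "Draft"}),
    IndexedFile("roadmap.pdf", {"department": "Product", "status": "Final"}),
]

# Narrow the candidate set to final Legal documents.
candidates = filter_by_metadata(files, {"department": "Legal", "status": "Final"})
print([f.name for f in candidates])
```

With three files and two tags, only one file survives the filter, so the more expensive retrieval step would have a third of the material to search.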
To make answers verifiable, File Search now includes page citations. When an answer comes from a large PDF, the system records the page number so you can point the user to the exact source of the information.
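As a rough illustration of what an app might do with page citations, the Python sketch below turns a list of citations into readable source lines. The `Citation` record and the `format_citations` helper are assumptions made for this example, not the shape of the actual API response, and the file names are invented.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    """Hypothetical citation record: a source file plus a page number."""
    file_name: str
    page: int


def format_citations(citations):
    """Group cited pages by file and render one line per source."""
    pages_by_file = {}
    for c in citations:
        # A set drops duplicate pages cited more than once.
        pages_by_file.setdefault(c.file_name, set()).add(c.page)
    return [
        f"{name}, p. {', '.join(str(p) for p in sorted(pages))}"
        for name, pages in sorted(pages_by_file.items())
    ]


# Illustrative data only.
citations = [
    Citation("annual_report.pdf", 42),
    Citation("annual_report.pdf", 7),
    Citation("policy.pdf", 3),
    Citation("annual_report.pdf", 42),
]

for line in format_citations(citations):
    print(line)
# annual_report.pdf, p. 7, 42
# policy.pdf, p. 3
```

Grouping by file and sorting the pages keeps the source list short even when one answer draws on several passages of the same PDF.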
