Do you have stacks of PDFs, reports with charts, scanned contracts, or presentations, and wonder why search systems keep failing on them? The reason is no mystery: many systems index only the text, losing the visual information and the layout. NVIDIA has introduced two small, practical Nemotron models that improve accuracy and reduce latency in multimodal searches over visual documents.
What NVIDIA released and why it matters
NVIDIA has published two models designed for multimodal Retrieval-Augmented Generation (RAG) that work with standard vector databases and are small enough to run on common GPUs:
llama-nemotron-embed-vl-1b-v2: a dense image+text embedding per page (single vector, 2048 dimensions), built for page-level search at millisecond latency.
llama-nemotron-rerank-vl-1b-v2: a cross-encoder reranker that reorders the top-k candidates to improve relevance before the context is passed to a VLM.
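To make the first stage concrete, here is a minimal sketch of single-vector, page-level retrieval. The `embed` helper is a stand-in (it just returns a random unit vector); a real deployment would call llama-nemotron-embed-vl-1b-v2 to produce one 2048-dimensional vector per page from its image and text, and the index would typically live in a vector database rather than a NumPy array.

```python
import numpy as np

DIM = 2048  # embedding size stated for the model
rng = np.random.default_rng(0)

def embed(_page_or_query) -> np.ndarray:
    """Stand-in for the multimodal embedder: returns a unit vector."""
    v = rng.standard_normal(DIM)
    return v / np.linalg.norm(v)

# Index: one dense vector per page (the "single-vector" design).
pages = [f"page-{i}" for i in range(100)]
index = np.stack([embed(p) for p in pages])  # shape (100, 2048)

# Query: on unit vectors, cosine similarity is just a dot product,
# which is why this stage can answer in milliseconds.
q = embed("What were Q3 revenues in the bar chart?")
scores = index @ q
top_k = np.argsort(scores)[::-1][:5]  # candidate pages for the reranker
print([pages[i] for i in top_k])
```

The point of the sketch is the shape of the pipeline: one vector per page, a cheap similarity scan, and a short top-k list handed to the next stage.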
Why does this change practice? Because the multimodal embeddings decide which pages reach the language model, and the reranker decides which pages actually influence the answer. If either step fails, the VLM can confidently invent answers. Combining image+text embeddings with a multimodal reranker reduces those hallucinations without inflating prompts.
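The second stage can be sketched the same way. Here `cross_encoder_score` is a stand-in (simple term overlap); a real pipeline would call llama-nemotron-rerank-vl-1b-v2 on each (query, page) pair, scoring them jointly rather than from precomputed vectors, which is what lets it reorder the shortlist more accurately.

```python
def cross_encoder_score(query: str, page: str) -> float:
    """Stand-in relevance score: fraction of query terms found on the page."""
    q_terms = set(query.lower().split())
    p_terms = set(page.lower().split())
    return len(q_terms & p_terms) / max(len(q_terms), 1)

def rerank(query, candidates, keep=3):
    """Reorder the retriever's top-k; only the survivors reach the VLM."""
    scored = sorted(candidates,
                    key=lambda p: cross_encoder_score(query, p),
                    reverse=True)
    return scored[:keep]

# Illustrative candidate pages (here just their text snippets):
candidates = [
    "annual report cover page",
    "q3 revenue bar chart by region",
    "legal boilerplate and signatures",
    "q3 revenue table with totals",
    "office photos",
]
print(rerank("q3 revenue chart", candidates))
```

The design point is the division of labor: the embedder keeps the scan cheap over the whole corpus, while the (more expensive) reranker is only paid for the handful of candidates that might end up in the prompt.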
