NVIDIA introduces Nemotron ColEmbed V2, a family of late-interaction models designed for search over complex visual documents. If you work with pages that mix text, tables, charts and images, this is for you: the models improve accuracy when retrieving multimodal information in enterprise-style and retrieval-augmented generation (RAG) scenarios.
What is Nemotron ColEmbed V2
Nemotron ColEmbed V2 models are multi-vector (late-interaction) embedding models available in 3B, 4B and 8B parameter sizes. Instead of producing a single vector per document, they produce one embedding for every token in the document. During search, each query token's embedding is compared against all token embeddings in the document using the MaxSim operation: for each query token only its maximum similarity is kept, and these per-token maxima are summed to get the final score.
Why does that matter? Because it enables fine-grained matches: a table cell, the text inside a figure, or a small label can influence the result. These are exactly the details that get diluted when the whole document is reduced to a single vector.
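
To make the MaxSim scoring concrete, here is a minimal sketch in plain Python. It is not NVIDIA's implementation: the function names, the toy 2-dimensional embeddings, and the choice of dot product as the similarity measure are all assumptions made for illustration. Real models would produce much higher-dimensional embeddings and score them with batched tensor operations.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors (assumed similarity measure)."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token, keep the similarity of its
    best-matching document token, then sum those per-token maxima."""
    return sum(
        max(dot(q, d) for d in doc_tokens)
        for q in query_tokens
    )


# Toy embeddings, made up for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]

# doc_a contains one token that matches each query token strongly.
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# doc_b contains only diffuse, partial matches.
doc_b = [[0.5, 0.5], [0.5, 0.5]]

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)

print(f"doc_a: {score_a}, doc_b: {score_b}")
```

In this toy setup `doc_a` outscores `doc_b` because each query token finds a strong individual match, even though `doc_a` also contains an unrelated token. That is the fine-grained behavior described above: a single well-matching token is enough to contribute fully to the score.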
