NVIDIA publishes a recipe to fine-tune embeddings in a day | Keryc
You can turn a general embeddings model into one that truly understands your domain with a single GPU and less than a day of training. Sounds like magic? It isn’t: it’s a practical recipe that combines synthetic data generation, hard-negative mining, contrastive training and optimized deployment.
What this NVIDIA recipe offers
NVIDIA publishes a full pipeline (NeMo Data Designer, NeMo Automodel and Nemotron) that goes from raw documents to a production-ready embeddings service. Key points:
Automatic generation of (question, document) pairs using an LLM to create high-quality synthetic data.
Hard-negative mining to teach the model to distinguish confusing passages.
Support for multi-hop questions (1 to 3 hops) and unrolling for contrastive training.
Export to ONNX/TensorRT and deployment on NVIDIA NIM with an OpenAI-compatible API.
The result? In their tests they saw double-digit improvements in metrics like Recall@10 and nDCG@10. Atlassian applied the recipe to their Jira dataset and took Recall@60 from 0.751 to 0.951 using a single A100 80GB GPU.
Requirements and tools
You need:
NeMo Data Designer for synthetic generation.
NeMo Automodel and Nemotron to train embeddings.
BEIR for information retrieval evaluation.
NeMo Export-Deploy to convert to ONNX/TensorRT.
NVIDIA NIM to serve in production.
A directory with your domain documents (.txt, .md, etc.).
An NVIDIA API key (free at build.nvidia.com).
An NVIDIA Ampere or newer GPU with at least 80 GB VRAM (A100/H100 80GB tested).
If you don’t have everything, the pipeline is modular: you can start at the stage that suits you.
Each stage can take minutes or hours; the whole flow is designed to fit in a day with the right setup.
How synthetic generation (SDG) works
Instead of manually labeling thousands of query-document pairs, the pipeline uses an LLM (for example nvidia/nemotron-3-nano-30b-a3b) to read your documents and generate QA pairs. The process produces different question types: contextual queries, factual questions and multi-hop questions.
Example document chunk:
The TDP of the H100 GPU in SXM form is 700W. The cooling solution must keep junction temperature below 83°C. For dense deployments above 4 GPUs per node, liquid cooling is recommended.
Generated pair examples include simple lookup questions and multi-hop questions that require connecting information across sections. Each pair receives quality scores (relevance, accuracy, contextual support and clarity) and only pairs that pass thresholds enter training.
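The generate-then-filter loop can be sketched roughly like this. The prompt text, the `question_type` labels, and the pluggable `scorer` judge are illustrative stand-ins; the real NeMo Data Designer templates and quality judges differ:

```python
import json

def build_sdg_prompt(chunk: str, n_pairs: int = 3) -> str:
    # Hypothetical prompt; the actual NeMo Data Designer templates differ.
    return (
        "Read the passage and write synthetic retrieval questions as JSON.\n"
        f"Return a JSON list of {n_pairs} objects with keys "
        '"question" and "question_type" '
        '(one of "contextual", "factual", "multi_hop").\n\n'
        f"Passage:\n{chunk}"
    )

def parse_pairs(llm_output: str, chunk: str, min_score: float, scorer) -> list[dict]:
    """Keep only (query, document) pairs whose quality score passes the threshold."""
    pairs = json.loads(llm_output)
    kept = []
    for p in pairs:
        # scorer stands in for the LLM judge (relevance, accuracy, clarity, ...)
        score = scorer(p["question"], chunk)
        if score >= min_score:
            kept.append({"query": p["question"], "pos_doc": chunk})
    return kept
```

In the real pipeline the prompt is sent to the LLM (e.g. via build.nvidia.com) and the judge is itself an LLM call; here both are injected so the filtering logic is visible.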
Hard-negative mining: why it matters
If you train only with positive pairs, the model learns to separate very different passages but won’t learn to reject passages that look relevant but aren’t. Hard-negative mining finds those “almost correct” examples so the model learns fine-grained distinctions.
The automated process:
Embed all queries and passages with the base model.
Compute similarities and mask out the true positives.
Apply a margin filter: remove candidates that are above 95% of the minimum positive score (avoids false negatives).
Select the top-k as hard negatives (default 5 per query).
Why the 95% ceiling? A candidate that scores nearly as high as the true positive may itself be relevant; labeling it a negative would accidentally teach the model to reject relevant passages.
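The four steps above can be sketched in a few lines of NumPy, assuming embeddings are already L2-normalized (function and argument names are mine, not the pipeline's):

```python
import numpy as np

def mine_hard_negatives(q_emb, p_emb, pos_idx, k=5, margin=0.95):
    """Mine hard negatives: similarity, positive masking, margin filter, top-k.

    q_emb: (n_queries, dim), p_emb: (n_passages, dim), both L2-normalized.
    pos_idx: list of lists of true-positive passage indices per query.
    """
    sims = q_emb @ p_emb.T                      # cosine similarity matrix
    negatives = []
    for qi, positives in enumerate(pos_idx):
        row = sims[qi].copy()
        # 95% of the lowest positive score is the ceiling for negatives
        ceiling = margin * min(row[j] for j in positives)
        row[list(positives)] = -np.inf          # mask out the true positives
        row[row > ceiling] = -np.inf            # drop likely false negatives
        top = np.argsort(row)[::-1][:k]         # highest remaining similarities
        negatives.append([int(j) for j in top if row[j] > -np.inf])
    return negatives
```

Note that the candidate sitting just below the positive is exactly what survives: too similar gets filtered, too dissimilar never makes the top-k.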
Multi-hop and unrolling
Multi-hop questions can have multiple positive documents. Unrolling turns a multi-hop question into multiple examples (query, each positive document) so the contrastive loss sees each positive separately. That way the model learns that multiple passages can be relevant to a single composite query.
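Unrolling is mechanically simple; a minimal sketch (the dict keys are my own naming, not the pipeline's schema):

```python
def unroll(example):
    """Turn one multi-hop example with several positive documents into
    one training example per (query, positive) pair, so the contrastive
    loss sees each positive separately."""
    return [
        {"query": example["query"], "pos_doc": doc,
         "neg_docs": example.get("neg_docs", [])}
        for doc in example["pos_docs"]
    ]
```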
Fine-tuning: architecture and recommended parameters
The recipe fine-tunes a biencoder (two encoders: one for queries and one for documents) using a contrastive loss. Useful default parameters:
| Parameter | Default value | Notes |
| --- | --- | --- |
| Epochs | 3 | For large datasets drop to 1 or 2 |
| Learning rate | 1e-5 | Try 5e-6 or 2e-5 if needed |
| Warmup steps | 5 | 5-10% of total steps works well |
| Global batch size | 128 | Scales automatically if your dataset is small |
| Passages per query | 5 | 1 positive + 4 hard negatives |
| Temperature | 0.02 | Low temperature = very sharp distribution |
The aggressive temperature (0.02) works because hard negatives produce strong, precise gradients.
If you have fewer than 2000 examples, the pipeline adapts batch size, checkpoint frequency and validation so training remains stable.
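To make the temperature's role concrete, here is an InfoNCE-style contrastive loss sketched in NumPy (the training recipe uses its own implementation; this just shows the mechanics):

```python
import numpy as np

def contrastive_loss(q_emb, doc_emb, temperature=0.02):
    """InfoNCE-style loss: for each query, document 0 of its group is the
    positive and the rest are (hard) negatives. A low temperature sharpens
    the softmax, which is why mined hard negatives give strong gradients.

    q_emb: (n_queries, dim); doc_emb: (n_queries, passages_per_query, dim).
    """
    logits = np.einsum("qd,qpd->qp", q_emb, doc_emb) / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[:, 0].mean())         # positive is at index 0
```

With temperature 0.02, even a small similarity gap between the positive and a hard negative becomes a huge logit gap, driving the loss toward zero only when the model ranks the positive clearly first.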
Evaluation with BEIR and expected results
The standard evaluation uses BEIR and computes nDCG@k, Recall@k, Precision@k and MAP@k for k = 1, 5, 10, 100. In their tests with the synthetic dataset Retrieval Synthetic NVDocs, the results were:
nDCG:
nDCG@1: 0.55178 -> 0.60796 (+10.2%)
nDCG@5: 0.51894 -> 0.57689 (+11.2%)
nDCG@10: 0.55506 -> 0.61559 (+10.9%)
nDCG@100: 0.60617 -> 0.66567 (+9.8%)
Recall:
Recall@1: 0.28478 -> 0.31547 (+10.8%)
Recall@5: 0.54486 -> 0.60288 (+10.6%)
Recall@10: 0.62979 -> 0.69296 (+10.0%)
Recall@100: 0.81421 -> 0.87020 (+6.9%)
A good fine-tune typically delivers ~15% improvement in nDCG@10 and Recall@10 in under a day, although real numbers depend on corpus quality and dataset size.
Atlassian reported a real case: Recall@60 went from 0.751 to 0.951 (a 26.7% gain) on their Jira dataset with a single A100 80GB.
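If you want to sanity-check these metrics yourself, both are straightforward to compute for binary relevance (BEIR's implementation handles graded relevance and edge cases; this is a minimal sketch):

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG@k: DCG of the ranking over the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(len(relevant), k)))
    return dcg / ideal
```

Recall@k only asks whether relevant documents made the cut; nDCG@k also rewards putting them near the top, which is why both are worth tracking.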
Export, quantization and deployment
For production it’s advisable to export to ONNX and (optionally) build a TensorRT engine for the lowest latency and highest throughput. The pipeline supports:
Export to ONNX (opset 17).
TensorRT compilation with optimization profiles for batch and sequence length.
FP8 quantization to speed things up further.
Packaging into a NIM container that exposes a /v1/embeddings endpoint compatible with OpenAI-style APIs.
There’s also a precision check that compares BEIR metrics on the NIM endpoint and flags deviations beyond a tolerance as alerts.
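Once the NIM container is up, any OpenAI-style client can call it. A stdlib-only sketch of the request and response handling (the URL, port, and model name are assumptions for a local deployment):

```python
import json
from urllib.request import Request, urlopen

NIM_URL = "http://localhost:8000/v1/embeddings"   # assumed local NIM endpoint

def parse_embeddings(body):
    """Pull vectors out of an OpenAI-style embeddings response, in input order.
    body["data"] is a list of {"index": i, "embedding": [...]} objects."""
    return [item["embedding"]
            for item in sorted(body["data"], key=lambda d: d["index"])]

def embed(texts, model="my-finetuned-embedder", url=NIM_URL):
    """POST texts to the /v1/embeddings endpoint; returns one vector per text."""
    payload = {"model": model, "input": texts}
    req = Request(url, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return parse_embeddings(json.load(resp))
```

Because the endpoint is OpenAI-compatible, existing RAG stacks can usually point at it by changing only the base URL and model name.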
Practical tips and common issues
Messy data = mediocre results. Clean and format your documents before SDG.
If you see overfitting, lower epochs or raise the SDG quality threshold.
If you lack data, add more documents or try a stronger LLM for SDG.
Tune learning rate (examples: 5e-6 for large datasets, 2e-5 for very small ones).
Start with 50-100 documents for a quick POC; the pipeline scales well.
Final thought
You don’t need a data center or a huge team to get embeddings that actually understand your domain. With NVIDIA’s recipe you can generate training data without manual labeling, teach your model to distinguish confusing cases and deploy to production in less than a day. Got domain documents and a GPU with enough VRAM? Then you have everything to experiment and improve the relevance of your search or RAG systems.