Fine-tuning Cosmos Predict 2.5 with LoRA/DoRA for Robot Videos

NVIDIA has published a technical guide for adapting Cosmos Predict 2.5 to concrete robotics tasks. It shows how to use LoRA and DoRA to generate synthetic robot trajectories without retraining the whole model. What's the goal? Create physically plausible videos conditioned on text and images, and use them as scalable training data for robot policies.

What NVIDIA announces

Cosmos Predict 2.5 is a large-scale world model that generates physically consistent videos conditioned on text, images, or clips. NVIDIA shows a parameter-efficient fine-tuning pipeline that uses LoRA and DoRA to adapt the model to specific domains (for example, robot manipulation or particular camera views).

The practical novelty: instead of retraining the model's 2B parameters (expensive and prone to forgetting general knowledge), you inject small, portable adapters that let you train on a single powerful GPU and then swap adapters per task.
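
To make the adapter idea concrete, here is a minimal sketch of the standard LoRA and DoRA weight formulas in plain Python. It is not NVIDIA's implementation: the matrix sizes, the rank, the `alpha` scale, and all helper names (`lora_weight`, `dora_weight`, `adapter_param_count`) are assumptions chosen for illustration, and the per-column norm follows the formulation in the DoRA paper rather than any particular library.

```python
# Illustrative sketch of LoRA and DoRA weight updates in plain Python.
# All sizes and values are hypothetical; this is not NVIDIA's implementation.
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w0, b, a, alpha, rank):
    """LoRA: W = W0 + (alpha / rank) * B @ A, with W0 frozen and only A, B trained."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(w0, delta)]


def column_norms(w):
    """Euclidean norm of each column of w."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*w)]


def dora_weight(w0, b, a, alpha, rank, magnitude):
    """DoRA: rescale each column of the LoRA-updated weight to a learned magnitude."""
    v = lora_weight(w0, b, a, alpha, rank)
    norms = column_norms(v)
    return [[m * x / n for x, n, m in zip(row, norms, magnitude)] for row in v]


def adapter_param_count(d_out, d_in, rank):
    """Trainable parameters in one LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Tiny frozen weight (2 x 3) with a rank-1 adapter.
w0 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b_zero = [[0.0], [0.0]]          # B starts at zero, so the adapter is a no-op at first
a = [[0.5, -0.5, 1.0]]
b_trained = [[1.0], [2.0]]       # made-up values standing in for a trained adapter

w_init = lora_weight(w0, b_zero, a, alpha=2.0, rank=1)
w_tuned = lora_weight(w0, b_trained, a, alpha=2.0, rank=1)
w_dora_init = dora_weight(w0, b_zero, a, 2.0, 1, magnitude=column_norms(w0))

# Parameter budget for one hypothetical 2048 x 2048 projection at rank 16.
full = 2048 * 2048
adapter = adapter_param_count(2048, 2048, 16)
print(f"full: {full:,}  adapter: {adapter:,}  share: {adapter / full:.2%}")
```

Because `w0` is never modified, swapping adapters per task amounts to keeping one copy of the base weights and loading a different `(A, B)` pair (plus the magnitude vector for DoRA) for each domain.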

Why this is useful for robotics

Collecting real trajectories is slow and costly. What if you could generate thousands of synthetic trajectories that are physically plausible and specific to your camera or robot setup? That speeds up policy iteration.
