GR00T N1.7 is NVIDIA's new open base model, released under a commercial license, for humanoid robots. It's a 3-billion-parameter Vision-Language-Action (VLA) model that translates images and natural-language instructions into continuous robot actions, with a focus on multi-step tasks and finger-level dexterous manipulation. What does that mean in practice? That you can take it to the production floor, an assembly bench, or a lab and expect more reliable behavior in complex workflows.
What is GR00T N1.7 and why it matters
- Open-source model with a commercial license, available on Hugging Face and GitHub.
- It was pretrained with the EgoScale collection: 20,854 hours of human egocentric video, which greatly expands manipulation data compared to prior versions.
- Result: better out-of-the-box dexterity and less need for massive teleoperation to teach robot behaviors.
The intuitive idea? Humans and robots share an interaction geometry: two hands, first-person view, and objects to manipulate. Training on sensorized human video gives manipulation priors that scale without having to demonstrate everything on every physical robot.
Architecture: Action Cascade — two systems that complement each other
GR00T uses an architecture called Action Cascade, which separates high-level reasoning and fine motor control into two systems:
- System 2 (Vision-Language Model): a Cosmos-Reason2-2B backbone that processes image tokens and the natural-language instruction. This is where tasks are decomposed and multi-step reasoning happens. Think of this system as the planner.
- System 1 (Diffusion Transformer): a 32-layer DiT that takes the VLM output plus the robot's proprioceptive state and runs a denoising process to generate continuous motor commands in real time. This is the fine executor, responsible for precision in multi-DoF movements.
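The two-system cascade can be sketched in a few lines of toy code. This is not the real GR00T internals, just a minimal illustration of the data flow: a planner fuses vision and language into a context vector, and an iterative denoising loop turns noise into a continuous action conditioned on that context. All function names and numbers here are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def vlm_planner(image_tokens, instruction):
    """Toy stand-in for System 2: fuse vision + language into a context vector."""
    return np.tanh(image_tokens.mean(axis=0) + len(instruction) * 0.01)

def denoise_step(action, context, t, num_steps):
    """Toy stand-in for one DiT denoising step: pull the noisy action
    toward a context-conditioned target as t counts down to 1."""
    target = 0.5 * context  # pretend this is the clean motor command
    alpha = t / num_steps
    return alpha * action + (1.0 - alpha) * target

def generate_action(context, action_dim=22, num_steps=4):
    """System 1: start from noise, iteratively denoise into a continuous action."""
    action = rng.normal(size=action_dim)
    for t in range(num_steps, 0, -1):
        action = denoise_step(action, context, t, num_steps)
    return action

image_tokens = rng.normal(size=(16, 22))  # fake image token embeddings
context = vlm_planner(image_tokens, "pick up the screw")
action = generate_action(context)
print(action.shape)  # (22,)
```

The point of the structure: the expensive reasoning pass runs once per observation, while the cheap denoising loop turns its output into a concrete action vector.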
Inputs: RGB frames (any resolution) + language instruction + proprioceptive state (joint positions, velocities, end-effector poses).
Outputs: continuous-valued action vectors mapped to the robot's degrees of freedom.
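Concretely, the I/O contract looks something like the following. The field names and shapes below are illustrative assumptions, not the exact GR00T observation schema (the repo's embodiment configs define the real one):

```python
import numpy as np

# Hypothetical observation layout: RGB frames, a language instruction,
# and proprioceptive state. Field names are illustrative only.
obs = {
    "video": np.zeros((2, 480, 640, 3), dtype=np.uint8),  # two camera views
    "language": "insert the peg into the hole",
    "state": {
        "joint_positions": np.zeros(22, dtype=np.float32),
        "joint_velocities": np.zeros(22, dtype=np.float32),
        "eef_pose": np.zeros(7, dtype=np.float32),  # xyz + quaternion
    },
}

# The policy returns a continuous-valued action vector (or a short chunk
# of them) mapped onto the robot's degrees of freedom.
action = np.zeros(22, dtype=np.float32)
```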
Data and the first law of dexterity scaling
The core work behind N1.7 is EgoScale: training on 20k+ hours of human egocentric video across 20+ task categories. The key contribution is the first law of dexterity scaling for robots: more human egocentric data predictably improves fine-manipulation ability.
- Moving from 1k to 20k hours more than doubles the average task completion rate on the evaluated benchmarks.
- This lets 22-DoF hands perform rich-contact tasks like small-part assembly or handling fragile objects.
In short: feeding the model lots of sensorized human video provides motor priors that previously required massive teleoperation on robots.
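A scaling law of this kind is typically a log-linear fit of performance against data volume. The numbers below are invented for illustration (they are not NVIDIA's benchmark results); the sketch only shows the shape of the fit:

```python
import numpy as np

# Hypothetical data points: average task completion rate vs. hours of
# egocentric pretraining video. Values are illustrative, not measured.
hours   = np.array([1_000, 5_000, 10_000, 20_000], dtype=float)
success = np.array([0.25, 0.38, 0.46, 0.55])

# Fit success ≈ a * log10(hours) + b, the usual shape of a data scaling law.
a, b = np.polyfit(np.log10(hours), success, deg=1)

# Extrapolate to 40k hours under the fitted trend.
predicted_40k = a * np.log10(40_000) + b
print(round(predicted_40k, 3))
```

If the fitted slope `a` is positive and stable across benchmarks, adding egocentric hours buys a predictable improvement, which is exactly the claim behind the "first law of dexterity scaling."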
Capabilities and real-hardware validation
GR00T N1.7 was validated on loco-manipulation, tabletop manipulation, and bimanual dexterous tasks on platforms like Unitree G1, Bimanual Manipulator YAM, and AGIBot Genie 1.
- Improved reasoning about subtasks and multi-step execution.
- Finger-level manipulation for rich-contact tasks.
- Supports inference with few denoising steps for reasonable latency in control loops.
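Why few-step denoising matters for control: each extra denoising step adds to per-action latency. A back-of-envelope budget, using purely illustrative timings (not measured GR00T figures):

```python
# Assumed timings for the sketch only.
vlm_ms = 40.0       # one System 2 (VLM) forward pass
dit_step_ms = 5.0   # one DiT denoising step
steps = 4           # few-step inference

inference_ms = vlm_ms + steps * dit_step_ms
control_hz = 1000.0 / inference_ms
print(f"{inference_ms:.0f} ms per action chunk -> {control_hz:.1f} Hz")
# prints "60 ms per action chunk -> 16.7 Hz"
```

Emitting a chunk of actions per inference call amortizes that cost further, which is how diffusion policies generally stay inside a real-time control loop.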
How to try it and adapt it to your robot
You can install and run a policy server from the official repo. A minimal flow:
```bash
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
bash scripts/deployment/dgpu/install_deps.sh
source .venv/bin/activate
uv run python gr00t/eval/run_gr00t_server.py \
    --embodiment-tag GR1 \
    --model-path nvidia/GR00T-N1.7
```
Example query from your environment loop in Python:
```python
from gr00t.policy.server_client import PolicyClient

# Connect to the policy server started above.
policy = PolicyClient(host="localhost", port=5555)

# `env` is your Gymnasium-style environment, created elsewhere.
obs, info = env.reset()
action, info = policy.get_action(obs)
obs, reward, done, truncated, info = env.step(action)
```
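The one-step query extends naturally to a full episode. A sketch of that loop, with stub env and policy classes standing in for your environment and the repo's `PolicyClient` so it runs without hardware:

```python
class StubEnv:
    """Minimal Gymnasium-style env for the sketch: 10 steps, then done."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return {"state": [0.0]}, {}
    def step(self, action):
        self.t += 1
        done = self.t >= 10
        return {"state": [float(self.t)]}, 1.0, done, False, {}

class StubPolicy:
    """Stands in for PolicyClient; always commands a zero action."""
    def get_action(self, obs):
        return [0.0], {}

def run_episode(env, policy, max_steps=500):
    """Roll out one episode, querying the policy each step."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action, info = policy.get_action(obs)
        obs, reward, done, truncated, info = env.step(action)
        total_reward += reward
        if done or truncated:
            break
    return total_reward

print(run_episode(StubEnv(), StubPolicy()))  # 10.0
```

Swap the stubs for your real environment and a `PolicyClient` instance and the loop is the same.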
For fine-tuning on your own platform use the LeRobot format and the provided script. Example launch:
```bash
CUDA_VISIBLE_DEVICES=0 uv run python gr00t/experiment/launch_finetune.py \
    --base-model-path nvidia/GR00T-N1.7 \
    --dataset-path <YOUR_DATASET_PATH> \
    --embodiment-tag <YOUR_EMBODIMENT> \
    --modality-config-path <YOUR_MODALITY_CONFIG> \
    --num-gpus 1 \
    --output-dir <OUTPUT_PATH> \
    --max-steps 2000 \
    --global-batch-size 32
```
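Before launching a fine-tune, it's worth sanity-checking that your episodes match the embodiment you declared. A hypothetical pre-flight check (the `actions` field name and layout are illustrative, not the exact LeRobot schema):

```python
import numpy as np

EMBODIMENT_DOF = 22  # assumed DoF of your embodiment tag

# Fake episodes standing in for a loaded dataset.
episodes = [
    {"actions": np.zeros((120, 22))},  # 120 steps, 22-dim actions
    {"actions": np.zeros((95, 22))},
]

def check_dataset(episodes, dof):
    """Verify every episode is non-empty and its action dim matches the DoF."""
    for i, ep in enumerate(episodes):
        steps, action_dim = ep["actions"].shape
        assert action_dim == dof, f"episode {i}: action dim {action_dim} != {dof}"
        assert steps > 0, f"episode {i} is empty"
    return len(episodes)

print(check_dataset(episodes, EMBODIMENT_DOF))  # 2
```

Catching a dimension mismatch here is much cheaper than discovering it mid-training.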
Upgrading from N1.6 is a direct swap: change --model-path to nvidia/GR00T-N1.7 and your embodiment configs work as before, with gains from the new VLM backbone and EgoScale pretraining.
License, support and hardware
- GR00T N1.7 has a commercial license, allowing production deployments.
- Supported on NVIDIA Ampere, Hopper, Ada Lovelace, Blackwell and Jetson platforms.
- Repository and model:
- Model on Hugging Face: nvidia/GR00T-N1.7
- Code and docs: github.com/NVIDIA/Isaac-GR00T
- Developer portal: developer.nvidia.com/isaac/gr00t
If you build something with GR00T N1.7, NVIDIA invites you to share it with the community.
This release isn't just a model tweak. It's a scaling shift in how we transfer human skills to robots: more human egocentric data, a clean split between planning and execution, and tools ready for real-world production. Ready to bring dexterous manipulation to your robot?
