GR00T N1.7 is NVIDIA's new open base model, released under a commercial license, for humanoid robots. It's a 3-billion-parameter Vision-Language-Action (VLA) model that translates images and natural-language instructions into continuous robot actions, with a focus on multi-step tasks and finger-level dexterous manipulation. What does that mean in practice? That you can take it to the production floor, an assembly bench, or a lab and expect more reliable behavior in complex workflows.
What is GR00T N1.7 and why it matters
- Open-source model with a commercial license, available on Hugging Face and GitHub.
- It was pretrained with the EgoScale collection: 20,854 hours of human egocentric video, which greatly expands manipulation data compared to prior versions.
- Result: better out-of-the-box dexterity and less need for massive teleoperation to teach robot behaviors.
The intuitive idea? Humans and robots share an interaction geometry: two hands, first-person view, and objects to manipulate. Training on sensorized human video gives manipulation priors that scale without having to demonstrate everything on every physical robot.
Architecture: Action Cascade — two systems that complement each other
GR00T uses an architecture called Action Cascade, which separates high-level reasoning and fine motor control into two systems:
- System 2 (Vision-Language Model): a Cosmos-Reason2-2B backbone that processes image tokens and the natural-language instruction. This is where tasks are decomposed and multi-step reasoning happens. Think of this system as the planner.
- System 1 (Diffusion Transformer): a 32-layer DiT that takes the VLM output plus the robot's proprioceptive state and runs a denoising process to generate continuous motor commands in real time. This is the fine executor, responsible for precision in multi-DoF movements.
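The two-system cascade can be sketched in a few lines of toy code. This is not the real GR00T internals, just a minimal illustration of the data flow: a planner fuses vision and language into a context vector, and an iterative denoising loop turns noise into a continuous action conditioned on that context. All function names and numbers here are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def vlm_planner(image_tokens, instruction):
    """Toy stand-in for System 2: fuse vision + language into a context vector."""
    return np.tanh(image_tokens.mean(axis=0) + len(instruction) * 0.01)

def denoise_step(action, context, t, num_steps):
    """Toy stand-in for one DiT denoising step: pull the noisy action
    toward a context-conditioned target as t counts down to 1."""
    target = 0.5 * context  # pretend this is the clean motor command
    alpha = t / num_steps
    return alpha * action + (1.0 - alpha) * target

def generate_action(context, action_dim=22, num_steps=4):
    """System 1: start from noise, iteratively denoise into a continuous action."""
    action = rng.normal(size=action_dim)
    for t in range(num_steps, 0, -1):
        action = denoise_step(action, context, t, num_steps)
    return action

image_tokens = rng.normal(size=(16, 22))  # fake image token embeddings
context = vlm_planner(image_tokens, "pick up the screw")
action = generate_action(context)
print(action.shape)  # (22,)
```

The point of the structure: the expensive reasoning pass runs once per observation, while the cheap denoising loop turns its output into a concrete action vector.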
Inputs: RGB frames (any resolution) + language instruction + proprioceptive state (joint positions, velocities, end-effector poses).
Outputs: continuous-valued action vectors mapped to the robot's degrees of freedom.
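Concretely, the I/O contract looks something like the following. The field names and shapes below are illustrative assumptions, not the exact GR00T observation schema (the repo's embodiment configs define the real one):

```python
import numpy as np

# Hypothetical observation layout: RGB frames, a language instruction,
# and proprioceptive state. Field names are illustrative only.
obs = {
    "video": np.zeros((2, 480, 640, 3), dtype=np.uint8),  # two camera views
    "language": "insert the peg into the hole",
    "state": {
        "joint_positions": np.zeros(22, dtype=np.float32),
        "joint_velocities": np.zeros(22, dtype=np.float32),
        "eef_pose": np.zeros(7, dtype=np.float32),  # xyz + quaternion
    },
}

# The policy returns a continuous-valued action vector (or a short chunk
# of them) mapped onto the robot's degrees of freedom.
action = np.zeros(22, dtype=np.float32)
```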
Data and the first law of dexterity scaling
The core work behind N1.7 is EgoScale: training on 20k+ hours of human egocentric video across 20+ task categories. The key contribution is the first law of dexterity scaling for robots: more human egocentric data predictably improves fine-manipulation ability.
- Moving from 1k to 20k hours more than doubles the average task completion rate on the evaluated benchmarks.
- This lets 22-DoF hands perform rich-contact tasks like small-part assembly or handling fragile objects.
In short: feeding the model lots of sensorized human video provides motor priors that previously required massive teleoperation on robots.
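A scaling law of this kind is typically a log-linear fit of performance against data volume. The numbers below are invented for illustration (they are not NVIDIA's benchmark results); the sketch only shows the shape of the fit:

```python
import numpy as np

# Hypothetical data points: average task completion rate vs. hours of
# egocentric pretraining video. Values are illustrative, not measured.
hours   = np.array([1_000, 5_000, 10_000, 20_000], dtype=float)
success = np.array([0.25, 0.38, 0.46, 0.55])

# Fit success ≈ a * log10(hours) + b, the usual shape of a data scaling law.
a, b = np.polyfit(np.log10(hours), success, deg=1)

# Extrapolate to 40k hours under the fitted trend.
predicted_40k = a * np.log10(40_000) + b
print(round(predicted_40k, 3))
```

If the fitted slope `a` is positive and stable across benchmarks, adding egocentric hours buys a predictable improvement, which is exactly the claim behind the "first law of dexterity scaling."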
Capabilities and real-hardware validation
GR00T N1.7 was validated on loco-manipulation, tabletop manipulation, and bimanual dexterous tasks on platforms like Unitree G1, Bimanual Manipulator YAM, and AGIBot Genie 1.
- Improved reasoning about subtasks and multi-step execution.
- Finger-level manipulation for rich-contact tasks.
- Supports inference with few denoising steps for reasonable latency in control loops.
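Why few-step denoising matters for control: each extra denoising step adds to per-action latency. A back-of-envelope budget, using purely illustrative timings (not measured GR00T figures):

```python
# Assumed timings for the sketch only.
vlm_ms = 40.0       # one System 2 (VLM) forward pass
dit_step_ms = 5.0   # one DiT denoising step
steps = 4           # few-step inference

inference_ms = vlm_ms + steps * dit_step_ms
control_hz = 1000.0 / inference_ms
print(f"{inference_ms:.0f} ms per action chunk -> {control_hz:.1f} Hz")
# prints "60 ms per action chunk -> 16.7 Hz"
```

Emitting a chunk of actions per inference call amortizes that cost further, which is how diffusion policies generally stay inside a real-time control loop.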
How to try it and adapt it to your robot
You can install and run a policy server from the official repo. A minimal flow:
```bash
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
bash scripts/deployment/dgpu/install_deps.sh
source .venv/bin/activate
uv run python gr00t/eval/run_gr00t_server.py \
    --embodiment-tag GR1 \
    --model-path nvidia/GR00T-N1.7
```
Example query from your environment loop in Python:
```python
from gr00t.policy.server_client import PolicyClient

# Connect to the policy server started above.
policy = PolicyClient(host="localhost", port=5555)

# `env` is your Gymnasium-style environment, created elsewhere.
obs, info = env.reset()
action, info = policy.get_action(obs)
obs, reward, done, truncated, info = env.step(action)
```
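The one-step query extends naturally to a full episode. A sketch of that loop, with stub env and policy classes standing in for your environment and the repo's `PolicyClient` so it runs without hardware:

```python
class StubEnv:
    """Minimal Gymnasium-style env for the sketch: 10 steps, then done."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return {"state": [0.0]}, {}
    def step(self, action):
        self.t += 1
        done = self.t >= 10
        return {"state": [float(self.t)]}, 1.0, done, False, {}

class StubPolicy:
    """Stands in for PolicyClient; always commands a zero action."""
    def get_action(self, obs):
        return [0.0], {}

def run_episode(env, policy, max_steps=500):
    """Roll out one episode, querying the policy each step."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action, info = policy.get_action(obs)
        obs, reward, done, truncated, info = env.step(action)
        total_reward += reward
        if done or truncated:
            break
    return total_reward

print(run_episode(StubEnv(), StubPolicy()))  # 10.0
```

Swap the stubs for your real environment and a `PolicyClient` instance and the loop is the same.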
For fine-tuning on your own platform use the LeRobot format and the provided script. Example launch:
```bash
CUDA_VISIBLE_DEVICES=0 uv run python gr00t/experiment/launch_finetune.py \
    --base-model-path nvidia/GR00T-N1.7 \
    --dataset-path <YOUR_DATASET_PATH> \
    --embodiment-tag <YOUR_EMBODIMENT> \
    --modality-config-path <YOUR_MODALITY_CONFIG> \
    --num-gpus 1 \
    --output-dir <OUTPUT_PATH> \
    --max-steps 2000 \
    --global-batch-size 32
```
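Before launching a fine-tune, it's worth sanity-checking that your episodes match the embodiment you declared. A hypothetical pre-flight check (the `actions` field name and layout are illustrative, not the exact LeRobot schema):

```python
import numpy as np

EMBODIMENT_DOF = 22  # assumed DoF of your embodiment tag

# Fake episodes standing in for a loaded dataset.
episodes = [
    {"actions": np.zeros((120, 22))},  # 120 steps, 22-dim actions
    {"actions": np.zeros((95, 22))},
]

def check_dataset(episodes, dof):
    """Verify every episode is non-empty and its action dim matches the DoF."""
    for i, ep in enumerate(episodes):
        steps, action_dim = ep["actions"].shape
        assert action_dim == dof, f"episode {i}: action dim {action_dim} != {dof}"
        assert steps > 0, f"episode {i} is empty"
    return len(episodes)

print(check_dataset(episodes, EMBODIMENT_DOF))  # 2
```

Catching a dimension mismatch here is much cheaper than discovering it mid-training.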
Upgrading from N1.6 is a direct swap: change --model-path to nvidia/GR00T-N1.7 and your embodiment configs work as before, with gains from the new VLM backbone and EgoScale pretraining.
License, support and hardware
- GR00T N1.7 has a commercial license, allowing production deployments.
- Supported on NVIDIA Ampere, Hopper, Ada Lovelace, Blackwell and Jetson platforms.
- Repository and model:
- Model on Hugging Face: nvidia/GR00T-N1.7
- Code and docs: github.com/NVIDIA/Isaac-GR00T
- Developer portal: developer.nvidia.com/isaac/gr00t
If you build something with GR00T N1.7, NVIDIA invites you to share it with the community.
This release isn't just a model tweak. It's a scaling shift in how we transfer human skills to robots: more human egocentric data, a clean split between planning and execution, and tools ready for real-world production. Ready to bring dexterous manipulation to your robot?
