GR00T N1.7: VLA AI model for humanoid robots | Keryc
GR00T N1.7 is NVIDIA's new open foundation model for humanoid robots, released under a commercial license. It's a 3-billion-parameter Vision-Language-Action (VLA) model designed to translate images and natural language into continuous robot actions, with a focus on multi-step tasks and finger-level dexterous manipulation. What does that mean in practice? You can take it to the production floor, an assembly bench, or a lab and expect more reliable behavior in complex workflows.
What is GR00T N1.7 and why it matters
Open-source model with a commercial license, available on Hugging Face and GitHub.
It was pretrained on the EgoScale collection: 20,854 hours of human egocentric video, a much larger pool of manipulation data than prior versions used.
Result: better out-of-the-box dexterity and less need for massive teleoperation to teach robot behaviors.
The intuitive idea? Humans and robots share an interaction geometry: two hands, first-person view, and objects to manipulate. Training on sensorized human video gives manipulation priors that scale without having to demonstrate everything on every physical robot.
Architecture: Action Cascade — two systems that complement each other
GR00T uses an architecture called Action Cascade, which separates high-level reasoning and fine motor control into two systems:
System 2 - Vision-Language Model (VLM): a Cosmos-Reason2-2B backbone that processes image tokens and the natural-language instruction. This is where tasks are decomposed and multi-step reasoning happens. Think of this system as the planner.
System 1 - Diffusion Transformer: a DiT with 32 layers that takes the VLM output plus the robot's proprioceptive state and applies a denoising process to generate continuous motor commands in real time. This is the fine executor, responsible for accuracy in multi-DoF movements.
Inputs: RGB frames (any resolution) + language instruction + proprioceptive state (joint positions, velocities, end-effector poses).
Outputs: continuous-valued action vectors mapped to the robot's degrees of freedom.
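The split between the two systems can be pictured with a toy denoising loop. The sketch below is not GR00T code: `toy_target_action` is a hypothetical stand-in for everything the VLM and state conditioning contribute, and the "denoiser" uses an ideal, hand-written velocity instead of a learned DiT. It only shows the shape of the computation: start from noise, refine over a few steps, end with a continuous action vector.

```python
import random


def toy_target_action(instruction: str, proprio: list[float]) -> list[float]:
    """Hypothetical stand-in for System 2 plus state conditioning.

    Maps the instruction and joint state to the action the toy denoiser
    should converge to. A real VLA learns this mapping; here it is a fixed
    per-instruction offset added to the current joint positions.
    """
    offset_rng = random.Random(sum(ord(c) for c in instruction))
    return [p + 0.1 * offset_rng.uniform(-1.0, 1.0) for p in proprio]


def denoise_action(target: list[float], num_steps: int, seed: int = 0) -> list[float]:
    """Toy System 1: refine a noise vector into an action in num_steps steps."""
    noise_rng = random.Random(seed)
    x = [noise_rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        # Ideal velocity toward the target; a learned network would predict this.
        velocity = [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]
        x = [xi + dt * v for xi, v in zip(x, velocity)]
    return x


proprio_state = [0.0, 0.5, -0.25, 1.0]  # illustrative joint positions (radians)
target = toy_target_action("pick up the cup", proprio_state)
action = denoise_action(target, num_steps=4)
```

Because the toy velocity is exact, four steps are enough to land on the target. A learned denoiser only approximates that velocity, so the step count becomes a trade-off between accuracy and latency.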
Data and the first law of dexterity scaling
The core work behind N1.7 is EgoScale: training on 20k+ hours of human egocentric video across 20+ task categories. The key contribution is the first law of dexterity scaling for robots: more human egocentric data predictably improves fine-manipulation ability.
Moving from 1k to 20k hours more than doubles the average task completion rate on the evaluated benchmarks.
This lets 22-DoF hands perform contact-rich tasks such as small-part assembly or handling fragile objects.
In short: feeding the model lots of sensorized human video provides motor priors that previously required massive teleoperation on robots.
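To make "predictably improves" concrete, one common way to check a scaling trend is to fit completion rate against the logarithm of data hours. The log-linear form and every number below are assumptions for illustration: the data points are synthetic, generated inside the snippet, and are not EgoScale results.

```python
import math


def fit_log_linear(hours, rates):
    """Least-squares fit of rate = intercept + slope * log10(hours)."""
    xs = [math.log10(h) for h in hours]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(rates) / len(rates)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rates))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic placeholder data, NOT measured results.
synthetic_hours = [1_000, 2_000, 5_000, 10_000, 20_000]
synthetic_rates = [0.2 + 0.2 * math.log10(h / 1_000) for h in synthetic_hours]

slope, intercept = fit_log_linear(synthetic_hours, synthetic_rates)


def predict(h):
    """Completion rate predicted by the fitted line at h hours of data."""
    return intercept + slope * math.log10(h)


ratio_20k_vs_1k = predict(20_000) / predict(1_000)
```

With a fit like this in hand, the value of collecting more human video can be estimated before paying for it, which is the practical appeal of a scaling law.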
Capabilities and real-hardware validation
GR00T N1.7 was validated on loco-manipulation, tabletop manipulation, and bimanual dexterous tasks on platforms like Unitree G1, Bimanual Manipulator YAM, and AGIBot Genie 1.
Improved reasoning about subtasks and multi-step execution.
Finger-level manipulation for contact-rich tasks.
Supports inference with few denoising steps for reasonable latency in control loops.
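Why the step count matters for control loops comes down to simple arithmetic: every denoising step spends part of the time available per control tick. The helper below is a back-of-the-envelope sketch. The control rate, per-step latency, and fixed overhead are hypothetical placeholders to be replaced with measurements from your own hardware.

```python
def max_denoising_steps(control_hz: float, per_step_ms: float, overhead_ms: float) -> int:
    """Largest number of denoising steps that fits in one control period.

    overhead_ms covers everything that is not a denoising step
    (for example image encoding and network transfer).
    """
    budget_ms = 1000.0 / control_hz
    remaining_ms = budget_ms - overhead_ms
    if remaining_ms <= 0:
        return 0
    return int(remaining_ms // per_step_ms)


# Hypothetical numbers: 10 Hz loop, 12 ms per step, 40 ms fixed overhead.
steps_that_fit = max_denoising_steps(control_hz=10.0, per_step_ms=12.0, overhead_ms=40.0)
```

With these placeholder values the period is 100 ms, 60 ms remain after overhead, and five 12 ms steps fit.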
How to try it and adapt it to your robot
You can install and run a policy server from the official repo. A minimal flow:
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
bash scripts/deployment/dgpu/install_deps.sh
source .venv/bin/activate
uv run python gr00t/eval/run_gr00t_server.py \
  --embodiment-tag GR1 \
  --model-path nvidia/GR00T-N1.7
Example query from your environment loop in Python:
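from gr00t.policy.server_client import PolicyClient
# Connect to the running policy server
policy = PolicyClient(host="localhost", port=5555)
# env is your own Gymnasium-style environment; it is not created in this snippet
obs, info = env.reset()
# Ask the server for an action, then apply it to the environment
action, info = policy.get_action(obs)
obs, reward, done, truncated, info = env.step(action)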
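The query above handles a single step. In a real environment loop the same two calls repeat until the episode ends. The sketch below runs without GR00T installed: `StubPolicy` and `StubEnv` are hypothetical stand-ins that copy only the call shapes used above, and the `joint_positions` observation key is invented for the example. The observation format the real server expects is defined by the repo and your embodiment config.

```python
class StubPolicy:
    """Hypothetical stand-in exposing the same get_action(obs) call shape."""

    def get_action(self, obs):
        # Hold the current joint positions; a real policy would return model output.
        return list(obs["joint_positions"]), {"source": "stub"}


class StubEnv:
    """Hypothetical environment with a Gymnasium-style reset/step interface."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"joint_positions": [0.0, 0.0]}, {}

    def step(self, action):
        self.t += 1
        obs = {"joint_positions": list(action)}
        truncated = self.t >= self.horizon  # time limit reached
        return obs, 0.0, False, truncated, {}


def run_episode(env, policy, max_steps=100):
    """Query the policy every step until the episode ends or max_steps is hit."""
    obs, info = env.reset()
    steps = 0
    for _ in range(max_steps):
        action, info = policy.get_action(obs)
        obs, reward, done, truncated, info = env.step(action)
        steps += 1
        if done or truncated:
            break
    return steps


steps_taken = run_episode(StubEnv(horizon=5), StubPolicy())
```

Since `run_episode` only relies on `get_action`, `reset`, and `step`, swapping the stubs for a `PolicyClient` and your own environment should leave the loop unchanged.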
For fine-tuning on your own platform, use the LeRobot data format and the fine-tuning script provided in the official repo.
Upgrading from N1.6: it's a drop-in swap. Change --model-path to nvidia/GR00T-N1.7 and your embodiment configs should work unchanged, with improvements from the new VLM backbone and EgoScale pretraining.
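If the server is launched from a script, the swap amounts to replacing one argument. The helper below is a small illustrative sketch: `swap_model_path` is a hypothetical name, and the old checkpoint is a placeholder string rather than a real model ID. The flags are the ones shown in the launch command earlier.

```python
def swap_model_path(argv, new_model_path):
    """Return a copy of argv with the value after --model-path replaced."""
    updated = list(argv)
    idx = updated.index("--model-path")  # raises ValueError if the flag is absent
    if idx + 1 >= len(updated):
        raise ValueError("--model-path has no value")
    updated[idx + 1] = new_model_path
    return updated


old_cmd = [
    "uv", "run", "python", "gr00t/eval/run_gr00t_server.py",
    "--embodiment-tag", "GR1",
    "--model-path", "<your-N1.6-checkpoint>",  # placeholder, not a real model ID
]
new_cmd = swap_model_path(old_cmd, "nvidia/GR00T-N1.7")
```

The resulting list can be passed to `subprocess.run(new_cmd)` from the repo root; everything except the model path stays as it was.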
License, support and hardware
GR00T N1.7 has a commercial license, allowing production deployments.
Supported on NVIDIA Ampere, Hopper, Lovelace, Blackwell and Jetson platforms.
If you build something with GR00T N1.7, NVIDIA invites you to share it with the community.
This release isn't just a model tweak. It marks a shift in how human skills are transferred to robots: more human egocentric data, a clear split between planning and execution, and tools ready for real-world production. Ready to bring dexterous manipulation to your robot?