Healthcare robotics can no longer be just vision and labels. Open-H-Embodiment presents the first large open dataset designed for robots that act, not just perceive: it spans multiple robot embodiments, synchronized vision-force-kinematic data, sim-to-real pairs, and cross-embodiment benchmarks.
Why does this matter to you? Because operating on soft tissue, suturing, or handling an ultrasound requires closed-loop control, contact dynamics, and long-horizon reasoning — not just image segmentation.
What Open-H-Embodiment is and what it contains
Open-H-Embodiment is a community initiative with 35 organizations (including Johns Hopkins, Technical University of Munich, NVIDIA, Stanford, and several hospitals and surgical companies) that gathered data to train and evaluate models for physical autonomy in surgery and ultrasound.
Volume: 778 hours of data under CC-BY-4.0 license.
Coverage: simulation, bench exercises (for example suturing), and real clinical procedures.
Robots: combines commercial platforms (CMR Surgical, Rob Surgical, Tuodao) and research platforms (dVRK, Franka, Kuka).
Goal: create data that spans multiple embodiments, contact dynamics, and closed-loop traces to enable sim-to-real transfer and shared benchmarks.
Data useful for closed-loop control and policy learning: the point is no longer just seeing a scene, but learning to interact with tissue and tools inside real control loops.
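The closed-loop idea can be made concrete with a minimal sketch: observe, decide, apply a safety check, command. This is an illustrative toy, not the project's API; the `Policy` class, the 6-DoF delta convention, and the 5 N force threshold are all assumptions for the example.

```python
import numpy as np

class Policy:
    """Placeholder policy: maps observations to end-effector deltas.
    A real learned policy (e.g. a VLA model) would run inference here."""
    def act(self, image, force, joint_state):
        return np.zeros(6)  # [dx, dy, dz, droll, dpitch, dyaw]

def control_step(policy, image, force, joint_state, gain=0.5):
    """One closed-loop iteration: observe, decide, check contact, command."""
    delta = policy.act(image, force, joint_state)
    # Scale the commanded motion down when measured contact force
    # exceeds an illustrative safety threshold (in newtons).
    if np.linalg.norm(force) > 5.0:
        delta = gain * delta
    return delta
```

The force check is the part image segmentation alone cannot give you: the loop reacts to contact, not just appearance.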
Models and technical advances
The release includes two open-source models post-trained on the dataset: GR00T-H and Cosmos-H-Surgical-Simulator. Both are technical projects aiming to close the gap between simulation and reality.
GR00T-H: policy for surgical tasks
GR00T-H stems from the Isaac GR00T N family of Vision-Language-Action (VLA) models. It was trained with roughly 600 hours of the dataset and uses Cosmos Reason 2 2B as its VLM backbone.
Key designs to handle diverse embodiments and specialized hardware:
Unique Embodiment Projectors: each robot has a trainable MLP that maps its kinematics into a shared normalized action space.
State Dropout (100%): the proprioceptive input is always dropped, so the model cannot lean on a robot-specific state prior, which improves real-world robustness.
Relative EEF Actions: actions expressed as relative end-effector motions to work around kinematic inconsistencies between platforms.
Metadata in prompts: instrument names and control-index mappings are injected into the VLM prompt to contextualize the task.
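The projector idea from the list above can be sketched as one small trainable MLP per robot that maps that platform's kinematic state into the shared 44-dimensional action space the release describes. The layer sizes, activations, and robot input dimensions here are assumptions for illustration, not the released architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class EmbodimentProjector:
    """Per-robot two-layer MLP: robot-specific kinematics (in_dim varies
    per platform) -> shared normalized action space (44-dim, per the release)."""
    def __init__(self, in_dim, shared_dim=44, hidden=64):
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, shared_dim))

    def __call__(self, x):
        h = np.tanh(x @ self.w1)
        return np.tanh(h @ self.w2)  # bounded shared-space actions

# One projector per embodiment; the shared policy only ever sees 44-dim vectors.
# Input dimensions below are hypothetical examples.
projectors = {"dvrk": EmbodimentProjector(14), "franka": EmbodimentProjector(7)}
shared_action = projectors["franka"](np.zeros(7))
```

The design point is that only the projectors are embodiment-specific; everything downstream of the 44-dim interface is reused across platforms.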
A prototype managed to execute a full suture on the SutureBot benchmark — a clear example of long-horizon skill.
Cosmos-H-Surgical-Simulator: WFM as a physical simulator
Cosmos-H-Surgical-Simulator is a World Foundation Model (WFM) fine-tuned from Cosmos Predict 2.5 2B. Its goal is to generate physically plausible surgical video conditioned on kinematic actions.
Sim-to-real: it learns tissue deformation, tool interaction, and complex phenomena (reflections, blood, smoke) directly from data.
Efficiency: 600 rollouts in 40 minutes with the model, versus roughly 2 days for the equivalent real bench experiments.
Use: generation of synthetic video-action pairs to balance and augment underrepresented data.
Training technique: fine-tuned on 9 embodiments and 32 datasets from Open-H using 64 A100 GPUs for about 10,000 GPU-hours. The unified action space has 44 dimensions.
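The figures above imply some useful back-of-envelope numbers, worked out below from the values stated in the release (the arithmetic, not the figures, is mine).

```python
# Wall-clock estimate for the stated training budget.
gpu_hours = 10_000
num_gpus = 64
wall_clock_days = gpu_hours / num_gpus / 24  # continuous training time

# Rollout throughput: simulator vs. real bench experiments.
rollouts = 600
sim_minutes = 40
bench_days = 2
speedup = (bench_days * 24 * 60) / sim_minutes  # bench time / sim time

print(round(wall_clock_days, 2), speedup)  # ~6.5 days; 72x faster
```

In other words, the stated budget corresponds to roughly a week of continuous training on the 64-GPU cluster, and the simulator delivers rollouts about 72 times faster than the bench.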
Why this changes the technical conversation
If you work in research or product, what can you actually do with this?
Cross-embodiment benchmarks make it easier to compare policies across robots without rebuilding the entire pipeline.
Embodiment projectors let you reuse a single policy on multiple platforms with light adaptations.
WFMs give you a practical path to generate physically realistic synthetic data that speeds up training and validation iterations.
Are there still challenges? Absolutely. Moving from perceptual control to reasoning-based autonomy will require intention data: task traces annotated with failures and outcomes. That is the roadmap for version 2: long procedures, explanations, and adaptive plans.
How to get started and participate
The effort is community-driven and open. If you want to reproduce experiments, try GR00T-H, generate rollouts with Cosmos-H, or contribute annotated data, there are public repositories and resources tied to the project.
Visit the project repository to clone, download the dataset, and review training and evaluation scripts. The invitation is explicit: add data, annotations, and benchmarks to build a verifiable and reusable Physical AI base for healthcare.
Medical robotics is entering a new phase: it is no longer just about seeing and predicting, but about acting precisely, explaining, and adapting. If you work in surgical robotics, medical imaging, or physical simulation, this is a practical technical entry point for rethinking how we train autonomy in fragile, high-stakes environments.