AI Engineering · 9 min read

How to Get Started with Open-H, GR00T-H, and Cosmos-H for Healthcare Robotics Research

Learn how to use NVIDIA's new Open-H dataset and GR00T-H and Cosmos-H models to build and evaluate healthcare robotics systems.

NVIDIA and collaborators have released a usable open stack for healthcare robotics research: Open-H-Embodiment, GR00T-H, and Cosmos-H-Surgical-Simulator. You can use it to standardize data collection, study cross-embodiment policy learning, and build simulation workflows for surgical robotics. The official announcement and overview is the best starting point, and the rest of this guide focuses on how to work with the release in practice.

What each release component is for

The March 16, 2026 release includes three separate pieces, and they fit together cleanly.

| Component | What it is | Primary use |
|---|---|---|
| Open-H-Embodiment | Open healthcare robotics dataset | Training and evaluation data |
| GR00T-H | 3B-parameter healthcare robot policy model | Policy learning and imitation learning research |
| Cosmos-H-Surgical-Simulator | Action-conditioned surgical world model | Simulation, synthetic rollouts, and policy evaluation |

According to the release, Open-H-Embodiment contains 778 hours of CC-BY-4.0 robotics training data across simulation, benchtop exercises, and real clinical procedures, spanning surgical robotics, ultrasound, and colonoscopy autonomy, with contributions from 35 organizations. That makes it useful both as a dataset and as a shared data standard.

If your team is already working on embodied AI pipelines, the main shift here is domain specificity. This is not a general robotics corpus adapted to healthcare. It is a healthcare robotics stack with dataset formatting, model training, and simulation all aligned around surgical and imaging workflows. That domain grounding matters in the same way domain grounding matters for text systems, which is also the core idea behind Fine-Tuning vs RAG: When to Use Each Approach.

Installation and setup

The Open-H contribution guide specifies the expected dataset tooling:

  • LeRobot dataset format v2.1
  • LeRobot package v0.3.3
  • Suggested sampling rate of at least 20 Hz
  • Suggested image resolution of at least 480p

The release positions the Hugging Face dataset page as the primary location for Open-H-Embodiment v1, and the GitHub repository documents the data format and contribution workflow.

The documentation reviewed here does not provide a verified install command, so use the package and version requirements above when setting up your environment, and refer to the official repository for implementation details.
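One practical way to apply those version and quality requirements is a pre-flight check before converting a dataset. The sketch below is illustrative, not official tooling: the thresholds come from the contribution guide, but the `check_open_h_requirements` helper is a hypothetical function of my own.

```python
# Hypothetical pre-flight check against the Open-H contribution
# requirements. Thresholds come from the contribution guide; the helper
# itself is an illustrative sketch, not part of any official tooling.

MIN_SAMPLING_HZ = 20       # suggested sampling rate
MIN_IMAGE_HEIGHT = 480     # suggested resolution (480p)
EXPECTED_FORMAT = "v2.1"   # expected LeRobot dataset format

def check_open_h_requirements(sampling_hz: float,
                              image_height_px: int,
                              lerobot_format: str) -> list[str]:
    """Return a list of human-readable problems (empty means compliant)."""
    problems = []
    if sampling_hz < MIN_SAMPLING_HZ:
        problems.append(f"sampling rate {sampling_hz} Hz is below {MIN_SAMPLING_HZ} Hz")
    if image_height_px < MIN_IMAGE_HEIGHT:
        problems.append(f"image height {image_height_px}px is below {MIN_IMAGE_HEIGHT}p")
    if lerobot_format != EXPECTED_FORMAT:
        problems.append(f"dataset format {lerobot_format!r} is not {EXPECTED_FORMAT!r}")
    return problems

print(check_open_h_requirements(30, 720, "v2.1"))  # []
print(check_open_h_requirements(15, 400, "v2.0"))  # three problems
```

Running a check like this once per collection pipeline is cheap insurance against discovering a format mismatch after hours of recording.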

Start with the dataset, not the models

The practical entry point is Open-H-Embodiment v1. It gives you the schema, naming conventions, and metadata expectations that the released models were built around.

The contribution guide standardizes these core fields:

  • action
  • observation.state
  • observation.images.xxx

It also recommends healthcare-specific metadata, including:

  • surgical tool identity
  • ultrasound parameters

That standardization is the real unlock. If you want to compare policies across robot platforms, or eventually contribute data back to the ecosystem, matching the published format matters more than trying to optimize custom schemas up front. The same principle shows up in prompt and context-heavy AI systems, where consistency usually beats cleverness, which is why Context Engineering: The Most Important AI Skill in 2026 is relevant even outside text-only applications.
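Matching the published format can be enforced mechanically. The sketch below validates a single frame dictionary against the core field names listed above; the required field names come from the contribution guide, while the validator function and the example image key are hypothetical.

```python
# Illustrative validator for the Open-H core field conventions. The
# required field names come from the contribution guide; the validator
# itself is a hypothetical sketch.

REQUIRED_FIELDS = {"action", "observation.state"}
IMAGE_PREFIX = "observation.images."  # e.g. observation.images.endoscope

def validate_frame(frame: dict) -> tuple[bool, list[str]]:
    """Check one frame dict against the core Open-H field names."""
    warnings = []
    missing = REQUIRED_FIELDS - frame.keys()
    if missing:
        warnings.append(f"missing required fields: {sorted(missing)}")
    if not any(key.startswith(IMAGE_PREFIX) for key in frame):
        warnings.append(f"no image streams found under {IMAGE_PREFIX}*")
    return (len(warnings) == 0, warnings)

ok, issues = validate_frame({
    "action": [0.0] * 7,
    "observation.state": [0.0] * 7,
    "observation.images.endoscope": "frame_0001.png",  # hypothetical camera name
})
print(ok)  # True
```

The same pattern extends naturally to the recommended healthcare metadata (surgical tool identity, ultrasound parameters) as soft warnings rather than hard failures.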

How to structure your data for Open-H

Open-H is explicitly designed for cross-embodiment training and evaluation. That means your data should preserve both common structure and robot-specific context.

Based on the Open-H guide, your collection pipeline should capture:

| Field category | Required or recommended details |
|---|---|
| Actions | Standardized action field |
| Robot state | observation.state |
| Images | observation.images.xxx |
| Sampling | ≥20 Hz suggested |
| Image quality | ≥480p suggested |
| Metadata | Embodiment info, surgical tool identity, ultrasound parameters where relevant |

The release spans commercial systems such as CMR Surgical, Rob Surgical, and Tuodao, plus research systems like dVRK, Franka, and Kuka. If your lab uses one of those platforms, aligning your collection format with Open-H should be straightforward.

If your robot is outside that set, the same schema is still useful. The dataset is presented as a multi-institution standardization effort, not a closed benchmark tied to a single embodiment.

Using GR00T-H for policy research

GR00T-H is the policy model in the stack. The model card describes it as:

  • 3B parameters
  • Based on GR00T N1.6
  • Trained on a 601.50-hour subset of Open-H
  • Built from 58 datasets
  • Trained with a 98% / 2% train-validation split
  • Covers 7 robotic embodiments

The seven listed embodiments are:

  • CMR Versius
  • dVRK
  • dVRK-Si
  • UR5
  • Rob Surgical Bitrack
  • Tuodao MA2000
  • KUKA

Architecturally, GR00T-H combines:

  • vision and text transformers for observations and instructions
  • embodiment-indexed MLPs for proprioception and action handling
  • a flow-matching transformer for action generation

That combination makes the model most useful for researchers studying shared policy learning across hardware platforms. If your current system uses per-robot imitation policies, GR00T-H is a concrete example of a cross-embodiment approach.
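The embodiment-indexed design can be pictured as a lookup from embodiment name to a robot-specific projector that maps heterogeneous state vectors into a shared latent space. The toy sketch below uses plain-Python linear maps with made-up dimensions; the real model's projector architecture and sizes are not public details assumed here.

```python
# Toy illustration of embodiment-indexed projection: each robot gets its
# own projector mapping its native state dimension into a shared latent
# size. All dimensions and weights are invented for illustration.

import random

SHARED_DIM = 4  # hypothetical shared latent size

def make_projector(state_dim: int, shared_dim: int = SHARED_DIM):
    """Return a random linear map (weight matrix captured in a closure)."""
    weights = [[random.uniform(-1, 1) for _ in range(state_dim)]
               for _ in range(shared_dim)]
    def project(state):
        return [sum(w * s for w, s in zip(row, state)) for row in weights]
    return project

# One projector per embodiment, indexed by name (dims are hypothetical).
projectors = {
    "dVRK": make_projector(state_dim=14),
    "UR5": make_projector(state_dim=6),
}

latent = projectors["UR5"]([0.1] * 6)
print(len(latent))  # 4
```

The point of the pattern is that downstream components only ever see `SHARED_DIM`-sized vectors, regardless of which robot produced the state.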

There is one important documentation caveat. The blog states GR00T-H uses a common relative end-effector action space, while the model card says the released checkpoint standardizes to absolute end-effector positioning. For implementation decisions, treat the model card as the stronger release-specific source.

The model card also states the intended scope clearly: R&D only, not clinical deployment and not medical decision-making.

What to pay attention to in GR00T-H’s design

Several release details are useful if you want to reproduce the same training assumptions in your own experiments.

The announcement highlights these choices:

  • unique embodiment projectors
  • 100% state dropout during inference
  • relative end-effector action training
  • injection of instrument/control metadata into prompts

Even without a published implementation snippet, these choices tell you how the training pipeline was shaped.

The embodiment projectors suggest that robot-specific adaptation was handled explicitly rather than hidden inside one universal state encoder. The metadata injection means prompts are carrying control context, not just natural language task descriptions. That is close in spirit to structured conditioning in LLM systems, where consistent inputs improve reliability, similar to what developers already apply with Structured Output from LLMs: JSON Mode Explained.
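Metadata injection of this kind can be sketched as simple prompt templating. The template and metadata keys below are my own illustrative assumptions, not the released prompt format.

```python
# Sketch of injecting instrument/control metadata into a task prompt, in
# the spirit of the conditioning described above. The template and the
# metadata keys are illustrative assumptions, not the released format.

def build_prompt(task: str, metadata: dict) -> str:
    """Prefix a task instruction with sorted key=value metadata."""
    meta_str = "; ".join(f"{k}={v}" for k, v in sorted(metadata.items()))
    return f"[{meta_str}] {task}"

prompt = build_prompt(
    "transfer the peg to the left post",
    {"tool": "large needle driver", "embodiment": "dVRK"},
)
print(prompt)
# [embodiment=dVRK; tool=large needle driver] transfer the peg to the left post
```

Sorting the keys keeps the conditioning string deterministic, which matters for reproducibility in the same way consistent context ordering matters for text-only LLM systems.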

Using Cosmos-H-Surgical-Simulator for rollout generation

Cosmos-H-Surgical-Simulator is the simulation component. It was fine-tuned from NVIDIA Cosmos Predict 2.5 2B, and the model card specifies:

  • Based on Cosmos-Predict2.5-2B-Video2World
  • Accepts a 44-dimensional action vector
  • That vector is split as 22 dimensions per arm
  • Conditions on the current frame
  • Predicts the next 12 frames
  • Supports autoregressive rollout of full trajectories
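The rollout mechanics above can be sketched as a simple loop: the world model conditions on the current frame plus a 44-dimensional action and predicts the next 12 frames, and the last predicted frame seeds the next step. The `world_model` stub below is a placeholder, not the real Cosmos-H API.

```python
# Sketch of the autoregressive rollout pattern described above. The
# world_model function is a stand-in stub, not the Cosmos-H interface.

FRAMES_PER_STEP = 12
ACTION_DIM = 44  # 22 dimensions per arm

def world_model(frame, action):
    """Stub: returns FRAMES_PER_STEP placeholder frames."""
    assert len(action) == ACTION_DIM
    return [f"{frame}+{i}" for i in range(1, FRAMES_PER_STEP + 1)]

def rollout(initial_frame, actions):
    """Autoregressively unroll the world model over a list of actions."""
    frames, current = [], initial_frame
    for action in actions:
        chunk = world_model(current, action)
        frames.extend(chunk)
        current = chunk[-1]  # last predicted frame conditions the next step
    return frames

traj = rollout("f0", [[0.0] * ACTION_DIM] * 3)
print(len(traj))  # 36 frames from 3 rollout steps
```

The same loop structure applies whether you are generating evaluation rollouts or synthetic training trajectories; only the policy supplying the actions changes.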

The release article says it can generate physically plausible surgical video directly from kinematic actions, and reports 40 minutes for 600 rollouts compared with 2 days using real benchtop methods, roughly a 70x speedup (2 days is 2,880 minutes).

The training setup is also unusually specific:

  • trained on 9 robot embodiments
  • trained from 32 datasets in Open-H
  • used 64 A100 GPUs
  • for roughly 10,000 GPU-hours
  • with a unified 44D action space

Those numbers matter because they set expectations. This is an open research model, but it was not trained on commodity infrastructure.

Where Cosmos-H fits in a workflow

The simulator is best used as a world-model layer for evaluation and synthetic trajectory generation.

The model card names these supported procedure and task areas:

| Platform or context | Example procedures or tasks |
|---|---|
| CMR Surgical Versius clinical procedures | cholecystectomy, prostatectomy, inguinal hernia, hysterectomy |
| dVRK and MITIC | suturing, tissue manipulation, peg transfer |

A practical workflow looks like this:

  1. Collect or normalize trajectories in Open-H / LeRobot v2.1 format.
  2. Train or adapt policies against that standardized action and observation structure.
  3. Use Cosmos-H-Surgical-Simulator to generate action-conditioned rollouts for evaluation or synthetic augmentation.
  4. Benchmark long-horizon dexterity on known tasks such as suturing.

The release links that last point to SutureBot, a NeurIPS 2025 autonomous suturing benchmark with 1,890 demonstrations and evaluation on dVRK Si. If your work focuses on imitation quality, action sequencing, or insertion-point accuracy, a benchmarked task like that is a better first target than an open-ended “general surgical autonomy” claim.

Tradeoffs and limitations

This stack is useful immediately for research, but the constraints are important.

| Limitation | Why it matters |
|---|---|
| R&D only | GR00T-H is not for clinical deployment or medical decision-making |
| Early release | Adoption is still early, so tooling and community examples are limited |
| Data format matters | Open-H expects LeRobot v2.1 and associated field conventions |
| Compute requirements are high | Cosmos-H training used 64 A100s and ~10,000 GPU-hours |
| Documentation inconsistency | GR00T-H action space details differ between the announcement and model card |

The dataset is also specialized. That is its strength, but it means you should not expect the breadth of large general robotics corpora. The release itself notes that Open-H’s 778 hours are smaller than some broad robot pretraining datasets, while being much more clinically relevant.

When to use this stack

Use Open-H + GR00T-H + Cosmos-H if your work sits in one of these categories:

  • healthcare robot imitation learning
  • cross-embodiment policy transfer
  • surgical simulation and synthetic data generation
  • policy evaluation for long-horizon manipulation
  • data standardization across institutions or labs

If your project is still at the stage of defining data interfaces and evaluation targets, prioritize Open-H first. If your project already has trajectories and an evaluation loop, the simulator may give you faster iteration on rollout testing. If you are building higher-level orchestration around these systems, the same planning concerns show up in agentic software stacks, which is where Multi-Agent Systems Explained: When One Agent Isn’t Enough becomes useful.

Start by aligning one existing dataset or collection pipeline to LeRobot v2.1 with the Open-H field conventions, then compare your current policy training setup against the released GR00T-H scope and action-space assumptions. After that, add Cosmos-H-Surgical-Simulator for offline rollouts on a narrow task such as suturing or peg transfer, and measure whether the simulated evaluations reduce benchtop iteration time enough to justify integrating it into your research loop.

Get Insanely Good at AI

The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.