AI Engineering · 9 min read

How to Get Started with Open-H, GR00T-H, and Cosmos-H for Healthcare Robotics Research

Learn how to use NVIDIA's new Open-H dataset and GR00T-H and Cosmos-H models to build and evaluate healthcare robotics systems.

NVIDIA and collaborators have released a usable open stack for healthcare robotics research: Open-H-Embodiment, GR00T-H, and Cosmos-H-Surgical-Simulator. You can use it to standardize data collection, study cross-embodiment policy learning, and build simulation workflows for surgical robotics. The official announcement and overview is the best starting point, and the rest of this guide focuses on how to work with the release in practice.

What each release component is for

The March 16, 2026 release includes three separate pieces, and they fit together cleanly.

| Component | What it is | Primary use |
|---|---|---|
| Open-H-Embodiment | Open healthcare robotics dataset | Training and evaluation data |
| GR00T-H | 3B-parameter healthcare robot policy model | Policy learning and imitation learning research |
| Cosmos-H-Surgical-Simulator | Action-conditioned surgical world model | Simulation, synthetic rollouts, and policy evaluation |

According to the release, Open-H-Embodiment contains 778 hours of CC-BY-4.0 robotics training data across simulation, benchtop exercises, and real clinical procedures, spanning surgical robotics, ultrasound, and colonoscopy autonomy, with contributions from 35 organizations. That makes it useful both as a dataset and as a shared data standard.

If your team is already working on embodied AI pipelines, the main shift here is domain specificity. This is not a general robotics corpus adapted to healthcare. It is a healthcare robotics stack with dataset formatting, model training, and simulation all aligned around surgical and imaging workflows. That domain grounding matters in the same way domain grounding matters for text systems, which is also the core idea behind Fine-Tuning vs RAG: When to Use Each Approach.

Installation and setup

The Open-H contribution guide specifies the expected dataset tooling:

  • LeRobot dataset format v2.1
  • LeRobot package v0.3.3
  • Suggested sampling rate of at least 20 Hz
  • Suggested image resolution of at least 480p

The release positions the Hugging Face dataset page as the primary location for Open-H-Embodiment v1, and the GitHub repository documents the data format and contribution workflow.

The documentation reviewed here does not provide a verified install command, so use the package and version requirements above when setting up your environment, and refer to the official repository for implementation details.
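One practical way to apply those version and quality requirements is a pre-flight check before converting a dataset. The sketch below is illustrative, not official tooling: the thresholds come from the contribution guide, but the `check_open_h_requirements` helper is a hypothetical function of my own.

```python
# Hypothetical pre-flight check against the Open-H contribution
# requirements. Thresholds come from the contribution guide; the helper
# itself is an illustrative sketch, not part of any official tooling.

MIN_SAMPLING_HZ = 20       # suggested sampling rate
MIN_IMAGE_HEIGHT = 480     # suggested resolution (480p)
EXPECTED_FORMAT = "v2.1"   # expected LeRobot dataset format

def check_open_h_requirements(sampling_hz: float,
                              image_height_px: int,
                              lerobot_format: str) -> list[str]:
    """Return a list of human-readable problems (empty means compliant)."""
    problems = []
    if sampling_hz < MIN_SAMPLING_HZ:
        problems.append(f"sampling rate {sampling_hz} Hz is below {MIN_SAMPLING_HZ} Hz")
    if image_height_px < MIN_IMAGE_HEIGHT:
        problems.append(f"image height {image_height_px}px is below {MIN_IMAGE_HEIGHT}p")
    if lerobot_format != EXPECTED_FORMAT:
        problems.append(f"dataset format {lerobot_format!r} is not {EXPECTED_FORMAT!r}")
    return problems

print(check_open_h_requirements(30, 720, "v2.1"))  # []
print(check_open_h_requirements(15, 400, "v2.0"))  # three problems
```

Running a check like this once per collection pipeline is cheap insurance against discovering a format mismatch after hours of recording.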

Start with the dataset, not the models

The practical entry point is Open-H-Embodiment v1. It gives you the schema, naming conventions, and metadata expectations that the released models were built around.

The contribution guide standardizes these core fields:

  • action
  • observation.state
  • observation.images.xxx

It also recommends healthcare-specific metadata, including:

  • surgical tool identity
  • ultrasound parameters

That standardization is the real unlock. If you want to compare policies across robot platforms, or eventually contribute data back to the ecosystem, matching the published format matters more than trying to optimize custom schemas up front. The same principle shows up in prompt and context-heavy AI systems, where consistency usually beats cleverness, which is why Context Engineering: The Most Important AI Skill in 2026 is relevant even outside text-only applications.
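Matching the published format can be enforced mechanically. The sketch below validates a single frame dictionary against the core field names listed above; the required field names come from the contribution guide, while the validator function and the example image key are hypothetical.

```python
# Illustrative validator for the Open-H core field conventions. The
# required field names come from the contribution guide; the validator
# itself is a hypothetical sketch.

REQUIRED_FIELDS = {"action", "observation.state"}
IMAGE_PREFIX = "observation.images."  # e.g. observation.images.endoscope

def validate_frame(frame: dict) -> tuple[bool, list[str]]:
    """Check one frame dict against the core Open-H field names."""
    warnings = []
    missing = REQUIRED_FIELDS - frame.keys()
    if missing:
        warnings.append(f"missing required fields: {sorted(missing)}")
    if not any(key.startswith(IMAGE_PREFIX) for key in frame):
        warnings.append(f"no image streams found under {IMAGE_PREFIX}*")
    return (len(warnings) == 0, warnings)

ok, issues = validate_frame({
    "action": [0.0] * 7,
    "observation.state": [0.0] * 7,
    "observation.images.endoscope": "frame_0001.png",  # hypothetical camera name
})
print(ok)  # True
```

The same pattern extends naturally to the recommended healthcare metadata (surgical tool identity, ultrasound parameters) as soft warnings rather than hard failures.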

How to structure your data for Open-H

Open-H is explicitly designed for cross-embodiment training and evaluation. That means your data should preserve both common structure and robot-specific context.

Based on the Open-H guide, your collection pipeline should capture:

| Field category | Required or recommended details |
|---|---|
| Actions | Standardized action field |
| Robot state | observation.state |
| Images | observation.images.xxx |
| Sampling | ≥20 Hz suggested |
| Image quality | ≥480p suggested |
| Metadata | Embodiment info, surgical tool identity, ultrasound parameters where relevant |

The release spans commercial systems such as CMR Surgical, Rob Surgical, and Tuodao, plus research systems like dVRK, Franka, and Kuka. If your lab uses one of those platforms, aligning your collection format with Open-H should be straightforward.

If your robot is outside that set, the same schema is still useful. The dataset is presented as a multi-institution standardization effort, not a closed benchmark tied to a single embodiment.

Using GR00T-H for policy research

GR00T-H is the policy model in the stack. The model card describes it as:

  • 3B parameters
  • Based on GR00T N1.6
  • Trained on a 601.50-hour subset of Open-H
  • Built from 58 datasets
  • Trained with a 98% / 2% train-validation split
  • Covers 7 robotic embodiments

The seven listed embodiments are:

  • CMR Versius
  • dVRK
  • dVRK-Si
  • UR5
  • Rob Surgical Bitrack
  • Tuodao MA2000
  • KUKA

Architecturally, GR00T-H combines:

  • vision and text transformers for observations and instructions
  • embodiment-indexed MLPs for proprioception and action handling
  • a flow-matching transformer for action generation

That combination makes the model most useful for researchers studying shared policy learning across hardware platforms. If your current system uses per-robot imitation policies, GR00T-H is a concrete example of a cross-embodiment approach.
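The embodiment-indexed design can be pictured as a lookup from embodiment name to a robot-specific projector that maps heterogeneous state vectors into a shared latent space. The toy sketch below uses plain-Python linear maps with made-up dimensions; the real model's projector architecture and sizes are not public details assumed here.

```python
# Toy illustration of embodiment-indexed projection: each robot gets its
# own projector mapping its native state dimension into a shared latent
# size. All dimensions and weights are invented for illustration.

import random

SHARED_DIM = 4  # hypothetical shared latent size

def make_projector(state_dim: int, shared_dim: int = SHARED_DIM):
    """Return a random linear map (weight matrix captured in a closure)."""
    weights = [[random.uniform(-1, 1) for _ in range(state_dim)]
               for _ in range(shared_dim)]
    def project(state):
        return [sum(w * s for w, s in zip(row, state)) for row in weights]
    return project

# One projector per embodiment, indexed by name (dims are hypothetical).
projectors = {
    "dVRK": make_projector(state_dim=14),
    "UR5": make_projector(state_dim=6),
}

latent = projectors["UR5"]([0.1] * 6)
print(len(latent))  # 4
```

The point of the pattern is that downstream components only ever see `SHARED_DIM`-sized vectors, regardless of which robot produced the state.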

There is one important documentation caveat. The blog states GR00T-H uses a common relative end-effector action space, while the model card says the released checkpoint standardizes to absolute end-effector positioning. For implementation decisions, treat the model card as the stronger release-specific source.

The model card also states the intended scope clearly: R&D only, not clinical deployment and not medical decision-making.

What to pay attention to in GR00T-H’s design

Several release details are useful if you want to reproduce the same training assumptions in your own experiments.

The announcement highlights these choices:

  • unique embodiment projectors
  • 100% state dropout during inference
  • relative end-effector action training
  • injection of instrument/control metadata into prompts

Even without a published implementation snippet, these choices tell you how the training pipeline was shaped.

The embodiment projectors suggest that robot-specific adaptation was handled explicitly rather than hidden inside one universal state encoder. The metadata injection means prompts are carrying control context, not just natural language task descriptions. That is close in spirit to structured conditioning in LLM systems, where consistent inputs improve reliability, similar to what developers already apply with Structured Output from LLMs: JSON Mode Explained.
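Metadata injection of this kind can be sketched as simple prompt templating. The template and metadata keys below are my own illustrative assumptions, not the released prompt format.

```python
# Sketch of injecting instrument/control metadata into a task prompt, in
# the spirit of the conditioning described above. The template and the
# metadata keys are illustrative assumptions, not the released format.

def build_prompt(task: str, metadata: dict) -> str:
    """Prefix a task instruction with sorted key=value metadata."""
    meta_str = "; ".join(f"{k}={v}" for k, v in sorted(metadata.items()))
    return f"[{meta_str}] {task}"

prompt = build_prompt(
    "transfer the peg to the left post",
    {"tool": "large needle driver", "embodiment": "dVRK"},
)
print(prompt)
# [embodiment=dVRK; tool=large needle driver] transfer the peg to the left post
```

Sorting the keys keeps the conditioning string deterministic, which matters for reproducibility in the same way consistent context ordering matters for text-only LLM systems.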

Using Cosmos-H-Surgical-Simulator for rollout generation

Cosmos-H-Surgical-Simulator is the simulation component. It was fine-tuned from NVIDIA Cosmos Predict 2.5 2B, and the model card specifies:

  • Based on Cosmos-Predict2.5-2B-Video2World
  • Accepts a 44-dimensional action vector
  • That vector is split as 22 dimensions per arm
  • Conditions on the current frame
  • Predicts the next 12 frames
  • Supports autoregressive rollout of full trajectories
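The rollout mechanics above can be sketched as a simple loop: the world model conditions on the current frame plus a 44-dimensional action and predicts the next 12 frames, and the last predicted frame seeds the next step. The `world_model` stub below is a placeholder, not the real Cosmos-H API.

```python
# Sketch of the autoregressive rollout pattern described above. The
# world_model function is a stand-in stub, not the Cosmos-H interface.

FRAMES_PER_STEP = 12
ACTION_DIM = 44  # 22 dimensions per arm

def world_model(frame, action):
    """Stub: returns FRAMES_PER_STEP placeholder frames."""
    assert len(action) == ACTION_DIM
    return [f"{frame}+{i}" for i in range(1, FRAMES_PER_STEP + 1)]

def rollout(initial_frame, actions):
    """Autoregressively unroll the world model over a list of actions."""
    frames, current = [], initial_frame
    for action in actions:
        chunk = world_model(current, action)
        frames.extend(chunk)
        current = chunk[-1]  # last predicted frame conditions the next step
    return frames

traj = rollout("f0", [[0.0] * ACTION_DIM] * 3)
print(len(traj))  # 36 frames from 3 rollout steps
```

The same loop structure applies whether you are generating evaluation rollouts or synthetic training trajectories; only the policy supplying the actions changes.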

The release article says it can generate physically plausible surgical video directly from kinematic actions, and reports 40 minutes for 600 rollouts compared with 2 days using real benchtop methods, roughly a 70x speedup (2 days is 2,880 minutes).

The training setup is also unusually specific:

  • trained on 9 robot embodiments
  • trained from 32 datasets in Open-H
  • used 64 A100 GPUs
  • for roughly 10,000 GPU-hours
  • with a unified 44D action space

Those numbers matter because they set expectations. This is an open research model, but it was not trained on commodity infrastructure.

Where Cosmos-H fits in a workflow

The simulator is best used as a world-model layer for evaluation and synthetic trajectory generation.

The model card names these supported procedure and task areas:

| Platform or context | Example procedures or tasks |
|---|---|
| CMR Surgical Versius clinical procedures | cholecystectomy, prostatectomy, inguinal hernia, hysterectomy |
| dVRK and MITIC | suturing, tissue manipulation, peg transfer |

A practical workflow looks like this:

  1. Collect or normalize trajectories in Open-H / LeRobot v2.1 format.
  2. Train or adapt policies against that standardized action and observation structure.
  3. Use Cosmos-H-Surgical-Simulator to generate action-conditioned rollouts for evaluation or synthetic augmentation.
  4. Benchmark long-horizon dexterity on known tasks such as suturing.

The release links that last point to SutureBot, a NeurIPS 2025 autonomous suturing benchmark with 1,890 demonstrations and evaluation on dVRK Si. If your work focuses on imitation quality, action sequencing, or insertion-point accuracy, a benchmarked task like that is a better first target than an open-ended “general surgical autonomy” claim.

Tradeoffs and limitations

This stack is useful immediately for research, but the constraints are important.

| Limitation | Why it matters |
|---|---|
| R&D only | GR00T-H is not for clinical deployment or medical decision-making |
| Early release | Adoption is still early, so tooling and community examples are limited |
| Data format matters | Open-H expects LeRobot v2.1 and associated field conventions |
| Compute requirements are high | Cosmos-H training used 64 A100s and ~10,000 GPU-hours |
| Documentation inconsistency | GR00T-H action space details differ between the announcement and model card |

The dataset is also specialized. That is its strength, but it means you should not expect the breadth of large general robotics corpora. The release itself notes that Open-H’s 778 hours are smaller than some broad robot pretraining datasets, while being much more clinically relevant.

When to use this stack

Use Open-H + GR00T-H + Cosmos-H if your work sits in one of these categories:

  • healthcare robot imitation learning
  • cross-embodiment policy transfer
  • surgical simulation and synthetic data generation
  • policy evaluation for long-horizon manipulation
  • data standardization across institutions or labs

If your project is still at the stage of defining data interfaces and evaluation targets, prioritize Open-H first. If your project already has trajectories and an evaluation loop, the simulator may give you faster iteration on rollout testing. If you are building higher-level orchestration around these systems, the same planning concerns show up in agentic software stacks, which is where Multi-Agent Systems Explained: When One Agent Isn’t Enough becomes useful.

Start by aligning one existing dataset or collection pipeline to LeRobot v2.1 with the Open-H field conventions, then compare your current policy training setup against the released GR00T-H scope and action-space assumptions. After that, add Cosmos-H-Surgical-Simulator for offline rollouts on a narrow task such as suturing or peg transfer, and measure whether the simulated evaluations reduce benchtop iteration time enough to justify integrating it into your research loop.

Get Insanely Good at AI

The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.