How to Get Started with Open-H, GR00T-H, and Cosmos-H for Healthcare Robotics Research
Learn how to use NVIDIA's new Open-H dataset and GR00T-H and Cosmos-H models to build and evaluate healthcare robotics systems.
NVIDIA and collaborators have released a usable open stack for healthcare robotics research: Open-H-Embodiment, GR00T-H, and Cosmos-H-Surgical-Simulator. You can use it to standardize data collection, study cross-embodiment policy learning, and build simulation workflows for surgical robotics. The official announcement and overview is the best starting point, and the rest of this guide focuses on how to work with the release in practice.
What each release component is for
The March 16, 2026 release includes three separate pieces, and they fit together cleanly.
| Component | What it is | Primary use |
|---|---|---|
| Open-H-Embodiment | Open healthcare robotics dataset | Training and evaluation data |
| GR00T-H | 3B-parameter healthcare robot policy model | Policy learning and imitation learning research |
| Cosmos-H-Surgical-Simulator | Action-conditioned surgical world model | Simulation, synthetic rollouts, and policy evaluation |
According to the release, Open-H-Embodiment contains 778 hours of CC-BY-4.0 robotics training data across simulation, benchtop exercises, and real clinical procedures, spanning surgical robotics, ultrasound, and colonoscopy autonomy, with contributions from 35 organizations. That makes it useful both as a dataset and as a shared data standard.
If your team is already working on embodied AI pipelines, the main shift here is domain specificity. This is not a general robotics corpus adapted to healthcare. It is a healthcare robotics stack with dataset formatting, model training, and simulation all aligned around surgical and imaging workflows. That domain grounding matters in the same way domain grounding matters for text systems, which is also the core idea behind Fine-Tuning vs RAG: When to Use Each Approach.
Installation and setup
The Open-H contribution guide specifies the expected dataset tooling:
- LeRobot dataset format v2.1
- LeRobot package v0.3.3
- Suggested sampling rate of at least 20 Hz
- Suggested image resolution of at least 480p
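Before recording full sessions, it can help to sanity-check your capture settings against those suggestions. The sketch below is a hypothetical pre-flight validator, not part of the LeRobot API; the metadata field names (`fps`, `height`) are illustrative.

```python
# Hypothetical pre-flight check against the Open-H suggestions above.
# Field names (fps, height) are illustrative, not an official API.

MIN_FPS = 20      # suggested sampling rate (Hz)
MIN_HEIGHT = 480  # suggested image resolution (480p)

def check_episode(meta: dict) -> list:
    """Return a list of warnings for an episode's recording settings."""
    warnings = []
    if meta.get("fps", 0) < MIN_FPS:
        warnings.append(f"fps={meta.get('fps')} is below the suggested {MIN_FPS} Hz")
    if meta.get("height", 0) < MIN_HEIGHT:
        warnings.append(f"height={meta.get('height')} is below 480p")
    return warnings

print(check_episode({"fps": 30, "height": 720}))  # → []
print(check_episode({"fps": 15, "height": 480}))  # one warning about fps
```

Running a check like this per episode is cheap, and catching an under-sampled camera before a long collection session is much cheaper than re-recording.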
The release positions the Hugging Face dataset page as the primary location for Open-H-Embodiment v1, and the GitHub repository documents the data format and contribution workflow:
- Open-H announcement: https://huggingface.co/blog/nvidia/physical-ai-for-healthcare-robotics
- Open-H repository: https://github.com/open-h/data-collection
- GR00T-H model card: https://huggingface.co/nvidia/GR00T-H
- Cosmos-H-Surgical-Simulator model card: https://huggingface.co/nvidia/Cosmos-H-Surgical-Simulator
The announcement does not include a verified install command, so use the package and version requirements above when setting up your environment, and refer to the official repository for implementation details.
Start with the dataset, not the models
The practical entry point is Open-H-Embodiment v1. It gives you the schema, naming conventions, and metadata expectations that the released models were built around.
The contribution guide standardizes these core fields:
- action
- observation.state
- observation.images.xxx
It also recommends healthcare-specific metadata, including:
- surgical tool identity
- ultrasound parameters
That standardization is the real unlock. If you want to compare policies across robot platforms, or eventually contribute data back to the ecosystem, matching the published format matters more than trying to optimize custom schemas up front. The same principle shows up in prompt and context-heavy AI systems, where consistency usually beats cleverness, which is why Context Engineering: The Most Important AI Skill in 2026 is relevant even outside text-only applications.
How to structure your data for Open-H
Open-H is explicitly designed for cross-embodiment training and evaluation. That means your data should preserve both common structure and robot-specific context.
Based on the Open-H guide, your collection pipeline should capture:
| Field category | Required or recommended details |
|---|---|
| Actions | Standardized action field |
| Robot state | observation.state |
| Images | observation.images.xxx |
| Sampling | ≥20 Hz suggested |
| Image quality | ≥480p suggested |
| Metadata | Embodiment info, surgical tool identity, ultrasound parameters where relevant |
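Put together, a single timestep record following those conventions might look like the sketch below. The key names `action`, `observation.state`, and `observation.images.xxx` come from the contribution guide; the camera name `wrist_cam`, the vector values, and the metadata entries are made-up examples.

```python
# Illustrative shape of one timestep following the Open-H field conventions.
# Values and the `wrist_cam` camera name are hypothetical examples.

frame = {
    "action": [0.01, -0.02, 0.00, 0.0, 0.0, 0.0, 1.0],            # standardized action field
    "observation.state": [0.12, 0.34, 0.56, 0.0, 0.0, 0.0, 0.0],  # robot proprioception
    "observation.images.wrist_cam": "episode_000/wrist/000123.png",  # ≥480p frame
}

episode_metadata = {
    "embodiment": "dVRK",                    # which robot produced the data
    "surgical_tool": "large_needle_driver",  # healthcare-specific metadata
    "fps": 30,                               # ≥20 Hz suggested
}

print(sorted(frame))
```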
The release spans commercial systems such as CMR Surgical, Rob Surgical, and Tuodao, plus research systems like dVRK, Franka, and Kuka. If your lab uses one of those platforms, aligning your collection format with Open-H should be straightforward.
If your robot is outside that set, the same schema is still useful. The dataset is presented as a multi-institution standardization effort, not a closed benchmark tied to a single embodiment.
Using GR00T-H for policy research
GR00T-H is the policy model in the stack. The model card describes it as:
- 3B parameters
- Based on GR00T N1.6
- Trained on a 601.50-hour subset of Open-H
- Built from 58 datasets
- Trained with a 98% / 2% train-validation split
- Covers 7 robotic embodiments
The seven listed embodiments are:
- CMR Versius
- dVRK
- dVRK-Si
- UR5
- Rob Surgical Bitrack
- Tuodao MA2000
- KUKA
Architecturally, GR00T-H combines:
- vision and text transformers for observations and instructions
- embodiment-indexed MLPs for proprioception and action handling
- a flow-matching transformer for action generation
That combination makes the model most useful for researchers studying shared policy learning across hardware platforms. If your current system uses per-robot imitation policies, GR00T-H is a concrete example of a cross-embodiment approach.
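The embodiment-indexed idea can be sketched in a few lines: each robot has its own proprioception dimensionality, and a per-embodiment projection maps it into a shared latent space the rest of the policy consumes. This is a conceptual sketch with made-up dimensions, not the GR00T-H implementation.

```python
import numpy as np

# Conceptual sketch of embodiment-indexed projection. Each embodiment gets
# its own linear map into a shared latent space; dimensions are illustrative.

rng = np.random.default_rng(0)
LATENT_DIM = 64

# Robot-specific state dimensionalities (made-up values for illustration).
state_dims = {"dVRK": 14, "UR5": 6, "CMR Versius": 16}

# One projection matrix per embodiment, keyed by name.
projectors = {name: rng.normal(size=(LATENT_DIM, d)) for name, d in state_dims.items()}

def project_state(embodiment: str, state: np.ndarray) -> np.ndarray:
    """Map a robot-specific state vector into the shared latent space."""
    return projectors[embodiment] @ state

latent = project_state("UR5", np.zeros(6))
print(latent.shape)  # (64,)
```

The shared downstream transformer then only ever sees `LATENT_DIM`-sized vectors, regardless of which robot produced the data.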
There is one important documentation caveat. The blog states GR00T-H uses a common relative end-effector action space, while the model card says the released checkpoint standardizes to absolute end-effector positioning. For implementation decisions, treat the model card as the stronger release-specific source.
The model card also states the intended scope clearly: R&D only, not clinical deployment and not medical decision-making.
What to pay attention to in GR00T-H’s design
Several release details are useful if you want to reproduce the same training assumptions in your own experiments.
The announcement highlights these choices:
- unique embodiment projectors
- 100% state dropout during inference
- relative end-effector action training
- injection of instrument/control metadata into prompts
Even without a published implementation snippet, these choices tell you how the training pipeline was shaped.
The embodiment projectors suggest that robot-specific adaptation was handled explicitly rather than hidden inside one universal state encoder. The metadata injection means prompts are carrying control context, not just natural language task descriptions. That is close in spirit to structured conditioning in LLM systems, where consistent inputs improve reliability, similar to what developers already apply with Structured Output from LLMs: JSON Mode Explained.
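Metadata injection is simple to picture: the instruction the model receives is the task description plus structured control context. The template and field names below are illustrative assumptions, not the actual GR00T-H prompt format.

```python
# Sketch of injecting instrument/control metadata into the task prompt.
# The bracketed template and field names are hypothetical, not the real format.

def build_prompt(task: str, embodiment: str, tool: str) -> str:
    """Prepend structured control context to the natural-language instruction."""
    return f"[embodiment: {embodiment}] [tool: {tool}] {task}"

prompt = build_prompt("tie a surgeon's knot", "dVRK", "large_needle_driver")
print(prompt)
# [embodiment: dVRK] [tool: large_needle_driver] tie a surgeon's knot
```

The point is that the same task string conditions differently depending on which instrument and embodiment are declared, which is exactly the kind of consistent structured input that improves reliability.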
Using Cosmos-H-Surgical-Simulator for rollout generation
Cosmos-H-Surgical-Simulator is the simulation component. It was fine-tuned from NVIDIA Cosmos Predict 2.5 2B, and the model card specifies:
- Based on Cosmos-Predict2.5-2B-Video2World
- Accepts a 44-dimensional action vector
- That vector is split as 22 dimensions per arm
- Conditions on the current frame
- Predicts the next 12 frames
- Supports autoregressive rollout of full trajectories
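The autoregressive pattern described above can be sketched as a simple loop: condition on the current frame, predict the next 12 frames from a 44-D action vector, then repeat from the last predicted frame. `predict_next_frames` below is a stand-in stub, not the actual world model interface.

```python
# Sketch of the autoregressive rollout pattern from the model card.
# `predict_next_frames` is a placeholder for the actual world model.

CHUNK = 12       # frames predicted per step
ACTION_DIM = 44  # 22 dimensions per arm x 2 arms

def predict_next_frames(frame, actions):
    """Stub world model: returns CHUNK placeholder frame labels."""
    assert len(actions) == ACTION_DIM
    return [f"{frame}+{i + 1}" for i in range(CHUNK)]

def rollout(first_frame, action_sequence):
    """Autoregressively unroll a trajectory, CHUNK frames at a time."""
    frames, current = [], first_frame
    for actions in action_sequence:  # one 44-D action vector per chunk
        chunk = predict_next_frames(current, actions)
        frames.extend(chunk)
        current = chunk[-1]          # condition the next step on the last frame
    return frames

traj = rollout("frame0", [[0.0] * ACTION_DIM] * 3)
print(len(traj))  # 36 frames from 3 autoregressive steps
```

Because each step conditions only on the previous frame, errors can compound over long rollouts, which is one reason benchmarked short-horizon tasks are a sensible first evaluation target.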
The release article says it can generate physically plausible surgical video directly from kinematic actions, and reports 40 minutes for 600 rollouts compared with 2 days using real benchtop methods.
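Taken at face value, those reported numbers imply roughly a 72x wall-clock speedup for generating the same 600 rollouts:

```python
# Rough arithmetic on the reported throughput comparison.
real_minutes = 2 * 24 * 60  # 2 days of benchtop collection, in minutes
sim_minutes = 40            # reported time for 600 simulated rollouts

print(real_minutes / sim_minutes)  # 72.0
```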
The training setup is also unusually specific:
- trained on 9 robot embodiments
- trained from 32 datasets in Open-H
- used 64 A100 GPUs
- for roughly 10,000 GPU-hours
- with a unified 44D action space
Those numbers matter because they set expectations. This is an open research model, but it was not trained on commodity infrastructure.
Where Cosmos-H fits in a workflow
The simulator is best used as a world-model layer for evaluation and synthetic trajectory generation.
The model card names these supported procedure and task areas:
| Platform or context | Example procedures or tasks |
|---|---|
| CMR Surgical Versius clinical procedures | cholecystectomy, prostatectomy, inguinal hernia, hysterectomy |
| dVRK and MITIC | suturing, tissue manipulation, peg transfer |
A practical workflow looks like this:
- Collect or normalize trajectories in Open-H / LeRobot v2.1 format.
- Train or adapt policies against that standardized action and observation structure.
- Use Cosmos-H-Surgical-Simulator to generate action-conditioned rollouts for evaluation or synthetic augmentation.
- Benchmark long-horizon dexterity on known tasks such as suturing.
The release links that last point to SutureBot, a NeurIPS 2025 autonomous suturing benchmark with 1,890 demonstrations and evaluation on dVRK Si. If your work focuses on imitation quality, action sequencing, or insertion-point accuracy, a benchmarked task like that is a better first target than an open-ended “general surgical autonomy” claim.
Tradeoffs and limitations
This stack is useful immediately for research, but the constraints are important.
| Limitation | Why it matters |
|---|---|
| R&D only | GR00T-H is not for clinical deployment or medical decision-making |
| Early release | Adoption is still early, so tooling and community examples are limited |
| Data format matters | Open-H expects LeRobot v2.1 and associated field conventions |
| Compute requirements are high | Cosmos-H training used 64 A100s and ~10,000 GPU-hours |
| Documentation inconsistency | GR00T-H action space details differ between the announcement and model card |
The dataset is also specialized. That is its strength, but it means you should not expect the breadth of large general robotics corpora. The release itself notes that Open-H’s 778 hours are smaller than some broad robot pretraining datasets, while being much more clinically relevant.
When to use this stack
Use Open-H + GR00T-H + Cosmos-H if your work sits in one of these categories:
- healthcare robot imitation learning
- cross-embodiment policy transfer
- surgical simulation and synthetic data generation
- policy evaluation for long-horizon manipulation
- data standardization across institutions or labs
If your project is still at the stage of defining data interfaces and evaluation targets, prioritize Open-H first. If your project already has trajectories and an evaluation loop, the simulator may give you faster iteration on rollout testing. If you are building higher-level orchestration around these systems, the same planning concerns show up in agentic software stacks, which is where Multi-Agent Systems Explained: When One Agent Isn’t Enough becomes useful.
Start by aligning one existing dataset or collection pipeline to LeRobot v2.1 with the Open-H field conventions, then compare your current policy training setup against the released GR00T-H scope and action-space assumptions. After that, add Cosmos-H-Surgical-Simulator for offline rollouts on a narrow task such as suturing or peg transfer, and measure whether the simulated evaluations reduce benchtop iteration time enough to justify integrating it into your research loop.