NVIDIA Releases GR00T-H, the First Open Foundation Model for Surgical Robotics, Trained on 778 Hours of Operating Room Data
At GTC 2026, NVIDIA released GR00T-H, a 3-billion-parameter vision-language-action model for surgical robots, trained on the 778-hour Open-H dataset of procedure recordings from 35 institutions and seven robotic platforms, with CMR Surgical contributing nearly 500 hours from its Versius system.
Overview
On March 17 at GTC 2026, NVIDIA released GR00T-H, the first open foundation model purpose-built for surgical robotics. The 3-billion-parameter vision-language-action model can interpret live camera feeds from an operating room, read text instructions, and output continuous motor-control signals for robotic arms, according to the model's technical documentation on Hugging Face. Alongside the model, NVIDIA published the Open-H-Embodiment dataset, a 778-hour collection of real and synthetic surgical procedure recordings gathered from 35 institutions worldwide, making it the largest open dataset for healthcare robotics.
The release marks the arrival of physical AI in the operating room. Where previous surgical robots executed pre-programmed movements under direct human teleoperation, GR00T-H is designed to let robotic systems interpret complex surgical environments and assist surgeons with contextual awareness of the procedure unfolding in front of them.
What We Know
GR00T-H is built on NVIDIA’s Isaac GR00T N1.6 architecture and post-trained specifically for healthcare applications. The model processes RGB camera images, floating-point proprioception vectors describing a robot’s joint positions, and natural-language text instructions, then generates continuous-value action vectors for motor control, according to the Hugging Face model card. Its architecture combines a SigLip2 vision encoder with a flow-matching transformer that uses adaptive layer normalization, plus embodiment-specific connectors that map between different robotic platforms’ proprioception and action spaces.
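The embodiment-connector idea described above can be sketched in a few lines: each robot platform gets its own projections into and out of a shared latent space, so one core policy can serve arms with different joint counts. This is a hedged illustration only; the dimensions, names, and the trivial "core" are placeholders, not NVIDIA's actual architecture.

```python
import numpy as np

LATENT_DIM = 32  # size of the shared latent space (illustrative)
rng = np.random.default_rng(0)

# Hypothetical platforms with different proprioception/action dimensions.
EMBODIMENTS = {
    "versius_like": {"proprio_dim": 7, "action_dim": 7},
    "ur5_like":     {"proprio_dim": 6, "action_dim": 6},
}

# Per-embodiment connectors: project platform state into the shared latent
# space, and project the shared latent back to platform-specific actions.
connectors = {
    name: {
        "in":  rng.standard_normal((LATENT_DIM, spec["proprio_dim"])),
        "out": rng.standard_normal((spec["action_dim"], LATENT_DIM)),
    }
    for name, spec in EMBODIMENTS.items()
}

def policy_step(embodiment: str, proprioception: np.ndarray) -> np.ndarray:
    """One inference step: embodiment-specific encode, shared core,
    embodiment-specific decode. The core is a stand-in; a real model
    would also consume camera frames and a text instruction here."""
    c = connectors[embodiment]
    latent = np.tanh(c["in"] @ proprioception)  # platform state -> shared latent
    latent = np.tanh(latent)                    # placeholder for the shared transformer
    return c["out"] @ latent                    # shared latent -> motor command

action = policy_step("ur5_like", np.zeros(6))   # 6-dim continuous action vector
```

The design point the connectors solve is that a 7-degree-of-freedom Versius arm and a 6-degree-of-freedom UR5 cannot share raw state and action spaces, but they can share everything in between.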
The Open-H dataset that underpins the model spans seven robotic surgical systems: CMR Surgical's Versius, the da Vinci Research Kit, the da Vinci Research Kit Si, a UR5 arm, Rob Surgical's Bitrack, Tuodao's MA2000, and a KUKA industrial robot. The full 778 hours include synchronized video, robotic telemetry, force and torque measurements, and domain-specific sensor streams collected across simulation, benchtop, ex vivo, in vivo, and clinical environments. A curated 601.5-hour subset was used for model development, split 98 percent for training and 2 percent for validation, as documented on Hugging Face.
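The stream types and split described above can be sketched as a record layout. The `ProcedureRecord` class and its field names are hypothetical illustrations, not the actual Open-H schema; only the hour figures come from the documentation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ProcedureRecord:
    """Hypothetical layout for one multimodal procedure recording."""
    platform: str                 # e.g. "Versius", "dVRK", "UR5" (illustrative labels)
    environment: str              # "simulation", "benchtop", "ex_vivo", "in_vivo", or "clinical"
    video_path: str               # synchronized camera footage
    telemetry_path: str           # joint positions and commanded actions
    force_torque_path: Optional[str] = None       # not every platform records F/T
    extra_sensors: dict = field(default_factory=dict)  # domain-specific streams

# The curated 601.5-hour subset with its documented 98/2 split:
CURATED_HOURS = 601.5
train_hours = CURATED_HOURS * 0.98  # 589.47 hours for training
val_hours = CURATED_HOURS * 0.02    # 12.03 hours for validation
```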
CMR Surgical contributed the largest single share of data to the initiative: nearly 500 hours of anonymized surgical recordings from its Versius robotic system, according to a GlobeNewswire press release. The Cambridge-based company is also using NVIDIA's Cosmos-H tool to generate physically accurate synthetic surgical data and evaluate new robotic control policies for future versions of Versius. Chris Fryer, chief technology officer at CMR Surgical, said the partnership is about "building foundations for the next generation of intelligent surgical systems."
The broader NVIDIA physical AI initiative announced at GTC 2026 includes additional surgical robotics partners. Johnson & Johnson MedTech is using Isaac Sim and Cosmos-based workflows to train and validate systems for its Monarch urology platform. Medtronic is exploring NVIDIA’s IGX Thor hardware to deliver precision and functional safety in its surgical robotic systems, as detailed in NVIDIA’s GTC 2026 roundup. David Niewolny, head of business development for healthcare at NVIDIA, said that “medical technology leaders like CMR Surgical are accelerating a new generation of intelligent robotic systems that can assist surgeons, scale surgical expertise and ultimately expand access to high-quality care.”
What We Don’t Know
GR00T-H is released under a non-commercial NVIDIA license and is explicitly not intended for clinical deployment, patient care, or medical decision-making. How and when any of the participating companies plan to move from research prototypes to regulatory submissions for AI-assisted surgical features remains unclear. The path from a foundation model that can generate motor-control signals in a research setting to one that meets the functional safety and regulatory requirements of a live operating room has not been publicly outlined by NVIDIA or its partners.
Whether the 778 hours of training data are sufficient to generalize across the wide variety of surgical procedures, patient anatomies, and edge cases that real-world deployment demands is an open question. The model was trained on seven robotic platforms, but the surgical robotics market includes dozens of systems from companies that did not contribute data to Open-H. How well GR00T-H transfers to unseen platforms and procedures has not been benchmarked publicly.
The competitive dynamics are also uncertain. Intuitive Surgical, which dominates the installed base of surgical robots with over 9,000 da Vinci and Ion systems worldwide, was not listed among the Open-H contributors. Whether the market leader will participate in the open dataset effort, develop its own proprietary foundation model, or pursue a different approach to AI-assisted surgery has not been disclosed.
Looking Ahead
The release of GR00T-H and Open-H establishes a public benchmark and shared infrastructure for surgical robotics AI research that did not previously exist. The initiative’s emphasis on open data and open models mirrors the pattern that accelerated progress in natural language processing and computer vision, where foundation models trained on large shared datasets became the starting point for specialized applications. Whether the same dynamic will play out in surgical robotics, a domain defined by stringent safety requirements and regulatory oversight, will depend on how quickly the research community can demonstrate that these models improve surgical outcomes without introducing new risks.