OpenAI, AMD, Broadcom, Intel, Microsoft and NVIDIA Release MRC 1.0 to OCP, an Open RDMA Protocol Already Running OpenAI's Largest Training Clusters

Overview

A six-company collaboration of OpenAI, AMD, Broadcom, Intel, Microsoft and NVIDIA published version 1.0 of the Multipath Reliable Connection (MRC) specification through the Open Compute Project on May 6, 2026, according to NVIDIA. The protocol, an RDMA transport for AI training fabrics, is already running OpenAI’s largest Blackwell-class clusters at Microsoft’s Fairwater and Oracle Cloud Infrastructure’s Abilene data centers, the company said, and OpenAI describes the same deployments on its own announcement.

What MRC Does

MRC enables a single RDMA connection to distribute traffic across multiple network paths, according to NVIDIA, with the stated goal of improving throughput, load balancing and availability across the very large GPU fabrics used to train frontier models. NVIDIA frames the change as “replacing a single-lane road spanning a town with a cleverly laid-out street grid system paired with an on-the-fly traffic app,” the post says.

The specification adds two operational properties absent from a typical single-path RDMA flow:

Multipath load balancing. “MRC delivers high levels of GPU utilization by load-balancing traffic across all available paths,” NVIDIA writes, targeting the synchronous all-reduce-style traffic patterns that dominate large-model training.
Hardware-speed failover. Failure-bypass logic “can — in just microseconds — detect a network path failure and reroute traffic automatically,” the company says, with intelligent retransmission handling whatever data was in flight when a link dropped.

Those properties matter at gigascale because a single stalled or dropped flow can pause an entire synchronous training step across thousands of GPUs.

Production Before Standardization

The unusual sequencing here is that MRC was deployed in production before being released as an open specification. Sachin Katti, head of industrial compute at OpenAI, said, “Deploying MRC in the Blackwell generation was very successful and was made possible by a strong collaboration with NVIDIA,” according to NVIDIA. He added that “MRC’s end-to-end approach enabled us to avoid much of the typical network-related slowdowns and interruptions” experienced in earlier training runs, the post reports.

Microsoft’s Fairwater AI supercomputers and the OpenAI/Oracle Cloud Infrastructure site at Abilene “rely on MRC to deliver on performance, scale and efficiency requirements,” NVIDIA writes. MRC “is already deployed across all of OpenAI’s largest NVIDIA GB200 supercomputers that we use to train frontier models,” including the OCI Abilene site and Microsoft’s Fairwater clusters, OpenAI says on its announcement page.

Open Specification, Vendor-Neutral Pitch

The choice of Open Compute Project as host signals that the partnership wants MRC adopted across hardware lines, not just NVIDIA’s. The NVIDIA post names AMD, Broadcom, Intel, Microsoft and OpenAI as collaborators on the specification, and confirms that MRC has been released as an “open specification through the Open Compute Project.”

On NVIDIA’s own hardware, the company says MRC runs natively across its ConnectX SuperNICs and Spectrum-X Ethernet switches, where customers can choose between MRC and Spectrum-X Ethernet’s existing Adaptive RDMA protocol, according to the post. Whether and when AMD, Broadcom and Intel ship their own MRC-capable NICs and switches remains for those vendors to announce.

What We Don’t Know

The OCP specification text itself was not directly reviewed for this report; NVIDIA and OpenAI’s announcements describe MRC’s behavior at a high level rather than line-by-line.
The companies have not published quantitative performance numbers comparing MRC to single-path RDMA for specific workloads.
The release notes do not state when other members of the collaboration plan to ship hardware with native MRC support outside the NVIDIA platforms already in production.
Whether MRC will be folded into, or remain distinct from, the broader Ultra Ethernet Consortium roadmap is not addressed in either announcement we cited.

Why It Matters

Large AI training has, until now, depended on either proprietary InfiniBand fabrics or vendor-specific Ethernet enhancements layered on top of standard RoCE. By placing a multipath RDMA transport in OCP under a multi-vendor banner — with production deployments at the largest current AI clusters as the proof point — the partnership is trying to make resilient, multi-vendor Ethernet a default option for the next generation of training infrastructure rather than a custom build.