Microsoft deploys world's first 'supercomputer-scale' GB300 NVL72 Azure cluster — 4,608 GB300 GPUs linked together to form a single, unified accelerator capable of 92.1 exaFLOPS of FP4 inference
That's a lot of AI FLOPS

Microsoft has just upgraded its Azure cloud platform with Nvidia's Blackwell Ultra, deploying what it calls the world's first large-scale GB300 NVL72 supercomputing cluster. The cluster spans 64 racks housing 4,608 GB300 GPUs in total: within each rack, the GPUs are connected by an NVLink 5 switch fabric, and the racks themselves are interconnected by Nvidia's Quantum-X800 InfiniBand networking fabric. The NVLink fabric gives a single NVL72 rack a total memory bandwidth of 130 TB/s, while the InfiniBand fabric provides 800 Gb/s of cross-rack interconnect bandwidth per GPU.
The world's first large-scale @nvidia GB300 NVL72 supercomputing cluster for AI workloads is now live on Microsoft Azure. The deployment connects 4,600+ NVIDIA Blackwell Ultra GPUs using next-gen InfiniBand network—built to train and deploy advanced AI models faster than… pic.twitter.com/CmmDtcrlwn (October 9, 2025)
The 4,608 figure, specified by Nvidia, works out to 64 GB300 NVL72 systems, since each rack houses 72 Blackwell Ultra GPUs and 36 Grace CPUs (2,592 Arm cores per rack). That is still well short of a full hyperscale buildout, but it marks a significant milestone for Nvidia's Grace Blackwell GB300, which recently set new benchmark records in inference performance. Microsoft says the cluster will be dedicated to OpenAI workloads, allowing advanced reasoning models to run even faster and enabling model training in "weeks instead of months."
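For readers who want to check the arithmetic, here is a minimal back-of-the-envelope sketch in Python based on the per-rack figures above; the 72-core count per Grace CPU is Nvidia's published spec, and everything else follows directly from the numbers in this article.

```python
# Back-of-the-envelope cluster math, using the per-rack figures cited above.
RACKS = 64                    # GB300 NVL72 systems in the Azure cluster
GPUS_PER_RACK = 72            # Blackwell Ultra GPUs per NVL72 rack
CPUS_PER_RACK = 36            # Grace CPUs per NVL72 rack
CORES_PER_GRACE = 72          # Arm cores per Grace CPU (Nvidia's published spec)

total_gpus = RACKS * GPUS_PER_RACK                   # 4,608 GPUs
total_cpus = RACKS * CPUS_PER_RACK                   # 2,304 Grace CPUs
cores_per_rack = CPUS_PER_RACK * CORES_PER_GRACE     # 2,592 Arm cores per rack
total_cores = RACKS * cores_per_rack                 # 165,888 Arm cores cluster-wide

print(f"{total_gpus=} {total_cpus=} {cores_per_rack=} {total_cores=}")
```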
At the rack level, each NVL72 system is said to deliver 1,440 petaflops of FP4 Tensor performance, backed by 37 TB of unified "fast memory": roughly 20 TB of HBM3E attached to the GPUs and 17 TB of LPDDR5X attached to the Grace CPUs. As mentioned earlier, NVLink 5 pools this memory so that each rack behaves as a single, unified accelerator with 130 TB/s of direct bandwidth. Scaled across all 64 racks, that per-rack FP4 figure is where the headline 92.1 exaFLOPS comes from, and the memory and interconnect throughput is one of the most impressive aspects of the GB300 NVL72, so it is worth understanding how it scales beyond a single rack.
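As a sanity check on the headline figure, the sketch below simply multiplies the published per-rack numbers out to cluster scale, assuming the cluster is exactly 64 racks as described above.

```python
# Scale the published per-rack numbers up to the full 64-rack cluster.
RACKS = 64
FP4_PFLOPS_PER_RACK = 1_440      # FP4 Tensor performance per NVL72 rack
FAST_MEMORY_TB_PER_RACK = 37     # 20 TB HBM3E + 17 TB LPDDR5X per rack

cluster_fp4_exaflops = RACKS * FP4_PFLOPS_PER_RACK / 1_000        # 92.16 EF (the "92.1 exaFLOPS" headline)
cluster_fast_memory_pb = RACKS * FAST_MEMORY_TB_PER_RACK / 1_000  # ~2.37 PB of pooled fast memory

print(f"FP4: {cluster_fp4_exaflops:.2f} exaFLOPS, fast memory: {cluster_fast_memory_pb:.2f} PB")
```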
The Quantum-X800 InfiniBand platform gives each of the 4,608 GPUs 800 Gb/s of bandwidth at the rack-to-rack level, so every GPU in the cluster can reach every other GPU, both within a rack and across racks.
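For a rough sense of scale, the sketch below adds up that per-GPU figure across the whole cluster. Treating every link as simultaneously saturated is an illustrative assumption on our part; Microsoft and Nvidia have not published the fabric's actual topology or bisection bandwidth here.

```python
# Rough aggregate cross-rack injection bandwidth, assuming every GPU's
# 800 Gb/s InfiniBand link can be driven at once (an illustrative assumption).
TOTAL_GPUS = 4_608
LINK_GBPS_PER_GPU = 800          # Quantum-X800 InfiniBand bandwidth per GPU

aggregate_tbps = TOTAL_GPUS * LINK_GBPS_PER_GPU / 1_000   # ~3,686 Tb/s
aggregate_tb_per_s = aggregate_tbps / 8                   # ~461 TB/s

print(f"~{aggregate_tbps:,.0f} Tb/s (~{aggregate_tb_per_s:,.0f} TB/s) of aggregate injection bandwidth")
```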
The GB300 NVL72 cluster is liquid-cooled, using standalone heat exchangers and facility loops designed to minimize water usage under intense workloads. Nvidia says Microsoft needed to reimagine every layer of its data center for this deployment, and Microsoft is keen to point out that this is just the first of many GB300 clusters it plans to roll out across the globe as it scales toward full hyperscale deployment. OpenAI and Microsoft already use GB200 clusters for training models, so this serves as a natural extension of their exclusive partnership.
Nvidia itself is heavily invested in OpenAI, with the two recently signing a Letter of Intent (LoI) for a major strategic partnership that will see the chipmaker progressively pour $100 billion into OpenAI. In return, OpenAI will use Nvidia GPUs for its next-gen AI infrastructure, deploying at least 10 gigawatts (GW) worth of accelerators, starting with Vera Rubin next year. In that light, this GB300 NVL72 supercluster looks like a precursor to that investment, with Microsoft deploying the Nvidia hardware that runs OpenAI's workloads.

Hassam Nasir is a die-hard hardware enthusiast with years of experience as a tech editor and writer, focusing on detailed CPU comparisons and general hardware news. When he’s not working, you’ll find him bending tubes for his ever-evolving custom water-loop gaming rig or benchmarking the latest CPUs and GPUs just for fun.