Google Cloud launches its first Blackwell AI GPU-powered instances: rack-scale GB200 NVL72 systems with 72 B200 GPUs and 36 Grace CPUs

Dell servers based on Nvidia GB200. Image is for illustrative purposes only.
(Image credit: Switch)

Google Cloud has introduced its A4X virtual machines, which are powered by Nvidia's rack-scale GB200 NVL72 systems packing 72 B200 GPUs and 36 Grace CPUs. According to Google, the new VMs are designed for large-scale AI workloads, such as large language models with long context windows, reasoning models, and scenarios that require massive concurrency. Google also offers A4 VMs for general AI training and development.
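For readers who want to experiment, the sketch below shows how such a VM could be requested with the google-cloud-compute Python client library. The project ID, zone, boot image, and the a4x-highgpu-4g machine-type name are illustrative assumptions, not details confirmed by Google's announcement.

```python
# Minimal sketch: requesting an A4X-class VM with the google-cloud-compute
# client library. The project ID, zone, boot image, and the machine type
# "a4x-highgpu-4g" are illustrative assumptions, not confirmed values.
from google.cloud import compute_v1

PROJECT = "my-project"   # hypothetical project ID
ZONE = "us-central1-a"   # hypothetical zone with A4X capacity

def create_a4x_instance(name: str) -> None:
    # Boot disk backed by a public Debian image (placeholder choice).
    disk = compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-12",
            disk_size_gb=200,
        ),
    )
    # Default VPC network; a real A4X deployment would ride on the
    # Titanium ML adapters described later in this article.
    nic = compute_v1.NetworkInterface(network="global/networks/default")

    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{ZONE}/machineTypes/a4x-highgpu-4g",  # assumed name
        disks=[disk],
        network_interfaces=[nic],
    )
    op = compute_v1.InstancesClient().insert(
        project=PROJECT, zone=ZONE, instance_resource=instance
    )
    op.result()  # block until the create operation completes

if __name__ == "__main__":
    create_a4x_instance("a4x-demo")
```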

Google's A4X VMs leverage Nvidia's NVL72 machines, which combine 72 B200 GPUs with 36 72-core Grace CPUs (2,592 Armv9-based Neoverse V2 cores in total), all interconnected with NVLink. The NVLink fabric enables coherent memory sharing across all 72 GPUs, which, according to Google, improves response times and inference accuracy. The system also handles many concurrent inference requests, making it suitable for multimodal AI applications.
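The programming model inside a single NVLink domain is the same as on smaller NVLink systems: collective libraries such as NCCL route GPU-to-GPU traffic over the fabric automatically. Below is a generic PyTorch/NCCL sketch of an all-reduce across the GPUs visible to a job; nothing in it is Google- or A4X-specific.

```python
# Generic sketch: an NCCL all-reduce across the local GPUs of one job.
# Inside a GB200 NVL72 rack, NCCL would carry this traffic over NVLink;
# nothing here is specific to Google Cloud or the A4X VM shape.
import os
import torch
import torch.distributed as dist

def main() -> None:
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Each rank contributes a tensor; all_reduce sums them in place,
    # with the data moving GPU-to-GPU over NVLink where available.
    x = torch.ones(1024, device="cuda") * dist.get_rank()
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"sum across {dist.get_world_size()} ranks: {x[0].item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc-per-node=<num_gpus> this_script.py
```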

A4X VMs also feature Titanium ML network adapters built on Nvidia's ConnectX-7 NICs, which deliver 28.8 Tb/s (72 × 400 Gb/s) of uninterrupted, low-latency GPU-to-GPU traffic using RoCE. Google Cloud's Jupiter network fabric connects NVL72 domains to one another, enabling seamless scaling to tens of thousands of Blackwell GPUs in a non-blocking cluster. AI teams can deploy A4X VMs via Google Kubernetes Engine (GKE), which supports clusters of up to 65,000 nodes, and Google touts advanced sharing and pipelining techniques to maximize GPU utilization in large deployments.
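The headline figures are simple arithmetic over the per-rack specs; the short sketch below just reproduces the numbers quoted in this article.

```python
# Back-of-the-envelope check of the figures quoted above.
GPUS_PER_NVL72 = 72      # B200 GPUs per rack-scale NVL72 domain
GRACE_CPUS = 36          # Grace CPUs per domain, 72 cores each
NIC_RATE_GBPS = 400      # per-GPU ConnectX-7 line rate, in Gb/s

total_cores = GRACE_CPUS * 72
aggregate_tbps = GPUS_PER_NVL72 * NIC_RATE_GBPS / 1000

print(f"Neoverse V2 cores per domain: {total_cores}")      # 2592
print(f"Aggregate RoCE bandwidth: {aggregate_tbps} Tb/s")  # 28.8
```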

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.