Google Cloud launches its first Blackwell AI GPU-powered instances: rack-scale GB200 NVL72 systems with 72 B200 GPUs and 36 Grace CPUs

Dell servers based on Nvidia GB200. Image is for illustrative purposes only.
(Image credit: Switch)

Google Cloud has introduced its A4X virtual machines, which are powered by Nvidia's rack-scale GB200 NVL72 systems packing 72 B200 GPUs and 36 Grace CPUs. According to Google, the new VMs are designed for large-scale AI workloads, such as large language models with long context windows, reasoning models, and scenarios that require massive concurrency. Google also offers A4 VMs for general AI training and development.
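For readers who want to experiment, the sketch below shows how such a VM could be requested with the google-cloud-compute Python client library. The project ID, zone, boot image, and the a4x-highgpu-4g machine-type name are illustrative assumptions, not details confirmed by Google's announcement.

```python
# Minimal sketch: requesting an A4X-class VM with the google-cloud-compute
# client library. The project ID, zone, boot image, and the machine type
# "a4x-highgpu-4g" are illustrative assumptions, not confirmed values.
from google.cloud import compute_v1

PROJECT = "my-project"   # hypothetical project ID
ZONE = "us-central1-a"   # hypothetical zone with A4X capacity

def create_a4x_instance(name: str) -> None:
    # Boot disk backed by a public Debian image (placeholder choice).
    disk = compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-12",
            disk_size_gb=200,
        ),
    )
    # Default VPC network; a real A4X deployment would ride on the
    # Titanium ML adapters described later in this article.
    nic = compute_v1.NetworkInterface(network="global/networks/default")

    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{ZONE}/machineTypes/a4x-highgpu-4g",  # assumed name
        disks=[disk],
        network_interfaces=[nic],
    )
    op = compute_v1.InstancesClient().insert(
        project=PROJECT, zone=ZONE, instance_resource=instance
    )
    op.result()  # block until the create operation completes

if __name__ == "__main__":
    create_a4x_instance("a4x-demo")
```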

Google's A4X VMs leverage Nvidia's NVL72 machines, which combine 72 B200 GPUs with 36 72-core Grace CPUs (2,592 Armv9-based Neoverse V2 cores in total), all interconnected with NVLink. The NVLink fabric enables coherent memory sharing across all 72 GPUs, which, according to Google, improves response times and inference accuracy. The system also handles many concurrent inference requests, making it suitable for multimodal AI applications.
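The programming model inside a single NVLink domain is the same as on smaller NVLink systems: collective libraries such as NCCL route GPU-to-GPU traffic over the fabric automatically. Below is a generic PyTorch/NCCL sketch of an all-reduce across the GPUs visible to a job; nothing in it is Google- or A4X-specific.

```python
# Generic sketch: an NCCL all-reduce across the local GPUs of one job.
# Inside a GB200 NVL72 rack, NCCL would carry this traffic over NVLink;
# nothing here is specific to Google Cloud or the A4X VM shape.
import os
import torch
import torch.distributed as dist

def main() -> None:
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Each rank contributes a tensor; all_reduce sums them in place,
    # with the data moving GPU-to-GPU over NVLink where available.
    x = torch.ones(1024, device="cuda") * dist.get_rank()
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"sum across {dist.get_world_size()} ranks: {x[0].item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc-per-node=<num_gpus> this_script.py
```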

A4X VMs also feature Titanium ML network adapters built on Nvidia's ConnectX-7 NICs, which deliver 28.8 Tb/s (72 × 400 Gb/s) of uninterrupted, low-latency GPU-to-GPU traffic using RoCE. Google Cloud's Jupiter network fabric connects NVL72 domains to one another, enabling seamless scaling to tens of thousands of Blackwell GPUs in a non-blocking cluster. AI teams can deploy A4X VMs via Google Kubernetes Engine (GKE), which supports clusters of up to 65,000 nodes, and Google touts advanced sharing and pipelining techniques to maximize GPU utilization in large deployments.
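The headline figures are simple arithmetic over the per-rack specs; the short sketch below just reproduces the numbers quoted in this article.

```python
# Back-of-the-envelope check of the figures quoted above.
GPUS_PER_NVL72 = 72      # B200 GPUs per rack-scale NVL72 domain
GRACE_CPUS = 36          # Grace CPUs per domain, 72 cores each
NIC_RATE_GBPS = 400      # per-GPU ConnectX-7 line rate, in Gb/s

total_cores = GRACE_CPUS * 72
aggregate_tbps = GPUS_PER_NVL72 * NIC_RATE_GBPS / 1000

print(f"Neoverse V2 cores per domain: {total_cores}")      # 2592
print(f"Aggregate RoCE bandwidth: {aggregate_tbps} Tb/s")  # 28.8
```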

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.