Nvidia's Grace Hopper GH200 Powers 1 ExaFLOPS Jupiter Supercomputer
Nvidia's GH200-powered supercomputers offer a combined 200 ExaFLOPS of AI performance.
Nvidia said that its Grace Hopper GH200 superchip, which pairs its own Arm-based CPU with a Hopper-based GPU for artificial intelligence (AI) and high-performance computing (HPC), powers Jupiter, Europe's first ExaFLOPS supercomputer, hosted at the Forschungszentrum Jülich facility in Germany. The machine can be used for both simulations and AI workloads, which sets it apart from the vast majority of supercomputers installed today. In addition, Nvidia said that the GH200 powers 40 supercomputers worldwide, and their combined performance is around 200 'AI' ExaFLOPS.
The Jupiter supercomputer is powered by nearly 24,000 GH200 superchips with roughly 2.3 PB of HBM3E memory in total, interconnected using Nvidia's Quantum-2 InfiniBand networking and cooled using Eviden's BullSequana XH3000 liquid-cooling technology. The machine offers 1 ExaFLOPS of FP64 performance for simulations, including climate and weather modeling, materials science, drug discovery, industrial engineering, and quantum computing. In addition, it can provide a whopping 90 ExaFLOPS of AI performance for training large language models and similar workloads. All of this will be supported by Nvidia's software solutions, such as Earth-2, BioNeMo, Clara, cuQuantum, Modulus, and Omniverse.
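As a rough sanity check on those headline figures, the short sketch below simply divides the article's system-level numbers by the quoted chip count; the resulting per-chip values are derived here for illustration and are not official Nvidia specifications.

```python
# Back-of-the-envelope check of the Jupiter figures quoted above.
# Inputs come from the article; per-chip outputs are derived, not official specs.

NUM_GH200 = 24_000            # "nearly 24,000 GH200 superchips"
FP64_TOTAL_EFLOPS = 1.0       # 1 ExaFLOPS of FP64 for simulations
AI_TOTAL_EFLOPS = 90.0        # ~90 'AI' ExaFLOPS (low-precision throughput)

fp64_per_chip_tflops = FP64_TOTAL_EFLOPS * 1e6 / NUM_GH200  # ExaFLOPS -> TeraFLOPS
ai_per_chip_pflops = AI_TOTAL_EFLOPS * 1e3 / NUM_GH200      # ExaFLOPS -> PetaFLOPS

print(f"Implied FP64 throughput per GH200: ~{fp64_per_chip_tflops:.0f} TFLOPS")
print(f"Implied AI throughput per GH200: ~{ai_per_chip_pflops:.1f} PFLOPS")
```

The implied ~42 TFLOPS of FP64 and ~3.8 PFLOPS of AI throughput per superchip are broadly in line with Hopper's published peak rates, which suggests the 'AI' figure assumes low-precision (FP8-class) math.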
Each node within the Jupiter supercomputer is a powerhouse by itself, featuring 288 Arm Neoverse cores and four Hopper-based GPUs for AI and HPC workloads, and is capable of achieving a remarkable 16 PetaFLOPS of AI performance. The GH200 superchips are interconnected using Nvidia's NVLink, though Nvidia has not disclosed the total bandwidth of this interconnect.
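A similar cross-check works at the node level. The sketch below assumes quad-GH200 nodes (four 72-core Grace CPUs per node, which is what the 288-core figure implies) and scales the quoted 16 PetaFLOPS per node back up to the full machine:

```python
# Node-level cross-check using the figures quoted in this article.
# Assumes quad-GH200 nodes: 288 Arm cores / 72 cores per Grace CPU = 4 superchips per node.

CHIPS_TOTAL = 24_000      # "nearly 24,000 GH200 superchips"
CHIPS_PER_NODE = 4        # quad-GH200 node configuration (assumption)
NODE_AI_PFLOPS = 16       # "16 PetaFLOPS of AI performance" per node

nodes = CHIPS_TOTAL // CHIPS_PER_NODE
system_ai_eflops = nodes * NODE_AI_PFLOPS / 1e3   # PetaFLOPS -> ExaFLOPS

print(f"Approximate node count: {nodes:,}")
print(f"Implied system AI performance: ~{system_ai_eflops:.0f} ExaFLOPS")
```

That works out to roughly 6,000 nodes and about 96 AI ExaFLOPS, close to the ~90 ExaFLOPS headline figure, with the small gap plausibly down to the chip count being rounded.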
"At the heart of Jupiter is NVIDIA’s accelerated computing platform, making it a groundbreaking system that will revolutionize scientific research," said Thomas Lippert, director of the Jülich Supercomputing Centre. "Jupiter combines exascale AI and exascale HPC with the world’s best AI software ecosystem to boost the training of foundational models to new heights."
While Jupiter is a remarkable system (it is, after all, Europe's first ExaFLOPS machine), it is not the only GH200-based supercomputer. The University of Bristol, in particular, is building a supercomputer with more than 5,000 GH200 superchips, each with 141GB of HBM3E memory, aiming to make it the most powerful in the UK. Nvidia's Grace Hopper GH200 is set to be used in more than 40 supercomputers around the world built by companies like Dell, HPE, and Lenovo, and the combined performance of these systems will be about 200 'AI' ExaFLOPS.
HPE, for example, will be using Nvidia's Grace Hopper GH200 superchip in its HPE Cray EX2500 supercomputers. These machines can scale to thousands of GH200 chips, enabling faster and more efficient AI model training.
Nvidia's GH200 will also be available from ASRock Rack, Asus, Gigabyte, and Ingrasys by the end of the year. Nvidia also plans to offer free access to the GH200 through its LaunchPad program, making this powerful hardware available to a wider audience.
Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
bit_user That picture of a floor full of racks, arranged in neat rows, had me thinking... is there no value in a more compact arrangement? Maybe the switch network is too high-latency for a more physically compact layout to make much difference, but then what if you could connect all of the nodes in the same CXL topology? Could it ever make sense to pack all of the machines into more of a cube-type arrangement?
SunMaster
bit_user said:
That picture of a floor full of racks, arranged in neat rows, had me thinking... is there no value in a more compact arrangement? Maybe the switch network is too high-latency for a more physically compact layout to make much difference, but then what if you could connect all of the nodes in the same CXL topology? Could it ever make sense to pack all of the machines into more of a cube-type arrangement?
I imagine heat from the cube's centre will be challenging, growing exponentially with the cube size. Watercooling is great, but no matter how good a cooling solution you have, you never get rid of all the excess heat.
Just my wild guess.
bit_user
SunMaster said:
I imagine heat from the cube's centre will be challenging, growing exponentially with the cube size. Watercooling is great, but no matter how good a cooling solution you have, you never get rid of all the excess heat.
I'm not saying you'd pack them together without any space in between. Also, think about the inside of those racks. They clearly dissipate enough heat that machines sandwiched between multiple other machines stay cool enough.
SunMaster
bit_user said:
I'm not saying you'd pack them together without any space in between. Also, think about the inside of those racks. They clearly dissipate enough heat that machines sandwiched between multiple other machines stay cool enough.
Resistance is futile
JakeTheMate
bit_user said:
I'm not saying you'd pack them together without any space in between. Also, think about the inside of those racks. They clearly dissipate enough heat that machines sandwiched between multiple other machines stay cool enough.
Apologies if I didn't quite get your idea and you are already aware of this. The racks are able to dissipate the heat thanks to hot rows and cold rows. In each row, the fronts of the racks face each other and draw in the cold air, and in each alternate row the backs of the racks face each other and the warm air is drawn up and out of the DC.
There are other benefits to not cramming racks too close to each other.
5 Advantages of Row Cooling vs. Room Cooling for Edge and Data Center Environments
Edit: In some of the data centers I've been in, the rows are pretty narrow and there's not much wasted space.
bit_user
JakeTheMate said:
Apologies if I didn't quite get your idea and you are already aware of this. The racks are able to dissipate the heat thanks to hot rows and cold rows. In each row, the fronts of the racks face each other and draw in the cold air, and in each alternate row the backs of the racks face each other and the warm air is drawn up and out of the DC.
Thanks. I've heard such things. I just wonder if there wouldn't be some worthwhile benefits to a 3D topology of the racks, rather than 2D.
It seems like we're starting to move beyond air cooling, anyhow. Once there's water cooling, I think it opens up opportunities for 3D arrangements.
Say, is Google still using containers in their datacenters? The few pictures I've seen showed those stacked in ways that could be utilized by their network topology.
JakeTheMate
bit_user said:
Thanks. I've heard such things. I just wonder if there wouldn't be some worthwhile benefits to a 3D topology of the racks, rather than 2D.
It seems like we're starting to move beyond air cooling, anyhow. Once there's water cooling, I think it opens up opportunities for 3D arrangements.
Say, is Google still using containers in their datacenters? The few pictures I've seen showed those stacked in ways that could be utilized by their network topology.
You're welcome. In terms of a 3D topology for performance and latency, it's not really my area, but logically I can see how it should improve things.
Row cooling is a common topology in colocation or similar DCs. I've never been in a hyperscaler DC, but as far as I know they can have fairly exotic setups. Not familiar with Google and containers, will look it up. A cooling setup that may better suit your idea might be an immersion solution; I just don't know how practical it would be for a supercomputer given how frequently they replace faulty parts.
Microsoft finds underwater datacenters are reliable, practical and use energy sustainably
Agreed regarding air cooling; DC designs are getting creative with managing cooling and power usage in general.
World's largest data center being built in the Arctic
bit_user
JakeTheMate said:
Not familiar with Google and containers, will look it up.
This is what I'm talking about:
https://en.wikipedia.org/wiki/Google_Modular_Data_Center
I can't find many recent references, so presumably that's not an approach they've stuck with.