Nvidia Is Bringing Back the Dual GPU... for Data Centers
Fit for the 175 billion parameter GPT-3 model
Nvidia announced a new dual-GPU product, the H100 NVL, during its GTC Spring 2023 keynote. This won't bring back SLI or multi-GPU gaming, and it won't be one of the best graphics cards for gaming; instead, it targets the growing AI market. From the information and images Nvidia has released, the H100 NVL (H100 NVLink) will sport three NVLink connectors on top, with the two adjacent cards slotting into separate PCIe slots.
Note that the existing H100 PCIe already offered the three NVLink connectors, but the H100 NVL makes some other changes and will only be sold as a paired-card solution. It's an interesting change of pace, apparently to accommodate servers that don't support Nvidia's SXM option, with a focus on inference performance rather than training. The NVLink connections should help provide some of the bandwidth that NVSwitch delivers on the SXM solutions, and there are some other notable differences as well.
Take the specifications. Previous H100 solutions — both SXM and PCIe — have come with 80GB of memory (HBM3 for the SXM, HBM2e for PCIe), but the actual package contains six stacks, each with 16GB of memory. It's not clear if one stack is completely disabled, or if it's for ECC or some other purpose. What we do know is that the H100 NVL will come with 94GB per GPU, and 188GB HBM3 total. We assume the "missing" 2GB per GPU is either for ECC or somehow related to yields, though the latter seems a bit odd.
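To make the capacity math explicit, here's a minimal sketch using only the figures above (six 16GB stacks per package, 94GB exposed per GPU); the 2GB gap per GPU is the part Nvidia hasn't explained:

```python
# Memory capacity math for the H100 NVL, using only the figures quoted above.
STACKS_PER_PACKAGE = 6
GB_PER_STACK = 16

physical_gb = STACKS_PER_PACKAGE * GB_PER_STACK  # 96GB of HBM3 on each package
enabled_gb = 94                                  # capacity Nvidia exposes per GPU
reserved_gb = physical_gb - enabled_gb           # 2GB held back (ECC? yields?)
pair_total_gb = 2 * enabled_gb                   # 188GB across the dual-card pair

print(f"{physical_gb}GB physical, {enabled_gb}GB enabled per GPU, "
      f"{reserved_gb}GB held back, {pair_total_gb}GB per NVL pair")
```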
Power is slightly higher than the H100 PCIe, at 350–400 watts per GPU (configurable), an increase of 50W. Total performance for the pair, meanwhile, ends up being effectively double that of a single H100 SXM: 134 teraflops of FP64, 1,979 teraflops of TF32, and 7,916 teraflops of FP8 (as well as 7,916 teraops of INT8).
Basically, this looks like the same core design as the H100 PCIe, which also supports NVLink, but potentially now with more of the GPU cores enabled and with 17.5% more memory. The memory bandwidth is also quite a bit higher than on the H100 PCIe, thanks to the switch to HBM3: the H100 NVL checks in at 3.9 TB/s per GPU and a combined 7.8 TB/s (versus 2 TB/s for the H100 PCIe and 3.35 TB/s on the H100 SXM).
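Pulling those numbers together, here's a quick sketch of the per-GPU figures implied by the pair totals quoted above (nothing here beyond dividing by two and comparing bandwidth):

```python
# Figures Nvidia lists for the H100 NVL pair, as quoted in this article.
nvl_pair = {
    "fp64_tflops": 134,
    "tf32_tflops": 1979,
    "fp8_tflops": 7916,
    "memory_gb": 188,
    "bandwidth_tb_s": 7.8,
}

# A single NVL GPU is simply half of the pair totals.
nvl_per_gpu = {key: value / 2 for key, value in nvl_pair.items()}

# Per-GPU memory bandwidth of the other H100 variants (TB/s), also cited above.
h100_pcie_bw, h100_sxm_bw = 2.0, 3.35

print(nvl_per_gpu)  # ~67 FP64, ~990 TF32, ~3,958 FP8 teraflops; 94GB; 3.9 TB/s
print(f"vs. H100 PCIe bandwidth: {nvl_per_gpu['bandwidth_tb_s'] / h100_pcie_bw:.2f}x")
print(f"vs. H100 SXM bandwidth:  {nvl_per_gpu['bandwidth_tb_s'] / h100_sxm_bw:.2f}x")
```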
As this is a dual-card solution, with each card occupying a 2-slot space, Nvidia only supports 2 to 4 pairs of H100 NVL cards for partner and certified systems. How much would a single pair cost, and will they be available to purchase separately? That remains to be seen, though a single H100 PCIe can sometimes be found for around $28,000. So $80,000 for a pair of H100 NVL doesn't seem out of the question.
Jarred Walton is a senior editor at Tom's Hardware focusing on everything GPU. He has been working as a tech journalist since 2004, writing for AnandTech, Maximum PC, and PC Gamer. From the first S3 Virge '3D decelerators' to today's GPUs, Jarred keeps up with all the latest graphics trends and is the one to ask about game performance.
bit_user
@JarredWaltonGPU, I'm not really sure what's new here (besides the additional memory and power). Their existing H100 PCIe product already had 3x NVLink bridge that could be used to install the cards in pairs. This PDF is dated Nov. 30, 2022 and includes installation diagrams clearly showing that.
https://www.nvidia.com/content/dam/en-zz/Solutions/gtcs22/data-center/h100/PB-11133-001_v01.pdf
It also states:
"The NVIDIA H100 PCIe operates unconstrained up to its maximum thermal design power (TDP) level of 350 W"
So, you're right that they did increase the power limits.
Over at Anandtech, Ryan Smith is claiming the additional capacity is from enabling the 6th stack, which is also now HBM3. That increases the memory bandwidth of a single card to 3.9 TB/s, according to him.
The reference to "GPT3-175B" makes me wonder if GPT3 was just too big to fit on a pair of their existing H100 PCIe cards, hence the need for this upgrade.
JarredWaltonGPU
Oh... I somehow got it into my head that H100 was always HBM3. Seems like it was HBM2e on the PCIe model. I guess HBM3 and HBM2e must not be all that different at a base level. It's still a bit odd on the memory capacity going to 94GB per card. Like, enabling the sixth stack makes sense. But why not the full 96GB per card? Were yields really that much better with disabling 2GB per card? Or maybe it's something with ECC, but I don't know. I'll ask Nvidia, see if it has a response.
As for the additional memory, I'm sure there's something about the extra VRAM enabling larger models. From what I can tell, 4-bit mode with OPT-13b needs at least ~10GB. So if you go up to 130b, it would be ~100GB, and 165b would be ~127GB. That's assuming truly linear scaling, but that's probably not accurate. Whatever the limits are, 188GB vs. 160GB means the model can be 17.5% larger. Bigger is better? 🙃
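To put rough numbers on that, here's a back-of-the-envelope sketch; the ~10GB figure for OPT-13B and the linear-scaling assumption come from the comment above, the weights-only estimate just counts 4 bits per parameter, and neither accounts for KV cache or other runtime overhead:

```python
# Rough VRAM estimates for 4-bit quantized models. These are back-of-the-envelope
# numbers, not measurements.

def weights_only_gb(params_billion: float, bits_per_param: int = 4) -> float:
    """Memory for the weights alone at the given precision."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

def scaled_from_opt13b_gb(params_billion: float) -> float:
    """Linear extrapolation from the ~10GB observed for OPT-13B in 4-bit mode."""
    return 10.0 * params_billion / 13.0

for size_b in (13, 130, 175):  # 175B is GPT-3's parameter count
    print(f"{size_b}B params: ~{weights_only_gb(size_b):.0f}GB weights only, "
          f"~{scaled_from_opt13b_gb(size_b):.0f}GB by linear scaling")
```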
10tacle
RIP SLI. Long time SLI builder here since the 3Dfx Voodoo2 days of the late 1990s. It used to be a common theme that SLI was popular to allow one to game on one mid-range GPU at lower settings until buying a second GPU and maybe a bigger PSU when budget allowed to fully unlock a game's eye candy potential at single top-end GPU performance (at least that was me). However, at some point in the mid-2010s game developers stopped supporting SLI/CrossFire right around the time the 4th-gen consoles started becoming their development focus.
What I don't remember is which triggered which first in SLI's death: did game developers dropping dual-GPU support push the hardware manufacturers to stop supporting it, or did the manufacturers drop support first and then the game developers followed? Either way since we are now forced to buy one single much more expensive GPU these days (REALLY much more expensive), upgrade paths are fewer and far between for many.
bit_user
Excellent summary. Good questions.
10tacle said: "Either way since we are now forced to buy one single much more expensive GPU these days (REALLY much more expensive), upgrade paths are fewer and far between for many."
As the latest nodes become increasingly expensive and GPU designers wrestle with chiplets and partitioning, I actually wonder if we could see a revival of multi-GPU. With PCIe 5 / CXL, we might not even need over-the-top connectors, although a dual GPU setup would typically mean 2 cards running at just x8.
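For rough context on the x8 point, these are the standard PCIe line rates (128b/130b encoding), not figures from the article:

```python
# Approximate per-direction bandwidth of a PCIe link (PCIe 3.0 and newer use
# 128b/130b encoding). Standard spec numbers, nothing specific to these cards.
def pcie_gb_per_s(gen: int, lanes: int) -> float:
    gt_per_s = {3: 8.0, 4: 16.0, 5: 32.0}[gen]   # transfer rate per lane
    return gt_per_s * lanes * (128 / 130) / 8    # GB/s in one direction

print(f"PCIe 5.0 x16: ~{pcie_gb_per_s(5, 16):.0f} GB/s per direction")
print(f"PCIe 5.0 x8:  ~{pcie_gb_per_s(5, 8):.0f} GB/s per direction")
```

Even a Gen 5 x8 link at roughly 32 GB/s per direction is well below NVLink-class bandwidth, so engines would presumably still need to keep cross-GPU traffic modest.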
Another key question is whether increasing use of ray tracing might be an enabler, here. Around 2006 or so, Intel started talking about it and making the case that it scales better than rasterization. That's when they put together a demo of raytraced Doom or Quake, running on a dual quad-core Core 2 Xeon workstation, with all the rendering done on the CPU cores. Maintaining the BVH tree could throw a wrench into that idea - I'm not sure how well you could distribute it.
Anyway, I think whatever happens with multi-die GPUs, the software which is designed to accommodate the partitioning will utilize the hardware more efficiently. And once modern game engines have to take that on board, multi-GPU is a couple more steps down that logical progression. So, I wouldn't rule out the possibility of seeing it return within the decade.