Nvidia publishes first Blackwell B200 MLPerf results: Up to 4X faster than its H100 predecessor, when using FP4

Nvidia has published the first MLPerf 4.1 results of its Blackwell B200 processor. The results reveal that a Blackwell GPU offers up to four times the performance of its H100 predecessor based on the Hopper architecture, highlighting Nvidia's position as the leader in AI hardware. There are some caveats and disclaimers that we need to point out, however.

Based on Nvidia's results, a single Blackwell-based B200 GPU delivers 10,755 tokens/second in the server inference test and 11,264 tokens/second in the offline inference test. A quick look at the publicly available MLPerf Llama 2 70B benchmark results reveals that a 4-way Hopper H100-based machine delivers similar results, lending credence to Nvidia's claim that a single Blackwell processor is about 3.7X to 4X faster than a single Hopper H100 GPU. But we need to dissect the numbers to better understand them.

MLPerf 4.1 generative AI benchmark on Llama 2 70B model (tokens/second)

| GPU | # of GPUs | Offline | Server | Per-GPU Offline | Per-GPU Server |
| --- | --- | --- | --- | --- | --- |
| Nvidia B200 180GB HBM3E | 1 | 11264 | 10755 | 11264 | 10755 |
| Nvidia H100 80GB HBM3 | 4 | 10700 | 9522 | 2675 | 2381 |
| Nvidia H200 141GB HBM3E | 1 | 4488 | 4202 | 4488 | 4202 |
| Nvidia H200 141GB HBM3E | 8 | 32124 | 29739 | 4016 | 3717 |

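
The per-GPU columns in the table above are simply each system's total throughput divided by its GPU count. As a rough sketch of that arithmetic, the Python snippet below recomputes the per-GPU figures and the resulting B200-to-baseline ratios. The dictionary keys and the helper names `per_gpu` and `speedup` are illustrative assumptions, not part of MLPerf or any Nvidia tool, and the ratios depend entirely on which submission is chosen as the baseline, so they need not match Nvidia's quoted range.

```python
# Throughput in tokens/second, copied from the table above.
# The labels and layout of this dictionary are illustrative only.
results = {
    "B200 x1": {"gpus": 1, "offline": 11264, "server": 10755},
    "H100 x4": {"gpus": 4, "offline": 10700, "server": 9522},
    "H200 x1": {"gpus": 1, "offline": 4488, "server": 4202},
    "H200 x8": {"gpus": 8, "offline": 32124, "server": 29739},
}


def per_gpu(system: str, scenario: str) -> float:
    """Total throughput of a system divided by its number of GPUs."""
    entry = results[system]
    return entry[scenario] / entry["gpus"]


def speedup(system: str, baseline: str, scenario: str) -> float:
    """Ratio of per-GPU throughput between a system and a baseline."""
    return per_gpu(system, scenario) / per_gpu(baseline, scenario)


for scenario in ("offline", "server"):
    for system in results:
        print(f"{system:8s} {scenario:8s} {per_gpu(system, scenario):9.1f} tokens/s per GPU")
    for baseline in ("H100 x4", "H200 x1", "H200 x8"):
        ratio = speedup("B200 x1", baseline, scenario)
        print(f"B200 vs {baseline} ({scenario}): {ratio:.2f}x")
```

Normalizing per GPU is what makes a 1-GPU submission comparable to a 4-way or 8-way one, but it ignores multi-GPU scaling losses, memory capacity, and numeric precision, so the ratios are best read as a first approximation.
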
Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.