Nvidia publishes first Blackwell B200 MLPerf results: Up to 4X faster than its H100 predecessor, when using FP4

Blackwell
(Image credit: Nvidia)

Nvidia has published the first MLPerf 4.1 results of its Blackwell B200 processor. The results reveal that a Blackwell GPU offers up to four times the performance of its H100 predecessor based on the Hopper architecture, highlighting Nvidia's position as the leader in AI hardware. There are some caveats and disclaimers that we need to point out, however.

Based on Nvidia's results, a single Blackwell-based B200 GPU delivers 10,755 tokens/second in the server inference test and 11,264 tokens/second in the offline inference test. A quick look at the publicly available MLPerf Llama 2 70B benchmark results reveals that a 4-way Hopper H100-based machine delivers similar results, lending credence to Nvidia's claim that a single Blackwell processor is about 3.7X–4X faster than a single Hopper H100 GPU. But we need to dissect the numbers to better understand them.

MLPerf 4.1 generative AI benchmark on Llama 2 70B model (tokens/second)

GPU                     | # of GPUs | Offline | Server | Per-GPU Offline | Per-GPU Server
Nvidia B200 180GB HBM3E | 1         | 11,264  | 10,755 | 11,264          | 10,755
Nvidia H100 80GB HBM3   | 4         | 10,700  | 9,522  | 2,675           | 2,381
Nvidia H200 141GB HBM3E | 1         | 4,488   | 4,202  | 4,488           | 4,202
Nvidia H200 141GB HBM3E | 8         | 32,124  | 29,739 | 4,016           | 3,717
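The per-GPU columns above are simply the aggregate throughput divided by the number of GPUs in the system, rounded to the nearest token. A minimal sketch of that arithmetic, using the published figures from the table (the rounding convention is our assumption, chosen to match the printed per-GPU values):

```python
import math

# Aggregate MLPerf 4.1 Llama 2 70B results (tokens/second), from the table above.
# name: (gpu_count, offline, server)
results = {
    "Nvidia B200 180GB HBM3E (1x)": (1, 11264, 10755),
    "Nvidia H100 80GB HBM3 (4x)":   (4, 10700, 9522),
    "Nvidia H200 141GB HBM3E (1x)": (1, 4488, 4202),
    "Nvidia H200 141GB HBM3E (8x)": (8, 32124, 29739),
}

def per_gpu(total_tokens_per_s, gpu_count):
    # Round half up (assumed), which reproduces the published per-GPU figures.
    return math.floor(total_tokens_per_s / gpu_count + 0.5)

for name, (gpus, offline, server) in results.items():
    print(f"{name}: {per_gpu(offline, gpus)} offline, "
          f"{per_gpu(server, gpus)} server tokens/s per GPU")
```

Dividing through like this is what makes the single-B200 vs. 4-way-H100 comparison meaningful: the aggregate numbers are nearly identical, so the per-GPU gap is roughly the GPU count.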
Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

  • JRStern
    Very good!
    There was some release a few weeks ago where NVIdia was claiming more like 30x faster for the overall calculation - based on FP4! That's if you could convert all your FP8, FP16, and FP32 to FP4. Which you can't.
    Reply
  • Pierce2623
    News Flash!!! Two chips at fp4 four times as fast as one chip at fp8!!! (Think to himself “isn’t that EXACTLY the same amount of processing power per chip?”)
    Reply
  • YSCCC
    Ok, good gen on gen performance leap in theory would be attained, but can we have something budget and good performing in consumer market now?
    Reply
  • renz496
    YSCCC said:
    Ok, good gen on gen performance leap in theory would be attained, but can we have something budget and good performing in consumer market now?
    Already have lots of them in the market. For nvidia anything below $1500 is "budget" card.
    Reply
  • YSCCC
    renz496 said:
    Already have lots of them in the market. For nvidia anything below $1500 is "budget" card.
    Lol that's their definition of budget, I mean (sarcasm mode on) when will they dare separate the high margin coporate hype market vs the more civilian friending gaming offerings
    Reply
  • Kamen Rider Blade
    YSCCC said:
    Lol that's their definition of budget, I mean (sarcasm mode on) when will they dare separate the high margin coporate hype market vs the more civilian friending gaming offerings
    Never Again!

    They know Enterprise pays more, ALOT more.

    We Gamers get the left-overs / scraps.
    Reply
  • GenericUsername109
    Nvidia is a hot business now, chasing profit margins, revenue and earnings. I bet a wafer of these datacenter chips is way more profitable than anything retail buyers can afford. They have a huge order backlog on these, too. No rush to waste precious TSMC capacity on some low margin "hobby" stuff, when they can turn it to gold and print money.
    Reply