Nvidia Hopper H200 breaks MLPerf benchmark record with TensorRT — no Blackwell submissions yet, sorry

Nvidia Hopper HGX H200
(Image credit: Nvidia)

Nvidia reports that its new Hopper H200 AI GPU, combined with its performance-enhancing TensorRT-LLM software, has broken the record in the latest MLPerf performance benchmarks. The pairing boosted the H200 to a whopping 31,712 tokens per second in MLPerf's Llama 2 70B benchmark, a 45% improvement over Nvidia's previous-generation H100 Hopper GPU.
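For context, the article's two figures imply a rough baseline for the older part. This is a back-of-the-envelope sketch using only the numbers reported above (the ~45% uplift is Nvidia's claim, not an independent measurement):

```python
# Sanity check on the reported figures: derive the implied H100 throughput
# from the H200 result and the claimed ~45% generational uplift.
h200_tokens_per_s = 31_712   # H200 + TensorRT-LLM, MLPerf Llama 2 70B
uplift = 1.45                # ~45% improvement over the H100

implied_h100 = h200_tokens_per_s / uplift
print(f"Implied H100 throughput: ~{implied_h100:,.0f} tokens/s")  # ~21,870
```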

Hopper H200 uses essentially the same silicon as the H100, but its memory is upgraded to 24GB 8-Hi stacks of HBM3e. That yields 141GB of memory per GPU with 4.8 TB/s of bandwidth, whereas the H100 typically had only 80GB per GPU (94GB on certain models) with up to 3 TB/s of bandwidth.
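Since LLM inference is largely memory-bound, the generational gains in capacity and bandwidth are worth quantifying. A quick comparison using only the specs stated above (taking the common 80GB H100 configuration as the baseline):

```python
# Compare the stated H200 and H100 memory specs (figures from the article).
h200 = {"capacity_gb": 141, "bandwidth_tb_s": 4.8}
h100 = {"capacity_gb": 80, "bandwidth_tb_s": 3.0}  # base 80GB model

cap_gain = h200["capacity_gb"] / h100["capacity_gb"]        # ~1.76x capacity
bw_gain = h200["bandwidth_tb_s"] / h100["bandwidth_tb_s"]   # 1.6x bandwidth
print(f"Capacity: {cap_gain:.2f}x, bandwidth: {bw_gain:.2f}x")
```

The extra capacity also matters for fit: at 70 billion parameters, Llama 2 70B occupies roughly 140GB in FP16, which a single 141GB H200 can hold while an 80GB H100 cannot.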

This record will undoubtedly be broken later this year or early next, once the upcoming Blackwell B200 GPUs come to market. Nvidia likely has Blackwell in-house and undergoing testing, but it's not publicly available yet. Nvidia has, however, already claimed performance up to 4X higher than the H100 for training workloads.

Nvidia is the only AI hardware manufacturer that has published full results in every round since MLPerf's data center inference benchmarks became available in late 2020. The latest iteration of MLPerf's benchmark suite adds Llama 2 70B, a state-of-the-art language model with 70 billion parameters. Llama 2 70B is more than 10x larger than GPT-J, the LLM used in previous MLPerf benchmarks.

MLPerf is a suite of benchmarks developed by MLCommons, designed to provide unbiased evaluations of training and inference performance for software, hardware, and services. The suite spans many AI neural network workloads, including GPT-3, Stable Diffusion V2, and DLRM-DCNv2, to name a few.

Aaron Klotz
Contributing Writer

Aaron Klotz is a contributing writer for Tom's Hardware, covering news related to computer hardware such as CPUs and graphics cards.
