Intel Ponte Vecchio Seemingly Offers 2.5x Higher Performance Than Nvidia's A100

Intel detailed its Ponte Vecchio Xe-HPC GPU at Hot Chips 34. In the benchmarks it provided, the chipmaker claims that Ponte Vecchio delivers up to 2.5x higher performance than Nvidia's A100. But, as is customary, take vendor-provided benchmarks with a pinch of salt.

Ponte Vecchio outperformed the A100 by significant margins in several Intel-selected benchmarks. Intel's powerhouse also flaunted a 2x lead in miniBUDE and a 1.5x lead in ExaSMR. It's an interesting comparison, considering that Ponte Vecchio isn't even out yet, while the A100 (Ampere) has been on the market since 2020. And let's not forget that AMD's Instinct MI250X (Aldebaran) is reportedly three times faster than the A100. So the more meaningful yardstick for Intel is AMD's and Nvidia's next-generation HPC products.

If Intel's numbers are accurate, Ponte Vecchio could be a credible competitor to Nvidia's next-generation H100 (Hopper). Based on the specifications we have so far, the H100 should be at least twice as fast as the A100. Even more menacing is AMD's Instinct MI300, which fuses Zen 4 CPU and CDNA 3 GPU chiplets into a single product. Billed as the world's first data center APU, the Instinct MI300 represents an 8x uplift in AI training performance over the Instinct MI250X, according to AMD.


Ponte Vecchio will come in three flavors: an OAM module, an x4 subsystem with Xe Links, and an x4 subsystem with Xe Links on a dual-socket Sapphire Rapids platform. Unfortunately, Sapphire Rapids has suffered so many delays that it's not funny anymore. Barring further setbacks, some Sapphire Rapids products could finally debut in October. However, the high-volume chips may not arrive until February 2023.

In its OAM form factor, Ponte Vecchio supports both four-GPU and eight-GPU platforms. A two-stack Ponte Vecchio configuration delivers 52 TFLOPS of FP32 and FP64 performance. For comparison, a single H100 SXM5 module peaks at 60 TFLOPS of FP32 and 30 TFLOPS of FP64 performance.
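To put those peak figures side by side, here is a quick back-of-the-envelope sketch. It uses only the vendor-quoted peak numbers cited above; peak TFLOPS says little about real application performance, and the dictionary names are purely illustrative.

```python
# Vendor-quoted peak throughput in TFLOPS, as cited in the article.
# These are theoretical peaks, not measured application results.
PONTE_VECCHIO_TWO_STACK = {"FP32": 52.0, "FP64": 52.0}
H100_SXM5 = {"FP32": 60.0, "FP64": 30.0}


def peak_ratio(a: dict, b: dict) -> dict:
    """Return a[precision] / b[precision] for every precision both dicts share."""
    return {p: a[p] / b[p] for p in a if p in b}


ratios = peak_ratio(PONTE_VECCHIO_TWO_STACK, H100_SXM5)
for precision, r in ratios.items():
    print(f"{precision}: {r:.2f}x")  # ratio of Ponte Vecchio peak to H100 peak
```

By these peak numbers alone, a two-stack Ponte Vecchio trails a single H100 SXM5 in FP32 (about 0.87x) but leads it in FP64 (about 1.73x).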

Ponte Vecchio features a 64MB register file that delivers up to 419 TBps of bandwidth. The L1 and L2 caches are 64MB and 408MB, respectively. The large L2 cache benefits specific workloads, such as the 2D-FFT and DNN cases Intel showcased. In the presentation, Intel's results show a substantial performance improvement in both scenarios when the L2 cache grows from 80MB to 408MB.


Zhiye Liu is a news editor, memory reviewer, and SSD tester at Tom’s Hardware. Although he loves everything that’s hardware, he has a soft spot for CPUs, GPUs, and RAM.

7 comments from the forums

dalek1234

Can anyone actually believe Intel? Intel lives on a bed of lies. Probably bathes in it, too. Sooo....
Jimbojan

dalek1234 said:
Can anyone actually believe Intel? Intel lives on a bed of lies. Probably bathes in it, too. Sooo....
Aren't you just lying?
Eximo

Seems reasonable. A huge multi-chiplet GPU vs. an older, smaller design, and likely a process node shrink in there somewhere.
jkflipflop98

dalek1234 said:
Can anyone actually believe Intel? Intel lives on a bed of lies. Probably bathes in it, too. Sooo....

This isn't a movie and the underdog isn't going to win in the end. Just FYI.
rtoaht

dalek1234 said:
Can anyone actually believe Intel? Intel lives on a bed of lies. Probably bathes in it, too. Sooo....
Calm down and take your meds.
JayNor

PVC matrix performance: 1678 TOPS INT8 and 839 TFLOPS BF16.

AMD's MI250X is reported to do 383 TFLOPS BF16. Its INT8 performance is also reported to be 383 TOPS.

So, yeah, more than double the BF16 performance, which is meaningful for AI training. AI appears to be playing a bigger part in HPC.

Looks like more than 4x the INT8 performance.
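As a quick check of the ratios in the comment above, this sketch divides the quoted peak figures. It assumes the commenter's numbers are accurate; they are vendor peak specs, not measured results, and the names are illustrative.

```python
# Peak figures as quoted in the comment (vendor specs, not benchmarks).
PVC = {"BF16_TFLOPS": 839.0, "INT8_TOPS": 1678.0}
MI250X = {"BF16_TFLOPS": 383.0, "INT8_TOPS": 383.0}

# Ratio of Ponte Vecchio (PVC) peak to MI250X peak for each metric.
ratios = {metric: PVC[metric] / MI250X[metric] for metric in PVC}
for metric, r in ratios.items():
    print(f"{metric}: {r:.2f}x")
```

That works out to roughly 2.19x for BF16 and 4.38x for INT8, consistent with "more than double" and "more than 4x."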
ddcservices

jkflipflop98 said:
This isn't a movie and the underdog isn't going to win in the end. Just FYI.
Have you checked out Intel's quarterly report and its forecast for the rest of the year? Even Intel doesn't think it will do well.
