Intel has detailed the company's Ponte Vecchio Xe-HPC GPU at Hot Chips 34. In the provided benchmarks, the chipmaker claims that Ponte Vecchio delivers up to 2.5x more performance than the Nvidia A100. But, as customary, take vendor-provided benchmarks with a pinch of salt.
Ponte Vecchio outperformed the A100 by significant margins in several Intel-selected benchmarks. Intel's powerhouse also flaunted a 2x lead in miniBUDE and 1.5x in ExaSMR. It's an interesting comparison considering that Ponte Vecchio isn't even out yet, and A100 (Ampere) has been on the market since 2020. And let's not forget that AMD's Instinct MI250X (Aldebaran) is reportedly three times faster than the A100. So Intel should worry about AMD and Nvidia's next-generation HPC products.
If Intel's numbers are accurate, Ponte Vecchio could be a potential competitor against Nvidia's next-generation H100 (Hopper). Based on the specifications we have so far, H100 should be at least twice as fast as the A100. What's even more menacing is AMD's Instinct MI300, which fuses both Zen 4 CPU and CDNA 3 GPU chiplets into a single product. AMD dubs the Instinct MI300 the world's first data center APU and claims that it represents an 8x uplift in AI training performance compared to the Instinct MI250X.
Ponte Vecchio will come in three flavors: OAM, x4 subsystem with Xe links, and x4 subsystem with Xe links on a dual-socket Sapphire Rapids platform. Unfortunately, Sapphire Rapids has suffered so many delays that it's not funny anymore. Barring further setbacks, some Sapphire Rapids products could finally debut in October. However, the high-volume chips may not arrive until February 2023.
In its OAM form factor, Ponte Vecchio boasts support for both four GPU and eight GPU platforms. A two-stack Ponte Vecchio configuration pumps out 52 TFLOPs of FP32 and FP64 performance. For comparison, a single H100 SXM5 module peaks at 60 TFLOPs of FP32 and 30 TFLOPs of FP64 performance.
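To put those peak figures side by side, here is a minimal sketch that computes the ratio of Ponte Vecchio's quoted throughput to the H100's at each precision. It uses only the numbers quoted above. The names PEAK_TFLOPS, PVC, H100, and ratio are hypothetical and chosen for illustration, and peak TFLOPS say nothing about real-world application performance.

```python
# Peak throughput figures as quoted in the article, in TFLOPS.
# These are vendor peak numbers, not measured application results.
PVC = "Ponte Vecchio (two-stack)"
H100 = "H100 SXM5"

PEAK_TFLOPS = {
    PVC: {"FP32": 52.0, "FP64": 52.0},
    H100: {"FP32": 60.0, "FP64": 30.0},
}


def ratio(a: str, b: str, precision: str) -> float:
    """Return peak throughput of part `a` divided by part `b` at `precision`."""
    return PEAK_TFLOPS[a][precision] / PEAK_TFLOPS[b][precision]


if __name__ == "__main__":
    for precision in ("FP32", "FP64"):
        r = ratio(PVC, H100, precision)
        print(f"{precision}: Ponte Vecchio / H100 = {r:.2f}x")
```

On these quoted peaks, Ponte Vecchio trails the H100 slightly in FP32 (about 0.87x) but leads in FP64 (about 1.73x), which is the precision that matters most for traditional HPC workloads.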
Ponte Vecchio features a 64MB register file that delivers up to 419 TBps of bandwidth. The L1 and L2 caches are 64MB and 408MB, respectively. The large L2 cache on Ponte Vecchio benefits specific workloads, such as the 2D-FFT and DNN cases. In the presentation, Intel's results reveal a substantial performance improvement in both scenarios when moving from 80MB to 408MB of L2 cache.
Zhiye Liu is a news editor and memory reviewer at Tom’s Hardware. Although he loves everything that’s hardware, he has a soft spot for CPUs, GPUs, and RAM.
-
dalek1234 Can anyone actually believe Intel? Intel lives on a bed of lies. Probably bathes in it, too. Sooo.... -
Jimbojan
dalek1234 said: Can anyone actually believe Intel? Intel lives on a bed of lies. Probably bathes in it, too. Sooo....
Aren't you just lying? -
Eximo Seems reasonable. A huge multi-chiplet GPU vs an older, smaller design and likely a process node shrink in there somewhere. -
jkflipflop98 dalek1234 said: Can anyone actually believe Intel? Intel lives on a bed of lies. Probably bathes in it, too. Sooo....
This isn't a movie and the underdog isn't going to win in the end. Just FYI. -
rtoaht
dalek1234 said: Can anyone actually believe Intel? Intel lives on a bed of lies. Probably bathes in it, too. Sooo....
Calm down and take your meds. -
JayNor pvc matrix performance, 1678 TOPS int8 and 839 TFLOPS bfloat16
AMD's MI250X is reported to do 383TF BF16. Int8 performance is also reported to be 383TOPs.
so, yeah, more than double the bf16 performance, which is meaningful for ai training. Appears to be playing a bigger part in hpc.
Looks like more than 4x the int8 performance. -
ddcservices
jkflipflop98 said: This isn't a movie and the underdog isn't going to win in the end. Just FYI.
Have you checked out the Intel quarterly report plus forecast for the rest of the year? Intel doesn't even think it will do well.