An EPYC Miss? Microsoft Azure Instances Pair AMD's MI300X With Intel's Sapphire Rapids
Genoa sidelined by Sapphire Rapids.
Microsoft's new AI-focused Azure servers are powered by AMD's MI300X datacenter GPUs, but those GPUs are paired with Intel's Sapphire Rapids Xeon CPUs rather than AMD's own chips. AMD's flagship fourth-generation EPYC Genoa CPUs are powerful, but Sapphire Rapids appears to have a couple of key advantages when it comes to keeping AI compute GPUs fed. It's not just Microsoft choosing Sapphire Rapids either: Nvidia also seems to prefer it over AMD's current-generation EPYC chips.
There are likely several factors that convinced Microsoft to go with Intel's Sapphire Rapids instead of AMD's Genoa, but Intel's Advanced Matrix Extensions (AMX) could be among the most important. According to Intel, these instructions accelerate AI and machine-learning tasks by up to seven times.
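As a rough illustration of the kind of work AMX targets (a minimal sketch of our own, not Microsoft's or Intel's code), PyTorch builds backed by Intel's oneDNN library can route bfloat16 matrix math to AMX tile instructions on Sapphire Rapids, while the same script falls back to ordinary vector units on other CPUs:

```python
import torch

# A host-side inference step; the model and sizes are arbitrary examples.
model = torch.nn.Linear(4096, 4096)
x = torch.randn(64, 4096)

# CPU autocast runs eligible ops in bfloat16; on Sapphire Rapids,
# oneDNN can dispatch these matmuls to AMX BF16 tiles.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16
```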
While Sapphire Rapids isn't particularly efficient and has worse multi-threaded performance than Genoa, its single-threaded performance is strong. That advantage isn't specific to AI workloads; it's a general benefit in any compute that's bound by a few fast threads.
It's worth noting that servers using Nvidia's datacenter-class GPUs also tend to go with Sapphire Rapids, including Nvidia's own DGX H100 systems. Nvidia CEO Jensen Huang said the "excellent single-threaded performance" of Sapphire Rapids was a specific reason he wanted Intel's CPUs for the DGX H100 rather than AMD's.
The new Azure instances also feature Nvidia's Quantum-2 CX7 InfiniBand networking, bringing together hardware from all three tech giants. It goes to show that in the cutting-edge world of AI, companies want the best overall hardware for the job and aren't particularly picky about who makes it, rivalries notwithstanding.
With eight MI300X GPUs containing 192GB of HBM3 memory each, these AI-oriented Azure instances offer a combined 1,536GB of VRAM, which is crucial for training AI. All this VRAM was likely a big reason why Microsoft selected the MI300X instead of Nvidia's Hopper GPUs: even the latest and greatest H200 carries only 141GB of HBM3e per GPU, significantly less than the MI300X.
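A hedged back-of-envelope sketch of why per-GPU capacity matters (the 70-billion-parameter model here is our illustrative assumption, not a workload Microsoft has named): weights alone at 16-bit precision already approach the H200's per-GPU limit while fitting comfortably on one MI300X.

```python
# Illustrative capacity math for a hypothetical 70B-parameter model.
params = 70e9
bytes_per_param = 2  # FP16/BF16 weights
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~140 GB: fits in one 192GB MI300X
```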
Microsoft also praised AMD's open-source ROCm software. AMD has been hard at work bringing ROCm to parity with Nvidia's CUDA stack, which largely dominates professional and datacenter GPU compute. That Microsoft is putting its faith in ROCm is perhaps a sign that AMD's hardware-software ecosystem is improving rapidly.
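One concrete reason that parity matters (a minimal sketch assuming a ROCm build of PyTorch; the sizes are arbitrary): AMD's PyTorch builds expose GPUs through the same torch.cuda namespace that CUDA-targeted code uses, via the HIP runtime, so much existing code runs unmodified on an MI300X.

```python
import torch

# On ROCm builds of PyTorch, "cuda" devices are AMD GPUs driven through HIP.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)
y = x @ x  # dispatched to rocBLAS on AMD hardware, cuBLAS on Nvidia

print("HIP runtime:", getattr(torch.version, "hip", None))  # version string on ROCm builds
```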
Matthew Connatser is a freelance writer for Tom's Hardware US. He writes articles about CPUs, GPUs, SSDs, and computers in general.
-
bit_user Not sure AMX is a big driver, since any processing where it would provide a big advantage would be better done on the GPU.
Single-threaded performance is a plausible explanation, but I also wonder if it has anything to do with Intel's Xeon Max models (which include up to 64 GB of HBM). Or maybe it's just that too many customers still have Intel-centric middleware / management infrastructure.
It'd sure be interesting to know, since Genoa-X would seem a natural fit, with all its cores, L3 cache, memory bandwidth, and PCIe lanes.
-
2Be_or_Not2Be I'd be far more willing to believe that Nvidia made the business decision to use Intel CPUs instead of AMD EPYC CPUs with their DGX systems just so they don't benefit their competitor. EPYC mostly beats Sapphire Rapids on a perf/watt basis, so I don't think that's a detractor. Intel's instruction set doesn't seem like the big reason either, if the AI loads are running primarily on the GPUs.
-
thestryker There are a lot of specific reasons to use one over the other, but I wonder how many of these systems were already planned before the late vulnerability that forced Intel to delay the SPR release and re-ramp high volume.
bit_user said: "I also wonder if it has anything to do with Intel's Xeon Max models (which include up to 64 GB of HBM)"
It's not this for Microsoft: they're using Xeon 8480C CPUs, which is likely a semi-custom design for them.
-
phitinh81 The main reason is that MSFT got a great deal on these Xeons, the same as Nvidia did with its DGX systems. People run these VMs for GPU-intensive tasks like training AI models, so a top-tier CPU is a waste. This is basic knowledge. Twisting it in Intel's favor is silly.
-
cyrusfox
phitinh81 said: "The main reason is that MSFT got a great deal on these Xeons... Twisting it in Intel's favor is silly."
How is it twisting? For Nvidia it was a clear decision; for Microsoft, which is already using AMD GPUs, it's a bit perplexing to pin the choice on Intel's merits. This is a rare win for Intel on the server/datacenter side, so an article like this is warranted.
The reason they both chose Intel, I imagine, is a combination of price (economics) and platform support/stability rather than features or performance, for what are essentially AI-heavy machines. AMD doesn't care to lower margins to compete, and it likely doesn't need to; it's probably supply-constrained on more lucrative datacenter contracts.
-
phitinh81
cyrusfox said: "How is it twisting? ... This is a rare win for Intel on the server/datacenter side, so an article like this is warranted."
Intel is selling Xeons at or below cost; its Data Center financial reports are hard proof. This article questions the choice MSFT & Nvidia made without pointing out the obvious: the price, and the function of the CPU in these machines. If that is not twisting, should I say misleading? The article is of course warranted as the morale boost Intel's server business badly needs. As always :)
-
TerryLaze
cyrusfox said: "This is a rare win for Intel on the server/datacenter side, so an article like this is warranted."
AMD has barely a 25% market share, on revenue, compared to Intel. Rare is... not that.
https://www.tomshardware.com/pc-components/cpus/amd-comes-roaring-back-gains-market-share-in-laptops-pcs-and-server-cpus
cyrusfox said: "AMD doesn't care to lower margins to compete, and it likely doesn't need to; it's probably supply-constrained on more lucrative datacenter contracts."
And yet they reduced their data center margins by a huge amount; operating income dropped to less than half of what it was a year earlier.
This is the nine-months-ended comparison: almost the same revenue, but far less actual money made from it. That is called lower margin.
https://ir.amd.com/news-events/press-releases/detail/1163/amd-reports-third-quarter-2023-financial-results
Segment and Category Information (Data Center, nine months ended, in $ millions):
                   9M 2023   9M 2022
Net revenue         $4,214    $4,388
Operating income      $601    $1,404
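A quick sanity check of that margin claim, using the figures above (my arithmetic, not AMD's reported percentages):

```python
# Operating margin from the nine-month Data Center figures (in $ millions).
rev_2023, rev_2022 = 4214, 4388
op_2023, op_2022 = 601, 1404
print(f"9M 2023 operating margin: {op_2023 / rev_2023:.1%}")  # ~14.3%
print(f"9M 2022 operating margin: {op_2022 / rev_2022:.1%}")  # ~32.0%
```
-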
bit_user
TerryLaze said: "And yet they reduced their data center margins by a huge amount; operating income dropped to less than half of what it was a year earlier."
It's not clear how much of that was due to price reductions vs. cost increases. The list prices of Genoa models extend higher than Milan's, so even flat revenue suggests some loss of volume.
-
NinoPino
phitinh81 said: "Intel is selling Xeons at or below cost."
No company sells actual products below cost; it would be suicidal. Maybe low margins.
phitinh81 said: "Its Data Center financial reports are hard proof. This article questions the choice MSFT & Nvidia made without pointing out the obvious: the price, and the function of the CPU in these machines. If that is not twisting, should I say misleading? The article is of course warranted as the morale boost Intel's server business badly needs. As always :)"
For the rest, I agree with your considerations.