Intel's unreleased Emerald Rapids CPU impresses in leaked benchmarks — 48-core chips deliver big gains over Sapphire Rapids predecessors

5th Generation Xeon Emerald Rapids CPU (Image credit: Intel)

Intel's upcoming 5th Gen Xeon Emerald Rapids CPUs are due to be announced on December 14, and they're clearly already being tested: hardware benchmark finder Benchleaks has found a couple of Geekbench 5 results. The leaked benchmarks for the Xeon Platinum 8551C and Xeon Platinum 8558P show remarkable gains in multi-core performance for what is a refresh of Sapphire Rapids with higher clock speeds.

Both CPUs are 48-core models, and although the Xeon Platinum 8551C has a higher base frequency at 2.9 GHz than the Xeon Platinum 8558P's 2.7 GHz, in Geekbench 5, they essentially performed the same. Both chips scored roughly 1,360 in the single-core test and 48,300 in the multi-core. It's impossible to verify the integrity of these results, but there's nothing particularly unusual about them.

It's a bit difficult to compare these benchmarks to 4th Gen Sapphire Rapids Xeons, as most people with multi-thousand dollar CPUs tend not to test them using Geekbench 5, but we were able to find some results for the 48-core Xeon Platinum 8468. One of these scores (seemingly tested by ServeTheHome) comes from a Linux server wielding two Xeon Platinum 8468s and another from a Windows server with just one Xeon Platinum 8468. The leaked benchmarks used a single 48-core Emerald Rapids chip in a Linux server, so there's no perfect comparison here, but it's close enough.

Geekbench 5       | Xeon Platinum 8558P | Xeon Platinum 8468 | Xeon Platinum 8468 x2
Single-Core Score | 1,364               | 1,438              | 1,380
Multi-Core Score  | 48,273              | 35,140             | 61,289

Assuming the data is accurate in every benchmark, leaked or otherwise, it's fair to say that Emerald Rapids doesn't improve single-core performance much. But it's a different story in multi-core performance, as the leaked Xeon Platinum 8558P beats the single-CPU Xeon Platinum 8468 Windows server by nearly 40% and is very close behind the dual-CPU Xeon Platinum 8468 Linux server with almost 80% of the performance despite having just half the cores.
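The percentages in that comparison can be reproduced from the multi-core scores in the table. The snippet below is only a rough sketch of the arithmetic: the variable names are made up for illustration, and it assumes the listed scores are accurate.

```python
# Geekbench 5 multi-core scores as listed in the table (assumed accurate).
emr_8558p_single_cpu = 48_273   # leaked Xeon Platinum 8558P, one CPU, Linux
spr_8468_single_cpu = 35_140    # Xeon Platinum 8468, one CPU, Windows
spr_8468_dual_cpu = 61_289      # two Xeon Platinum 8468s, Linux

# Percentage lead of the 8558P over the single-CPU 8468 result.
gain_vs_single = (emr_8558p_single_cpu / spr_8468_single_cpu - 1) * 100

# Share of the dual-CPU 8468 result that the single 8558P reaches.
share_of_dual = emr_8558p_single_cpu / spr_8468_dual_cpu * 100

print(f"Lead over one 8468: {gain_vs_single:.1f}%")
print(f"Share of two 8468s: {share_of_dual:.1f}%")
```

That works out to a lead of roughly 37% and a share of roughly 79%, which is where the "nearly 40%" and "almost 80%" figures come from. The different operating systems mean these ratios are indicative rather than exact.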

These numbers won't make Emerald Rapids an AMD EPYC Genoa killer (or even tied with Genoa if we're being realistic), but they are certainly impressive, given it's just a refresh. Data centers that already have Sapphire Rapids or were preparing to upgrade to Sapphire Rapids can easily swap in Emerald Rapids since it uses the same platform.

5th Gen Emerald Rapids Xeons will launch on December 14, not long before the much higher-end Granite Rapids, which is due in 2024. Fabbed on the more advanced Intel 3 process, Granite Rapids could be the CPU that makes Intel competitive with AMD's EPYC lineup again. Emerald Rapids will have to do for now, though, and with its impressive AI performance, it might be just enough.

Matthew Connatser

Matthew Connatser is a freelance writer for Tom's Hardware US. He writes articles about CPUs, GPUs, SSDs, and computers in general.

  • thestryker
    It's a pretty safe bet that EMR will clock higher than SPR given that it uses refreshed cores and the XCC CPUs are two tiles instead of four. Hopefully we'll get full core details on December 14th when the launch is supposed to be.

    My dream is that w25xx will be enough cheaper than w24xx that I might be able to justify the purchase.
  • bit_user
    thestryker said:
    It's a pretty safe bet that EMR will clock higher than SPR given that it uses refreshed cores and the XCC CPUs are two tiles instead of four. Hopefully we'll get full core details on December 14th when the launch is supposed to be.
    Faster memory will also be a big win, given that it's only running on 8 channels. We saw how much HBM helped certain workloads, so there are definitely memory bottlenecks at play.

    Regarding the number of tiles, are you aware of analysis of the scaling from the single-tile to the quad-tile version of SPR?
  • thestryker
    bit_user said:
    Regarding the number of tiles, are you aware of analysis of the scaling from the single-tile to the quad-tile version of SPR?
    I never saw much in the way of anything on the MCC Xeons at all really. Chips and Cheese only did their coverage on XCC as far as I know.

    I know all the split controllers and the required 10 EMIB connections were a big manufacturing problem. I'm not sure there will be a performance advantage (beyond the obvious clocks/DRAM bandwidth) to the two tiles vs four, but there will be a notable improvement to the manufacturing side (mostly packaging).

    For some EMR details (it could change from here, but I doubt there will be any significant differences) https://www.semianalysis.com/p/intel-emerald-rapids-backtracks-on
  • DavidC1
    Calling Emerald Rapids and Raptorlake Refresh as both being a Refresh is a big stretch isn't it?

    Emerald Rapids changes tile configuration and even number of tiles while Raptorlake Refresh barely increases clocks.

    @thestryker According to Semianalysis, he's hinting that the new configuration is indeed to aim for better performance. That's why they went to a bigger two-Tile setup despite costing even more to manufacture than the four Tile SPR.

    While some applications will only benefit from other aspects, many other server applications will benefit from the low-level changes the changed config brings.
  • bit_user
    DavidC1 said:
    Calling Emerald Rapids and Raptorlake Refresh as both being a Refresh is a big stretch isn't it?
    Yeah, I was on the fence about nitpicking that point, but decided the article went into enough detail that people would hopefully understand the distinction.

    DavidC1 said:
    Emerald Rapids changes tile configuration and even number of tiles while Raptorlake Refresh barely increases clocks.
    Based on what I've seen, Raptor Refresh doesn't even deserve to be called a refresh. If it's the same silicon as before, then it's a rebrand. That's all.

    As for Emerald Rapids, does it have larger caches? Or are those unchanged from Sapphire Rapids.
  • thestryker
    bit_user said:
    As for Emerald Rapids, does it have larger caches? Or are those unchanged from Sapphire Rapids.
    Cache varies by SKU, but the top end EMR has much more L3 cache than anything on SPR (5MB/core vs 1.875MB/core).
    DavidC1 said:
    @thestryker According to Semianalysis, he's hinting that the new configuration is indeed to aim for better performance. That's why they went to a bigger two-Tile setup despite costing even more to manufacture than the four Tile SPR.
    The wafer cost is higher, but packaging cost is much lower.
  • bit_user
    thestryker said:
    Cache varies by SKU, but the top end EMR has much more L3 cache than anything on SPR (5MB/core vs 1.875MB/core).
    Nice!

    I'll bet folks at Intel have a little uptick in their blood pressure every time they see claims about AMD's large L3 cache figures, even though it's segmented and not unified like Intel's L3 caches. I think AMD's is probably more like a L2.5 cache, on their multi-CCD CPUs?

    thestryker said:
    The wafer cost is higher, but packaging cost is much lower.
    Relatively speaking, but if it costs Intel that much more to put 4 tiles per package than 2, it seems like they haven't mastered Foveros quite like they've been portraying.
  • thestryker
    bit_user said:
    Relatively speaking, but if it costs Intel that much more to put 4 tiles per package than 2, it seems like they haven't mastered Foveros quite like they've been portraying.
    I don't believe SPR employs Foveros it's just EMIB connections (EMR is 3 and SPR is 10) and that's a physical limitation where fewer is better no matter what.
  • bit_user
    thestryker said:
    I don't believe SPR employs Foveros it's just EMIB connections
    Sorry, I get the terminology mixed up.

    thestryker said:
    (EMR is 3 and SPR is 10) and that's a physical limitation where fewer is better no matter what.
    Uh, 10 for the Xeon Max models, with HBM? Otherwise, I don't see how you get to 10.
  • thestryker
    bit_user said:
    Uh, 10 for the Xeon Max models, with HBM? Otherwise, I don't see how you get to 10.
    Nope, the Max are 14.
    10x EMIB on Sapphire Rapids
    Sapphire Rapids is going to be using four tiles connected with 10 EMIB connections using a 55-micron connection pitch. Normally you might think that a 2x2 array of tiles would need equal EMIBs per tile-to-tile connection, so in this case with 2 EMIBs per connection, that would be eight – why is Intel quoting 10 here? That comes down to the way Sapphire Rapids is designed.

    Because Intel wants SPR to look monolithic to every operating system, Intel has essentially cut its inter-core mesh horizontally and vertically. That way each connection through the EMIB is seen purely as the next step on the mesh. But Intel’s monolithic designs are not symmetric in either of those dimensions – usually features like the PCIe or QPI are on the edges, and not in the same place in every corner. Intel has told us that in Sapphire Rapids, this is similarly the case, and one dimension is using 3 EMIBs per connection while the other dimension is using 2 EMIBs per connection.
    https://www.anandtech.com/show/16921/intel-sapphire-rapids-nextgen-xeon-scalable-gets-a-tiling-upgrade/2