Intel 'Emerald Rapids' 5th-Gen Xeon Platinum 8592+ Review: 64 Cores, Tripled L3 Cache and Faster Memory Deliver Impressive AI Performance

The sun shines brighter for Intel.

Intel Xeon Emerald Rapids
(Image: © Tom's Hardware)

Tom's Hardware Verdict

The fifth-gen Emerald Rapids Xeon Platinum 8592+ significantly improves Intel's competitive positioning against AMD's competing EPYC Genoa processors. Intel's combination of a more refined SoC architecture, 3x larger caches, and faster DDR5-5600 memory makes for an incredibly competitive chip, particularly in AI workloads.

Pros

  • + Eight more cores for mainstream models
  • + Tripled L3 cache on highest-end models
  • + Performance in AI workloads
  • + Performance in both heavily- and lightly-threaded workloads
  • + Support for AMX, AVX-512, VNNI, and BFloat16
  • + Support for CXL Type 3 memory devices

Cons

  • - Trails AMD's peak of 96 cores for general-purpose chips
  • - Sub-32-core chips still have the same amount of L3 cache as the prior generation

Why you can trust Tom's Hardware Our expert reviewers spend hours testing and comparing products and services so you can choose the best for you. Find out more about how we test.

Intel is keeping things competitive: The company's new $11,600 flagship 64-core Emerald Rapids 5th-Gen Xeon Platinum 8592+ arrives as part of a complete refresh of the company's Xeon product stack. It will grapple with AMD's EPYC Genoa lineup of processors, which continues to chew away at Intel's market share. Our benchmarks on the following pages show that Emerald Rapids delivers surprisingly impressive performance uplifts, drastically improving Intel's competitive footing against AMD's Genoa. Critically, Intel's new chips have also arrived on schedule, a much-needed confirmation that the company's turnaround remains on track.

For Emerald Rapids, Intel has added four more cores to the flagship over the company's prior-gen chips, providing up to 128 cores and 256 threads per dual-socket server. It also tripled the L3 cache and moved to faster DDR5-5600 memory for the high-performance models. In concert with other targeted enhancements, including a significant redesign of the die architecture, the company claims these changes deliver gen-on-gen gains of 42% in AI inference, 21% in general compute workloads, and 36% in performance-per-watt.

As with the previous-gen Sapphire Rapids processors, Emerald Rapids leverages the 'Intel 7' process, albeit a more refined version of the node, and the slightly-enhanced Raptor Cove microarchitecture. However, the new Emerald Rapids server chips come with plenty of new innovations and design modifications that far exceed what we've come to expect from a refresh generation — Intel moved from the complex quad-chiplet design for the top-tier Sapphire Rapids chips to a simpler two-die design that wields a total of 61 billion transistors, with the new die offering a more consistent latency profile. Despite the redesign, Emerald Rapids still maintains backward compatibility with the existing Sapphire Rapids 'Eagle Stream' platform, reducing validation time and allowing for fast market uptake of the new processors.

Emerald Rapids still trails in terms of overall core counts — AMD's Genoa tops out at 96 cores with the EPYC 9654, a 32-core advantage. As such, Emerald Rapids doesn't match Genoa in some of the densest general compute workloads. Provided the chip has enough memory throughput, AMD's 50% core count advantage is tough to beat in most parallel workloads. However, Intel's chips still satisfy the requirements of the majority of the market — the highest-tier chips always comprise a much smaller portion of the market than the mid-range — and lean on a suite of in-built accelerators and strong AI performance to tackle AMD's competing 64-core chips with what Intel claims is a superior blend of performance and power efficiency.

There's no doubt that Emerald Rapids significantly improves Intel's competitive posture in the data center. However, AMD's Genoa launched late last year, and the company's Zen 5-powered Turin counterpunch will arrive in 2024. Those chips will face Intel's Granite Rapids processors, which are scheduled for the first half of 2024. A new battlefield has also formed, as AMD has its density-optimized Bergamo with up to 128 cores in the market, and Intel will answer with its Sierra Forest lineup with up to 288 cores early next year.

It's clear that the goalposts will shift soon for the general-purpose processors we have under the microscope today; here's how Intel's Emerald Rapids stacks up against AMD's current roster. 

Intel Emerald Rapids 5th-Gen Xeon Specifications and Pricing

Intel's fifth-gen Xeon lineup consists of 32 new models divided into six primary swimlanes (second slide), including processors designed for the cloud, networking, storage, long-life use, single-socket models, and processors designed specifically for liquid-cooled systems. The stack is also carved into Platinum, Gold, Silver, and Bronze sub-tiers. Notably, Intel hasn't listed any chips as scalable to eight sockets, a prior mainstay for Xeon. Now the series tops out at support for two sockets. Intel also offers varying levels of memory support, with eight-channel speeds spanning from DDR5-4400 to DDR5-5600. In contrast, all of AMD's Genoa stack supports 12 channels of DDR5-4800 memory.
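The channel-count and speed differences translate directly into theoretical bandwidth: each DDR5 channel is 64 bits (8 bytes) wide, so peak bandwidth is simply channels × transfer rate × 8 bytes. A quick sketch of that arithmetic for the two top configurations:

```python
# Back-of-the-envelope peak DRAM bandwidth from the specs above.
# Each DDR5 channel is 64 bits (8 bytes) wide, so peak bytes/s =
# channels * transfers_per_second * 8.

def peak_bandwidth_gbs(channels: int, mts: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mts * 1e6 * 8 / 1e9

# Top-tier Emerald Rapids: 8 channels of DDR5-5600
emerald_rapids = peak_bandwidth_gbs(channels=8, mts=5600)

# All Genoa SKUs: 12 channels of DDR5-4800
genoa = peak_bandwidth_gbs(channels=12, mts=4800)

print(f"Emerald Rapids: {emerald_rapids:.1f} GB/s")  # 358.4 GB/s
print(f"Genoa:          {genoa:.1f} GB/s")           # 460.8 GB/s
```

Even with Intel's faster DIMM speed, Genoa's four extra channels leave it with roughly 29% more theoretical bandwidth per socket.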

Intel seemingly has a SKU for every type of workload, but Emerald Rapids' 32-chip stack actually represents a trimming of Intel's Xeon portfolio — the previous-gen roster had 52 total options. In contrast, AMD's EPYC Genoa 9004 Series family spans 18 models in three categories (Core Performance, Core Density, and Balanced & Optimized), creating a vastly simpler product stack.

Emerald Rapids continues Intel's push into acceleration technologies that can be purchased outright or through a pay-as-you-go model. These purpose-built accelerator regions of the chip are designed to radically boost performance in several types of work, like compression, encryption, data movement, and data analytics (QAT, DSA, DLB, IAA), which typically require discrete accelerators for maximum performance. Each chip can have a variable number of accelerator 'devices' enabled, but the '+' models have at least one accelerator of each type enabled by default.

Emerald Rapids' TDPs range from 125W to 350W for the standard models, but the liquid-cooling-optimized chips peak at 385W. In contrast, AMD's standard chips top out at 360W but also have a 400W configurable TDP rating.

| Model | Price | Cores / Threads | Base / Boost (GHz) | TDP | L3 Cache (MB) | cTDP (W) |
|---|---|---|---|---|---|---|
| EPYC Genoa 9654 | $11,805 | 96 / 192 | 2.4 / 3.7 | 360W | 384 | 320-400 |
| Intel Xeon 8592+ (EMR) | $11,600 | 64 / 128 | 1.9 / 3.9 | 350W | 320 | - |
| Intel Xeon 8490H (SPR) | $17,000 | 60 / 120 | 1.9 / 3.5 | 350W | 112.5 | - |
| Intel Xeon 8480+ (SPR) | $10,710 | 56 / 112 | 2.0 / 3.8 | 350W | 105 | - |
| EPYC Genoa 9554 | $9,087 | 64 / 128 | 3.1 / 3.75 | 360W | 256 | 320-400 |
| Intel Xeon 8562Y+ (EMR) | $5,945 | 32 / 64 | 2.8 / 4.1 | 300W | 60 | - |
| Intel Xeon 8462Y+ (SPR) | $5,945 | 32 / 64 | 2.8 / 4.1 | 300W | 60 | - |
| EPYC Genoa 9354 | $3,420 | 32 / 64 | 3.25 / 3.8 | 280W | 256 | 240-300 |
| Intel Xeon 4516Y+ (EMR) | $1,295 | 24 / 48 | 2.2 / 3.7 | 185W | 45 | - |
| Intel Xeon 6442Y (SPR) | $2,878 | 24 / 48 | 2.6 / 3.3 | 225W | 60 | - |
| EPYC Genoa 9254 | $2,299 | 24 / 48 | 2.9 / 4.15 | 200W | 128 | 200-240 |
| EPYC Genoa 9374F | $4,850 | 32 / 64 | 3.85 / 4.3 | 320W | 256 | 320-400 |
| EPYC Genoa 9274F | $3,060 | 24 / 48 | 4.05 / 4.3 | 320W | 256 | 320-400 |

The presence, or lack thereof, of Intel's in-built accelerators makes direct pricing comparisons to AMD's Genoa difficult, especially when accounting for the possibility of a customer purchasing additional acceleration functions.

The Intel Xeon Platinum 8592+ has 64 cores and 128 threads: four more cores than Sapphire Rapids' peak of 60 in the pricey and specialized 8490H, and eight more than Intel's last-gen general-purpose flagship, the 56-core 8480+.

As denoted by its '+' suffix, the 8592+ has one of each of the in-built accelerators activated. This is upgradeable to four units of each type of accelerator — for an additional fee (this is typically offered through OEMs, so pricing varies).

The 8592+'s cores run at a base of 1.9 GHz but can boost up to 3.9 GHz on a single core. The chip is armed with 320MB of L3 cache — more than triple that of its prior-gen counterpart. Intel's decision to boost L3 capacity will benefit a host of workloads, but there's a caveat. As we'll cover below, Emerald Rapids processors can have one of three different die configurations, and only the highest-end die (40 cores and up) has the tripled cache capacity. Meanwhile, the 32-core and lesser models use a die that generally has the same amount of cache as the prior gen.

Intel's processors now support up to DDR5-5600 in 1DPC (one DIMM per channel) mode and DDR5-4800 in 2DPC mode, an improvement over the prior gen's DDR5-4800. Intel has also tuned the UPI links to 20 GT/s, a slight increase over the previous 16 GT/s.

All of the Emerald Rapids chips support the following:

  • LGA4677 Socket / Eagle Stream platform
  • Hyperthreading
  • Eight channels of DDR5 memory: Top-tier models run at up to DDR5-5600 (1DPC) and DDR5-4800 (2DPC), but speeds vary by model
  • 80 Lanes of PCIe 5.0 (EPYC Genoa has 128 lanes of PCIe 5.0)
  • Up to 6TB of memory per socket (same as Genoa)
  • CXL Type 3 memory support (Genoa also has support for Type 3)
  • AMX, AVX-512, VNNI, Bfloat 16 (Genoa does not support AMX)
  • UPI speed increased from 16 GT/s to 20 GT/s
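On Linux, you can check which of these ISA extensions a given chip actually exposes by parsing the flags line of /proc/cpuinfo. A minimal sketch — the flag spellings (amx_tile, avx512_vnni, and so on) are the Linux kernel's names for these features:

```python
# Sketch: check which AI-relevant ISA extensions a CPU reports.
# Flag spellings follow the Linux kernel's /proc/cpuinfo names.

WANTED = {
    "amx_tile": "AMX (tile registers)",
    "amx_bf16": "AMX BFloat16",
    "amx_int8": "AMX INT8",
    "avx512f": "AVX-512 Foundation",
    "avx512_vnni": "AVX-512 VNNI",
    "avx512_bf16": "AVX-512 BFloat16",
}

def supported_features(cpuinfo_text: str) -> dict[str, bool]:
    """Map each feature of interest to whether it appears in the flags line."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break
    return {name: name in flags for name in WANTED}

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            report = supported_features(f.read())
    except FileNotFoundError:  # non-Linux systems
        report = {}
    for name, present in report.items():
        print(f"{WANTED[name]}: {'yes' if present else 'no'}")
```

On an Emerald Rapids or Sapphire Rapids part, all six should report yes; on Genoa, the AMX entries will be absent.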

Intel Emerald Rapids 5th-Gen Xeon Architecture

Intel employed two types of die with its last-generation Sapphire Rapids: an XCC (eXtreme Core Count) die design that was used in standard and mirrored configurations for the four-tile chips that stretched up to 60 cores, and a monolithic MCC (Medium Core Count) die for chips 32 cores and under.

Intel has moved to three different die designs for Emerald Rapids: an XCC die used only in the two-tile designs (up to 64 cores), a monolithic MCC die for models with more than 20 and up to 32 cores, and a new monolithic EE LCC (Low Core Count) die for models with 20 cores or fewer.

Each XCC die has 30.5 billion transistors, totaling 61 billion transistors for the dual-XCC models. Despite stepping back to a dual-tile design, Emerald Rapids still occupies roughly the same amount of die area as the Sapphire Rapids processors. Each XCC die physically has 33 cores, but one is disabled to mitigate the impact of defects during manufacturing. Intel says the die area for the XCC and MCC tiles is similar, but it hasn't shared exact measurements yet.

Intel's older quad-XCC-tile Sapphire Rapids design employed ten different EMIB interconnects to stitch them together into a quasi-monolithic device, but this added latency and variability issues and, thus, performance penalties in many types of workloads — not to mention design complexity.

In contrast, Emerald Rapids' new dual-XCC-tile design employs only three EMIB connections between the two dies, easing latency and variability concerns while improving performance and reducing design complexity. The reduced EMIB connections help in multiple ways: the associated reduction in die area devoted to the Network on Chip (NoC) and reduced data traffic saved 5 to 7 percent of overall chip power, which Intel then diverted to the power-hungry areas of the chip to deliver more performance.

As before, Intel provides an option to logically partition the chip into separate regions (sub-NUMA clustering, or SNC) to keep workloads on the same die, thus avoiding the latency penalties of accessing another die. However, instead of offering up to four clusters as it did with the quad-tile Sapphire Rapids, Intel now supports up to two clusters, one per die, to optimize Emerald Rapids for latency-sensitive applications. The chip operates as one large cluster by default, which Intel says is optimal for the majority of workloads. In the above slides, you can see the latency impact of the different SNC configurations.
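With SNC enabled, each die appears as its own NUMA node, so standard Linux tooling can keep a latency-sensitive job on one die and its local memory. An illustrative sketch using numactl — the node numbering and the ./my_app binary are placeholders, so verify the actual topology with the first command:

```shell
# List NUMA topology. With SNC2 enabled, a 2P Emerald Rapids box exposes
# four nodes (one per die); with SNC disabled, two (one per socket).
numactl --hardware

# Pin a latency-sensitive workload to node 0's cores and its local memory,
# avoiding cross-die EMIB hops ('./my_app' is a hypothetical binary).
numactl --cpunodebind=0 --membind=0 ./my_app
```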

Intel's MCC and EE LCC dies are custom designs — Intel made separate designs for these dies so it could discard some of the unused functions, like the EMIB connections, to save die area and reduce cost and complexity. The second slide in the above album outlines Emerald Rapids' new alignment of the functional elements of the die, like the memory, PCIe, and UPI controllers.

Intel earmarked power reductions as a key focus area with Emerald Rapids. Most data center processors operate at 30–40 percent utilization in normal use (cloud is typically 70%), so Intel improved multiple facets of the design, including optimizing the cores and SoC interconnect for lower utilization levels. In tandem with the extra efficiencies wrung from the newer revision of the Intel 7 node, the company claims it has reduced power consumption by up to 110W when the system is at lower load levels. Intel claims the chips still deliver the same level of performance even when running in this Optimized Power Mode (OPM).

Intel also expanded upon its AVX and AMX licensing classes (explained here) to expose up to two bins of higher performance under heavy vectorized workloads. That delivers a nice additional boost.

Intel has also expanded CXL memory support beyond its fledgling foray with Sapphire Rapids. Emerald Rapids now supports Type 3 memory expansion devices, enabling new memory tiering and interleaving options. Adding externally connected memory increases both capacity and bandwidth, allowing the chip to perform as if it had more memory channels at its disposal, albeit with the additional latency inherent to the CXL interconnect.

Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • Tech0000
Thank you for the article!
    1. correction: 2nd table first page Intel Xeon 8462Y+ (SPR) price should be the same as 8562Y+ (EMR) price = $5,945. Right now (11.22pst) you have Intel Xeon 8462Y+ (SPR) priced at $3,583 - which is wrong.

2. I would have liked to see a comparison between EMR and SPR for the same model, e.g. 8462Y+ vs 8562Y+, to better understand and isolate the generational core-for-core and model improvement (mostly everything else being equal). It's hard to derive conclusions when you are comparing different models and core configs with test result numbers all over the place - one winning over the other depending on the test performed.
    I suspect that an 8462Y+ vs 8562Y+ comparison would net a very modest gain (due to the marginally higher all-core turbo) and that the real performance gains are in the top-tier SKUs with the tripled L3 cache, faster DDR5, etc.

3. As a workstation chip, the single-socket 8558U seems to be pretty good "value" (relative to the other Intel SKUs) actually. $3,720 for a 48-core chip with 250MB of L3 cache is not too bad for a corporate WS. Not as capable (in terms of accelerators) as the other loaded high-end pricey SKUs, but at $77/core for 48 cores it is pretty good. Maybe this chip can be packaged and used as a candidate Xeon W9-3585X or similar chip...
    Reply
  • thestryker
    I'm assuming a chunk of the losses are due to Zen 4 being much more efficient but it would be nice to see some clock graphs (not for every test, but maybe one per category) if possible. If that isn't possible maybe running this same suite on a 13900K/14900K and 7950X to give some context since these are extremely close in threaded performance despite Intel using more power.

    Appreciate the immediate look and hope to see some more, or maybe some Xeon W review action when those EMR refreshes come out!
    Reply
  • Vanderlindemedia
    Ouch intel.

    AMD is a generation ahead of you.
    Reply
  • tamalero
    Admin said:
    We put Intel’s fifth-gen Emerald Rapids Xeon Platinum 8592+ through the benchmark paces against AMD's EPYC Genoa to see which server chips come out the winner.

    Intel 'Emerald Rapids' 5th-Gen Xeon Platinum 8592+ Review: 64 Cores, Tripled L3 Cache and Faster Memory Deliver Impressive AI Performance : Read more
What's with these "you can trust our review made by pros" blurbs?
    The first one I've ever seen do that was Gamers Nexus.
    Now it seems everyone wants to add those kinds of "claims" to their own reviews.
    Reply
  • bit_user
    Thanks for the review, as always!

    Some more potential cons:
    Still significantly lagging Genoa on energy-efficiency.
    PCIe deficit in 1P configurations (80 Emerald Rapids vs. 128 lanes for Genoa). In 2P configurations, Genoa can run at either 128 or match Emerald Rapids' 160 lanes, if you reduce the inter-processor links to just 3.
    Fewer memory channels (8 vs. 12 for Genoa), though the number of channels per-core is the same.
    The 96-core EPYC Genoa 9654 surprisingly falls to the bottom of the chart in all three of the TensorFlow workloads, implying that its incredible array of chiplets might not offer the best latency and scalability for this type of model.
    I did see a few such inversions in Phoronix' review, but fewer and way less severe. This should be investigated. I recommend asking AMD about it, @PaulAlcorn . It almost looks to me like you might've had a CPU heatsink poorly mounted, forgot to replace the fan shroud, or something like that. It's way worse than anything you saw in your original Genoa review, where we basically only saw inversions in stuff that didn't parallelize too well.
https://www.tomshardware.com/reviews/amd-4th-gen-epyc-genoa-9654-9554-and-9374f-review-96-cores-zen-4-and-5nm-disrupt-the-data-center/5
    In this review, it almost seems like the EPYC 9554 is outperforming the 9654 more often than not!
    Reply
  • bit_user
BTW, I find it a little weird that they still don't have a monolithic version that's just one XCC tile, even as just a stepping stone, before you get down to the range of the regular MCC version.
    Reply
  • bit_user
    Tech0000 said:
2. I would have liked to see a comparison between EMR and SPR for the same model, e.g. 8462Y+ vs 8562Y+, to better understand and isolate the generational core-for-core and model improvement (mostly everything else being equal). It's hard to derive conclusions when you are comparing different models and core configs with test result numbers all over the place - one winning over the other depending on the test performed.
    I suspect that an 8462Y+ vs 8562Y+ comparison would net a very modest gain (due to the marginally higher all-core turbo) and that the real performance gains are in the top-tier SKUs with the tripled L3 cache, faster DDR5, etc.
    I'd imagine the issue is that they can only test the review samples they're sent by Intel.

    Phoronix tested a limited number of benchmarks with different DDR5 speeds. Seems like the faster DDR5 wasn't a huge win, but sadly none of the AI benchmarks were included. Those should've skewed the geomean a bit higher.
    https://www.phoronix.com/review/intel-xeon-ddr5-5600
    thestryker said:
    If that isn't possible maybe running this same suite on a 13900K/14900K and 7950X to give some context since these are extremely close in threaded performance despite Intel using more power.
    To make the results more applicable, I'd suggest the E-cores should be disabled.
    Reply
  • thestryker
    bit_user said:
BTW, I find it a little weird that they still don't have a monolithic version that's just one XCC tile, even as just a stepping stone, before you get down to the range of the regular MCC version.
    What does the XCC offer in Xeon Scalable that MCC doesn't? I was trying to think of something but the specs of all the SKUs seem so random for EMR I couldn't figure out what you'd be referring to.
    bit_user said:
    To make the results more applicable, I'd suggest the E-cores should be disabled.
    That would remove the entire point I was getting at of using the desktop parts as a comparison. The 13900K/14900K consistently go back and forth with the 7950X in MT performance at stock settings in standard CPU benchmarks despite the extra power consumption on the Intel side. Though with the IPC between RPL/Zen 4 so close maybe disabled E-cores + 1 CCD disabled would make for a good comparison as then it would be just 8 P-cores vs 8 Zen 4 cores. I haven't seen any such comparison though so this is just a wild guess.
    Reply
  • bit_user
    thestryker said:
    What does the XCC offer in Xeon Scalable that MCC doesn't?
    I just meant that perhaps they could get more mileage out of their chiplet usage. Like, maybe there are some XCC tiles with a defect in the EMIB section, so just put those on a substrate by themselves and sell it as 32C or less.

    thestryker said:
    That would remove the entire point I was getting at of using the desktop parts as a comparison.
    Okay, well if you don't exclude the E-cores, then I don't see how those tests would be relevant to these server CPUs.

    thestryker said:
    with the IPC between RPL/Zen 4 so close maybe disabled E-cores + 1 CCD disabled would make for a good comparison as then it would be just 8 P-cores vs 8 Zen 4 cores. I haven't seen any such comparison though so this is just a wild guess.
    Heh, you might just get your chance! The new Xeon E-series 2400 have their E-cores disabled (sounds ironic, eh?). So, if anyone benchmarks a Xeon E-2488 against a Ryzen 7700X, then it'd be exactly what you're talking about.

    Annoyingly (for me), the new Xeon E 2400 also have their GPUs disabled. Otherwise, I might've been interested. I guess they could still announce G-versions, later.
    Reply
  • thestryker
    bit_user said:
    I just meant that perhaps they could get more mileage out of their chiplet usage. Like, maybe there are some XCC tiles with a defect in the EMIB section, so just put those on a substrate by themselves and sell it as 32C or less.
    Ah yeah I get what you mean, but they'd be limited to 4 memory channels and half the PCIe lanes as well. I would love to know what happens in that circumstance though... like do they have to toss the whole thing?
    bit_user said:
    Okay, well if you don't exclude the E-cores, then I don't see how those tests would be relevant to these server CPUs.
    Well like I said originally it's more to give a known quantity comparison than it is to get a direct reflection. What I mean by this being if there was a test that Genoa beat EMR, but the desktop CPUs were closer/equal you could extrapolate that the server CPU differences were more likely due to efficiency than architecture. It would definitely be much better if you had a P-core only setup which matched a Zen 4 setup in performance though for this comparison.
    bit_user said:
    Heh, you might just get your chance! The new Xeon E-series 2400 have their E-cores disabled (sounds ironic, eh?). So, if anyone benchmarks a Xeon E-2488 against a Ryzen 7700X, then it'd be exactly what you're talking about.
    Yeah that would be the ideal comparison. I'd love to see a die shot to see if they're using ones without E-cores.
    bit_user said:
    Annoyingly (for me), the new Xeon E 2400 also have their GPUs disabled. Otherwise, I might've been interested. I guess they could still announce G-versions, later.
    Yeah I was surprised there were so many SKUs listed but none with an IGP. In the past they've always launched at least a few with graphics. Another reason why I'd love to see a die shot.
    Reply