Date: 2023-08-28

At Hot Chips 2023, Intel unveiled the first deep-dive details of its future Xeon Sierra Forest and Granite Rapids processors, with the former built from Intel's new Sierra Glen E-cores while the latter employs the new Redwood Cove P-cores. The forthcoming fifth-gen Xeon chips launch in the first half of next year with a new tile-based architecture that features dual I/O chiplets on the 'Intel 7' process paired with varying configurations of compute cores etched on the 'Intel 3' process. This design allows Intel to craft multiple products based on different types of cores while maintaining the same underlying configuration, which preserves hardware and firmware compatibility along with support for a shared software stack. The new chips will drop into the Birch Stream platform.

Intel claims the fifth-gen Xeon Sierra Forest's E-core-based design will provide up to 2.5x better rack density and 2.4x higher performance per watt than its fourth-gen Xeon chips, while the P-core-powered Granite Rapids will provide 2x to 3x the performance in mixed AI workloads, partially stemming from a 2.8x improvement in memory bandwidth. Let's dive in.

Sierra Forest and Granite Rapids Architecture

Intel initially moved to a tile-based (chiplet-esque) architecture with its fourth-gen Xeon Sapphire Rapids processors, but Sierra Forest and Granite Rapids bring a new level of disaggregation to the approach.

Intel employed a four-die design with Sapphire Rapids, with each die containing a portion of the relevant I/O functions, like memory and PCIe controllers. The new fifth-gen processors fully disaggregate some I/O functions to two separate HSIO chiplets etched on the Intel 7 process. These HSIO dies sit at the top and bottom of the chip package, with one to three compute dies fabbed on the Intel 3 process in the center. An unspecified number of EMIB (Embedded Multi-Die Interconnect Bridge) interconnects fused within the substrate tie the dies together, with each bridge connected to a die-to-die interconnect in the die at either end.

Combined, the two HSIO dies support up to 136 lanes of PCIe 5.0/CXL 2.0, up to 6 UPI links (144 lanes), and compression, cryptography, and data streaming accelerators in a similar fashion to Sapphire Rapids' acceleration engines. Each HSIO die also includes power control circuitry that manages the compute chiplets. Intel has now done away with the requirement for a chipset, thus allowing the processors to be self-booting, much like AMD's EPYC processors.
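As a rough illustration of how those combined figures break down, the short Python sketch below divides the quoted totals. The even split between the two HSIO dies is our assumption, not something Intel has confirmed; only the lanes-per-link figure follows directly from the quoted numbers.

```python
# Figures quoted above for the two HSIO dies combined.
HSIO_DIES = 2
PCIE_CXL_LANES_TOTAL = 136
UPI_LINKS_TOTAL = 6
UPI_LANES_TOTAL = 144

# Follows directly from the quoted totals: 144 lanes across 6 links.
upi_lanes_per_link = UPI_LANES_TOTAL // UPI_LINKS_TOTAL

# Assumption: both HSIO dies carry an identical share of the I/O.
pcie_lanes_per_die = PCIE_CXL_LANES_TOTAL // HSIO_DIES
upi_links_per_die = UPI_LINKS_TOTAL // HSIO_DIES

print(f"UPI lanes per link: {upi_lanes_per_link}")
print(f"PCIe/CXL lanes per HSIO die (even split assumed): {pcie_lanes_per_die}")
print(f"UPI links per HSIO die (even split assumed): {upi_links_per_die}")
```

Under that even-split assumption, each HSIO die would carry 68 PCIe/CXL lanes and three UPI links of 24 lanes each.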

The compute tiles will employ either Redwood Cove P-cores (Performance cores) for Granite Rapids or Sierra Glen E-cores for Sierra Forest. Each compute die contains the cores, L2 and L3 cache, and the fabric and caching home agent (CHA). They also house DDR5-6400 memory controllers on each end of the die, with up to 12 channels total (1DPC or 2DPC) of either standard DDR memory or the new MCR memory that provides 30-40% more memory bandwidth than standard DIMMs. Intel will vary the number of memory channels per compute chiplet: Intel's slides show three memory controllers on the product with a single compute chiplet, while designs with two or more compute chiplets have two memory controllers per chiplet. The compute dies share their L3 cache with all other cores in what Intel refers to as a 'logically monolithic mesh,' but they can also be partitioned into sub-NUMA clusters to optimize latency for certain workloads.
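For a sense of scale, the sketch below estimates the theoretical peak bandwidth of a fully populated 12-channel DDR5-6400 configuration. It assumes a 64-bit (8-byte) data path per channel with ECC bits excluded, and it applies the quoted 30-40% MCR uplift to that same DDR5-6400 baseline, which Intel has not specified. Treat the results as back-of-the-envelope numbers, not Intel's figures.

```python
TRANSFER_RATE_MT_S = 6400   # DDR5-6400, as quoted above
CHANNELS = 12               # maximum channel count, as quoted above
BYTES_PER_TRANSFER = 8      # assumption: 64-bit data path per channel, ECC excluded


def peak_bandwidth_gb_s(mt_s: float, channels: int,
                        bytes_per_transfer: int = BYTES_PER_TRANSFER) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mt_s * 1e6 * bytes_per_transfer * channels / 1e9


ddr5_peak = peak_bandwidth_gb_s(TRANSFER_RATE_MT_S, CHANNELS)

# Assumption: the 30-40% MCR uplift is measured against this DDR5-6400 peak.
mcr_low = ddr5_peak * 1.30
mcr_high = ddr5_peak * 1.40

print(f"DDR5-6400 x {CHANNELS} channels: {ddr5_peak:.1f} GB/s")
print(f"MCR estimate: {mcr_low:.1f} to {mcr_high:.1f} GB/s")
```

Real-world throughput will land below these theoretical peaks, and the MCR range shifts if the uplift is measured against a different baseline speed.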

Intel will not provide models with both P-cores and E-cores in the same package. Granite Rapids is the more traditional Xeon data center processor: these models come equipped with only P-cores that can deliver the full performance of Intel's fastest architectures. Each P-core comes with 2MB of L2 cache and 4MB of L3. Intel hasn't revealed core counts for Granite Rapids yet, but did reveal that the platform supports from one to eight sockets in a single server.

Meanwhile, Sierra Forest's E-core (Efficiency core) lineup consists of chips with only smaller efficiency cores, much like the E-cores in Intel's Alder Lake and Raptor Lake chips. The E-cores are arranged into four-core clusters that share a 4MB L2 cache slice and 3MB of L3 cache. The E-core-equipped processors come with up to 144 cores and are optimized for the utmost power efficiency and performance density. That works out to 48 cores per E-core compute chiplet in a three-chiplet design. These chips can drop into single- and dual-socket systems.
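The per-chiplet core count and the implied cache totals can be tallied with a few lines of Python. This is a sketch under two assumptions: that the 144-core part uses three compute chiplets (the maximum described earlier), and that the 3MB of L3 is allotted per four-core cluster, as the description above reads. Intel has not published these totals.

```python
MAX_CORES = 144          # top Sierra Forest core count, as quoted above
CORES_PER_CLUSTER = 4    # E-cores per cluster
L2_MB_PER_CLUSTER = 4    # shared L2 slice per cluster
L3_MB_PER_CLUSTER = 3    # assumption: L3 is allotted per cluster
COMPUTE_CHIPLETS = 3     # assumption: the 144-core part uses three compute dies

cores_per_chiplet = MAX_CORES // COMPUTE_CHIPLETS
clusters_total = MAX_CORES // CORES_PER_CLUSTER
clusters_per_chiplet = clusters_total // COMPUTE_CHIPLETS

l2_total_mb = clusters_total * L2_MB_PER_CLUSTER
l3_total_mb = clusters_total * L3_MB_PER_CLUSTER

print(f"Cores per compute chiplet: {cores_per_chiplet}")
print(f"Clusters: {clusters_total} total, {clusters_per_chiplet} per chiplet")
print(f"Implied cache totals: {l2_total_mb} MB L2, {l3_total_mb} MB L3")
```

Under those assumptions, a 144-core chip would have 36 clusters (12 per chiplet), with 144MB of L2 and 108MB of L3 in total.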

Intel Sierra Glen E-Core Microarchitecture

Intel will detail the E-core and P-core architectures in the Hot Chips presentation that begins momentarily. We'll update these sections as we learn more, but here are the slides we downloaded before the presentation.

Intel Redwood Cove P-Core Microarchitecture

The Redwood Cove architecture for the P-cores now supports AMX with FP16 acceleration, a key addition that will boost performance in AI workloads. Intel also doubled the L1 instruction cache capacity to 64 KB to better address code-heavy data center workloads. Redwood Cove also employs software-optimized prefetches and an enhanced branch prediction engine. The microarchitecture also improves floating point performance by moving from four- and five-cycle operations to three-cycle floating point operations, which boosts IPC.

Intel Xeon Roadmap

In a bit of good news for Intel, the company's data center roadmap remains on track. Sierra Forest will arrive on the market in the first half of 2024, with Granite Rapids following shortly thereafter.

| | 2023 | 2024 | 2025 |
|---|---|---|---|
| Intel P-Cores | Emerald Rapids - Intel 7; Sapphire Rapids HBM | Granite Rapids - Intel 3 | |
| AMD P-Cores | 5nm Genoa-X | Turin - Zen 5 | - |
| Intel E-Cores | - | 1H - Sierra Forest - Intel 3 | Clearwater Forest - Intel 18A |
| AMD E-Cores | 1H - Bergamo - 5nm - 128 Cores | - | - |

Here we can see how Intel’s roadmap looks next to AMD’s data center roadmap. The current high-performance battle rages on between AMD’s EPYC Genoa, launched last year, and Intel’s Sapphire Rapids, launched early this year. Intel has its Emerald Rapids refresh generation coming in Q4 of this year, which the company says will come with more cores and faster clock rates, and it has already released its HBM-infused Xeon Max CPUs. AMD recently released its 5nm Genoa-X products. Next year, Intel’s next-gen Granite Rapids will square off with AMD’s Turin.



In the efficiency swim lane, AMD’s Bergamo takes a very similar core-heavy approach to Sierra Forest by leveraging AMD’s dense Zen 4c cores. Bergamo is already on the market, while Intel’s Sierra Forest won’t arrive until the first half of 2024. AMD's 5th-gen EPYC Turin chips launch before the end of 2024, but the company hasn't outlined its second-gen Zen 4c model. Intel now has its second-gen E-core-powered Clearwater Forest on the roadmap for 2025.

We'll update the article during the presentation. Stay tuned.