Cascade Lake and Friends
Intel announced its Cascade Lake line of Xeon Scalable data center processors at its Data-Centric Innovation Day here in San Francisco. The second-generation lineup of Xeon Scalable processors comes in 53 flavors that span up to 56 cores and 12 memory channels per chip, but as a reminder that Intel company is briskly expanding beyond “just” processors, the company also announced the final arrival of its Optane DC Persistent Memory DIMMs along with a range of new data center SSDs, Ethernet controllers, 10nm Agilex FPGAs, and Xeon D processors.
This broad spectrum of products leverages Intel’s overwhelming presence in the data center (it currently occupies ~95% of the worlds server sockets), as a springboard to chew into other markets, including its new assault on the memory space with the Optane DC Persistent Memory DIMMs. The long-awaited DIMMs open a new market for Intel and have the potential to disrupt the entire memory hierarchy, but also serve as a potentially key component that can help the company fend off AMD’s coming 7nm EPYC Rome processors.
Intel designed the new suite of products to address data storage, movement, and processing from the edge to the data center, hence its new Move, Store, Process mantra that encapsulates its end-to-end strategy. We're working on our full review of the Xeon Scalable processors, but in the meantime, let's take a closer look at a few of Intel's announcements.
56 Cores, 112 Threads, and a Whole Lotta TDP
AMD has already made some headway with its existing EPYC data center processors, but the company’s forthcoming 7m Rome processors pose an even bigger threat with up to 64 cores and 128 threads packed into a single chip, wielding a massive 128 cores and 256 threads in a single dual-socket server. The increased performance, and reduced power consumption, purportedly outweighs Intel’s existing lineup of Xeon processors, so Intel turned to a new line of Cascade Lake-AP processors to shore up its defenses in the high core-count space. These new processors slot in as a new upper tier of Intel's Xeon Platinum lineup.
These new 9000-series chips come packing up to 56 cores and 112 threads in a dual-die MCM (Multi-Chip Module) design, meaning that two die come together to form a single chip. Intel claims the processors offer the highest-performance available for HPC, AI, and IAAS workloads. The processors also offer the most memory channels, and thus access to the highest memory bandwidth, of any data center processor. Performance density, high memory capacity, and blistering memory throughput are the goal here, which plays well to the HPC crowd. This approach signifies Intel's embrace of a multi-chip design, much like AMD's EPYC processors, for its highest core-count models.
|Row 0 - Cell 0||Cores / Threads||Base / Boost Freq. (GHz)||L3 Cache||TDP|
|Xeon Platinum 9282||56 / 112||2.6 / 3.8||77 MB||400W|
|Xeon Platinum 9242||48 / 96||2.3 / 3.8||71.5 MB||350W|
|Xeon Platinum 9222||32 / 64||2.3 / 3.(7||71.5 MB||250W|
|Xeon Platinum 9221||32 / 64||2.1 / 3.7||71.5 MB||250W|
The 9200-series comes in three flavors with 56-, 48-, and 32-core models on offer. Clock speeds top out with the 56-core Xeon Platinum 9282 model with a 3.8 GHz boost, and base speeds weigh in at an impressive 2.6 GHz. The flagship Xeon Platinum 9282 also comes equipped with 77MB of L3 cache.
Each processor has two internal die that consists of modified 28-core XCC (extreme core count) die, and each die wields a six-channel memory controller. Together, this gives the processor access to 12 channels of DDR4-2933 memory, providing up to 24 memory channels and 3TB of DDR4 memory in a two-socket server. That facilitates up to 407 GB/s of memory throughput for a two-socket server equipped with the 56-core models.
Intel still uses garden-variety thermal interface grease between the die and heatspreader, but the 9282 weighs in with a monstrous 400W TDP, while the 48-core models have a 350W TDP and the 32-core models slot in with a 250W rating. Intel says the 400W models require water cooling, while the 350W and 250W models can use traditional air cooling. Unlike the remainder of the Cascade Lake processors, these chips are not compatible with previous-generation sockets. Instead of being socketed processors, the 9200-series processors come in a BGA (Ball Grid Array) package that is soldered directly to the host motherboard via a 5903-ball interface.
The 9200-series chips also expose up to 40 PCIe 3.0 lanes per chip, for a total of 80 lanes in a dual socket server. Each die has 64 PCIe lanes at its disposal, but Intel carves off some of the lanes for UPI (Ultra-Path Interconnect) connection that tie together the two die inside the processor, while others are dedicated to communication between two chips in a two socket server. Overall, that provides four UPI pathways per socket with a total of 10.4 GT/s of throughput.
A dual-socket server presents itself as a quad-socket server to the host, meaning the four NUMA nodes appear as four distinct CPUs, but the dual-die topology poses latency challenges for access to 'far' memory banks. Intel says it has largely mitigated the problem with a single-hop routing scheme that provides 79ns of latency for near memory and 130ns for far memory access.
Intel has provided its partners with a reference platform design that crams up to four nodes, each containing two of the 9200-series processors, into a single 2U rack enclosure. Intel hasn't announced pricing for the chips, largely because they will only be available inside OEM systems, but says they are shipping to customers now.
Let's shift gears to the standard socketed Xeon processors.