The ever-increasing performance demands of cloud datacenters require CPU developers to rethink their designs in a bid to deliver maximum performance per socket while confronting the cost constraints set by a slowing Moore's Law. AMD's EPYC 'Bergamo' is the industry's first x86 cloud-native CPU. It is based on the specially tailored Zen 4c microarchitecture, which maintains essentially the same feature set as the Zen 4 microarchitecture while halving core size requirements, reports SemiAnalysis.
AMD's EPYC 'Bergamo' processor packs 128 cores and sits in the same Socket SP5 as the 96-core EPYC 'Genoa' CPU. It has a similar 12-channel DDR5-4800 memory subsystem and uses the same I/O die (codenamed Floyd), meaning that it also has 128 PCIe Gen5 lanes and the other characteristics of SP5 products. Bergamo is a cloud-native system-on-chip (SoC) and, to some degree, a response to emerging Arm-based datacenter-grade SoCs from Ampere, Amazon, Google, and Microsoft. Its design was therefore shaped by efficiency, power usage, die size, and low total cost of ownership (TCO) rather than by the aim of delivering maximum per-core performance.
| Specification | EPYC 9654 | EPYC 9754 | EPYC 9734 |
|---|---|---|---|
| Microarchitecture | Zen 4/Persephone | Zen 4c/Dionysus | Zen 4c/Dionysus |
| Total L2 Cache | 96MB | 128MB | 112MB |
| L3 Cache per CCX | 32MB | 16MB | 16MB |
| Total L3 Cache | 384MB | 256MB | 256MB |
| CCX per CCD | 1 | 2 | 2 |
| Cores per CCD | 8 | 16 | 14 |
| Rated Memory Speed | DDR5-4800 | DDR5-4800 | DDR5-4800 |
| Memory Bandwidth | 460.8 GB/s | 460.8 GB/s | 460.8 GB/s |
| PCIe 5.0 Lanes | 128 | 128 | 128 |
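
The totals in the table follow from the per-CCD figures, and a short Python sketch can reproduce them. It is an illustration rather than anything from AMD or SemiAnalysis: the `EpycPart` class and its field names are hypothetical, and it assumes 1MB of L2 per core and a 64-bit (8-byte) data path per DDR5 channel, both of which are consistent with the table but not stated in it.

```python
from dataclasses import dataclass

BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path per DDR5 channel
L2_PER_CORE_MB = 1      # assumption: 1MB of L2 per core, consistent with the table


@dataclass(frozen=True)
class EpycPart:
    """Hypothetical container for the per-part figures quoted in the table."""
    name: str
    cores: int
    cores_per_ccd: int
    ccx_per_ccd: int
    l3_per_ccx_mb: int
    memory_channels: int = 12      # 12-channel memory subsystem
    transfer_rate_mts: int = 4800  # DDR5-4800

    @property
    def ccds(self) -> int:
        # Number of core complex dies implied by the core counts.
        return self.cores // self.cores_per_ccd

    @property
    def total_l2_mb(self) -> int:
        return self.cores * L2_PER_CORE_MB

    @property
    def total_l3_mb(self) -> int:
        return self.ccds * self.ccx_per_ccd * self.l3_per_ccx_mb

    @property
    def memory_bandwidth_gbs(self) -> float:
        # channels * megatransfers/s * bytes per transfer, converted MB/s -> GB/s
        return self.memory_channels * self.transfer_rate_mts * BYTES_PER_TRANSFER / 1000


PARTS = [
    EpycPart("EPYC 9654", cores=96, cores_per_ccd=8, ccx_per_ccd=1, l3_per_ccx_mb=32),
    EpycPart("EPYC 9754", cores=128, cores_per_ccd=16, ccx_per_ccd=2, l3_per_ccx_mb=16),
    EpycPart("EPYC 9734", cores=112, cores_per_ccd=14, ccx_per_ccd=2, l3_per_ccx_mb=16),
]

for part in PARTS:
    print(
        f"{part.name}: {part.ccds} CCDs, "
        f"L2 {part.total_l2_mb}MB, L3 {part.total_l3_mb}MB, "
        f"{part.memory_bandwidth_gbs} GB/s"
    )
```

Under these assumptions the sketch yields 12 CCDs for the EPYC 9654 and eight for both Bergamo parts, which is why the 112-core EPYC 9734 keeps the full 256MB of L3 despite having fewer active cores per CCD.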
At the microarchitecture level, Zen 4c retains the same design as Zen 4, including identical features and instructions-per-clock (IPC) performance, but the cores are configured and implemented in a drastically different way, SemiAnalysis claims. Zen 4c 'Dionysus' cores are about 35.4% smaller than Zen 4 'Persephone' cores, according to SemiAnalysis. To achieve this, AMD had to implement a number of design tricks. The analysts believe:
- It reduced boost clock targets from 3.70 GHz to 3.10 GHz. The relaxed timing constraints made timing closure simpler and decreased the need for extra buffer cells. Today's designs are often constrained by routing density and congestion, so lowering frequency allows for tighter packing of signal pathways, enhancing the density of standard cells.
- It lowered the number of physical partitions in the core and packed logic closer together, which made debugging and introducing fixes harder but reduced die size.
- It used denser 6T dual-port SRAM cells for Zen 4c as opposed to 8T dual-port SRAM circuits for Zen 4 to reduce SRAM area. As a result, while Zen 4 and Zen 4c cores have similar L1 and L2 cache sizes, the caches in Zen 4c occupy less area but are also not as fast as those inside Zen 4.
- Finally, it removed through-silicon vias (TSVs) arrays for 3D V-Cache, to further save silicon.
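
To put the 35.4% figure in perspective, the following Python sketch converts an area reduction into a relative core area and a density gain. The function names are hypothetical, the only input taken from the article is the 35.4% reduction attributed to SemiAnalysis, and the result covers core area only; it says nothing about L3 cache, the I/O die, or other parts of the package.

```python
ZEN4C_AREA_REDUCTION = 0.354  # Zen 4c core is ~35.4% smaller, per SemiAnalysis


def relative_area(reduction: float) -> float:
    """Area of the smaller core as a fraction of the original core."""
    return 1.0 - reduction


def density_gain(reduction: float) -> float:
    """How many smaller cores fit in the area of one original core."""
    return 1.0 / relative_area(reduction)


if __name__ == "__main__":
    area = relative_area(ZEN4C_AREA_REDUCTION)
    gain = density_gain(ZEN4C_AREA_REDUCTION)
    print(f"Relative core area: {area:.3f}")
    print(f"Cores per unit of core area: {gain:.2f}x")
```

In other words, a 35.4% smaller core leaves roughly 65% of the original area, so about 1.5 such cores fit in the space of one Zen 4 core before any other savings are counted.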
These were not the only methods of die area reduction used by AMD. According to SemiAnalysis, AMD's Bergamo is based on eight Vindhya core complex dies (CCDs) that each pack 16 Zen 4c cores, up from eight Zen 4 cores per CCD. The higher count is justified because the cores got smaller, but it also limits clock speed potential. Each CCD features two eight-core core complexes (CCXs) and 32MB of L3 cache, or 16MB per CCX. By contrast, each Zen 4 CCX has 32MB of L3, which greatly increases its size compared to a Zen 4c CCX.
Overall, we could say that AMD's Zen 4c and Bergamo mark a shift in design trajectory, as the company needed to fit 128 Zen 4-class cores into the same 360W to 400W power envelope as Genoa. Reduced frequency targets, the use of denser SRAM cells, and cutting L3 per CCX in half certainly enabled AMD to increase its core count, but how that affected per-core performance is something that we still have to find out.
SemiAnalysis says that AMD is preparing to launch two Bergamo processors later this month: the 128-core EPYC 9754 and its slightly cut-down sibling, the 112-core EPYC 9734. Given that operators of exascale datacenters tend to have specific requirements for their deployments, we can only wonder how many custom and semi-custom Bergamo offerings AMD will eventually produce, but for now two models are set to be introduced next week.
"You are going to hear about this next week with Bergamo, which is a cloud-native optimized device with high-density and very good performance-per-watt in energy efficiency for cloud-native computing," said Dan McNamara, AMD's server business chief, at the Bank of America 2023 Global Technology Conference (via SeekingAlpha).