AMD's EPYC 'Bergamo' and Zen 4c Detailed: Same as Zen 4, But Denser
Zen 4c microarchitecture and Bergamo design under the microscope

The ever-increasing performance demands of cloud datacenters require CPU developers to rethink their designs in a bid to deliver maximum performance per socket while confronting the cost constraints imposed by the slowing of Moore's Law. AMD's EPYC 'Bergamo' is the industry's first x86 cloud-native CPU. It is based on the specially tailored Zen 4c microarchitecture, which maintains essentially the same feature set as Zen 4 while halving core size requirements, reports SemiAnalysis.
AMD's EPYC 'Bergamo' processor packs 128 cores and sits in the same Socket SP5 as the 96-core EPYC 'Genoa' CPU. It has the same 12-channel DDR5-4800 memory subsystem and uses the same I/O die (codenamed Floyd), meaning that it also has 128 PCIe Gen5 lanes and the other characteristics of SP5 products. Because Bergamo is a cloud-native system-on-chip (SoC), and to some degree a response to emerging Arm-based datacenter-grade SoCs from Ampere, Amazon, Google, and Microsoft, its design was shaped by multiple factors, including efficiency, power usage, die size, and low total cost of ownership (TCO), rather than by the aim to deliver maximum per-core performance.
Specification | EPYC 9654 | EPYC 9754 | EPYC 9734 |
Design | Genoa | Bergamo | Bergamo |
Microarchitecture | Zen 4/Persephone | Zen 4c/Dionysus | Zen 4c/Dionysus |
Cores/Threads | 96/192 | 128/256 | 112/224 |
L1i Cache | 32KB | 32KB | 32KB |
L1d Cache | 32KB | 32KB | 32KB |
L2 Cache | 1MB | 1MB | 1MB |
Total L2 Cache | 96MB | 128MB | 112MB |
L3 Cache per CCX | 32MB | 16MB | 16MB |
Total L3 Cache | 384MB | 256MB | 256MB |
CCD | Durango | Vindhya | Vindhya |
CCD Count | 12 | 8 | 8 |
CCX per CCD | 1 | 2 | 2 |
Cores per CCD | 8 | 16 | 14 |
I/O Die | Floyd | Floyd | Floyd |
Memory Channels | 12 | 12 | 12 |
Rated Memory Speed | DDR5-4800 | DDR5-4800 | DDR5-4800 |
Memory Bandwidth | 460.8 GB/s | 460.8 GB/s | 460.8 GB/s |
PCIe 5.0 Lanes | 128 | 128 | 128 |
TDP/Max TDP | 360W/400W | 360W/400W | 360W/400W |
Socket | SP5 | SP5 | SP5 |
Scalability | 2P | 2P | 2P |
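The memory bandwidth row follows directly from the channel count and the transfer rate. A minimal sketch, assuming the usual 64-bit (8-byte) data path per DDR5 channel and ignoring ECC bits and real-world efficiency; the per-core split is simple division, not a measured figure:

```python
# Peak DRAM bandwidth = channels x transfer rate (MT/s) x bytes per transfer.
# Assumption: a 64-bit (8-byte) data path per channel; ECC bits are not counted.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


def bandwidth_per_core_gbs(channels: int, mts: int, cores: int) -> float:
    """Peak bandwidth divided evenly across all cores (an idealized share)."""
    return peak_bandwidth_gbs(channels, mts) / cores


if __name__ == "__main__":
    for name, cores in [("EPYC 9654", 96), ("EPYC 9754", 128), ("EPYC 9734", 112)]:
        total = peak_bandwidth_gbs(12, 4800)
        share = bandwidth_per_core_gbs(12, 4800, cores)
        print(f"{name}: {total:.1f} GB/s total, {share:.2f} GB/s per core")
```

With the same 12 channels shared among more cores, the idealized per-core share drops from 4.8 GB/s on the 96-core Genoa part to 3.6 GB/s on the 128-core Bergamo part.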
On the microarchitecture level, Zen 4c retains the same design as Zen 4, including identical features and instructions-per-clock (IPC) performance, but the cores are configured and implemented in a drastically different way, SemiAnalysis claims. Zen 4c 'Dionysus' cores are 35.4% smaller than Zen 4 'Persephone' cores, according to SemiAnalysis. To achieve this, AMD had to implement a number of design tricks. The analysts believe:
- It reduced boost clock targets from 3.70 GHz to 3.10 GHz. The relaxed timing constraints made timing closure simpler and reduced the need for extra buffer cells. Today's designs are often constrained by routing density and congestion, so lowering frequency allows signal pathways to be packed more tightly, increasing standard-cell density.
- It reduced the number of physical partitions in the core design and packed logic closer together, which makes debugging and introducing fixes harder but reduces die area.
- It used denser 6T dual-port SRAM cells for Zen 4c, as opposed to 8T dual-port SRAM cells for Zen 4, to reduce SRAM area. As a result, while Zen 4 and Zen 4c cores have the same L1 and L2 cache sizes, the caches take up less area in Zen 4c, but they are also not as fast as those inside Zen 4.
- Finally, it removed the through-silicon via (TSV) arrays used for 3D V-Cache to save further silicon area.
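To put the lower boost target in perspective, the cycle time is the reciprocal of the clock frequency. A rough sketch using only the two boost clocks quoted above; actual timing closure is done at each design's own target frequency and process corners, which are not public, so this only shows the scale of the extra time available per cycle:

```python
def period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def extra_slack_fraction(fast_ghz: float, slow_ghz: float) -> float:
    """How much longer each cycle is at the slower clock, as a fraction."""
    return period_ps(slow_ghz) / period_ps(fast_ghz) - 1.0


if __name__ == "__main__":
    zen4, zen4c = 3.70, 3.10  # boost clocks of EPYC 9654 and EPYC 9754
    print(f"Zen 4:  {period_ps(zen4):.1f} ps per cycle")
    print(f"Zen 4c: {period_ps(zen4c):.1f} ps per cycle")
    print(f"Extra time per cycle: {extra_slack_fraction(zen4, zen4c):.1%}")
```

Each cycle at 3.10 GHz is roughly 19% longer than at 3.70 GHz; under this simplified view, that margin is what designers can trade for smaller, slower cells and fewer buffers.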
These were not the only methods of die area reduction used by AMD. According to SemiAnalysis, AMD's Bergamo is based on eight Vindhya core complex dies (CCDs), each packing 16 Zen 4c cores (up from eight Zen 4 cores per CCD). This is justified because the cores got smaller, but it also affects clock speed potential. Each CCD features two eight-core core complexes (CCXs) and 32MB of L3 cache, or 16MB per CCX. By contrast, each Zen 4 CCX has 32MB of L3, which greatly increases its size compared to a Zen 4c CCX.
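The cache arithmetic behind these configurations can be checked against the table. A minimal sketch; the `EpycConfig` class and its field names are illustrative only, not taken from any AMD tool, and the figures are the ones listed in the table above:

```python
from dataclasses import dataclass


@dataclass
class EpycConfig:
    """Hypothetical container for the per-SKU figures from the table."""
    name: str
    ccds: int
    ccx_per_ccd: int
    cores_per_ccd: int
    l3_per_ccx_mb: int

    @property
    def cores(self) -> int:
        return self.ccds * self.cores_per_ccd

    @property
    def total_l3_mb(self) -> int:
        return self.ccds * self.ccx_per_ccd * self.l3_per_ccx_mb

    @property
    def l3_per_core_mb(self) -> float:
        return self.total_l3_mb / self.cores


configs = [
    EpycConfig("EPYC 9654 (Genoa)", ccds=12, ccx_per_ccd=1, cores_per_ccd=8, l3_per_ccx_mb=32),
    EpycConfig("EPYC 9754 (Bergamo)", ccds=8, ccx_per_ccd=2, cores_per_ccd=16, l3_per_ccx_mb=16),
    EpycConfig("EPYC 9734 (Bergamo)", ccds=8, ccx_per_ccd=2, cores_per_ccd=14, l3_per_ccx_mb=16),
]

if __name__ == "__main__":
    for c in configs:
        print(f"{c.name}: {c.cores} cores, {c.total_l3_mb} MB L3, "
              f"{c.l3_per_core_mb:.2f} MB L3 per core")
```

Per core, the 128-core Bergamo part ends up with half of Genoa's L3 (2 MB versus 4 MB), which is the trade-off described above.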
Overall, Zen 4c and Bergamo mark a shift in AMD's design trajectory, as the company needed to fit 128 Zen 4-class cores into the same 360W to 400W power envelope as Genoa. Reduced frequency targets, denser SRAM cells, and halving the L3 per CCX certainly enabled AMD to increase its core count, but how that affects per-core performance remains to be seen.
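A crude way to see the power-envelope pressure is to divide the TDP by the core count. This rough sketch deliberately ignores the I/O die, memory controllers, and other uncore power, which come out of the same budget, so real per-core budgets are lower than these figures:

```python
def watts_per_core(tdp_w: float, cores: int) -> float:
    """Naive share of package TDP per core (ignores I/O die and uncore power)."""
    return tdp_w / cores


if __name__ == "__main__":
    for name, cores in [("EPYC 9654", 96), ("EPYC 9754", 128)]:
        for tdp in (360, 400):
            print(f"{name} @ {tdp} W: {watts_per_core(tdp, cores):.2f} W per core")
```

At 360W, this naive split gives each Bergamo core 25% less power than a Genoa core (2.81W versus 3.75W).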
SemiAnalysis says that AMD is preparing to launch two Bergamo processors later this month: the 128-core EPYC 9754 and its slightly cut-down sibling, the 112-core EPYC 9734. Given that operators of hyperscale datacenters tend to have specific requirements for their deployments, we can only wonder how many custom and semi-custom Bergamo offerings AMD will eventually produce, but for now, two models are set to be introduced next week.
"You are going to hear about this next week with Bergamo, which is a cloud-native optimized device with high-density and very good performance-per-watt in energy efficiency for cloud-native computing," said Dan McNamara, AMD's server business chief, at the Bank of America 2023 Global Technology Conference (via SeekingAlpha).
Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.
AMD's EPYC 'Bergamo' processor packs 128 cores and sits in the same Socket SP5 as the 96-core EPYC 'Genoa' CPU
That is because it is drop-in compatible with the existing SP5 socket, and no software porting is required since it uses the same Zen 4 ISA.
So AMD also managed to fit twice the number of cores and threads, with the same L3 cache per CCD, into a die that is under 10% bigger than the Zen 4 CCD (72.7mm² vs 66.3mm²).
Considering that AMD used an 8-CCD package for its EPYC Bergamo CPUs, a 12-CCD package might be possible in the future as well, which could offer up to 192 cores and 384 threads.
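These figures can be checked with simple arithmetic. A sketch using only the two die areas quoted above (72.7 mm² for the 16-core Vindhya CCD, 66.3 mm² for the 8-core Durango CCD). These are whole-CCD areas, including L3 and interconnect, so the per-core numbers are not core-only areas. The 12-CCD configuration is the commenter's speculation, not an announced product:

```python
VINDHYA_MM2, VINDHYA_CORES = 72.7, 16  # Zen 4c CCD
DURANGO_MM2, DURANGO_CORES = 66.3, 8   # Zen 4 CCD


def area_increase(new_mm2: float, old_mm2: float) -> float:
    """Relative growth of one die area over another, as a fraction."""
    return new_mm2 / old_mm2 - 1.0


def ccd_area_per_core(area_mm2: float, cores: int) -> float:
    """Whole-CCD area (cores + L3 + interconnect) divided by core count."""
    return area_mm2 / cores


def package_cores_threads(ccds: int, cores_per_ccd: int, smt: int = 2) -> tuple[int, int]:
    """Cores and threads for a package with the given CCD count and 2-way SMT."""
    cores = ccds * cores_per_ccd
    return cores, cores * smt


if __name__ == "__main__":
    print(f"Vindhya vs Durango area: +{area_increase(VINDHYA_MM2, DURANGO_MM2):.1%}")
    print(f"CCD area per core: {ccd_area_per_core(VINDHYA_MM2, VINDHYA_CORES):.2f} mm² (Zen 4c) "
          f"vs {ccd_area_per_core(DURANGO_MM2, DURANGO_CORES):.2f} mm² (Zen 4)")
    print("Hypothetical 12-CCD part (cores, threads):", package_cores_threads(12, VINDHYA_CORES))
```

The Vindhya CCD comes out about 9.7% larger than Durango, consistent with the "under 10%" claim.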
bit_user
Could this possibly be the same core they used on Phoenix? Is there any chance that Bergamo uses the same N4 process for its CCDs?
Or, maybe Phoenix didn't go that route due to the impact on single-thread performance.
TJ Hooker
bit_user said: Could this possibly be the same core they used on Phoenix? Is there any chance that Bergamo uses the same N4 process for its CCDs?
Or, maybe Phoenix didn't go that route due to the impact on single-thread performance.
Yes, Phoenix uses Zen 4c, at least according to the SemiAnalysis article referenced in this article.
Here is the exact core breakdown. So there are just four partitions (L2, Front End, Execution, FPU) now with Zen 4c.
bit_user said: Could this possibly be the same core they used on Phoenix? Is there any chance that Bergamo uses the same N4 process for its CCDs?
Or, maybe Phoenix didn't go that route due to the impact on single-thread performance.
Nah, the original Phoenix doesn't use the exact same core design, but the upcoming Phoenix 2 does, using AMD's hybrid big.LITTLE-like design.
Codenamed Phoenix 2, this SKU will leverage a 2+4 design with two P-cores and four E-cores. The former refers to the standard Zen 4 cores used on the Ryzen 7000 processors, while the latter is internally known as the “classic dense core”.
But Phoenix 2 uses the dense core in the context of "efficiency" cores, whereas Bergamo is actually using a denser variant of the Zen 4c core, slightly different from Phoenix 2's E-core, if past documents and rumors are to be believed.
Phoenix has a variant that uses a big.LITTLE-like core design, containing a CCX in a 2+4 configuration that combines standard Zen 4 cores with a density-optimized variant of Zen 4 (Zen 4c/Zen 4 Dense).
Zen 4c and Zen 4 could be more alike than Intel's Performance and Efficiency cores. It seems that for the half-size Zen 4c, AMD is sacrificing just half of the L3 and some frequency. The Zen 4 core is also about double the physical size of a Zen 4c core, while Intel's P-core is approximately 3.5 times larger than its E-core.
AMD’s strategy of maintaining two mostly similar cores could prove advantageous.
Size ratio: AMD ~2:1.
Intel ~3.5:1.
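The ~2:1 estimate here and the article's "35.4% smaller" figure are easier to compare as ratios. A small sketch converting a percentage reduction into a size ratio; it assumes, as the article's wording suggests, that 35.4% compares the cores themselves rather than including each core's share of the L3:

```python
def size_ratio_from_reduction(reduction: float) -> float:
    """Convert 'X% smaller' (given as a fraction) into a big:small size ratio."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    print(f"35.4% smaller -> {size_ratio_from_reduction(0.354):.2f}:1")
    print(f"50.0% smaller -> {size_ratio_from_reduction(0.50):.2f}:1")
```

A 35.4% reduction corresponds to roughly 1.55:1, so a ratio near 2:1 presumably reflects a broader comparison than the core alone.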
qwertymac93
bit_user said: Huh?
The codename of the I/O die is "Floyd". A pink I/O die would be "Pink Floyd".
Har har.
usertests
This is the specific kind of information I wanted to know about Zen 4c density other than the L3 reduction, although some of it is still speculation for now. Looks good.
TJ Hooker
Admin said: On the microarchitecture level, Zen 4c retains the same design as Zen 4, including identical features and instructions-per-clock-performance
Is it realistic for Zen 4c to have the same IPC as Zen 4, despite having half the L3 per core/per CCX, and apparently slower cache overall (due to the denser SRAM cell structure)?
Edit: And that assumes that cache structure really is the only thing that's functionally changed.
bit_user
TJ Hooker said: Is it realistic for Zen 4c to have the same IPC as Zen 4, despite having half the L3 per core/per CCX, and apparently slower cache overall (due to the denser SRAM cell structure)?
In the SemiAnalysis table @Metal Messiah. posted, the L2 cells didn't shrink. If it's just the L3, and if its latency increased in proportion to the longer clock tick, then the software-visible latency (in cycles) would be the same, and your only concern would be the smaller capacity.
Just a hypothetical, but reducing software-visible latencies is one of the nice things about lowering clock speed. Energy-efficiency is another.
I'm definitely intrigued by AMD's approach. In future generations, I think they can't help but tweak other aspects of the microarchitecture, since the parameters optimized for high-speed cores will no longer be optimal for lower-speed ones. Not to mention area and power efficiency.
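The point about software-visible latency can be made concrete: a latency fixed in cycles corresponds to more wall-clock time at a lower clock, so a slower SRAM array could keep the same cycle count. A hypothetical sketch; the 50-cycle figure is an illustrative placeholder, not a measured Zen 4 or Zen 4c cache latency:

```python
def cycles_to_ns(cycles: int, freq_ghz: float) -> float:
    """Wall-clock time of a latency given in core cycles."""
    return cycles / freq_ghz


def ns_to_cycles(latency_ns: float, freq_ghz: float) -> float:
    """Core cycles spent waiting on a latency given in nanoseconds."""
    return latency_ns * freq_ghz


if __name__ == "__main__":
    HYPOTHETICAL_CYCLES = 50  # illustrative only, not a published latency
    fast = cycles_to_ns(HYPOTHETICAL_CYCLES, 3.70)
    slow = cycles_to_ns(HYPOTHETICAL_CYCLES, 3.10)
    print(f"{HYPOTHETICAL_CYCLES} cycles = {fast:.1f} ns at 3.70 GHz, "
          f"{slow:.1f} ns at 3.10 GHz")
```

Under this simplification, an array that takes about 19% longer in nanoseconds would still look identical to software at the lower clock, since the cycle count is unchanged.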