AMD's EPYC 'Bergamo' and Zen 4c Detailed: Same as Zen 4, But Denser

AMD
(Image credit: AMD)

The ever-increasing performance demands of cloud datacenters require CPU developers to rethink their designs in a bid to deliver maximum performance per socket while confronting the cost constraints set by the slowing of Moore's Law. AMD's EPYC 'Bergamo' is the industry's first x86 cloud-native CPU. It is based on the specially tailored Zen 4c microarchitecture, which maintains essentially the same feature set as the Zen 4 microarchitecture while halving core size requirements, reports SemiAnalysis.

AMD's EPYC 'Bergamo' processor packs 128 cores and sits in the same Socket SP5 as the 96-core EPYC 'Genoa' CPU. It has the same 12-channel DDR5-4800 memory subsystem and uses the same I/O die (codenamed Floyd), meaning that it also has 128 PCIe Gen5 lanes and the other peculiarities of SP5 products. As a cloud-native system-on-chip (SoC), and to some degree a response to emerging Arm-based datacenter-grade SoCs from Ampere, Amazon, Google, and Microsoft, Bergamo was shaped by multiple factors, including efficiency, power usage, die size, and low total cost of ownership (TCO), rather than by the aim to deliver maximum per-core performance.

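| Specification | EPYC 9654 | EPYC 9754 | EPYC 9734 |
|---|---|---|---|
| Design | Genoa | Bergamo | Bergamo |
| Microarchitecture | Zen 4/Persephone | Zen 4c/Dionysus | Zen 4c/Dionysus |
| Cores/Threads | 96/192 | 128/256 | 112/224 |
| L1i Cache (per core) | 32KB | 32KB | 32KB |
| L1d Cache (per core) | 32KB | 32KB | 32KB |
| L2 Cache (per core) | 1MB | 1MB | 1MB |
| Total L2 Cache | 96MB | 128MB | 112MB |
| L3 Cache per CCX | 32MB | 16MB | 16MB |
| Total L3 Cache | 384MB | 256MB | 256MB |
| CCD | Durango | Vindhya | Vindhya |
| CCD Count | 12 | 8 | 8 |
| CCX per CCD | 1 | 2 | 2 |
| Cores per CCD | 8 | 16 | 14 |
| I/O Die | Floyd | Floyd | Floyd |
| Memory Channels | 12 | 12 | 12 |
| Rated Memory Speed | DDR5-4800 | DDR5-4800 | DDR5-4800 |
| Memory Bandwidth | 460.8 GB/s | 460.8 GB/s | 460.8 GB/s |
| PCIe 5.0 Lanes | 128 | 128 | 128 |
| TDP/Max TDP | 360W/400W | 360W/400W | 360W/400W |
| Socket | SP5 | SP5 | SP5 |
| Scalability | 2P | 2P | 2P |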
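The totals in the table follow from the per-CCD and per-CCX figures. A minimal Python sketch of that arithmetic is below; the `Cpu` class and its field names are illustrative only, and the bandwidth calculation assumes each memory channel moves 8 bytes per transfer, which the article does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    """Per-part figures copied from the table; names are illustrative."""
    name: str
    ccd_count: int
    ccx_per_ccd: int
    cores_per_ccd: int
    l2_per_core_mb: int
    l3_per_ccx_mb: int
    mem_channels: int = 12
    mem_mts: int = 4800  # DDR5-4800: mega-transfers per second

    def cores(self) -> int:
        return self.ccd_count * self.cores_per_ccd

    def threads(self) -> int:
        return self.cores() * 2  # two threads per core, as in the table

    def total_l2_mb(self) -> int:
        return self.cores() * self.l2_per_core_mb

    def total_l3_mb(self) -> int:
        return self.ccd_count * self.ccx_per_ccd * self.l3_per_ccx_mb

    def mem_bandwidth_gbs(self, bytes_per_transfer: int = 8) -> float:
        # Assumption: 8 bytes per transfer per channel (not from the article).
        return self.mem_channels * self.mem_mts * bytes_per_transfer / 1000


PARTS = [
    Cpu("EPYC 9654", ccd_count=12, ccx_per_ccd=1, cores_per_ccd=8,
        l2_per_core_mb=1, l3_per_ccx_mb=32),
    Cpu("EPYC 9754", ccd_count=8, ccx_per_ccd=2, cores_per_ccd=16,
        l2_per_core_mb=1, l3_per_ccx_mb=16),
    Cpu("EPYC 9734", ccd_count=8, ccx_per_ccd=2, cores_per_ccd=14,
        l2_per_core_mb=1, l3_per_ccx_mb=16),
]

for p in PARTS:
    print(f"{p.name}: {p.cores()}C/{p.threads()}T, "
          f"L2 {p.total_l2_mb()}MB, L3 {p.total_l3_mb()}MB, "
          f"{p.mem_bandwidth_gbs():.1f} GB/s")
```

Under that assumption, the computed values match the table rows for all three parts.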

On the microarchitecture level, Zen 4c retains the same design as Zen 4, including identical features and instructions-per-clock (IPC) performance, but the cores are configured and implemented in a drastically different way, SemiAnalysis claims. Zen 4c 'Dionysus' cores are about 35.4% smaller than Zen 4 'Persephone' cores, according to SemiAnalysis. To achieve this, AMD had to implement a number of design tricks. The analysts believe:

  • It reduced boost clock targets from 3.70 GHz to 3.10 GHz. This made timing closure simpler, and the relaxed timing constraints decreased the need for extra buffer cells. Today's designs are often constrained by routing density and congestion, so lowering frequency allows for tighter packing of signal pathways, enhancing the density of standard cells.
  • It lowered the number of physical partitions of a die and packed logic closer together, which made debugging and introducing fixes harder but reduced die size.
  • It used denser 6T dual-port SRAM cells for Zen 4c as opposed to 8T dual-port SRAM circuits for Zen 4 to reduce SRAM area. As a result, while Zen 4 and Zen 4c cores have similar L1 and L2 cache sizes, the caches occupy less area in Zen 4c, but they are also not as fast as those inside Zen 4.
  • Finally, it removed the through-silicon via (TSV) arrays for 3D V-Cache to save further silicon.
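To put the figures above in proportion, here is a small Python sketch that converts the reported clock targets and core area reduction into relative terms. The function names are hypothetical, and the density figure is simple arithmetic on the 35.4% number rather than anything SemiAnalysis or AMD has stated.

```python
ZEN4_BOOST_GHZ = 3.70       # boost clock target cited for Zen 4
ZEN4C_BOOST_GHZ = 3.10      # boost clock target cited for Zen 4c
CORE_AREA_REDUCTION = 0.354  # Zen 4c core is about 35.4% smaller


def percent_drop(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100.0


def density_gain(area_reduction: float) -> float:
    """How many smaller cores fit in the area of one original core."""
    return 1.0 / (1.0 - area_reduction)


clock_drop = percent_drop(ZEN4_BOOST_GHZ, ZEN4C_BOOST_GHZ)
gain = density_gain(CORE_AREA_REDUCTION)

print(f"Boost clock target drop: {clock_drop:.1f}%")  # about 16.2%
print(f"Cores per unit of core area: {gain:.2f}x")    # about 1.55x
```

In other words, a roughly 16% lower boost target accompanies a core that packs about 1.55 times as densely, before counting any savings from the smaller L3.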

These were not the only methods of die area reduction used by AMD. According to SemiAnalysis, AMD's Bergamo is based on eight Vindhya core complex dies (CCDs) that each pack 16 Zen 4c cores, up from eight Zen 4 cores per CCD. The higher count is justified because the cores got smaller, but it also impacts clock speed potential. Each CCD features two eight-core core complexes (CCXs) and 32MB of L3 cache, or 16MB per CCX. By contrast, each Zen 4 CCX has 32MB of L3, which greatly increases its size compared to a Zen 4c CCX.
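The per-core effect of the smaller L3 can be worked out from the table figures. This is a minimal sketch with a hypothetical helper name, assuming the L3 in a CCX is shared evenly among its eight cores.

```python
def l3_per_core_mb(l3_per_ccx_mb: float, cores_per_ccx: int) -> float:
    """L3 capacity per core, assuming an even split across the CCX."""
    return l3_per_ccx_mb / cores_per_ccx


# Both CCX types hold eight cores; only the L3 capacity differs.
zen4_l3 = l3_per_core_mb(32, 8)   # Zen 4 CCX (Genoa)
zen4c_l3 = l3_per_core_mb(16, 8)  # Zen 4c CCX (Bergamo)

print(f"Zen 4:  {zen4_l3:.0f}MB of L3 per core")
print(f"Zen 4c: {zen4c_l3:.0f}MB of L3 per core")
```

Each Zen 4c core therefore has half the L3 share of a Zen 4 core, even though a full Bergamo CCD carries the same 32MB as a Genoa CCD.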

Overall, we could say that AMD's Zen 4c and Bergamo mark a shift in design trajectory, as the company needed to fit 128 Zen 4-class cores into the same 360W to 400W power envelope as Genoa. Reduced frequency targets, the use of denser SRAM cells, and halving L3 per CCX certainly enabled AMD to increase its core count, but how that impacted per-core performance is something that we will still have to find out.
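A rough way to see the power constraint is to divide the package TDP by the core count. The sketch below does only that; the helper name is hypothetical, and because the TDP also covers the I/O die and interconnect, the results are upper bounds on the per-core budget rather than measured core power.

```python
def watts_per_core(package_tdp_w: float, cores: int) -> float:
    """Upper bound on the per-core power budget: whole-package TDP / cores."""
    return package_tdp_w / cores


for tdp in (360, 400):
    genoa = watts_per_core(tdp, 96)     # EPYC 9654
    bergamo = watts_per_core(tdp, 128)  # EPYC 9754
    print(f"{tdp}W: Genoa {genoa:.2f} W/core, Bergamo {bergamo:.2f} W/core")
```

At either TDP, each Bergamo core gets three quarters of the budget available to a Genoa core, which is consistent with the lower frequency targets described above.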

SemiAnalysis says that AMD is preparing to launch two Bergamo processors later this month: the 128-core EPYC 9754 and its slightly cut-down sibling, the 112-core EPYC 9734. Given that operators of hyperscale datacenters tend to have specific requirements for their deployments, we can only wonder how many custom and semi-custom Bergamo offerings AMD will eventually produce, but for now two models are set to be introduced next week.

"You are going to hear about this next week with Bergamo, which is a cloud-native optimized device with high-density and very good performance-per-watt in energy efficiency for cloud-native computing," said Dan McNamara, AMD's server business chief, at the Bank of America 2023 Global Technology Conference (via SeekingAlpha).

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • Metal Messiah.
    AMD's EPYC 'Bergamo' processor packs 128 cores and sits in the same Socket SP5 as the 96-core EPYC 'Genoa' CPU

    Because it is drop-in compatible with the existing SP5 socket with no software port required since the same Zen 4 ISA is used.

    So, AMD also managed to fit twice the number of cores and threads with the same L3 cache within a die that's under 10% bigger than the Zen 4 CCD (72.7 mm² vs 66.3 mm²).

    Considering that AMD used an 8-CCD package for its EPYC Bergamo CPUs, it might be possible that we get a 12-CCD package in the future as well, which could offer up to 192 cores and 384 threads.
  • bit_user
    Could this possibly be the same core they used on Phoenix? Is there any chance that Bergamo uses the same N4 process for its CCDs?

    Or, maybe Phoenix didn't go that route due to the impact on single-thread performance.
  • -Fran-
    The real question here is if that I/O die* is pink.

    Regards.
  • bit_user
    -Fran- said:
    The real question here is if that I/O die* is pink.
    Huh?
  • TJ Hooker
    bit_user said:
    Could this possibly be the same core they used on Phoenix? Is there any chance that Bergamo uses the same N4 process for its CCDs?

    Or, maybe Phoenix didn't go that route due to the impact on single-thread performance.
    Yes, Phoenix uses Zen4c, at least according to the semianalysis article referenced in this article.
  • Metal Messiah.
    Here is the exact core breakdown: So just 4 partitions (L2, Front End, Execution, FPU) now with Zen4C.


    bit_user said:
    Could this possibly be the same core they used on Phoenix? Is there any chance that Bergamo uses the same N4 process for its CCDs?

    Or, maybe Phoenix didn't go that route due to the impact on single-thread performance.

    Nah, the original Phoenix doesn't use the exact same core design, but the upcoming Phoenix 2 does, using AMD's hybrid big.LITTLE-like design.

    Codenamed Phoenix 2, this SKU will leverage a 2+4 design with two P-cores and four E-cores. The former refers to the standard Zen 4 cores used on the Ryzen 7000 processors, while the latter is internally known as the “classic dense core”.

    But Phoenix uses the dense core in the context of "Efficiency" cores, whereas Bergamo is actually using a denser core variant of Zen 4c, slightly different from Phoenix 2's E-core if past documents and rumors are to be believed.

    Phoenix has a variant that uses a "big.LITTLE"-like core design, which contains a CCX with a 2+4 design with standard Zen 4 and a dense-optimized variant of Zen 4 (Zen 4c/Zen 4 Dense).

    Zen 4c and Zen 4 could be more alike than Intel's Performance and Efficiency cores. It seems that for the half-size Zen 4c, AMD is sacrificing just half of the L3 and some frequency. A Zen 4 core is also about double the physical size of a Zen 4c core, while Intel's P-core is approximately 3.5 times larger than its E-core.

    AMD’s strategy of maintaining two mostly similar cores could prove advantageous.

    Size ratio, AMD ~2:1.

    Intel ~3.5:1.
  • qwertymac93
    bit_user said:
    Huh?
    The codename of the I/O die is "Floyd". A pink I/O die would be "Pink Floyd".
    Har har.
  • usertests
    This is the specific kind of information I wanted to know about Zen 4c density other than L3 reduction, although some of it is still speculation for now. Looks good.
  • TJ Hooker
    Admin said:
    On the microarchitecture level, Zen 4c retains the same design as Zen 4, including identical features and instructions-per-clock-performance
    Is it realistic for Zen 4c to have the same IPC as Zen 4, despite having half the L3 per core/per CCX, and apparently slower cache overall (due to the condensed SRAM cell structure)?

    Edit: And that assumes that cache structure really is the only thing that's functionally changed.
  • bit_user
    TJ Hooker said:
    Is it realistic for Zen 4c to have the same IPC as Zen 4, despite having half the L3 per core/per CCX, and apparently slower cache overall (due to the condensed SRAM cell structure)?
    In the Semi-Analysis table @Metal Messiah. posted, the L2 cells didn't shrink. If it's just L3, and if their latency increased proportional to the longer clock tick, then the software-visible latency would be the same and your only concern would be the smaller capacity.

    Just a hypothetical, but reducing software-visible latencies is one of the nice things about lowering clock speed. Energy-efficiency is another.

    I'm definitely intrigued by AMD's approach. In future generations, I think they can't help but tweak other aspects of the microarchitecture, since the parameters optimized for high-speed cores will no longer be optimal for lower-speed ones. Not to mention area- and power- efficiency.