AMD CEO Lisa Su unveiled the first details about the company's EPYC Milan-X processors, which come with a 3D-stacked L3 cache called 3D V-Cache, during its Accelerated Data Center event today. AMD says that its new cache-stacking technology, which it will add to the existing Zen 3-powered EPYC Milan models to create the new Milan-X chips, will bring up to 768MB of total L3 cache per chip. That means there will soon be dual-socket servers with an eye-popping 1.5 GB of L3 cache in the system. AMD also shared a few examples of workloads that will benefit, along with an impressive benchmark result that shows a 66% performance improvement.
The chips will come to market in Q1 2022, but they are available as a preview instance in Azure now. Microsoft has released its own performance projections, too, which we cover below.
As a quick refresher, AMD introduced its 3D V-Cache technology at CES 2021, showing a third-gen Ryzen prototype outfitted with an additional chunk of L3 cache. 3D V-Cache uses a novel new hybrid bonding technique that fuses an additional 64MB of 7nm SRAM cache stacked vertically atop the Ryzen compute chiplets to triple the amount of L3 cache per Ryzen chip. AMD claims that brings up to a 15% performance improvement in some games, meaning those chips will vie for the title of Best CPU for gaming when they come to market early next year. We've since learned many more details about those chips, including deep-dive info on the packaging tech at a Hot Chips presentation earlier this year.
AMD EPYC Milan-X Specifications
Now AMD is bringing this same tech to its long-rumored Milan-X data center processors, but it hasn't yet shared detailed specifications of the new chips. However, it has confirmed via its briefings and endnotes that the chips will come in at least 16-, 32- and 64-core variants, lining up with an earlier leaked list of the product stack. In fact, we've even seen them listed for sale at a B2B retailer. Here are the purported specs:
| Processor | Cores / Threads | Base Clock | Boost Clock | TDP | L3 Cache (L3 + 3D V-Cache) |
|---|---|---|---|---|---|
| EPYC 7773X | 64 / 128 | 2.2 GHz | 3.5 GHz | 280 W | 768 MB |
| EPYC 7573X | 32 / 64 | 2.8 GHz | 3.6 GHz | 280 W | 768 MB |
| EPYC 7473X | 24 / 48 | 2.8 GHz | 3.7 GHz | 240 W | 768 MB |
| EPYC 7373X | 16 / 32 | 3.05 GHz | 3.8 GHz | 240 W | 768 MB |
As with the consumer variants, AMD stacks a single 6 x 6mm slice of L3 cache directly atop the L3 cache already present on each CCD (compute chiplet).
Each CCD has 32MB of L3 cache before the modification. The vertically stacked L3 slice adds another 64MB, bringing the total to 96MB per CCD. The Milan-X lineup stretches up to 64-core models with eight CCDs, which brings the total to 768MB of L3 cache per chip. AMD has confirmed that the design supports stacking multiple layers of L3, and HardwareLuxx has even found server BIOS settings that enable up to four cache stacks per chip on existing AMD EPYC Milan servers.
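The cache math above can be sanity-checked with a quick back-of-the-envelope sketch (all figures come from AMD's public statements; the constants are just the numbers quoted in this article):

```python
# Cache totals for the top-end Milan-X configuration, per the article's figures.
BASE_L3_PER_CCD_MB = 32   # L3 already present on each Zen 3 CCD
VCACHE_PER_CCD_MB = 64    # one stacked 3D V-Cache slice per CCD
CCDS_PER_CHIP = 8         # 64-core Milan-X uses eight CCDs

l3_per_ccd_mb = BASE_L3_PER_CCD_MB + VCACHE_PER_CCD_MB   # 96 MB per CCD
l3_per_chip_mb = l3_per_ccd_mb * CCDS_PER_CHIP           # 768 MB per chip
l3_dual_socket_gb = 2 * l3_per_chip_mb / 1024            # 1.5 GB per 2P server

print(l3_per_ccd_mb, l3_per_chip_mb, l3_dual_socket_gb)  # 96 768 1.5
```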
The stacked L3 cache adds roughly 10% to overall latency, which is comparable to the latency impact of simply adding the same capacity with standard on-die techniques. That's partly because the additional L3 slice is somewhat 'dumb': all of the control circuitry resides on the existing CCD, which keeps the latency overhead down. In addition, the higher L3 hit rate means fewer trips to main memory, which relieves bandwidth pressure on DRAM, thus reducing latency and improving application performance along multiple axes.
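The trade-off described above (a slightly slower but much larger L3) can be illustrated with the textbook average-memory-access-time (AMAT) model. The hit rates and latencies below are illustrative assumptions for the sketch, not measured Milan or Milan-X figures:

```python
def amat(l3_hit_ns: float, l3_hit_rate: float, dram_ns: float) -> float:
    """Textbook average memory access time for requests that reach L3:
    every access pays the L3 lookup; misses additionally pay the DRAM trip."""
    return l3_hit_ns + (1.0 - l3_hit_rate) * dram_ns

# Illustrative numbers only (assumed for this sketch, not AMD's data):
base = amat(l3_hit_ns=10.0, l3_hit_rate=0.70, dram_ns=90.0)     # standard L3
stacked = amat(l3_hit_ns=11.0, l3_hit_rate=0.90, dram_ns=90.0)  # ~10% slower L3, higher hit rate

print(base, stacked)  # 37.0 20.0
```

Under these assumed numbers, the higher hit rate more than offsets the ~10% slower L3, which is the mechanism the paragraph above describes.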
AMD uses the same Zen 3 cores as normal; the control circuitry for 3D V-Cache was added as a forward-looking design choice during the initial design phase. Because AMD uses the existing EPYC Milan chips as the building block, Milan-X will drop into the SP3 sockets of existing EPYC servers (a BIOS update is required). That reduces qualification time and speeds time to market.
AMD reiterated many of the benefits of the solder-less hybrid bonding technique that enables 3D V-Cache, like a 200X interconnect density increase over 2D chiplets and a 15X density increase and 3X energy efficiency gain over micro-bump 3D packaging. AMD says hybrid bonding also improves thermals, transistor density, and interconnect pitch over other 3D approaches, making it the most flexible active-on-active silicon stacking tech.
Additionally, AMD says no software modifications are required to leverage the increased cache capacity, though it is working with several partners to create certified software packages. Those packages might see further performance optimizations, too.
AMD says Milan-X will provide up to a 50% uplift in certain 'targeted workloads' that largely consist of various product development software. That includes computational fluid dynamics (CFD), finite element analysis (FEA), structural analysis, and electronic design automation (EDA), with the latter involving chip design.
AMD touted the performance of the existing AMD EPYC Milan models, showing a dual-socket EPYC 75F3 system beating a dual-socket Intel Xeon 8362 system in three workloads, but those benchmarks don't include Milan-X.
AMD avoided a head-to-head Milan-X comparison against Intel's chips, instead showing a 66% performance uplift for a 16-core Milan-X over its standard 16-core EPYC model in a chip design (EDA) RTL verification workload with Synopsys VCS. We've included the test endnotes at the bottom of the article.
AMD says that Milan-X will benefit a broader selection of workloads, too, which you can find in the album above. The company also listed several ISVs that are already working on certified software packages, like Altair, Cadence, Synopsys, and others. Those certified solutions will be ready at launch.
AMD hasn't yet released official specs or pricing, but we'll update as that information becomes available. The chips come to market in Q1 2022.
Update: Azure HBv3 VMs with Milan-X CPUs
Microsoft has published documentation for the Milan-X-powered HBv3 VMs, and we have an in-depth analysis of the VMs and benchmarks in a separate article. Here's the short version, with Microsoft's performance projections, VM size details, and technical overview:
- Up to 80% higher performance for CFD workloads
- Up to 60% higher performance for EDA RTL simulation workloads
- Up to 50% higher performance for explicit finite element analysis workloads
- Up to 120 AMD EPYC 7V73X CPU cores (EPYC with 3D V-cache, “Milan-X”)
- Up to 96 MB L3 cache per CCD (3x larger than standard Milan CPUs, and 6x larger than “Rome” CPUs)
- 350 GB/s DRAM bandwidth (STREAM TRIAD), up to 1.8x amplification (~630 GB/s effective bandwidth)
- 448 GB RAM
- 200 Gbps HDR InfiniBand (SRIOV), Mellanox ConnectX-6 NIC with Adaptive Routing
- 2 x 900 GB NVMe SSD (3.5 GB/s (reads) and 1.5 GB/s (writes) per SSD, large block IO)
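Microsoft's effective-bandwidth claim in the list above is just the measured STREAM TRIAD number scaled by the stated cache-driven amplification factor, which is easy to verify:

```python
# Effective-bandwidth figure from Microsoft's HBv3 projections above.
stream_triad_gbs = 350.0   # measured DRAM bandwidth (STREAM TRIAD)
amplification = 1.8        # Microsoft's stated uplift from the larger L3

effective_gbs = stream_triad_gbs * amplification
print(effective_gbs)  # 630.0 GB/s, matching the "~630 GB/s effective" claim
```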
Be sure to hit Microsoft's Azure post for even more EPYC Milan-X benchmarks.
That looks TOO good to be true... 768MB of* cache for a single CCD config? Is that correct? If so, that's basically ThreadRipper 3D when it comes out :O
Such a weird setup that would be, haha.
AMD is again striking down the memory-access-speed bottleneck that haunts modern very-high-core-count CPUs. The first strike was on the memory controllers, when the first multicore CPUs bottlenecked on them.
This would require sanely sized TLBs and more page-table walkers to keep up with the increased load, though (naturally, when you access more memory in the same timespan, you need larger TLBs to avoid bottlenecking in the page-translation phase). I wonder if and how they'll manage to fit all of this into an already tight thermal budget.
Also, on a side note, growing caches this much suggests there aren't great expectations for DDR5's evolution.
For practical purposes, having that much cache means a virtualized OS kernel's juiciest bits will almost always reside in cache, and that can't be a bad thing.
The reason all the processors listed have 768 MB of cache is that they all have 8 active CCDs (with 96MB of L3 per CCD, 32MB on the CCD itself + 64MB of 3D V-cache). They just have varying numbers of cores enabled per CCD.
Interesting then! So there's a good chance the plebeian desktop parts with 3D cache will be 192MB total L3 (I'm pretty sure it's L4 >_>) for all SKUs if we're lucky XD
Then again, they could segment the market even more like that... A 6c/12t with 96MB and another 6c/12t with 192MB; that would be hella interesting.
There's no information in this article on desktop Ryzen, so why bother asking about it?
It's clearly talking about enterprise products.