AMD Working to Bring CXL Memory Tech to Future Consumer CPUs

Renoir (Image credit: Fritzchens Fritz)

AMD representatives made an unexpected reveal today on the company's Meet the Experts webinar: AMD is working to bring CXL memory technology to its consumer CPUs over the next three to five years. This would bring persistent memory devices, like SSDs, onto the memory bus to improve performance. Compute Express Link (CXL) enables improved performance, lower latency, and memory expansion by bringing remote memory devices into the same pool as system DRAM. Think of it as expanding your memory by plugging an SSD, or more memory, into a device that slots into a PCIe or M.2 port.

Unlike Optane, which Intel is killing off due to poor adoption, CXL already enjoys broad industry support through an open protocol and can support many memory types. In fact, AMD and Intel, among many others, are working together on the new specification.

The Meet the Experts show covered a diverse range of topics, including AMD's AM5 platform, DDR5 memory, and PCIe 5.0 SSDs. The host then opened the floor to questions. In response to a question about why storage devices aren't connected to the memory bus, AMD's Senior Developer Manager Leah Schoeb explained that persistent memory devices (like SSDs) and system memory currently speak different protocols, which prevents them from communicating directly.

"[...]It's not that in the future, we won't be bridging that communication. That's something that we're looking at with technologies such as CXL. So you'll find over the next, you know, three to five years, you'll see it first in the server area, but you'll find moving down into the client [consumer] area, ways that we can make sure that memory and storage can communicate on the same bus through CXL."

The host asked Phison's Senior Manager of Technical Marketing, Chris Ramseyer, if the company had any more to add to the topic.

"Well, to be honest, I'm on calls about this. Some of those are with Leah. I'm not sure how much I can really give out. We haven't announced anything in this area. But I can say that there is progress being made. And, again, this will be another ecosystem-type project, where it's not just going to be Phison and not just AMD putting this together. We're all going to have to work together to do this, and these collaborations have really advanced PCs over the last few years[...]," he commented.

As a reminder, the CXL spec is an open industry standard that provides a cache coherent interconnect between CPUs and accelerators, like GPUs, smart I/O devices, DPUs, and various flavors of DDR4/DDR5 and persistent memories. The interconnect allows the CPU to work on the same memory regions as the connected devices, thus improving performance and power efficiency while reducing software complexity and data movement.
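
Notably for developers, this doesn't require an exotic programming model: on Linux, CXL-attached memory is presented as a CPU-less NUMA node, so existing NUMA APIs can already reach it. Below is a minimal sketch, assuming libnuma is installed and the CXL memory shows up as node 1 (the node ID is an assumption for illustration):

```c
/* Minimal sketch: allocating from CXL-attached memory on Linux, assuming it
 * is exposed as a CPU-less NUMA node. Node 1 is a hypothetical node ID.
 * Build with: gcc cxl_alloc.c -o cxl_alloc -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int cxl_node = 1;                  /* hypothetical CXL memory node */
    size_t size = 64UL * 1024 * 1024;  /* 64 MiB */

    /* Ask the kernel to place these pages on the CXL node. */
    char *buf = numa_alloc_onnode(size, cxl_node);
    if (buf == NULL) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }

    /* Touch the memory so the pages actually fault in on that node. */
    memset(buf, 0, size);
    printf("Allocated %zu bytes on NUMA node %d\n", size, cxl_node);

    numa_free(buf, size);
    return 0;
}
```

Because CXL memory looks like just another NUMA node, existing allocators and the kernel's own page-placement logic can use it without modification.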

However, the protocol requires dedicated silicon in both the host CPU and the connected device, whether that's memory, persistent memory, a GPU, or another accelerator. That means the feature has to be baked into the chip, and as with any new technology, CXL will take some time to mature.

The first CXL-capable processors are right around the corner, though: Intel's Sapphire Rapids and AMD's EPYC Genoa will come with early revisions of the specification built around the PCIe 5.0 interface. New revisions of the CXL spec still under development will support PCIe 6.0 and more sophisticated capabilities, like memory sharing and pooling. AMD will reveal its EPYC Genoa server chips in a live stream on November 10, while Sapphire Rapids is expected to arrive early next year, so CXL technology is on the cusp of real-world use.

AMD's disclosure today doesn't give us a specific date or chip generation for CXL support in consumer CPUs, but the three-to-five-year window mentioned suggests it could arrive well after PCIe 6.0 devices, which are expected to debut in the 2024 timeframe.

Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • -Fran-
    Hm... I don't see how CXL would be really good on consumer systems just yet, but it is definitely a "nice to have" right now. I wonder if by the time Intel and AMD have a platform with solid CXL support, that is when we'll get proper examples on how CXL will benefit us poor plebs.

    Anyone with some good examples? Other than "soopah fast gaems loadz", that is.

    Regards.
  • rluker5
    "Unlike Optane, which Intel is killing off due to poor adoption, CXL already enjoys broad industry support through an open protocol. In fact, AMD and Intel, among many others, are working together on the new specification. "
    And we already have both pagefile and more ram than most need (32GB for good DDR5 performance). Windows10 doesn't even fill your extra ram well and some prefer it.

    I'm a fan of Optane, so hopefully I'm not too biased to see something here, but what are the benefits of including something with orders of magnitude worse latency than Optane in your RAM pool if Optane wasn't worth it? Optane DIMMs had a read latency as low as 250ns. ns! Compared to 100+ microseconds for NAND. Not even in the same prefix.

    I totally wanted my OS suspended in persistent memory so I could forgo the penalties of using a traditional filesystem, and while it would still be nice, having my PC fishing for bits and pieces in NAND is halfway to using just an HDD. At that point, why even bother?
  • mamasan2000
    To my knowledge, CXL aims to "solve" the problem with machine learning and other similar tasks where you have huge datasets. You can't fit it all in RAM. Just not possible. But you want the next storage tier to be as close to RAM as possible so it doesn't bottleneck like crazy. Seems like SSDs will be the cache for DRAM. And SSDs are much cheaper and bigger in terms of storage than RAM. So it is about cost as well.
    The other trend seems to be that RAM can't keep up with CPUs anymore. Say a CPU has improved throughput by 50% while RAM only increased bandwidth by 20%. And it gets worse year by year. HBM seems to be one solution, but an expensive one. Then there are calculations done on the RAM sticks themselves. RAM with compute. Seems to be very simple computational tasks, but even a little work done is better than zero, if the CPU is busy.
    This might be of interest, even if it isn't an exact answer: https://www.youtube.com/watch?v=6eQ7P2TD7is
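
For context on the comment above: the pre-CXL way to work on a dataset larger than RAM is to mmap() the file and let the OS demand-page it from the SSD, with the page cache acting as the DRAM tier; CXL promises the same kind of tiering at far lower latency and with cache coherence. A minimal C sketch ("dataset.bin" is a hypothetical file name):

```c
/* Sketch of the pre-CXL approach: mmap a file larger than RAM and let the
 * OS demand-page it from the SSD as it is touched. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("dataset.bin", O_RDONLY);  /* hypothetical dataset file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* Map the whole file; pages are pulled from the SSD only when touched. */
    const unsigned char *data =
        mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    /* Stream over the data; the page cache acts as the DRAM "tier". */
    unsigned long checksum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        checksum += data[i];
    printf("checksum: %lu\n", checksum);

    munmap((void *)data, st.st_size);
    close(fd);
    return 0;
}
```
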
  • InvalidError
    -Fran- said:
    Hm... I don't see how CXL would be really good on consumer systems just yet, but it is definitely a "nice to have" right now.
    I can imagine at least one obvious use-case: once CPUs have 8+GB of on-package RAM, the DDR interface can be ditched in favor of 32 additional PCIe links for CXL and then you can simply use CXL-to-whatever riser cards for extra memory of whatever type your needs dictate. Need a good amount of cheap memory? Get a DDR4 riser. Need a massive amount of space for a resident-in-memory database? Get a NVDIMM riser. Want something for which additional DIMMs will still be available 4-5 years from now? Get a DDR5 riser. Need to add memory but DDR5 is no longer obtainable for a reasonable price after being supplanted by DDR6? Get a DDR6 riser. SO-DIMMs are significantly cheaper than DIMMs? Get a SO-DIMM riser instead of a standard DIMM one.

    By switching out the DRAM interface for CXL, you could use whatever memory type you want in whatever form factor you want for expansion as long as there is an appropriate riser for it. No need to worry about which CPU is compatible with what memory type.

    mamasan2000 said:
    HBM seems to be one solution but an expensive one.
    The main reason HBM is expensive is low volume. Start using it everywhere and the price should drop close to parity with normal memory since it is fundamentally the same, just with a different layout to accommodate the wide interface and stacking.
  • jasonf2
    CXL, from what I have read, will allow a disaggregation of the entire memory structure from bound devices. So imagine a machine infrastructure where, instead of having dedicated memory for individual components, all memory for all devices coexists in a memory pool with many-to-many communication capabilities. The huge implication I could see in the consumer space is the ability to sidestep many of the physical memory limits presented by dedicated discrete GPU and DIMM modules running independently. Honestly, the datacenter is going to get the most gain. Being able to have multiple CPUs share the memory pool by direct many-to-many communication will reduce TCO while improving performance. But in that same vein, it is at least plausible to have a GPU with no onboard memory (or a very small base amount) that pulls directly from the main system pool. Such a setup would have a cost advantage in mid-tier setups. While shared memory has been in play since the 90s in the prebuilt scene, it really hasn't been that great because of the latency and overhead necessary to communicate between multiple buses. If it is all on the same interconnect layer, a huge part of that issue should be mitigated. The many-to-many ability should also let a whole new set of accelerators be built that can precache slow-to-fast storage and reduce CPU overhead.
  • SSGBryan
    -Fran- said:

    Anyone with some good examples? Other than "soopah fast gaems loadz", that is.

    Regards.

    I do 3D art - anything that can get my render engine more memory is a fine thing, and something that I am very interested in.

    At the hobbyist level, I was running out of RAM (128GB) a decade ago. The ability to add a TB of RAM to my render engine would really be helpful, since we have too many idiot vendors that believe 8K texture maps are a good thing.
  • bit_user
    I'd seen speculation that Apple might do this for their next Mac Pro. They still could beat AMD to the punch, but it's good to see AMD innovating.
  • bit_user
    -Fran- said:
    Hm... I don't see how CXL would be really good on consumer systems just yet, but it is definitely a "nice to have" right now. I wonder if by the time Intel and AMD have a platform with solid CXL support, that is when we'll get proper examples on how CXL will benefit us poor plebs.
    Imagine an era where CPUs have in-package DRAM, like Apple's M-series. That lowers latency, potentially increases speed (i.e. if they take advantage of being in-package by adding more memory channels), and could decrease motherboard cost/complexity, due to no DIMM slots or need to route connections to them.

    Great! So, what's the down side? The main one would be capacity. High-end users will bristle at the limits of what can be had in-package, plus there's no ability to upgrade over time (i.e. without replacing your entire CPU + RAM, making it an expensive proposition).

    So, the way out of this conundrum is to add CXL links, which can now even be switched (i.e. enabling them to fan out to a much larger number of devices), and put your memory on the CXL bus. The down side of this is that CXL memory would be much slower to access. Still, waaay faster than SSDs, but slower than directly-connected DIMMs. That can be mitigated by the use of memory tiers, where frequently-used memory pages can be promoted to the "fast" memory, in package. Infrequently-used pages can be evicted and demoted to "slow" memory, out on CXL. This isn't merely a fiction, either. Support for memory tiering and page promotion/demotion already exists in the Linux kernel.

    What this means is that you could scale out your memory to TBs, in a relatively low-cost workstation. You'd just need a motherboard with a CXL switch and enough slots. Or, if you get a cheaper board without a switch, you can still add RAM while enjoying the benefits that in-package memory stacks can provide.
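
The tiering described in the comment above is automatic in the kernel, but the promotion step can be illustrated from user space with libnuma's page-migration call. A minimal sketch, assuming node 0 is the fast in-package tier and node 1 the slow CXL tier (both node IDs are hypothetical):

```c
/* Sketch of "promoting" a hot page from a slow memory node to a fast one,
 * mirroring what the Linux kernel's memory tiering does automatically.
 * Node IDs are hypothetical. Build with: gcc promote.c -o promote -lnuma */
#include <numa.h>
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    long page_size = sysconf(_SC_PAGESIZE);
    int slow_node = 1, fast_node = 0;   /* hypothetical tier layout */

    /* Start with one page resident on the slow ("CXL") node. */
    char *page = numa_alloc_onnode(page_size, slow_node);
    if (page == NULL) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }
    memset(page, 0xAA, page_size);      /* fault the page in */

    /* "Promote" the page: ask the kernel to migrate it to the fast node. */
    void *pages[1] = { page };
    int nodes[1]   = { fast_node };
    int status[1];
    if (numa_move_pages(0 /* this process */, 1, pages, nodes, status,
                        MPOL_MF_MOVE) != 0) {
        perror("numa_move_pages");
    } else {
        printf("page now on node %d (negative means error)\n", status[0]);
    }

    numa_free(page, page_size);
    return 0;
}
```
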
  • bit_user
    rluker5 said:
    having my pc fishing for bits and pieces in Nand is 1/2 the way to using just an HDD. At that point why even bother?
    CXL has native support for DRAM. It's like accessing memory on your dGPU, except faster (and cache-coherent).
  • bit_user
    mamasan2000 said:
    To my knowledge, CXL aims to "solve" the problem with machine learning and other similar tasks where you have huge datasets.
    Like PCIe, there's not only one use case for it. Another is servers with very large in-memory databases.