World's first hybrid CXL device combines flash memory and DRAM — storage tiering comes to remote memory over PCIe

Samsung CMM-H CXL Expansion Card
(Image credit: Samsung)

Samsung has unveiled a new Compute Express Link (CXL) add-in card called the CXL Memory Module-Hybrid for Tiered Memory (CMM-H TM), which adds DRAM and flash memory that CPUs and accelerators can access remotely. The expansion card pairs high-speed DRAM with NAND flash and is intended to provide a cost-effective way to boost memory capacity for servers without adding locally installed DDR5 memory, which often isn't an option in oversubscribed servers. 

Samsung's solution runs on Compute Express Link (CXL), an open industry standard that provides a cache-coherent interconnect between CPUs and accelerators, allowing CPUs to share memory regions with connected CXL devices. The remote memory, in this case a hybrid DRAM/flash device, is accessible over the PCIe bus at the cost of roughly 170-250ns of added latency, about the same as a NUMA hop. CXL was introduced in 2019 and is now in its third major revision, which adds PCIe 6.0 support.
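To put that latency figure in perspective, here is a back-of-the-envelope model of average memory access time as more of a workload's accesses land on CXL-attached memory. The ~210ns figure is the midpoint of the 170-250ns range quoted above; the 100ns local-DRAM latency is an assumed ballpark value, not from the article.

```python
# Toy average-access-time model for CXL-attached memory.
LOCAL_DRAM_NS = 100.0   # assumed local DDR5 load latency (ballpark)
CXL_DRAM_NS = 210.0     # midpoint of the quoted 170-250ns range

def avg_access_ns(cxl_fraction: float) -> float:
    """Blend local and CXL latencies by the share of accesses
    that fall on the CXL expander."""
    return (1.0 - cxl_fraction) * LOCAL_DRAM_NS + cxl_fraction * CXL_DRAM_NS

for frac in (0.0, 0.25, 0.5):
    print(f"{frac:.0%} of accesses on CXL -> {avg_access_ns(frac):.0f} ns average")
```

Even with half of all accesses going to the expander, the blended average stays well under the latency of any storage-class alternative, which is the bet these devices are making.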

The CXL spec supports three types of devices: Type 1 devices are accelerators that lack local memory, Type 2 devices are accelerators with their own memory (such as GPUs, FPGAs, and ASICs with DDR or HBM), and Type 3 devices are memory expanders. The Samsung device falls into the Type 3 category. 

CMM-H TM is an offshoot of Samsung's CMM-H CXL memory solution. Samsung says it is the world's first FPGA-based tiered CXL memory solution and is designed to "tackle memory management challenges, reduce downtime, optimize scheduling for tiered memory, and maximize performance, all while significantly reducing the total cost of ownership."

The CMM-H TM isn't as fast as locally attached DRAM, but it adds a beefy slab of capacity via its flash while hiding much of the latency with a caching feature built into the expansion card. Hot data is moved to the card's DRAM chips for speed, while less-used data stays in NAND. Samsung says this tiering happens automatically, though some applications and workloads can pass hints through an API to improve placement. Naturally, accesses that miss the DRAM cache and fall through to NAND add latency, which isn't ideal for all use cases, particularly those that rely on tight 99th-percentile performance. 
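The hot/cold tiering described above can be illustrated with a toy two-tier cache: a small, LRU-managed DRAM tier in front of a large NAND backing tier, with recently touched pages promoted to DRAM and the least-recently-used page demoted on overflow. This is only a conceptual sketch of the mechanism; the class, method names, and sizes are made up and bear no relation to Samsung's actual firmware.

```python
from collections import OrderedDict

class TieredMemory:
    """Toy model of DRAM-cached flash: hot pages live in a small
    LRU-managed DRAM tier, cold pages sit in the NAND tier."""

    def __init__(self, dram_pages: int):
        self.dram_pages = dram_pages
        self.dram = OrderedDict()   # page -> data, ordered by recency
        self.nand = {}              # backing store for cold pages

    def write(self, page: int, data: bytes) -> None:
        self._promote(page, data)

    def read(self, page: int) -> bytes:
        if page in self.dram:               # DRAM hit: fast path
            self.dram.move_to_end(page)
            return self.dram[page]
        data = self.nand[page]              # DRAM miss: slow NAND access
        self._promote(page, data)           # promote the now-hot page
        return data

    def _promote(self, page: int, data: bytes) -> None:
        self.dram[page] = data
        self.dram.move_to_end(page)
        while len(self.dram) > self.dram_pages:
            cold, cold_data = self.dram.popitem(last=False)
            self.nand[cold] = cold_data     # demote LRU page to NAND

tm = TieredMemory(dram_pages=2)
tm.write(0, b"a"); tm.write(1, b"b"); tm.write(2, b"c")  # page 0 demoted
print(sorted(tm.dram))  # pages 1 and 2 are hot
print(sorted(tm.nand))  # page 0 now lives in NAND
```

The API hints Samsung mentions would, in a model like this, amount to an application pre-promoting pages it knows it is about to touch instead of waiting for a miss.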

Samsung's new expansion card will provide its customers with new ways to expand their servers' memory capacity. This design paradigm is becoming more important as increasingly large language models demand more memory from their host machines and accelerators. 

Aaron Klotz
Freelance News Writer

Aaron Klotz is a freelance writer for Tom’s Hardware US, covering news topics related to computer hardware such as CPUs and graphics cards.

  • chaz_music
    Sounds just like the memory add-in cards from the IBM XT-compatible days (mid-1980s). I believe you had to load a driver to allow DOS to access the RAM as paged memory due to addressing issues. Yep, even then, engineers were not thinking ahead.
  • jkhoward
    Too bad Intel cancelled Optane; it could have been an excellent technology for AI workloads.
  • bit_user
    I wonder if it has a mode (+ capacitors or battery) that dumps the DRAM contents to NAND, upon power loss. If you limit the usable capacity just to what fits in DRAM, that would make it behave just like Optane, @jkhoward .

    Except, DRAM is an order of magnitude faster than Optane. Given how expensive Optane was (nearing DRAM prices), it would struggle to compete with such a device.
  • Amdlova
    Optane is cheap, Intel is greedy :)