Samsung Soups Up 96 AMD MI100 GPUs With Radical Computational Memory

AMD Radeon Instinct MI210 images and slides
(Image credit: AMD)

Samsung has built the world's first large-scale computing system using GPUs with built-in processing-in-memory (PIM) chips. These memory modules, which were loaded onto 96 AMD Instinct MI100 GPUs, increased AI training performance by 2.5x, according to a report by Business Korea.

PIM is a new generation of computer memory that can speed up computationally complex workflows handled by processors such as CPUs and GPUs. As the name suggests, each memory module is capable of processing data on its own, reducing the amount of data that needs to travel between the memory and the processor.
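To see why moving compute into the memory helps, consider a simple reduction (summing an array). The toy model below is purely illustrative and is not Samsung's actual HBM-PIM design: the bank count, elements per bank, and the workload are made-up assumptions, and it counts only bus traffic, ignoring latency and bandwidth details.

```python
# Toy back-of-the-envelope model of why processing-in-memory (PIM)
# reduces data movement. All constants are hypothetical.

NUM_BANKS = 16          # assumed number of memory banks
ELEMS_PER_BANK = 1024   # assumed elements stored per bank
BYTES_PER_ELEM = 4      # 32-bit values

def bytes_moved_conventional():
    # Conventional memory: every element must cross the memory bus
    # to the GPU, which then performs the reduction itself.
    return NUM_BANKS * ELEMS_PER_BANK * BYTES_PER_ELEM

def bytes_moved_pim():
    # PIM: each bank's embedded compute unit reduces its own data
    # locally, so only one partial sum per bank crosses the bus.
    return NUM_BANKS * BYTES_PER_ELEM

conventional = bytes_moved_conventional()
pim = bytes_moved_pim()
print(f"conventional: {conventional} bytes, "
      f"PIM: {pim} bytes, "
      f"reduction: {conventional // pim}x")
```

For this reduction-style workload the bus traffic shrinks by a factor equal to the elements per bank, which is the intuition behind PIM's performance and power gains on memory-bound AI kernels.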

Samsung originally demonstrated the PIM-modified GPUs in October, but only recently combined 96 of them in a cluster. Compared to MI100s with normal video memory, the modified chips not only performed 2.5x better but also cut power consumption by 2.67x, drastically increasing the GPUs' efficiency at running AI algorithms.

Samsung has been developing PIM for some time now. The company demoed several implementations in 2021, spanning memory types including DDR4, LPDDR5X, GDDR6, and HBM2. In LPDDR5 form, Samsung saw a 1.8x increase in performance, a 42.6% reduction in power consumption, and a 70% reduction in latency on a test program involving a Meta AI workload. Even more impressively, these results came from a standard server system with no modifications to the motherboard or CPU (all that changed was a swap to PIM-enabled LPDDR5 DIMMs).

Samsung isn't the only company developing PIM chips — SK hynix released its own PIM modules earlier this year. According to SK hynix's preliminary testing, its GDDR6-AiM (Accelerator in Memory) sped up AI processing by 16x and reduced power consumption by 80%. That's a lot quicker than Samsung's modified MI100s, but we don't know what SK hynix used for testing, so it's not a direct comparison.

Regardless, PIM looks like a potent solution to speeding up AI-accelerated workflows. "As the head of the AI research center, I want to make Samsung a semiconductor company that uses AI better than any other company," Choi Chang-kyu, vice president and head of the AI Research Center at Samsung Electronics Advanced Institute of Technology, told Business Korea. 

Aaron Klotz
Contributing Writer

Aaron Klotz is a contributing writer for Tom's Hardware, covering news related to computer hardware such as CPUs and graphics cards.

  • rluker5
    Nice. I want some.
    Reply
  • Co BIY
    But what Tom's wants to know:

    Will it run Crysis? etc...

    Can it be GAMED?
    Reply
  • bit_user
    Co BIY said:
    But what Tom's wants to know:

    Will it run Crysis? etc...

    Can it be GAMED?
    No, the MI100 has no texture engines, ROPs, or display controllers, nor does it support graphics APIs like OpenGL or Direct3D.

    The MI100 was AMD's first CDNA product. None of the CDNA products do graphics. They do have video decode acceleration, but that's just for the benefit of analyzing video streams with AI.
    Reply
  • bit_user
    This is just a taste. Wait until someone actually designs an entire accelerator around this stuff!

    The reason I say that is the PIM modules duplicate some functionality that's in the core compute die. So, if you removed that redundancy, it would free up some area in the core compute die for more of the compute that the PIM modules don't accelerate. The end result would be an even greater speedup!
    Reply