Ampere sneaks out a 192-core CPU with 12-channel DDR5 memory

Ampere
(Image credit: Ampere)

Ampere Computing on Tuesday quietly added several new processors to its AmpereOne family, without a formal announcement or press briefings. The quiet release comes after the company was bought by SoftBank. The new AmpereOne M CPUs feature a 12-channel DDR5 memory subsystem and are aimed at applications that require more memory capacity as well as bandwidth. They offer from 96 to 192 cores and require new motherboards.

The AmpereOne M CPU family uses a 7228-pin FCLGA socket and includes six processors with 96, 144, 160, or 192 single-threaded custom Armv8.6+ cores operating at up to 3.60 GHz, each equipped with 2MB of L2 cache.

The processors also feature 64 MB of system level cache. The key feature of the new CPUs compared to their predecessors is their 12-channel memory subsystem, which supports a maximum of one DIMM per channel and up to 3TB of addressable DDR5-5600 capacity. The memory subsystem is ECC-protected using SECDED and Symbol ECC to make it suitable for cloud datacenter workloads. 
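As a quick sanity check on the capacity figure, the DIMM size implied by 3TB can be worked out directly. This is a minimal sketch that assumes all 12 channels are populated at one DIMM per channel and that the 3TB figure uses binary units (1TB = 1024GB); the function name `dimm_size_gb` is made up for this illustration.

```python
def dimm_size_gb(total_tb: float, channels: int, dimms_per_channel: int = 1) -> float:
    """Per-DIMM capacity in GB needed to reach a total capacity.

    Assumes binary units (1 TB = 1024 GB) and a fully populated system.
    """
    total_gb = total_tb * 1024
    return total_gb / (channels * dimms_per_channel)


# AmpereOne M: 3 TB across 12 channels, one DIMM per channel
print(f"{dimm_size_gb(3, 12):.0f} GB per DIMM")
```

Under those assumptions, reaching the maximum capacity requires 256GB modules in every slot.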

CPU Model           Cores  Frequency (GHz)  Power (W)
AmpereOne A192-32M  192    3.2              348
AmpereOne A192-26M  192    2.6              278
AmpereOne A160-28M  160    2.8              262
AmpereOne A144-33M  144    3.3              334
AmpereOne A144-26M  144    2.6              239
AmpereOne A96-36M   96     3.6              331
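
For a rough sense of how the models trade core count against clock speed and power, the table's figures can be turned into watts per core. This is a back-of-the-envelope sketch only: it assumes the Power column is each chip's rated package power, and the helper name `watts_per_core` is hypothetical.

```python
# (model, cores, frequency in GHz, power in W), copied from the table above
SKUS = [
    ("A192-32M", 192, 3.2, 348),
    ("A192-26M", 192, 2.6, 278),
    ("A160-28M", 160, 2.8, 262),
    ("A144-33M", 144, 3.3, 334),
    ("A144-26M", 144, 2.6, 239),
    ("A96-36M", 96, 3.6, 331),
]


def watts_per_core(cores: int, power_w: float) -> float:
    """Rated power divided evenly across cores (ignores uncore and I/O power)."""
    return power_w / cores


for model, cores, freq_ghz, power_w in SKUS:
    print(f"{model}: {freq_ghz} GHz, {watts_per_core(cores, power_w):.2f} W/core")
```

Because uncore and I/O power are lumped in, the per-core numbers are only useful for comparing the models with each other.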

AmpereOne M processors consume up to 348W. To keep power consumption in check, the CPUs combine dynamic voltage and frequency scaling, adaptive voltage control, and fine-grained thermal sensors.

On the I/O front, the processor supports 96 PCIe 5.0 lanes with bifurcation capabilities down to x4 and has 24 device controllers to connect multiple accelerators, SSDs, network cards, and other high-performance components needed in AI and cloud deployments.

Ampere's AmpereOne M processors are still made on TSMC's N5 process technology, just like their predecessors, so the additional memory channels are indeed the key feature of the new CPUs. These processors can process up to 192 threads per socket, half as many as AMD's 192-core EPYC 9965, which supports simultaneous multithreading and can therefore process up to 384 threads simultaneously.

But perhaps the purpose of AmpereOne M is not only to offer support for 3TB of memory to interested parties in the AI space, but also to set the stage for the company's next-generation AmpereOne MX processors, which will feature 256 cores and 12 DDR5 memory channels. This upcoming CPU will be made on TSMC's N3 manufacturing process and will therefore add both features and performance efficiency to its list of advantages.

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • derekullo
    828 gigabytes a second of memory bandwidth with DDR5-5600!!!
  • Firestone
    Very exciting times for ARM. Give it another decade and maybe I won't have any more x86 software I'm dependent on keeping me from jumping ship. For those who can already do full arm workloads this sounds great.
  • bit_user
    derekullo said:
    828 gigabytes a second of memory bandwidth with DDR5-5600!!!
    No, it's 537.6 GB/s (nominal), by my math.

    For server memory bandwidth, Intel's Granite Rapids takes the cake. Its MRDIMM-8800 yields a nominal bandwidth of 844.8 GB/s. However, x86 uses memory bandwidth less efficiently, except in rare cases. So, for typical apps, this will behave more like 633.6 GB/s, or so.

    This move basically just catches AmpereOne up to the DRAM width and capacity of Intel's Granite Rapids and AMD's Genoa and Turin. I think they both, at least theoretically, support 2 DPC. Probably not if each DIMM is quad-ranked, though. Anyway, Intel claims Granite Rapids can match AmpereOne M on memory capacity. I'm not sure about Turin, but I'd guess it does too.

    BTW, when putting so much memory in a server, the DIMMs burn so much power that the CPU cores might as well go ahead and run a little faster. In such a configuration, I think Ampere's emphasis on power-efficiency isn't quite such an advantage as they claim it to be.
  • derekullo
    bit_user said:
    No, it's 537.6 GB/s (nominal), by my math.

    For server memory bandwidth, Intel's Granite Rapids takes the cake. Its MRDIMM-8800 yields a nominal bandwidth of 844.8 GB/s. However, x86 uses memory bandwidth less efficiently, except in rare cases. So, for typical apps, this will behave more like 633.6 GB/s, or so.
    https://www.crucial.com/articles/about-memory/everything-about-ddr5-ram
    Math isn't 69x12?
  • bit_user
    Firestone said:
    Give it another decade and maybe I won't have any more x86 software I'm dependent on keeping me from jumping ship. For those who can already do full arm workloads this sounds great.
    If you're more concerned about reducing operating costs than improving runtime, you could take a look at emulation. It might only run 70% as fast, but it would let you use much cheaper ARM instances and then you could probably more than make up for the performance loss by running a lot more of them.
  • twin_savage
    derekullo said:
    828 gigabytes a second of memory bandwidth with DDR5-5600!!!
    I'd expect about 400GB/s peak memory bandwidth on these 12 channel 5600MT/s AmpereOne platforms. The custom core they use has fairly bad memory concurrency compared to Apple's implementation of ARM, which drags down real world performance.
  • bit_user
    twin_savage said:
    I'd expect about 400GB/s peak memory bandwidth on these 12 channel 5600MT/s AmpereOne platforms. The custom core they use has fairly bad memory concurrency compared to Apple's implementation of ARM, which drags down real world performance.
    Well, you won't get 400 GB/s from a single core. But, they have 192 cores with which to saturate the memory subsystem.
  • twin_savage
    bit_user said:
    Well, you won't get 400 GB/s from a single core. But, they have 192 cores with which to saturate the memory subsystem.
    Most definitely not. I doubt the single core memory performance of the new AmpereOne is much higher than the 15GB/s that the old Neoverse N1 cores the brand used to use got.
    This new Ampere CPU is at a pretty severe disadvantage by staying on the ancient version 8 ISA, it's missing out on important SIMD instructions that version 9 brought (that coincidentally improve memory performance).
  • thestryker
    derekullo said:
    https://www.crucial.com/articles/about-memory/everything-about-ddr5-ram
    Math isn't 69x12?
    I have no idea how they're doing the math on that one and their notes don't indicate how. If I had to guess I'd say it's some sort of overhead normalized calculation based on a dual channel system. That would make it 69*6 by their calculations which would be 414GB/s.

    This is how you calculate maximum memory bandwidth:

    ((A * 2) * B) * C = Maximum memory bandwidth in MB/s

    A = DRAM speed in MHz (you can also just use MT/s and ignore the multiplication here)
    B = Width in bytes (64-bit (divide by 8 to get bytes) is most common for DRAM modules)
    C = Number of memory channels in question

    So using the above formula we have:
    ((2800 * 2) * 8) * 12 = 537,600
  • bit_user
    derekullo said:
    https://www.crucial.com/articles/about-memory/everything-about-ddr5-ram
    Math isn't 69x12?
    Sorry I missed this. @thestryker is right.

    The easiest way to remember is to think of the number (5600) as the bit rate per pin, in Mb/s. When they say 12-channel, they mean 64-bit x12 = 768 bits. So, it's very easy to just multiply it out (5600 * 768), then divide by 8 bits per byte, and then divide by 1000 to go from MB/s to GB/s.
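
The bandwidth arithmetic from the comments above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original discussion: the function name `peak_bandwidth_gbs` and its parameters are made up for this example, the result is a nominal (theoretical) peak rather than a measured figure, and the 12-channel count for Granite Rapids is an assumption inferred from the 844.8 GB/s number quoted in the thread.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, channels: int,
                       channel_width_bits: int = 64) -> float:
    """Nominal peak DRAM bandwidth in GB/s (decimal units, 1 GB = 1000 MB).

    transfer_rate_mts is the DDR data rate in MT/s (e.g. 5600 for DDR5-5600),
    which is already twice the memory clock in MHz.
    """
    bytes_per_transfer = channel_width_bits / 8          # 64 bits -> 8 bytes
    mb_per_s = transfer_rate_mts * bytes_per_transfer * channels
    return mb_per_s / 1000


# AmpereOne M: 12 channels of DDR5-5600
print(f"AmpereOne M:     {peak_bandwidth_gbs(5600, 12):.1f} GB/s")

# Granite Rapids with MRDIMM-8800 (12 channels assumed, see note above)
print(f"Granite Rapids:  {peak_bandwidth_gbs(8800, 12):.1f} GB/s")
```

Both approaches in the thread reduce to this same product: data rate times total bus width in bytes, with sustained real-world bandwidth landing somewhere below the nominal figure.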