AMD Threadripper Pro 3995WX Review: Ripping With 8 Memory Channels

Threadripping with eight memory channels

Lenovo ThinkStation P620
Editor's Choice

Threadripper Pro Memory Scaling

| Configuration | Memory Channels | AIDA Memory Latency | SiSoft Aggregate Multi-Core BW | SiSoft Per-Core BW | SiSoft Single-Thread BW |
| --- | --- | --- | --- | --- | --- |
| TR Pro 3995WX 32GB | Dual (2x16GB) | 92.1ns | 35 GB/s | 560 MB/s | 30.67 GB/s |
| TR Pro 3995WX 64GB | Quad (4x16GB) | 102ns | 70 GB/s | 1.1 GB/s | 35 GB/s |
| TR Pro 3995WX 128GB | Octo (8x16GB) | 100ns | 136 GB/s | 2.13 GB/s | 36 GB/s |
| Intel Xeon W-3175X | Sexa (6x8GB) | 81.1ns | 82 GB/s | 2.93 GB/s | 15.4 GB/s |
| TR 3990X | Quad (4x8GB) | 84.68ns | 51.58 GB/s | 825 MB/s | 35.93 GB/s |

Here we can see the memory throughput advantages of running with eight memory channels as opposed to the four memory channels found on the consumer-class Threadripper models. 

The quad- and octo-channel Threadripper Pro setups had similar latency (102ns and 100ns, respectively), but the dual-channel arrangement came in lower at 92.1ns, and the consumer-class Threadripper 3990X was lower still at 84.68ns. Lower latency could benefit some latency-sensitive workloads, as we'll see in the benchmarks below. Meanwhile, the Xeon W-3175X weighed in at 81.1ns.

We turned to SiSoft Sandra for bandwidth measurements. The first two SiSoft columns outline performance when all the cores are actively requesting data. With the Threadripper Pro chip, we can see the neat doubling in memory throughput from dual- to quad-channel (35 GB/s to 70 GB/s), and then nearly another doubling to 136 GB/s with the octo-channel setup. You'll notice that per-core bandwidth also scales nicely when all cores are consuming bandwidth. Notably, the quad-channel 3995WX offers superior memory bandwidth over the quad-channel 3990X (70 GB/s versus 51.58 GB/s).

The final column outlines memory throughput when only a single core is active, meaning the core doesn't have to share any bandwidth with other cores. The jump from a dual-channel to quad-channel setup improves bandwidth to a single core by roughly 14% (30.67 GB/s to 35 GB/s). Meanwhile, the move to octo-channel memory has little benefit over quad-channel: the peak memory throughput to one core caps at around 35 to 36 GB/s. That means the increased throughput of octo-channel memory won't provide additional performance in single-threaded workloads over the quad-channel configuration.
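To put the scaling figures in context, here is a minimal sketch that compares the measured SiSoft numbers from the table with a theoretical peak. The peak calculation is our assumption, not part of the test methodology: it treats each DDR4-3200 channel as a standard 64-bit interface moving 8 bytes per transfer and ignores refresh and protocol overhead. The function and variable names are illustrative only.

```python
# Figures copied from the memory scaling table; everything else is an assumption.
DATA_RATE_MT_S = 3200      # DDR4-3200, in mega-transfers per second
BYTES_PER_TRANSFER = 8     # assumes a standard 64-bit (8-byte) DDR4 channel
CORES = 64                 # Threadripper Pro 3995WX

# channels -> (aggregate multi-core GB/s, single-thread GB/s)
measured = {2: (35.0, 30.67), 4: (70.0, 35.0), 8: (136.0, 36.0)}


def peak_bandwidth_gbs(channels):
    """Theoretical peak bandwidth in GB/s, ignoring refresh and protocol overhead."""
    return channels * DATA_RATE_MT_S * BYTES_PER_TRANSFER / 1000


def percent_gain(old, new):
    """Relative improvement from old to new, in percent."""
    return (new - old) / old * 100


for channels, (aggregate, _single) in measured.items():
    peak = peak_bandwidth_gbs(channels)
    print(
        f"{channels} channels: {aggregate:.0f} of {peak:.1f} GB/s peak "
        f"({aggregate / peak:.0%}), {aggregate / CORES:.2f} GB/s per core"
    )

print(f"single-thread gain, dual to quad: {percent_gain(30.67, 35.0):.1f}%")
print(f"single-thread gain, quad to octo: {percent_gain(35.0, 36.0):.1f}%")
```

Under these assumptions, the measured aggregate figures land at roughly two-thirds of the theoretical peak at every channel count, which fits the near-linear scaling described above, while the single-thread figure barely moves past quad-channel.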

Threadripper Pro 3995WX Power Consumption and Efficiency

There are a few caveats to our power testing: The Lenovo ThinkStation P620 delivers all of its power directly through the motherboard, which prevents us from conducting CPU power measurements from the physical layer that we typically use to validate the results we log from the sensor loop. However, the results do fall within our general expectations - the chip often tops out right at AMD's prescribed 280W power limit. 

In contrast, the Threadripper 3990X follows a typical trend we've seen in the past with AMD's core-heavy chips: they often draw less power when all cores are fully loaded than when the chip is partially loaded (that's why the 3970X draws more power than the 3990X). These power management differences often occur at the behest of motherboard firmware, and the Lenovo system doesn't expose any information that we could use to tease out the difference in approaches.

The Dominus Extreme that we used for the W-3175X also presents power measurement challenges. To sidestep the CPU's power limits, Asus offers a secondary power reporting option in the BIOS. Intel's recommended setting (the default) reports current after dividing the value by 1.25, and the readings can at times be inaccurate. As such, we've only included measurements that we were able to verify at the physical layer. Those measurements of ~320W power draw during the AIDA power test easily eclipse the rest of the test pool.

As you can see, the Threadripper Pro chips consume much more power than their desktop PC counterparts, which is an unavoidable side effect of the tremendous core counts. As expected, most of the tests show that the 3995WX consumes a few more watts of power as more memory channels are utilized. 

Here we take a slightly different look at power consumption by calculating the cumulative amount of energy required to perform x264 and x265 HandBrake workloads and two Blender renders. We plot this 'task energy' value in kilojoules on the left side of the chart.

These workloads consist of a fixed amount of work, so we can plot the task energy against the time required to finish the job (bottom axis), thus generating a handy power chart. Bear in mind that faster compute times and lower task energy requirements are ideal.
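As a rough illustration of what a task energy figure represents, the sketch below integrates logged power samples over the duration of a job. This is an assumption about how such a value can be derived, not a description of our logging tools: the trapezoidal integration, the function name, and the sample log are all hypothetical.

```python
def task_energy_kj(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule; return kJ."""
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Average power across the interval multiplied by its length in seconds.
        joules += (p0 + p1) / 2 * (t1 - t0)
    return joules / 1000


# Hypothetical log: a chip holding 280 W for a 100-second job, sampled every 25 s.
log = [(0, 280.0), (25, 280.0), (50, 280.0), (75, 280.0), (100, 280.0)]
print(f"{task_energy_kj(log):.1f} kJ")
```

A chip that draws more power can still post a lower task energy if it finishes the fixed amount of work sooner, which is why the chart pairs energy with completion time.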

AMD Threadripper Pro 3995WX Benchmark Test Setup

As expected, Lenovo's system doesn't support overclocking, even though AMD's Threadripper Pro chips do support the feature. That means we'll have to wait for the other motherboards to ascertain the benefits, and according to recent reports, those are on the cusp of release. 

Lenovo's ThinkStation is unabashedly designed for 100% stability, and as such, features like DRAM frequencies and timings aren't alterable in the motherboard firmware. As a result, we had to test with 128GB of memory capacity spread across eight DIMMs. These DIMMs run off of SPD values, so we were limited to DDR4-3200 with JEDEC timings of 24-22-22-52-74. 
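For a sense of what those JEDEC timings mean in absolute terms, here is a small worked calculation. It assumes the first figure in the timing string (24) is the CAS latency, as is conventional, and the function name is ours for illustration.

```python
def cas_latency_ns(data_rate_mt_s, cas_cycles):
    """Convert a CAS latency in clock cycles to nanoseconds."""
    # DDR memory transfers twice per clock, so one clock period is 2000 / data rate (ns).
    return cas_cycles * 2000 / data_rate_mt_s


print(cas_latency_ns(3200, 24))
```

At DDR4-3200 the memory clock runs at 1600 MHz, so 24 cycles works out to 15 ns, before any controller or fabric latency is added on top.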

That means we're forced to compare the Threadripper Pro to systems with disparate memory capacities and timings, which we would typically normalize as best we can between test subjects. That limitation prevents us from coming to firm overall conclusions on the finer aspects of performance relative to the consumer chips, but we can get a good-enough sense of what to expect from a Threadripper Pro system. All other hardware configurations, such as GPUs and SSDs, are identical between the systems in the tests below. 

We tested the Threadripper Pro in the configurations in the next table (you'll also see the configurations marked in the charts) to compare performance with two, four, and eight memory channels populated. This will give us an interesting view of how Threadripper scales with improved memory throughput and capacity. 

All of the normal caveats of Threadripper 3000 performance apply. Windows 10 splits cores up into 'processor groups' of 64 threads apiece, so some applications and benchmarks that aren't tuned to span across the groups don't benefit from the increased thread count. For applications that can't span processor groups, some professional users will run multiple instances of a program in VMs to extract the utmost in performance. Even without that type of arrangement, we see a marked uplift in several applications that benefit from the awesome parallelism of 128 threads, and the software ecosystem is quickly adjusting to embrace this type of design more fully.
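The processor group limit can be sketched in a few lines. The `group_sizes` helper below is a simplified model we made up for illustration (it fills groups of up to 64 in order, whereas Windows decides the real layout based on topology). The second function queries the actual layout through the Win32 calls `GetActiveProcessorGroupCount` and `GetActiveProcessorCount`, and is only called when the script runs on Windows.

```python
import sys

MAX_GROUP_SIZE = 64  # Windows caps a processor group at 64 logical processors


def group_sizes(logical_processors, max_size=MAX_GROUP_SIZE):
    """Simplified model: fill groups of up to max_size logical processors in order."""
    full, rest = divmod(logical_processors, max_size)
    return [max_size] * full + ([rest] if rest else [])


def windows_group_sizes():
    """Ask Windows for the real group layout; only callable on Windows."""
    import ctypes

    kernel32 = ctypes.windll.kernel32
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
    kernel32.GetActiveProcessorCount.restype = ctypes.c_uint32
    count = kernel32.GetActiveProcessorGroupCount()
    return [kernel32.GetActiveProcessorCount(group) for group in range(count)]


# The 3995WX exposes 128 threads, which the simplified model splits in two.
print(group_sizes(128))
if sys.platform == "win32":
    print(windows_group_sizes())
```

An application that isn't group-aware is scheduled within a single group, so on a 128-thread chip it would see at most half of the available threads.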

AMD's Ryzen Master software, which allows you to tune consumer Threadripper processors, isn't available with the Threadripper Pro chips. 

| Platform | Hardware |
| --- | --- |
| AMD Socket sWRX8 | AMD Threadripper Pro 3995WX |
| | Lenovo ThinkStation P620 |
| | 8x 16GB SK hynix ECC - DDR4-3200 |
| Intel Socket 3647 | Intel Xeon W-3175X |
| | ASUS ROG Dominus Extreme |
| | 6x 8GB Corsair Vengeance RGB DDR4-2666 |
| Intel Socket 1200 (Z490) | Core i7-10700K, Core i9-10900K |
| | Gigabyte Aorus Z490 Master |
| | 2x 8GB Trident Z Royal DDR4-3600 - Stock: DDR4-2933 |
| AMD Socket AM4 (X570) | AMD Ryzen 9 5950X, 5900X, 3950X |
| | MSI MEG X570 Godlike |
| | 2x 8GB Trident Z Royal DDR4-3600 - Stock: DDR4-3200 |
| Intel Socket 2066 (X299) | Core i9-10980XE |
| | MSI Creator X299 |
| | 4x 8GB Trident Z Royal DDR4-3600 - Stock: DDR4-2933 |
| AMD Socket sTRX4 (TRX40) | Threadripper 3960X, 3970X, 3990X |
| | ASUS ROG Zenith II Extreme |
| | 4x 8GB Trident Z Royal DDR4-3600 - Stock: DDR4-3200 |
| All Systems | Gigabyte GeForce RTX 3090 Eagle - Gaming and ProViz applications |
| | Nvidia GeForce RTX 2080 Ti FE - Application tests |
| | 2TB Intel DC4510 SSD |
| | EVGA Supernova 1600 T2, 1600W |
| | Open Benchtable |
| | Windows 10 Pro version 2004 (build 19041.450) |
| | Workstation Tests - 4x 16GB Corsair Dominator - Corsair Force MP600 |
| Cooling | Corsair H115i, Custom loop |


Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • CerianK
    Probably a pointless question, but I assume the 16GB are dual-rank... I would be curious how 16GB single-rank (which I understand exist, but are the minority in the market) modules would perform in the 128GB configuration? Probably no difference, but might be worth exploring with a few select benchmarks, if possible.
  • gatg2
    hate to be that guy but, it's not actually the first PCIe 4.0 capable workstation on the market, that honor goes to the Talos II Secure Workstation
    https://www.raptorcs.com/TALOSII/
  • fellow
    I love these, especially the 12-16 cores at 4GHz, much closer to 5900 and 5950 for lightly threaded workloads. Great solution for those wanting expandable server and workstation features.

    I like the look of those Raptors too, especially the pci-4 and memory bandwidth. May get a Blackbird for testing and open source (mostly) fast hardware. See Phoenix coverage Part 2— the first were not as promising.

    For Threadripper Pro, has there been any information about the socket and CPU upgrade path?

    My main concern is the upcoming release of Zen3 Threadrippers. I imagine there will then be a Zen3 Threadripper Pro in a couple quarters or a year from now. The memory and pci expansion makes this an excellent platform for future growth.

    Since AMD has been forward looking by using the same socket for Ryzen, is it safe to expect the Zen3 TRPro will be accepted in this new socket?

    Gracias,

    fellow
  • Endymio
    ... the most powerful workstation chip on the market - it's 64 cores easily outweigh Intel's
    Emergency edit on aisle four, please.

    Also, do I misunderstand the article, or has Toms yet again pronounced a verdict on a product they as yet haven't seen, or has even been released?
  • Mandark
    Intel has nothing to touch the thread ripper so there’s nothing wrong with that statement
  • Endymio
    Mandark said:
    Intel has nothing to touch the thread ripper so there’s nothing wrong with that statement
    Examine the highlighted word.
  • hitchhiker0
    Fantastic! I like them very much.
    Picking a Threadripper Pro 3975WX, 128 GB RAM, some SSD, some NVidia GPU and make a virtual desktop infrastructure for computer-aided designing.
    You can host 4-6 virtual desktops quickly.
  • Stefan Dyulgerov
    Hey in your benchmarks, can you include compilation of the Unreal Engine editor?
    The engine is quite taxing on the cpu both c++ and the shaders.
    Most people that are alone struggle with it. If you work in studio you can share cores, but at home alone:)
  • mikewinddale
    Nice review, thanks.

    But I just discovered something interesting that you missed in the review:

    If you install six (6) dimms, applications like AIDA64, CPU-Z, etc. will recognize it as "hexa" channel, but benchmarks will reveal that the actual memory throughput is equivalent to merely dual-channel.

    So you can populate four or eight DIMMs, but be careful with six.

    For my application, I started a 3955WX with 4x64 GB RAM. I discovered that wasn't enough, so I upgraded to 6x64. My application now had enough RAM, but performance declined. So I had to upgrade to 8x64.
  • robcowart
    @mikewinddale In my testing it is even worse than sticking to 4 or 8 populated channels. Anything less than all 8 channels has a significant impact on performance. The hardware setup for these tests was: 3995wx, ASUS Pro Sage, 256GB 3200MHz, writing to 4 x Samsung 980 Pro in RAID-0. Interesting is that while throughput dropped, meaning that technically the system is doing less work, the CPU utilization increased when all 8 memory channels weren't populated. I do wonder if the different channel-to-chiplet affinity between your 16-core and my 64-core model is responsible for why you don't see as big of a hit as I do with only 4 channels populated.

