AMD Threadripper Pro 3995WX Review: Ripping With 8 Memory Channels

Threadripping with eight memory channels

Lenovo ThinkStation P620
Editor's Choice


Workstation CPU and GPU Benchmarks Test Notes

Some of these applications also make an appearance in our standard test suite, but those test configurations and benchmarks are focused on a typical desktop-class environment. In contrast, the following tests are configured to stress the systems with workstation-class workloads, which is a particular strength for the Threadripper processors given their hefty core counts. 

With the exception of the W-3175X and Threadripper Pro systems, we loaded our test platforms with 64GB of DDR4 memory spread across four modules to accommodate the expanded memory capacity required for several of these workstation-focused tasks. Due to the W-3175X's six-channel memory controller and our limited stock of high-capacity DIMMs, we used six 8GB DIMMs for a total capacity of 48GB. As mentioned, we were limited to testing the Threadripper Pro system with 128GB of DDR4-3200 ECC memory at JEDEC timings.
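For a rough sense of what the different channel counts mean, theoretical peak DDR4 bandwidth is the transfer rate multiplied by 8 bytes per transfer (each channel is 64 bits wide) and by the number of populated channels. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes every platform runs at DDR4-3200 purely for a like-for-like comparison, which only the Threadripper Pro configuration is confirmed to do here, and the function name is our own.

```python
def peak_bandwidth_gbs(transfer_rate_mts: int, channels: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a DDR memory configuration."""
    bytes_per_transfer = bus_width_bits // 8  # a 64-bit channel moves 8 bytes per transfer
    return transfer_rate_mts * bytes_per_transfer * channels / 1000


# Assumption: DDR4-3200 on every platform, used only to compare channel counts.
for label, channels in [("quad-channel", 4), ("six-channel", 6), ("eight-channel", 8)]:
    print(f"{label}: {peak_bandwidth_gbs(3200, channels):.1f} GB/s")
```

Real-world throughput lands below these ceilings and depends on timings, rank configuration, and how the DIMM slots are populated.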

SPECviewperf 2020 on AMD Threadripper Pro 3995WX

The SPECviewperf 2020 benchmarks are hot off the press from the SPEC committee, so we decided to give the suite a spin with the Nvidia GeForce RTX 3090 to see how well the Threadripper Pro processors can push along a GPU in professional rendering applications. This has long been a weakness of previous-gen Threadripper processors, but the 3995WX performs admirably.

The following short descriptions are from Bob Cramblitt, communications director for SPEC. More detailed test descriptions are available on the SPEC website.

  • 3dsmax-07 - Autodesk 3ds Max 2016 - 11 tests representing rendering modes used in gaming, film visual effects, and architectural markets.
  • maya-06 - Autodesk Maya 2017 - 10 rendering tests, including shaded, ambient occlusion, multi-sample anti-aliasing, and transparency.
  • catia-06 - Dassault Systems Catia v5 / 3DExperience - 10 tests ranging from 2.1 to 21 million vertices. Viewsets include several rendering modes - anti-aliasing, shaded, and shaded with edges. 
  • solidworks-05 - Dassault Systems Solidworks 2020 - 10 tests ranging from 2.1 to 21 million vertices. Viewsets include several rendering modes - shaded, shaded with edges, ambient occlusion, shaders, and environment maps.
  • energy-03 - OpendTect seismic visualization - 3D tests based on real-world seismic datasets.
  • medical-03 - 2D slice rendering and raycasting techniques found in medical applications.
  • creo-03 - Creo 4 - Model sizes range from 20 to 48 million vertices, multiple rendering modes.
  • snx-04 - Siemens NX 8.0 - 10 tests ranging from 7.15 to 8.45 million vertices with wireframe, anti-aliasing, shaded, shaded with edges, and studio mode rendering modes.

Per-core performance continues to reign supreme in most graphics-accelerated workloads. As a result, we see the consumer-focused chips, with their higher clock speeds and/or more efficient architectures with higher IPC, take the lead in many of these benchmarks. 

It is important to note that AMD now leads in workloads where it has traditionally trailed by large margins. The 3995WX took the lead over Intel's W-3175X and 10980XE in the Creo, CATIA, and Maya benchmarks, all of which benefit from the increased throughput of eight memory channels, as well as in the Medical benchmark.

The 10980XE led the Siemens NX benchmark, while the W-3175X offered comparable performance to the 3995WX. 3ds Max also served as a bright spot for the Intel processors, albeit by a slim margin. The seismic modeling Energy benchmark shows that performance is comparable between the various processors in some of these applications.

Overall, the Threadripper Pro 3995WX delivered a solid showing in these workloads. It takes a big step forward from previous-gen models in several tests and shrinks the gaps, nearly to the negligible range, in the benchmarks where previous-gen Threadripper processors trailed by large margins.

Puget Systems Adobe Benchmarks on AMD Threadripper Pro 3995WX

Puget Systems is a boutique vendor that caters to professional users with custom-designed systems targeted at specific workloads. The company has developed a series of acclaimed benchmarks for Adobe software, which are available on the Puget Systems website.

Adobe After Effects CC Render Node Benchmark on AMD Threadripper Pro 3995WX


The After Effects render node benchmark leverages the in-built aerender application that splits the render engine across multiple threads to maximize CPU and GPU performance. This test is memory-intensive, so RAM capacity and throughput are important and can be a limiting factor.

No surprises here: the combination of the 3995WX's 128 threads, octo-channel memory, and PCIe 4.0 throughput yields 7% more performance than the consumer-focused 3990X and 9% more performance than the Core i9-10980XE. Be aware, though, that the 3995WX has a memory capacity advantage here, and it's hard to ascertain how much of the benefit stems from increased bandwidth versus increased capacity.

Notably, the 3995WX leads the W-3175X by 21%, despite the latter's access to six-channel memory. We can likely chalk this up to the vagaries of Intel's mesh architecture. 
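The percentage leads quoted in these sections are relative differences between two benchmark scores. As a minimal sketch of that arithmetic, assuming higher scores are better, the helper below computes the lead of one score over a baseline. The function name and the example scores are hypothetical and are not taken from our results.

```python
def percent_lead(score: float, baseline: float) -> float:
    """Percentage by which `score` exceeds `baseline` (higher scores are better)."""
    return (score / baseline - 1.0) * 100.0


# Hypothetical scores chosen only to illustrate the arithmetic; not review data.
example_score, example_baseline = 121.0, 100.0
print(f"Lead: {percent_lead(example_score, example_baseline):.1f}%")
print(f"Deficit seen from the other side: {percent_lead(example_baseline, example_score):.1f}%")
```

The relationship is not symmetric: a chip that leads another by a given percentage is trailed by a smaller percentage, because the baseline changes.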

Adobe Premiere Pro CC Benchmark on AMD Threadripper Pro 3995WX

This benchmark measures live playback and export performance with several codecs at 4K and 8K resolutions. It also incorporates 'Heavy GPU' and 'Heavy CPU' effects that stress the system beyond a typical workload. Storage throughput also heavily impacts the score. 

The Threadripper processors are remarkably well suited for this type of work, taking sizeable leads over the competing Intel chips. The additional memory capacity and throughput benefit the 3995WX, which takes a 4% lead over the 3970X.

Adobe Photoshop CC Benchmark on Threadripper Pro 3995WX

The Photoshop benchmark covers a diverse range of work, timing how long the system takes to complete general tasks and apply filters. This test leans heavily on GPU acceleration, and it's clear that high clock rates benefit performance tremendously.

While the consumer chips take massive leads in the test due to their superior per-core performance, the 3995WX is right in the thick of the competition with comparable workstation-class chips. The 3995WX leads the W-3175X by 10% while trailing the Core i9-10980XE by 3%.

SPECworkstation 3 Benchmarks on AMD Threadripper Pro 3995WX

The SPECworkstation 3 benchmark suite is designed to measure workstation performance in professional applications. The full suite consists of more than 30 applications split among seven categories, but we've winnowed down the list to tests that focus largely on CPU performance. We haven't submitted these benchmarks to the SPEC organization, so be aware these are not official results. We've upgraded to the new 3.0.4 revision, which can span tests across processor groups and sockets where the individual test supports it.

Even though the SPECworkstation 3 software supports spanning workloads across multiple processor groups, not all applications can take advantage of the full 128 threads. As such, we're only presenting a few of the tests that indicate the benefit of increased memory throughput over the 3990X, or that show large deltas relative to competing chips. 
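The 64-thread ceiling comes from how Windows handles large systems: logical processors are divided into processor groups of at most 64, and an application that is not group-aware runs within a single group. A minimal sketch of that arithmetic follows. The function names are ours, and the even split across groups is an assumption for illustration rather than a description of how any specific benchmark is scheduled.

```python
import math

MAX_LOGICAL_PROCESSORS_PER_GROUP = 64  # Windows processor group size limit


def processor_groups(logical_processors: int) -> int:
    """Number of processor groups needed for the given logical processor count."""
    return math.ceil(logical_processors / MAX_LOGICAL_PROCESSORS_PER_GROUP)


def threads_available(logical_processors: int, group_aware: bool) -> int:
    """Logical processors an application can use.

    Assumption: processors are split evenly across groups, and an application
    that is not group-aware is confined to one group.
    """
    if group_aware:
        return logical_processors
    return math.ceil(logical_processors / processor_groups(logical_processors))


threads = 128  # 64 cores with SMT, as on the 3995WX
print(f"Processor groups: {processor_groups(threads)}")
print(f"Group-aware application: {threads_available(threads, True)} threads")
print(f"Single-group application: {threads_available(threads, False)} threads")
```

This is why a test that does not span groups reports results with only half of the chip's threads active.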

SPECworkstation 3's Rodinia LifeSciences benchmark steps through four tests that include medical imaging, particle movements in a 3D space, a thermal simulation, and image-enhancing programs. Like many of the subtests, the thermal simulation tool only runs on 64 threads, but we see improved performance via the increased memory throughput and capacity. 

NAMD is a parallel molecular dynamics code designed to scale well with additional compute resources and is one of the premier benchmarks used to quantify performance with simulation code. We recorded faster performance with the 3995WX in the stmv workload than with the 3990X, but most of the other subtests weren't impacted.

The earth’s subsurface structure can be determined via seismic processing. One of the four basic steps in this process is the Kirchhoff Migration, which generates an image based on the available data using mathematical operations. Like many of the tests in this suite, it doesn't span across processor groups, so these results represent performance with 64 threads active. Spinning up multiple VMs would result in higher performance in concurrent workloads.

The Calculix workload is based on the finite element method for three-dimensional structural computations, and it typically responds well to higher core counts. We see gains born of the higher memory throughput, but they aren't explosive.

While we didn't see much of a performance improvement from increased memory bandwidth in most of the SPECworkstation 3 suite, these results are important: thanks to its copious helping of threads, the Threadripper Pro 3995WX outperforms Intel's competing chips by large margins.


Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • CerianK
    Probably a pointless question, but I assume the 16GB are dual-rank... I would be curious how 16GB single-rank (which I understand exist, but are the minority in the market) modules would perform in the 128GB configuration? Probably no difference, but might be worth exploring with a few select benchmarks, if possible.
  • gatg2
    hate to be that guy but, it's not actually the first PCIe 4.0 capable workstation on the market, that honor goes to the Talos II Secure Workstation
    https://www.raptorcs.com/TALOSII/
  • fellow
    I love these, especially the 12-16 cores at 4GHz, much closer to 5900 and 5950 for lightly threaded workloads. Great solution for those wanting expandable server and workstation features.

    I like the look of those Raptors too, especially the pci-4 and memory bandwidth. May get a Blackbird for testing and open source (mostly) fast hardware. See Phoenix coverage Part 2— the first were not as promising.

    For Threadripper Pro, has there been any information about the socket and CPU upgrade path?

    My main concern is the upcoming release of Zen3 Threadrippers. I imagine there will then be a Zen3 Threadripper Pro in a couple quarters or a year from now. The memory and pci expansion makes this an excellent platform for future growth.

    Since AMD has been forward looking by using the same socket for Ryzen, is it safe to expect the Zen3 TRPro will be accepted in this new socket?

    Gracias,

    fellow
  • Endymio
    ... the most powerful workstation chip on the market - it's 64 cores easily outweigh Intel's
    Emergency edit on aisle four, please.

    Also, do I misunderstand the article, or has Toms yet again pronounced a verdict on a product they as yet haven't seen, or has even been released?
  • Mandark
    Intel has nothing to touch the thread ripper so there’s nothing wrong with that statement
  • Endymio
    Mandark said:
    Intel has nothing to touch the thread ripper so there’s nothing wrong with that statement
    Examine the highlighted word.
  • hitchhiker0
    Fantastic! I like them very much.
    Picking a Threadripper Pro 3975WX, 128 GB RAM, some SSD, some NVidia GPU and make a virtual desktop infrastructure for computer-aided designing.
    You can host 4-6 virtual desktops quickly.
  • Stefan Dyulgerov
    Hey in your benchmarks, can you include compilation of the Unreal Engine editor?
    The engine is quite taxing on the cpu both c++ and the shaders.
    Most people that are alone struggle with it. If you work in studio you can share cores, but at home alone:)
  • mikewinddale
    Nice review, thanks.

    But I just discovered something interesting that you missed in the review:

    If you install six (6) dimms, applications like AIDA64, CPU-Z, etc. will recognize it as "hexa" channel, but benchmarks will reveal that the actual memory throughput is equivalent to merely dual-channel.

    So you can populate four or eight DIMMs, but be careful with six.

    For my application, I started a 3955WX with 4x64 GB RAM. I discovered that wasn't enough, so I upgraded to 6x64. My application now had enough RAM, but performance declined. So I had to upgrade to 8x64.
  • robcowart
    @mikewinddale In my testing it is even worse than sticking to 4 or 8 populated channels. Anything less than all 8 channels has a significant impact on performance. The hardware setup for these tests was: 3995wx, ASUS Pro Sage, 256GB 3200MHz, writing to 4 x Samsung 980 Pro in RAID-0. Interesting is that while throughput dropped, meaning that technically the system is doing less work, the CPU utilization increased when all 8 memory channels weren't populated. I do wonder if the different channel-to-chiplet affinity between your 16-core and my 64-core model is responsible for why you don't see as big of a hit as I do with only 4 channels populated.

