Ryzen Threadripper 2 (2990WX and 2950X) Review: AMD Unleashes 32 Cores


Architecture, NUMA & Game Mode

It Starts With 12nm LP

AMD's Threadripper 2 processors are manufactured on GlobalFoundries' 12nm LP process technology. The ported-over design helps boost transistor performance, but does not affect die area or transistor density. As a result, the Zeppelin die's ~4.8 billion transistors and 213mm² area remain essentially the same as on first-gen Ryzen. The dual-die X-series models feature a total of 9.6 billion transistors and 426mm² of silicon, while the quad-die WX processors feature 19.2 billion transistors over 852mm².
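The package totals above follow directly from the per-die figures. A minimal sketch of that arithmetic, using only the numbers quoted in the paragraph (the function and constant names are illustrative, not anything AMD publishes):

```python
# Per-die figures quoted in the article for a single Zeppelin die.
TRANSISTORS_PER_DIE_BILLION = 4.8
AREA_PER_DIE_MM2 = 213


def package_totals(active_dies):
    """Return (billions of transistors, mm^2 of silicon) for a package."""
    transistors = round(TRANSISTORS_PER_DIE_BILLION * active_dies, 1)
    area = AREA_PER_DIE_MM2 * active_dies
    return transistors, area


for name, dies in (("X-series (2 dies)", 2), ("WX-series (4 dies)", 4)):
    transistors, area = package_totals(dies)
    print(f"{name}: {transistors} billion transistors, {area} mm^2")
```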

Lower leakage current does enable 200 MHz-higher clock rates or an 80-120mV core voltage reduction at any given frequency compared to 14nm manufacturing. All told, AMD claims the 12nm design enables up to 11% less power consumption than 14nm-based Threadripper CPUs at the same clock rates, or up to 16% more performance at the same thermal design power. AMD also adds other nuanced refinements, like lower L1 (15%), L2 (9%), and L3 (8%) cache latencies, along with reduced memory latency (2%).

2990WX Architecture

Threadripper 2990WX borrows from AMD's EPYC server designs and comes with four active dies. The company fused off the PCIe and memory controllers on two of the dies, leaving silicon that is only useful for compute. Meanwhile, the other two I/O-enabled dies serve up two channels of DDR4 memory support and 32 lanes of PCIe 3.0 each.

Unfortunately, the compute dies suffer from increased latency on every request to main memory and PCIe-attached devices, as those requests always have to traverse the Infinity Fabric.

AMD added more Infinity Fabric channels to connect two more dies. Unfortunately, that has a tremendous impact on fabric bandwidth, which drops from 50 GB/s on the 16-core Threadripper 2950X to 25 GB/s in this implementation. AMD measured those figures with a 3200 MT/s memory data rate, so throughput at DDR4-2933 will be lower. Even with the benefit of tightly controlled fabric scheduling, the combination of reduced bandwidth and 32 threads that must communicate over the fabric for I/O and memory requests has an impact on performance.
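The article only says throughput at DDR4-2933 will be lower, not by how much. As a rough sketch, assuming fabric bandwidth scales linearly with the memory data rate (an assumption for illustration, not a figure from AMD), the quoted 50 and 25 figures can be rescaled like this. Results are in the same units as the article's numbers.

```python
# Data rate at which AMD measured the quoted fabric bandwidth figures.
MEASURED_DATA_RATE = 3200  # MT/s


def scaled_bandwidth(bandwidth_at_3200, data_rate):
    """Estimate fabric bandwidth at another data rate.

    Assumes strictly linear scaling with the memory data rate, which is
    a simplification made for this sketch.
    """
    return bandwidth_at_3200 * data_rate / MEASURED_DATA_RATE


for chip, quoted in (("2950X", 50), ("2990WX", 25)):
    estimate = scaled_bandwidth(quoted, 2933)
    print(f"{chip}: {quoted} at 3200 MT/s, roughly {estimate:.1f} at 2933 MT/s")
```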

According to chip analyst David Schor at WikiChip, each request from the compute die first goes to the Cache-Coherent Master (CCM), which then interfaces with the CAKE (Coherent AMD socKet Extender) module that encodes the request and sends it to the remote I/O die. The remote CAKE module then decodes the request, fetches the requested data via the UMC (Unified Memory Controller), and then encodes the data and transmits it back to the compute die.
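The round trip described above can be written out as an ordered list of hops. This is only a sketch of the sequence as the paragraph describes it. The final decode step on the compute die is an assumption implied by symmetry, and the hop labels are illustrative.

```python
# Outbound leg: compute die to the memory attached to an I/O die.
OUTBOUND = [
    "compute-die core",
    "CCM",
    "CAKE (encode request)",
    "Infinity Fabric",
    "remote CAKE (decode request)",
    "UMC (fetch data)",
]

# Return leg: data travels back to the requesting core.
RETURN = [
    "remote CAKE (encode data)",
    "Infinity Fabric",
    "CAKE (decode data, assumed)",  # not spelled out in the article
    "compute-die core",
]


def remote_request_path():
    """Full hop sequence for one memory request from a compute die."""
    return OUTBOUND + RETURN


def fabric_crossings(path):
    """Count how many times a request crosses the Infinity Fabric."""
    return sum(1 for hop in path if hop == "Infinity Fabric")


path = remote_request_path()
print(" -> ".join(path))
print("Fabric crossings per request:", fabric_crossings(path))
```

Every memory access from a compute die crosses the fabric twice, which is why the reduced fabric bandwidth matters so much for those 16 cores.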

Increased traffic and reduced fabric throughput will have a tangible impact on memory-hungry applications, leading to sub-par performance scaling under some conditions. Although Threadripper 2990WX is clearly aimed at the semi-professional market, configurations hosting multiple GPUs may slow down due to increased fabric latency and reduced throughput to remote PCIe lanes. That'd also affect the performance of PCIe-based M.2 storage and LAN devices connected to remote dies.

MSI's MEG Creation motherboard diagram provides a nice summary of the split connectivity between dies. And be mindful of new population rules, such as inserting the first GPU into PCIe slot four, along with custom M.2 recommendations. You need to populate all four DRAM channels or follow dual-channel population rules in order to realize maximum performance, as performance drops sharply in some dual-channel configurations due to the distributed design.

AMD carves the Threadripper 2990WX into four NUMA domains that cannot be altered. As such, the processor does not have a local memory toggle for its Game Mode feature. Instead, the processor simply flips into "1/4" mode, which disables all but one die and effectively creates an 8C/16T CPU. Ryzen Master also has "1/2" and "Off" options that expose 16 cores and 32 threads, or 32 cores and 64 threads.
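The three Game Mode settings map onto core and thread counts as follows. This is a minimal sketch built from the counts in the paragraph. The dictionary and function names are hypothetical and are not part of Ryzen Master.

```python
CORES_PER_DIE = 8
THREADS_PER_CORE = 2

# Ryzen Master Game Mode setting -> number of dies left enabled.
GAME_MODE_ACTIVE_DIES = {"1/4": 1, "1/2": 2, "Off": 4}


def exposed_cores_threads(setting):
    """Return (cores, threads) visible to the OS for a Game Mode setting."""
    cores = GAME_MODE_ACTIVE_DIES[setting] * CORES_PER_DIE
    return cores, cores * THREADS_PER_CORE


for setting in GAME_MODE_ACTIVE_DIES:
    cores, threads = exposed_cores_threads(setting)
    print(f"Game Mode {setting}: {cores}C/{threads}T")
```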

The company claims it could not enable the compute dies' memory and I/O controllers even partially without significantly overhauling the package's trace routing, requiring a new socket interface. AMD reps say they prioritized drop-in compatibility with the existing motherboard and cooler ecosystem, leading them to build Threadripper 2990WX the way it turned out.

AMD continues working with Microsoft to route threads to the die with direct-attached memory first, and then spill remaining threads over to the compute dies. Unfortunately, the scheduler currently treats all dies as equal, operating in Round Robin mode. As a result, even moderately-threaded applications can suffer at the hands of high memory latency and low throughput. This is further complicated by thread migration. According to AMD, Microsoft has not committed to a timeline for updating its scheduler.
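To illustrate why the placement policy matters, here is a toy model that compares round-robin placement with a policy that fills the dies with direct-attached memory first. It is not how the Windows scheduler is implemented. The die numbering and both functions are hypothetical, and thread migration is ignored.

```python
from collections import Counter

THREADS_PER_DIE = 16       # 8 cores x 2 threads
IO_DIES = (0, 1)           # dies with memory and PCIe (hypothetical numbering)
COMPUTE_DIES = (2, 3)      # dies that reach memory only over the fabric


def round_robin(n_threads):
    """Treat all four dies as equal and deal threads out in turn."""
    dies = IO_DIES + COMPUTE_DIES
    return Counter(dies[i % len(dies)] for i in range(n_threads))


def io_dies_first(n_threads):
    """Fill the I/O dies first, then spill over to the compute dies."""
    placement = Counter()
    io_capacity = THREADS_PER_DIE * len(IO_DIES)
    for i in range(n_threads):
        if i < io_capacity:
            die = IO_DIES[i % len(IO_DIES)]
        else:
            die = COMPUTE_DIES[(i - io_capacity) % len(COMPUTE_DIES)]
        placement[die] += 1
    return placement


def threads_on_compute_dies(placement):
    """Threads that pay the fabric penalty on every memory access."""
    return sum(placement[die] for die in COMPUTE_DIES)


for n in (16, 40):
    rr = threads_on_compute_dies(round_robin(n))
    io = threads_on_compute_dies(io_dies_first(n))
    print(f"{n} threads: round-robin puts {rr} on compute dies, I/O-first puts {io}")
```

In this model, a 16-thread workload lands half of its threads on compute dies under round-robin, but none under the I/O-first policy.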

The Zeppelin Building Block

The Zen architecture employs a four-core CCX (CPU Complex) building block. Each CCX has 8MB of L3 cache split into four slices; each core in the CCX accesses all L3 slices with the same average latency. Two CCXes come together to create an eight-core Zeppelin die, and they communicate with each other via AMD’s Infinity Fabric. The CCXes share the same dual-channel memory controller. This is basically two quad-core CPUs talking to each other over the Infinity Fabric pathway that also handles northbridge and PCIe traffic.

Although each core in a four-core CCX can access the local cache with the same average latency, trips to fetch data in adjacent CCXes incur a latency penalty. Communication between threads on cores located in disparate CCXes also suffers.
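The hierarchy can be expressed as a small lookup that classifies communication between two cores. This sketch assumes cores are numbered sequentially, four per CCX and eight per die, which is a convention chosen here for illustration and not necessarily how an operating system enumerates them.

```python
CORES_PER_CCX = 4
CCX_PER_DIE = 2
CORES_PER_DIE = CORES_PER_CCX * CCX_PER_DIE


def locate(core):
    """Return (die index, CCX index within the die) for a core number."""
    die, within_die = divmod(core, CORES_PER_DIE)
    return die, within_die // CORES_PER_CCX


def hop(core_a, core_b):
    """Classify the path between two cores, from cheapest to costliest."""
    die_a, ccx_a = locate(core_a)
    die_b, ccx_b = locate(core_b)
    if die_a != die_b:
        return "cross-die"   # applies only to multi-die packages
    if ccx_a != ccx_b:
        return "cross-CCX"   # traverses the Infinity Fabric within the die
    return "same CCX"        # shares one 8MB L3


for a, b in ((0, 3), (0, 4), (0, 8)):
    print(f"core {a} <-> core {b}: {hop(a, b)}")
```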

2950X Architecture & Game Mode

Threadripper 2950X mirrors the layout of AMD's first-gen Threadripper chips: two Zeppelin dies are connected via another layer of the Infinity Fabric. AMD flanks them with a pair of dummy dies that serve as non-functional fillers to ensure the heat spreader's structural integrity and consistent mating with the socket's pins.

Remember, each Zeppelin die has its own memory and PCIe controller. If a thread running on one core needs to access data resident in cache on another die, it has to traverse the fabric between those dies and incur significant latency. Naturally, the latency penalty between dies is higher than it is between CCXes in a single-die configuration. But AMD claims to have made some improvements there. The 2950X purportedly offers 64ns latency to near memory and 105ns to far memory, while the previous-gen 1950X had to wait 78ns and 133ns, respectively. As per usual, the speed of the Infinity Fabric is tied to the memory controller, so higher data rate settings are desirable. AMD measured Threadripper 2's fabric performance with a 3200 MT/s data rate, which means fabric latency at the recommended DDR4-2933 will be higher.
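AMD's latency claims are easier to compare as percentages. The sketch below works only from the four figures quoted above, and the helper names are illustrative.

```python
# AMD's claimed memory latencies in nanoseconds, as quoted in the article.
LATENCY_NS = {
    "1950X": {"near": 78, "far": 133},
    "2950X": {"near": 64, "far": 105},
}


def reduction_pct(kind):
    """Percentage latency reduction from the 1950X to the 2950X."""
    old = LATENCY_NS["1950X"][kind]
    new = LATENCY_NS["2950X"][kind]
    return round((old - new) / old * 100, 1)


def far_penalty(chip):
    """How many times slower far memory is than near memory."""
    return round(LATENCY_NS[chip]["far"] / LATENCY_NS[chip]["near"], 2)


for kind in ("near", "far"):
    print(f"{kind} memory: {reduction_pct(kind)}% lower latency on the 2950X")
for chip in LATENCY_NS:
    print(f"{chip}: far memory is {far_penalty(chip)}x the near-memory latency")
```

By these figures, the claimed improvement is slightly larger for far memory than for near memory, so the relative cross-die penalty shrinks a little as well.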

To combat the potential for performance regression as a result of its "go-wide" approach, AMD devised an interesting solution: it introduced a memory access switch that you can toggle via motherboard BIOS or the Ryzen Master software. The Local and Distributed settings flip between either NUMA (Non-Uniform Memory Access) or UMA (Uniform Memory Access), same as they did for AMD's first-gen Threadripper CPUs.

UMA (Distributed) is pretty simple; it allows both dies to access all of the attached memory. NUMA mode (Local) attempts to keep all data for the process executing on the die confined to its directly attached memory controller, establishing one NUMA node per die. The goal is to minimize requests to remote memory attached to the other die. NUMA works best if programs are designed specifically to utilize it. Even though most desktop PC software wasn't written with NUMA in mind, performance gains are still possible in non-NUMA applications.
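A simple way to reason about the two modes is a blended-latency model: average latency is a weighted mix of near and far accesses. The sketch below uses AMD's claimed 64ns and 105ns figures for the 2950X. The assumption that Distributed mode splits accesses evenly between the two dies is ours, and the model ignores bandwidth, caching, and contention.

```python
NEAR_NS = 64    # AMD's claimed near-memory latency for the 2950X
FAR_NS = 105    # AMD's claimed far-memory latency for the 2950X


def average_latency_ns(local_fraction):
    """Blend near and far latency by the share of accesses that stay local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * NEAR_NS + (1.0 - local_fraction) * FAR_NS


# Assumed even interleaving across both dies in Distributed (UMA) mode.
print("Distributed, 50% local:", average_latency_ns(0.5), "ns")
# Local (NUMA) mode with a workload that mostly stays on its own node.
print("Local, 90% local:", round(average_latency_ns(0.9), 1), "ns")
```

Under this model, Local mode only pays off to the extent that a program's data actually stays on the node it runs on, which is why NUMA-aware software benefits most.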

AMD also allows you to disable cores in Legacy Compatibility mode, which disables one die via a Windows command. This allows some programs that won't function with 32 threads to execute properly, and it also eliminates cross-die communication. The system can still access I/O connected to the second die, though, so you don't lose any associated memory or attached peripherals.

AMD bundles these settings into two presets that generally offer the best performance in games and applications. Game Mode disables one die with the Legacy Compatibility mode, and then switches the 2950X into Local memory mode, effectively creating an 8C/16T CPU. Creator mode uses the Distributed memory setting and disables Legacy Compatibility, providing access to Threadripper 2950X's full armament of 16 cores and 32 threads for demanding workloads.
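The two modes on the 2950X are just fixed combinations of the memory-mode and Legacy Compatibility settings described above. A minimal sketch of that mapping follows. The dictionary keys and function name are hypothetical and do not correspond to any BIOS or Ryzen Master interface.

```python
CORES_PER_DIE = 8
THREADS_PER_CORE = 2
TOTAL_DIES = 2  # Threadripper 2950X

# Mode -> the two underlying settings it selects.
MODES = {
    "Game": {"memory_mode": "Local", "legacy_compatibility": True},
    "Creator": {"memory_mode": "Distributed", "legacy_compatibility": False},
}


def active_cores_threads(mode):
    """Return (cores, threads) available in the given mode."""
    settings = MODES[mode]
    # Legacy Compatibility mode disables one of the two dies.
    dies = TOTAL_DIES - 1 if settings["legacy_compatibility"] else TOTAL_DIES
    cores = dies * CORES_PER_DIE
    return cores, cores * THREADS_PER_CORE


for mode, settings in MODES.items():
    cores, threads = active_cores_threads(mode)
    print(f"{mode} mode: {settings['memory_mode']} memory, {cores}C/{threads}T")
```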


  • Rdslw
    first table is broken 32/64 cores/threads :)
                       Ryzen Threadripper 2990WX   Ryzen Threadripper 2950X
    Socket             TR4                         TR4
    Cores / Threads    16 / 32                     16 / 32
  • bilazaurus
    AMD Ryzen Threadripper 2. Your first choice for encoding!*

    *And nothing else.
  • philipemaciel
    Wow, while the 2990WX is a bit of a letdown, the 2950X is a nice surprise. Plenty of bang for your buck!
  • TEAMSWITCHER
    Avoid the flagship, buy the $900 part. Sounds a lot like Intel.
  • alves.mvc
    Why did Tom's Hardware stop using the HPC benchmark? It was the most interesting measurement for me, as I work daily with finite differences and finite elements. Can you return to that?
  • totaldarknessincar
    Seems to me the best of both worlds continue to be Intel's 7900x which sells for $699 at microcenter. You get great gaming performance, and great multithreaded performance, and it's not 12-1800 bucks as some of these mega-threaded cards are.

    Despite all the fan-fare, it seems the 7980xe actually remains the best processor when overclocked overall.

    Lastly for gaming, it's still 8700K or 8086 as best, with the 2700x from AMD being the best when you factor gaming and some multi-threaded stuff, while being very competitive price wise.
  • feelinfroggy777
    Very surprising performance from the 2950x. Almost enough to consider parting ways with my 1950x. Maybe when the pricing comes down some from the 2950x in a few months I will consider.

    The 2990wx on the other hand is a slight let down. Too bad they could not get the scaling down between the dies like they did with Threadripper 1. But I have read that was going to be an issue. Maybe AMD did not want the 2990wx to cannibalize their Epyc market.

    With that being said, the 2990wx is still a modern marvel of technology, even more so when you consider the price. Only couple of years ago a CPU with less than a third of the cores cost just as much.

    Competition sure is grand!
  • basil.thomas
    Looks like Intel has an opportunity to bite AMD when they release their 28-core processor. I have a threadripper 2/x399 system but if I upgrade to the 2990wx, I will also upgrade the motherboard and the power supply as well. I think I may wait until the Intel 28 core comes out and see what kind of performance it delivers as I too notice running custom AI apps on the threadripper is barely faster than my old x99/6850 motherboard overclocked @ 4.3Ghz. I want max performance if I am going to pay over $1800 for the flagship which means core wars is just starting...

    MOD EDIT: watch your profanity
  • ffleader1
    21228046 said:
    Seems to me the best of both worlds continue to be Intel's 7900x which sells for $699 at microcenter. You get great gaming performance, and great multithreaded performance, and it's not 12-1800 bucks as some of these mega-threaded cards are.

    Despite all the fan-fare, it seems the 7980xe actually remains the best processor when overclocked overall.

    Lastly for gaming, it's still 8700K or 8086 as best, with the 2700x from AMD being the best when you factor gaming and some multi-threaded stuff, while being very competitive price wise.
    Seems to me that you are mistaking best of both worlds with jack of all trades. No one who takes rendering seriously would want to sacrifice the performance for gaming. For that price, they may as well grab a 1950X. Sure you lose in gaming, but gain a huge jump in rendering. Also, I don't know about Microcenter but it's still 1k on Amazon while 1950X is $850. 7900X is like a really really bad choice lol.
  • g-unit1111
    Wow, 32 cores for $1,000? I have to say very impressive. Your move, Intel!