AMD 64-Core Threadripper 3990X Review: Battle of the Flagships

The core wars rage on.

AMD Threadripper 3990X
Editor's Choice


Be sure to check out the test notes on the previous page for important testing particulars. Be aware that further optimizations could unlock more performance from our server platforms, and that all three servers use 1U chassis, which can affect cooling and, thus, performance. The servers also have varying memory capacities, but that's an unavoidable consequence of the unique platforms.

All AMD entries marked "PBO" indicate an auto-overclocked configuration paired with DDR4-3600. Intel's overclocked configurations also use DDR4-3600.

It's also noteworthy that while we did experience many odd performance characteristics that disadvantage some platforms, this testing represents the current state of the software ecosystem.

As a reminder, here is a quick breakout of each server entry in the charts:

Chart Entry | Processors | Cores / Threads | Server Test Bed | DRAM
2x EPYC 7742 | Two EPYC Rome 7742 | 128 / 256 | Supermicro AS-1023US-TR4 | 16x 32GB DDR4-3200
2x Xeon 8280 | Two Intel Xeon Platinum 8280 | 56 / 112 | Dell/EMC PowerEdge R460 | 12x 32GB DDR4-2933
EPYC 7702P | One EPYC Rome 7702P | 64 / 128 | Gigabyte R15Z-Z32 | 8x 32GB DDR4-3200

Encoding

Starting off with the LAME encoder, which is the quintessential example of a single-threaded test, may seem a bit...lame, but this series of tests helps explain some of the results you'll see throughout the rest of the review.  

Both AMD and Intel have made great strides in per-core performance in their HEDT lineups over the last few years. In the case of the 3990X, that improved performance in light workloads also carries over to multi-core loads, which benefits many of our rendering tests.

The Threadripper 3990X's faster clock speed over competing server chips is a big advantage for some applications, like the single-threaded LAME and FLAC encoding tests, and some of our rendering tests below.

Remember, Windows splits processors into groups of up to 64 logical processors (threads), and some applications can't scale past those boundaries. Additionally, the server platforms are broken into several NUMA nodes. Zooming out to the threaded Handbrake tests, we see the advantage of the 3990X's clock speed take hold as it claims the top of the chart, even beating out the other Threadripper processors, but not by much. That's largely because the x264 test doesn't fully saturate the cores in both 64-thread processor groups, meaning the bottleneck resides elsewhere, and the x265 test only scales across the cores in one processor group. That's particularly painful for the dual-EPYC and Xeon platforms because their cores are spread across multiple NUMA nodes.
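To make the processor-group limit concrete, here is a minimal Python sketch under a deliberately simplified model (an assumption, not how Windows actually assigns groups): each group holds at most 64 logical processors, and an application that isn't group-aware stays inside a single group. The helper names `processor_groups` and `group_unaware_share` are hypothetical, and the thread counts come from the test-bed table and the chips' specifications.

```python
import math

# Simplified model (assumption): each Windows processor group holds at most 64
# logical processors, and a group-unaware application runs inside one group.
# Real Windows group assignment also keeps NUMA nodes together and balances
# group sizes, so actual layouts (and limits) can differ.
MAX_LOGICAL_PER_GROUP = 64


def processor_groups(logical_processors: int) -> int:
    """Minimum number of groups needed to hold this many logical processors."""
    return math.ceil(logical_processors / MAX_LOGICAL_PER_GROUP)


def group_unaware_share(logical_processors: int) -> float:
    """Upper bound on the fraction of threads a single-group application can use."""
    usable = min(logical_processors, MAX_LOGICAL_PER_GROUP)
    return usable / logical_processors


# Logical processor (thread) counts for the systems discussed in this review.
SYSTEMS = {
    "Threadripper 3990X": 128,
    "EPYC 7702P": 128,
    "2x EPYC 7742": 256,
    "2x Xeon 8280": 112,
}

for name, threads in SYSTEMS.items():
    print(f"{name}: {processor_groups(threads)} group(s), "
          f"single-group app uses at most {group_unaware_share(threads):.0%} of threads")
```

Under this model, the 3990X needs two groups and a group-unaware application can touch at most half of its threads, while the dual-EPYC 7742 server needs four groups, capping such an application at a quarter of its threads.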

The SVT-AV1 test is designed to scale well across multiple cores, but the relatively short workload doesn't scale to the second processor group, so the higher clock speeds of the 3970X and overclocked Xeon W-3175X give them an advantage.

Now let's look at a few workloads that scale well. 

Rendering

Cinebench R20.06 scales exceedingly well across both processor groups, and surprisingly, the overclocked 3990X even beats the dual-EPYC server by a slim margin. Frankly, that's astounding. The 3990X also beats the dual-Xeon 8280 server by a whopping 62%. We can see the impact of the 7702P's lower clock speeds here: that processor, with the same number of cores and threads, lags far behind the 3990X, even at stock settings. However, it almost matches the dual-Xeon server, highlighting the power of AMD's single-socket server platforms in these types of workloads.

For reference, the overclocked Threadripper 3990X pulled a peak of 589W (package power) during the multi-core Cinebench run, compared to roughly 480W from both Xeon processors. 

Our POV-Ray charts are a bit of an eyesore, but that's because this application requires a new extension (to the existing 3.7 engine) so it can run across both processor groups. This highlights some of the challenges AMD will face with the software ecosystem as it works to unlock the full performance in threaded workloads, but also how the company is already moving forward on that front. We also included the pre-patch test results for all affected platforms to show the difference the patch makes.

Again, the overclocked 3990X delivers devastating performance that edges past the dual-EPYC platform and provides more than twice the performance of the patched dual-Xeon 8280 system, but there's a catch: While the workload scaled perfectly across all 128 threads of the 3990X and all 256 threads of the dual-EPYC server, the patch doesn't appear to work on more than one NUMA node with Intel processors, which is a separate issue from processor grouping.

We extrapolated the performance of the benchmark if it were to run on both of the Xeon server's NUMA nodes, and will follow up to see if we can get a new version.  
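The exact extrapolation method isn't described here. As a hypothetical illustration only, a naive linear extrapolation from the NUMA nodes a benchmark actually used to all of a system's nodes might look like the sketch below; the function name, the `efficiency` discount, and the sample numbers are assumptions, not the review's data or method.

```python
def extrapolate_linear(measured: float, nodes_used: int, total_nodes: int,
                       efficiency: float = 1.0) -> float:
    """Naively scale a result measured on `nodes_used` NUMA nodes up to `total_nodes`.

    efficiency=1.0 assumes perfectly linear scaling; values below 1.0 discount
    each additional node's contribution (an assumption, not a measured factor).
    """
    if nodes_used <= 0 or total_nodes < nodes_used:
        raise ValueError("need 0 < nodes_used <= total_nodes")
    per_node = measured / nodes_used
    extra_nodes = total_nodes - nodes_used
    return measured + per_node * extra_nodes * efficiency


# Hypothetical numbers for illustration only (not the review's results):
score_one_node = 1000.0
print(extrapolate_linear(score_one_node, nodes_used=1, total_nodes=2))  # 2000.0
print(extrapolate_linear(score_one_node, nodes_used=1, total_nodes=2,
                         efficiency=0.9))                               # 1900.0
```

Linear scaling is the optimistic case; any real extrapolation would need to account for cross-node memory traffic and clock behavior under heavier load.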

The Threadripper 3990X pulled a peak of 639W during this test compared to the dual-Xeon's extrapolated value of 800W. 

V-Ray scales across both processor groups/NUMA nodes for both Intel and AMD platforms, which gives the 3990X a nice lead over the dual-Xeon system and all other single-chip competitors at stock settings, though it did take PBO to beat the dual-EPYC system. The Corona ray tracing benchmark also spanned both groups and handed the 3990X a convincing win. 

We couldn't run some of our benchmarks on the server platforms, but the Blender benchmark marks another strong win for the 3990X over the competing consumer processors.

The Cinebench single-threaded test finds the 3990X falling behind processors with higher frequencies, but still beating the server competition. Meanwhile, the 3990X puts up an impressive across-the-board win in the single-threaded POV-Ray test. 

A few of our other tests, like rendering and visualization, photo editing, and LuxMark, respond better to higher clock rates, so the 3990X struggles to keep pace with other consumer chips.  

Compression, Decompression, Encryption, AVX

The 7-zip workload runs directly from memory, removing storage bottlenecks from the equation, but it only executes across one processor group/NUMA node. That disadvantages the 3990X and the server platforms (particularly the EPYC server) compared to the consumer processors, at least in terms of performance scaling. Despite the restriction, the 3990X posts incredibly impressive results in the compression tests, but clock rates play a big role in the decompression test, which gives the 3970X the win.

The multi-core y-cruncher test pounds the processor with a threaded AVX workload that spans both processor groups and NUMA nodes. Here the dual-Xeon 8280 flexes its AVX prowess, but the EPYC 7702P takes the lead. The 3990X falls a bit further down the pecking order, and given that this test works directly from memory, its quad-channel memory subsystem serves as its Achilles' heel. Even so, it beats all other HEDT processors at stock settings.
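The bandwidth gap behind that Achilles' heel can be estimated with the standard theoretical-peak formula: channels × transfer rate (MT/s) × 8 bytes per 64-bit DDR4 channel. This minimal sketch assumes DDR4-3200 for the 3990X's stock configuration (the PBO configuration uses DDR4-3600) and one DIMM per channel across eight channels on the EPYC 7702P server; sustained real-world bandwidth is lower than these peaks.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal): channels x MT/s x bytes per transfer.

    A DDR4 channel is 64 bits wide, so each transfer moves 8 bytes.
    """
    return channels * mt_per_s * bytes_per_transfer / 1000


# Assumed configurations: quad-channel DDR4-3200 for the 3990X at stock,
# eight-channel DDR4-3200 for the EPYC 7702P server (one DIMM per channel).
threadripper_3990x = peak_bandwidth_gbs(channels=4, mt_per_s=3200)
epyc_7702p = peak_bandwidth_gbs(channels=8, mt_per_s=3200)

print(f"3990X:      {threadripper_3990x:.1f} GB/s")  # 102.4 GB/s
print(f"EPYC 7702P: {epyc_7702p:.1f} GB/s")          # 204.8 GB/s
```

Even with DDR4-3600 under PBO, the same formula gives 115.2 GB/s for four channels, still well short of an eight-channel EPYC socket at DDR4-3200.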

The AIDA suite of tests, which includes the Zlib compression/decompression test, AES, SHA3, and HASH tests, scales perfectly across NUMA nodes and processor groups, which gives the dual-EPYC system a commanding lead in all of the tests. The dual-Xeon server is also very competitive, and the Threadripper 3990X easily beats all of the consumer-class silicon. 

Office and Productivity

These tests find us back in the realm of mainstream and HEDT platforms. The Threadripper 3990X isn't the best solution for many of the mundane workloads in PCMark 10 and the Microsoft Office suite, but it does deliver acceptable levels of performance. Naturally, these workloads aren't optimized for a behemoth like the 3990X, so these results aren't surprising, and this certainly isn't the target market.  

Web Browser

Browsers tend to be impacted more by the recent security mitigations than other types of applications, so Intel has generally taken a haircut in these benchmarks of fully-patched systems. While the mitigations have chipped away at Intel's lead in these tests, Intel processors still largely outperform competing AMD chips in these types of strictly single-threaded applications. 


Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • mohammed2006
    The Threadripper 3990X's performance gap isn't enough to justify it over the 3970X, which I think is the one to buy.
  • King_V
    As the article states, though - this is for specific types of workload/use cases.
  • knekker
    "A large number of applications don't scale well with NUMA architectures, particularly with Windows, which is the operating system of choice for visual effects artists."
    I work in the VFX industry, where I've been at ILM, DNEG, MPC and Cinesite, which work on most of the blockbuster movies, and I can tell you this: Windows is definitely not the OS of choice; that would be Linux.
    I do, however, currently work at a smaller VFX studio, and they use Windows.
  • splave
    Great read Paul! I love that the 64 core makes the 32 core look reasonable now haha
  • Roland Of Gilead
    Really enjoyed that one! Great comparison of the HEDT CPUs vs. server and mainstream: the good, the bad, and the ugly!

    Although, I don't get the almost apologetic tone in the Gaming Test notes. Yes, we know these CPUs aren't meant for gaming, but HEDT users, I'm sure, like to down tools too and game after a hard day's slog! I suspect they'd like to know, along with the majority of the community and anyone who'd be genuinely interested in these CPUs in the first place, what kind of gaming performance they can expect from them (and it's pretty damn good, by all accounts!).

    Anyway, including the gaming metrics is just being comprehensive. That's why I come to Tom's. Comprehensive is good. Don't resist the urge to include these benches in future comparisons. Don't mind the detractors! :D
  • domih
    So you could run a Cassandra 21-node cluster on one PC with 21 Virtual Machines each allocated with 6 threads, keeping 2 threads for the host. With a mobo max memory of 256GB, each VM could be allocated 11GB leaving 25GB for the host. AMD enables you to have fun 🆒
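The VM split in the comment above checks out arithmetically. Here is a quick Python sketch of the resource budget, using only the figures from the comment; the `leftover` helper is hypothetical.

```python
def leftover(total: int, vm_count: int, per_vm: int) -> int:
    """Resources left for the host after giving each VM an equal share."""
    remaining = total - vm_count * per_vm
    if remaining < 0:
        raise ValueError("allocation exceeds the available resources")
    return remaining


# Figures from the comment: a 128-thread 3990X, 256GB max memory, 21 VMs.
host_threads = leftover(total=128, vm_count=21, per_vm=6)     # 128 - 126 = 2
host_memory_gb = leftover(total=256, vm_count=21, per_vm=11)  # 256 - 231 = 25

print(f"Host keeps {host_threads} threads and {host_memory_gb}GB of RAM")
```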
  • Phaaze88
    knekker said:
    I work in the VFX industry, where I've been at ILM, DNEG, MPC and Cinesite that work on most of the block buster movies, and I can tell you this. Windows is definitely not the OS of choice, that would be linux.
    I do however currently work at a smaller vfx studio, and they use Windows.
    Would your line of work actually enjoy using the 3990X, or would it just stick with something Intel again, due to the time and money lost swapping platforms?
  • bamboe
    Well, here they did a Linux test, if you're interested:
    https://www.phoronix.com/scan.php?page=article&item=3990x-threadripper-linux&num=1
  • rjacko01
    I have also worked at Framestore, ILM, MPC, etc., and the idea of running Windows for VFX on that scale is seriously scary. I think it's fair to say 95%+ of VFX is done on Linux, because only a few smaller houses run Windows, often with horrendous results.
  • derekullo
    Hypothetically, with 256 MB of L3 cache, you could also run a 128-thread Monero miner.

    My extrapolation from the 3970X: 28,900 hashes/second × 2 = 57,800 hashes/second, × 0.9 (because scaling isn't completely linear due to lower clock speeds) = 52,020 hashes/second.

    Putting that into a Monero calculator with a 300-watt power draw for the system and $0.06 per kWh, we get $1,364 profit a year.

    $3990 / $1364 = 2.9 years to recoup your investment.

    https://www.cryptocompare.com/mining/calculator/xmr?HashingPower=52020&HashingUnit=H/s&PowerConsumption=300&CostPerkWh=0.06&MiningPoolFee=1
    Comparing this to a GeForce 2080 Ti, we get a strangely similar 2.89 years to recoup its investment.
    $1300 / $1.23 a day = 1057 days / 365 days = 2.89 years

    With the 3950X clocking higher and being 35% cheaper per core, it would make more sense to use three 3950X chips in three separate rigs than one 3990X.
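For reference, the payback arithmetic in the comment above can be reproduced with a few lines of Python. The hash-rate estimate, the $1,364/year and $1.23/day profit figures, and the prices all come from the comment; the profit numbers are taken as given from the mining calculator rather than recomputed, and `payback_years` is a hypothetical helper.

```python
def payback_years(price_usd: float, profit_per_year_usd: float) -> float:
    """Years needed for mining profit to cover the hardware price."""
    return price_usd / profit_per_year_usd


# Estimated 3990X hash rate: 3970X result doubled, then discounted 10% for
# imperfect scaling (the commenter's assumption).
hashrate_3970x = 28_900                    # H/s
hashrate_3990x = hashrate_3970x * 2 * 0.9  # 52,020 H/s

# Profit figures are the mining-calculator outputs quoted in the comment.
cpu_years = payback_years(3990, 1364)        # 3990X at $1,364/year
gpu_years = payback_years(1300, 1.23 * 365)  # 2080 Ti at $1.23/day

print(f"Estimated 3990X hash rate: {hashrate_3990x:,.0f} H/s")
print(f"3990X payback:   {cpu_years:.2f} years")
print(f"2080 Ti payback: {gpu_years:.2f} years")
```

Both options land at roughly 2.9 years, in line with the comment's observation that the two are strangely similar.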