AMD Threadripper 3990X Scores Another Win: We Test New SPECworkstation 3 Update

SPECworkstation 3 Test Notes

We ran this series of tests with the newly patched v.3.0.4 version of the benchmark (marked as 'new' in the charts) and provide direct comparisons to the previous v.3.0.3 version ('old'), which doesn't utilize all 128 threads in some subtests. "PBO" denotes an overclocked configuration. As mentioned on the previous page, the older version of the benchmark wouldn't run some subtests on the 3990X, or the results were too far outside of the expected range to be usable. Now we can run those tests, but those charts will have only one 3990X entry.

Some of these applications also make an appearance in our standard test suite, but those test configurations and benchmarks are focused on a typical desktop-class environment. In contrast, these tests are configured to stress the systems with workstation-class workloads. 

With the exception of the W-3175X system, we loaded down our test platforms with 64GB of DDR4 memory spread across four modules to accommodate the expanded memory capacity required for several of these workstation-focused tasks. Due to the W-3175X's six-channel memory controller and our limited stock of high-capacity DIMMs, we used six 8GB DIMMs for a total capacity of 48GB. All systems were tested at the vendor-specified supported memory data transfer rates for their respective stock configurations, and DDR4-3600 for the overclocked settings. Test conditions mirror those explained in our Threadripper 3990X review, but we've also included a breakdown of the test systems at the end of the page.

The full suite consists of more than 30 applications split among seven categories, but we've winnowed the list down to tests that focus primarily on CPU performance. We haven't submitted these results to the SPEC organization, so they are not official results.

NAMD

NAMD is a parallel molecular dynamics code designed to scale well with additional compute resources and is one of the premier benchmarks used to quantify performance with simulation code. We couldn't run this benchmark with the previous SPECworkstation version, but now it ticks right along as the 3990X's 128 threads tear through the benchmark.

The 3990X beats Intel's competing 28-core Xeon W-3175X by a massive margin, even after we overclocked the latter to its limits. Workload scalability is important here: The 3990X is more than twice as fast as the 32-core Threadripper 3970X in some tests. Doubling the core count alone wouldn't explain a gain of more than 2x, so the 3990X's additional cache may come into play.
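To make the scaling point concrete, here is a minimal sketch of how speedup and scaling efficiency can be computed from two benchmark scores. The scores below are hypothetical placeholders chosen only to illustrate the arithmetic; they are not results from our charts, and the function name is our own.

```python
def scaling_efficiency(speedup: float, core_ratio: float) -> float:
    """Return the fraction of ideal (linear) scaling that was achieved.

    1.0 means perfectly linear scaling; a value above 1.0 means the gain
    exceeded what the extra cores alone would predict.
    """
    return speedup / core_ratio


# Hypothetical higher-is-better scores, NOT measured results.
score_32_core = 10.0
score_64_core = 21.0

speedup = score_64_core / score_32_core
efficiency = scaling_efficiency(speedup, core_ratio=64 / 32)

print(f"speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```

An efficiency above 100 percent is the signature of a gain that core count alone can't account for, which is why a factor like additional cache becomes a plausible explanation.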

Media and Entertainment

With the new version of the benchmark, the LuxRender CPU test scales much better than we see with the widely used version of the benchmark we include in our normal test suite. It's important that potential customers can see the benefits of optimized code, so this massive improvement is welcome. We slid the test results from our standard suite into the second spot in the album, but be aware that this result comes from the stand-alone LuxRender benchmark that isn't a part of the SPEC suite. With SPECworkstation 3, the deltas between the 3990X and competing chips are now significantly wider, especially relative to the externally available LuxRender benchmark utility, so hopefully LuxRender releases a new version of its stand-alone benchmark to unleash this type of performance.

We run the new Blender Benchmark beta in our regular suite of tests, but different types of render jobs can stress processors in unique ways. Here we can see a breakout of several industry-standard benchmark renders that largely favor the Zen 2 architecture. You'll notice that the Threadripper 3990X tends to be more competitive in the longer-duration render workloads, which falls right in line with the company's guidance that workload intensity has a big impact on performance. We verified that this portion of the benchmark suite already ran across both processor groups with the previous version of the benchmark, so most of these results show little variability. We do see a slight regression with the new version in the 3BMWs workload.

HandBrake

We noticed that our standard HandBrake tests, which aren't part of the SPEC suite, didn't scale well at all with the 3990X, with the x265 test in particular showing subpar performance scaling considering the 3990X's massive compute resources. 

However, with the optimized SPEC code, we can now see the massive performance improvements we expect. For reference, the first two charts in the album outline SPEC performance, while the last two charts come from our own external test tool. The SPEC results better represent the real-world performance benefits of optimized code, and they highlight the extreme deltas relative to competing processors.

Product Development and Energy

The earth’s subsurface structure can be determined via seismic processing. One of the four basic steps in this process is the Kirchhoff Migration, which is used to generate an image based on the available data using mathematical operations.

This test produced odd results with the older version of the benchmark, but now we see the expected improvements, and they're impressive.

SRMP algorithms are used for discrete energy minimization. This test didn't run correctly, either, but now yields usable and realistic results. We see nearly linear performance scaling in this test, which is impressive.

Calculix is based on the finite element method for three-dimensional structural computations. The new SPEC update did generate a few results that we wouldn't expect given the 3990X's helping of compute resources, and Calculix certainly falls into that bucket. This could be an oddity from our test environment, and we're working to diagnose the issue. That starts with retesting the comparison processors, and we'll update as necessary. In either case, we shouldn't expect these massive improvements in this program, and we'll follow up with SPEC if we hit a brick wall in our examination.

Financial and General Workloads

The financial services simulations are used to project risk and uncertainty in financial forecasting models and run across SIMD lanes, meaning the vectorized code should unlock the ultimate in performance from processors with the requisite compute elements. We found that these SPECworkstation 3 tests still don't scale across processor groups, but this could be due to the underlying code, as opposed to the benchmark itself.
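For context on why processor groups matter: Windows places at most 64 logical processors in a single processor group, so a 128-thread chip is split across two groups, and software that isn't written to span groups only sees one of them. The sketch below works through that arithmetic. The function names are our own, and the even split across groups is a simplifying assumption.

```python
import math

# Windows places at most 64 logical processors in one processor group.
MAX_LOGICAL_PROCESSORS_PER_GROUP = 64


def processor_group_count(logical_processors: int) -> int:
    """Minimum number of processor groups needed for this many logical processors."""
    return math.ceil(logical_processors / MAX_LOGICAL_PROCESSORS_PER_GROUP)


def threads_visible_to_single_group_app(logical_processors: int) -> int:
    """Logical processors an application confined to one group can use.

    Simplifying assumption: logical processors are split evenly across groups.
    """
    return logical_processors // processor_group_count(logical_processors)


for name, threads in [("Threadripper 3990X", 128), ("Threadripper 3970X", 64)]:
    groups = processor_group_count(threads)
    visible = threads_visible_to_single_group_app(threads)
    print(f"{name}: {threads} threads, {groups} group(s), "
          f"{visible} usable from a single group")
```

Under these assumptions, a workload confined to one group has the same number of threads available on the 3990X as on the 3970X, which is consistent with tests that show no gain from the extra cores.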

The Python benchmark conducts a series of math operations, including tests that use the numpy and scipy math libraries, with Python 3.6. This test also includes multithreaded matrix tests that would obviously benefit from more cores, provided the software can utilize the host processing resources correctly. Naturally, the multithreaded matrix workload favors Threadripper 3000, but we don't see the expected performance improvements. This test does span the two processor groups, but doesn't scale well. That's probably attributable to the underlying code instead of the SPEC benchmark.
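Amdahl's law offers one general way to think about how a test can span both processor groups and still scale poorly: any serial portion of the work caps the gain from adding threads. The sketch below is illustrative only. The parallel fractions are hypothetical values, not measurements of the SPEC Python workload, and we haven't confirmed that this is what limits the test.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup over one worker under Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical parallel fractions, NOT measured values for any SPEC test.
for parallel_fraction in (0.50, 0.90, 0.99):
    at_64 = amdahl_speedup(parallel_fraction, 64)
    at_128 = amdahl_speedup(parallel_fraction, 128)
    print(f"parallel fraction {parallel_fraction:.2f}: "
          f"{at_64:.1f}x at 64 threads, {at_128:.1f}x at 128 threads")
```

Even with 90 percent of the work running in parallel, doubling from 64 to 128 threads improves the theoretical speedup by well under 10 percent in this model, so a modest serial portion in the underlying code is enough to flatten the curve.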

The Intel processors dominate the numpy and scipy tests, but that also represents another challenge in the benchmark ecosystem for AMD: These tests are built against Intel's Math Kernel Library (MKL), and alternative libraries would likely improve AMD's standing.

Rodinia LifeSciences

SPECworkstation 3's Rodinia LifeSciences benchmark steps through four tests that include medical imaging, particle movements in a 3D space, a thermal simulation, and image-enhancing programs. These workloads didn't execute across both processor groups with the old version of the benchmark, but now they do. That removes the 3970X's big clock speed advantage and allows the 3990X to take an expected lead in the Heartwall medical imaging test.

We also ran into unexpectedly good results in Srad and Lavamd, but a bit of regression with Hotspot. We're taking a deeper look at these results to troubleshoot. 

Thoughts

We come away even more impressed with the Threadripper 3990X after this series of tests. Beyond the chip's brutal performance in threaded workloads that can exploit its resources, it's incredibly important that the industry is moving quickly to update the tools and software that support the chip and expose its raw performance potential.

The updated benchmark isn't a panacea that can solve all issues in all benchmarks, as some of the underlying applications still aren't tuned for the unique Zen 2 architecture. However, these results underline our previous conclusions about the Threadripper 3990X, which we bestowed with an Editor's Choice award: 

"The Threadripper 3990X is pretty much exactly what AMD says it is: A highly specialized processor that provides incredible performance in a narrow cross-section of workloads, but at an extremely attractive price point given its capabilities." 

The SPEC organization also announced that it has begun developing a completely new SPEC CPU suite that will replace the existing SPEC CPU 2017 benchmark. The organization will award up to $9,000 and free benchmark software to individuals whose submissions are accepted, provided the new benchmarks take advantage of multi-core processors and parallel, multi-threaded computing.

That marks another new wave of development of industry-standard benchmarks that will accurately measure the performance of core-heavy chips, but that submission period will last a year, meaning those new tests will be a bit further out on the horizon than the reworked SPECworkstation 3 benchmarks. 

Kudos to the SPEC organization for moving quickly to ensure industry-standard benchmarks reflect the current state of the market. There might be a few teething pains in the process, but we'll take that over industry stagnation any day of the week. If the rest of the industry moves as fast as the SPEC organization, these types of core-heavy processors could become a more common reality for enthusiasts sooner rather than later.

AMD Socket sTRX4 (TRX40): Threadripper 3990X, 3970X, 3960X
    MSI Creator TRX40
    4x 8GB G.Skill FlareX DDR4-3200 - Stock: DDR4-3200, OC: DDR4-3600 16-18-18-36

Intel Socket 2066 (X299): Core i9-10980XE
    MSI Creator X299
    4x 8GB G.Skill FlareX DDR4-3200 - Stock: DDR4-2933, OC: DDR4-3600 16-18-18-36

AMD Socket AM4 (X570): AMD Ryzen 9 3950X
    MSI MEG X570 Godlike
    2x 8GB G.Skill FlareX DDR4-3200 - Stock: DDR4-3200, OC: DDR4-3600 14-14-14-34

Intel LGA 3647 (C621): Intel Xeon W-3175X
    ROG Dominus Extreme
    6x 8GB Corsair Vengeance RGB DDR4-2666 - Stock: DDR4-2666, OC: DDR4-3600 14-14-14-34

AMD Socket SP3 (TR4): Threadripper 2990WX, 2970WX, 2950X
    MSI MEG X399 Creation
    2x 8GB G.Skill FlareX DDR4-3200 - Stock: DDR4-2933, OC: DDR4-3466 14-14-14-34

All Systems:
    Nvidia GeForce RTX 2080 Ti
    2TB Intel DC4510 SSD
    EVGA Supernova 1600 T2, 1600W
    Windows 10 Pro (1903 - All Updates)

Cooling: Corsair H115i, Enermax Liqtech 360 TR4 II, Custom Loop


Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • atmapuri
    The benchmark is now optimized for the CPU, but the actual software not yet? What is that good for? And they even want to do more code which runs well for benchmark only. Maybe time for a new "dreamer" category of articles?
  • PaulAlcorn
    atmapuri said:
    The benchmark is now optimized for the CPU, but the actual software not yet? What is that good for? And they even want to do more code which runs well for benchmark only. Maybe time for a new "dreamer" category of articles?

    The majority of the benchmarks do show massive performance improvements, though a few aren't scaling due to the nature of the program. If the program can't scale in real life, or isn't designed to, it doesn't benefit anyone to game the benchmark software to present unrealistic results.
  • CerianK
    For those of us that write and run custom parallel code applications, these benchmarks shed some more light on the potential. However, there is no substitute for actually testing the applications you plan to use. I have already jumped ahead and finished testing the very applications I need to run on a 3800X to project the performance improvement over my dual E5-2690 workstation when moving to a single 3990X: > 5x.

    This is indeed a niche processor for those that know they need it, and I'm sold. But since I am a private researcher, maybe I should go after that $9000 prize mentioned to defray costs... I might have a little extra time available once I finish building new automation tools to keep the beast fed.

    The main down-side I see for my work is that, ideally, I would need to have access to 4GB per thread... AMD needs to get that mess straightened out, if not for this generation, then the next. As is, some of my workloads would take twice as long as otherwise necessary, and no, I will not build two 3970X systems, go with EPYC, or upgrade to yet more Xeons... too cost and/or power prohibitive (though the 3990X is certainly pushing the line for cost here also).

    Question: Do some of the 3970X (and possibly 3960X) benchmarks need to be re-run also, or is the funny scaling for some of those benchmarks due to glitches in SPEC's new code adjustments?
  • Stevemeister
    You seem to have completely missed the point of the article, which was to point out that currently most applications ARE NOT optimized to take advantage of what this chip is capable of IF applications were to be optimized for it... basically there is potential for applications to run 2-3 times faster than they currently do if the software gets optimized.
  • RodroX
    Nvidia launched the RTX GPUs more than 17 months ago, and the ray tracing tech is still not supported by many games and is being optimized little by little, and those games that do support it see an important drop in FPS when active (but they do look amazing).

    AMD launched a new category of HEDT CPU, the TR 3990X, less than a month ago, so I think it's fair to give some time for the software industry to catch up.
    Other than that, thumbs up to TomsHardware for continuing to update the info and benchmark results as new software shows up. Who knows what the CPU and GPU future brings when software gets tuned a bit more.

    Cheers
  • Rob1C
    CerianK said:
    The main down-side I see for my work is that, ideally, I would need to have access to 4GB per thread... AMD needs to get that mess straightened out, if not for this generation, then the next. As is, some of my workloads would take twice as long as otherwise necessary, and no, I will not build two 3970X systems, go with EPYC, or upgrade to yet more Xeons... too cost and/or power prohibitive (though the 3990X is certainly pushing the line for cost here also).

    On the basis of memory cost alone for your application (4GB / Thread) the Epyc CPU is well over $1000 less expensive when you add the price of a 64 core ThreadRipper with 4x128GB sticks versus the Epyc with 8x64GB sticks; while there's a difference in clock speed the 7H12 benefit for the additional cost is unlikely useful for your cost constraints. The extra 64 PCIe 4.0 lanes could allow a speedy RAID card which might be useful.

    Sometimes looking at total costs rather than focusing on the price, longevity, and capabilities of a single part is what's needed. It's a certainty that there's a much better selection of ThreadRipper MBs (PCIe 4.0) than what is available for the Epyc; and the TR MBs are more feature filled and capable for the price. The best TR MB won't add epic features to a ThreadRipper, nor is there an overclocked server MB for the Epyc (not counting the normal running speed for a dual 7H12, and the loss of arm, leg, and organs).

    But, buy as you wish.
  • Makaveli
    RodroX said:
    Nvidia launched the RTX GPUs more than 17 months ago, and the ray tracing tech is still not supported by many games and is being optimized little by little, and those games that do support it see an important drop in FPS when active (but they do look amazing).

    AMD launched a new category of HEDT CPU, the TR 3990X, less than a month ago, so I think it's fair to give some time for the software industry to catch up.
    Other than that, thumbs up to TomsHardware for continuing to update the info and benchmark results as new software shows up. Who knows what the CPU and GPU future brings when software gets tuned a bit more.

    Cheers

    Don't think we will see a push for Ray Tracing in games until both next Gen consoles are out.
  • CerianK
    Rob1C said:
    On the basis of memory cost alone for your application...
    Actually, it is applications (plural), where the 4GB/thread applies to only one of the applications, so not a deal-breaker if only 2GB/thread is available. The concern is more of a future-proofing issue, where you are right that EPYC might be a better choice in the long run, but would also require an additional 3950X (for example) to handle more lightly threaded time-sensitive workloads. So, not necessarily a less-expensive option.

    Still, my understanding is that if one were able to install 8x64GB on Threadripper, it would only see 256GB, which others have commented on as a means to artificially limit VM deployment, or other traditionally server (i.e. EPYC) workloads, on the 3990X. I am not sure if there is any more to it than that, but it makes sense from a marketing standpoint (regardless of my opinions on the subject).

    Regarding memory type, ECC is not a requirement for me. Something like G.SKILL F4-3200C16Q2-256GVK for $1200 US would likely be fine, from what I can tell.