Date: 2021-01-16

Threadripper Pro Memory Scaling

System | Memory Channels | AIDA Memory Latency | SiSoft Aggregate Multi-Core BW | SiSoft Per-Core BW | SiSoft Single-Thread BW
TR Pro 3995WX 32GB | Dual (2x16GB) | 92.1ns | 35 GB/s | 560 MB/s | 30.67 GB/s
TR Pro 3995WX 64GB | Quad (4x16GB) | 102ns | 70 GB/s | 1.1 GB/s | 35 GB/s
TR Pro 3995WX 128GB | Octo (8x16GB) | 100ns | 136 GB/s | 2.13 GB/s | 36 GB/s
Intel Xeon W-3175X | Sexa (6x8GB) | 81.1ns | 82 GB/s | 2.93 GB/s | 15.4 GB/s
TR 3990X | Quad (4x8GB) | 84.68ns | 51.58 GB/s | 825 MB/s | 35.93 GB/s

Here we can see the memory throughput advantages of running with eight memory channels as opposed to the four memory channels found on the consumer-class Threadripper models.

The quad- and octo-channel Threadripper Pro setups featured similar latency (102ns and 100ns, respectively), but the dual-channel arrangement clocked in at a lower 92.1ns, while the consumer-class Threadripper 3990X came in at 84.68ns. Lower latency could benefit some latency-sensitive workloads, as we'll see in the benchmarks below. Meanwhile, the Xeon W-3175X weighed in at 81.1ns.

We turned to SiSoft Sandra for bandwidth measurements. The first two SiSoft columns outline performance when all of the cores are actively requesting data. With the Threadripper Pro chip, we can see the neat doubling in memory throughput from dual- to quad-channel (35 GB/s to 70 GB/s), and then nearly another doubling to 136 GB/s with the octo-channel setup. You'll notice that per-core bandwidth scales nicely here as well when all cores are consuming bandwidth. Notably, the quad-channel 3995WX (70 GB/s) offers superior aggregate memory bandwidth over the quad-channel 3990X (51.58 GB/s).

The final column outlines memory throughput when only a single core is active, meaning the core doesn't have to share any bandwidth with other cores. The jump from a dual-channel to quad-channel setup improves bandwidth to a single core by about 14% (from 30.67 GB/s to 35 GB/s). Meanwhile, the move to octo-channel memory has little benefit over quad-channel - the peak memory throughput to one core caps around 35 GB/s. That means the increased throughput of octo-channel memory won't provide additional performance in single-threaded workloads over the quad-channel configuration.
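To make the scaling figures concrete, the ratios can be recomputed directly from the table. The following is a minimal sketch rather than part of our test tooling: the dictionary and function names are our own, and the inputs are the rounded values from the table, so the results are approximate.

```python
# SiSoft Sandra bandwidth figures for the TR Pro 3995WX, in GB/s,
# copied from the memory scaling table (rounded as published).
RESULTS = {
    "dual": {"aggregate": 35.0, "single_thread": 30.67},
    "quad": {"aggregate": 70.0, "single_thread": 35.0},
    "octo": {"aggregate": 136.0, "single_thread": 36.0},
}


def scaling(metric, start, end):
    """Return how many times larger `metric` is in `end` than in `start`."""
    return RESULTS[end][metric] / RESULTS[start][metric]


def percent_gain(metric, start, end):
    """Return the percentage improvement of `metric` from `start` to `end`."""
    return (scaling(metric, start, end) - 1.0) * 100.0


if __name__ == "__main__":
    for metric in ("aggregate", "single_thread"):
        for start, end in (("dual", "quad"), ("quad", "octo")):
            factor = scaling(metric, start, end)
            gain = percent_gain(metric, start, end)
            print(f"{metric:>13} {start}->{end}: {factor:.2f}x ({gain:+.1f}%)")
```

With the table's values, aggregate bandwidth scales by 2x and then by roughly 1.94x, while single-thread bandwidth gains about 14% and then under 3%.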

Threadripper Pro 3995WX Power Consumption and Efficiency

There are a few caveats to our power testing: The Lenovo ThinkStation P620 delivers all of its power directly through the motherboard, which prevents us from conducting CPU power measurements from the physical layer that we typically use to validate the results we log from the sensor loop. However, the results do fall within our general expectations - the chip often tops out right at AMD's prescribed 280W power limit.

In contrast, the Threadripper 3990X follows a typical trend we've seen in the past with AMD's core-heavy chips - they often draw less power when all cores are fully loaded than when the chip is partially loaded (that's why the 3970X draws more power than the 3990X). These power management differences often occur at the behest of motherboard firmware, and the Lenovo system doesn't expose any information that we could use to tease out the difference in approaches.

The Dominus Extreme that we used for the W-3175X also presents power measurement challenges. In order to sidestep the CPU's power limits, Asus offers a secondary power reporting option in the BIOS. Intel's recommended setting (the default) reports current after dividing the measured value by 1.25, and the readings can at times be inaccurate. As such, we've only included measurements that we were able to verify at the physical layer. Those measurements of ~320W power draw during the AIDA power test easily eclipse the rest of the test pool.

As you can see, the Threadripper Pro chips consume much more power than their desktop PC counterparts, which is an unavoidable side effect of the tremendous core counts. As expected, most of the tests show that the 3995WX consumes a few more watts of power as more memory channels are utilized.

Here we take a slightly different look at power consumption by calculating the cumulative amount of energy required to perform x264 and x265 HandBrake workloads and two Blender renders. We plot this 'task energy' value in kilojoules on the left side of the chart.

These workloads consist of a fixed amount of work, so we can plot the task energy against the time required to finish the job (bottom axis), thus generating a handy power chart. Bear in mind that faster compute times and lower task energy requirements are ideal.
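For readers who want to reproduce a task energy figure from their own power logs, the calculation amounts to integrating power over the duration of the job. The sketch below shows one way to do that with trapezoidal integration; the function name and the sample data are hypothetical, and it is not the tooling used for the charts.

```python
def task_energy_kj(samples):
    """Integrate a power log into task energy.

    `samples` is a list of (time_in_seconds, power_in_watts) tuples sorted
    by time. Energy is accumulated with the trapezoidal rule and returned
    in kilojoules (1 W sustained for 1 s equals 1 J).
    """
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += (p0 + p1) / 2.0 * (t1 - t0)
    return joules / 1000.0


if __name__ == "__main__":
    # Hypothetical log: a chip held at 280 W for a 100-second job.
    log = [(0, 280.0), (50, 280.0), (100, 280.0)]
    print(f"Task energy: {task_energy_kj(log):.1f} kJ")
```

Because the work is fixed, a chip that draws more power can still come out ahead on task energy if it finishes the job fast enough.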

AMD Threadripper Pro 3995WX Benchmark Test Setup

As expected, Lenovo's system doesn't support overclocking, even though AMD's Threadripper Pro chips do support the feature. That means we'll have to wait for the other motherboards to ascertain the benefits, and according to recent reports, those are on the cusp of release.

Lenovo's ThinkStation is unabashedly designed for 100% stability, and as such, features like DRAM frequencies and timings aren't alterable in the motherboard firmware. As a result, we had to test with 128GB of memory capacity spread across eight DIMMs. These DIMMs run off of SPD values, so we were limited to DDR4-3200 with JEDEC timings of 24-22-22-52-74.
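For context, the theoretical ceiling for this memory configuration can be worked out from the DDR4-3200 transfer rate and the 64-bit width of a DDR4 channel. The sketch below is a back-of-the-envelope calculation with names of our own choosing; it assumes 1 GB = 10^9 bytes, which may not match the units SiSoft Sandra reports.

```python
BUS_WIDTH_BYTES = 8  # one DDR4 channel has a 64-bit data bus


def peak_bandwidth_gbs(transfer_rate_mts, channels):
    """Theoretical peak bandwidth in GB/s (assuming 1 GB = 1e9 bytes)."""
    return transfer_rate_mts * 1e6 * BUS_WIDTH_BYTES * channels / 1e9


def cas_latency_ns(cas_cycles, transfer_rate_mts):
    """CAS latency in nanoseconds.

    DDR memory transfers twice per clock, so the memory clock in MHz is
    half the transfer rate in MT/s.
    """
    clock_mhz = transfer_rate_mts / 2
    return cas_cycles / clock_mhz * 1000


def efficiency(measured_gbs, transfer_rate_mts, channels):
    """Fraction of the theoretical peak that a measured result achieves."""
    return measured_gbs / peak_bandwidth_gbs(transfer_rate_mts, channels)


if __name__ == "__main__":
    for channels in (2, 4, 8):
        peak = peak_bandwidth_gbs(3200, channels)
        print(f"{channels} channels of DDR4-3200: {peak:.1f} GB/s peak")
    print(f"CL24 at DDR4-3200: {cas_latency_ns(24, 3200):.1f} ns")
```

Eight channels of DDR4-3200 work out to a theoretical 204.8 GB/s, so the 136 GB/s aggregate result from the octo-channel configuration lands at roughly two-thirds of the ceiling under these assumptions.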

That means we're forced to compare the Threadripper Pro to systems with disparate memory capacities and timings, which we would typically normalize as best we can between test subjects. That limitation prevents us from coming to firm overall conclusions on the finer aspects of performance relative to the consumer chips, but we can get a good-enough sense of what to expect from a Threadripper Pro system. All other hardware configurations, such as GPUs and SSDs, are identical between the systems in the tests below.

We tested the Threadripper Pro in the configurations in the next table (you'll also see the configurations marked in the charts) to compare performance with two, four, and eight memory channels populated. This will give us an interesting view of how Threadripper scales with improved memory throughput and capacity.

All of the normal caveats of Threadripper 3000 performance apply. Windows 10 splits cores up into 'processor groups' of 64 threads apiece, so some applications and benchmarks that aren't tuned to span across the groups don't benefit from the increased thread count. For applications that can't span processor groups, some professional users will run multiple instances of a program in VMs to extract the utmost in performance. Even without that type of arrangement, we see a marked uplift in several applications that benefit from the awesome parallelism of 128 threads, and the software ecosystem is quickly adjusting to embrace this type of design more fully.
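The processor group limit is easy to reason about with a little arithmetic. The sketch below is a simplified model with hypothetical function names: it assumes threads are packed into groups of up to 64, whereas the actual assignment Windows makes also depends on the system's topology.

```python
import math

# Windows caps a processor group at 64 logical processors.
MAX_LOGICAL_PER_GROUP = 64


def processor_groups(logical_processors):
    """Minimum number of processor groups needed for the given thread count."""
    return math.ceil(logical_processors / MAX_LOGICAL_PER_GROUP)


def threads_visible_to_single_group_app(logical_processors):
    """Threads an application can use if it cannot span processor groups."""
    return min(logical_processors, MAX_LOGICAL_PER_GROUP)


if __name__ == "__main__":
    threads = 128  # 64 cores with two threads apiece
    print(f"{threads} threads -> {processor_groups(threads)} processor groups")
    usable = threads_visible_to_single_group_app(threads)
    print(f"A group-unaware application can use {usable} of {threads} threads")
```

Under this model, an application that can't span groups is limited to half of the chip's 128 threads, which is why running multiple instances can recover the lost performance.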

AMD's Ryzen Master software, which allows you to tune consumer Threadripper processors, isn't available with the Threadripper Pro chips.

AMD Socket sWRX8: AMD Threadripper Pro 3995WX; Lenovo ThinkStation P620; 8x 16GB SK hynix ECC - DDR4-3200
Intel Socket 3647: Intel Xeon W-3175X; ASUS ROG Dominus Extreme; 6x 8GB Corsair Vengeance RGB DDR4-2666
Intel Socket 1200 (Z490): Core i7-10700K, Core i9-10900K; Gigabyte Aorus Z490 Master; 2x 8GB Trident Z Royal DDR4-3600 - Stock: DDR4-2933
AMD Socket AM4 (X570): AMD Ryzen 9 5950X, 5900X, 3950X; MSI MEG X570 Godlike; 2x 8GB Trident Z Royal DDR4-3600 - Stock: DDR4-3200
Intel Socket 2066 (X299): Core i9-10980XE; MSI Creator X299; 4x 8GB Trident Z Royal DDR4-3600 - Stock: DDR4-2933
AMD Socket SP3 (TR4): Threadripper 3960X, 3970X, 3990X; ASUS ROG Zenith II Extreme; 4x 8GB Trident Z Royal DDR4-3600 - Stock: DDR4-3200
All Systems: Gigabyte GeForce RTX 3090 Eagle - Gaming and ProViz applications; Nvidia GeForce RTX 2080 Ti FE - Application tests; 2TB Intel DC4510 SSD; EVGA Supernova 1600 T2, 1600W; Open Benchtable; Windows 10 Pro version 2004 (build 19041.450); Workstation Tests - 4x 16GB Corsair Dominator - Corsair Force MP600
Cooling: Corsair H115i, Custom loop

