AMD Ryzen Threadripper 1950X Review

Game Modes & Architecture, Infinity Fabric Latency Testing

We've covered AMD's Zen architecture in depth, and also covered the Infinity Fabric at length. Head over to those articles for more coverage.

The Zeppelin Die Primer

Threadripper's massive package hides much complexity underneath, but we'll do our best to simplify and outline how it relates to AMD's innovative Creator and Game Mode features.

The Zen architecture employs a four-core CCX (CPU Complex) building block. AMD adorns each CCX with 8MB of L3 cache split into four slices; each core in the CCX accesses all L3 slices with the same average latency. Two CCXes come together to create an eight-core Ryzen 7 die (the large orange blocks in the second image below), and they communicate via AMD’s Infinity Fabric interconnect. The CCXes share the same dual-channel memory controller. This is basically two quad-core CPUs talking to each other over the Infinity Fabric pathway that also handles northbridge and PCIe traffic.

All Ryzen 7, 5, and 3 models feature the same single Zeppelin die. Although each core in a four-core CCX can access the local cache with the same average latency, trips to fetch data in the adjacent CCX incur a latency penalty. Communication between threads on cores located in disparate CCXes also suffers, which is of particular importance for gaming. Many game engines split various tasks out to different threads, but they rely on constant synchronization between them. Developers can defray some of the communication latency by tuning for the Ryzen architecture.

Building The Threadripper

The graphic below represents AMD's EPYC data center processor die, which shares Threadripper's basic design. We can see four separate Zeppelin dies connected via the Infinity Fabric, and the two CCXes inside each die. This creates a 32-core Multi-Chip Module (MCM). Of course, Threadripper is "only" a 16-core processor. To create this configuration, AMD substitutes in two 'dummy dies,' which are non-functional fillers that ensure the heat spreader's structural integrity and consistent mating with the socket's pins. Without these dark dies, the IHS would either cave in when you tighten your cooling solution, or the chip would warp and not make full contact with the pins. AMD notes that Threadripper's functional dies are always placed diagonally from each other, which makes sense considering the fabric's design.

Remember, each Zeppelin die has its own memory and PCIe controllers. That means that if a workload executing on a die needs to access data resident in the memory of the other die (remote memory), it has to traverse a much larger gap. This introduces a level of latency we haven't seen from previous Ryzen models, and its effect on gaming performance is profound. The impact isn't as severe with most professional workloads, but some do suffer. 

The New Toggles

To defray the impact of remote memory access, AMD introduces a new memory access mode that you can toggle either in the BIOS or with the Ryzen Master software. The Local and Distributed settings flip between NUMA (Non-Uniform Memory Access) and UMA (Uniform Memory Access), respectively.

UMA (distributed) is pretty simple; it allows the dies to access all of the attached memory. NUMA mode (local) attempts to keep all data for the process executing on the die confined to its directly attached memory controller. It establishes one NUMA node per die (visible in the task manager). This reduces, and even possibly eliminates, data fetches from the remote memory connected to another die, though the die can still access it if needed. NUMA has deep roots in the enterprise, but the technique works best if programs are designed specifically to utilize it. It's a rarity on the desktop. Although almost no desktop applications are written to exploit it, applications that aren't NUMA-aware can still see performance advantages.

AMD's Threadripper introduces more cores to the desktop than we've ever seen, and some programs are caught ill-prepared. In fact, a few games, like Far Cry Primal and the DiRT series, won't even run when the full complement of Threadripper's threads is brought to bear. That's obviously a problem, so AMD created a Legacy Compatibility mode that executes a "bcdedit /set numproc XX" command in Windows to disable half of the processor's cores. Luckily, due to the operating system's core assignments, the command disables all of the cores/threads on the second die. That has the side benefit of eliminating thread-to-thread communication between disparate dies, which sidesteps the latency penalty on the constant synchronization between threads during most gaming workloads.
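As a rough illustration of what Legacy Compatibility mode asks Windows to do, here is a minimal Python sketch that builds the command string for a given logical processor count. The function name is our own, and the assumption that "XX" is exactly half of the logical processors is inferred from the description above rather than confirmed by AMD. The sketch only builds the string and never executes it.

```python
def legacy_mode_command(logical_processors: int) -> str:
    """Build the bcdedit command that caps Windows at half the logical processors.

    Assumption: the "XX" placeholder is half of the logical processor count,
    as implied by "disables half of the processor's cores".
    """
    if logical_processors < 2 or logical_processors % 2:
        raise ValueError("expected an even logical processor count of at least 2")
    return f"bcdedit /set numproc {logical_processors // 2}"


if __name__ == "__main__":
    # A Threadripper 1950X exposes 32 logical processors (16 cores with SMT).
    print(legacy_mode_command(32))
```

On a real system the command needs an elevated prompt, and the change only takes effect after a reboot.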

Because the change is made in software, the "disabled" die still has power fed to it, so the system can still access the memory and PCIe controllers connected to the inactive die.

Game Mode And Creator Mode

So what do you do with all these knobs? There are four separate combinations that will impact each application or game differently, so you have to cycle through them to find the best possible combination for your workload. That's a godsend to tuners looking to squeeze out every last drop of performance, but an absolute nightmare for the other 99%.

AMD decided to simplify the process by specifying two combinations that should work best for either games or standard applications. Creator mode, which is the stock configuration, exposes the full might of 32 threads. It should naturally provide excellent performance for most productivity applications.

Game mode cuts half the threads via compatibility mode and reduces memory and die-to-die latency with the Local memory mode. We're going to test both configurations with our gaming suite, and try another configuration that also offers the full complement of threads.
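The four combinations are easy to enumerate. The Python sketch below models the two toggles (memory access mode and Legacy Compatibility mode) and tags the two pairings AMD ships as presets. The data structure and function names are our own illustration, not anything exposed by Ryzen Master, and the thread counts assume the 1950X's 32 threads.

```python
from itertools import product

MEMORY_MODES = ("Distributed", "Local")  # UMA vs. NUMA
LEGACY_MODES = (False, True)             # Legacy Compatibility mode off / on

# The two pairings AMD ships as presets; the other two are manual combinations.
AMD_PRESETS = {
    ("Distributed", False): "Creator Mode",
    ("Local", True): "Game Mode",
}


def describe(memory_mode: str, legacy: bool, total_threads: int = 32) -> dict:
    """Summarize one toggle combination for a 32-thread Threadripper 1950X."""
    return {
        "memory_mode": memory_mode,
        "legacy_compatibility": legacy,
        # Legacy Compatibility mode disables half of the threads.
        "threads": total_threads // 2 if legacy else total_threads,
        "preset": AMD_PRESETS.get((memory_mode, legacy), "manual"),
    }


def all_combinations() -> list:
    """Return all four memory-mode / legacy-mode combinations."""
    return [describe(m, l) for m, l in product(MEMORY_MODES, LEGACY_MODES)]


if __name__ == "__main__":
    for combo in all_combinations():
        print(combo)
```

The "manual" entry with Local memory and Legacy Compatibility off is the third configuration we test alongside the two presets.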

Infinity Fabric Latency Testing

Die-to-die communication adds another layer of latency to Ryzen’s complicated architecture. As you can see, those same latency metrics don’t apply to the earlier Ryzen models. They also present challenges to some applications, such as those with synchronized threads or frequent fetches from remote memory, but have less impact on others.

| Processor | Intra-Core Latency (ns) | Intra-CCX Core-to-Core Latency (ns) | Cross-CCX Core-to-Core Latency (ns) | Cross-CCX Average Latency (ns) | Die-to-Die Latency (ns) | Die-to-Die Average Latency (ns) | Average Transfer Bandwidth (GB/s) |
|---|---|---|---|---|---|---|---|
| TR 1950X Creator Mode DDR4-2666 | 13.7 - 14.1 | 39.4 - 43.2 | 157.6 - 171.3 | 168 | 180.6 - 256.7 | 238.47 | 90.26 |
| TR 1950X Creator Mode DDR4-3200 | 13.8 - 14.9 | 39.2 - 45.4 | 144.9 - 167.2 | 160.1 | 213.1 - 227.8 | 216.9 | 91.67 |
| TR 1950X Game Mode DDR4-2666 | 13.9 - 14.2 | 39.5 - 42.3 | 149.2 - 164.1 | 159.66 | X | X | 46.58 |
| TR 1950X Game Mode DDR4-3200 | 14.3 - 14.9 | 41.2 - 46.2 | 123 - 150.6 | 145.44 | X | X | 45.52 |
| TR 1950X Local/SMT DDR4-2666 | 13.9 - 14.4 | 39.6 - 43.1 | 168.7 - 175.4 | 171.48 | 232.4 - 240.8 | 235.38 | 92.7 |
| TR 1950X Local/SMT DDR4-3200 | 13.9 - 14.4 | 39.9 - 44.5 | 146.7 - 159.4 | 153.89 | 209.3 - 220.9 | 212.53 | 91 |
| Ryzen 7 1800X | 14.8 | 40.5 - 82.8 | 120.9 - 126.2 | 122.96 | X | X | 48.1 |
| Ryzen 5 1600X | 14.7 - 14.8 | 40.6 - 82.8 | 121.5 - 128.2 | 123.48 | X | X | 43.88 |

The intra-core latency measurements represent communication between two logical threads resident on the same physical core, and they're unaffected by memory speed. Intra-CCX measurements quantify latency between threads that are on the same CCX but not resident on the same core. In the past, we observed slight performance variances, but intra-CCX latency is also largely unaffected by memory speed. However, we've seen a large decrease in cross-CCX latency, which denotes latency between threads located on two separate CCXes, by increasing the memory data transfer rate from DDR4-1333 to DDR4-3200 on Ryzen 5 and 7 models.
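To make the measurement categories concrete, here is a small Python sketch that classifies a pair of logical processors into the tier their communication falls under on a two-die Threadripper. The numbering scheme is an assumption for illustration only: SMT siblings are adjacent, cores fill one CCX before the next, and CCXes fill one die before the next. Real operating systems may enumerate logical processors differently.

```python
THREADS_PER_CORE = 2  # SMT
CORES_PER_CCX = 4
CCXES_PER_DIE = 2


def locate(cpu: int) -> tuple:
    """Map a logical processor index to (core, ccx, die) under the assumed numbering."""
    core = cpu // THREADS_PER_CORE
    ccx = core // CORES_PER_CCX
    die = ccx // CCXES_PER_DIE
    return core, ccx, die


def latency_tier(cpu_a: int, cpu_b: int) -> str:
    """Name the latency category for traffic between two logical processors."""
    if cpu_a == cpu_b:
        raise ValueError("need two distinct logical processors")
    core_a, ccx_a, die_a = locate(cpu_a)
    core_b, ccx_b, die_b = locate(cpu_b)
    if core_a == core_b:
        return "intra-core"   # two SMT threads on one physical core
    if ccx_a == ccx_b:
        return "intra-CCX"    # different cores, same CCX
    if die_a == die_b:
        return "cross-CCX"    # different CCXes, same die
    return "die-to-die"       # different dies, over the Infinity Fabric


if __name__ == "__main__":
    for pair in [(0, 1), (0, 2), (0, 8), (0, 16)]:
        print(pair, latency_tier(*pair))
```

Under this numbering, the single-die Ryzen 7 and 5 models can never reach the "die-to-die" tier, which is why those columns are marked X for them.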

The same general trend continues with Threadripper. As we can see, toggling game mode removes the die-to-die latency for threads by effectively disabling one die, but it also reduces host processing resources. It’s an interesting feature that will benefit some workloads, but hamstring others.

We also notice that the Local/SMT combination, which consists of the local setting and leaves all cores active (legacy off), offers the best overall latency improvement via memory overclocking. We also recorded higher cross-CCX latency with the Threadripper processors than with the single-die Ryzen 7 and Ryzen 5 models.
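The memory-speed comparison can be checked with some quick arithmetic. The Python sketch below takes the average latencies from the Threadripper rows of the table above and computes the percentage reduction when moving from DDR4-2666 to DDR4-3200. The dictionary layout and function names are our own; the numbers are the measured averages.

```python
# Average latency in ns as (DDR4-2666, DDR4-3200), taken from the table above.
AVERAGES_NS = {
    "Creator Mode": {"cross-CCX": (168.0, 160.1), "die-to-die": (238.47, 216.9)},
    "Game Mode": {"cross-CCX": (159.66, 145.44)},  # no die-to-die traffic
    "Local/SMT": {"cross-CCX": (171.48, 153.89), "die-to-die": (235.38, 212.53)},
}


def percent_reduction(before: float, after: float) -> float:
    """Percentage by which latency drops going from `before` to `after`."""
    return (before - after) / before * 100.0


def reductions() -> dict:
    """Latency reduction per mode and metric, rounded to one decimal place."""
    return {
        mode: {
            metric: round(percent_reduction(*pair), 1)
            for metric, pair in metrics.items()
        }
        for mode, metrics in AVERAGES_NS.items()
    }


if __name__ == "__main__":
    for mode, metrics in reductions().items():
        print(mode, metrics)
```

By this measure, Local/SMT sees roughly a 10% drop in average cross-CCX latency and just under 10% in die-to-die latency, compared with about 5% and 9% for Creator mode.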

| Processor | Intra-Core Latency (ns) | Core-to-Core Latency (ns) | Core-to-Core Average Latency (ns) | Average Transfer Bandwidth (GB/s) |
|---|---|---|---|---|
| Core i9-7900X | 14.5 - 16 | 69.3 - 82.3 | 75.56 | 83.21 |
| Core i9-7900X @ 3200 MT/s | 16 - 16.1 | 76.8 - 91.3 | 83.93 | 87.31 |
| Core i7-6950X | 13.5 - 15.4 | 54.5 - 70.3 | 64.64 | 65.67 |
| Core i7-7700K | 14.7 - 14.9 | 36.8 - 45.1 | 42.63 | 35.84 |

We are in the midst of a broader set of tests to quantify how these modes impact memory latency and bandwidth, among other factors. Stay tuned.



  • I just looked at gaming benchmark and stopped reading there because as i thought Intel CPUs are killing Thread Ripper in gaming. As far as content creation, naturally having 16/32 setup will be faster than Intel 10/20 but again do you really need more than 10/20 cores. I don't and i heavily use PC for gaming, programming, web design, video/audio encoding. Overall Intel 7900x is better value and all around CPU. But if you are just in gaming 7700k is just enough.

    Thanks for review, and hello x299 platform.

    Gaming vs. Content Creation mode through Software is just another big NO NO to me knowing how crappy AMD software is. I assume the most people will keep it in Game Mode and leave it as it is.

    I appreciate that AMD brought this CPU for $999 with so many cores, helps competition but again there is nothing to drool over here in my book. AMD didn't bring any significant performance bump core vs. core basis. In fact AMD single core performance still sucks which means when Intel releases 10+ core CPU it is going to fun to watch.

    Two things i am interested the most is Coffee Lake product and IPC improvement there and possible price adjustment with Core i9.

    Reply
  • Quaddro
    Hold up breath..
    Reply
  • Quaddro
    Hold up breath more...
    Reply
  • Kai Dowin
    I'm truly impressed to see 16 Zen cores consuming as much power as only 10 Skylake-X ones. Bravo, AMD!
    Reply
  • 20045233 said:
    I'm truly impressed to see 16 Zen cores consuming as much power as only 10 Skylake-X ones. Bravo, AMD!

    I am not knowing that Intel is running higher frequency.

    Reply
  • JamesSneed
    20045197 said:
    I just looked at gaming benchmark and stopped reading there because as i thought Intel CPUs are killing Thread Ripper in gaming. As far as content creation, naturally having 16/32 setup will be faster than Intel 10/20 but again do you really need more than 10/20 cores. I don't and i heavily use PC for gaming, programming, web design, video/audio encoding. Overall Intel 7900x is better value and all around CPU. But if you are just in gaming 7700k is just enough.

    Thanks for review, and hello x299 platform.

    Gaming vs. Content Creation mode through Software is just another big NO NO to me knowing how crappy AMD software is.

    I love Intel even more...all you have to do pop CPU in and shit works and it works well.

    I guess if gaming is why you were reading the Threadripper review then you are right it isn't as good as Intel's offerings but did you honestly expect any other result? I don't know why reviewers even do gaming tests on any CPU over 8 cores as it is mostly pointless. If you are doing scientific, encoding, professional tasks in just about every use case that is multi threaded it is blowing away every Intel offering. Of course that may change once there are 12-18 core Intel parts. However spending $1000 for a CPU is a bargain for those than can use it and never in history could you get a 16 core consumer part with this type of multi-threaded performance.
    Reply
  • Lyden
    Thank you for this review. I was seriously considering Threadripper. Looks like the 7700k is still the sensible choice for the price when gaming.
    Reply
  • Kai Dowin
    @FREAK777POWER And delivering higher multi-threaded performance with these lower clocked cores. Do you know what that's called? Efficiency.
    Reply
  • redgarl
    This chip is designed for heavy calculation multithreading, it is not made for gaming, however it is working well with 1440p and 2160p.

    By the way, who in their mind will buy a 16 core CPU and play at 1080p with a 1080 TI... seriously, these 1080p bench are a joke and don't represent reality...

    "A standard or point of reference against which things may be compared." Oxford

    1080p with 1080 TI with a 16 core processor is not a point of reference at all.
    Reply
  • Pompompaihn
    Who are you people that come here and <ModEdit> about gaming performance on these chips??

    Threadripper is the F250 of CPUs. It's not the fastest, but it's plenty fast for 99% of your tasks, and if you need to haul a 12,000 pound trailer it'll do that, too. This is for people who do a lot of WORK on their machine but also game on the side.

    <Moderator Warning: Watch your language in these forums>
    Reply