

Game Modes & Architecture, Infinity Fabric Latency Testing

We've covered AMD's Zen architecture in depth, and also covered the Infinity Fabric at length. Head over to those articles for more coverage.

The Zeppelin Die Primer

Threadripper's massive package hides much complexity underneath, but we'll do our best to simplify and outline how it relates to AMD's innovative Creator and Game Mode features.

The Zen architecture employs a four-core CCX (CPU Complex) building block. AMD adorns each CCX with 8MB of L3 cache split into four slices; each core in the CCX accesses all L3 slices with the same average latency. Two CCXes come together to create an eight-core Ryzen 7 die (the large orange blocks in the second image below), and they communicate via AMD’s Infinity Fabric interconnect. The CCXes share the same dual-channel memory controller. This is basically two quad-core CPUs talking to each other over the Infinity Fabric pathway that also handles northbridge and PCIe traffic.

Building The Threadripper

The graphic below represents AMD's EPYC data center processor, which shares Threadripper's basic design. We can see four separate Zeppelin dies connected via the Infinity Fabric, and the two CCXes inside each die. This creates a 32-core Multi-Chip Module (MCM). Of course, Threadripper is "only" a 16-core processor. To create this configuration, AMD substitutes in two "dummy dies," which are non-functional fillers that ensure the heat spreader's structural integrity and consistent mating with the socket's pins. Without these dummy dies, the IHS would either cave in when you tighten your cooling solution, or the chip would warp and not make full contact with the pins. AMD notes that Threadripper's functional dies are always placed diagonally from each other, which makes sense considering the fabric's design.
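The core counts above follow directly from the building blocks. Here is a minimal sketch of that arithmetic in Python; the constants come from the text (four cores and 8MB of L3 per CCX, two CCXes per die, two threads per core), while the function name `package_totals` is ours and purely illustrative.

```python
# Building-block figures taken from the article text.
CORES_PER_CCX = 4      # four-core CCX
CCX_PER_DIE = 2        # two CCXes per Zeppelin die
L3_MB_PER_CCX = 8      # 8MB of L3 per CCX
THREADS_PER_CORE = 2   # SMT: 16 cores expose 32 threads


def package_totals(active_dies: int) -> dict:
    """Return core, thread, and L3 totals for a package with `active_dies` working dies."""
    ccx_count = active_dies * CCX_PER_DIE
    cores = ccx_count * CORES_PER_CCX
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "l3_mb": ccx_count * L3_MB_PER_CCX,
    }


ryzen7 = package_totals(active_dies=1)        # single Zeppelin die
threadripper = package_totals(active_dies=2)  # two working dies plus two dummy dies
epyc = package_totals(active_dies=4)          # four working dies

for name, totals in (("Ryzen 7", ryzen7), ("Threadripper", threadripper), ("EPYC", epyc)):
    print(f"{name}: {totals}")
```

The dummy dies never enter the calculation because they contribute no cores, cache, or memory controllers.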


Remember, each Zeppelin die has its own memory and PCIe controllers. That means that if a workload executing on a die needs to access data resident in the memory of the other die (remote memory), it has to traverse a much larger gap. This introduces a level of latency we haven't seen from previous Ryzen models, and its effect on gaming performance is profound. The impact isn't as severe with most professional workloads, but some do suffer.

The New Toggles

To defray the impact of remote memory access, AMD introduces a new memory access mode that you can toggle either in the BIOS or with the Ryzen Master software. The Local and Distributed settings switch between NUMA (Non-Uniform Memory Access) and UMA (Uniform Memory Access), respectively.

UMA (Distributed) is pretty simple; it allows the dies to access all of the attached memory. NUMA mode (Local) attempts to keep all data for the process executing on a die confined to that die's directly attached memory controller. It establishes one NUMA node per die (visible in Task Manager). This reduces, and possibly even eliminates, data fetches from the remote memory connected to the other die, though the die can still access it if needed. NUMA has deep roots in the enterprise, but the technique works best if programs are designed specifically to utilize it. It's a rarity on the desktop, and almost no desktop applications are designed to support it fully, but even non-NUMA-aware applications can see performance advantages.
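To make the difference between the two settings concrete, here is a toy model in Python. It is not AMD's allocation logic and it is not measured data. It assumes two dies with equal memory, accesses spread uniformly over a working set, Distributed mode interleaving data evenly across both dies' controllers, and Local mode filling the local die's memory first and spilling to the other die only when it runs out. The function name and parameters are hypothetical.

```python
def remote_fraction(mode: str, working_set_gb: float, local_capacity_gb: float) -> float:
    """Estimate the share of one die's memory accesses served by the other die.

    Toy model only: assumes two dies and uniform access over the working set.
    """
    if working_set_gb <= 0:
        raise ValueError("working_set_gb must be positive")
    if mode == "distributed":
        # Assumption: data is interleaved evenly, so half of it is remote.
        return 0.5
    if mode == "local":
        # Assumption: local memory fills first; only the overflow is remote.
        spill_gb = max(0.0, working_set_gb - local_capacity_gb)
        return spill_gb / working_set_gb
    raise ValueError(f"unknown mode: {mode}")


# A working set that fits in the local die's memory never leaves the die in Local mode.
fits = remote_fraction("local", working_set_gb=8, local_capacity_gb=16)
# A larger working set spills over, matching "the die can still access it if needed".
spills = remote_fraction("local", working_set_gb=24, local_capacity_gb=16)
interleaved = remote_fraction("distributed", working_set_gb=8, local_capacity_gb=16)

print(fits, spills, interleaved)
```

The model ignores threads that the scheduler moves between dies, which is one reason real results vary by application.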


AMD's Threadripper introduces more cores to the desktop than we've ever seen, and some programs are caught ill-prepared. In fact, a few games, like Far Cry Primal and the DiRT series, won't even run when the full complement of Threadripper's threads is brought to bear. That's obviously a problem, so AMD created a Legacy Compatibility mode that executes a "bcdedit /set numproc XX" command in Windows to disable half of the processor's cores. Luckily, due to the operating system's core assignments, the command disables all of the cores/threads on the second die. That has the side benefit of eliminating thread-to-thread communication between the two dies, which serves as a great solution to the constant synchronization between threads during most gaming workloads.

Because the change is made in software, the "disabled" die still has power fed to it, so the system can still access the memory and PCIe controllers connected to the inactive die.
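As a rough illustration of what Legacy Compatibility mode asks Windows to do, the Python sketch below fills in the "XX" for a given logical processor count. Only the `bcdedit /set numproc` syntax comes from the text; the helper name `legacy_numproc_command` is hypothetical, and the sketch only builds the command string without running it.

```python
def legacy_numproc_command(logical_processors: int) -> str:
    """Build the bcdedit command that limits Windows to half the logical processors."""
    if logical_processors < 2 or logical_processors % 2:
        raise ValueError("expected an even logical processor count of at least 2")
    return f"bcdedit /set numproc {logical_processors // 2}"


# The 16-core Threadripper exposes 32 threads, so half of them remain.
command = legacy_numproc_command(32)
print(command)  # bcdedit /set numproc 16
```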

Game Mode And Creator Mode

So what do you do with all these knobs? There are four separate combinations that will impact each application or game differently, so you have to cycle through them to find the best possible combination for your workload. That's a godsend to tuners looking to squeeze out every last drop of performance, but an absolute nightmare for the other 99%.

AMD decided to simplify the process by specifying two combinations that work best for either games or standard applications. Creator mode, which is the stock configuration, exposes the full might of 32 threads. It should naturally provide excellent performance for most productivity applications.

Game mode cuts half the threads via compatibility mode and reduces memory and die-to-die latency with the Local memory mode. We're going to test both configurations with our gaming suite, and try another configuration that also offers the full complement of threads.
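The four combinations are simply the cross product of the two toggles. The Python sketch below enumerates them. Pairing Creator mode with the Distributed setting is our inference from the text (it is the stock configuration with all threads active), and "Local/SMT" is this article's label for the extra configuration we test; the remaining combination has no name here.

```python
from itertools import product

MEMORY_MODES = ("Distributed", "Local")
LEGACY_COMPATIBILITY = (False, True)
TOTAL_THREADS = 32  # Threadripper 1950X with all cores and SMT active

# Named presets; anything missing from this map is a manual tuning combination.
PRESET_NAMES = {
    ("Distributed", False): "Creator Mode",
    ("Local", True): "Game Mode",
    ("Local", False): "Local/SMT (article's test label)",
}


def describe(memory_mode: str, legacy: bool) -> tuple:
    """Return (preset name, active thread count) for one toggle combination."""
    threads = TOTAL_THREADS // 2 if legacy else TOTAL_THREADS
    name = PRESET_NAMES.get((memory_mode, legacy), "unnamed")
    return name, threads


combinations = {
    (memory_mode, legacy): describe(memory_mode, legacy)
    for memory_mode, legacy in product(MEMORY_MODES, LEGACY_COMPATIBILITY)
}

for (memory_mode, legacy), (name, threads) in combinations.items():
    print(f"{memory_mode:11} legacy={legacy!s:5} -> {name}, {threads} threads")
```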

Infinity Fabric Latency Testing

Die-to-die communication adds another layer of latency to Ryzen’s complicated architecture. As the table below shows, those same latency metrics don’t apply to the earlier Ryzen models. The extra hops present challenges to some applications, such as those with synchronized threads or frequent fetches from remote memory, but have less impact on others.


Processor	Intra-Core Latency	Intra-CCX Core-to-Core Latency	Cross-CCX Core-to-Core Latency	Cross-CCX Average Latency	Die-to-Die Latency	Die-to-Die Average Latency	Average Transfer Bandwidth
TR 1950X Creator Mode DDR4-2666	13.7 - 14.1ns	39.4 - 43.2ns	157.6 - 171.3ns	168ns	180.6 - 256.7ns	238.47ns	90.26 GB/s
TR 1950X Creator Mode DDR4-3200	13.8 - 14.9ns	39.2 - 45.4ns	144.9 - 167.2ns	160.1ns	213.1 - 227.8ns	216.9ns	91.67 GB/s
TR 1950X Game Mode DDR4-2666	13.9 - 14.2ns	39.5 - 42.3ns	149.2 - 164.1ns	159.66ns	X	X	46.58 GB/s
TR 1950X Game Mode DDR4-3200	14.3 - 14.9ns	41.2 - 46.2ns	123 - 150.6ns	145.44ns	X	X	45.52 GB/s
TR 1950X Local/SMT DDR4-2666	13.9 - 14.4ns	39.6 - 43.1ns	168.7 - 175.4ns	171.48ns	232.4 - 240.8ns	235.38ns	92.7 GB/s
TR 1950X Local/SMT DDR4-3200	13.9 - 14.4ns	39.9 - 44.5ns	146.7 - 159.4ns	153.89ns	209.3 - 220.9ns	212.53ns	91 GB/s
Ryzen 7 1800X	14.8ns	40.5 - 82.8ns	120.9 - 126.2ns	122.96ns	X	X	48.1 GB/s
Ryzen 5 1600X	14.7 - 14.8ns	40.6 - 82.8ns	121.5 - 128.2ns	123.48ns	X	X	43.88 GB/s

The intra-core latency measurements represent communication between two logical threads resident on the same physical core, and they're unaffected by memory speed. Intra-CCX measurements quantify latency between threads that are on the same CCX but not resident on the same core. In the past, we observed slight performance variances, but intra-CCX latency is also largely unaffected by memory speed. However, we've seen a large decrease in cross-CCX latency, which denotes latency between threads located on two separate CCXes, by increasing the memory data transfer rate from DDR4-1333 to DDR4-3200 on Ryzen 5 and 7 models.
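The four latency categories in the table correspond to how far apart two logical processors sit in the topology. The Python sketch below classifies a pair under a hypothetical numbering scheme in which SMT siblings are adjacent (logical CPUs 0 and 1 share a core) and cores are numbered CCX by CCX, die by die. Real operating systems may enumerate processors differently, so treat the mapping as an assumption, not as a description of our test tool.

```python
THREADS_PER_CORE = 2
CORES_PER_CCX = 4
CORES_PER_DIE = 8  # two CCXes per die


def classify_pair(cpu_a: int, cpu_b: int) -> str:
    """Name the latency category for two logical CPUs under the assumed numbering."""
    if cpu_a == cpu_b:
        raise ValueError("need two different logical CPUs")
    core_a, core_b = cpu_a // THREADS_PER_CORE, cpu_b // THREADS_PER_CORE
    if core_a == core_b:
        return "intra-core"   # two threads on the same physical core
    if core_a // CORES_PER_CCX == core_b // CORES_PER_CCX:
        return "intra-CCX"    # different cores, same CCX
    if core_a // CORES_PER_DIE == core_b // CORES_PER_DIE:
        return "cross-CCX"    # different CCXes, same die
    return "die-to-die"       # different dies


examples = {pair: classify_pair(*pair) for pair in [(0, 1), (0, 2), (0, 8), (0, 16)]}
for pair, category in examples.items():
    print(pair, category)
```

Under this numbering, Game mode's 16 remaining logical processors (0 through 15) never produce a die-to-die pair, which is why those cells read "X" in the table.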

The same general trend continues with Threadripper. As we can see, toggling game mode removes the die-to-die latency for threads by effectively disabling one die, but it also reduces host processing resources. It’s an interesting feature that will benefit some workloads, but hamstring others.

We also notice that the Local/SMT combination, which consists of the local setting and leaves all cores active (legacy off), offers the best overall latency improvement via memory overclocking. We also recorded higher Cross-CCX latency with the Threadripper processors.
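The memory-speed comparison can be checked with a little arithmetic on the average latencies in the Threadripper table above. The Python sketch below computes the percentage reduction when moving from DDR4-2666 to DDR4-3200 for each mode; the values are copied from the table, and the helper name `percent_reduction` is ours.

```python
# Average latencies in ns from the table: (DDR4-2666, DDR4-3200).
CROSS_CCX_AVG_NS = {
    "Creator Mode": (168.0, 160.1),
    "Game Mode": (159.66, 145.44),
    "Local/SMT": (171.48, 153.89),
}
# Game Mode has no die-to-die figures (the "X" cells), so it is omitted here.
DIE_TO_DIE_AVG_NS = {
    "Creator Mode": (238.47, 216.9),
    "Local/SMT": (235.38, 212.53),
}


def percent_reduction(slow_ns: float, fast_ns: float) -> float:
    """Percentage by which latency drops going from slow_ns to fast_ns."""
    return (slow_ns - fast_ns) / slow_ns * 100.0


cross_ccx_gain = {mode: percent_reduction(*ns) for mode, ns in CROSS_CCX_AVG_NS.items()}
die_to_die_gain = {mode: percent_reduction(*ns) for mode, ns in DIE_TO_DIE_AVG_NS.items()}

for mode, gain in cross_ccx_gain.items():
    print(f"cross-CCX  {mode}: {gain:.1f}% lower")
for mode, gain in die_to_die_gain.items():
    print(f"die-to-die {mode}: {gain:.1f}% lower")
```

By this measure, Local/SMT shows the largest reduction in both columns: roughly 10% for cross-CCX latency, compared with about 5% in Creator mode and about 9% in Game mode.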


Processor	Intra-Core Latency	Core-To-Core Latency	Core-To-Core Average Latency	Average Transfer Bandwidth
Core i9-7900X	14.5 - 16ns	69.3 - 82.3ns	75.56ns	83.21 GB/s
Core i9-7900X @ 3200 MT/s	16 - 16.1ns	76.8 - 91.3ns	83.93ns	87.31 GB/s
Core i7-6950X	13.5 - 15.4ns	54.5 - 70.3ns	64.64ns	65.67 GB/s
Core i7-7700K	14.7 - 14.9ns	36.8 - 45.1ns	42.63ns	35.84 GB/s

We are in the midst of a broader set of tests to quantify how these modes impact memory latency and bandwidth, among other factors. Stay tuned.


Paul Alcorn is the Editor-in-Chief for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

Comments from the forums

I just looked at gaming benchmark and stopped reading there because as i thought Intel CPUs are killing Thread Ripper in gaming. As far as content creation, naturally having 16/32 setup will be faster than Intel 10/20 but again do you really need more than 10/20 cores. I don't and i heavily use PC for gaming, programming, web design, video/audio encoding. Overall Intel 7900x is better value and all around CPU. But if you are just in gaming 7700k is just enough.

Thanks for review, and hello x299 platform.

Gaming vs. Content Creation mode through Software is just another big NO NO to me knowing how crappy AMD software is. I assume the most people will keep it in Game Mode and leave it as it is.

I appreciate that AMD brought this CPU for $999 with so many cores, helps competition but again there is nothing to drool over here in my book. AMD didn't bring any significant performance bump core vs. core basis. In fact AMD single core performance still sucks which means when Intel releases 10+ core CPU it is going to fun to watch.

Two things i am interested the most is Coffee Lake product and IPC improvement there and possible price adjustment with Core i9.

Quaddro

Hold up breath..
Quaddro

Hold up breath more...
Kai Dowin

I'm truly impressed to see 16 Zen cores consuming as much power as only 10 Skylake-X ones. Bravo, AMD!
Reply
20045233 said:
I'm truly impressed to see 16 Zen cores consuming as much power as only 10 Skylake-X ones. Bravo, AMD!

I am not knowing that Intel is running higher frequency.

JamesSneed

20045197 said:
I just looked at gaming benchmark and stopped reading there because as i thought Intel CPUs are killing Thread Ripper in gaming. As far as content creation, naturally having 16/32 setup will be faster than Intel 10/20 but again do you really need more than 10/20 cores. I don't and i heavily use PC for gaming, programming, web design, video/audio encoding. Overall Intel 7900x is better value and all around CPU. But if you are just in gaming 7700k is just enough.

Thanks for review, and hello x299 platform.

Gaming vs. Content Creation mode through Software is just another big NO NO to me knowing how crappy AMD software is.

I love Intel even more...all you have to do pop CPU in and shit works and it works well.

I guess if gaming is why you were reading the Threadripper review then you are right it isn't as good as Intel's offerings but did you honestly expect any other result? I don't know why reviewers even do gaming tests on any CPU over 8 cores as it is mostly pointless. If you are doing scientific, encoding, professional tasks in just about every use case that is multi threaded it is blowing away every Intel offering. Of course that may change once there are 12-18 core Intel parts. However spending $1000 for a CPU is a bargain for those than can use it and never in history could you get a 16 core consumer part with this type of multi-threaded performance.

Lyden

Thank you for this review. I was seriously considering Threadripper. Looks like the 7700k is still the sensible choice for the price when gaming.
Kai Dowin

@FREAK777POWER And delivering higher multi-threaded performance with these lower clocked cores. Do you know what that's called? Efficiency.
redgarl

This chip is designed for heavy calculation multithreading, it is not made for gaming, however it is working well with 1440p and 2160p.

By the way, who in their mind will buy a 16 core CPU and play at 1080p with a 1080 TI... seriously, these 1080p bench are a joke and don't represent reality...

"A standard or point of reference against which things may be compared." Oxford

1080p with 1080 TI with a 16 core processor is not a point of reference at all.
Pompompaihn

Who are you people that come here and <ModEdit> about gaming performance on these chips??

Threadripper is the F250 of CPUs. It's not the fastest, but it's plenty fast for 99% of your tasks, and if you need to haul a 12,000 pound trailer it'll do that, too. This is for people who do a lot of WORK on their machine but also game on the side.

<Moderator Warning: Watch your language in these forums>