AMD Ryzen Threadripper 1950X Game Mode, Benchmarked

Testing Ryzen's Infinity Fabric & Memory Subsystem

Infinity Fabric Latency And Bandwidth

The 256-bit Infinity Fabric crossbar ties together the resources inside a Zeppelin die. Tacking on a second Zeppelin die to create Threadripper introduces another layer of the fabric, though. Cache accesses remain local to each CCX, but a large amount of memory, I/O, and thread-to-thread traffic still flows across that second layer.

It didn't take long for enthusiasts to figure out that AMD's Infinity Fabric is tied into the same frequency domain as the memory controller, so a memory overclock reduces latency and increases bandwidth through the crossbar. Performance in latency-sensitive applications (like games) consequently improves.
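For reference, first-generation Zen runs the fabric clock in lockstep with the memory clock, which is half the DDR4 transfer rate. Stepping from DDR4-2666 to DDR4-3200 therefore takes the fabric from 1333 MHz to 1600 MHz, a 20% increase in crossbar throughput along with a corresponding cut in hop latency.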

SiSoftware Sandra's Processor Multi-Core Efficiency test helps us illustrate the Infinity Fabric's performance. We use the Multi-Threaded metric with the "best pair match" setting (lowest latency). The utility measures ping times between threads to quantify fabric latency in every possible configuration.
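To make the methodology concrete, here is a minimal thread-to-thread ping-pong sketch of the same idea (our own illustration, not Sandra's code): two threads bounce a value through a shared atomic, and the average round-trip time approximates communication latency between the cores they're pinned to. Pinning the pair to two SMT siblings, two cores on one CCX, two CCXes, or two dies reproduces the buckets discussed below.

```cpp
// Build: g++ -O2 -pthread ping.cpp (Linux; core numbers are machine-specific)
#include <atomic>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <pthread.h>

static std::atomic<int> flag{0};
constexpr int kIters = 1'000'000;

void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

void pong(int core) {
    pin_to_core(core);
    for (int i = 0; i < kIters; ++i) {
        while (flag.load(std::memory_order_acquire) != 2 * i + 1) {}  // await ping
        flag.store(2 * i + 2, std::memory_order_release);             // reply
    }
}

int main(int argc, char** argv) {
    int core_a = argc > 1 ? atoi(argv[1]) : 0;  // e.g. 0 and 1: same CCX;
    int core_b = argc > 2 ? atoi(argv[2]) : 1;  // try a high index for cross-die
    std::thread t(pong, core_b);
    pin_to_core(core_a);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
        flag.store(2 * i + 1, std::memory_order_release);             // ping
        while (flag.load(std::memory_order_acquire) != 2 * i + 2) {}  // await reply
    }
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - start).count();
    t.join();
    printf("average round trip: %.1f ns\n", double(ns) / kIters);
}
```

Each iteration is exactly two cross-core cache-line transfers, so one round trip; halving the result gives a rough one-way "ping" figure comparable to the charts here.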

The intra-core latency measurements represent communication between two logical threads resident on the same physical core, and as we can see, disabling SMT eliminates that measurement entirely. For the remaining setups, tuning reduces latency by a few nanoseconds, though that's attributable to higher clock rates; as we've seen in the past, increased memory frequencies have little effect on intra-core latency.

Intra-CCX measurements quantify latency between threads on the same CCX that are not resident on the same core. Here, increasing the clock rate yields a larger reduction of roughly 6ns.

Cross-CCX quantifies the latency between threads located on two separate CCXes, and we see a similar reduction thanks to overclocking. Notably, the Ryzen 7 1800X features much lower Cross-CCX latency than the stock Threadripper and most overclocked configurations. This is likely due to some form of provisioning, possibly in the scheduling algorithms, for Threadripper's extra layer of fabric.

As we can see, the overclocked Threadripper CPU in Game mode, which doesn't have an active fabric link to the other die, has the lowest Cross-CCX latency.

Die-To-Die measures communication between the two separate Zeppelin dies. Game mode effectively disables the second Zeppelin die at an operating system level, eliminating die-to-die latency entirely. The second die's uncore remains active, though, which is necessary to keep its I/O and memory controllers accessible.
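A quick way to check what the OS actually sees after switching modes, at least under Linux, is to walk the NUMA topology with libnuma (a sketch of our own, not anything AMD ships; link with -lnuma). A die whose cores are parked but whose uncore remains active would be expected to show up as a node reporting memory but zero CPUs:

```cpp
// Build: g++ -O2 numa_check.cpp -lnuma (Linux-only sanity check)
#include <numa.h>
#include <cstdio>

int main() {
    if (numa_available() < 0) {
        printf("no NUMA topology exposed (UMA/Distributed mode?)\n");
        return 0;
    }
    int nodes = numa_num_configured_nodes();
    for (int n = 0; n < nodes; ++n) {
        long long free_b = 0;
        long long size_b = numa_node_size64(n, &free_b);  // node's memory
        struct bitmask* cpus = numa_allocate_cpumask();
        numa_node_to_cpus(n, cpus);                       // node's CPUs
        int ncpus = 0;
        for (unsigned i = 0; i < cpus->size; ++i)
            if (numa_bitmask_isbitset(cpus, i)) ++ncpus;
        printf("node %d: %lld MB memory, %d CPUs\n", n, size_b >> 20, ncpus);
        numa_free_cpumask(cpus);
    }
}
```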

Creator mode suffers the worst die-to-die latency, but tuning reduces it considerably. The two SMT options (on and off) receive large reductions from our overclocking efforts as well.

The utility measures fabric bandwidth too, which is critical for performance since data fetched from remote memory also flows across the fabric. As such, AMD over-provisions the fabric and memory subsystem to optimize the distributed memory architecture.

Both the Creator mode and Local/SMT configurations offer the best fabric bandwidth, enjoying big boosts from overclocking. The Ryzen 7 1800X falls into the middle of the chart alongside Threadripper's Game mode, which is logical considering they are both effectively 8C/16T processors. Disabling SMT but leaving both dies active (Local/SMT off) yields a unique profile that provides higher performance with larger accesses and lower performance with smaller accesses.

Cache And Memory Latency

We tested with DDR4-2666 memory at stock settings and increased to DDR4-3200 for our overclocked configurations.

The Translation Lookaside Buffer (TLB) is a cache that reduces access times by storing recently used virtual-to-physical address translations. Like all caches, the TLB has a limited capacity, so address requests that land in it are "hits," while requests that fall outside of it are "misses." Of course, hits are more desirable, and solid prefetcher performance yields higher hit rates.

Sequential access patterns are almost entirely prefetched into the TLB, so the sequential test is a good measure of prefetcher performance. The in-page random test measures random accesses within the same memory page. It also measures TLB performance and represents best-case random performance (this is the measurement vendors use for official spec sheets). The full random test features a mix of TLB hits and misses, with a strong likelihood of misses, so it quantifies worst-case latency.
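As an illustration of how such latency tests work under the hood (a minimal sketch of the general technique, not the utility's actual code), a pointer chase stores in each element the index of the next element to visit, so every load depends on the one before it and the average time per hop approximates load-to-use latency. A sequential chain lets the prefetchers shine, while shuffling the chain across a buffer far larger than the caches forces cache and TLB misses, approximating the full random case:

```cpp
// Build: g++ -O2 chase.cpp (a generic pointer-chase sketch)
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// Walk the chain for 'hops' dependent loads; returns average ns per load.
double chase_ns(const std::vector<size_t>& next, size_t hops) {
    size_t i = 0;
    auto start = std::chrono::steady_clock::now();
    for (size_t h = 0; h < hops; ++h) i = next[i];  // each load depends on the last
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - start).count();
    volatile size_t sink = i; (void)sink;           // keep the chain live
    return double(ns) / double(hops);
}

int main() {
    const size_t n = (256u << 20) / sizeof(size_t);   // 256MB: well past L3
    std::vector<size_t> next(n);

    // Sequential ring: next[i] = i + 1 (wrapping), a prefetcher's best case.
    std::iota(next.begin(), next.end(), size_t{0});
    std::rotate(next.begin(), next.begin() + 1, next.end());
    printf("sequential:  %.1f ns/hop\n", chase_ns(next, 10'000'000));

    // Full random: a single shuffled cycle through every element, so most
    // hops miss the caches and many also miss the TLB.
    std::vector<size_t> order(n);
    std::iota(order.begin(), order.end(), size_t{0});
    std::shuffle(order.begin(), order.end(), std::mt19937_64{42});
    for (size_t k = 0; k < n; ++k) next[order[k]] = order[(k + 1) % n];
    printf("full random: %.1f ns/hop\n", chase_ns(next, 10'000'000));
}
```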

Regardless of the memory access pattern, the smallest data chunks fit into the L1 cache. And as the size of the data increases, it populates the larger caches.


              L1            L2             L3            Main Memory
Range         2KB - 32KB    32KB - 512KB   512KB - 8MB   8MB - 1GB

Threadripper 1950X features better L2 and L3 latency than the Ryzen 7 1800X with every type of access pattern. Also, we spot notable latency reductions via overclocking for Threadripper's L1, L2, and L3 caches.

That changes as the workload flows out to main memory. Threadripper's Creator mode (the default setting) has the highest latency with every access pattern. This is a direct result of memory accesses landing in the remote memory. Our in-page measurements mirror AMD's 86.9ns specification, but worst-case full random access exceeds 120ns. Overclocking the processor and memory lowers latency, but Creator mode still doesn't overtake any of the configurations we compare it to. 

Switching into NUMA mode with the Local setting improves main memory access dramatically for the other configurations. We measure ~60ns for in-page near memory access, again in line with AMD's specifications, while worst-case latency weighs in at 100ns.
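The near/far split is straightforward to reproduce on Linux with libnuma (again a sketch of our own, not the benchmark used here; link with -lnuma and run in the Local/NUMA memory mode): pin execution to one node, then pointer-chase through a buffer bound to the local node and again through one bound to the remote node. The delta is the extra die-to-die fabric hop.

```cpp
// Build: g++ -O2 numa_latency.cpp -lnuma (Linux, NUMA/Local memory mode;
// node numbers assume a two-node Threadripper-style topology)
#include <numa.h>
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// Build a shuffled pointer-chase cycle in 'buf'; return average ns per load.
double chase_ns(size_t* buf, size_t n, size_t hops) {
    std::vector<size_t> order(n);
    std::iota(order.begin(), order.end(), size_t{0});
    std::shuffle(order.begin(), order.end(), std::mt19937_64{42});
    for (size_t k = 0; k < n; ++k) buf[order[k]] = order[(k + 1) % n];
    size_t i = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (size_t h = 0; h < hops; ++h) i = buf[i];   // dependent loads
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - t0).count();
    volatile size_t sink = i; (void)sink;           // keep the chain live
    return double(ns) / double(hops);
}

int main() {
    if (numa_available() < 0 || numa_max_node() < 1) {
        printf("need at least two NUMA nodes (Local mode)\n");
        return 1;
    }
    numa_run_on_node(0);                              // execute on die 0 only
    const size_t bytes = 256u << 20;                  // 256MB: past the caches
    const size_t n = bytes / sizeof(size_t);
    for (int node = 0; node <= 1; ++node) {           // 0 = near, 1 = far
        size_t* buf = static_cast<size_t*>(numa_alloc_onnode(bytes, node));
        printf("memory on node %d: %.1f ns\n", node, chase_ns(buf, n, 10'000'000));
        numa_free(buf, bytes);
    }
}
```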

Cache Bandwidth

Each CCX has its own caches, so a Threadripper CPU features four distinct clusters of L1, L2, and L3 cache. Our bandwidth benchmark illustrates the aggregate performance of these tiers.
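As a sketch of the general approach (our own illustration, not the benchmark used in the article), aggregate cache bandwidth can be approximated by having every thread repeatedly sum a private buffer sized to fit the tier under test, 16KB per thread for Zen's 32KB L1D, for instance, and dividing total bytes read by elapsed time:

```cpp
// Build: g++ -O2 -pthread cachebw.cpp (generic cache-bandwidth sketch)
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const unsigned nthreads = std::thread::hardware_concurrency();
    const size_t buf_bytes = 16u << 10;          // 16KB: fits Zen's 32KB L1D
    const size_t passes = 200'000;
    std::vector<std::thread> pool;
    auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < nthreads; ++t) {
        pool.emplace_back([=] {
            std::vector<uint64_t> buf(buf_bytes / 8, 1);
            volatile uint64_t sink = 0;
            for (size_t p = 0; p < passes; ++p) {
                buf[p % buf.size()] ^= 1;         // touch: defeat loop hoisting
                uint64_t sum = 0;
                for (uint64_t v : buf) sum += v;  // streaming reads
                sink += sum;                      // keep the work live
            }
        });
    }
    for (auto& th : pool) th.join();
    double secs = std::chrono::duration<double>(
                      std::chrono::steady_clock::now() - start).count();
    double gbs = double(nthreads) * passes * buf_bytes / secs / 1e9;
    printf("%u threads: %.1f GB/s aggregate\n", nthreads, gbs);
}
```

Growing buf_bytes to land in the L2, L3, or DRAM range sweeps the tiers the same way the charts here do; in Game mode, half the threads (and thus half the private buffers and caches) simply disappear.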

During the single-threaded test, Ryzen 7 1800X demonstrates lower throughput than the Threadripper processors. The other configurations clump together in familiar stock and overclocked groups.

The multi-threaded tests are more interesting; we see Ryzen 7 1800X and the two Threadripper Game modes fall to the bottom of the chart. Because Game mode disables the cores on one die, it effectively takes the corresponding cache out of commission.


Comments
  • beshonk
    "extra cores could enable more performance in the future as software evolves to utilize them better"

    I can't believe we're still saying this in 2017. Developers suck at their job.
  • sztepa82
    "AMD aims Threadripper at content creators, heavy multitaskers, and gamers who stream simultaneously. It also says the processors are ideal for gaming at high resolutions. Ryzen Threadripper 1950X isn't intended for playing around at low resolutions, particularly in older, lightly-threaded titles. ____Still, we tested at 1920x1080____ ...."

    Thank you for being out there for us, Tom's, no other website has ever done that. The only other thing we can hope for is that you'll also do a 2S Epyc 7601 review playing Project Cars in 320x240.
  • shrapnel_indie
    Quote:
    Each change requires a reboot, chewing up precious time as you save open projects, halt conversations, and try to remember which web browser tabs to relaunch.


    Not if you're running the right browser with the right options active. Firefox can remember the last tabs you had open and reopen them upon startup... of course, this only applies to the last Firefox window closed, and you have to exit properly (no killing the thread(s)).
  • soulg1969
    When I go to pause the video (ad) your site takes me to another tab. Bye, bye.
  • Yuka
    Since this CPU (and Intel's X and XE lines) is aimed at big spenders, when are you guys going to test multi-GPU on these CPUs?

    Also, you mentioned streaming as part of the big CPU charisma, but there was no actual test with it. Why not just run OBS with the same software encoding settings for each platform and run a game? It's not that hard to do, is it?

    Cheers!
  • Dyseman
    Quote- 'When I go to pause the video (ad) your site takes me to another tab. Bye, bye.'

    It's easy enough to disable the JW Player with uBlock. Those videos are not considered ads by adblockers, but you can tell it to block anything that uses JW Player, then whitelist any other site that needs to use it for non-ads.
  • rhysiam
    Thanks for this investigation Toms, really thorough and interesting article.

    It's interesting and a little disappointing that an OC to 3.9GHz seems to pretty consistently achieve a small but measurable bump in gaming. The 1950X can use XFR to get to 4.2GHz on lightly threaded workloads. Obviously in well-threaded games the CPU isn't going to be able to sustain 4.2GHz, but it's a bit disappointing it can't manage 3.9-4GHz across the 4-6 cores used in gaming workloads. In fact, judging from the results it seems to be sitting around 3.7-3.8GHz or so in most games. That seems low to me. There should be plenty of thermal and power headroom available to get 4-6 cores up to nice high clocks, which should be enough cores for pretty much every game in the suite (except perhaps AOTS). If that were happening we'd see the OC making no difference, or even perhaps causing a slight performance regression in games (like it does in synthetic single-threaded tests). But clearly that's not the case.

    It seems to me that AMD's power management implementation is resulting in some pretty conservative clock speeds in the 4-6 core workload range. That has implications outside of gaming as well, because 4-6 thread workloads are quite common even in the productivity and content creation space. It's hardly a deal breaker (we're only looking a couple of hundred mhz), but I'm curious whether others think AMD is giving up a little more performance than they should be here? Or am I missing something?
  • jdwii
    1287211 said:
    Thanks for this investigation Toms, really thorough and interesting article. It's interesting and a little disappointing that an OC to 3.9GHz seems to pretty consistently achieve a small but measurable bump in gaming. [...] I'm curious whether others think AMD is giving up a little more performance than they should be here? Or am I missing something?


    Ryzen hits the point of diminishing returns pretty darn fast. For example, a CPU might only need 1.15V to run 3.6GHz stable, but 3.9GHz needs something like 1.3V, which is way too much.
  • papality
    506869 said:
    "extra cores could enable more performance in the future as software evolves to utilize them better" I can't believe we're still saying this in 2017. Developers suck at their job.


    Intel's billions had a lot to say in this.
  • tacobravo
    You need to label your graphs
  • facts.seeker.2020
    The only processor in AMD's Threadripper family worth buying is the $1000 16-core. It's good to finally see AMD make a high-end part that outperforms Intel's processors, but the 12-core and 8-core Threadrippers don't seem worthy. If you're going to pay $400 for a motherboard and $250 for memory, and have the money to pay $800 for a 12-core processor, you might as well pay an extra $200 to get 16 cores. The 8-core processor is kind of a joke; why would someone pay AMD $200 over a Ryzen 7 to get PCIe lanes and quad channel when Intel offers them with no extra fee? The 6800K is priced at $300 and the 6850K at $410 on Amazon. AMD made a good $1000 processor and competes well against Intel's mainstream and entry-level processors, but somehow failed to fill the $350-900 price range.

    AMD's $1000 16-core processor is worth considering if you truly need it for professional, heavily multithreaded programs, but at gaming Intel's high-end processors are still the fastest, and even cheaper. We all know games aren't optimized for Intel's new Skylake-X mesh architecture or AMD's Ryzen processors, and games still perform better on Intel's Broadwell-E/Haswell-E processors, therefore an overclocked Broadwell-E/Haswell-E processor is the best choice for a high-end gaming machine. Broadwell-E processors became very tempting after the price cut: 28 to 40 PCIe lanes, quad-channel memory, 6 cores that overclock better than AMD's processors, more efficient compared to Threadripper and Intel's Skylake-X, game optimization, and motherboards cheaper than X399 and X299, within a $230 price range, which is kind of affordable. Honestly I've never been interested in spending over $400 on a processor for a gaming machine; you're better off spending the extra cash on an M.2 drive or a second graphics card, though that might change when Intel releases an 8-core processor on Z390.
  • mitch074
    2554400 said:
    [...]the 12 core or the 8 core thread ripper don't seem worthy,[...]ryzen 7 to get PCI-E lanes and quad channel when intel offering them with no extra fee[...]at gaming intel high end processors still the fastest and even cheaper,[...]more efficient compared to thread ripper and intel's skylake x

    Hold on and catch your breath.
    AMD's 12- and 8-core parts are for those who need a lot of PCIe lanes but don't want to pay full price for the 16-core version; for much cheaper, a B350-based board with a Ryzen 7 will do the deal very well, in a price bracket where Intel is only now announcing 6-core processors.
    If you only play current games, then yes, Intel CPUs still lead - albeit by much less than right after Ryzen came out, as game developers now cater to Zen, and Intel's noticeable lead is nowadays in the 1-2% range (if that).
    As for more efficient, I guess you mean instructions per clock; careful though, Ryzen pretty much matches Haswell there. Power-wise, if your application uses the CPU's full capabilities, AMD's 8-core design destroys anything Intel offers as of now in a similar price bracket ($300-$350), except in AES-heavy computing.
  • killerchickens
    It would be nice to see some multitasking benchmarks.
  • ramkrishna2910
    "Some games simply won't load up when presented with Threadripper's 32 threads. That's right, AMD's flagship broke a few titles. The same thing will happen to Intel when its highest-end Skylake-X chips surface shortly. "
    Can you please indicate which games crash because of the higher core counts?
  • Thom457
    Every time I see another gaming benchmark with these high-dollar, high-core-count CPUs, I wonder what the relevance of this kind of testing is. Since FPS is the only real measure of "game" performance ever shown (or possible, I guess), it is obvious from such testing that no game actually takes much advantage of all these cores. It's obvious because the FPS shown simply doesn't scale with the increase in cores from the lower-end to the higher-end models of the same CPU base (4, 6, 8, 12, 16 etc. cores). Cutting the 1950X in half in Game mode makes that point rather well. If the actual CPU utilization levels of these benchmarks were shown alongside the FPS ratings, it would be easy to see where the real problem lies.

    It is no secret that in such a competitive market (games) the code base has been optimized for Intel hardware for a very long time. The bias this brings is not unknown to anyone with a background in coding to get what the CPU can deliver for the buck. Every time a 7700K is thrown in the mix, it demonstrates that games are coded for a very limited core count and are not well threaded, because the market for "games" is simply not supportive of much beyond 2 cores most of the time. If you code for a 7700K quad as your baseline for reasonable performance, you exclude the overwhelming bulk of CPU models sold, which happen to be Intel models too. If the 7700K already gives you frame rates beyond what the human eye can appreciate, what value does adding more cores bring? If a high-end i5 will give you FPS rates past the point where you can detect the difference, why buy the high-end i7 to gain little to no difference in game FPS?

    For a metric to have broad value it must be pegged against something of value. Which has more value to a job holder, 150,000,000 40 hour jobs or 300,000,000 20 hour jobs paying the same hourly rate? Same for games using FPS as a metric. When does increasing FPS become moot at a given res and color depth? A game that needs anything beyond a nominal 4 Core I5 today has a tiny market to work with. Who needs a 6, 8, 10, 12, 14, 16, 18, 20, 22 Core CPU from Intel or AMD to play commercial FPS games today? No one.

    Not to make too fine a point here, but Tom's may not like synthetic 3D benchmarks, and yet what these popular gaming benchmarks really show is how inefficient the coding is in these 3D-accelerated games. The cost of the game itself is almost nominal when you consider what it takes, in CPU and graphics card cost, to lift FPS a meaningful amount, and then there's the question of whether you can actually see the difference.

    The articles are interesting. They provide some value but on balance the thrust of the testing doesn't serve the interests of these high Core Count CPU models much if your primary reason for being interested in this is playing a very select list of "games". I spend most days at work trying to raise CPU utilization in a large multi-Core highly threaded environment running a mix of processes that don't play nice together and the Law of Diminishing returns sets in as soon as you add one more Core to the thread count. Four Cores don't cut your run times 75% over a single Core. Eight Cores doesn't cut your Four Core time in half. My 4790K is overkill for everything I run on it but the Threadripper I'll build next year will do things it can't approach.

    Please, enough with demonstrating that no game needs any of these high-end multi-core CPU models. I'm with the person who said he wanted to see a two-socket Epyc 7601 (64 cores) running Project Cars at 320x240 (8-bit color), but PAC MAN instead would really make my day. My second choice would be a really exhaustive list of desktop applications that actually take advantage of what's there, versus 2-4 cores optimized for a single-purpose application running alone, with diminishing returns for the effort.
  • Olle P
    * Why not test some of the games that won't run with 32 logical cores?
    - "Game mode" better than SMT off for those?

    * Please present the actual number of threads actively used by each of the tested games. (Should be a standard feature for all reviews including games nowadays, I think.)
  • zippyzion
    I'm just going to say that now that the days of sub-15-second boot times are here, rebooting has become far less objectionable.

    Think what you might about it, but rebooting takes most of my systems a little over a minute. Back in the day a reboot would put you out of commission for 10 minutes in some cases. It just isn't a big deal anymore.
  • spikester
    "I can't believe we're still saying this in 2017. Developers suck at their job."

    Most, but not all. Radian is fully CPU and GPU parallel at everything. It will use all 16 cores and 32 hyperthreads. The free Viewer version is fully CPU parallel at http://manifold.net/viewer.shtml There are some nice speed demo videos of what you get with full parallelism in the Gallery.
  • Olle P
    663669 said:
    ... Back in the day a reboot would put you out of commission for 10 minutes in some cases. ...
    Just starting Windows NT 3.4 on a 386 took more like 25 minutes from "power on" to "ready for use". A reboot would add a couple of minutes for the initial shutting down on top of that...
  • kiniku
    Just using one of these for home use with mostly gaming is overkill, switching between the different modes is overkill, and slicing minute performance differences between them is too. Seriously. AMD's claims about who would need or use this are very overstated. What this really is, is a way for deep-pocketed gaming nerds to gain bragging rights. The cost/benefit delta of a higher-end Threadripper system is negligible for realistic uses, even for self-proclaimed "power users." I recall Tom's CPU articles that would say that for gaming an Intel 4-series CPU was high end. OK, let's go to a 7-series quad core to stay current. 16 cores? People who would effectively use that kind of CPU power don't have time for gaming.
  • neosematic
    Can you please start benchmarking highly CPU-intensive games like Arma 3, or other sims? My 3570K @ 4.5GHz doesn't hold a candle to my brother's 7700K or whatever he has, even though we both use a 1080 Ti.
  • kogashuko
    Why the heck are you all testing BF1 in DX11 mode? Why would anyone run it in that? Also, the whole point of DX12 is to let you use more cores. Why would you test Threadripper in DX11? Multiple sites are doing this and it pisses me off. Stop running DX11.
  • kogashuko
    Stop running DX11 on high-core-count processors when they support DX12. Let DX11 die. Stop being bent because your Vulkan didn't make it.
  • YoAndy
    Of course AMD wants us to use their CPUs at higher resolutions, because at low resolutions the Intel advantage is significant, and it trails off as you run games at higher resolutions and detail settings (because the graphics card becomes the performance bottleneck). Duh. The Intel Core i7-7700K outpaced the 1700 by around 40fps in the CPU-intensive Ashes of the Singularity test.
    The 7700K's good performance continued in single-threaded tests: its 472-point result in POV-Ray was easily ahead of the Ryzen chip's 315-point result, and it is nearly 60 points better in Cinebench. The Core i7 is a better overclocker than AMD, too, and its power consumption isn't much higher than the Ryzen 7 1700's. The only area where the Core i7-7700K falls behind is in multi-threaded benchmarks.

    Threadripper makes no sense. If you want to build a high-end gaming PC, get the i7-7700K; if you want to fancy yourself as a game streamer, the 1800X is a superb choice. With loads of cores available for video encoding and gaming simultaneously, it's an excellent option. It's pricier than the Core i7-7700K, sure, but you get extra versatility from those extra cores alongside top-notch speeds in many tests. For gaming and streaming, of course, Intel has way better options than AMD, like the 8-core i7-7820X or the 6-core i7-7800X, but when price is considered, the 1800X is one of the best high-performance processors on the market right now.


    For gaming you can also check: http://www.trustedreviews.com/guide/best-cpu-for-gaming#5R5FzfVG0BBpt4wQ.99