Part 1: DirectX 11 Gaming And Multi-Core CPU Scaling

Interview With Slightly Mad Studios & Project CARS

We like to vary the genres we benchmark with, and simulators rank among the most important. Project CARS is controversial for the performance discrepancy between AMD and Nvidia graphics hardware. However, it’s one of the best-looking racing games out there. And for our comparison, the GPU wars don’t matter—CPUs are our variable today.

CARS was introduced back in 2015, making it one of the newer sims at our disposal. We knew it was optimized for multi-core CPUs, but turned to Ged Keaveney of Slightly Mad Studios for a deeper understanding of the developer’s approach.

Tom’s Hardware: Can you explain what SMS has done to optimize for multi-core CPUs? Where do the gains come from?

Ged: Our engine is built with the scalable use of multi-core CPUs in mind. We can break down code into discrete chunks at a number of different levels. At the higher level, we have code managers that can be assigned to controllers, which in turn can be assigned to any thread. At the lower level, we have task managers that are used for smaller chunks of logic/data processing. These are used to execute lots of smaller tasks and they can be spread over any number of cores.

On consoles, when it comes to thread assignments for full threads, controllers, or task managers, we assign them to specific cores, while on the PC we leave it to the OS scheduler to decide. There is more variance in thread topology on a PC and any number of other background tasks/processes running, so it's best to avoid specific thread affinity on the PC in our experience.
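To make that contrast concrete, here’s a minimal C++ sketch of the two thread-placement strategies Ged describes. This is our illustration, not SMS engine code; the worker loop is a stand-in for a real task-manager worker, and the Win32 SetThreadAffinityMask call stands in for console-style pinning.

```cpp
// Minimal sketch (not SMS engine code) of console-style core pinning vs.
// PC-style "let the OS scheduler decide" thread placement.
#include <atomic>
#include <thread>
#ifdef _WIN32
#include <windows.h>
#endif

std::atomic<bool> g_running{true};

void WorkerLoop()                     // stand-in for a task-manager worker
{
    while (g_running.load()) { /* pull and execute queued tasks here */ }
}

std::thread SpawnWorker(unsigned coreIndex, bool pinToCore)
{
    std::thread worker(WorkerLoop);
#ifdef _WIN32
    if (pinToCore)                    // console-style: bind to a fixed core
        SetThreadAffinityMask(worker.native_handle(),
                              DWORD_PTR(1) << coreIndex);
#endif
    // PC-style (pinToCore == false): no affinity mask is set, so the OS
    // scheduler places the thread around background processes and whatever
    // core topology the machine happens to have.
    return worker;
}

int main()
{
    std::thread worker = SpawnWorker(2, /*pinToCore=*/false);  // PC path
    g_running = false;
    worker.join();
}
```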

The performance gains come from splitting up tasks that can be done in parallel with each other. For example, the scene graph gather tasks, which read through the scene to find the parts visible to a certain camera, are ideal candidates for running alongside a more logic-intensive task. And because we need to gather scene data for a large number of different cameras every frame (main scene, rear-view mirror, 4 x shadow buffers, 2 x environment maps, and reflection buffer), there is a lot of scope for parallelization.
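Because each camera’s gather pass only reads the scene, those nine per-view culling jobs are natural candidates for a fork/join pattern. Here is a minimal sketch of the idea, using std::async in place of SMS’s task manager (again our illustration, not engine code):

```cpp
// Minimal sketch of per-camera scene-gather tasks running in parallel.
#include <future>
#include <string>
#include <vector>

struct VisibleSet {
    std::string camera;               // which view this culled list is for
    // ... culled draw list would live here ...
};

VisibleSet GatherSceneForCamera(const std::string& camera)
{
    // Walk the scene graph, test visibility against this camera, and
    // collect the visible parts. Read-only, so tasks don't contend.
    return VisibleSet{camera};
}

int main()
{
    // The per-frame views Ged lists: main scene, mirror, 4 shadow buffers,
    // 2 environment maps, and the reflection buffer.
    const std::vector<std::string> cameras = {
        "main", "mirror", "shadow0", "shadow1", "shadow2", "shadow3",
        "envmap0", "envmap1", "reflection"};

    std::vector<std::future<VisibleSet>> gathers;
    for (const auto& cam : cameras)   // one independent task per camera
        gathers.emplace_back(
            std::async(std::launch::async, GatherSceneForCamera, cam));

    for (auto& g : gathers)           // join before issuing draw calls
        g.get();
}
```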

The larger the display being rendered, the more work the GPU does, but with no change in the CPU workload. This is why you find the benefit of multiple cores evaporates as you increase the display resolution.

Tom’s Hardware: Is there a cap to the number of physical/logical cores you can take advantage of in CARS?

Ged: In theory, no. However, the benefit of more cores will level out due to factors like being GPU-bound or when thread dependencies become a factor. There is a saturation point eventually.

It’s also worth considering this on a purely theoretical basis. If you have eight cores, you could reduce your processing time to 12.5% if everything shared perfectly, saving you 87.5% compared to one core. But if you add another eight cores, that only takes you down to 6.25%, a further saving of just 6.25 points. In fact, the biggest saving comes from the first few cores you add, because there will always be work for them to do.

It’s also important to remember that you never get 100% efficiency. Suppose you want to build a house, you have four brick-layers, and it takes 20 days. Then you get 20 brick-layers. It won’t take four days though, because they will get in the way of each other and not work efficiently. The same applies to processor cores—maybe even more so—because they all share the same limited resources, such as memory bandwidth.
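Ged’s percentages fall straight out of ideal 1/N scaling, and his bricklayer caveat is essentially Amdahl’s law. This quick sketch reproduces his numbers and adds an assumed 10% serial fraction (our illustrative figure, not a measured one) to show how contention flattens the curve further:

```cpp
// Reproduces the 12.5% / 6.25% figures above, plus an Amdahl's-law column
// with an assumed 10% serial fraction to model "bricklayers in the way".
#include <cstdio>

int main()
{
    const double serial = 0.10;       // assumed non-parallelizable share
    for (int cores = 1; cores <= 16; cores *= 2) {
        double ideal  = 100.0 / cores;                       // perfect sharing
        double amdahl = 100.0 * (serial + (1.0 - serial) / cores);
        std::printf("%2d cores: ideal %6.2f%% of one-core time, "
                    "with 10%% serial %6.2f%%\n",
                    cores, ideal, amdahl);
    }
}
```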

Tom’s Hardware: How is what you’re working on now (next-gen engine/project) different in this regard compared to Project CARS? If you’re working with DX 12/Vulkan, does the API change the impact of host processing performance?

Ged: We're breaking down some logic into smaller chunks to benefit from the finer-grained task manager. Some areas that could not be split up before, because we supported DX 9, can now be broken into smaller work tasks, and we're also adding some new features to the game that take advantage of the small task system to spread the workload.

Tom’s Hardware: Given the choice between a quad-core Skylake CPU operating at higher clocks and slightly better IPC or 10-core Broadwell-E thermally limited to lower frequencies, which do you choose for gaming and why? Are we destined to always be graphics-bound at the resolutions enthusiasts play at? Even in our 2560x1440 graphs, with all processors normalized at 3.9 GHz, quad-core Skylake is slightly faster than six-core Broadwell-E.

Ged: All tests we've run (and seen) show Skylake to be better than Broadwell-E for gaming, including our games. We don't see that changing any time soon. Most games have relatively limited CPU requirements compared to GPU. We're probably one of the more intensive on CPU (physics mainly, of course, but also AI), but still not enough to seriously stress a well-clocked quad-core Skylake-based i5/i7.

Tom’s Hardware: Are there any other informational bits you’d like to throw in that readers might like to know about your work?

Ged: Just to say that we're working with all hardware vendors to get the best performance we can across all PC configurations.

MORE: Best Graphics Cards

MORE: Desktop GPU Performance Hierarchy Table

MORE: All Graphics Content


Project CARS

The GeForce GTX 1080 is fast enough that, at 1920x1080, Intel’s 10-core -6950X can stretch its legs and push average frame rates almost 16% higher than a Skylake-based -6700K. The eight-core -6900K is just eight percent faster than the quad-core chip. By the time we drop to six-core Broadwell-E, Skylake’s IPC advantages let the four-core CPU capture a small lead.

Two cores don’t look bad in a comparison of FPS, but you can see they clearly limit what Nvidia’s Pascal architecture can do. Moreover, a look at frame times over the benchmark run shows how smoothness is negatively affected compared to the four-core-plus processors.
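If you want to quantify that smoothness on your own captures, the usual trick is to look at frame-time percentiles instead of average FPS. A minimal sketch of that calculation (not our actual benchmark tooling; the sample values are invented):

```cpp
// Average FPS can look healthy while the 99th-percentile frame time
// exposes stutter -- the pattern the dual-core chip shows here.
#include <algorithm>
#include <cstdio>
#include <vector>

int main()
{
    // Invented frame times in milliseconds from a hypothetical capture.
    std::vector<double> frameMs = {16.6, 16.8, 16.5, 43.1,
                                   16.7, 16.9, 39.8, 16.6};

    double totalMs = 0.0;
    for (double ms : frameMs) totalMs += ms;
    const double avgFps = 1000.0 * frameMs.size() / totalMs;

    std::sort(frameMs.begin(), frameMs.end());
    const double p99 =
        frameMs[static_cast<size_t>(0.99 * (frameMs.size() - 1))];

    std::printf("average %.1f FPS, 99th-percentile frame time %.1f ms\n",
                avgFps, p99);
}
```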

The finishing order doesn’t change as we step up to 2560x1440, but the delta between contenders definitely shrinks. Now the 10-core config is just three percent quicker, on average, than the quad-core -6700K. Frame rate over time suggests there’s little reason to go with one solution over another, though the 10- and eight-core Broadwell-E processors outpace the six-core -6850K in certain places.

Notably, the dual-core setup barely budges compared to its performance at 1920x1080. Despite the extra graphics workload, two cores continue limiting what a GTX 1080 can do.  

Finally, at 4K, all five CPUs demonstrate similar average frame rates. The dual-core chip’s lower minimum suggests the story isn’t so cut and dried, though.

Sure enough, flipping over to frame rate over time reveals that processor’s lower performance through the start of our run. More problematic, however, are the frame time charts, which show serious variation versus the smoother cadence of every other processor. Although 3840x2160 is an FPS equalizer as a result of its demanding graphics workload, digging deeper reveals that host processing remains an influential factor. Don’t handicap your next PC with an imbalanced CPU/GPU combination.

  • ledhead11
    Awesome article! Looking forward to the rest.

    Any chance you can do a run-through with 1080 SLI or even Titan SLI? There was another article recently on Titan SLI that mentioned 100% CPU bottleneck on the 6700k with 50% load on the Titans @ 4k/60hz.
  • Nolonar
    Wouldn't it have been a more representative benchmark if you just used the same CPU and limited how many cores the games can use?
  • Traciatim
    Looks like even years later the prevailing wisdom of "Buy an overclockable i5 with the best video card you can afford" still holds true for pretty much any gaming scenario. I wonder how long it will be until that changes.
  • nopking
    Your GTA V is currently listing at $3,982.00, which is slightly more than I paid for it when it first came out (about 66x)
  • TechyInAZ
    111219 said:
    Looks like even years later the prevailing wisdom of "Buy an overclockable i5 with the best video card you can afford" still holds true for pretty much any gaming scenario. I wonder how long it will be until that changes.


    Once DX12 goes mainstream, we'll probably see a balance of "OCed Core i5 with most expensive GPU" for FPS shooters. But for the more CPU-demanding games it will probably be "Core i7 with the most expensive GPU you can afford" (or Zen CPU).
  • avatar_raq
    Great article, Chris. Looking forward to part 2, and I second ledhead11's wish to see a part 3 and 4 examining SLI configurations.
  • problematiq
    I would like to see an article comparing 1st, 2nd, and 3rd gen i series to the current generation as far as "Should you upgrade?". Still cruising on my 3770k though.
  • Brian_R170
    Isn't it possible to use the i7-6950X for all of the 2-, 4-, 6-, 8-, and 10-core tests by just disabling cores in the OS? That eliminates the other differences between the various CPUs and shows only the benefit of more cores.
  • TechyInAZ
    1696401 said:
    Isn't it possible to use the i7-6950X for all of the 2-, 4-, 6-, 8-, and 10-core tests by just disabling cores in the OS? That eliminates the other differences between the various CPUs and shows only the benefit of more cores.


    Possibly. But it would be a bit unrealistic because of all the extra cache the CPU would have on hand. No quad core has the amount of L2 and L3 cache that the 6950X has.
  • filippi
    I would like to see both i3 w/ HT off and i3 w/ HT on. That article would be the perfect spot to show that.
  • littleleo
    I think the price for GTA V is setting the gold standard in game pricing at $3,982, and it is a little... okay, it's a lot lot lot more than I would ever pay for a game. I've bought cars for less money, ouch!
  • littleleo
    I've sold more i5 gaming systems than anything else since the 1st iCore CPUs came out. It would have been nice to have at least one i5; I don't think we needed 4 i7s, since next to i3s and especially i5s they're a much, much smaller segment.
  • artk2219
    It would be nice to see a run with AMD's FX chips in the mix since they give you threads, but it's at the cost of IPC, and since you can get an FX 8320e for $89.99 (or an FX 6300, but why would you bother at that price) at Microcenter, for those of us lucky enough to be near one. You can spec out the main components of your build (mobo, cpu, mem, and cooler) for $200 to $220. Or a full build without a great graphics card for $350 to $400. With a good graphics card it can be a great value, at least once you bump the clocks on the 8320e (4.0 ghz or so).
  • footman
    Great article. Very important to add the results of the dual-core CPU with Hyper-Threading enabled. For all of the current games requiring quad core, I believe that a dual core with Hyper-Threading will work just as well then...
  • littleleo
    387420 said:
    It would be nice to see a run with AMD's FX chips in the mix since they give you threads, but it's at the cost of IPC, and since you can get an FX 8320e for $89.99 (or an FX 6300, but why would you bother at that price) at Microcenter, for those of us lucky enough to be near one. You can spec out the main components of your build (mobo, cpu, mem, and cooler) for $200 to $220. Or a full build without a great graphics card for $350 to $400. With a good graphics card it can be a great value, at least once you bump the clocks on the 8320e (4.0 ghz or so).
    Microcenter is evil I tell you, EVIL!!! Plus they are a 2 hour drive in traffic from my house, yuk! Their in-store CPU specials are awesome. I bought my CPU there back in the day cheaper than I could get it at cost wholesale.
  • TerryLaze
    Seeing the GPU being bottlenecked at lower resolutions and going on to test up to 4K... genius!
    Also agree that the i3 should have been tested with both HT on and off.
  • whtfish
    Great article, but I too would like to see where the i3 with HT on would slot in.
  • AlistairAB
    Bizarre to not take the opportunity to show the i3 with HT on and off in each graph.
  • none12345
    I wouldn't touch a 2 core at this point for a gaming computer. Sure you can get away with it, but no thanks.

    4 core is enough today, but it won't be tomorrow. It's not even enough today if you do something else besides 1 thing at a time on a computer. I.e. if you are playing a game and doing something else on a 2nd monitor, or video capture while playing the game, or anything else. You need more cores if you multitask.

    I'm in the market for a new CPU, but I will not consider a quad core at this point either. Quad core has been milked by Intel for WAY too long; it's 6+ core or nothing for me at this point. And seeing as how Intel loves to ream you for its enthusiast platform, I guess it's nothing for now.

    Help me Zen Kenobi, you're my only hope! (assuming it doesn't suck, and assuming it's priced reasonably)
  • iam2thecrowe
    1519327 said:
    Wouldn't it have been a more representative benchmark if you just used the same CPU and limited how many cores the games can use?


    Good point. I think it would have been an even better idea to underclock the CPUs to, say, 2-2.5GHz and use the lowest possible resolution. This way you completely remove any GPU bottlenecks, and the focus would be purely on the number of cores to determine core scaling. In most of these cases, an 8-core could simply be bottlenecked just like a 4-core, and that is the reason you don't see the performance increase.
  • bit_user
    134065 said:
    We test five theoretical Intel CPUs in 10 different DirectX 11-based games to determine what impact core count has on performance.
    Useful data, but the interview segments make this a real gem. Thanks!!

    2229740 said:
    Great article, but I too would like to see where the i3 with HT on would slot in.
    Definitely agree. HT scaling (at least up to 4c/8t) should be the next article.
  • bit_user
    111219 said:
    Looks like even years later the prevailing wisdom of "Buy an overclockable i5 with the best video card you can afford" still holds true for pretty much any gaming scenario. I wonder how long it will be until that changes.
    Perhaps, but didn't you see this?
    Quote:
    As a side note - measuring only throughput/framerate is not the right thing to do for gaming. Framerate stability/smoothness is of equal priority. For example, a higher-clocked i5 can give higher average framerate, but lower-clocked i7 can deliver more even framerate, depending on the machine config of course.
  • spentshells
    F1 2015 pretty good above 1080p? I wasn't impressed with it at full settings at 1080p.
  • bit_user
    This struck me as a rather silly argument to make, in the context of gaming:
    Quote:
    It’s also worth considering this on a purely theoretical basis. If you have eight cores, you could reduce your processing time to 12.5% if everything shared perfectly, saving you 87.5% compared to one core. But if you add another eight cores, that only takes you down to 6.25%, a further saving of just 6.25 points. In fact, the biggest saving comes from the first few cores you add, because there will always be work for them to do.
    It's technically correct, but nobody is going to consider gaming with a single core. So, using that as the baseline is ridiculous. Secondly, it's not like this is some render which could either take 10 minutes or 5 minutes, and you just have to decide whether it's worth the $ to save that extra 5 minutes. What we're talking about is up to 2x the throughput. So, if a game is CPU-bottlenecked, then doubling the core count could mean up to 2x framerate improvement.

    That said, he's right that the benefit of adding cores decreases as a function of the number of cores, but more by virtue of the fact that scaling is always sub-linear (assuming well-written software). To his credit, he acknowledges this in his brick-layer analogy.

    BTW, the success of the i7-6700K on Project CARS suggests their load-balancing isn't great. Skylake cores simply aren't that much faster than Broadwell, per clock.

    Quote:
    Remember those huge pauses that plagued the i3? They’re ironed out when The Witcher 3 has four threads to work with.
    Those actually suggest lock contention or races involving lock-free data structures. Either way, I'd chalk it up to deficiencies in the software's design. That might also go some ways towards explaining why enabling HT caused the average framerate to increase quite so much.