Part 1: DirectX 11 Gaming And Multi-Core CPU Scaling

Interview With Slightly Mad Studios & Project CARS

We like to vary the genres we benchmark with, and simulators are up there as one of the most important. Project CARS is controversial for the performance discrepancy that appears between AMD and Nvidia graphics hardware. However, it’s one of the best-looking racing games out there. And for our comparison, the GPU wars don’t matter—CPUs are our variable today.

CARS was introduced back in 2015, making it one of the newer sims at our disposal. We knew it was optimized for multi-core CPUs, but turned to Ged Keaveney of Slightly Mad Studios for a deeper understanding of the developer’s approach.

Tom’s Hardware: Can you explain what SMS has done to optimize for multi-core CPUs? Where do the gains come from?

Ged: Our engine is built with the scalable use of multi-core CPUs in mind. We can break down code into discrete chunks at a number of different levels. At the higher level, we have code managers that can be assigned to controllers, which in turn can be assigned to any thread. At the lower level, we have task managers that are used for smaller chunks of logic/data processing. These are used to execute lots of smaller tasks and they can be spread over any number of cores.

On consoles, when it comes to thread assignments for full threads, controllers, or task managers, we assign them to specific cores, while on the PC we leave it to the OS scheduler to decide. There is more variance in thread topology on a PC and any number of other background tasks/processes running, so it's best to avoid specific thread affinity on the PC in our experience.

The performance gains come from splitting up tasks that can be done in parallel with each other. For example, the scene graph gather tasks, which read through the scene to find the parts visible to a certain camera, are ideal candidates for running alongside a more logic-intensive task, and because we need to gather scene data for a large number of different cameras every frame (for main scene, rear-view mirror, 4 x shadow buffers, 2 x environment maps, and reflection buffer) there is a lot of scope for parallelization.
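As a rough illustration of the per-camera gather Ged describes, the sketch below hands one read-only visibility pass per camera to a pool of worker threads. It is not Slightly Mad Studios' engine code: the `Camera` type, the one-dimensional `SCENE`, and the helper names are all hypothetical, and only the camera counts come from the interview. Note that CPython threads do not run pure Python code in parallel, so this shows the task structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    """Hypothetical camera that sees objects with x_min <= x < x_max."""
    name: str
    x_min: float
    x_max: float


# Hypothetical scene: each object is reduced to a single x position.
SCENE = [float(x) for x in range(0, 100, 5)]

# Camera counts follow the interview: main scene, rear-view mirror,
# 4 shadow buffers, 2 environment maps, and a reflection buffer.
CAMERAS = (
    [Camera("main", 0, 60), Camera("rear_view_mirror", 40, 100)]
    + [Camera(f"shadow_{i}", i * 25, (i + 1) * 25) for i in range(4)]
    + [Camera(f"env_map_{i}", 0, 100) for i in range(2)]
    + [Camera("reflection", 20, 80)]
)


def gather_visible(camera):
    """Read-only pass over the scene, so gathers can safely overlap."""
    visible = [x for x in SCENE if camera.x_min <= x < camera.x_max]
    return camera.name, visible


def gather_all(cameras, workers=4):
    """Run one gather task per camera on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(gather_visible, cameras))


if __name__ == "__main__":
    for name, visible in gather_all(CAMERAS).items():
        print(f"{name}: {len(visible)} visible objects")
```

Because each gather only reads the scene, no locking is needed between tasks, which is what makes this kind of work a good candidate for running alongside other logic.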

The larger the display being rendered, the more work the GPU does, but with no change in the CPU workload. This is why you find the benefit of multiple cores evaporates as you increase the display resolution.
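One simple way to picture this is to assume CPU and GPU work overlap, so each frame takes as long as the slower of the two. The sketch below uses that assumption. Every millisecond figure in it is made up for illustration and is not a measurement from our benchmarks.

```python
# Hypothetical per-frame costs in milliseconds (illustrative, not measured).
CPU_MS = {"fewer cores": 10.0, "more cores": 6.0}
GPU_MS = {"1920x1080": 5.0, "2560x1440": 8.0, "3840x2160": 16.0}


def fps(cpu_ms, gpu_ms):
    """Frame rate when CPU and GPU overlap: the slower side sets the pace."""
    return 1000.0 / max(cpu_ms, gpu_ms)


if __name__ == "__main__":
    for resolution, gpu_ms in GPU_MS.items():
        row = ", ".join(
            f"{label}: {fps(cpu_ms, gpu_ms):.1f} FPS"
            for label, cpu_ms in CPU_MS.items()
        )
        print(f"{resolution} -> {row}")
```

With these invented numbers, the faster CPU configuration leads at the two lower resolutions, while at 3840x2160 the GPU cost exceeds both CPU costs and the two configurations tie.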

Tom’s Hardware: Is there a cap to the number of physical/logical cores you can take advantage of in CARS?

Ged: In theory, no. However, the benefit of more cores will level out due to factors like being GPU-bound or when thread dependencies become a factor. There is a saturation point eventually.

It’s also worth considering this on a purely theoretical basis. If you have eight cores, you could reduce your processing time to 12.5% if everything shared perfectly, saving you 87.5% compared to one core. But if you add another eight cores, that only takes you down to 6.25%, only saving you a tiny amount. In fact, the biggest saving comes from the first few cores you add, because there will always be work for them to do.
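The arithmetic in that answer can be checked with a few lines. The sketch assumes perfectly shared work, exactly as Ged's theoretical case does, and the function name is ours.

```python
def ideal_time_percent(cores):
    """Processing time as a percentage of the single-core time,
    assuming the work divides perfectly across all cores."""
    return 100.0 / cores


if __name__ == "__main__":
    previous = ideal_time_percent(1)
    for cores in (2, 4, 8, 16):
        current = ideal_time_percent(cores)
        print(
            f"{cores:2d} cores: {current:5.2f}% of single-core time, "
            f"{previous - current:5.2f} points saved by this step"
        )
        previous = current
```

Going from one core to two saves 50 percentage points, while going from eight to 16 saves only 6.25.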

It’s also important to remember that you never get 100% efficiency. Suppose you want to build a house, you have four brick-layers, and it takes 20 days. Then you get 20 brick-layers. It won’t take four days though, because they will get in the way of each other and not work efficiently. The same applies to processor cores—maybe even more so—because they all share the same limited resources, such as memory bandwidth.
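Ged does not name a model here, but Amdahl's law is one standard way to put numbers on less-than-perfect scaling. It only accounts for the portion of work that cannot be parallelized, not for contention over shared resources such as memory bandwidth, so real scaling would likely be worse. The 90% parallel fraction below is an arbitrary assumption, not a figure for Project CARS.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when only part of the
    work can be spread across cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    parallel_fraction = 0.9  # assumed for illustration only
    for cores in (2, 4, 8, 16, 1024):
        speedup = amdahl_speedup(parallel_fraction, cores)
        print(f"{cores:4d} cores: {speedup:.2f}x")
```

Under that assumption, doubling from eight to 16 cores raises the speedup from about 4.7x to 6.4x, and no core count can exceed 10x, because the serial 10% still has to run on one core.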

Tom’s Hardware: How is what you’re working on now (next-gen engine/project) different in this regard compared to Project CARS? If you’re working with DX 12/Vulkan, does the API change the impact of host processing performance?

Ged: We're breaking down some logic into smaller chunks to benefit from the finer-grained task manager. Some areas that could not be split up before because we supported DX 9 are now able to be split into smaller work tasks, and we're also adding some new features to the game, which take advantage of the small task system to spread the workload.

Tom’s Hardware: Given the choice between a quad-core Skylake CPU operating at higher clocks with slightly better IPC and a 10-core Broadwell-E thermally limited to lower frequencies, which do you choose for gaming and why? Are we destined to always be graphics-bound at the resolutions enthusiasts play at? Even in our 2560x1440 graphs, with all processors normalized at 3.9 GHz, quad-core Skylake is slightly faster than six-core Broadwell-E.

Ged: All tests we've run (and seen) show Skylake to be better than Broadwell-E for gaming, including our games. We don't see that changing any time soon. Most games have relatively limited CPU requirements compared to GPU. We're probably one of the more intensive on CPU (physics mainly, of course, but also AI), but still not enough to seriously stress a well-clocked quad-core Skylake-based i5/i7.

Tom’s Hardware: Are there any other informational bits you’d like to throw in that readers might like to know about your work?

Ged: Just to say that we're working with all hardware vendors to get the best performance we can across all PC configurations.


Project CARS


The GeForce GTX 1080 is fast enough that, at 1920x1080, Intel’s 10-core -6950X can stretch its legs and push average frame rates almost 16% higher than a Skylake-based -6700K. The eight-core -6900K is just eight percent faster. By the time we drop to six-core Broadwell-E, Skylake’s IPC advantages let the four-core CPU capture a small lead.

Two cores don’t look bad in a comparison of FPS, but you can see they clearly limit what Nvidia’s Pascal architecture can do. Moreover, a look at frame times over the benchmark run shows how smoothness is negatively affected compared to the four-core-plus processors.


The finishing order doesn’t change as we step up to 2560x1440, but the delta between contenders definitely shrinks. Now the 10-core config is just three percent quicker, on average, than the quad-core -6700K. Frame rate over time suggests there’s little reason to go with one solution over another, though the 10- and eight-core Broadwell-E processors outpace the six-core -6850X in certain places.

Notably, the dual-core setup barely budges compared to its performance at 1920x1080. Despite the extra graphics workload, two cores continue limiting what a GTX 1080 can do.  


Finally, at 4K, all five CPUs demonstrate similar average frame rates. The dual-core chip’s lower minimum suggests the story isn’t so cut and dried, though.

Sure enough, flipping over to frame rate over time reveals that processor’s lower performance through the start of our run. More problematic, however, are the frame time charts, which show serious variation versus the smoother cadence of every other processor. Although 3840x2160 is an FPS equalizer as a result of its demanding graphics workload, digging deeper reveals that host processing remains an influential factor. Don’t handicap your next PC with an imbalanced CPU/GPU combination.
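To see how an average can hide uneven frame delivery, the sketch below compares two invented frame-time traces that share the same average FPS. The traces, function names, and the choice of a nearest-rank 99th percentile are all assumptions for illustration and are not our benchmark data or methodology.

```python
import math


def average_fps(frame_times_ms):
    """Average frame rate over a run, from per-frame times in milliseconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percentile_ms(frame_times_ms, pct):
    """Nearest-rank percentile of the frame times."""
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical traces: one steady, one with occasional long frames.
SMOOTH = [20.0] * 100
STUTTER = [15.0] * 90 + [65.0] * 10

if __name__ == "__main__":
    for label, trace in (("smooth", SMOOTH), ("stutter", STUTTER)):
        print(
            f"{label}: {average_fps(trace):.1f} FPS average, "
            f"99th percentile frame time {percentile_ms(trace, 99):.1f} ms"
        )
```

Both traces average 50 FPS, yet the second spends a tenth of its frames at 65 ms, which is the kind of variation a frame time chart exposes and an FPS average does not.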

  • ledhead11
    Awesome article! Looking forward to the rest.

    Any chance you can do a run through with 1080SLI or even Titan SLi. There was another article recently on Titan SLI that mentioned 100% CPU bottleneck on the 6700k with 50% load on the Titans @ 4k/60hz.
  • Nolonar
    Wouldn't it have been a more representative benchmark if you just used the same CPU and limited how many cores the games can use?
  • Traciatim
    Looks like even years later the prevailing wisdom of "Buy an overclockable i5 with the best video card you can afford" still holds true for pretty much any gaming scenario. I wonder how long it will be until that changes.
  • nopking
    Your GTA V is currently listing at $3,982.00, which is slightly more than I paid for it when it first came out (about 66x)
  • TechyInAZ
    18759076 said:
    Looks like even years later the prevailing wisdom of "Buy an overclockable i5 with the best video card you can afford" still holds true for pretty much any gaming scenario. I wonder how long it will be until that changes.

    Once DX12 goes mainstream, we'll probably see a balance of "OCed Core i5 with most expensive GPU" for FPS shooters. But for the more CPU-demanding games it will probably be "Core i7 with most expensive GPU you can afford" (or Zen CPU).
  • avatar_raq
    Great article, Chris. Looking forward to part 2 and I second ledhead11's wish to see a part 3 and 4 examining SLI configurations.
  • problematiq
    I would like to see an article comparing 1st, 2nd, and 3rd gen i-series CPUs to the current generation as far as "Should you upgrade?" goes. Still cruising on my 3770k though.
  • Brian_R170
    Isn't it possible to use the i7-6950X for all of the 2-, 4-, 6-, 8-, and 10-core tests by just disabling cores in the OS? That eliminates the other differences between the various CPUs and shows only the benefit of more cores.
  • TechyInAZ
    18759510 said:
    Isn't it possible to use the i7-6950X for all of the 2-, 4-, 6-, 8-, and 10-core tests by just disabling cores in the OS? That eliminates the other differences between the various CPUs and shows only the benefit of more cores.

    Possibly. But it would be a bit unrealistic because of all the extra cache the CPU would have on hand. No quad core has the amount of L2 and L3 cache that the 6950X has.
  • filippi
    I would like to see both i3 w/ HT off and i3 w/ HT on. That article would be the perfect spot to show that.