Relevance of the CPU PhysX solution
Let’s start with the fact that Nvidia currently allows GPU-accelerated PhysX only on its own graphics cards, which forces everyone else to run the PhysX effects implemented in games on the CPU. For gamers without a GeForce card, turning PhysX on usually results in an unplayable game. The goal of this article is not to judge business decisions, but rather to understand the lack of performance experienced on systems not equipped with Nvidia graphics cards.
Why is CPU PhysX so much slower than GPU PhysX in modern games?
Assuming that a calculation can be parallelized, a GPU with its many shader units is faster than a conventional CPU with two, three, four, or even six cores. According to Nvidia, physics calculations are two to four times faster on GPUs than on CPUs. That’s just half of the truth, though, because there are no physics features that couldn’t be implemented solely on the CPU. Quite often, games use a combined CPU + GPU approach, with the highly parallelizable calculations, such as particle effects, performed by the GPU and the more static, non-parallelizable calculations, such as ragdolls, performed by the CPU. This is the case in Sacred 2, for example. In theory, the share of highly parallelizable calculations should in many cases be too low for the immense GPU speed to make a noticeable difference.
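To put a rough number on that last point, here is a minimal sketch based on Amdahl's law, which is our framing and not something Nvidia or the games themselves state. The function name `overall_speedup`, the parallel fractions, and the loop values are illustrative assumptions; only the two-to-four-times range comes from Nvidia's claim above.

```python
def overall_speedup(parallel_fraction: float, accel_factor: float) -> float:
    """Estimate total speedup with Amdahl's law.

    parallel_fraction: share of the physics workload that can be offloaded
                       (hypothetical values, not measured from any game).
    accel_factor:      how much faster the offloaded part runs, e.g. the
                       2x to 4x range Nvidia quotes for GPU physics.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if accel_factor <= 0.0:
        raise ValueError("accel_factor must be positive")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at the old speed; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / accel_factor)


if __name__ == "__main__":
    # Illustrative workload mixes, from mostly serial to mostly parallel.
    for fraction in (0.1, 0.3, 0.5, 0.9):
        for factor in (2.0, 4.0):
            speedup = overall_speedup(fraction, factor)
            print(f"parallel share {fraction:.0%}, part sped up {factor:.0f}x "
                  f"-> whole workload {speedup:.2f}x faster")
```

Under these assumptions, a workload that is only 30 percent parallelizable gains less than a third overall even when that part runs four times faster, which is why the drastic gaps seen in practice need another explanation.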
But then why is the difference often so drastic in practice?
There are at least two reasons for this. The first is that, in almost all of the games tested, CPU-based PhysX uses just a single thread, regardless of how many cores are available. The second is that Nvidia seems to be intentionally leaving the CPU calculations unoptimized in order to make the GPU solution look better. We’ll have to investigate multithreading at a later time with a suitable battery of benchmarks. Right now, we want to explore the suspicion that Nvidia is deliberately leaving its code in a state where CPUs just can’t compete with GPUs.