Nvidia PhysX Software is Ancient, Slow for CPUs
PhysX for CPUs is built on x87. Not the best choice for modern CPUs, it seems.
Nvidia's acquisition of Ageia in 2008 was a strategic move to boost the marketability of its GPU offerings. With the discontinuation of the dedicated PhysX boards, hardware acceleration moved to GeForce GPUs as a differentiating feature that set them apart from AMD's ATI cards.
If a PhysX game detected the presence of an Nvidia GPU, it would move the hardware physics to the video card. Without an Nvidia board, the physics would fall back to the CPU, which for this kind of highly parallel workload is slower than a GPU.
It's expected that Nvidia would do everything it can to distance itself from the CPU and from its competitors' GPUs, but closer looks at the PhysX software implementation suggest there could be some shadiness going on.
An excellent investigation by David Kanter at Real World Technologies found that Nvidia's PhysX software implementation for CPUs still uses x87 code, an instruction set that was deprecated in 2005 and has since been fully replaced by SSE. Intel began steering developers toward SSE around 2000, and AMD did the same in 2003.
The x87 code is slow, ugly, and remains supported on today's CPUs solely for legacy reasons. In short, there is no technical reason for Nvidia to keep running PhysX on CPUs with such terrible software when moving to SSE would speed things up considerably – unless that would make the GeForce GPGPU look less mighty compared to the CPU.
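To make the difference concrete, here is a minimal, hypothetical sketch (the function names are ours, not from PhysX) of the same loop written two ways: as plain scalar C, the kind of one-value-at-a-time arithmetic an x87 build performs, and with SSE intrinsics, which process four packed floats per instruction.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Scalar path: one float multiply and add per iteration, the way an
 * x87 build grinds through data one 80-bit stack operation at a time. */
static void scale_add_scalar(const float *a, const float *b,
                             float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * 2.0f + b[i];
}

/* SSE path: the same arithmetic on four packed floats per instruction. */
static void scale_add_sse(const float *a, const float *b,
                          float *out, size_t n)
{
    __m128 two = _mm_set1_ps(2.0f);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, two), vb));
    }
    for (; i < n; i++)  /* scalar tail for leftover elements */
        out[i] = a[i] * 2.0f + b[i];
}
```

Both paths compute identical results; the SSE version simply does a quarter of the loop iterations, which is where the oft-cited multi-x speedups for vectorizable physics math come from.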
Ars Technica's Jon Stokes confronted Nvidia about the deficient PhysX code, and we were just as surprised as he was when Mike Skolones, product manager for PhysX, admitted: "It's a creaky old codebase, there's no denying it."
Nvidia defends its position that much of the optimization is up to the developer, and when a game is being ported from console to PC, most of the time the PC's CPU will already run the physics better than the console counterpart. The 'already-better' performance from the port could lead developers to leave the code as-is, without pursuing further optimizations.
"It's fair to say we've got more room to improve on the CPU. But it's not fair to say, in the words of that article, that we're intentionally hobbling the CPU," Skolones said. "The game content runs better on a PC than it does on a console, and that has been good enough."
Another problem is that the current PhysX 2.7 codebase is very old; so old, in fact, that it predates 2005, when x87 was deprecated. Skolones said that Nvidia is working on version 3.0, which should bring things up to date a little, though we don't doubt that the GPGPU functions will still be faster.
This isn't the first time that we've heard that Nvidia's PhysX software is less than well-optimized. AMD accused Nvidia of disabling multi-core support in CPU PhysX earlier this year.

DirectCompute / OpenCL >>> PhysX
But with multi-core CPUs the norm now, why would I want to dedicate part of my graphics computing power for physics? Rhetorical question; I wouldn't. I'd rather have the game written to be optimized for multiple cores, and perhaps dedicate one core to physics, if it would help. Seems silly to have CPU cores idling while the GPU does double duty.
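The commenter's "dedicate one core to physics" idea can be sketched in a few lines. This is a hypothetical toy, assuming POSIX threads and a trivial fixed-timestep Euler integrator (all names are ours); a real engine would synchronize per frame, but the division of labor is the same.

```c
#include <pthread.h>

typedef struct {
    float y, vy;   /* height and vertical velocity of one falling body */
    int   steps;   /* number of fixed 1/60 s timesteps to simulate */
} physics_state;

/* Worker that owns the physics simulation: plain Euler integration of
 * gravity. Running it on its own thread leaves the main thread free to
 * render instead of borrowing GPU time for physics. */
static void *physics_thread(void *arg)
{
    physics_state *s = (physics_state *)arg;
    const float dt = 1.0f / 60.0f, g = -9.81f;
    for (int i = 0; i < s->steps; i++) {
        s->vy += g * dt;
        s->y  += s->vy * dt;
    }
    return NULL;
}

/* Spawn the physics worker, let the main thread do other work,
 * then wait for the simulation to finish. Returns 0 on success. */
static int simulate_on_worker(physics_state *s)
{
    pthread_t t;
    if (pthread_create(&t, NULL, physics_thread, s) != 0)
        return -1;
    /* ... main thread would submit draw calls here ... */
    return pthread_join(t, NULL);
}
```

With an otherwise idle core, the physics step costs the renderer essentially nothing, which is exactly the commenter's complaint about CPU cores idling while the GPU does double duty.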
We seriously need better standardization and a massive spring clean of the x86 code base and all the extensions since; it could bring a lot of performance improvement with very little expenditure of CPU resources, much like optimizing bad code.
Right. You are so motivated to make it work well even without nVidia's mini-ovens.
I really wouldn't be surprised if AMD eventually brought about and advocated the development of a new "Open-PL" physics standard or something.
DirectCompute / OpenCL >>> PhysX
DirectCompute/OpenCL/CUDA != PhysX. They serve different purposes.
Silly to say nvidia is deliberately hobbling physx on the cpu, though; given that ALL the physx code is old (by the author's admission) it's just old code. I'm an nvidia fan but the truth is that physx is so poorly adopted among game developers it wouldn't bother me in the least to just see it go away.
CPUs without SSE support include:
- Pentium Pro
- Pentium II
- early K7: Athlon (up to 1200 MHz) and Duron (up to 950 MHz)
So, at the time the software-only implementation of PhysX was written (around 2005, I'd say), there were still some SSE-less machines around.
Of course, Nvidia has no excuse for not having produced an SSE-optimized build by now, with an x87 fallback.
I do agree with your premise, but not the language you use. Optimizing your code to run the SSE instructions can give you about 3x to 4x speedup. However, it takes effort. NVIDIA would rather spend the resources on the parts of PhysX that sell their GPUs than to maintain the CPU compatibility code.
This strategy may backfire, since they still make money by licensing the PhysX technology to game developers. If they cripple their physics engine for non-NVIDIA setups, they will lose revenue and market share to Havok. Developers want their games to work on as many configurations as possible. If they can't have what they want from NVIDIA, they will go to someone who will provide it.
But with multi-core CPUs the norm now, why would I want to dedicate part of my graphics computing power for physics? Rhetorical question; I wouldn't. I'd rather have the game written to be optimized for multiple cores, and perhaps dedicate one core to physics, if it would help. Seems silly to have CPU cores idling while the GPU does double duty.
When there is one open standard, it will become mainstream! Until then, it is still a sideline to the main game!
Nvidia could do better here by working to make PhysX the main standard, with proper code development and open licensing with incentives!
If Nvidia keeps handling PhysX as closely as it does now, I see it going away in the long run, replaced by a more open implementation.
I was talking about their usefulness in game engines for physics simulations.
And DirectCompute/OpenCL/CUDA can do everything PhysX can, but not the other way around; that's why they are better than PhysX.
Also, CUDA is not as good as OpenCL and DirectCompute for games because it is exclusive to nVidia cards, which don't have the same performance/price ratio as ATI cards.
Havok for me.
That I'd like, but if AMD is going to be as slow as in the past, and with the HD 6xxx series marked for the last half of 2011, I don't think it's going to happen anytime soon.
Taking the two primary points of the investigation:
1: PhysX isn't multithreaded by default
2: x87 is old and deprecated
My response:
1: DirectX, OpenGL, C++, Java, etc. are not multithreaded by default
2: While NVIDIA's implementation isn't the best, it should be relatively simple for developers to replace the offending code with SSE instructions. In short: implementation is up to the developer [which is the same design concept that DirectX follows]
Nothing to see here.
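On point 2 above, it's worth noting that for plain scalar float math the switch from x87 to SSE can require no source changes at all, just compiler flags. A hypothetical routine (the name and flags below are illustrative, not from PhysX):

```c
/* dot3: the kind of tiny vector routine that dominates a physics
 * engine's inner loops.  Targeting 32-bit x86, GCC compiles the float
 * math here to x87 by default; building with
 *     gcc -m32 -msse2 -mfpmath=sse game.c
 * switches code generation to SSE with no source changes, and on
 * x86-64 targets SSE math is already the default. */
static float dot3(const float a[3], const float b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

The catch, of course, is that this only applies when the developer controls the build; precompiled middleware like the PhysX runtime keeps whatever code generation its vendor shipped.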