Let’s take a trip back in time – way back to 2003 when Intel and AMD became locked in a fierce struggle to offer increasingly powerful processors. In just a few years, clock speeds increased quickly as a result of that competition, especially with Intel’s release of its Pentium 4.
But the clock speed race would soon hit a wall. After riding the wave of sustained clock speed boosts (between 2001 and 2003 the Pentium 4’s clock speed doubled from 1.5 to 3 GHz), users now had to settle for improvements of a few measly megahertz that the chip makers managed to squeeze out (between 2003 and 2005 clock speeds only increased from 3 to 3.8 GHz).
Even architectures optimized for high clock speeds, like the Prescott, ran afoul of the problem, and for good reason: This time the challenge wasn’t just an industrial one. The chip makers had run up against the laws of physics. Some observers were even prophesying the end of Moore’s Law. But that was far from being the case. Although it has often been misinterpreted, Moore’s Law is really about the number of transistors on a given surface area of silicon. For a long time, each increase in the number of transistors in a CPU came with a matching increase in performance, which no doubt explains the confusion. But then the situation became complicated. CPU architects had come up against the law of diminishing returns: The number of transistors that had to be added to achieve a given gain in performance kept growing, and the approach was headed for a dead end.
- Introduction
- Meanwhile...
- Vive le GeForce FX!
- The advent of GPGPU
- BrookGPU
- The CUDA APIs
- A Few Definitions
- The Theory: CUDA from the Hardware Point of View
- Hardware Point of View, Continued
- The Theory: CUDA from the Software Point of View
- In Practice
- Performance
- Analysis
- Conclusion
- Conclusion, Continued


Nvidia's CUDA: The End of the CPU?
Anyways, NV wants to sell CUDA, so why would they change to DX ;-)
Agreed. Also I predict in a few years we will have a Linux distro that will run mostly on a GPU.
http://arstechnica.com/journals/apple.ars/2008/06/18/apple-joins-working-group-to-hammer-out-opencl-spec
I just have a question, and someone might answer it (TH is full of smart guys). My problem is that there are too many misconceptions floating around on the net regarding CUDA and the whole GPGPU business overall.
I have seen somewhere that these GPUs are able to do double-precision floating-point calculations, but personally I find this unlikely.
Others say that you can directly take your parallel code written in C or Fortran90 and adapt it to CUDA, because the standard stuff can run serially on the CPU and the most computationally expensive part in parallel on the GPU. On top of that, you can supposedly 'address' or communicate with your GPU directly from Fortran code with some sort of system calls (I think this is BS).
Quite frankly, I have not found a site I can really rely on that shows an example (source code and explanation) of how something like this could be done.
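For what it's worth, here is a minimal sketch of the split asked about above: the serial setup stays on the CPU and only the data-parallel loop runs on the GPU. It assumes a CUDA-capable card and the `nvcc` compiler, and it only covers C, not Fortran. The kernel name `saxpy`, the array size and the values are made up for illustration, and it sticks to single-precision `float`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// GPU part: each thread computes one element of y = a * x + y.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // the last block may have spare threads
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;                  // illustrative problem size
    const size_t bytes = n * sizeof(float);

    // CPU part: ordinary serial C code prepares the input.
    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    if (!hx || !hy) return 1;
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Allocate GPU memory and copy the input over.
    float *dx = NULL, *dy = NULL;
    cudaMalloc((void **)&dx, bytes);
    cudaMalloc((void **)&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 3.0f, dx, dy);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the result back; this call waits for the kernel to finish.
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    // Back on the CPU: every element should now be 3 * 1 + 2 = 5.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hy[i] != 5.0f) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(dx);
    cudaFree(dy);
    free(hx);
    free(hy);
    return mismatches != 0;
}
```

Saved as something like `saxpy.cu` (hypothetical file name), this should build with `nvcc saxpy.cu -o saxpy`. The point is the pattern: allocate on the device, copy in, launch the kernel, copy out. Everything else is plain host code.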
Intel is wasting time ray-tracing on a CPU and NVidia is wasting frames by folding proteins on their GPU.
"You're doing it wrong!"
It would be better for a neutral party composed of GPGPU experts from different IHVs to initiate something like what you propose, more like what the OpenGL ARB creates, a specification.
IHVs and other companies could then implement this standard on their own hardware, thus decentralizing development from the ISV. If you leave development of this type of technology up to Microsoft (or any other single developer) you'll end up with vendor lock-in, which is a Bad Thing, for all of us.
Anyway, CUDA is great but not cross-platform compatible (Intel, AMD/ATI, etc.) which makes it impossible to implement in commercial software, unless a CPU-bound alternative is provided, which would defeat the purpose of the architecture.
On a similar note: think of the choice between the PhysX SDK and Havok Physics. Do you want partial GPU accelerated physics supported by one brand (PhysX, NVIDIA G80+) or do you want to stay CPU-bound but have the same feature set regardless of the hardware (Havok)?
Tom's also forgot to point out that development is possible via emulation (emuDebug build setting, I think, with the .vcproj they give you), so anyone can get their hands dirty with the API. You don't get the satisfaction of seeing cool speedups, but it's just as educational, and easier to debug. No screen flickers.