The platform, which enables developers to exploit Nvidia GPUs (as well as x86 CPUs) for general-purpose GPU computing, was introduced on November 15, 2006 with the GeForce 8 series. Since then, Nvidia claims to have sold more than 350 million CUDA-enabled GPUs. The CUDA toolkit has been downloaded more than 1 million times, and more than 500 universities around the globe are teaching CUDA classes.
CUDA was, from the very beginning, designed to drive GPUs into high-performance computing applications in military, academic and industrial environments. Although adoption was somewhat slow to start, Nvidia has been successful: three of the five fastest supercomputers in the world now integrate Tesla acceleration cards, the primary delivery vehicle for CUDA-based acceleration. CUDA applications are written in C++-like code with specific extensions, and CUDA was the first generally available high-level language to offer easy access to the processing horsepower of widely available and relatively affordable GPUs.
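To give a sense of what "C++-like code with specific extensions" looks like in practice, here is a minimal sketch of a vector addition. The names `add_kernel` and `add_on_gpu` are hypothetical and chosen only for illustration; error checking is left out to keep the example short, and a CUDA-capable GPU plus the `nvcc` compiler are assumed.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// __global__ is a CUDA extension: it marks a function (a "kernel")
// that runs on the GPU and is launched from host code.
__global__ void add_kernel(const float* a, const float* b, float* out, int n) {
    // blockIdx, blockDim and threadIdx are built-in variables that
    // give each GPU thread a unique index into the data.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host-side helper: copies the inputs to the GPU, launches
// the kernel, and copies the result back. Error checks are omitted.
std::vector<float> add_on_gpu(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    std::vector<float> out(n);
    if (n == 0) {
        return out;
    }

    const size_t bytes = n * sizeof(float);
    float* d_a = nullptr;
    float* d_b = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);

    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    // The <<<blocks, threads>>> launch syntax is another CUDA extension.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up
    add_kernel<<<blocks, threads>>>(d_a, d_b, d_out, n);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return out;
}
```

Apart from the `__global__` qualifier, the built-in thread indices and the triple-angle-bracket launch, the code is ordinary C++, which is the basis of the claim that basic GPU access is relatively approachable.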
CUDA, which is still positioned against open high-level platforms, especially OpenCL, survived a looming battle with Intel's canceled Larrabee graphics card and floating-point accelerator, but it has frequently been criticized for not being as easy to deploy as Nvidia claims. For example, while basic access to the GPU via CUDA is considered relatively easy, the remaining 5 to 10 percent of performance hidden in a GPU can be unlocked only with detailed knowledge of the GPU's architecture, especially its memory architecture.
In June of this year, after a delay of more than two years, Nvidia rolled out multi-CPU x86 CUDA compilers that run CUDA code on Intel and AMD processors.