Ambition killed Caesar, and it almost killed Nvidia. Nvidia’s next-generation product, GF100, is late, and it remains unclear how much additional performance it will offer over AMD’s Radeon HD 5800-series. With its Fermi architecture, Nvidia set goals that were too ambitious. If it weren’t for a strong G92 core, the company might have followed in the footsteps of S3 or Matrox. The last time this happened was NV30, and the first time was NV1, when Nvidia nearly went out of business.
With that said, the aggressive pursuit of high-precision computation in its products, starting with NV30’s 32-bit shaders and now Fermi’s CUDA capabilities, is probably going to pay off. At the core is CUDA.
CUDA is a marketing term that encompasses all of Nvidia’s hardware and software technology allowing non-graphics computing to be performed on the GPU. There is the core CUDA hardware architecture and then an entire ecosystem of technologies enabling developers to work with GPUs using C, Fortran, OpenCL, and DirectCompute.
If you talk to the average tech enthusiast, he’ll see C for CUDA and Fortran for CUDA as an outdated business model. Why would software developers limit themselves to supporting a single manufacturer’s product line when they could be using something such as OpenCL or even DirectCompute? After all, we’re not seeing any proprietary 3D graphics APIs in practice anymore. However, if you talk to software developers, the answer is very different.
The majority of GPU-compute applications today are built around CUDA as a result of Nvidia’s multi-year lead in terms of the software tools available. Not only do those tools support GPU-computing across the widest range of programming languages, but Nvidia has also worked on the integrated development environments required for debugging CUDA applications. One of Nvidia’s strongest wins is the Mercury Playback Engine in Adobe CS5. This is particularly important, as Adobe’s next version of its Creative Suite is expected to be the best-selling version ever due to the first implementation of 64-bit native code.
Whereas previous versions of Adobe’s software utilized OpenGL for acceleration of certain elements, the Mercury Engine is technology built on top of Nvidia CUDA to enable real-time editing of multiple high-resolution clips, including five simultaneous RED 4K clips and support for complex, temporal-based codecs such as H.264 and AVCHD. Then, when it comes to encoding the final output, Nvidia has exclusive CUDA support in Elemental Accelerator. On a dual Quadro FX 3800 setup (equivalent to a first-generation GeForce GTX 260 with 192 cores, but a 256-bit memory interface), encoding an AVCHD 1080p source to H.264 720p can be done at 40 fps. Time is money for this industry. Think about a wedding cinematographer who wants to have a "same day edit" covering the ceremony ready for an evening reception. This isn’t something they can process overnight. The faster the encode, the more time available for editing.
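To put the 40 fps figure in perspective, here is a rough back-of-the-envelope sketch. The clip length and the 30 fps source frame rate are assumptions chosen for illustration, not numbers from Adobe, Elemental, or Nvidia, and the function name is hypothetical.

```python
def encode_seconds(clip_seconds, source_fps, encode_fps=40.0):
    """Estimate wall-clock encode time for a clip.

    clip_seconds: running time of the source footage (assumed value)
    source_fps:   frame rate of the source footage (assumed value)
    encode_fps:   encoder throughput; 40 fps is the figure quoted above
    """
    total_frames = clip_seconds * source_fps
    return total_frames / encode_fps


# Hypothetical same-day edit: a 10-minute highlight reel shot at 30 fps.
seconds = encode_seconds(clip_seconds=600, source_fps=30)
print(f"Estimated encode time: {seconds / 60:.1f} minutes")
```

Under those assumptions the encode finishes faster than the clip plays back, which is the kind of margin that makes a same-day turnaround realistic.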
At the time of CS5’s development, AMD’s Stream SDK was not up to the level needed by Adobe. Though Adobe would like to support OpenCL from the philosophical standpoint of being vendor neutral, the development environment is not robust enough for its work. The jury is still out on when OpenCL will reach Adobe’s Creative Suite, and whether that will happen before Nvidia can capture iPod-like market share. Additionally, though AMD offers a beta plug-in for Stream-accelerated H.264 encoding, the software requires an AMD CPU and will not work with Intel processors, which represent a majority of the market.
On the scientific computing side, Nvidia enjoys considerable dominance due to its C for CUDA and Fortran for CUDA. Fortran is a dominant programming language in scientific computing applications. Importantly, while Nvidia inked a deal with PGI to develop a GPU-accelerated Fortran compiler about seven months after AMD did (June 2009 versus November 2008), PGI’s support for CUDA is actually shipping, while AMD’s has yet to appear. Nvidia’s customers have another option in F2C-ACC, a Fortran-to-CUDA compiler developed by the National Oceanic and Atmospheric Administration (NOAA). AMD users have HMPP for Fortran, which incidentally also supports CUDA.
In addition to wider compiler support, Nvidia GPUs have the benefit of optimized math libraries that are in development or already available. These include CULAtools, a GPU implementation of LAPACK developed by John Humphrey and his team at EM Photonics in partnership with NASA Ames Research Center. The team offers a single-precision "CULA Basic" package for free to anyone who is interested, and sells "Premium" and "Commercial" versions with more functions, double-precision support, and the option to redistribute it. In addition, Jack Dongarra, who is Distinguished Professor of Computer Science at the University of Tennessee, Distinguished Research Staff at Oak Ridge National Laboratory, Adjunct Professor at Rice University, and a Turing Fellow at the University of Manchester, is working on a mixed-precision GPU/CPU implementation of these math libraries to extract even more performance. Commercially available software, such as Jacket for MATLAB, leverages Nvidia’s scientific libraries to enable high-performance computation.
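For readers wondering what "mixed precision" buys you, the usual idea is iterative refinement: do the expensive solve in fast single precision, then correct the answer using residuals computed in double precision. The sketch below is a minimal, CPU-only illustration of that idea in plain Python. It is not code from Dongarra's group or from CULAtools; the function names are hypothetical, single precision is emulated by rounding through a 32-bit float, and a real implementation would factor the matrix once and reuse the factors instead of re-solving from scratch.

```python
import struct


def to_single(x):
    """Round a Python float (double precision) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_single(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting,
    rounding every intermediate result to single precision."""
    n = len(b)
    # Augmented matrix [a | b], stored in emulated single precision.
    m = [[to_single(v) for v in row] + [to_single(rhs)]
         for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = to_single(m[r][col] / m[col][col])
            for c in range(col, n + 1):
                m[r][c] = to_single(m[r][c] - to_single(factor * m[col][c]))
    # Back substitution, still in single precision.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n]
        for c in range(r + 1, n):
            s = to_single(s - to_single(m[r][c] * x[c]))
        x[r] = to_single(s / m[r][r])
    return x


def residual(a, b, x):
    """Compute r = b - a*x in full double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(a, b)]


def refine(a, b, iterations=5):
    """Mixed-precision iterative refinement: single-precision solves,
    double-precision residuals and updates."""
    x = solve_single(a, b)
    for _ in range(iterations):
        r = residual(a, b, x)
        d = solve_single(a, r)  # correction computed in single precision
        x = [xi + di for xi, di in zip(x, d)]
    return x


# Small, well-conditioned example system (values chosen for illustration).
A = [[4.1, 1.3, 2.7],
     [1.3, 3.9, 0.6],
     [2.7, 0.6, 5.2]]
B = [1.0, 2.0, 3.0]

x_single = solve_single(A, B)
x_refined = refine(A, B)
print("single-only residual:", max(abs(v) for v in residual(A, B, x_single)))
print("refined residual:    ", max(abs(v) for v in residual(A, B, x_refined)))
```

The appeal on a GPU is that the heavy lifting happens in single precision, where the hardware is fastest, while the cheap residual step recovers accuracy close to double precision for well-conditioned problems.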
Remember when ATI rendered a scaled-down version of Lord of the Rings in real-time at SIGGRAPH 2002? That was an awesome tech demo. What Nvidia has done with CUDA goes to another level. Nvidia and WETA worked together to develop custom software for the movie Avatar, dubbed PantaRay. This pre-computational tool ran 25 times faster than its CPU server and was itself four times more effective than traditional renderers. This allowed the company to work with billions of polygons per frame. Not a tech demo, but a bona fide contribution to visual effects work. We’ll be seeing PantaRay in use in the upcoming Steven Spielberg/Peter Jackson film, Tintin.
The bottom line is that Nvidia has a considerable lead over both Intel and AMD when it comes to high-performance parallel computing. The investments it has made in creating viable commercial tools for GPGPU are already paying off with exclusive Adobe Creative Suite 5 support and broader adoption of CUDA among scientific professionals. If the company continues its momentum and aggressively grows the GF100-based product line, it has a chance to obtain iPod-like dominance in the market. At the very least, I think Nvidia has established itself firmly in the GPGPU world. Third place will have to go to either AMD or Intel.