
The Evolution Of GPGPU Computing

Exclusive Interview: Nvidia's Ian Buck Talks GPGPU

Tell me about CUDA, the "architecture," versus the "C for CUDA" compiler.

CUDA is the name of Nvidia’s parallel computing hardware architecture. Nvidia provides a complete toolkit for programming the CUDA architecture that includes the compiler, debugger, profiler, libraries, and other information developers need to deliver production-quality products that use the CUDA architecture. The CUDA architecture also supports standard languages (C with CUDA extensions and Fortran with CUDA extensions) and APIs for GPU computing, such as OpenCL and DirectX Compute.


With OpenCL, you gain the advantage of cross-platform support, but lose automated tools, such as memory management, that are found with CUDA. It seems that as a scientist, you'd want to decrease your startup development costs, but at the same time, you'd want support for multiple platforms. What's the best way to reconcile these competing goals?

There are certainly compromises that have to be made to provide a cross-vendor, cross-platform solution. Nvidia has worked from the beginning with Apple and the OpenCL working group to make sure OpenCL provides a great driver-level API layer for GPU computing, especially for Nvidia hardware. Furthermore, we will certainly provide extensions to further enable Nvidia GPUs with OpenCL.

Nvidia is also constantly improving our C compiler and development environment for Nvidia GPUs. We have a few simple extensions to C in order to enable our GPUs. If Fortran is more your preference, there is also a Fortran compiler available. With the introduction of Microsoft's Windows 7 this fall, users and developers will have access to the DirectCompute API, which shares many concepts with our C extensions. Nvidia seeded a DirectCompute driver to key developers last December. These are the added advantages of choosing Nvidia hardware; we support all major languages and APIs.
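
For readers who haven't seen those extensions, here is a minimal, illustrative sketch (not code from the interview or from Nvidia's materials) of a CUDA C program that adds two vectors. The departures from plain C are the __global__ qualifier, the built-in block and thread indices, and the <<<blocks, threads>>> launch syntax; cudaMalloc, cudaFree, and cudaDeviceSynchronize come from the CUDA runtime API.

    // Illustrative CUDA C example: element-wise vector addition on the GPU.
    #include <cuda_runtime.h>

    __global__ void add(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // one element per thread
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main()
    {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMalloc((void **)&a, bytes);  // allocate in GPU memory
        cudaMalloc((void **)&b, bytes);
        cudaMalloc((void **)&c, bytes);
        // ... copy input data into a and b with cudaMemcpy (omitted) ...
        add<<<(n + 255) / 256, 256>>>(a, b, c, n);  // launch a grid of 256-thread blocks
        cudaDeviceSynchronize();                    // wait for the kernel to finish
        cudaFree(a);
        cudaFree(b);
        cudaFree(c);
        return 0;
    }

Host and device code live in the same source file and are compiled together by nvcc, which is much of what "a few simple extensions to C" buys you.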

Your work with GPUs started in the GeForce 5 era, and we're now several generations later. Obviously, the newer stuff is faster, but what new capabilities have been introduced over this time period (e.g., IEEE-754 compliance)? What can we do now that we couldn't do before?

Early programmable GPUs were basic, floating-point-only programmable processors: no integer or bit operations, no general access to GPU memory, no communication between neighboring processors. The first main innovation was to provide the hardware needed to support C, which includes full pointer support and native data types.

Another key innovation was the addition of dedicated, on-chip shared memory, which allows processors to intercommunicate and share results, greatly improving the efficiency of the algorithms. In addition, it offers programmers a place to temporarily store and process data close to the processor, rather than going all the way out to off-chip DRAM. Shared memory improved our signal processing library by 20x over a similar OpenGL implementation.
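
As a rough illustration of what that buys (a sketch, not Nvidia's library code, and it assumes the kernel is launched with 256 threads per block), the kernel below stages data in __shared__ memory so the threads of a block can cooperatively reduce their inputs to one partial sum, reading each input element from off-chip DRAM once and writing a single result back per block.

    // Sketch: per-block sum reduction staged in on-chip shared memory.
    __global__ void block_sum(const float *in, float *block_results, int n)
    {
        __shared__ float cache[256];            // on-chip, visible to the whole block

        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + tid;

        cache[tid] = (i < n) ? in[i] : 0.0f;    // one read from off-chip DRAM per thread
        __syncthreads();                        // wait until every slot is written

        // Tree reduction carried out entirely in shared memory.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                cache[tid] += cache[tid + stride];
            __syncthreads();
        }

        if (tid == 0)
            block_results[blockIdx.x] = cache[0];  // one write back to DRAM per block
    }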

Finally, the addition of double-precision floating-point hardware signified a key step toward GPUs as a true high-performance computing product, enabling applications that require extended-precision numerics. It should also be noted that improvements in memory speed and on-board memory size (up to 4 GB per processor and 16 GB for our Tesla 1U server) have increased the scale of problems an Nvidia GPU can tackle.
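
As a small, hypothetical example of what that enables, the kernel below performs the classic DAXPY update y = alpha * x + y entirely in double precision. On the hardware of this era it requires a GPU with compute capability 1.3 or higher (such as the Tesla hardware mentioned above), selected at build time with something like nvcc -arch=sm_13; older toolchains targeting earlier GPUs demote doubles to single precision with a warning.

    // Hypothetical double-precision kernel: y = alpha * x + y.
    __global__ void daxpy(double alpha, const double *x, double *y, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = alpha * x[i] + y[i];  // all arithmetic here stays in double precision
    }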

What about the compiler? What kind of optimizations and innovations have been added over time?

Very early on, we recognized that we needed to build a world-class compiler solution. GPU computing programs tend to be much larger, more complex, and benefit from more complex optimization. Our competition (the CPU) had almost 40 years to get it right. Our C compiler is based on technologies from the Edison Design Group, which has been making C compilers for 20 years, and the Open64 compiler core, which was originally designed for the Itanium processor. Our compiler technology, combined with the world-class compiler team we've assembled, is a key part of Nvidia's success.

Currently, most GPUs are very fast with single-precision math, but less quick with double-precision math. Would GPUs still provide "better than CPU" cost/performance if it weren't for economies of scale? That is, could you make a special double-precision-optimized GPU while still keeping costs low?

As the market for GPU computing clearly continues to grow, I think you will see more investment in double-precision arithmetic. Our double-precision hardware, released last year, was only the starting point for what I imagine will be a growing investment in GPU computing from both the industry and Nvidia.

What is your impression of Intel's Larrabee? AMD's Stream architecture? Cell? Zii?

My view of Larrabee is that it is a great validation of what the GPU has achieved and an acceptance of the limitations of the CPU. Where CPUs have tried to take a legacy sequential programming model and squeeze out every last bit of parallelism, GPUs were created for 3D rendering, an embarrassingly parallel application. Massive parallelism is part of the GPU's core programming model.

In the end, it is the accessibility and productivity of a programming model that will take an architecture from a novelty to a success. We are all competing against a mountain of legacy code. We've focused on making it extremely easy to obtain orders-of-magnitude speedups on Nvidia GPUs with a familiar and simple programming model.

Comments
  • Anonymous, September 3, 2009 7:00 AM (+8)
    How about some GPU acceleration for Linux! I'd love Blu-ray and HD content to be GPU accelerated by VLC or Totem. Nvidia?
  • matt87_50, September 3, 2009 7:21 AM (+10)
    I ported my simple sphere and plane raytracer to the GPU (DX9) using RenderMonkey. It was so simple and easy, only took a few hours, nearly EXACTLY the same code (using HLSL, which is basically C), and it was so beautiful: what took seconds on the CPU was now running at over 30fps at 1680x1050.

    A monumental speed increase with hardly any effort (even without a GPGPU language).

    It's going to be nothing short of a revolution when GPGPU goes mainstream (hopefully soon, with DX11).
    Computer down on power? Don't replace the CPU, just drop another GPU into one of the many spare PCIe x16 slots and away you go, no fussing around with SLI or CrossFire and the compatibility issues they bring. It will just be seen as another pile of cores that can be used!

    Even for tasks that can't be done easily on the GPU architecture, most will still probably run faster than they would on the CPU, because the brute power the GPU has over the CPU is so immense, and as he kinda said, most of the tasks that aren't suited to GPGPU don't need speeding up anyway.
  • Anonymous, September 3, 2009 9:49 AM (+3)
    shuffman37: Nvidia does have GPU-accelerated video on Linux. Link to Wikipedia http://en.wikipedia.org/wiki/VDPAU about VDPAU. It's gaining support from a lot of different media players.
  • Anonymous, September 3, 2009 3:15 PM (-1)
    NVIDIA, saying that "spreadsheet is already fast enough" may be misleading. Business users have the money. Spreadsheets are already installed (a huge existing user base). Many financial spreadsheets are very complicated: 24 layers, 4,000 lines, with built-in Monte Carlo simulations.

    Making all these users instantly benefit from faster computing may be the road to success for NVIDIA.

    Dr. Drey
    Bloomington, IN
  • raptor550, September 3, 2009 8:44 PM (+3)
    Although I appreciate his work... I had to AdBlock his face. Sorry, it's just creepy.
  • techpops, September 3, 2009 8:55 PM (-1)
    While I can't get enough of GPGPU articles, it really saddens me that Nvidia is completely ignoring Linux, and not because I'm a Linux user. Ignoring Linux stops the GPU from becoming the main renderer in 3D software that is also available under Linux. So in my case, where I use Cinema 4D under Windows, I'll never see the massive speedups possible, because Maxon would never develop for a Windows- and Mac-only platform.

    It's worth pointing out here that I saw video of CUDA-accelerated global illumination from a single Nvidia graphics card, going up against an 8-core CPU beast. Beautiful globally illuminated images were taking 2-3 minutes to render, just for a single image on the 8-core PC. The CUDA one, rendering to the same quality, was running at up to 10 frames per second! That speed-up is astonishing and really makes an upgrade to a massive 8-core PC system seem pathetic in the face of that kind of performance.

    One can only imagine what would be possible with multiple graphics cards.

    I also think the killer app for the GPU is not ultimately going to be graphics at all. While in the early days it will be, further down the line I think it will be augmented reality that takes over as the main GPU use. Right now it's pretty shoddy using a smartphone for augmented reality applications; everything is dependent on GPS, and that's totally unreliable and will remain so. What's needed for silky smooth AR apps is a lot of processing power to recognize shapes and interpret all that visual data you get through a camera to work with the GPS. So if you're standing in front of a building, an arrow can point on the floor leading to the building's entrance, because the GPS has located the building and the GPU has worked out where the windows and doors are and made overlaid graphics that are motion-locked to the video.

    I think AR is going to change everything with portable computers, but only when enough compute power is in a device to make it a smooth experience, rather than the jerky, unreliable, experimental toy it is on today's smartphones.
  • pinkzeppelin97, September 4, 2009 12:09 AM (0)
    zipzoomflyhigh: If my forehead was that big due to a retreating hairline, I would shave my head.

    Amen to that.
  • tubers, September 4, 2009 12:25 AM (-2)
    CPU and GPU combined? Will that bring more profit to each of their respective companies? Hmm.
  • jibbo, September 4, 2009 1:24 AM (0)
    shuffman37: How about some GPU acceleration for Linux! I'd love Blu-ray and HD content to be GPU accelerated by VLC or Totem. Nvidia?

    There is GPU acceleration for Linux. I believe NVIDIA has provided a CUDA driver, compiler, toolkit, etc. for Linux since day one.

  • Anonymous, September 4, 2009 5:06 AM (-6)
    linux is almost as gay as its users, stfu noobs
  • matt87_50, September 4, 2009 6:13 AM (0)
    Surely there is nothing stopping OpenCL going to Linux?
  • jibbo, September 4, 2009 7:01 AM (+2)
    Matt87_50: Surely there is nothing stopping OpenCL going to Linux?

    NVIDIA released Windows and Linux OpenCL drivers in June.
  • Soul_keeper, September 4, 2009 8:22 AM (0)
    Excellent article, thank you.
  • techpops, September 4, 2009 8:40 AM (0)
    @jibbo It's my understanding that you need DirectX 11 to really make use of CUDA, and if you wanted a cross-platform app to work on Windows and Linux you'd have major hurdles adapting CUDA to work with both. Enough of a hassle that at least one huge 3D company has turned its nose up at CUDA and is just waiting however many years until OpenCL is something worth looking at.
  • void_pointer, September 4, 2009 5:30 PM (-2)
    Quote:
    GPU computing programs tend to be much larger, more complex, and benefit from more complex optimization


    Larger and more complex than what, exactly? Please. What a load of egotistical bollocks.
  • jasperjones, September 4, 2009 6:37 PM (-2)
    Thanks THW for this one. Really interesting stuff!
  • Anonymous, September 4, 2009 8:03 PM (+2)
    Quote:
    linux is almost as gay as its users, stfu noobs

    How does using something besides Windows dictate my sexual preference? If you wouldn't mind, I'd like to know your logic behind that statement.
  • sumitg, September 4, 2009 11:24 PM (+3)
    There is a GPU (CUDA) accelerator plug-in for Excel from SciComp:
    http://www.scicomp.com/parallel_computing/GPU_OpenMP/
  • Anonymous, September 7, 2009 7:30 AM (0)
    I wish he would have confirmed that 570-times-faster comment from the big honcho. 570x faster in 6 years sounds like a load of marketing garbage to me. Then he said CPUs will only be 3 times faster in 6 years.

    I agree parallel computing is looking like the wave of the future. But claiming that in 6 years you will be producing a processor that is nearly 600 times faster than the competition is ridiculous. The fact that the head of this company said that makes anything coming out of these guys' mouths sound like marketing instead of solid science.

    The fact that you had this interview and didn't feel the need to put that question to him suggests this interview is old, or you just didn't feel like asking a very pertinent question.

    Guess that's why I don't read this site much anymore. Amateur hour.
  • JohnnyLucky, September 7, 2009 5:30 PM (-1)
    I read the article but I didn't understand the technobabble.