Thanks for taking the time to chat today. Let's start with some basics. Why don't you tell our readers a little bit about yourself and what you currently do at Nvidia?
I’m the Software Director for GPU Computing here at Nvidia. My main focus is to build and evolve a complete GPU computing platform, which includes system software, developer tools, language and compiler direction, libraries, and targeted applications and algorithms. With the help of a great team, we both develop the end-user software and set the direction for GPU computing within Nvidia.
Why don't we start from the beginning? I imagine that your interest in GPGPU didn't start when you were 5 years old--what were the events going on at Princeton or Stanford that really led you to discover an interest in GPGPU?
I did dabble in GPU computing in my Princeton days, experimenting with thermal convection and fluid simulation on graphics hardware, which at the time was the SGI O2. Things were so constrained, though, that it was hard to make a case for it.
I seriously started looking at GPU computing during my PhD research at Stanford. There, I, along with others in the research community, realized that the natural progression of programmable graphics was the evolution of the GPU into a more general-purpose processor. We wrote one of the first SIGGRAPH papers on ray tracing with DX9-class GPU hardware to help prove the point. What was so motivating about the work was that this commodity processor, which was available in everyone’s PC, was following a Moore’s law cubed performance growth rate, way faster than the CPU. This raised the question: what could a PC do if it had multiple orders of magnitude more computing horsepower than today? It would be a total game-changer for the computational sciences as well as computer vision, AI, data mining, and graphics.
What was your role with Brook?
After working on the ability to ray trace on the GPU, my research focus at Stanford switched to understanding the right programming model for GPU computing. At the time, many others had shown that the GPU was good at a variety of different applications, but there wasn’t a good framework or programming model for how one should think about the GPU as a compute device. Back then, it required a PhD in computer graphics to port an application to the GPU. So I started the Brook project with the goal of defining a programming language for GPU computing, which abstracted the graphics-isms of the GPU into more general programming concepts. Brook’s fundamental programming concept was the “stream,” which was a collection of data elements requiring similar work. Brook eventually became my PhD thesis at Stanford.
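Editor's note: to make the "stream" idea concrete, here is a minimal Python sketch that runs on the CPU. It is not Brook syntax and not how Brook was implemented. The names `run_kernel` and `make_saxpy` are hypothetical, chosen only for illustration. The assumption is that a stream behaves like a sequence of elements and a kernel is a function applied independently to each set of corresponding elements.

```python
from typing import Callable, List, Sequence


def run_kernel(kernel: Callable[..., float], *streams: Sequence[float]) -> List[float]:
    """Apply `kernel` to corresponding elements of the input streams.

    Each output element depends only on the matching input elements,
    which is why a GPU could process every element in parallel.
    (Hypothetical helper; this sequential loop only emulates that.)
    """
    if not streams:
        raise ValueError("at least one input stream is required")
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all input streams must have the same length")
    return [kernel(*elements) for elements in zip(*streams)]


def make_saxpy(a: float) -> Callable[[float, float], float]:
    """Build a kernel computing a * x + y for one pair of elements."""
    def kernel(x: float, y: float) -> float:
        return a * x + y
    return kernel


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]
    ys = [10.0, 20.0, 30.0]
    print(run_kernel(make_saxpy(2.0), xs, ys))
```

Note that the kernel never chooses which element it reads or writes; the runtime decides that. The sketch shows only this element-wise aspect of the model.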
Your work started with Merrimac, the Stanford Streaming Super Computer. How is this different from something like a Tesla?
Brook’s programming model concepts were applicable to more than just GPUs. At Stanford we worked on two different implementations of the Brook programming model: one for GPUs, the other for Merrimac, which was a research architecture developed at Stanford. Many of the ideas pioneered as part of Merrimac did influence how GPUs could be improved for general-purpose computing. It should also be noted that Bill Dally, who was the principal investigator of Merrimac at Stanford, is now the Chief Scientist at Nvidia.
Did CUDA have any roots in Gelato? What was the first academic exploration of GPGPU? What about the first commercialized use?
I started CUDA while completing my research at Stanford. Nvidia was already very supportive of my research and clearly saw the potential to better enable GPU computing on the hardware side of things. I joined Nvidia to start the CUDA project in 2005. At the time, it was just me and one other engineer. We’ve since grown the project into the organization it is today, and it is now a central component of Nvidia’s GPUs.
www.gpgpu.org provides a nice history of GPU computing, dating back to 2002.
Currently, AMD pushes Brook as the programming language of choice for GPGPU, whereas Nvidia has C with CUDA extensions. How would you compare the strengths/weaknesses of both?
Starting at Nvidia, we had an opportunity to revisit some of the fundamental design decisions of Brook, which were largely based on what DX9-class hardware could achieve. One of the key limitations was the memory model, which required programmers to map their algorithms onto a fairly limited set of memory access patterns. With our C with CUDA extensions, we relaxed those constraints. Fundamentally, the programmer was simply given a massive pool of threads and could access memory any way he or she wished. This improvement, as well as a few others, allowed us to implement full C language semantics on the GPU.
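Editor's note: the "pool of threads with unrestricted memory access" idea can be illustrated with a small Python sketch that emulates threads on the CPU. It is not CUDA code. The names `launch`, `reverse_thread`, and `histogram_thread` are hypothetical. The assumption is that each thread receives only its index and decides for itself which memory locations to read and write.

```python
from typing import Callable, List


def launch(num_threads: int, thread_body: Callable[..., None], *args) -> None:
    """Run `thread_body` once per thread index.

    Hypothetical emulation: threads run one after another here,
    whereas a GPU would run many of them at the same time.
    """
    for tid in range(num_threads):
        thread_body(tid, *args)


def reverse_thread(tid: int, src: List[int], dst: List[int]) -> None:
    """Each thread writes its element to a location it computes itself."""
    dst[len(src) - 1 - tid] = src[tid]


def histogram_thread(tid: int, data: List[int], bins: List[int]) -> None:
    """Each thread updates a bin chosen by the data it reads.

    Several threads may hit the same bin. That is safe in this
    sequential emulation, but real parallel hardware would need an
    atomic update (CUDA provides atomicAdd for this purpose).
    """
    bins[data[tid]] += 1


if __name__ == "__main__":
    src = [1, 2, 3, 4]
    dst = [0] * len(src)
    launch(len(src), reverse_thread, src, dst)
    print(dst)

    data = [0, 2, 2, 1, 2]
    bins = [0, 0, 0]
    launch(len(data), histogram_thread, data, bins)
    print(bins)
```

The point of the contrast is that the write location is computed inside the thread, in one case from the index and in the other from the data, rather than being fixed by the runtime.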