GPGPU Programming, Where Is It?
Question: The success of hybrid CPU/GPU designs like Sandy Bridge and Llano is closely tied to GPGPU programming. In the last major tech cycle, system integrators and consumers successfully adopted x86-64 processors and operating systems. Yet, potential benefits have been delayed because programmers, even today, are slow to adopt 64-bit programming. Do you think Intel and AMD can cause a major shift towards general purpose GPU programming within a year of their product launches?
- AMD will introduce its integrated graphics-equipped CPU in 2011. Intel will do so even earlier than AMD. But users still need more time to be educated about GPUs; only then can they really demand them. Consumers still think discrete graphics provide more performance and functionality.
- The 64-bit transition has been very slow and gradual. Software is always behind hardware, so we don't believe GPGPU will see any quantum leaps in the next year.
- Honestly, we have no clue. We are at the mercy of the big three: Intel, AMD, and Nvidia.
- Typically, our collaboration focuses mainly on implementing compatible hardware designs. We help drive demand through marketing, but driving the direction of demand is not within our scope.
- Hardware is always faster than software. I think that is what we are seeing with GPGPU [programming].
- I'm not really sure that it is necessarily slow. We are seeing more 64-bit programming about two years after full x86-64 adoption. If GPGPU [programming] follows suit, we should see more in 2012 or perhaps 2013.
The rise of hybrid processors brings new possibilities. Even on a system armed with integrated graphics, it is possible to see enhanced performance through the addition of some GPGPU programming. Specific tasks can be offloaded to the graphics core. Systems with powerful discrete graphics solutions still have the most to gain, but the additional processing power can be a boon for any workload that parallelizes well.
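To make "parallelizes well" concrete, here is a minimal sketch of the kind of per-element work that maps naturally onto a graphics core. It is plain Python running on the CPU, not CUDA, OpenCL, or DirectCompute code; the names `saxpy_kernel`, `launch`, and `saxpy` are hypothetical and chosen only for illustration. The point is that each call touches a single index, so the calls could in principle run in any order or all at once.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work for one element: reads and writes only index i."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Hypothetical stand-in for a GPU launch.

    A GPU would spread these n independent calls across many threads;
    here they simply run one after another on the CPU.
    """
    for i in range(n):
        kernel(i, *args)


def saxpy(a, x, y):
    """Compute a * x + y element by element using the kernel above."""
    out = [0.0] * len(x)
    launch(saxpy_kernel, len(x), a, x, y, out)
    return out


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Because no element depends on another, the result is the same regardless of the order in which indices are processed. Workloads with that property are the ones most likely to gain from a graphics core, integrated or discrete.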
By design, our question was meant to solicit opinions on the speed of GPGPU programming adoption. Lately, progress seems to have ground to a halt (or at least, we're not seeing as much momentum behind apps optimized for CUDA and DirectCompute). Frankly, it is frustrating to see this occur. Reading through the comments from our last survey, readers seem to be in agreement. We are at a point where we have a lot of compute power, but much of the time, we aren't using it.
We also mentioned in the last survey how frustrating it was to see the slow pick-up of 64-bit programming. If you recall the emergence of 64-bit as a feature, both Intel and AMD were actively leveraging that capability as a differentiating feature. Fast forward to today. We are still lacking a concerted effort by the software development community to adopt 64-bit programming--perhaps due to a perceived lack of benefit. We still don't have a 64-bit version of Firefox, and there is no ETA on a 64-bit Flash plug-in. While the benefits of 64-bit in these two scenarios may in fact be negligible, it shows how slow the software community has been in contrast to what today’s hardware provides. Only recently did Adobe update its suite of apps to support a 64-bit architecture, and we’ve already shown the effect of that decision to be massive.
One of the key problems has been the lack of a standardized programming layer. Nvidia went with Compute Unified Device Architecture (CUDA). AMD went with Stream. And Microsoft is in the middle with DirectCompute--an attempt to standardize general purpose GPU computing across dissimilar architectures. Similar to the 64-bit extension war, this fragmentation has delayed GPGPU programming adoption. CUDA was a fairly robust interface from the get-go. If you wanted to do any sort of scientific computational work, Nvidia's CUDA was the library to use. It set the standard. Unfortunately, as with many technologies in the PC industry that are kept proprietary, this has also limited CUDA's appeal beyond specialized scientific applications, where the software is so niche that it can demand a certain piece of hardware. That's not the case with a transcoding app or a playback utility. Even Adobe seems to have made a brave move by limiting its Mercury Playback Engine to a handful of CUDA-based GeForce and Quadro cards.
Nvidia no doubt wants to keep stressing the GPGPU capabilities rolled up into its Fermi architecture. It even hired the guy (Dr. Mark Harris) who coined the term GPGPU, which stands for "General Purpose computation on Graphics Processing Units." Unfortunately, mainstream adoption isn't going to happen without support from Intel and AMD, whose large development budgets probably put them in the best position to bolster support for DirectCompute and OpenCL.
We have been playing with parts of the CUDA framework and would love to see more mainstream adoption, but we understand the lack of progress. Looking at the big picture, a software developer would have to justify months (maybe even years) of extra programming in CUDA to get some of the GPGPU enhancements. And even then, gains are going to depend on the application.
A single GPGPU coding framework does a lot for adoption, since it allows developers to target any properly-enabled graphics card, and not just one from Nvidia. Again, this makes much more sense in the context of broad adoption. For the moment, CUDA remains the best solution if you have a lot of money, a very specific task able to benefit from parallelism, and the resources to develop with GPGPU in mind. Personally, we are enjoying Jacket for MATLAB. OpenCL and DirectCompute come close, but both give up lower-level hooks into the architecture in favor of compatibility.
Intel and AMD both need to get with the program--particularly AMD. Its much-hyped APUs are right around the corner, and it unquestionably has the advantage with regard to graphics. Intel's solution, at first blush, looks more like an evolutionary afterthought than anything that'll be capable of augmenting its processors. And to be frank, Intel's CPUs are its first priority.