Benchmark Results: GPU Compute
Last week, I got my first look at AMD’s Trinity-based APU. The story kicked off with x86-based benchmarks. The architecture’s Piledriver-based cores didn’t disappoint, but they won’t be Trinity’s biggest strength, either. Then we hit graphics—an AMD forte, no question. Finally, we dipped our toes into OpenCL with WinZip 16.5 and witnessed the potential of GPU acceleration in a productivity-oriented title.
Or so it seemed.
Reader mayankleoboy1 asked me to look into the resource utilization of WinZip 16.5, and it turns out that enabling OpenCL puts minimal load on a Radeon HD 7970. In the screenshots below, our Core i7-3960X is only at a 14% duty cycle with OpenCL disabled, showing activity on six of its available threads. Turn the feature on, though, and much more of our host processor is utilized. While I managed to catch GPU-Z showing 5% utilization on the Radeon HD 7970, it spent far more time at 0% and 1%.
With that said, you’re only able to turn OpenCL support on with an AMD graphics card, so it remains an exclusive feature for now. And it is effective, cutting the time our workload takes by more than half. It just doesn't seem like a very "GPU-accelerated" capability.
In LuxMark 2.0, which centers on the LuxRender ray tracing engine, the GHz Edition card improves on the Radeon HD 7970’s already-good result. The new board nearly doubles the performance of Nvidia’s dual-GPU GeForce GTX 690. Although WinZip doesn’t seem to owe its speed-up to AMD graphics, LuxMark certainly does.
So does MuseMage, a photo editing application with a number of OpenCL-accelerated filters. With scaling that corresponds to shader processors, clock rate, and, clearly, architecture, the new GHz Edition board finishes in first place.
There were a number of additional tests we wanted to run, including Photoshop CS6, a beta build of HandBrake, and a beta build of GIMP, but retesting all of our cards in games using the latest drivers took precedence. Still, the expanding list of OpenCL-optimized titles makes it clear that, finally, all of the talk about heterogeneous computing is giving way to real software we can test. More important than that self-serving state of affairs, apps that people actually use on a daily basis are gaining OpenCL support.
Consequently, we’ll be including a lot more compute-oriented testing in our graphics card reviews moving forward. Depending on how Nvidia responds, this could remain a real bastion for AMD. The company is already betting much of its success on the Heterogeneous System Architecture, while Nvidia largely ignores client workloads in favor of its Quadro/Tesla business.
Sandra helps demonstrate that the potential of AMD’s GCN far outstrips Kepler as it is implemented in GK104. That gap should matter more as developers get better at identifying the parts of their workloads able to benefit from a GPU’s parallelized architecture, and at maximizing the performance they extract from the hardware. And with GK110 not expected on the desktop until 2013, we don’t expect that placing to change this year.
Also interesting is that Nvidia's latest drivers still don't enable PCI Express 3.0 on Sandy Bridge-E platforms by default, resulting in less interface bandwidth than AMD's cards achieve. However, you can now manually force it on through a patch from the company.
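For a rough sense of what that interface gap means, here is a back-of-the-envelope sketch of theoretical x16 link bandwidth in one direction. It assumes the published PCI Express signaling rates and line encodings (5 GT/s with 8b/10b for second-gen, 8 GT/s with 128b/130b for third-gen). The function name and parameters are ours, purely for illustration, and measured throughput always lands below these ceilings.

```python
def pcie_bandwidth_gb_s(transfer_rate_gt_s, payload_bits, total_bits, lanes=16):
    """Theoretical one-direction link bandwidth in GB/s (hypothetical helper).

    transfer_rate_gt_s: raw signaling rate per lane, in gigatransfers per second
    payload_bits / total_bits: line-encoding efficiency (e.g. 8/10 for 8b/10b)
    lanes: link width; graphics cards typically use x16
    """
    usable_bits_per_second = transfer_rate_gt_s * 1e9 * payload_bits / total_bits * lanes
    return usable_bits_per_second / 8 / 1e9  # bits -> bytes, then bytes -> GB


# PCI Express 2.0: 5 GT/s per lane, 8b/10b encoding
gen2 = pcie_bandwidth_gb_s(5.0, 8, 10)

# PCI Express 3.0: 8 GT/s per lane, 128b/130b encoding
gen3 = pcie_bandwidth_gb_s(8.0, 128, 130)

print(f"PCIe 2.0 x16: {gen2:.2f} GB/s")
print(f"PCIe 3.0 x16: {gen3:.2f} GB/s")
print(f"Ratio: {gen3 / gen2:.2f}x")
```

On paper, then, a card held to second-gen signaling gives up close to half of the third-gen link's theoretical bandwidth, though how much that matters depends on how often a given workload actually saturates the bus.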