GPGPU Benchmarks: This Time, With A Preface
It is our goal to be as thorough as possible and to include as many real-world applications as we can, instead of simply relying on synthetic performance metrics. Unfortunately, by pulling its launch date up before Christmas, AMD ran out of time to square away the details. So, the company had to focus its development efforts on gaming (admittedly, the most important subject for an introduction like this one).
As a consequence, other areas didn’t get any development love, and we're missing the ability to test a lot of what AMD is claiming as features. The blame certainly doesn’t fall on the company's software partners, since they were working on a different timeline than what actually ended up transpiring. Regardless, while some of the general-purpose compute applications work to some degree, most don’t. The ones that do don’t show much of a benefit over their predecessors, or even the unaccelerated code path. Ironically, video acceleration is one of the casualties, so we can’t even highlight one of Tahiti’s marquee features: VCE. This is meant to be a fixed-function block of logic not unlike Intel’s Quick Sync.
So, this time around, we are forced to rely more heavily on synthetics. We will follow up with more real-world applications as soon as compatible versions of the software and supporting drivers are available.
Bitmining is one of the few real-world applications we're able to run, although it's a bit one-sided. Since the server would not let us verify, we had to use solo mode.
Now, Radeons have traditionally been very strong in Bitmining. However, efficiency is arguably more important than sheer performance, and that’s where things are less clear. Sure, the Radeon HD 7970 is the fastest single-GPU card in this group, but its comparatively small performance improvement comes with a steep increase in power consumption. For reference, while the aging Radeon HD 5870 attains its respectable performance using 190 W, the new Radeon HD 7970 guzzles down 254 W.
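To put the efficiency argument in numbers, here is a minimal Python sketch that uses only the two power figures quoted above (190 W and 254 W). The function names are our own, and the 20 percent speedup in the last step is a purely hypothetical input for illustration, not a measured result.

```python
# Power figures quoted in the text (watts).
HD5870_WATTS = 190.0
HD7970_WATTS = 254.0


def break_even_speedup(old_watts: float, new_watts: float) -> float:
    """Speedup the newer card needs just to match the older card's
    performance per watt."""
    return new_watts / old_watts


def relative_efficiency(speedup: float, old_watts: float, new_watts: float) -> float:
    """Performance per watt of the newer card relative to the older one
    (1.0 means equal efficiency, below 1.0 means less efficient)."""
    return speedup * old_watts / new_watts


break_even = break_even_speedup(HD5870_WATTS, HD7970_WATTS)
print(f"Break-even speedup: {break_even:.2f}x")

# Hypothetical example only: assume the newer card were 20 percent faster.
hypothetical = relative_efficiency(1.20, HD5870_WATTS, HD7970_WATTS)
print(f"Relative efficiency at a hypothetical 1.20x speedup: {hypothetical:.2f}")
```

In other words, the Radeon HD 7970 draws roughly a third more power than the Radeon HD 5870, so it has to be about a third faster just to break even on performance per watt.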
LuxMark is based on the freeware application LuxRender, making it the second real-world application in our GPGPU suite. The results are nothing short of spectacular, as the Radeon HD 7970 returns an almost twofold improvement over its predecessor, which lands in third place behind the older Radeon HD 5870.
Meanwhile, Nvidia's GeForce GTX 580 trails the pack and comes in last. Granted, this benchmark doesn’t seem to like the GeForce cards to begin with, but the fact that the Radeon HD 7970 is almost three times as fast as Nvidia’s current single-GPU flagship is a bit of an embarrassment. It also goes to show that a little optimization can go a long way.
GPU Caps Viewer
That brings us to our synthetic benchmarks. GPU Caps Viewer uses a combination of OpenCL computations, post-processing, and normal graphics output without anti-aliasing, letting us draw some interesting conclusions.
The Post-FX test is a direct implementation of Nvidia's own oclPostprocessGL demo from the Nvidia GPU Computing SDK. A blur effect is added to the image output during post-processing. Interestingly, the Radeon HD 7970 is able to beat the GeForce GTX 580, even though the demo was originally developed by Nvidia. The older Radeons fall behind by a sizeable margin.
In the particle test, the GeForce GTX 580 chalks up a clear win. Meanwhile, none of the Radeons can keep up, although the Radeon HD 7970 is able to close the gap a little.
The N-Queens puzzle (a generalization of the classic eight queens puzzle) is a combinatorial problem from the world of chess. In the eight-queen version, the goal is to arrange eight queens on a chess board in such a way that no two queens can attack each other according to the rules of chess. The color of the piece is irrelevant, so any queen may attack any other queen. In the end, the point is to find the number of possible solutions as quickly as possible.
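For readers who want to see what counting solutions involves, the following is a minimal CPU-side sketch in Python using backtracking with bitmasks. It is not the OpenCL kernel used by GPU Caps Viewer, whose implementation we have not inspected; the function name is our own.

```python
def count_n_queens(n: int) -> int:
    """Count all placements of n queens on an n x n board such that
    no two queens share a row, column, or diagonal."""
    full = (1 << n) - 1  # bitmask with one bit set per column

    def place(cols: int, diag_left: int, diag_right: int) -> int:
        # One queen is placed per row, so when every column is
        # occupied, every row has been filled as well.
        if cols == full:
            return 1
        count = 0
        # Columns in the current row not attacked by any earlier queen.
        free = full & ~(cols | diag_left | diag_right)
        while free:
            bit = free & -free  # lowest free column
            free ^= bit
            count += place(
                cols | bit,
                ((diag_left | bit) << 1) & full,  # diagonals shift one way
                (diag_right | bit) >> 1,          # anti-diagonals the other
            )
        return count

    return place(0, 0, 0)


print(count_n_queens(8))  # the classic eight queens case has 92 solutions
```

The search tree branches differently for every partial placement, which makes this a control-flow-heavy workload rather than a straightforward number-crunching one.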
This problem forms the basis of this benchmark, and the NQueen test proves once more that AMD's Radeon HD 7970 tremendously benefits from leaving behind the VLIW architecture in complex workloads. Both the HD 7970 and the GTX 580 are nearly twice as fast as the older Radeons. So, while the VLIW-based cards are great for crunching numbers, they’re not as well suited to this sort of task.
Since this is one of the few benchmarks out there that can test DirectCompute performance as well, that was originally on our list too. However, the result we got for the Radeon HD 7970 was far too high to be plausible. Until we can prove otherwise, we’ll discard that result and chalk it up to a bug in the benchmark. The result of the OpenCL benchmark was more believable:
Interestingly, the Radeons rule this benchmark, with the HD 5870 taking the top spot ahead of the HD 6970 and the HD 7970. Despite the fact that it uses an architecture similar to that of the HD 7970, Nvidia’s GeForce GTX 580 trails the AMD group by a wide margin.
While these results hold great promise, it’s certainly too early for the AMD fans out there to celebrate. Due to the distinct lack of usable real-world apps and the beta state of the drivers we had at our disposal, it’s hard to come to any conclusion about Tahiti’s real compute performance. There is definitely a very positive trend, though, so we can hope to see some compelling performance in real-world applications once they surface.
Moving away from the previous VLIW architecture doesn’t hurt the Radeon HD 7970 too much (if at all, as borne out in Bitmining) in areas where Radeons have traditionally ruled the roost, while simultaneously helping it gain ground in disciplines that Fermi-based cards dominated in the past. Thus, the card appears to be a potent solution able to leave behind the previous generation's limitations. Of course, the drivers and third-party apps have to come around, too. AMD certainly has its work cut out for it in this department.