OpenCL: General-Purpose Computing
Measuring the general-purpose compute performance of multi-GPU solutions is a challenge because not every app knows how to exploit more than one graphics processor at a time. We also have to strike CUDA- or Stream/APP-only software from our list. That doesn’t leave many options, which is why we’re limiting our search to OpenCL-accelerated applications.
The most obvious benefit of OpenCL is that both vendors’ cards compete on a playing field that is as level as we can make it. Besides, a comparison using real-world metrics covering single-precision (FP32) and double-precision (FP64) math is much more interesting than a huge field of synthetic benchmarks. As usual, we also include a number of current workstation-class cards to see how they fare relative to their consumer siblings.
We chose two different renderers that take almost opposing approaches to optimization. On one hand, we have the well-known LuxMark benchmark based on the LuxRender engine. On the other, we use the integrated benchmark of RatGPU, an application that tends to favor Nvidia cards but isn’t really optimized for either architecture. LuxMark reports its result in samples per second, while RatGPU measures the time per run.
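Because the two renderers report in opposite directions (more samples per second is better, less time per run is better), putting them side by side requires a common scale. The following is a minimal sketch of one way to do that, not the method behind our charts: the function names are hypothetical, and any numbers fed into it are placeholders rather than measured results. Both helpers return a score relative to a chosen baseline card, where 1.0 means "as fast as the baseline."

```python
def relative_throughput(value, baseline):
    """Relative score for a higher-is-better metric (e.g. samples per second)."""
    if value < 0 or baseline <= 0:
        raise ValueError("throughput values must be positive")
    return value / baseline


def relative_from_time(seconds, baseline_seconds):
    """Relative score for a lower-is-better metric (e.g. time per run).

    The ratio is inverted so that a shorter run yields a higher score.
    """
    if seconds <= 0 or baseline_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / seconds


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not benchmark results.
    print(relative_throughput(1500.0, 1000.0))  # card delivers 1.5x the samples/s
    print(relative_from_time(20.0, 30.0))       # card finishes in 2/3 of the time
```

With scores normalized this way, a card that leads in one renderer and trails in the other shows up as a value above 1.0 in one column and below 1.0 in the other.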
There’s really not much to say about LuxMark that the chart doesn’t already tell us. AMD’s GCN architecture dominates, and an OpenCL-optimized application able to exploit two Tahiti GPUs simply screams.
Meanwhile, RatGPU shows us what many CUDA-enabled renderers have proven in the past, namely that none of the Kepler-based GeForce cards can keep up with the Fermi-based GeForce GTX 580 in compute-heavy software. It’s a little strange that the VLIW4-based Radeon HD 6970 is faster than the Radeon HD 7970 GHz Edition, though.
The software we’re using for this test treats the multi-chip cards as if they have one GPU, so performance scales very well. AMD’s Radeon HD 7990, which seems to excel in integer-based hashing operations, performs really well, followed by a number of other GCN-based boards.
Financial Analysis Performance (Float/FP32)
We see the same sort of near-ideal scaling from the Radeon HD 7990 in our four financial analysis benchmarks (two benchmarks with two levels of precision each). Indeed, AMD’s flagship delivers almost twice the performance of the single-GPU Radeon HD 7970 GHz Edition, despite slightly lower clock rates. Meanwhile, the GeForce GTX Titan and 690 can’t even compete.
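To put a number on "near-ideal scaling," the dual-GPU result can be divided by the single-GPU result and compared against the GPU count, optionally correcting for the clock-rate difference. The sketch below is illustrative only: the function names are hypothetical, the scores and clock rates passed in are placeholders rather than our measurements, and it assumes performance scales linearly with core clock, which real workloads only approximate.

```python
def scaling_efficiency(multi_score, single_score, gpu_count=2):
    """Return (speedup, efficiency) for a higher-is-better score.

    An efficiency of 1.0 means perfect scaling: gpu_count times the
    single-GPU result.
    """
    if single_score <= 0 or gpu_count < 1:
        raise ValueError("single_score must be positive and gpu_count >= 1")
    speedup = multi_score / single_score
    return speedup, speedup / gpu_count


def clock_adjusted_efficiency(multi_score, single_score,
                              multi_clock_mhz, single_clock_mhz,
                              gpu_count=2):
    """Efficiency relative to an ideal that accounts for lower clocks.

    Assumes (as a simplification) that performance is proportional to
    core clock, so the ideal speedup is gpu_count * clock ratio.
    """
    if single_score <= 0 or single_clock_mhz <= 0 or gpu_count < 1:
        raise ValueError("scores and clocks must be positive")
    ideal_speedup = gpu_count * multi_clock_mhz / single_clock_mhz
    return (multi_score / single_score) / ideal_speedup


if __name__ == "__main__":
    # Placeholder values for illustration only.
    speedup, eff = scaling_efficiency(190.0, 100.0)
    print(f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
    adj = clock_adjusted_efficiency(190.0, 100.0, 950.0, 1000.0)
    print(f"clock-adjusted efficiency {adj:.0%}")
```

A dual-GPU card running slightly slower clocks can post a raw efficiency below 100% and still be scaling perfectly once the clock deficit is factored in, which is why the adjusted figure is the more useful one for this comparison.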
Financial Analysis Performance (Double/FP64)
Repeating those two benchmarks using double-precision math makes the differences even more apparent. While Nvidia’s other cards struggle with FP64, the Titan actually does quite well, especially compared to the GK104-based GeForce GTX 690 and GTX 680. The trick is to activate CUDA’s double-precision mode in the card’s driver, which also extends the functionality to OpenCL. Although this negatively affects clock rates, the card is faster overall in FP64-based workloads.
Meanwhile, the Radeon HD 7990 doesn’t need any tweaking to achieve its impressive and chart-topping performance.