OpenCL: Compute, Cryptography, and Bandwidth
Shader Performance: FP32 vs. FP64
Let’s start with an OpenCL benchmark, which should push the theoretical ceiling of 32- and 64-bit compute performance.
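For reference, the theoretical ceiling can be estimated from a card's shader count and clock rate. The following is a minimal sketch, assuming each shader retires one fused multiply-add (two floating-point operations) per clock and that FP64 runs at a fixed fraction of the FP32 rate. The function name and the example figures are illustrative assumptions, not values read from our charts; substitute each card's published specifications.

```python
def peak_gflops(shader_count, clock_mhz, fp64_ratio=1.0):
    """Estimate theoretical peak throughput in GFLOPS.

    Assumes one fused multiply-add (2 FLOPs) per shader per clock.
    fp64_ratio is the FP64 rate relative to FP32 (1.0 means FP32 itself).
    """
    flops_per_clock = 2  # one FMA counts as a multiply plus an add
    mflops = shader_count * clock_mhz * flops_per_clock * fp64_ratio
    return mflops / 1000.0


# Hypothetical example card: 2816 shaders at 930 MHz, FP64 at half rate.
fp32_peak = peak_gflops(2816, 930)
fp64_peak = peak_gflops(2816, 930, fp64_ratio=0.5)

print(f"FP32 peak: {fp32_peak:.2f} GFLOPS")
print(f"FP64 peak: {fp64_peak:.2f} GFLOPS")
```

A synthetic benchmark that lands close to these estimates is doing its job; a large shortfall points to driver or API overhead rather than the hardware.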
Sandra's Cryptography module is next. It’s remarkable how well the older FirePro W9000 keeps up. The gap between the W9100 and the W8100 is about what each card's specifications would suggest.
Although these benchmarks are synthetic in nature, they still illustrate Nvidia’s half-hearted support of OpenCL. Yes, the company offers its proprietary CUDA API, and there are plenty of applications that support it. Increasingly, though, ISVs looking for a broader customer base don't want to support two languages, and OpenCL is gaining traction as a result. Even long-time bastions of CUDA support like Adobe are adopting OpenCL.
Folding It Up: Folding@Home
Next, let’s run the Folding@Home benchmark. Even though few professionals would use a workstation-class board for this (or for cryptocurrency mining), the test does give us a more real-world look at compute performance.
Once again, the gaps between the cards are what we'd expect in light of their specifications. This chart demonstrates nicely how well AMD's architecture scales in scenarios without overhead.
Memory Bandwidth
Conversely, the memory bandwidth comparison between OpenCL and Direct3D 11 shows that Nvidia's greater effort in optimizing its drivers makes a quantifiable difference.
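One way to put such a comparison in context is to express each measured result as a share of the card's theoretical peak bandwidth. The sketch below assumes the usual formula (bus width in bytes multiplied by the effective per-pin data rate); the function names, the 512-bit bus, the 5 Gb/s data rate, and both "measured" values are made-up placeholders for illustration, not results from our tests.

```python
def theoretical_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps


def efficiency(measured_gbs, peak_gbs):
    """Fraction of the theoretical peak that a benchmark actually achieved."""
    return measured_gbs / peak_gbs


# Hypothetical card: 512-bit bus, 5 Gb/s effective data rate per pin.
peak = theoretical_bandwidth_gbs(512, 5.0)

# Placeholder measurements for the same card under two APIs.
opencl_eff = efficiency(240.0, peak)
d3d11_eff = efficiency(280.0, peak)

print(f"Theoretical peak: {peak:.0f} GB/s")
print(f"OpenCL efficiency: {opencl_eff:.1%}")
print(f"Direct3D 11 efficiency: {d3d11_eff:.1%}")
```

Because the hardware is identical in both runs, any difference in efficiency between the two APIs comes down to software, which is where driver optimization shows up.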
As we move on to our application benchmarks, keep these synthetic tests in mind. They help explain the results of real-world workloads, which are subject to influence from other platform subsystems.
At least for now, we have to question whether combining lackluster OpenCL support with an emphasis on CUDA is Nvidia's best strategy. Only time will tell.