Compute Performance And Striking A Balance
As I was testing Nvidia’s GeForce GTX Titan, but before the company was able to talk in depth about the card’s features, I noticed that double-precision performance was dismally low in diagnostic tools like SiSoftware Sandra. Although it should have been 1/3 the FP32 rate, my results looked more like the 1/24 expected from GeForce GTX 680.
It turns out that, in order to maximize the card’s clock rate and minimize its thermal output, Nvidia purposely forces GK110’s FP64 units to run at 1/8 of the chip’s clock rate by default. Multiply that by the 1:3 ratio of double- to single-precision CUDA cores, and the numbers I saw initially turn out to be correct.
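The arithmetic above is easy to check. Here is a minimal sketch using only the two ratios already mentioned (the 1:3 unit ratio and the 1/8 default clock); the function and constant names are made up for illustration and are not part of any Nvidia tool:

```python
from fractions import Fraction

# Ratios taken from the text above; the names are hypothetical.
FP64_TO_FP32_UNITS = Fraction(1, 3)  # double- to single-precision CUDA cores on GK110
DEFAULT_FP64_CLOCK = Fraction(1, 8)  # FP64 units run at 1/8 of the chip clock by default


def fp64_rate(clock_fraction: Fraction) -> Fraction:
    """Return FP64 throughput as a fraction of FP32 throughput."""
    return FP64_TO_FP32_UNITS * clock_fraction


def fp64_tflops(fp32_tflops: float, clock_fraction: Fraction) -> float:
    """Scale a peak FP32 figure (in TFLOPS) down to the matching FP64 estimate."""
    return fp32_tflops * float(fp64_rate(clock_fraction))


default_rate = fp64_rate(DEFAULT_FP64_CLOCK)  # FP64 units throttled
full_rate = fp64_rate(Fraction(1))            # FP64 units at full clock

print(default_rate)  # 1/24
print(full_rate)     # 1/3
```

Passing a peak single-precision figure to `fp64_tflops` gives a rough theoretical double-precision estimate for either mode. It ignores any change in core clock, so treat the result as a ceiling rather than a measurement.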
But Nvidia claims this card is the real deal, capable of 4.5 TFLOPS single- and 1.5 TFLOPS double-precision throughput. So, what gives?
It’s improbable that Tesla customers are going to cheap out on gaming cards that lack ECC memory protection, the bundled GPU management/monitoring software, support for GPUDirect, or support for Hyper-Q (Update, 3/5/2013: Nvidia just let us know that Titan supports Dynamic Parallelism and Hyper-Q for CUDA streams, and does not support ECC, the RDMA feature of GPUDirect, or Hyper-Q for MPI connections). However, developers can still get their hands on Titan cards to build more GPU-accelerated apps (without spending close to eight grand on a Tesla K20X), so Nvidia does want to enable GK110’s full compute potential.
Tapping into the full-speed FP64 CUDA cores requires opening the driver control panel, clicking the Manage 3D Settings link, scrolling down to the CUDA – Double precision line item, and selecting your GeForce GTX Titan card. This effectively disables GPU Boost, so you’d only want to toggle it on if you specifically need to spin up the FP64 cores.
We can confirm the option unlocks GK110’s compute potential, but we cannot yet share our benchmark results. So, you’ll need to look out for those in a couple of days.