Markham (ON) - More and more, we see graphics card manufacturers touting the GFLOPs capability of their cards, hinting at the potentially enormous processing power hidden in those graphics processors. But those numbers, which recently hit 1 TFLOPs, aren't directly comparable with the TFLOPs figures in the Top500.org rankings, since different instructions are involved and there are different ways to calculate these numbers. AMD's RV670 is a good example of this dilemma.
Graphics card manufacturers derive their peak numbers from the simplest possible instruction, one that can run through all 320 number-crunching processors of AMD's (ATI's) chip, or through all 256 units of Nvidia's G80 and G90.
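The peak figure described above comes down to simple arithmetic: the number of processing units, times the floating-point operations each unit retires per clock, times the clock rate. The minimal Python sketch below assumes that each of the RV670's 320 units completes one multiply-add per cycle (counted as two operations) and that the core runs at 775 MHz; both values are illustrative assumptions, not figures from AMD or from this article.

```python
def peak_gflops(units: int, flops_per_cycle: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOPs, assuming every unit is busy on every cycle.

    This is the best case vendors quote: the simplest instruction keeps all
    units fed, so real workloads are not expected to reach this number.
    """
    # units * operations per cycle * million cycles per second = MFLOPs;
    # dividing by 1000 converts MFLOPs to GFLOPs.
    return units * flops_per_cycle * clock_mhz / 1000.0


# Assumed RV670-class figures: 320 units, a multiply-add counted as 2 FLOPs,
# and a hypothetical 775 MHz core clock.
rv670_peak = peak_gflops(units=320, flops_per_cycle=2, clock_mhz=775.0)
print(f"{rv670_peak:.0f} GFLOPs")  # prints "496 GFLOPs"
```

Under these assumptions the result lands close to the "around 500 GFLOPs" AMD quotes for the FireStream 9170. Top500 rankings, by contrast, are based on measured Linpack performance in double precision, which is one reason the two kinds of numbers cannot be compared directly.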
When AMD launched its most recent GPGPU part, the FireStream 9170, a few questions were floating around, especially about its support for the double-precision FP64 format. The product details for the FireStream 9170 are listed on AMD's official product page.
AMD states that the card can achieve a peak of around 500 GFLOPs in single-precision FP32. However, given the general demand for double-precision FP64 support in academia and science, and AMD's claim that the new FireStream supports this format, the obvious question was how fast the card would be in FP64.
We spent some time with professors and developers over the past weeks and heard that they would be perfectly happy even if the GPGPU chip performed DP FP64 calculations ten times slower than FP32, just to be able to get results in double precision. The potential performance of a multi-GPGPU box would still be good value for many applications, at least for those that do not require the large amounts of memory that only a traditional supercomputer installation can offer.
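To put the professors' tolerance into numbers: a tenfold slowdown from the roughly 500 GFLOPs FP32 peak still leaves about 50 GFLOPs of FP64 per card, and a box with several cards multiplies that. The sketch below assumes perfect linear scaling across cards and a hypothetical four-card box; real multi-GPU workloads would lose some of this to communication and memory limits.

```python
def fp64_floor_gflops(fp32_peak: float, slowdown: float, cards: int = 1) -> float:
    """FP64 throughput if FP64 runs `slowdown` times slower than FP32.

    Assumes ideal linear scaling across `cards`, which real systems won't reach.
    """
    return fp32_peak / slowdown * cards


# 500 GFLOPs FP32 peak per card, FP64 ten times slower.
per_card = fp64_floor_gflops(fp32_peak=500.0, slowdown=10.0)
# Hypothetical four-card box under the same assumptions.
four_card_box = fp64_floor_gflops(fp32_peak=500.0, slowdown=10.0, cards=4)
print(per_card, four_card_box)  # prints "50.0 200.0"
```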
AMD's Dave "Wavey" Baumann (formerly of Beyond3D) told us that while AMD's RV670 chip supports double precision, it does not feature dedicated FP64 units; instead, it uses its FP32 units to perform FP64 calculations over a number of cycles. And yes, this process takes time. Depending on the complexity of the operation, the best-case scenario is around half of the SP FP32 performance, or about 250 GFLOPs; in the worst case, performance should be about a quarter of FP32 performance, or about 125 GFLOPs. Dave told us that the chip usually averages out somewhere in between, which is actually quite a feat for a chip without native FP64 units.
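Baumann's figures amount to dividing the FP32 peak by 2 in the best case and by 4 in the worst. Where a real kernel "averages out somewhere in between" depends on its mix of operations. The sketch below assumes, hypothetically, that a fraction of the FP64 work runs at the best-case rate and the rest at the worst-case rate. Because the slower operations take longer, the effective rate is a time-weighted (harmonic) mean rather than a simple average of the two figures.

```python
def fp64_effective_gflops(fp32_peak: float, best_fraction: float) -> float:
    """Effective FP64 rate when `best_fraction` of the FLOPs run at the
    best-case rate (1/2 of the FP32 peak) and the rest at the worst-case
    rate (1/4 of the FP32 peak). The 1/2 and 1/4 factors follow AMD's
    description; the split between them is a hypothetical parameter.
    """
    if not 0.0 <= best_fraction <= 1.0:
        raise ValueError("best_fraction must be between 0 and 1")
    best = fp32_peak / 2   # 250 GFLOPs for a 500 GFLOPs part
    worst = fp32_peak / 4  # 125 GFLOPs for a 500 GFLOPs part
    # Seconds spent per GFLOP of work, weighted by each category's share.
    seconds_per_gflop = best_fraction / best + (1.0 - best_fraction) / worst
    return 1.0 / seconds_per_gflop


for f in (1.0, 0.5, 0.0):
    print(f"{f:.0%} best-case ops: {fp64_effective_gflops(500.0, f):.0f} GFLOPs")
```

With an even 50/50 split, this model gives about 167 GFLOPs, noticeably below the 187.5 GFLOPs that a naive average of 250 and 125 would suggest.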
At the end of the day, if you're running double-precision FP64 on AMD's FireStream 9170 board, you should expect to get between 100 and 250 GFLOPs (realistically, expect something closer to the lower number). It will be interesting to see how AMD and Nvidia implement FP64 handling in the near future, but for now, the expected performance numbers should be tempting enough to take the plunge and start developing accelerated applications on GPGPU hardware.