Nvidia's CUDA: The End of the CPU?


Finally, in spite of what we said earlier about this not being a horserace, we couldn’t resist the temptation of running the program on an 8800 GTX, which proved to be three times as fast as the mobile 8600, independent of the size of the blocks. You might think the result would be four or more times faster based on the respective architectures: 128 ALUs compared to 32 and a higher clock frequency (1.35GHz compared to 950MHz), but in practice that wasn’t the case. Here again the most likely hypothesis is that we were limited by the memory accesses. To be more precise, the initial image is accessed like a CUDA multidimensional array – a very complicated term for what’s really nothing more than a texture. There are several advantages:

  • accesses get the benefit of the texture cache;
  • we have a wrapping mode, which avoids having to manage the edges of the image, unlike the CPU version.

We could also have taken advantage of free filtering with normalized addressing between [0,1] instead of [0, width] and [0, height], but that wasn’t useful in our case. As you know as a faithful reader, the 8600 has 16 texture units compared to 32 for the 8800GTX. So there’s only a two-to-one ratio between the two architectures. Add the difference in frequency and we get a ratio of (32 x 0.575) / (16 x 0.475) = 2.4 – in the neighborhood of the three-to-one we actually observed. That theory also has the advantage of explaining why the size of the blocks doesn’t change much on the G80, since the ALUs are limited by the texture units anyway.

In addition to the encouraging results, our first steps with CUDA went very well considering the unfavorable conditions we’d chosen. Developing on a Vista laptop means you’re forced to use CUDA SDK 2.0, still in its beta stage, with the 174.55 driver, which is also in beta. Despite all that, we have no unpleasant surprises to report – just a little scare when the first execution of our program, still very buggy, tried to address memory beyond the allocated space.

The monitor blinked frenetically, then went black … until Vista launched the video driver recovery service and all was well. But you have to admit it was surprising when you’re used to seeing an ordinary Segmentation Fault with standard programs in cases like that. Finally, one (very small) criticism of Nvidia: In all the documentation available for CUDA, it’s a shame not to find a little tutorial explaining step by step how to set up the development environment in Visual Studio. That’s not too big a problem since the SDK is full of example programs you can explore to find out how to build the skeleton of a minimal project for a CUDA application, but for beginners a tutorial would have been a lot more convenient.

  • CUDA software enables GPUs to do tasks normally reserved for CPUs. We look at how it works and its real and potential performance advantages.

    Nvidia's CUDA: The End of the CPU? : Read more
  • pulasky
  • Well if the technology was used just to play games yes, it would be crap tech, spending billions just so we can play quake doesnt make much sense ;)
  • MTLance
    Wow a gaming GFX into a serious work horse LMAO.
  • dariushro
    The Best thing that could happen is for M$ to release an API similar to DirextX for developers. That way both ATI and NVidia can support the API.
  • dmuir
    And no mention of OpenCL? I guess there's not a lot of details about it yet, but I find it surprising that you look to M$ for a unified API (who have no plans to do so that we know of), when Apple has already announced that they'll be releasing one next year. (unless I've totally misunderstood things...)
  • neodude007
    Im not gonna bother reading this article, I just thought the title was funny seeing as how Nvidia claims CUDA in NO way replaces the CPU and that is simply not their goal.
  • LazyGarfield
    I´d like it better if DirectX wouldnt be used.

    Anyways, NV wants to sell cuda, so why would they change to DX ,-)
  • I think the best way to go for MS is announce to support OpenCL like Apple. That way it will make things a lot easier for the developers and it makes MS look good to support the oen standard.
  • Shadow703793
    Mr RobotoVery interesting. I'm anxiously awaiting the RapiHD video encoder. Everyone knows how long it takes to encode a standard definition video, let alone an HD or multiple HD videos. If a 10x speedup can materialize from the CUDA API, lets just say it's more than welcome.I understand from the launch if the GTX280 and GTX260 that Nvidia has a broader outlook for the use of these GPU's. However I don't buy it fully especially when they cost so much to manufacture and use so much power. The GTX http://en.wikipedia.org/wiki/Gore-Tex 280 has been reported as using upwards of 300w. That doesn't translate to that much money in electrical bills over a span of a year but never the less it's still moving backwards. Also don't expect the GTX series to come down in price anytime soon. The 8800GTX and it's 384 Bit bus is a prime example of how much these devices cost to make. Unless CUDA becomes standardized it's just another niche product fighting against other niche products from ATI and Intel.On the other hand though, I was reading on Anand Tech that Nvidia is sticking 4 of these cards (each with 4GB RAM) in a 1U formfactor using CUDA to create ultra cheap Super Computers. For the scientific community this may be just what they're looking for. Maybe I was misled into believing that these cards were for gaming and anything else would be an added benefit. With the price and power consumption this makes much more sense now. Agreed. Also I predict in a few years we will have a Linux distro that will run mostly on a GPU.