Nvidia's CUDA: The End of the CPU?

The Theory: CUDA from the Software Point of View

From a software point of view, CUDA consists of a set of extensions to the C language, which of course recalls BrookGPU, and a few specific API calls. Among the extensions are type qualifiers that apply to functions and variables. The keyword to remember here is __global__, which when prefixed to a function indicates that the latter is a kernel – that is, a function that will be called by the CPU and executed by the GPU. The __device__ keyword designates a function that will be executed by the GPU (which CUDA refers to as the “device”) but can only be called from the GPU (in other words, from another __device__ function or from a __global__ function). Finally, the __host__ keyword is optional, and designates a function that’s called by the CPU and executed by the CPU – in other words, a traditional function.

There are a few restrictions associated with __device__ and __global__ functions: They can’t be recursive (that is, they can’t call themselves) and they can’t have a variable number of arguments. Finally, regarding __device__ functions resident in the GPU’s memory space, logically enough it’s impossible to obtain their address. Variables also have new qualifiers that allow control of the memory area where they’ll be stored. A variable preceded by the keyword __shared__ indicates that it will be stored in the streaming multiprocessors’ shared memory.The way a __global__ function is called is also a little different. That’s because the execution configuration has to be defined at the time of the call – more concretely, the size of the grid to which the kernel is applied and the size of each block. Take the example of a kernel with the following signature:

__global__ void Func(float* parameter);

which will be called as follows:

Func<<< Dg, Db >>>(parameter);

where Dg is the grid dimension and Db the dimension of a block. These two variables are of a new vector type introduced by CUDA.

The CUDA API essentially comprises functions for memory manipulation in VRAM: cudaMalloc to allocate memory, cudaFree to free it and cudaMemcpy to copy data between RAM and VRAM and vice-versa.

We’ll end this overview with the way a CUDA program is compiled, which is interesting: Compiling is done in several phases – first of all the code dedicated to the CPU is extracted from the file and passed to the standard compiler. The code dedicated to the GPU is first converted into an intermediate language, PTX. This intermediate language is like an assembler, and so enables the generated source code to be studied for potential inefficiencies. Finally, the last phase translates this intermediate language into commands that are specific to the GPU and encapsulates them in binary form in the executable.

  • CUDA software enables GPUs to do tasks normally reserved for CPUs. We look at how it works and its real and potential performance advantages.

    Nvidia's CUDA: The End of the CPU? : Read more
  • pulasky
  • Well if the technology was used just to play games yes, it would be crap tech, spending billions just so we can play quake doesnt make much sense ;)
  • MTLance
    Wow a gaming GFX into a serious work horse LMAO.
  • dariushro
    The Best thing that could happen is for M$ to release an API similar to DirextX for developers. That way both ATI and NVidia can support the API.
  • dmuir
    And no mention of OpenCL? I guess there's not a lot of details about it yet, but I find it surprising that you look to M$ for a unified API (who have no plans to do so that we know of), when Apple has already announced that they'll be releasing one next year. (unless I've totally misunderstood things...)
  • neodude007
    Im not gonna bother reading this article, I just thought the title was funny seeing as how Nvidia claims CUDA in NO way replaces the CPU and that is simply not their goal.
  • LazyGarfield
    I´d like it better if DirectX wouldnt be used.

    Anyways, NV wants to sell cuda, so why would they change to DX ,-)
  • I think the best way to go for MS is announce to support OpenCL like Apple. That way it will make things a lot easier for the developers and it makes MS look good to support the oen standard.
  • Shadow703793
    Mr RobotoVery interesting. I'm anxiously awaiting the RapiHD video encoder. Everyone knows how long it takes to encode a standard definition video, let alone an HD or multiple HD videos. If a 10x speedup can materialize from the CUDA API, lets just say it's more than welcome.I understand from the launch if the GTX280 and GTX260 that Nvidia has a broader outlook for the use of these GPU's. However I don't buy it fully especially when they cost so much to manufacture and use so much power. The GTX http://en.wikipedia.org/wiki/Gore-Tex 280 has been reported as using upwards of 300w. That doesn't translate to that much money in electrical bills over a span of a year but never the less it's still moving backwards. Also don't expect the GTX series to come down in price anytime soon. The 8800GTX and it's 384 Bit bus is a prime example of how much these devices cost to make. Unless CUDA becomes standardized it's just another niche product fighting against other niche products from ATI and Intel.On the other hand though, I was reading on Anand Tech that Nvidia is sticking 4 of these cards (each with 4GB RAM) in a 1U formfactor using CUDA to create ultra cheap Super Computers. For the scientific community this may be just what they're looking for. Maybe I was misled into believing that these cards were for gaming and anything else would be an added benefit. With the price and power consumption this makes much more sense now. Agreed. Also I predict in a few years we will have a Linux distro that will run mostly on a GPU.