The idea of using graphics accelerators for mathematical calculation is not recent. The first traces of it go back to the 1990s. Initially it was very primitive – limited mostly to the use of certain hard-wired functions like rasterization and Z-buffers to accelerate tasks like pathfinding or drawing Voronoi diagrams.
In 2003, with the appearance of highly evolved shaders, a new stage was reached – this time performing matrix calculations using the hardware then available. That was the year when an entire section of SIGGRAPH (“Computations on GPUs”) was dedicated to this emerging area of computing. This early initiative would take on the name GPGPU (for General-Purpose computation on GPUs). One early turning point in this area was the appearance of BrookGPU.
To really understand the role of Brook, you need to see how things were done before it made its appearance. The only way to get access to the GPU’s resources in 2003 was to use one of the two graphics APIs – Direct3D or OpenGL. Consequently, researchers who wanted to harness the GPU’s processing power had to work with these APIs. The problem was that those individuals weren’t necessarily experts in graphics programming, which seriously complicated access to the technology. Where 3D programmers talk in terms of shaders, textures, and fragments, specialists in parallel programming talk about streams, kernels, scatter, and gather. So, the first difficulty was to find analogies between two distinct worlds:
- a stream – that is, a flow of elements of the same type – can be represented on the GPU by a texture. The equivalent in classic programming languages is simply an array.
- a kernel – the function that will be applied independently to each element of the stream – is the equivalent of a pixel shader. Conceptually, it can be seen as an internal loop in a classic program – the one that will be applied to the largest number of elements.
- to read the results of the application of a kernel to a stream, it has to be rendered to a texture. Obviously there’s no equivalent on a CPU, which has full access to memory.
- to control the location where a memory write is to take place (in a scatter operation), it has to be done in a vertex shader, since a pixel shader can’t modify the coordinates of the pixel currently being processed.
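To make the vocabulary above concrete, here is a minimal CPU-side sketch of the four terms in plain Python. It is not GPU code and it is not how Brook or any graphics API works internally; the function names (`run_kernel`, `gather`, `scatter`, `scale_and_offset`) are hypothetical and chosen only for illustration. A list stands in for the stream (the texture), and an ordinary function stands in for the kernel (the pixel shader).

```python
from typing import Callable, List, Sequence


def run_kernel(kernel: Callable[[float], float], stream: Sequence[float]) -> List[float]:
    """Apply the kernel independently to every element of the stream.

    On the GPU this is the pixel shader running once per fragment;
    here it is just the inner loop of a classic program.
    """
    return [kernel(x) for x in stream]


def gather(source: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Gather: output element k READS from a computed location indices[k]."""
    return [source[i] for i in indices]


def scatter(values: Sequence[float], indices: Sequence[int], size: int) -> List[float]:
    """Scatter: input element k WRITES to a computed location indices[k]."""
    out = [0.0] * size
    for value, i in zip(values, indices):
        out[i] = value
    return out


def scale_and_offset(x: float) -> float:
    """A hypothetical kernel: the work done on one element."""
    return 2.0 * x + 1.0


stream = [0.0, 1.0, 2.0, 3.0]                      # the "texture"
result = run_kernel(scale_and_offset, stream)      # [1.0, 3.0, 5.0, 7.0]
gathered = gather(result, [3, 3, 0, 1])            # [7.0, 7.0, 1.0, 3.0]
scattered = scatter(result, [2, 0, 3, 1], size=4)  # [3.0, 7.0, 1.0, 5.0]
```

On a CPU, gather and scatter are equally easy because any memory location can be read or written. The asymmetry described in the list comes from the graphics pipeline: a pixel shader can fetch from arbitrary texture coordinates (gather), but its output position is fixed, so choosing where to write (scatter) has to happen earlier, in the vertex shader.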