High-Tech And Vertex Juggling - NVIDIA's New GeForce3 GPU

Vertex Shader Details

The above diagram shows that the vertex shader can process vertices with up to 16 data entries. Each entry consists of four 32-bit floating-point numbers. Sixteen entries are quite a lot: they easily fit an average vertex with its position coordinates, weight, normal, diffuse and specular color, fog coordinate and point size information, leaving plenty of space for the coordinates of several textures.
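To get a feeling for those numbers, here is a small back-of-the-envelope sketch in Python. It assumes, purely for illustration, that each attribute named above occupies exactly one entry; the list name and the one-entry-per-attribute mapping are not taken from NVIDIA's documentation.

```python
# Sizes taken from the text: 16 entries per vertex, four 32-bit floats per entry.
MAX_ENTRIES = 16
COMPONENTS_PER_ENTRY = 4
BYTES_PER_FLOAT = 4

# Assumption for illustration: every attribute below uses one full entry.
typical_attributes = [
    "position", "weight", "normal", "diffuse color",
    "specular color", "fog coordinate", "point size",
]

used_entries = len(typical_attributes)
free_entries = MAX_ENTRIES - used_entries  # entries left for texture coordinates
max_vertex_bytes = MAX_ENTRIES * COMPONENTS_PER_ENTRY * BYTES_PER_FLOAT

print(used_entries, free_entries, max_vertex_bytes)  # 7 9 256
```

Under that assumption, nine entries remain for texture coordinates, and a completely filled vertex weighs in at 256 bytes.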

Inside the vertex shader, data is processed in the form of entries. We just learned that each entry is a set of four 32-bit numbers. This makes the vertex shader a SIMD (single instruction, multiple data) processor, since one instruction is applied to a set of four variables at once. This makes perfect sense, because most transform and lighting operations use 4x4 or 3x3 matrix operations. Every value is treated as a floating-point value, so all computations executed by the vertex shader are genuine floating-point calculations. Basically, the vertex shader is a very powerful SIMD FPU, one that even Pentium 4's SSE2 unit barely comes close to.
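Why four-wide operations suit matrix math so well can be shown with a minimal sketch. The helper names dot4 and transform are made up for this example, and plain Python lists stand in for the hardware's four-component entries. A 4x4 transform is nothing more than four dot products of a matrix row with the vertex position.

```python
def dot4(a, b):
    """Dot product of two four-component entries."""
    return sum(x * y for x, y in zip(a, b))


def transform(matrix_rows, vector):
    """Apply a 4x4 matrix (given as four rows) to a four-component vector."""
    return [dot4(row, vector) for row in matrix_rows]


# Example matrix: a translation by (2, 3, 4).
translation = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]
position = [1.0, 1.0, 1.0, 1.0]

moved = transform(translation, position)
print(moved)  # [3.0, 4.0, 5.0, 1.0]
```

Each dot4 call works on one complete entry, which is exactly the granularity a four-wide SIMD unit handles in a single operation.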

The next important feature of the vertex shader is its 12 SIMD registers, each of which also holds four 32-bit floating-point values. Those 12 registers are what the vertex processor can juggle with. Besides the 12 registers, which can be both read and written, the vertex shader offers a set of 96 4 x 32-bit SIMD constants that are loaded with parameters defined by the programmer before the program starts. Those constants can be used within the program and can even be addressed indirectly, but only one constant can be used per instruction, which is a bit of a bummer. If an instruction requires more than one constant, one of them has to be loaded into a register by a preceding load instruction. Typical uses of this large set of constant data are matrix data for the transform (usually a 4x4 matrix), light characteristics, procedural data for special animation effects, vertex interpolation data (for morphing/key frame interpolation), time (for key frame interpolation or particle systems) and more. There is a special kind of vertex program called a 'vertex state program', which is actually able to write to the parameter block. Normal vertex programs are only able to read from it.
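The one-constant-per-instruction rule is easiest to see in a toy model. The following Python sketch is not NVIDIA's instruction set: the class VertexMachine, the operand tuples and the two operations "mov" and "mul" are assumptions made for illustration. Only the sizes (12 registers, 96 constants) and the rule itself come from the text.

```python
NUM_REGISTERS = 12   # read/write SIMD registers
NUM_CONSTANTS = 96   # read-only SIMD constants


class VertexMachine:
    """Toy model: operands are ("r", index) or ("c", index) tuples."""

    def __init__(self, constants):
        if len(constants) > NUM_CONSTANTS:
            raise ValueError("too many constants")
        padding = [[0.0] * 4 for _ in range(NUM_CONSTANTS - len(constants))]
        self.constants = [list(c) for c in constants] + padding
        self.registers = [[0.0] * 4 for _ in range(NUM_REGISTERS)]

    def read(self, operand):
        bank, index = operand
        return self.constants[index] if bank == "c" else self.registers[index]

    def execute(self, op, dst, *sources):
        # The rule from the text: at most one constant per instruction.
        if sum(1 for bank, _ in sources if bank == "c") > 1:
            raise ValueError("only one constant per instruction")
        values = [self.read(s) for s in sources]
        if op == "mov":
            result = list(values[0])
        elif op == "mul":
            result = [a * b for a, b in zip(values[0], values[1])]
        else:
            raise ValueError("unknown operation: " + op)
        self.registers[dst] = result


machine = VertexMachine([[2.0, 2.0, 2.0, 2.0], [1.0, 2.0, 3.0, 4.0]])

# Multiplying two constants directly is rejected ...
try:
    machine.execute("mul", 0, ("c", 0), ("c", 1))
    rejected = False
except ValueError:
    rejected = True

# ... so one of them is first loaded into a register.
machine.execute("mov", 0, ("c", 0))
machine.execute("mul", 1, ("r", 0), ("c", 1))
product = machine.registers[1]
print(rejected, product)  # True [2.0, 4.0, 6.0, 8.0]
```

The extra "mov" costs one instruction, which is why the restriction matters for programs that are already close to the length limit.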

The instructions themselves are very simple and therefore easy to understand. The vertex shader does not allow any loops, jumps or conditional branches, which means that it executes the program linearly, one instruction after the other. The maximum length of a vertex program is 128 instructions. By the end of the program the vertex should have been changed into what the developer intended, and it has to be transformed and lit. If more instructions are required, the vertex can enter the vertex shader once more.
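The strictly linear execution and the 128-instruction limit can be sketched as follows. This is a simplified model: instructions are plain Python functions, and the way a vertex is fed back for another pass (here simply the output of the previous pass) is an assumption, since the text does not describe how the hardware handles it.

```python
MAX_INSTRUCTIONS = 128  # limit stated in the text


def run_pass(program, vertex):
    """Run one straight-line program: no loops, jumps or branches."""
    if len(program) > MAX_INSTRUCTIONS:
        raise ValueError("program exceeds 128 instructions")
    for instruction in program:
        vertex = instruction(vertex)
    return vertex


def run_in_passes(program, vertex):
    """Split an over-long program into chunks and run the vertex through each."""
    for start in range(0, len(program), MAX_INSTRUCTIONS):
        vertex = run_pass(program[start:start + MAX_INSTRUCTIONS], vertex)
    return vertex


def add_one(v):
    """Hypothetical instruction: add 1.0 to every component."""
    return [x + 1.0 for x in v]


long_program = [add_one] * 300
passes_needed = -(-len(long_program) // MAX_INSTRUCTIONS)  # ceiling division
result = run_in_passes(long_program, [0.0, 0.0, 0.0, 1.0])
print(passes_needed, result)  # 3 [300.0, 300.0, 300.0, 301.0]
```

The for loop in run_pass belongs to the simulator, not to the vertex program: the program itself is only a flat list that is walked from the first instruction to the last.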

The final result that comes out of the vertex shader is yet another vertex, transformed to 'homogeneous clip space' and lit. It is important to note that the vertex shader is not able to create vertices or to destroy them. One vertex goes in and one vertex comes out.
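In programming terms, the vertex shader behaves like a pure one-to-one mapping over the vertex stream. A minimal sketch, in which the function shade is a hypothetical stand-in for an arbitrary vertex program:

```python
def shade(vertex):
    """Hypothetical vertex program: scale x, y and z, leave w untouched."""
    x, y, z, w = vertex
    return [x * 0.5, y * 0.5, z * 0.5, w]


incoming = [
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 2.0, 1.0],
]

# Every vertex is processed on its own; none is added, none is dropped.
outgoing = [shade(v) for v in incoming]
print(len(incoming), len(outgoing))  # 3 3
```

Anything that changes the number of vertices therefore has to happen before the data reaches the vertex shader.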