Vertex Shaders and Pixel Shaders

The Problems

As I mentioned already, there are good reasons never to use the constrained vertex processing system outlined above. The main one is that you already have the fixed-function pipeline, which is perfectly sufficient for most types of vertex processing. Admittedly, the game engine won't be able to take advantage of the most complex material models, but where specific special effects are needed, it's easy enough to write constrained vertex shaders to carry out that processing.

There's another good reason not to use vertex shaders on recent hardware. The Radeon and GeForce1/2 graphics cards can all accelerate the fixed-function pipeline in hardware, but if you want to use vertex shaders on these cards, they have to be processed in software. Some people will tell you that on a fast CPU you might as well process vertices there anyway, because the graphics card itself only runs at around 200 MHz. However, the graphics card carries out some very specific operations in a highly optimized way, and it's unlikely that anything less than a 1 GHz processor will keep up with it. Also, for games which make heavy use of the CPU, any work moved from the GPU to the CPU is going to slow the game down. Having said that, there's no reason not to process complex vertex shaders on the CPU when they can't be emulated in the fixed-function pipeline on this type of hardware.

There is good evidence that carrying out some complex processing in software, and then leaving the transformation and lighting to the hardware can be substantially quicker than carrying out everything on the CPU. But if you've already programmed an effect using a vertex shader it's generally going to be easier to process that entirely on the CPU, rather than to cut the code up into a software processing portion that produces vertices for the fixed function processor on the graphics card.

The fragment-based processing system also suffers from these problems when running on older graphics cards. Even though it can perfectly emulate the fixed-function pipeline, it's quicker to use a fixed-function setup on T&L cards. On top of this, vertex shaders have some limits which can make life far more complicated for the fragment processor. First, a vertex shader can contain no more than 128 instructions; if you exceed this number, the shader will simply fail to run on the hardware. You're also limited to 96 constants in the current versions of vertex shaders. Once again, if you exceed this number of constants, your automatic system is probably going to fail when it tries to set the 97th constant.

Although it's uncommon for these situations to occur, it's still possible to hit these limits in relatively real-world scenes. For example, different types of light require different numbers of instructions, so if you have a scene with four spotlights and a heavyweight material, it's more than likely that you'll hit the instruction limit for the vertex shader. You can imagine that these kinds of scenes might occur rarely, but you certainly don't want someone to accidentally point a torch at a surface illuminated by two car headlights, only for the surface to suddenly disappear! There is another case which is far more likely to occur, and it involves skinning. The problem is that skinning will take up as much of the constant memory as it can for animation matrices, which means that an increase in the number of lights can easily push us past the constant memory limit.

Using the fixed-function pipeline, it's far rarer for a particular type of vertex transformation to limit the type of lighting you can apply to geometry. Even if you exceed the number of lights your hardware can process, the driver still has to process those vertices, even if performance drops.

And that is the issue at the heart of the problems with vertex shaders. When you're programming a CPU, you don't have to worry about the size of the level-one cache: if a loop touches more data than fits, you just lose performance, but the loop still runs. If you exceed the limits of a vertex shader, you simply can't process that data at all. This makes the hardware implementation easier, but it makes the software implementation harder. If vertex shaders had no instruction or constant limits, the fragment-based processing system would be infallible. It's a shame that underlying hardware constraints have passed all the way up through the API to the game programmer.

In the future, these limits will increase, but as long as the new limits are within an order of magnitude of the current ones, it will still be possible to reach the constant limit with existing animation methods.