HLSL's, Cg and the RenderMonkey

With Old Techniques Come Old Problems

One of the most interesting aspects of using high level languages on graphics cards is that we're going to come across all the same problems that high level languages have on CPUs. In a field like computer graphics, the most important problems will be those that affect performance, and with high level languages, the biggest performance problem is poorly compiled code.

This problem comes about because a single statement in an HLSL can be compiled into many different streams of assembly code. Even something as simple as a dot product can be carried out in several ways. For example, take the line from our HLSL shader above that carries out the dot product.

The expression itself looks something like this:

float lighting = dot(IN.Normal, lightdirection);

The fastest way to write this in vertex shader code looks like this:

dp3 r0.x, c0, v2

But it is equally possible to compile to this:

mov r1, c0
mov r2, v2
dp3 r0.x, r1, r2

Or, even more insanely:

mov r1, c0
mov r2, v2
mul r0.x, r1.x, r2.x
mul r0.y, r1.y, r2.y
mul r0.z, r1.z, r2.z
add r0.x, r0.x, r0.y
add r0.x, r0.x, r0.z

They all produce the same result: the dot product of the normal and the constant value ends up in the x component of register r0. However, the second example takes three instructions, and the third takes a whopping seven.
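
To make the comparison concrete, here is a rough sketch of a toy interpreter that runs the three sequences above and counts their instructions. It is written in Python purely for illustration. The `run` and `read` helpers, the input values and the handling of write masks and swizzles are all assumptions of this sketch, not part of any real shader toolchain.

```python
# Toy interpreter for a handful of vertex shader instructions.
# An illustrative model only, not a real assembler or driver.
IDX = {"x": 0, "y": 1, "z": 2, "w": 3}

def read(regs, operand):
    """Return the four values an instruction sees for a source operand."""
    name, _, comp = operand.partition(".")
    vec = regs[name]
    if comp:
        # Assumption: only single-component swizzles such as r1.x,
        # which replicate that component across all four lanes.
        return [vec[IDX[comp]]] * 4
    return list(vec)

def run(program, inputs):
    """Execute the program and return (registers, instruction count)."""
    regs = {name: list(vec) for name, vec in inputs.items()}
    count = 0
    for line in program.strip().splitlines():
        op, rest = line.split(None, 1)
        dst, *srcs = [tok.strip() for tok in rest.split(",")]
        a = [read(regs, s) for s in srcs]
        if op == "mov":
            result = a[0]
        elif op == "mul":
            result = [p * q for p, q in zip(a[0], a[1])]
        elif op == "add":
            result = [p + q for p, q in zip(a[0], a[1])]
        elif op == "dp3":
            # Three-component dot product, replicated to every lane.
            result = [sum(a[0][i] * a[1][i] for i in range(3))] * 4
        else:
            raise ValueError(f"unsupported instruction: {op}")
        # Only the components named in the write mask are updated.
        name, _, mask = dst.partition(".")
        target = regs.setdefault(name, [0.0] * 4)
        for c in mask or "xyzw":
            target[IDX[c]] = result[IDX[c]]
        count += 1
    return regs, count

DIRECT = "dp3 r0.x, c0, v2"

COPIES = """
mov r1, c0
mov r2, v2
dp3 r0.x, r1, r2
"""

EXPANDED = """
mov r1, c0
mov r2, v2
mul r0.x, r1.x, r2.x
mul r0.y, r1.y, r2.y
mul r0.z, r1.z, r2.z
add r0.x, r0.x, r0.y
add r0.x, r0.x, r0.z
"""

# Made-up values standing in for the constant (c0) and the normal (v2).
INPUTS = {"c0": [1.0, 2.0, 3.0, 0.0], "v2": [4.0, 5.0, 6.0, 1.0]}

for label, program in [("direct", DIRECT), ("copies", COPIES), ("expanded", EXPANDED)]:
    regs, count = run(program, INPUTS)
    print(label, regs["r0"][0], count)
# direct 32.0 1
# copies 32.0 3
# expanded 32.0 7
```

All three leave the same value in r0.x. Only the instruction count differs.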

The third example above is (obviously) a fairly stupid thing to do, but the second is a common problem in hand-written code as well as in the initial output from compilers. In vertex shader code, mov instructions are almost always unnecessary, because an instruction can read its operands straight from the input and constant registers and write its result to whichever register we choose.
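
As a rough illustration of how a compiler pass might strip out such copies, the sketch below forwards whole-register movs from constant (c) and input (v) registers into the instructions that read them. The function name `forward_movs` is hypothetical. The sketch assumes temporaries (r registers) are not read once the shader ends, and it ignores the limits real shader versions place on which registers a single instruction may read.

```python
def forward_movs(lines):
    """Drop 'mov rN, cM' / 'mov rN, vM' copies by reading cM / vM directly.

    Assumptions of this sketch: one instruction per string, temporaries are
    dead at the end of the shader, and register read limits are not modelled.
    """
    alias = {}  # temporary register -> read-only register it currently copies
    out = []
    for line in lines:
        op, rest = line.split(None, 1)
        dst, *srcs = [tok.strip() for tok in rest.split(",")]

        # Read every source through any pending copy: r1.x becomes c0.x.
        new_srcs = []
        for s in srcs:
            sign = "-" if s.startswith("-") else ""
            name, dot, comps = s[len(sign):].partition(".")
            new_srcs.append(sign + alias.get(name, name) + dot + comps)

        # Writing to a register ends any copy it was holding.
        dst_reg = dst.partition(".")[0]
        if dst_reg in alias:
            pending = alias.pop(dst_reg)
            if "." in dst:
                # A masked write keeps the other components, so the
                # copy is still needed and has to be emitted after all.
                out.append(f"mov {dst_reg}, {pending}")

        src0 = new_srcs[0]
        if (op == "mov" and dst[0] == "r" and "." not in dst
                and src0[0] in "cv" and "." not in src0):
            alias[dst] = src0  # remember the copy instead of emitting it
            continue

        out.append(f"{op} {dst}, {', '.join(new_srcs)}")
    return out

print(forward_movs(["mov r1, c0", "mov r2, v2", "dp3 r0.x, r1, r2"]))
# ['dp3 r0.x, c0, v2']
```

Copies into output registers, and copies that use swizzles or negation, are deliberately left alone.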

Another common optimisation occurs where a mul (multiply) and an add instruction occur in sequence. In this case, the vertex shader language already has a mad (multiply and add) instruction. So, the following code:

mul r0, r1, r2
add r0, r0, r3

can be optimised to:

mad r0, r1, r2, r3
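
A compiler would typically find this pattern with a peephole pass that looks at neighbouring instructions. The following is a minimal sketch of that idea, not a description of how any particular compiler does it. The function name `fuse_mul_add` is hypothetical, and the pass only fuses when the add overwrites the mul's destination, so the intermediate product cannot be needed later.

```python
def parse(line):
    """Split 'mul r0, r1, r2' into ('mul', ['r0', 'r1', 'r2'])."""
    op, rest = line.split(None, 1)
    return op, [tok.strip() for tok in rest.split(",")]

def reg_name(operand):
    """Register name without negation or swizzle: '-r0.x' -> 'r0'."""
    return operand.lstrip("-").partition(".")[0]

def fuse_mul_add(lines):
    """Rewrite 'mul d, a, b' followed by 'add d, d, c' as 'mad d, a, b, c'."""
    out = []
    i = 0
    while i < len(lines):
        if i + 1 < len(lines):
            op1, args1 = parse(lines[i])
            op2, args2 = parse(lines[i + 1])
            if op1 == "mul" and op2 == "add":
                dst = args1[0]
                others = [s for s in args2[1:] if s != dst]
                safe = (
                    "." not in dst              # whole-register writes only
                    and args2[0] == dst         # add overwrites the product
                    and len(others) == 1        # product is used exactly once
                    and reg_name(others[0]) != dst
                )
                if safe:
                    out.append(f"mad {dst}, {args1[1]}, {args1[2]}, {others[0]}")
                    i += 2
                    continue
        out.append(lines[i])
        i += 1
    return out

print(fuse_mul_add(["mul r0, r1, r2", "add r0, r0, r3"]))
# ['mad r0, r1, r2, r3']
```

If the add wrote to a different register, the product in the mul's destination might still be read afterwards, so the sketch leaves that pair untouched.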

So, in an environment where every instruction counts, it is fairly important that our high level language compiler spots these optimisations and gives us back the shortest possible sequence of instructions.

Having said all that, there is no real reason why a compiler couldn't do the same or better than a human every single time it compiles a piece of code. All it needs are some seriously good programmers, and a lot of time.