GeForceFX: NVIDIA goes Hollywood?

NVIDIA 2.0+

The programmability of the pixel and vertex shaders clearly exceeds the requirements of the DirectX 9 specification, which NVIDIA emphasizes by adding a "+" to the version designation. Introducing all of the features and possibilities of these new shaders is well beyond the scope of this article; in the end, only shader programmers will be able to judge the real value of these features.

A screenshot of Ogre, the character in the Spellbound movie YEAH. NVIDIA used it in a real-time demo.

An overview of the GeForceFX's extensions:

GeForceFX Vertex Shader 2.0+

Extensions compared to DirectX 9 (standard):

  • 256 stored program instructions (was 128);
  • 256 constants (was 96);
  • Vector address register (was a scalar);
  • Maximum number of instructions that can be executed per shader is now 65,536.
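
To put the larger constant budget in perspective, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that are not part of NVIDIA's material: each bone matrix is stored as a 4x3 matrix occupying three 4-component constant registers, and the `reserved` parameter (registers set aside for things like transform matrices or light data) is purely hypothetical. The function name `max_bones` is made up for this illustration.

```python
# Rough estimate of how many skinning bone matrices fit in the vertex
# shader's constant registers. All names and defaults are illustrative.

REGISTERS_PER_BONE = 3  # assumption: a 4x3 matrix stored as three vec4 rows


def max_bones(total_constants, reserved=0, registers_per_bone=REGISTERS_PER_BONE):
    """Return how many bone matrices fit after `reserved` registers are set aside."""
    available = total_constants - reserved
    if available < 0:
        raise ValueError("reserved exceeds the total number of constants")
    return available // registers_per_bone


if __name__ == "__main__":
    for total in (96, 256):
        print(total, "constants ->", max_bones(total), "bones (nothing reserved)")
```

Under these assumptions, 96 constants hold at most 32 bone matrices, while 256 constants hold at most 85. Real shaders reserve some registers for other data, so the practical numbers would be lower in both cases.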

The highlights, as presented by NVIDIA:

  • Up to 65,536 vertex instructions executed per vertex (up to 256 static instructions per shader)
    The CineFX shading engine exposes an unprecedented amount of vertex processing capabilities. In addition to doubling the instruction storage, the addition of control flow dramatically increases the amount of actual computation that can occur for each vertex. This flexibility reduces the total number of vertex shaders required by an application.
  • Up to 256 vector constants
    The number of constant registers available in the CineFX vertex shader has more than doubled - from 96 up to 256 quad words! This increase allows for substantially more bone matrices for matrix palette skinning and lots more simultaneous light sources.
  • Sixteen temporary vector registers
    Temporary register storage has increased by 33% from 12 to 16. This temporary storage is particularly helpful with the larger programs supported by the CineFX engine.
  • Up to 64 separate loops
    The CineFX vertex shading engine makes for simpler programs by supporting fully dependent looping and branching (including nested loops and branches) with up to 64 unique branch targets in a single shader program. Looping over all light sources and then branching to the appropriate light type is now a breeze.
  • Per-component condition codes and write masks
    Condition codes are the machinery behind data-dependent branching, but they can also improve the performance and simplify the code for conditional assignments.
  • Call and return (subroutines)
    In addition to the CineFX branching capabilities, the vertex processor supports full subroutine CALL/RETURN semantics, with a call stack up to four levels deep.
  • Loops and branching for both static and dynamic control flow
    Fully general looping and branching (along with dependent data reference) are what make the CineFX vertex shading engine so flexible and powerful.
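
The control-flow features listed above can be illustrated with a small CPU-side sketch in Python. This is not shader code and not NVIDIA's implementation; it only mimics the structure such a vertex program might have: one loop over all light sources, a branch on the light type, and a per-component conditional write loosely modeled on condition codes with write masks. The `Light` class, the light kinds and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        return v  # leave a zero vector untouched
    return (v[0] / length, v[1] / length, v[2] / length)


@dataclass
class Light:
    kind: str         # hypothetical light types: "directional" or "point"
    vector: Vec3      # direction toward the light, or the light's position
    intensity: float


def shade_vertex(position: Vec3, normal: Vec3, lights: Sequence[Light]) -> float:
    """Accumulate diffuse lighting: one loop, one branch per light type."""
    total = 0.0
    for light in lights:                      # loop over all light sources
        if light.kind == "directional":       # branch to the light type
            to_light = normalize(light.vector)
        elif light.kind == "point":
            to_light = normalize(sub(light.vector, position))
        else:
            raise ValueError(f"unknown light kind: {light.kind}")
        total += light.intensity * max(0.0, dot(normal, to_light))
    return total


def masked_write(dst: Sequence[float], src: Sequence[float],
                 cond: Sequence[float]) -> Tuple[float, ...]:
    """Per-component conditional assignment: take src where cond < 0, else keep dst."""
    return tuple(s if c < 0.0 else d for d, s, c in zip(dst, src, cond))
```

Without loops and branches, an application would typically need a separate shader for each combination of light count and light type. With them, a single program along the lines of `shade_vertex` can cover every combination, which is the reduction in shader count that NVIDIA describes.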