GeForceFX: NVIDIA goes Hollywood?

GeForceFX Pixel Shader 2.0+

Extensions compared to DirectX 9 (standard):

  • Registers and instructions can be 12 bit fixed point, 16 bit floats, or 32 bit floats;
  • Any number of texture fetches from up to 16 unique textures;
  • 1,024 instructions per rendering pass;
  • Eight texture coordinates (up to 16 active textures);
  • If the target is a float surface, the float value is "width converted" to match the render target and then stored (no blending to float surfaces is allowed);
  • If the source is a float surface, no filtering can be performed on the way into the pixel unit (float values cannot be bilinearly filtered);
  • Type and width conversions are all free;
  • All pixels in a batch execute in the same number of clock cycles.
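
To give a feel for what the three register formats mean in practice, here is a rough Python sketch that rounds a value to each precision and compares the error. The layout of the 12 bit fixed point format is an assumption on our part (one sign bit, one integer bit, ten fractional bits, covering the range from -2 to just under 2); the list above only states its width. The function names are ours as well and are not part of any NVIDIA or DirectX API.

```python
import struct

def to_fx12(x):
    """Round x to 12 bit fixed point (ASSUMED s1.10 layout, range [-2, 2))."""
    step = 1.0 / 1024                      # ten fractional bits
    q = round(x / step)
    q = max(-2048, min(2047, q))           # clamp to the 12 bit two's-complement range
    return q * step

def to_fp16(x):
    """Round x to an IEEE 754 half-precision (16 bit) float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round x to an IEEE 754 single-precision (32 bit) float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

if __name__ == "__main__":
    value = 0.1
    for name, convert in (("fx12", to_fx12), ("fp16", to_fp16), ("fp32", to_fp32)):
        stored = convert(value)
        print(f"{name}: stored {stored!r}, error {abs(stored - value):.3e}")
```

The narrower the format, the larger the rounding error, which is the trade-off a shader author makes when choosing a lower precision to save register space.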

The highlights, as presented by NVIDIA:

  • Introduces new instruction set for pixel shading: instructions previously reserved for vertex processing are now available for pixel shading, extended with instructions necessary for pixel processing.
  • Removes existing limitations: programs can be longer (up to 1,024 instructions) with up to 16 textures per pixel and unlimited levels of dependent texture lookups.
  • Vastly expands the number of pixel operations: up to 1,024 pixel operations; per-component swizzling; per-component conditional write masks; arbitrary texture filters; and other advanced instructions. DirectX 8 supported eight instructions. The latest API functionality, enabled by the newest generation of competing hardware, improves on this somewhat with support for up to 64 instructions. The NVIDIA CineFX engine, with up to 1,024 instructions, supports truly long shader programs to achieve stunning effects and shader possibilities.
  • Enhances fragment program storage: stored in video memory, unlike vertex programs, bringing costs down for managing lots of fragment programs.
  • Up to 16 texture maps. The NVIDIA CineFX engine allows fetching from up to 16 unique texture maps in a single pixel shader program. These textures can be anything that defines the underlying surface properties; examples include bump maps, gloss/specularity maps, environment maps, shadow maps, and albedo maps.
  • Up to 1024 texture instructions per shader. Previous architectures tightly bound the number of unique texture maps with the number of texture fetches available. The NVIDIA CineFX engine relaxes this restriction, and allows up to 1024 texture fetch instructions in a single shader, sourcing up to 16 unique textures. This enables a host of new effects which rely on multiple texture accesses:
      • Soft shadows. Soft shadows can now be created by taking an arbitrarily large number of samples from a shadow map and using them to generate a filtered shadow result.
      • Framebuffer post-processing effects. A number of interesting effects can be created by taking multiple texture samples from the framebuffer. Blurs, halos, and non-photorealistic rendering effects such as toon shading and painterly effects are now possible.
      • Complex filters. Higher-quality filtering can be performed on texture lookups. A bicubic filter, for example, requires 16 samples from the same texture.
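
The soft shadow idea mentioned above, taking many samples from one shadow map and filtering the result, can be sketched in a few lines of Python. This is only an illustration of the general technique (often called percentage-closer filtering), not NVIDIA's implementation; the function name `soft_shadow`, its parameters, and the plain nested-list shadow map are all made up for this example.

```python
def soft_shadow(shadow_map, x, y, frag_depth, radius=1, bias=0.0):
    """Return the lit fraction (0.0 = fully shadowed, 1.0 = fully lit).

    shadow_map is a list of rows holding the depth of the nearest occluder
    as seen from the light. Every tap in a (2*radius + 1)^2 window is compared
    against the fragment's depth, and the comparison results are averaged.
    """
    height = len(shadow_map)
    width = len(shadow_map[0])
    lit = 0
    taps = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Clamp the tap position to the edge of the map.
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            # The fragment is lit for this tap if nothing sits in front of it.
            if frag_depth - bias <= shadow_map[sy][sx]:
                lit += 1
            taps += 1
    return lit / taps

if __name__ == "__main__":
    # Left half of the map is covered by an occluder at depth 0.3.
    shadow_map = [[0.3, 0.3, 1.0, 1.0] for _ in range(4)]
    for x in range(4):
        print(x, soft_shadow(shadow_map, x, 1, frag_depth=0.5))
```

With `radius=1` each pixel costs nine fetches from the same map, and the count grows quadratically with the radius, which is why a high limit on texture instructions matters for this kind of effect.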

We'll refrain from assessing each of these extensions at this point. ATi is of the opinion that current hardware is unable to process longer and more complex shaders for games in an acceptable amount of time. Of course, NVIDIA claims the opposite is true. In the end, it is up to game developers to shed some light on this issue by utilizing the possibilities offered by the new GeForceFX - or by not doing so.