Performance Leap: NVIDIA GeForce 6800 Ultra

Pixel Processing Engine

This diagram shows the chip layout of the GeForce 6800 graphics processor.

The architecture of the NV40's pixel shader engine is practically a complete redesign and shares almost no similarities with that of its predecessor. After the lengthy discussion about the NV35/38 pixel pipeline specification during the past year, NVIDIA settled on describing that chip as a 4x2 / 8x0 architecture: 4 pixels per clock with two textures each, or 8 pixels per clock Z-only. The NV40, on the other hand, is a 16x1 (16 pixels per clock color & Z) / 32x0 (32 pixels per clock Z-only) design. That means NV40 sports 16 real pixel pipelines.

For clarification, here are some examples of what this means in the real world. Doom 3, for example, makes extensive use of stencil shadows. To render shadow volumes, only Z and stencil values are written, with no color output. As a result, NV40 can calculate 32 pixels per clock in those situations, where NV35/38 could only render 8. 3DMark03's Game Test 1 is mostly single-textured. In this case, the NV40 can render 16 pixels per clock (NV35/38 = 4). Lastly, in Quake III, most objects are dual-textured, meaning NV40 needs two passes through its single texture unit per pipeline and can render 8 pixels per clock (NV35/38 = 4).
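The per-clock figures above follow from simple arithmetic on the pipeline configuration. The sketch below is a deliberately simplified model, not NVIDIA's own formula: the `CHIPS` table and the `pixels_per_clock` helper are hypothetical names introduced for illustration, and the model ignores clock speed, memory bandwidth, and shader instruction cost.

```python
from math import ceil

# Hypothetical, simplified description of each chip, taken from the
# "pipelines x texture units" notation used in the article:
# NV35/38 = 4x2 / 8x0, NV40 = 16x1 / 32x0.
CHIPS = {
    "NV35/38": {"pipes": 4, "tmus_per_pipe": 2, "z_only": 8},
    "NV40": {"pipes": 16, "tmus_per_pipe": 1, "z_only": 32},
}


def pixels_per_clock(chip, textures):
    """Estimate peak pixels per clock for a given number of texture layers.

    textures == 0 stands for Z/stencil-only rendering (e.g. shadow volumes).
    """
    cfg = CHIPS[chip]
    if textures == 0:
        return cfg["z_only"]
    # Each pipeline needs one pass per group of textures its units can apply.
    passes = ceil(textures / cfg["tmus_per_pipe"])
    return cfg["pipes"] / passes


for chip in CHIPS:
    for textures, label in [(0, "Z-only"), (1, "single-textured"), (2, "dual-textured")]:
        print(f"{chip:8s} {label:16s} {pixels_per_clock(chip, textures):g} pixels/clock")
```

Under these assumptions, the model reproduces the numbers quoted for the stencil-shadow, single-textured, and dual-textured cases on both chips.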

Unlike its forerunner, the NV40's pixel shader pipelines are actually geared toward full 32-bit floating-point precision. While the chip also supports the half-precision modes of the NV3x series, it is no longer dependent upon them to attain its peak performance.

Game developers were often forced to reduce the shader precision of their games to FP16 or FX12 to reach playable performance levels on NV35 / 38 hardware. This is now history with the advent of the NV40. Not unlike ATi's R360, which always calculates shaders at FP24 precision, NV40 delivers full shader performance in FP32. While FP16 shaders may still offer a slight performance advantage in some special cases, the performance delta will be much less pronounced than on NV35 / 38.
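To get a feel for what dropping from FP32 to FP16 costs in accuracy, the following sketch rounds a value through both formats using Python's standard `struct` module. The helper names and the sample value 1024.3 (think of a coordinate on a large texture) are assumptions chosen for illustration. FP24 and FX12 are not modeled here, since `struct` has no such formats.

```python
import struct


def round_to_fp16(x):
    """Round a Python float to IEEE 754 half precision (10-bit mantissa)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x):
    """Round a Python float to IEEE 754 single precision (23-bit mantissa)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


coord = 1024.3  # arbitrary sample value, assumed for illustration
fp16_error = abs(round_to_fp16(coord) - coord)
fp32_error = abs(round_to_fp32(coord) - coord)

print("FP16:", round_to_fp16(coord), "error:", fp16_error)
print("FP32:", round_to_fp32(coord), "error:", fp32_error)
```

Between 1024 and 2048, FP16 can only represent whole numbers, so the fractional part of the sample value is lost entirely, while the FP32 error stays below one ten-thousandth.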

Pixel Shader Summary

Pixel Shader Model                2.0        2.0a       2.0b       3.0
Dependent Texture Limit           4          No Limit   4          No Limit
Texture Instruction Limit         32         unlimited  unlimited  unlimited
Position Register                 -          -          -          Yes
Instruction Slots                 32+64      512        512        >= 512
Executed Instructions             32+64      512        512        65535
Interpolated Registers            2+8        2+8        2+8        10
Instruction Predication           -          Yes        -          Yes
Indexed Input Registers           -          -          -          Yes
Temp Registers                    12         22         32         32
Constant Registers                32         32         32         224
Arbitrary Swizzling               -          Yes        -          Yes
Gradient Instructions             -          Yes        -          Yes
Loop Count Register               -          -          -          Yes
Face Register (2-sided lighting)  -          -          -          Yes
Dynamic Flow Control Depth        -          -          -          24

NVIDIA's new chip completely fulfills the requirements of Microsoft's DirectX 9.0c spec, which demands 32-bit floating-point shader precision.