Performance Leap: NVIDIA GeForce 6800 Ultra
Pixel Processing Engine
This diagram shows the chip layout of the GeForce 6800 graphics processor.
The NV40's pixel shader engine is practically a complete redesign and shares almost nothing with that of its predecessor. After the past year's lengthy discussion about the NV35/38 pixel pipeline specification, NVIDIA has decided to define that design as a 4x2 / 8x0 architecture. The NV40, by contrast, is a 16x1 (16 pixels per clock, color and Z) / 32x0 (32 pixels per clock, Z-only) design. That means the NV40 sports 16 real pixel pipelines.
To clarify, here are some examples of what this means in the real world. Doom III makes extensive use of stencil shadows, and rendering shadow volumes requires only Z/stencil operations. In those situations, NV40 can process 32 pixels per clock, where NV35/38 could only manage 8. 3DMark03's Game Test 1 is mostly single-textured. In this case, NV40 can render 16 pixels per clock (NV35/38: 4). Lastly, most objects in Quake III are dual-textured. With one texture unit per pipeline, NV40 needs two clocks per pixel and renders 8 pixels per clock, while NV35/38, with two texture units on each of its four pipelines, stays at 4.
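The arithmetic behind these figures can be sketched in a few lines of Python. This is a simplified model, not NVIDIA's specification: it assumes that in an "AxB / Cx0" description, A is the number of pixel pipelines, B the texture units per pipeline, and C the Z-only rate, and that each extra texture layer beyond B costs another clock. The names `PipelineConfig` and `pixels_per_clock` are made up for this illustration.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical description of an 'AxB / Cx0' pixel pipeline layout."""
    name: str
    pipelines: int          # A: pixel pipelines (color + Z)
    tmus_per_pipeline: int  # B: texture units per pipeline
    z_only_rate: int        # C: pixels per clock when only Z/stencil is used


def pixels_per_clock(cfg: PipelineConfig, texture_layers: int) -> float:
    """Peak theoretical pixels per clock for a given number of texture layers.

    texture_layers == 0 stands for Z/stencil-only rendering (e.g. shadow volumes).
    """
    if texture_layers < 0:
        raise ValueError("texture_layers must be >= 0")
    if texture_layers == 0:
        return float(cfg.z_only_rate)
    # Assumption: each pipeline needs one clock per group of B texture layers.
    clocks_per_pixel = ceil(texture_layers / cfg.tmus_per_pipeline)
    return cfg.pipelines / clocks_per_pixel


NV40 = PipelineConfig("NV40", pipelines=16, tmus_per_pipeline=1, z_only_rate=32)
NV35 = PipelineConfig("NV35/38", pipelines=4, tmus_per_pipeline=2, z_only_rate=8)

if __name__ == "__main__":
    cases = [(0, "Z/stencil only"), (1, "single-textured"), (2, "dual-textured")]
    for layers, label in cases:
        for chip in (NV40, NV35):
            rate = pixels_per_clock(chip, layers)
            print(f"{label:16s} {chip.name:8s} {rate:g} pixels/clock")
```

These are peak rates only. Real-world throughput also depends on memory bandwidth, shader length, and clock speed, none of which this sketch models.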
Unlike its forerunner, the NV40's pixel shader pipelines are designed for full 32-bit floating-point precision. While the chip also supports the half-precision modes of the NV3x series, it no longer depends on them to attain its peak performance.
Game developers were often forced to reduce the shader precision of their games to FP16 or FX12 to reach playable performance levels on NV35 / 38 hardware. This is now history with the advent of the NV40. Not unlike ATi's R360, which always calculates shaders at FP24 precision, NV40 delivers full shader performance in FP32. While FP16 shaders may still offer a slight performance advantage in some special cases, the performance delta will be much less pronounced than on NV35 / 38.
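To give a feel for what is at stake when a shader drops from FP32 to FP16, the following Python sketch rounds a value to the mantissa width of each format. The bit layouts used here (10, 16, and 23 mantissa bits for FP16, FP24, and FP32) are the commonly cited ones and are an assumption of this example, not figures from NVIDIA or ATi material quoted in the article. The helper names are hypothetical, and exponent range limits are ignored.

```python
import math

# Assumed mantissa widths (explicit fraction bits), commonly cited layouts:
# FP16 = s1 e5 m10, FP24 = s1 e7 m16, FP32 = s1 e8 m23.
MANTISSA_BITS = {"FP16": 10, "FP24": 16, "FP32": 23}


def relative_step(mantissa_bits: int) -> float:
    """Spacing between adjacent representable values just above 1.0."""
    return 2.0 ** -mantissa_bits


def quantize(value: float, mantissa_bits: int) -> float:
    """Round value to the nearest number with the given mantissa width.

    Only the mantissa is limited; overflow and denormals are not modeled.
    """
    if value == 0.0:
        return 0.0
    m, e = math.frexp(value)            # value == m * 2**e, 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)    # +1 for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


if __name__ == "__main__":
    sample = 0.1  # e.g. a texture coordinate or lighting term
    for name, bits in MANTISSA_BITS.items():
        q = quantize(sample, bits)
        print(f"{name}: step {relative_step(bits):.3e}, "
              f"0.1 stored as {q!r}, error {abs(q - sample):.3e}")
```

The point is only the relative size of the rounding error: every mantissa bit removed doubles it, which is why long shaders with many dependent operations are the first to show artifacts at reduced precision.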
Pixel Shader Summary

Pixel Shader Model | 2.0 | 2.0a | 2.0b | 3.0 |
---|---|---|---|---|
Dependent Texture Limit | 4 | No Limit | 4 | No Limit |
Texture Instruction Limit | 32 | No Limit | No Limit | No Limit |
Position Register | - | - | - | Yes |
Instruction Slots | 32+64 | 512 | 512 | >= 512 |
Executed Instructions | 32+64 | 512 | 512 | 65535 |
Interpolated Registers | 2+8 | 2+8 | 2+8 | 10 |
Instruction Predication | - | Yes | - | Yes |
Indexed Input Registers | - | - | - | Yes |
Temp Registers | 12 | 22 | 32 | 32 |
Constant Registers | 32 | 32 | 32 | 224 |
Arbitrary Swizzling | - | Yes | - | Yes |
Gradient Instructions | - | Yes | - | Yes |
Loop Count Register | - | - | - | Yes |
Face Register (2-sided lighting) | - | - | - | Yes |
Dynamic Flow Control Depth | - | - | - | 24 |
NVIDIA's new chip completely fulfills the requirements of Microsoft's DirectX 9.0c spec, which demands 32-bit floating-point shader precision.