NVIDIA Gets Shady With CineFX 4.0
To a layman, the block diagrams of the G70's data paths look much like those of the NV40, apart from the extra shading units. A closer look, however, shows that the G70 was designed to handle more mathematical calculations per clock.
Part of the goal of the 7800 GTX was to raise minimum frame rates so that the gaming experience is consistently enjoyable, not just some of the time. To accommodate the higher overhead of many of today's and tomorrow's games, additional Arithmetic Logic Units (ALUs) were required to boost performance in the 3D rendering pipeline.
From this block diagram, you can see where the ALUs were added to the pixel shading pipeline. Each mini-ALU supports multiply-add (MAD) instructions. NVIDIA claims that vertex shader performance has increased "up to 30%" in scalar ops because of the single-cycle MADs. With each clock, four floating-point MADs can be performed at full speed.
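To make the flop counting concrete, here is a minimal Python sketch of a four-component multiply-add. It is purely illustrative: the function name `vec4_mad` and the flop tally are assumptions made for this example, not NVIDIA's hardware interface. Each component costs one multiply and one add, which is how a single Vector-4 MAD comes to be counted as 8 flops.

```python
def vec4_mad(a, b, c):
    """Component-wise a * b + c on 4-element vectors.

    Returns (result, flops). Hypothetical helper for illustration only.
    """
    if not (len(a) == len(b) == len(c) == 4):
        raise ValueError("expected three 4-component vectors")
    result = [x * y + z for x, y, z in zip(a, b, c)]
    flops = 2 * len(result)  # one multiply + one add per component
    return result, flops


result, flops = vec4_mad(
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
    [1.0, 1.0, 1.0, 1.0],
)
print(result, flops)
```

In this counting scheme, a scalar MAD would contribute 2 flops and a Vector-4 MAD would contribute 8.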
| ||Pixel Shader||Vertex Shader|
|Architecture||2x Vector-4 + Scalar + Norm||Vector-4 + Scalar|
|Vector||4 MAD / 8 flops|
|Instructions / ALU||5||2|
|Operations / ALU||10||5|
|Instructions / Clock||120||16|
|Flops / Clock||648||80|
|Clock Frequency||430 MHz||430 MHz|
|Instructions / Second||51.6B||6.88B|
|Operations / Second||103.2B||17.2B|
|Floating point operations / Second||278.6B||34.4B|
|Bilinear Filtered Textures per clock||24|
|Bilinear Texel Fill Rate||10.3B|
|Texture Bandwidth (FB + PCI-E)||44.4 GB/s|
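The per-second rows in the table follow from multiplying the per-clock figures by the 430 MHz clock. The short Python sketch below reproduces that arithmetic; the names `CLOCK_HZ`, `per_clock`, and `per_second_billions` are hypothetical and chosen only for this example, and "B" is assumed to mean 10^9.

```python
CLOCK_HZ = 430e6  # 430 MHz, from the table

# Per-clock figures copied from the table: (pixel shader, vertex shader)
per_clock = {
    "instructions": (120, 16),
    "flops": (648, 80),
}


def per_second_billions(count_per_clock, clock_hz=CLOCK_HZ):
    """Convert a per-clock count into billions per second."""
    return count_per_clock * clock_hz / 1e9


for name, (pixel, vertex) in per_clock.items():
    print(
        f"{name}: pixel {per_second_billions(pixel):.2f}B, "
        f"vertex {per_second_billions(vertex):.2f}B"
    )

# Bilinear texel fill rate: 24 filtered textures per clock
print(f"texel fill: {per_second_billions(24):.2f}B")
```

The results agree with the table after rounding: 51.6B and 6.88B instructions per second, roughly 278.6B and 34.4B flops per second, and about 10.3B bilinear texels per second. By the same arithmetic, the 103.2B and 17.2B operations-per-second figures imply 240 and 40 operations per clock respectively.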
Additionally, NVIDIA says it has made many other improvements throughout the lower-level pipeline. Two of these show up directly in benchmark scores and real-world performance: lower latencies and greater computational capability per clock. If everything works together with less delay, the whole system benefits.