24 Pipelines of Power! NVIDIA 7800 GTX

NVIDIA Gets Shady With CineFX 4.0

To a layman, the block diagrams of the G70's data paths look much like those of the NV40, apart from the extra shading units. A closer look, however, makes it clear that the G70 was designed to handle more mathematical calculations per clock.

Part of the goal of the 7800 GTX was to raise minimum frame rates so that the gaming experience is consistently fun, not just some of the time. To accommodate the higher overhead of many of today's and tomorrow's games, additional Arithmetic Logic Units (ALUs) were required to boost performance in the 3D rendering pipeline.

From this block diagram, you can see where the ALUs were added to the pixel shading pipeline. Each mini-ALU implements multiply-add (MAD) instructions. NVIDIA claims that vertex shader performance on scalar operations has increased "up to 30%" because of the single-cycle MADs. Each clock, four floating-point MADs can be performed at full speed.
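To make the MAD arithmetic concrete, here is a minimal Python sketch of a four-component multiply-add. It models only the math, not the hardware; the function name `vec4_mad` and the constants are hypothetical, chosen for illustration. It also shows why four MADs are usually counted as eight floating-point operations: each MAD is one multiply plus one add.

```python
def vec4_mad(a, b, c):
    """Component-wise multiply-add: a[i] * b[i] + c[i] for 4-component vectors."""
    if not (len(a) == len(b) == len(c) == 4):
        raise ValueError("expected three 4-component vectors")
    return [x * y + z for x, y, z in zip(a, b, c)]


# Counting convention (an assumption stated here, matching "4 MAD / 8 flops"):
# one MAD = one multiply + one add = 2 floating-point operations.
FLOPS_PER_MAD = 2
VEC4_MAD_FLOPS = 4 * FLOPS_PER_MAD

# Illustrative inputs only; any three 4-component vectors would do.
result = vec4_mad(
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
    [1.0, 1.0, 1.0, 1.0],
)
print(result, VEC4_MAD_FLOPS)
```

On the GPU, the whole vector operation is issued as a single instruction; the Python loop here is only a stand-in for that parallel step.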

| | Pixel Shader | Vertex Shader |
|---|---|---|
| ALUs | 24 | 8 |
| Architecture | 2x Vector-4 + Scalar + Norm | Vector-4 + Scalar |
| Vector | | 4 MAD / 8 flops |
| Scalar | | 2 flops |
| Instructions / ALU | 5 | 2 |
| Operations / ALU | 10 | 5 |
| Flops / ALU | 27 | 10 |
| Instructions / Clock | 120 | 16 |
| Operations / Clock | 240 | 40 |
| Flops / Clock | 648 | 80 |
| Clock Frequency | 430 MHz | 430 MHz |
| Instructions / Second | 51.6B | 6.88B |
| Operations / Second | 103.2B | 17.2B |
| Floating point operations / Second | 278.6B | 34.4B |
| Bilinear Filtered Textures per clock | 24 | |
| Bilinear Texel Fill Rate | 10.3B | |
| Texture Bandwidth (FB + PCI-E) | 44.4 GB/s | |
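The per-clock and per-second figures in the table follow from simple multiplication: ALU count times the per-ALU figure gives the per-clock number, and that times the 430 MHz clock gives the per-second number. The Python sketch below reproduces that arithmetic from the table's own inputs. The names (`SHADERS`, `peak_rates`) are hypothetical, and the results are theoretical peaks, not measured performance.

```python
CLOCK_MHZ = 430  # core clock, taken from the table

# Per-ALU figures copied from the table:
# (ALU count, instructions per ALU, operations per ALU, flops per ALU)
SHADERS = {
    "pixel": (24, 5, 10, 27),
    "vertex": (8, 2, 5, 10),
}


def peak_rates(alus, instructions, operations, flops, clock_mhz=CLOCK_MHZ):
    """Return (per-clock counts, per-second counts in billions)."""
    per_clock = {
        "instructions": alus * instructions,
        "operations": alus * operations,
        "flops": alus * flops,
    }
    # count * MHz gives millions per second; divide by 1000 for billions.
    per_second_billions = {
        key: value * clock_mhz / 1000 for key, value in per_clock.items()
    }
    return per_clock, per_second_billions


rates = {name: peak_rates(*spec) for name, spec in SHADERS.items()}

# Bilinear texel fill rate: 24 filtered textures per clock at 430 MHz.
texel_fill_billions = 24 * CLOCK_MHZ / 1000

for name, (per_clock, per_second) in rates.items():
    print(name, per_clock, per_second)
print("texel fill (billions/s):", texel_fill_billions)
```

The pixel shader flops figure works out to 278.64 billion per second and the texel fill rate to 10.32 billion per second; the table rounds these to 278.6B and 10.3B.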

Additionally, NVIDIA says it has made many other improvements throughout the lower-level pipeline. Two of these show up directly in benchmark scores and real-world performance: lower latencies and more computational capability per clock. If everything works together with less delay, the whole system benefits.