GeForce 8600: DirectX 10 For The Masses

TMU Tweaking

Nvidia has not been asleep at the wheel. It was the first to introduce a DX10-capable graphics processor. Still, the phrase "what have you done for me lately" is not misplaced, as we should expect a lot from the companies we pay hundreds of dollars for computer hardware. While AMD/ATI has yet to release a single DX10 card to the market, Nvidia now has its second delivery in the form of the 169 square millimeter G84 and G86 processors.

The answer to my first question is yes. This new offering delivers upgrades over the previous card. Other than the die shrink to an 80 nm process and the use of less silicon, the GeForce 8600 has two primary changes. One is to the 3D engine and the second is a complete overhaul of Nvidia's PureVideo processing engine. The combination of both advances suggests that Nvidia has specific consumers in mind for these products. On one hand, DX10 reaches more consumers, as the price of entry drops by $70 for the average gaming enthusiast and by as much as $150 for the budget gamer. On the other, the home theater consumer gets a much better card at an attractive price.

Looking past the marketing, the core difference in the 3D engine is a "tweak" to the amount of texture processing the graphics core can deliver per clock cycle. Each texture mapping unit (TMU) on the GeForce 8800 (G80) can deliver up to four texture addresses and eight filtering operations per clock. Each TMU on the GeForce 8600 can deliver twice the number of texture addresses while maintaining the same number of filtering operations (8 and 8 vs. 4 and 8). What does this mean? G80 can deliver 64 filtering operations per clock but only 32 texture addresses per clock. G84 and G86 were built so that texture addressing operations per clock match the existing filtering operations per clock.
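To make the ratio concrete, here is a minimal Python sketch of the per-clock arithmetic. It uses only the per-TMU figures quoted above; the count of eight units on G80 is inferred from the 64 and 32 totals rather than quoted, and the function name is purely illustrative.

```python
# Per-clock texture throughput from the per-TMU figures in the text.
# Assumption: G80's unit count (8) is inferred as 64 filtering ops / 8 per unit.

def per_clock(units, addresses_per_unit, filters_per_unit):
    """Return (addresses, filtering ops, address-to-filter ratio) per clock."""
    addresses = units * addresses_per_unit
    filters = units * filters_per_unit
    return addresses, filters, addresses / filters

# G80: 4 addresses and 8 filtering ops per TMU.
g80_total = per_clock(units=8, addresses_per_unit=4, filters_per_unit=8)

# G84/G86: 8 addresses and 8 filtering ops per TMU (shown for a single unit).
g84_unit = per_clock(units=1, addresses_per_unit=8, filters_per_unit=8)

print("G80 total per clock:", g80_total)
print("G84/G86 per TMU per clock:", g84_unit)
```

The G80 line reproduces the 32 addresses and 64 filtering operations mentioned above, a 1:2 ratio, while a G84/G86 unit comes out at 1:1.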

Generally, textures are two-dimensional color arrays whose values are called texture elements, or texels. Each texel has its own unique address in the texture, given by a numeric column and row value. This is similar to graphing equations in Quadrant I of a Cartesian coordinate system in geometry class.

Texture coordinates are in texture space. When a texture is applied to a primitive, the texel addresses are mapped to the object. These coordinates are then translated to screen coordinates, or pixel locations. In the Direct3D API, texels are mapped from texture space directly to pixels in screen space, and the process is actually an inverse mapping: for each pixel in screen space, the corresponding texel position in texture space is calculated, and the color at or near that point is sampled via texture filtering (Linear, Bilinear, Trilinear and Anisotropic). Consequently, building a TMU that handles addressing and filtering at a 1:1 ratio could prove beneficial. (We already have some ideas for how to test whether it does, but that is beyond the scope of this article.)
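The inverse mapping described above can be sketched in a few lines of Python. This is an illustration only, not how any driver or GPU implements it: the function names are hypothetical, the texture holds a single channel, addresses are clamped at the edges, and texel centers are assumed to sit at half-integer positions.

```python
import math

def texel(texture, col, row):
    """Fetch one texel by column/row address, clamping to the texture edges."""
    height = len(texture)
    width = len(texture[0])
    col = min(max(col, 0), width - 1)
    row = min(max(row, 0), height - 1)
    return texture[row][col]

def pixel_to_uv(px, py, screen_w, screen_h):
    """Hypothetical case: the texture covers the whole screen, so a pixel
    center maps straight to normalized texture coordinates in [0, 1]."""
    return (px + 0.5) / screen_w, (py + 0.5) / screen_h

def sample_bilinear(texture, u, v):
    """Bilinear sample at normalized coordinates (u, v)."""
    height = len(texture)
    width = len(texture[0])
    # Continuous texel-space position (assumes texel centers at +0.5).
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    # Address the four neighboring texels, then blend them into one result.
    top = texel(texture, x0, y0) * (1 - fx) + texel(texture, x0 + 1, y0) * fx
    bottom = texel(texture, x0, y0 + 1) * (1 - fx) + texel(texture, x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy

# A 2x2 texture: sampling its exact middle blends all four texels equally.
tex = [[0.0, 1.0],
       [2.0, 3.0]]
u, v = pixel_to_uv(0, 0, 1, 1)      # the single pixel of a 1x1 "screen"
print(sample_bilinear(tex, u, v))   # 1.5
```

Starting from the pixel and working back to the texture is what makes the mapping "inverse": every pixel gets exactly one filtered color, no matter how the texture is stretched across the primitive.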

Below is a table containing key specifications for the existing G80 graphics processor as well as the new G84 and G86 processors.

| Specification | GeForce 8800 GTX | GeForce 8800 GTS | GeForce 8600 GTS | GeForce 8600 GT | GeForce 8500 GT | GeForce 8400 GS | GeForce 8300 GS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Fabrication Process | 90 nm | 90 nm | 80 nm | 80 nm | 80 nm | 80 nm | 80 nm |
| Number of Transistors (millions) | 681 | 681 | 289 | 289 | 210 | 210 | 210 |
| Core Clock (including dispatch, texture units, and ROP units) | 575 MHz | 500 MHz | 675 MHz | 540 MHz | 450 MHz | 450 MHz | 450 MHz |
| Shader Clock (Stream Processors) | 1.35 GHz | 1.20 GHz | 1.45 GHz | 1.19 GHz | 900 MHz | 900 MHz | 900 MHz |
| Stream Processors (#) | 128 | 96 | 32 | 32 | 16 | 16 | 8 |
| Memory Clock (MHz / data rate) | 900/1800 | 800/1600 | 1000/2000 | 700/1400 | 400/800 | 400/800 | 400/800 |
| Memory Interface | 384 bits | 320 bits | 128 bits | 128 bits | 128 bits | 64 bits | 64 bits |
| Memory Bandwidth | 86.4 GB/s | 64.0 GB/s | 32.0 GB/s | 22.4 GB/s | 12.8 GB/s | 6.4 GB/s | 6.4 GB/s |
| Frame Buffer Size | 768 MB | 640 MB | 256 MB | 256 MB | 256 MB | 128 MB or 256 MB | 128 MB or 256 MB |
| ROPs (#) | 24 | 20 | 8 | 8 | 8 | 4 | 4 |
| Texture Filtering Rate (texels per clock) | 64 | 48 | 16 | 16 | 8 | 8 | 8 |
| Texture Fill Rate (billions of bilinear filtered texels/sec) | 36.80 GT/s | 24.00 GT/s | 10.80 GT/s | 8.64 GT/s | 3.60 GT/s | 3.60 GT/s | 3.60 GT/s |
| HDCP Support | Yes | Yes | Yes | Optional | Optional | Optional | Optional |
| RAMDACs | 400 MHz | 400 MHz | 400 MHz | 400 MHz | 400 MHz | 400 MHz | 400 MHz |
| Bus Technology | PCI Express 1.1a | PCI Express 1.1a | PCI Express 1.1a | PCI Express 1.1a | PCI Express 1.1a | PCI Express 1.1a | PCI Express 1.1a |
| Available From | Retail | Retail | Retail | Retail | Retail | OEM | OEM |
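Two rows of the table follow directly from the others, which makes for a quick sanity check. The Python sketch below assumes the usual arithmetic (fill rate is core clock times filtered texels per clock; bandwidth is bus width in bytes times the effective memory data rate). The helper names are illustrative, and the input figures are copied from the table for five of the cards.

```python
# (core clock MHz, filtered texels per clock, memory bus bits, memory data rate MHz)
cards = {
    "GeForce 8800 GTX": (575, 64, 384, 1800),
    "GeForce 8800 GTS": (500, 48, 320, 1600),
    "GeForce 8600 GTS": (675, 16, 128, 2000),
    "GeForce 8600 GT":  (540, 16, 128, 1400),
    "GeForce 8500 GT":  (450, 8, 128, 800),
}

def fill_rate_gtexels(core_mhz, texels_per_clock):
    """Bilinear filtered texels per second, in billions (GT/s)."""
    return core_mhz * texels_per_clock / 1000

def bandwidth_gb_s(bus_bits, data_rate_mhz):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_bits / 8 * data_rate_mhz / 1000

for name, (core, texels, bus, rate) in cards.items():
    print(f"{name}: {fill_rate_gtexels(core, texels):.2f} GT/s, "
          f"{bandwidth_gb_s(bus, rate):.1f} GB/s")
```

Under those assumptions, the computed values line up with the Texture Fill Rate and Memory Bandwidth rows for each of these cards.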