Nvidia has not been asleep at the wheel. It was the first to introduce a DX10-capable graphics processor. Still, the question "what have you done for me lately?" is not misplaced, as we should expect a lot from the companies we pay hundreds of dollars for computer hardware. While AMD/ATI has yet to release a single DX10 card to the market, Nvidia now has its second delivery in the form of the 169 square millimeter G84 and G86 processors.
The answer to my first question is yes. This new offering delivers upgrades over the previous card. Other than the die shrink to an 80 nm process and the use of less silicon, the GeForce 8600 has two primary changes: one to the 3D engine, and a complete overhaul of Nvidia's PureVideo processing engine. The combination of the two suggests that Nvidia has specific consumer targets in mind for these products. On one hand, DX10 reaches more consumers, as this brings the price down by $70 for the average gaming enthusiast and by as much as $150 for the budget gamer. On the other, the home theater consumer gets a much better card at an attractive price.
Now looking past the marketing, the core difference to the 3D engine is a "tweak" to the amount of texture processing the graphics core can deliver per clock cycle. Each texture mapping unit (TMU) on the GeForce 8800 (G80) could deliver up to four texture addresses and eight filtering operations per clock. Each TMU on the GeForce 8600 can deliver twice the number of texture addresses while maintaining the same number of filtering ops (8 and 8 vs. 4 and 8). What does this mean? G80 (GeForce 8800) can deliver 64 filtering operations per clock but only 32 texture addresses per clock. G84 and G86 were built to match the texture addressing operations per clock with the existing filtering operations per clock.
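The per-clock arithmetic above can be checked with a short sketch. The per-unit figures (4 and 8 for G80, 8 and 8 for G84) come from the text. The unit counts of 8 and 2 are not stated in the article; they are assumptions, back-calculated from the 64 filtering operations per clock quoted for G80 and the 16 texels per clock listed for the GeForce 8600 in the specification table. The class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextureConfig:
    name: str
    units: int               # ASSUMED unit count, inferred from the article's totals
    addresses_per_unit: int  # texture addresses per unit per clock (from the text)
    filters_per_unit: int    # filtering operations per unit per clock (from the text)

    @property
    def addresses_per_clock(self) -> int:
        return self.units * self.addresses_per_unit

    @property
    def filters_per_clock(self) -> int:
        return self.units * self.filters_per_unit


G80 = TextureConfig("G80 (GeForce 8800 GTX)", units=8, addresses_per_unit=4, filters_per_unit=8)
G84 = TextureConfig("G84 (GeForce 8600)", units=2, addresses_per_unit=8, filters_per_unit=8)

for gpu in (G80, G84):
    # Ratio of addressing throughput to filtering throughput per clock.
    ratio = gpu.addresses_per_clock / gpu.filters_per_clock
    print(f"{gpu.name}: {gpu.addresses_per_clock} addresses, "
          f"{gpu.filters_per_clock} filter ops per clock (ratio {ratio:.1f})")
```

Under these assumptions G80 comes out at a 1:2 address-to-filter ratio and G84 at 1:1, which is the balance the paragraph describes.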
Generally, textures are two-dimensional arrays of color values called texture elements, or texels. Each texel has its own unique address in the texture, given by a numeric column and row value. This is similar to graphing points in Quadrant I of a Cartesian coordinate system in geometry class.
Texture coordinates are in texture space. When a texture is applied to a primitive, the texel address is mapped to the object. These coordinates are then translated to screen coordinates, or pixel locations. Direct3D skips the intermediate step and maps texels in texture space directly to pixels in screen space. The mapping is actually an inverse one: for each pixel in screen space, the corresponding texel position in texture space is calculated, and the color at or near that point is sampled via texture filtering (linear, bilinear, trilinear, or anisotropic). Consequently, building a TMU that can handle addressing and filtering at a 1:1 ratio could prove beneficial. (We already have some ideas for how to test whether it is, but that is beyond the scope of this article.)
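To make the inverse mapping concrete, here is a minimal software sketch of bilinear sampling. It is not how the hardware or Direct3D implements it. It assumes a single-channel texture stored as a list of rows, normalized coordinates `u` and `v` in the range 0 to 1, texel centers at half-integer positions, and clamping at the edges; the function name `sample_bilinear` is hypothetical.

```python
import math


def sample_bilinear(texture, u, v):
    """Return a bilinearly filtered value at normalized coordinates (u, v)."""
    height = len(texture)
    width = len(texture[0])

    # Inverse mapping step: turn the normalized coordinate into a continuous
    # position in texel space (ASSUMES texel centers sit at integer + 0.5).
    x = u * width - 0.5
    y = v * height - 0.5

    # Addressing step: locate the four texels surrounding that position.
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0  # horizontal blend weight
    fy = y - y0  # vertical blend weight

    def texel(col, row):
        # Clamp addresses that fall outside the texture (assumed edge mode).
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        return texture[row][col]

    # Filtering step: blend horizontally on both rows, then vertically.
    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


checker = [[0.0, 1.0],
           [2.0, 3.0]]
center_value = sample_bilinear(checker, 0.5, 0.5)    # equal blend of all four texels
corner_value = sample_bilinear(checker, 0.25, 0.25)  # lands exactly on the first texel center
```

The sketch separates the addressing work (finding which texels to read) from the filtering work (blending them), which are the two quantities the TMU discussion compares.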
Below is a table containing key specifications for the existing G80 graphics processor as well as the new G84 and G86 processors.
|Specification||GeForce 8800 GTX||GeForce 8800 GTS||GeForce 8600 GTS||GeForce 8600 GT||GeForce 8500 GT||GeForce 8400 GS||GeForce 8300 GS|
|Fabrication Process||90 nm||90 nm||80 nm||80 nm||80 nm||80 nm||80 nm|
|Number of Transistors (millions)||681||681||289||289||210||210||210|
|Core Clock (Including dispatch, texture units, and ROP units)||575 MHz||500 MHz||675 MHz||540 MHz||450 MHz||450 MHz||450 MHz|
|Shader Clock (Stream Processors)||1.35 GHz||1.20 GHz||1.45 GHz||1.19 GHz||900 MHz||900 MHz||900 MHz|
|Stream Processors (#)||128||96||32||32||16||16||8|
|Memory Clock (MHz / data rate)||900/1800||800/1600||1000/2000||700/1400||400/800||400/800||400/800|
|Memory Interface||384 Bits||320 Bits||128 Bits||128 Bits||128 Bits||64 Bits||64 Bits|
|Memory Bandwidth (GB/sec)||86.4 GB/s||64.0 GB/s||32.0 GB/s||22.4 GB/s||12.8 GB/s||6.4 GB/s||6.4 GB/s|
|Frame Buffer Size||768 MB||640 MB||256 MB||256 MB||256 MB||128 MB or 256 MB||128 MB or 256 MB|
|Texture filtering rate (texels per clock)||64||48||16||16||8||8||8|
|Texture Fill Rate (Billions of bilinear filtered texels/sec)||36.80 GT/s||24.00 GT/s||10.80 GT/s||8.64 GT/s||3.60 GT/s||3.60 GT/s||3.60 GT/s|
|RAMDACs||400 MHz||400 MHz||400 MHz||400 MHz||400 MHz||400 MHz||400 MHz|
|Bus Technology||PCI Express 1.1a||PCI Express 1.1a||PCI Express 1.1a||PCI Express 1.1a||PCI Express 1.1a||PCI Express 1.1a||PCI Express 1.1a|
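The memory bandwidth and texture fill rate rows appear to follow from the other rows in the table. The sketch below recomputes them using the usual formulas (bus width in bytes times data rate, and core clock times texels per clock). The function names are illustrative, and the formulas are an assumption about how the table was derived rather than something the article states.

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_mt_s):
    """Peak memory bandwidth in GB/s from bus width (bits) and data rate (MT/s)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mt_s / 1000


def texture_fill_rate_gt_s(core_clock_mhz, texels_per_clock):
    """Peak bilinear texture fill rate in billions of texels per second."""
    return core_clock_mhz * texels_per_clock / 1000


# (name, bus width in bits, data rate in MT/s, core clock in MHz, texels per clock)
cards = [
    ("GeForce 8800 GTX", 384, 1800, 575, 64),
    ("GeForce 8600 GTS", 128, 2000, 675, 16),
    ("GeForce 8600 GT", 128, 1400, 540, 16),
    ("GeForce 8500 GT", 128, 800, 450, 8),
]

for name, bus_bits, data_rate, core_mhz, texels in cards:
    bandwidth = memory_bandwidth_gb_s(bus_bits, data_rate)
    fill_rate = texture_fill_rate_gt_s(core_mhz, texels)
    print(f"{name}: {bandwidth:.1f} GB/s, {fill_rate:.2f} GT/s")
```

For the cards listed here, the recomputed values agree with the table entries.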