3D Benchmarking - Understanding Frame Rate Scores

3. The Huge Impact Of The Memory Bandwidth

Until fairly recently, the bandwidth of the local graphics memory was not much of an issue. Hardly any 3D-chip before NVIDIA's GeForce256 was ever really limited by its memory. When GeForce256 was released in October 1999, it came with SDR memory clocked at 166 MHz. The release of the famous 'GeForceDDR' cards, which used the very same chip with faster memory, showed how badly a fast 3D-chip can be stalled by slow memory. Things have become even worse with NVIDIA's latest high-end chip, GeForce2 GTS, and 3dfx's latest Voodoo5 5500 card suffers from the same problem, if anything a bit more severely.
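
To put a rough number on 'slow memory', peak memory bandwidth can be estimated as memory clock times bus width times transfers per clock. The sketch below is only an illustration: the 166 MHz clock comes from the text above, while the 128-bit bus width and the function name are assumptions made for this example, and the DDR line simply doubles the transfers at the same clock rather than describing any particular card.

```python
def peak_bandwidth_bytes_per_second(clock_mhz, bus_width_bits, transfers_per_clock=1):
    """Theoretical peak memory bandwidth in bytes per second.

    clock_mhz           -- memory clock in MHz (166 for the SDR GeForce256)
    bus_width_bits      -- width of the memory bus (128 is an assumption here)
    transfers_per_clock -- 1 for SDR, 2 for DDR
    """
    return clock_mhz * 1_000_000 * transfers_per_clock * bus_width_bits // 8


# SDR memory at 166 MHz on an assumed 128-bit bus.
sdr = peak_bandwidth_bytes_per_second(166, 128, transfers_per_clock=1)

# Hypothetical comparison: the same clock with DDR signalling.
ddr = peak_bandwidth_bytes_per_second(166, 128, transfers_per_clock=2)

print(f"SDR: {sdr / 1e9:.2f} GB/s")
print(f"DDR: {ddr / 1e9:.2f} GB/s")
```

These are theoretical ceilings. Real memory never sustains its peak rate, which makes the competition for bandwidth even tougher than the numbers suggest.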

I am showing you this diagram once again to point out how much pressure the local memory of a modern 3D-card is really under. Each red arrow steals a bit more of the available memory bandwidth.

  • First of all, the local memory hosts the frame buffer, which consists of a front and a back buffer and, in the case of triple buffering, a third one. Each of those buffers has exactly the size of the screen resolution times the color depth. The rendering unit needs to access the frame buffer several times for each pixel.
  • The Z-buffer is likewise as big as the screen resolution times the Z-buffer depth, and it gets accessed like crazy. You get an idea of how heavily the Z-buffer weighs on memory bandwidth when you realize that Intel added the 'display cache' option to the integrated 3D-graphics of i810, which is only supposed to host the Z-buffer. This 'display cache', in effect an external Z-buffer, improves the 3D-performance of i810 considerably, because the Z-buffer is the most accessed part of graphics memory.
  • Then there is the texture buffer, which holds compressed or uncompressed textures so that the rendering unit can access them faster than if it had to fetch them from main system memory through the AGP. Again, textures need to be read several times for each pixel, depending on the filtering mode and the number of textures applied per pixel.
  • I am not quite sure how much impact a T&L-unit has on memory bandwidth, but you can be sure that it takes at least a small part of it as well.
  • Last but not least, there is the RAMDAC, which needs to read the front frame buffer to display it on the screen. The higher the resolution and the higher the refresh rate, the more often the RAMDAC has to access the frame buffer. You might think that this is no longer an issue today, but you are sadly mistaken! A 3D-card that is already limited by its memory bandwidth, such as a GeForce2 GTS card, reacts extremely sensitively to high refresh rates. I measured an impact of over 15% at 1600x1200x32-bit color when I switched between 60 and 85 Hz refresh rate. At lower resolutions it is still an issue.
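
The buffer sizes and the RAMDAC traffic described in the list above are simple arithmetic, so here is a minimal Python sketch of them. The function names are made up for this illustration, the 32-bit Z-buffer depth is an assumption, and the RAMDAC estimate counts only visible pixels, so treat the results as ballpark figures rather than measurements.

```python
def buffer_bytes(width, height, bits_per_pixel):
    """Size in bytes of one screen-sized buffer (resolution times depth)."""
    return width * height * bits_per_pixel // 8


def local_buffers_bytes(width, height, color_bits, z_bits, triple_buffering=False):
    """Memory taken by the front/back (and optional third) buffer plus the Z-buffer."""
    color_buffers = 3 if triple_buffering else 2
    return (color_buffers * buffer_bytes(width, height, color_bits)
            + buffer_bytes(width, height, z_bits))


def ramdac_bytes_per_second(width, height, color_bits, refresh_hz):
    """Bytes per second the RAMDAC reads: one front buffer per screen refresh."""
    return buffer_bytes(width, height, color_bits) * refresh_hz


# 1600x1200 at 32-bit color, with an assumed 32-bit Z-buffer.
double = local_buffers_bytes(1600, 1200, 32, 32)
triple = local_buffers_bytes(1600, 1200, 32, 32, triple_buffering=True)
print(f"double buffering: {double / 1e6:.2f} MB, triple buffering: {triple / 1e6:.2f} MB")

# RAMDAC read traffic at the two refresh rates compared in the text.
at_60 = ramdac_bytes_per_second(1600, 1200, 32, 60)
at_85 = ramdac_bytes_per_second(1600, 1200, 32, 85)
print(f"60 Hz: {at_60 / 1e6:.1f} MB/s, 85 Hz: {at_85 / 1e6:.1f} MB/s")
print(f"extra traffic at 85 Hz: {(at_85 - at_60) / 1e6:.1f} MB/s")
```

Under these assumptions, going from 60 to 85 Hz costs roughly another 190 MB/s of read traffic that the rendering unit can no longer use, which fits the picture of a bandwidth-limited card reacting badly to high refresh rates.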