The Impact of the AGP-Speed

Introduction

I guess I don't have to explain much about AGP, the 'accelerated graphics port', anymore. It is nowadays the state-of-the-art interface between the system chipset and the graphics card. AGP was developed three years ago to allow data transfer between the system and the graphics adapter at a significantly higher bandwidth than PCI. In 1997, graphics cards with 3D-acceleration became not only fashionable but also pretty common, and those 3D-accelerating graphics cards required much more data from the CPU and memory than their '2D-only' predecessors. AGP was born to accommodate those new needs. Here's an article I wrote about AGP in 1997, which gives you more details about it.

The Theory Of AGP-Performance Vs. PCI-Performance

Right from its first launch, the AGP specification allowed two different AGP-speeds, AGP1x and AGP2x. The main differences between AGP and PCI start with the fact that AGP is a 'port', which means it can host only one client, and this client ought to be a graphics accelerator. PCI is a bus, and it can serve several different kinds of clients, be it a graphics accelerator, a network card, a SCSI host adapter or a sound card. All those different clients have to share the PCI-bus and its bandwidth, while AGP offers one direct connection to the chipset and from there to the memory, the CPU or the PCI-bus.

The normal PCI-bus is 32 bits wide and is clocked at 33 MHz. Thus it can offer a maximum bandwidth of 33 MHz * 4 Byte = 133 MB/s. The new PCI64/66-specification offers four times as much. It comes with 64-bit width and a 66 MHz clock, so its bandwidth limit lies at 533 MB/s. However, let's not forget that PCI64/66 is hardly supported anywhere yet, and that it was developed particularly to host I/O-controllers with very high data bandwidth, such as IEEE1394 or Gbit-network interface cards. AGP is clocked at 66 MHz to begin with and it is 32 bits wide. This offers a maximum bandwidth of 266 MB/s in case of AGP1x, where data is transferred the common way, at the 'falling edge' of each clock. AGP2x offers 533 MB/s by transferring data at the rising as well as the falling edge of the clock. The new addition called AGP4x doubles this bandwidth once more, to 1066 MB/s.
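The figures above all follow from the same product of clock, bus width and transfers per clock. Here is a minimal sketch of that arithmetic. The function name and the exact clock values (33.33 MHz and 66.67 MHz, the nominal values behind the rounded '33' and '66') are my own assumptions for illustration, and 1 MB is taken as one million bytes.

```python
def peak_bandwidth_mb_s(clock_mhz, width_bits, transfers_per_clock=1):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_transfer = width_bits // 8
    return clock_mhz * bytes_per_transfer * transfers_per_clock


# Assumption: nominal clocks of 33.33 MHz and 66.67 MHz.
PCI_CLOCK_MHZ = 100 / 3
AGP_CLOCK_MHZ = 200 / 3

interfaces = {
    "PCI 32-bit/33 MHz": peak_bandwidth_mb_s(PCI_CLOCK_MHZ, 32),
    "PCI64/66": peak_bandwidth_mb_s(AGP_CLOCK_MHZ, 64),
    "AGP1x": peak_bandwidth_mb_s(AGP_CLOCK_MHZ, 32, transfers_per_clock=1),
    "AGP2x": peak_bandwidth_mb_s(AGP_CLOCK_MHZ, 32, transfers_per_clock=2),
    "AGP4x": peak_bandwidth_mb_s(AGP_CLOCK_MHZ, 32, transfers_per_clock=4),
}

for name, mb_s in interfaces.items():
    # Truncate to whole MB/s, which is how the figures are usually quoted.
    print(f"{name:18s} {int(mb_s):5d} MB/s")
```

These are theoretical peaks only. Real transfers lose some of that bandwidth to protocol overhead and to competition for the memory behind the chipset.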

Why AGP?

In the first years of AGP its higher bandwidth was mainly used to get textures for 3D-objects to the 3D-accelerator. Some 3D-accelerators merely took advantage of AGP's high bandwidth and used it for the same kind of tasks as they would have used PCI for before. Other 3D-chips were using 'AGP-texturing', which enables the 3D-accelerator to store and leave large textures in the main system memory and use them for the rendering process directly from there, without storing those textures in its local graphics memory. This is certainly still an issue today, but the demand for AGP4x came from a different corner of the 3D-rendering process, the transfer of triangle-data of complex 3D-objects. Before a 3D-scene goes through the transform and lighting stage, the objects in this scene need to be known to the renderer. The more detailed these objects are, the more vertices have to be transferred. NVIDIA's GeForce, as the first 3D-accelerator with integrated Transform and Lighting, can process a huge number of triangles, but before it can start, the data needs to be transferred to GeForce, which is obviously done over the AGP.
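To get a feeling for why triangle-data can load the AGP, here is a rough back-of-the-envelope sketch. The vertex size of 32 bytes and the number of vertices sent per triangle are assumptions of mine for illustration, not figures from NVIDIA or from any benchmark. Independent triangles need three vertices each, while long triangle strips approach one vertex per triangle.

```python
def max_triangles_per_second(bandwidth_mb_s, bytes_per_vertex=32,
                             vertices_per_triangle=3.0):
    """Upper bound on triangles/s if the bus carried nothing but vertices.

    bytes_per_vertex and vertices_per_triangle are illustrative
    assumptions; real applications vary widely.
    """
    bytes_per_triangle = bytes_per_vertex * vertices_per_triangle
    return bandwidth_mb_s * 1_000_000 / bytes_per_triangle


# Peak bandwidths of the three AGP modes in MB/s.
for mode, mb_s in (("AGP1x", 266), ("AGP2x", 533), ("AGP4x", 1066)):
    limit = max_triangles_per_second(mb_s)
    print(f"{mode}: at most {limit / 1e6:.1f} million triangles/s")
```

The result is only a ceiling. It ignores textures, commands and every other transfer that shares the port, so real triangle rates will be lower.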

Benchmarking The AGP

This fact obviously needs to be considered when benchmarking the AGP as well. AGP-benchmarks from a few years ago did nothing but display scenes that used huge textures, trying to saturate the AGP with large texture data streams. Those benchmarks were hardly able to show much of a difference between AGP1x and AGP2x even back then, and they certainly can't show the performance advantage of AGP4x today. This is why you need different techniques to saturate the AGP. The best way to show AGP-performance today is to use 3D-scenes with very complex objects in them, which use the AGP to transfer huge amounts of triangle data. You will see that in the benchmark results below. However, today's 3D-games use nowhere near enough polygons to saturate AGP4x. Again we'll have to wait for 'upcoming titles'. For the time being it is mainly professional OpenGL-software that uses very complex 3D-objects. Thus this software is best suited to take advantage of AGP4x right now.

Issues Combined With AGP

If you have read my good old article 'AGP - A New Interface for Graphic Accelerators' you may recall that back then I demanded the 100 MHz memory bus to supply enough bandwidth for the AGP and the other parts of a system that require memory access at the same time. Today the demands are of course even higher. The AGP's data bandwidth can only be used completely if the system has ample memory bandwidth. The memory is permanently accessed by several system devices at the same time, such as the CPU, PCI-Masters, DMA-devices and the AGP. If the AGP is to supply its full bandwidth, the memory bandwidth needs to be at least as high as the AGP-bandwidth, since the memory is where the data to the AGP-device comes from under most circumstances. In case of AGP4x and its 1066 MB/s at least PC133-memory is required, which offers exactly the same bandwidth of 8 Byte (64-bit) * 133 MHz = 1066 MB/s. We remember however that the AGP never has the memory bandwidth at its sole disposal. It has to share it with the rest, so that AGP4x can only live up to its full capacity when the system is using either RDRAM or the upcoming DDR-SDRAM. One PC800 RDRAM-channel, as used in platforms with Intel's 820-chipset, supplies 1.6 GB/s, PC200 DDR-SDRAM offers the same, PC266 DDR-SDRAM raises that to 2.1 GB/s, and finally two PC800 RDRAM-channels, as found in platforms with Intel's 840-chipset, can supply as much as 3.2 GB/s. Platforms with one of those memory types will show better performance than PC100 or PC133-platforms as software starts to make use of AGP4x.
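The comparison above can be put into a small table. The sketch below takes the peak memory bandwidths quoted in this article, adds PC100 computed the same way as PC133 (8 Byte * 100 MHz), and subtracts the AGP4x peak to show what would be left for the CPU and the other devices. The function name and the idea of a simple 'headroom' figure are my own simplification. Real memory controllers do not divide bandwidth this neatly.

```python
AGP4X_MB_S = 1066  # peak AGP4x bandwidth in MB/s

# Peak memory bandwidths in MB/s, as quoted in the article.
memory_peak_mb_s = {
    "PC100 SDRAM": 8 * 100,  # 64-bit bus at 100 MHz
    "PC133 SDRAM": 1066,
    "PC800 RDRAM, one channel (i820)": 1600,
    "PC200 DDR-SDRAM": 1600,
    "PC266 DDR-SDRAM": 2100,
    "PC800 RDRAM, two channels (i840)": 3200,
}


def headroom_mb_s(memory_mb_s, agp_mb_s=AGP4X_MB_S):
    """Memory bandwidth left over if the AGP ran at its full peak rate.

    A negative value means the memory cannot even feed the AGP alone.
    """
    return memory_mb_s - agp_mb_s


for name, peak in memory_peak_mb_s.items():
    print(f"{name:34s} {peak:5d} MB/s   headroom {headroom_mb_s(peak):+5d} MB/s")
```

PC133 comes out at exactly zero headroom, which is the point made above: it matches AGP4x on paper but leaves nothing for anyone else.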

Fast Writes, A Unique Feature Of GeForce

One of the special features of NVIDIA's GeForce256 graphics accelerator is its unique support for 'Fast Writes'. The idea behind this implementation is to improve data transfers that go directly from the CPU to the graphics chip, so it obviously does not affect things like 'AGP-texturing'. 3D-software with very complex 3D-objects requires that the CPU transfers a huge amount of triangle-data to the graphics chip, and here 'Fast Writes' avoid the stalling detour from the CPU to memory and then from memory to the 3D-chip. The idea of 'Fast Writes' is to connect CPU and 3D-chip directly. So much for the theory, please look at NVIDIA's white paper for more detail. Currently 'Fast Writes' are only available on platforms using either Intel's 820 or 840 chipset. Other AGP4x-chipsets like VIA's Apollo Pro 133 Slot1-chipset and VIA's Apollo KX133 SlotA-chipset are currently not supported by GeForce's drivers. Further down in this article you will find out why this is actually a blessing right now, since the driver seems to have some problems with Fast Writes, leading to a rather significant drop in performance in i820 and i840-systems.

AGP And WindowsNT

After describing the hardware-facts of AGP I should not forget to mention that AGP requires a bit of software as well. As you might recall, AGP offers the graphics chip fast access to system memory for several purposes, AGP-texturing being one of them. The operating system has to be aware of that and hand memory resources over to the driver of the graphics card. The GART (graphics address remapping table) is where these memory resources are listed, and the GART-driver is the software that takes care of it. Today all graphics card drivers for Windows95 and Windows98 include the GART-driver for platforms with Intel-chipsets, called 'vgartd.vxd'. The other chipset vendors have to supply their own GART-driver with the software that comes with the motherboard. An Athlon-system for example is not even able to recognize its AGP-graphics card unless you've installed this driver, which is the file 'amdmp.sys' for the AMD750-chipset or 'viagart.vxd' for VIA's Apollo KX133 chipset.
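To illustrate what a remapping table does in principle, here is a toy model. It is not how 'vgartd.vxd' or any real GART-driver is written. The class, the method names, the 4 KB page size and the addresses are all made up for illustration. The point is only that the graphics chip sees one contiguous address range while the pages behind it can be scattered across system memory.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


class ToyGart:
    """Toy remapping table: contiguous aperture pages -> scattered physical pages."""

    def __init__(self, aperture_base):
        self.aperture_base = aperture_base
        self.entries = []  # index = aperture page number, value = physical page base

    def map_pages(self, physical_pages):
        """Append physical pages to the table, return their aperture address."""
        first_page = len(self.entries)
        self.entries.extend(physical_pages)
        return self.aperture_base + first_page * PAGE_SIZE

    def translate(self, aperture_address):
        """Translate an address seen by the graphics chip to a physical address."""
        offset = aperture_address - self.aperture_base
        page, within_page = divmod(offset, PAGE_SIZE)
        if not 0 <= page < len(self.entries):
            raise ValueError("address is outside the mapped aperture")
        return self.entries[page] + within_page


# Hypothetical addresses, chosen only for the example.
gart = ToyGart(aperture_base=0xE0000000)
texture = gart.map_pages([0x00123000, 0x00456000])  # two non-adjacent pages
print(hex(gart.translate(texture + 0x10)))
print(hex(gart.translate(texture + PAGE_SIZE + 0x4)))
```

In this model the two pages sit next to each other in the aperture even though they are far apart in physical memory, which is what lets a large texture look like one block to the graphics chip.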

Microsoft's soon to be replaced, but heavily used operating system Windows NT was actually never meant to offer AGP-support. There is no GART-driver in any of the many service pack updates for NT, so graphics-chip vendors were left alone to supply AGP-support under Windows NT. This AGP-support may or may not be implemented in the NT-driver of a graphics card. You can only tell by using some special detection software or by benchmarking under NT. I have so far tested the AGP-support under Windows NT only for NVIDIA graphics chips and found that TNT, TNT2 and GeForce have AGP-support, but usually only on platforms with an Intel-chipset. Platforms with other chipsets can only take advantage of the so-called 'PCI66' mode under NT, which offers not quite as much data bandwidth as AGP1x. The latest, but not official, exception to this rule is currently VIA's new Athlon chipset Apollo KX133, which runs GeForce at full-blown AGP4x even under Windows NT. This will hopefully all improve with Windows2000.

Benchmarks For AGP-Testing

As I've already mentioned above, AGP-benchmarks are not that easy to come by and most games don't take much advantage of anything faster than AGP2x. It's also important to be aware of some more restrictions when using 3D-games for AGP-benchmarking. The test is obviously supposed to have AGP as its bottleneck, so that you get different results at different AGP-speeds. It's therefore rather helpful to use fast CPUs if you want to keep the CPU from becoming the bottleneck. It's also important to keep the graphics card from becoming the bottleneck, typically in the form of a fill rate or local memory bandwidth restriction. Thus the 3D-gaming benchmarks should not run at a screen resolution and color depth that is too high, because that's when the fill rate or memory bandwidth restrictions kick in.
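One simple way to check whether a test really has AGP as its bottleneck is to compare its score at the slowest and the fastest AGP-speed. The sketch below does that. The function names, the 5% threshold and the sample scores are placeholders of mine, not measurements. Pick a threshold that lies above the run-to-run variation of your own benchmark.

```python
def relative_gain(scores_by_mode, slow="AGP1x", fast="AGP4x"):
    """Fractional score increase when going from the slow to the fast mode."""
    return scores_by_mode[fast] / scores_by_mode[slow] - 1.0


def is_agp_bound(scores_by_mode, threshold=0.05):
    """True if the score scales with AGP-speed by more than the threshold.

    threshold is an arbitrary assumption (5%).
    """
    return relative_gain(scores_by_mode) > threshold


# Made-up placeholder scores, NOT benchmark results.
example = {"AGP1x": 100.0, "AGP2x": 130.0, "AGP4x": 150.0}
print(f"gain from AGP1x to AGP4x: {relative_gain(example):.0%}")
print("AGP-bound" if is_agp_bound(example) else "limited elsewhere (CPU or graphics card)")
```

If the score hardly moves between AGP-speeds, the limit sits somewhere else, and a faster CPU or a lower resolution and color depth is the next thing to try.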

After some trial and error I found that the 'High Polygon, 1 Light'-benchmark within 3DMark2000 is a very revealing AGP-benchmark. NVIDIA's SphereMark, found at NVIDIA's website, also produces valid results for this purpose. The only 3D-game I used was the widely available Quake 3 Arena from id Software, which I ran at 3 different settings: the 'NORMAL'-setting, the 'High Quality'-setting and a setting derived from 'High Quality' with the resolution increased to 1024x768 pixels. It also turned out that SPEC's 'viewperf' OpenGL-benchmark produced very helpful results. I ran it under Windows98, which is a rather odd operating system for this benchmark, but because of the above-mentioned restrictions of GeForce's NT-driver I was not able to adjust the AGP-speeds under WindowsNT.