Does AGP Really Improve Performance?

Technical Background

AGP has several advantages over PCI. It offers a minor data transfer rate advantage when it comes to moving the geometry stream from the CPU to the graphics card. When it comes to managing large texture databases, AGP's GART (Graphics Address Remapping Table) allows the OS to manage textures in off-screen memory as well as in system memory, and allows the graphics card to access them directly in either location. Prior to AGP, the game developer had three options for managing textures:

1. Limit the texture database to whatever would fit in off-screen memory only. This usually delivers outstanding frame rates, but memory size constraints can limit artistic creativity. Depending on the graphics card, texture space could be as little as one megabyte or possibly as much as five or six megabytes.

2. Use the OS to manage textures in main memory, and require the CPU to copy textures from main memory to graphics memory as needed. This is PAINFULLY slow. In order to make AGP look as good as possible, Intel likes to compare AGP to this mode. This is what DirectX Retained Mode does, but game developers do not generally use this method.

3. Place the most frequently used textures in off-screen memory (as in option 1), then lock down a few additional megabytes of system memory for the remainder of the texture database. If the graphics chip needs a texture that is not in graphics memory, then the accelerator must use PCI master mode (DMA) to copy the needed texture, on demand, into texture swap space in graphics memory. Performance is very good. This has been the preferred approach for game developers, and can be programmed under DirectX Immediate Mode.
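The swap scheme in the third option can be sketched as a small cache over graphics memory. This is only an illustration under stated assumptions: the class name TextureSwapCache, the least-recently-used eviction policy, and the texture sizes are all hypothetical, and the bus-master copy is represented by a simple counter rather than real hardware access.

```python
from collections import OrderedDict


class TextureSwapCache:
    """Toy model of on-demand texture swapping: a fixed pool of graphics
    memory backed by locked-down system memory.
    The names and the LRU eviction policy are assumptions for illustration."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.resident = OrderedDict()  # texture name -> size, oldest first
        self.dma_copies = 0            # bus-master transfers performed

    def bind(self, name, size_bytes):
        """Make a texture resident; return True if a DMA copy was needed."""
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return False
        if size_bytes > self.capacity_bytes:
            raise ValueError("texture larger than swap space")
        # Evict least recently used textures until the new one fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, evicted_size = self.resident.popitem(last=False)
            self.used_bytes -= evicted_size
        self.resident[name] = size_bytes
        self.used_bytes += size_bytes
        self.dma_copies += 1  # stands in for the bus-master (DMA) copy
        return True


# Two megabytes of swap space, five binds of one-megabyte textures.
cache = TextureSwapCache(capacity_bytes=2 * 1024 * 1024)
for tex in ["wall", "floor", "wall", "sky", "floor"]:
    cache.bind(tex, 1024 * 1024)
print(cache.dma_copies)  # 4
```

Only the repeated "wall" bind is a hit here. Every other bind costs a copy, which is why the size of the swap space relative to the working set matters so much for this approach.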

AGP is a modified approach to option 3. Memory management is a little more flexible, and the data transfer rate is better because of AGP's faster clock speed and bus pipelining.

AGP offers two ways to deal with textures. One is called DMA mode, which operates almost exactly like option 3 above, but transfers occur over AGP rather than PCI. The other is Execute mode, which allows the graphics chip to access the texture information in main memory without first copying it to graphics memory. The effective bus throughput of Execute mode and DMA mode is the same. If anything, DMA mode could be faster because of better concurrency and deeper pipelining.

Intel has gone to great lengths to convince game developers that DMA mode stinks. They have even gone so far as to refer to method #2 above as DMA mode in order to confuse everybody. DMA stands for "Direct Memory Access". There is nothing direct about using the CPU to copy data. This is pure deception. True DMA uses a hardware bus master, like a PCI or AGP graphics accelerator.

Game developers and users should prefer AGP's DMA mode because it offers excellent performance while still being architecturally compatible with the installed base of PCI accelerators. Intel prefers Execute mode because it only runs with AGP. As we all know, Intel is always trying to stir up more ways to persuade users to dump their PCI Pentium 233 systems ASAP, and go buy a more costly Pentium II AGP system. Intel's mission is not to "Accelerate 3D Graphics", but rather to "Accelerate Obsolescence".

AGP's DMA mode offers excellent concurrency because the game can detect that a texture swap is required before the texture is actually needed, while the CPU is still calculating the geometry (in the geometry setup stage). This way, the graphics card can begin fetching the texture before it is needed to paint pixels on the screen. Concurrency is one of the keys to performance.
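The look-ahead described here might be sketched as follows. The function name textures_to_prefetch and the (mesh, texture) draw-list format are assumptions made for the example, not part of any real API. The idea is simply to scan the upcoming draws during geometry setup and queue the missing textures for DMA before the rasterizer asks for them.

```python
def textures_to_prefetch(upcoming_draws, resident):
    """Return textures the next draws need that are not yet in graphics
    memory, in first-use order, so DMA can start during geometry setup.
    The draw-list format is a hypothetical (mesh, texture) pair."""
    needed = []
    seen = set(resident)  # textures already resident or already queued
    for _mesh, texture in upcoming_draws:
        if texture not in seen:
            seen.add(texture)
            needed.append(texture)
    return needed


draws = [("room", "wall"), ("room", "floor"), ("door", "wood"), ("lamp", "wall")]
queue = textures_to_prefetch(draws, resident={"wall"})
print(queue)  # ['floor', 'wood']
```

Keeping the queue in first-use order means the texture that will be needed soonest is also the first one transferred.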

In AGP Execute mode, texture accesses are driven by the rasterizer at the final stage of the 3D pipeline. At this point, the accelerator is dead if it cannot have immediate access to textures. In this way, Execute mode creates a nasty performance bottleneck. Instead of having immediate access to textures in high bandwidth local memory, the accelerator must stall while it arbitrates for access to slower system memory via AGP.
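The difference between overlapping a texture fetch with geometry setup and stalling on it at rasterization time can be put into a toy timing model. Everything here is an assumption: the function names are hypothetical, the stall is treated as fully serialized (a worst case), and the millisecond figures are made up purely to show the shape of the argument, not measured on any hardware.

```python
def frame_time_prefetch(geometry_ms, fetch_ms, raster_ms):
    """Fetch issued early: the texture copy overlaps geometry setup,
    so only the longer of the two delays rasterization."""
    return max(geometry_ms, fetch_ms) + raster_ms


def frame_time_on_demand(geometry_ms, fetch_ms, raster_ms):
    """Fetch driven by the rasterizer: in this simplified worst case the
    whole fetch time is exposed as a stall."""
    return geometry_ms + fetch_ms + raster_ms


# Made-up numbers, for illustration only.
g, f, r = 10.0, 4.0, 8.0
print(frame_time_prefetch(g, f, r))   # 18.0
print(frame_time_on_demand(g, f, r))  # 22.0
```

In this model the prefetched fetch is free whenever it finishes before geometry setup does, while the on-demand fetch always adds its full cost to the frame.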