Real3D's Response to "Does AGP Really Improve Performance?"

Technical Background

Bert does an adequate job of explaining what AGP is and how it works in comparison to PCI. AGP DMA Mode can offer good performance and concurrency. But while his points about AGP Execute Mode creating bottlenecks if data is not immediately accessible may apply to some graphics chips, they certainly do not apply to the i740.

The Execute Mode implementation used by the i740 provides a variety of key features.

The i740 was designed with a very deep pipeline to avoid bottlenecks while waiting for texture data. Early in the i740 pipeline, the required texels for each pixel are identified and requested. This allows for the latencies associated with AGP so that the texture data is available for rasterization when needed without creating a bottleneck in the pipeline.
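The latency-hiding argument above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of the actual i740 hardware: the function name simulate, the one-request-per-cycle issue rate, and the latency and depth values are all hypothetical. It models an in-order pipeline in which texel requests are issued at the front, the data is consumed at the back, and at most "depth" pixels can be in flight between the two.

```python
def simulate(num_pixels, depth, latency):
    """Toy model: total cycles to rasterize num_pixels.

    depth   -- pixels that can be in flight between the texel-request
               stage and the rasterization stage (hypothetical value)
    latency -- cycles for a texel request to return (hypothetical value)
    """
    issue = []    # cycle at which each pixel's texel request is issued
    consume = []  # cycle at which each pixel is rasterized
    for i in range(num_pixels):
        # At most one request per cycle.
        t = 0 if i == 0 else issue[i - 1] + 1
        # A new request needs a free slot: pixel i-depth must have left.
        if i >= depth:
            t = max(t, consume[i - depth])
        issue.append(t)
        # Rasterize when the data has arrived, one pixel per cycle, in order.
        done = t + latency
        if i > 0:
            done = max(done, consume[i - 1] + 1)
        consume.append(done)
    return consume[-1] + 1


deep = simulate(1000, depth=32, latency=20)    # depth covers the latency
shallow = simulate(1000, depth=4, latency=20)  # depth shorter than latency
print(deep, shallow)
```

In this model, once the depth is at least as large as the latency, the latency is paid only once at start-up and the pipeline then delivers one pixel per cycle. With a shallower pipeline, every group of "depth" pixels waits out the full latency again.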

As part of this design, only the texels required by the current frame need to be accessed. So if a texture map has a size of 512x512, but only a 16x32 piece is needed, then the i740 only grabs those required texels. In DMA mode, the entire 512x512 map must be copied to local memory.
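To put rough numbers on the 512x512 versus 16x32 example, here is a minimal sketch. The 16-bit (2-byte) texel size and the helper name texture_bytes are assumptions made for illustration, and the calculation ignores filtering neighbors and memory-burst granularity, so the real traffic for the 16x32 piece would be somewhat higher.

```python
BYTES_PER_TEXEL = 2  # assumption: 16-bit texels


def texture_bytes(width, height, bytes_per_texel=BYTES_PER_TEXEL):
    """Bytes occupied by a width x height block of texels."""
    return width * height * bytes_per_texel


full_map = texture_bytes(512, 512)   # what a DMA copy must move
needed = texture_bytes(16, 32)       # what is actually sampled

print(full_map, needed, full_map // needed)
```

Under these assumptions, the full map is 512 times larger than the piece that is actually used.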

By accessing only the required texels, textures up to 1024x1024 can be used with no performance penalty. In other architectures, a single mip-mapped texture of this size would take up 2.7 MB of local memory, making large maps nearly impossible to use.
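The 2.7 MB figure can be checked with a short calculation. The sketch below assumes 16-bit (2-byte) texels and a full mip chain down to 1x1. Neither assumption is stated in the text, but together they reproduce the number. The function name mip_chain_bytes is hypothetical.

```python
def mip_chain_bytes(size, bytes_per_texel=2):
    """Bytes for a square texture plus all of its mip levels down to 1x1.

    bytes_per_texel=2 is an assumption (16-bit texels).
    """
    total = 0
    while size >= 1:
        total += size * size * bytes_per_texel
        size //= 2  # each mip level halves the width and height
    return total


total = mip_chain_bytes(1024)
print(total, round(total / 2**20, 1))
```

The base level alone is 2 MB, and the remaining mip levels add roughly one third more, which gives about 2.7 MB.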

It also follows that whenever less than the entire texture map is needed, which is often the case, the i740 implementation requires less bandwidth than DMA Mode because less data needs to be accessed. And if less data is being accessed, then the amount of time the CPU is being kept from main memory also goes down.

Finally, the i740 is not impacted by frequent texture changes because there is no reliance on textures being stored locally. All textures are accessed directly from AGP memory so texture changes do not require copying each new texture to local memory for use.

30% Reduction in Accelerator Performance

Bert goes on to discuss the performance of AGP texturing relative to local memory texturing on several boards currently on the market. The test uses texture sizes up to 4 MB and shows an average 30% performance increase for local versus AGP texturing. As stated before, at some point there is a limit to the amount of local memory on the board. A better test would use more texture data than any of the boards can hold in local memory and then evaluate the boards on how effectively they use AGP beyond the bounds of their local memory. This type of test would better represent the newer applications on the market and those to come in the future.

10% Reduction in CPU Performance

We all agree that when an AGP device is accessing main memory, the CPU is locked out of that memory, which reduces overall CPU performance. The point that Mr. McComas continues to overlook is that if textures are DMA'd from system to local memory, the exact same thing happens. The CPU is forced to wait while the DMA copy happens. In reality, since the i740 needs only the required texels for the current frame, the memory accesses come in short bursts and require less total data. The CPU is therefore blocked from memory for less time than it would be by the sustained accesses required for DMA copying.

Conclusion

We've tried to address a number of points Bert made and illustrate that AGP certainly delivers a number of advantages over PCI. Also, we'd like to take the opportunity to address what Bert calls "The Deal with the i740".

As Bert himself states, "the i740 seems to be a very good product." As mentioned before, the Intel740-based boards usually rate very high in overall 2D and 3D performance, and this is both in benchmarks and real-world applications such as games. At the same time, no one will deny - including Tom himself - the outstanding image quality delivered by the i740.

We do not agree that the i740 drivers are "crippled" - and that goes for both the Intel reference drivers and our own custom drivers written specifically for the Real 3D StarFighter. The i740 hardware will not allow texturing in local memory on the board, and in the case of the i740, it wouldn't deliver any significant benefits. As we've discussed previously, the AGP execute mode implementation of the i740 does not create any bottlenecks and is certainly faster than the PCI bus. Granted, the i740 is not "blowing away the competition" - at least not all - in terms of raw performance. However, we could make a good argument that the i740 - or at least the StarFighter - is "blowing away the competition" when you consider both 2D/3D performance and image quality. And we don't expect people to take our word for it. We'd encourage people to check out the numerous reviews of the StarFighter board and then judge for themselves.

We appreciate Bert's recognition that Real 3D may be the only company that has the ability to fix the situation. The point is, though, that there's no situation to fix. The StarFighter/i740 offers a number of advantages over other graphics boards. Is the i740 perfect? No, there are always improvements that can be made. But the fact remains - at this point in time - the i740 is a very good product and the StarFighter is the best of those Intel740-based products.

Chris Stellwag

Mgr. Corporate Marketing and Communications
Real 3D

Tom: Thank you Chris for this comprehensive information! The i740 is certainly a good chip at its price level. However, so far I'm still waiting for the one application that will make me trash my two pathetic PCI Voodoo2 boards for a really great AGP card.