Performance Impact of Low Latency DRAM

Memory Bus Saturation

The fastest uniprocessor PCs, when engaged in ordinary tasks, usually exhibit very low DRAM bus utilization. While we find ourselves wrapped up in talk of 500 MB/s, 800 MB/s or even 1.6 GB/s, most real applications hover around a startlingly low 5, 10 or 20 MB/s.

Intel clarified this in a presentation given at the Intel Developer Forum in February of 1998, in which Intel engineers characterized the external bandwidth demand of several popular benchmarks. We should acknowledge that benchmarks usually work a PC much harder than a human user can. During the several minutes that a benchmark runs, the PC accomplishes about as much work as a human could force it to do in a week. Yet the external bandwidth demands remain quite low, as demonstrated by Intel's data below.

Corel Draw is a fairly challenging application, probably more challenging than Word, Excel or other business apps. Yet during the run of the benchmark, the majority of the readings are pegged to the bottom of the chart, very close to zero megabytes per second. There are only a few blips up to 15 or 30 MB/s.

Against a peak bandwidth of 533 MB/s, this load represents only about 1% to 5% bus saturation. Increasing the available bandwidth to the gigabyte-per-second level seems utterly senseless for this type of application. The only effect would be to create more unused bandwidth. Improving latency, on the other hand, could still yield a performance improvement, though a small one, because the bus is used so little.
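The saturation estimate above is simply observed demand divided by peak bandwidth. A minimal sketch of that arithmetic in Python, using the 533 MB/s peak and the 5 to 30 MB/s demand figures quoted in this article (the function name is illustrative, not taken from Intel's presentation):

```python
def bus_utilization(demand_mb_s, peak_mb_s=533.0):
    """Return bus utilization as a percentage of peak bandwidth."""
    if peak_mb_s <= 0:
        raise ValueError("peak bandwidth must be positive")
    return 100.0 * demand_mb_s / peak_mb_s


# Demand figures quoted in the article, in MB/s.
for demand in (5, 15, 30):
    print(f"{demand:>3} MB/s -> {bus_utilization(demand):.1f}% of 533 MB/s")
```

Even the 30 MB/s blips stay below 6% of peak, which is why adding raw bandwidth does little for this kind of workload.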

Other applications do drive the bus a little harder, though. Software DVD decoding averages about 60 MB/s quite consistently. 3D games range from 60 to about 100 MB/s on average. These figures will increase if the cache is turned off or reduced in size, if the CPU runs at a faster clock speed, or if an AGP or UMA architecture adds its own traffic to the memory bus.
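To see why cache size and clock speed move these numbers, a first-order model can help: external traffic is roughly the rate of memory references, times the cache miss rate, times the bytes fetched per miss. The sketch below is an illustration only, not Intel's methodology. The reference rates, miss rates and 32-byte line size are assumed values chosen to show the trend, and the function name is hypothetical.

```python
def external_bandwidth_mb_s(refs_per_sec, miss_rate, line_bytes=32):
    """Estimate DRAM traffic in MB/s from cache misses alone (assumed model)."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return refs_per_sec * miss_rate * line_bytes / 1e6


# Hypothetical workload: 100 million memory references per second.
base = external_bandwidth_mb_s(100e6, miss_rate=0.01)         # well cached
small_cache = external_bandwidth_mb_s(100e6, miss_rate=0.04)  # smaller cache, more misses
faster_cpu = external_bandwidth_mb_s(150e6, miss_rate=0.01)   # 1.5x reference rate

print(f"baseline:      {base:.0f} MB/s")
print(f"smaller cache: {small_cache:.0f} MB/s")
print(f"faster CPU:    {faster_cpu:.0f} MB/s")
```

In this model, demand scales linearly with both the miss rate and the reference rate. It deliberately leaves out write-backs and any AGP or UMA graphics traffic, which would add to the totals.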

Above all, let us remember that high bus saturation is a problem, not an advantage. If an application burns a lot of external bandwidth, it may be because the code is not optimized and makes poor use of the cache, in which case its performance will be disappointing. Well-refined games like Quake2 are very playable and respond well to faster hardware because they are optimized to fit in the caches and have relatively low external bandwidth requirements. Games that do not scale well as CPU speeds increase may be thrashing the cache as a result of poor code optimization.