Bandwidth Vs. Latency - A Progress Report
The burst protocol was popularized in the PC architecture during the 486/33 era (due to the integration of the L1 cache). Today, CPUs are running at 400 MHz, a clock speed increase of about 12x. Over the same period, the peak burst bandwidth of the DRAM sub-system has also improved by about 12x, from 66 MB/s in the 486 days to 800 MB/s today. CPU clock speeds and DRAM burst bandwidth seem pretty well aligned. But a quick look at the latency situation leads to a very different conclusion.
During the same period, effective DRAM latency has not improved by 12x. It has not even remained constant. In fact, measured in CPU core wait states, latency has become worse by a factor of more than 5x.
In the 486 days, the CPU core operated at its external bus speed (33 MHz), and DRAM latency caused the CPU to stall for about 5 CPU clocks. In a P2/400 system, bus latency is a stiff seven bus clocks, and the CPU core is running at a 4x clock multiplier. With calculator in hand, the result is clear: when a 400 MHz CPU stalls, it now takes an astronomical 7 x 4 = 28 core clocks to resolve the stall and resume execution. From the perspective of the CPU core, latency has degraded by an incredible 28 / 5 = 5.6x.
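The arithmetic above can be checked with a short Python sketch. The figures are the ones quoted in this article; the variable and function names are only illustrative, and the 486 stall of 5 clocks is the article's "about 5", not a measured value.

```python
# Figures quoted in the article (486/33 versus P2/400).
CPU_486_MHZ = 33
CPU_P2_MHZ = 400
BURST_486_MBS = 66
BURST_P2_MBS = 800

STALL_486_CORE_CLOCKS = 5   # 486 core runs at bus speed, so bus clocks == core clocks
P2_BUS_LATENCY_CLOCKS = 7   # latency measured in external bus clocks
P2_CLOCK_MULTIPLIER = 4     # core clocks per bus clock


def core_stall_clocks(bus_clocks, multiplier):
    """Convert a stall measured in bus clocks into CPU core clocks."""
    return bus_clocks * multiplier


clock_gain = CPU_P2_MHZ / CPU_486_MHZ          # roughly 12x
bandwidth_gain = BURST_P2_MBS / BURST_486_MBS  # roughly 12x
p2_stall = core_stall_clocks(P2_BUS_LATENCY_CLOCKS, P2_CLOCK_MULTIPLIER)
latency_degradation = p2_stall / STALL_486_CORE_CLOCKS

print(f"clock speed gain:      {clock_gain:.1f}x")
print(f"burst bandwidth gain:  {bandwidth_gain:.1f}x")
print(f"P2/400 stall:          {p2_stall} core clocks")
print(f"latency degradation:   {latency_degradation:.1f}x")
```

Clock speed and burst bandwidth both come out near 12x, while the stall seen by the core grows from 5 to 28 clocks.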
What's your guess... what really needs fixing, peak burst bandwidth or latency?
Silly as it may seem, some continue to insist that peak burst bandwidth is the main issue. The most prominent example is Rambus. Rambus wants to double the peak burst bandwidth again, up to 24x, while making latency even worse in the process.
Rambus, SLDRAM and DDR can all spew out burst cycles twice as fast as any X86 CPU can ingest them, but how each of these high bandwidth memory types rates on latency must be evaluated individually.
As a general rule, once you satisfy the CPU's maximum burst bandwidth rate, it is difficult to improve performance by adding more bandwidth. Under these circumstances, differences in CPU performance will be determined primarily by memory latency.
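A toy model makes this rule concrete. The sketch below is an assumption-laden illustration, not a measurement: it treats the time to fill one cache line as the initial latency plus the burst transfer time, and it caps the usable bandwidth at the rate the CPU can ingest. The 32-byte line size, the 800 MB/s CPU ingest limit, and the 70 ns latency (seven clocks on a 100 MHz bus) are assumed values chosen to match the P2/400 example; the function name is hypothetical.

```python
def line_fill_ns(latency_ns, memory_bw_mbs, cpu_max_bw_mbs, line_bytes=32):
    """Toy model: time to fetch one cache line, in nanoseconds.

    The CPU cannot ingest data faster than cpu_max_bw_mbs, so any
    memory bandwidth beyond that limit is wasted in this model.
    """
    usable_bw_mbs = min(memory_bw_mbs, cpu_max_bw_mbs)
    # bytes / (MB/s) -> ns, with 1 MB = 10**6 bytes
    transfer_ns = line_bytes * 1000 / usable_bw_mbs
    return latency_ns + transfer_ns


CPU_MAX_BW_MBS = 800  # assumed CPU ingest limit

baseline = line_fill_ns(70, 800, CPU_MAX_BW_MBS)
double_bandwidth = line_fill_ns(70, 1600, CPU_MAX_BW_MBS)
half_latency = line_fill_ns(35, 800, CPU_MAX_BW_MBS)

print(f"baseline (70 ns, 800 MB/s):   {baseline:.0f} ns")
print(f"2x memory bandwidth:          {double_bandwidth:.0f} ns")
print(f"half the latency:             {half_latency:.0f} ns")
```

Under these assumptions, doubling memory bandwidth past the CPU's limit leaves the line fill time unchanged, while halving latency shortens it directly.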