DRAM Performance: Latency vs. Bandwidth

CPUs Need Low Latency - Caches Want Bandwidth

Before we go any further, let's make sure you understand how DRAM accesses are generated.

The CPU core screams along very nicely at half a gigahertz, reading code and data from the caches, until it experiences a cache miss. At that point, all or part of the CPU comes to a screeching halt until the missing code or data is supplied. The CPU generates a 64-bit external read, called the "demand word" access, and only when that access is fulfilled can the CPU continue processing. The period the CPU must wait for the demand word is known as latency.

DRAM latency may be measured in nanoseconds, and can dynamically vary from under 40ns to over 100ns depending on many different factors. Latency may also be measured in terms of external CPU bus clocks, but in order to understand the CPU performance impact of latency, it must be evaluated in terms of core CPU clocks.
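To see why core clocks are the figure that matters, here is a quick back-of-the-envelope sketch. The 60 ns latency, 100 MHz bus, and 500 MHz core values are illustrative assumptions, not measurements:

```python
# Convert a DRAM latency figure between nanoseconds, bus clocks, and core clocks.
# All input numbers are illustrative assumptions, not measured values.

latency_ns = 60.0     # assumed latency until the demand word arrives
bus_mhz   = 100.0     # assumed external CPU bus clock
core_mhz  = 500.0     # assumed CPU core clock ("half a gigahertz")

bus_clock_ns  = 1000.0 / bus_mhz    # one bus clock period in ns
core_clock_ns = 1000.0 / core_mhz   # one core clock period in ns

print(f"{latency_ns:.0f} ns = {latency_ns / bus_clock_ns:.0f} bus clocks "
      f"= {latency_ns / core_clock_ns:.0f} core clocks of potential stall time")
```

The same 60 ns is only 6 bus clocks, but it is 30 core clocks during which the CPU may be doing nothing at all.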

Next we move to the burst part of the cycle. After the CPU generates the demand word access and waits for the DRAM to respond, the L2 cache controller immediately kicks in with a short burst sequence that fills part of the cache SRAM. The term "cache line fill" is often used to describe this transaction. The CPU may or may not need this extra data, but the cache controller fetches it just in case. It is also quite convenient that, since the DRAM just went through the painful process of delivering the demand word, it is more than ready to pump out the neighboring data very quickly.

You have seen notations such as 7,1,1,1 or 5,2,2,2 used to describe bus transaction speed. The first value (the 7 or the 5) is the number of bus clocks associated with latency, i.e. the wait for the demand word. The remaining three values (1,1,1 or 2,2,2) are the bus clocks needed for each subsequent transfer of the burst cache line fill.
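To make the notation concrete, here is a small sketch of how those figures add up for one complete cache line fill. The assumption of a 32-byte line moved as four 64-bit transfers is mine, though it is typical of the hardware under discussion:

```python
# Total bus clocks to move one cache line, given an x-y-y-y timing notation.
# Assumes a 32-byte line filled by four 64-bit transfers.

def line_fill_clocks(timing):
    """timing is a tuple such as (7, 1, 1, 1) or (5, 2, 2, 2)."""
    return sum(timing)

for timing in [(7, 1, 1, 1), (5, 2, 2, 2)]:
    print(timing, "->", line_fill_clocks(timing), "bus clocks per cache line fill")
```

So a 7,1,1,1 part spends 7 of its 10 clocks on latency, and a 5,2,2,2 part spends 5 of its 11; either way, the wait for the demand word dominates the transaction.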

Peak burst bandwidth may be calculated by multiplying the bus clock rate by the width of the data bus: 100 MHz SDRAM on a 64-bit (8-byte) bus hits 800 MB/s. The math is easy to do and generates attractively huge numbers. Armed with this formula, anyone with a calculator and a pocket protector can assume that they have found the secret to evaluating memory performance or bus performance. Not true!
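For reference, here is the arithmetic behind those attractively huge numbers, assuming a 64-bit (8-byte) data bus and one transfer per bus clock:

```python
# Peak burst bandwidth = bytes per transfer * transfers per second.
# Assumes a 64-bit (8-byte) data bus moving data on every bus clock.

def peak_bandwidth_mb_s(bus_mhz, bytes_per_transfer=8):
    return bus_mhz * bytes_per_transfer

print(peak_bandwidth_mb_s(100))   # 100 MHz SDRAM -> 800 MB/s
print(peak_bandwidth_mb_s(66.6))  # 66 MHz SDRAM  -> ~533 MB/s
```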

Consider the jump from EDO to SDRAM at 66 MHz. EDO, which needs two bus clocks per burst transfer, has a peak burst bandwidth of a rather pathetic 266 MB/s, while 66 MHz SDRAM, transferring on every clock, delivers a screaming 533 MB/s. How many of you were blown away by the 2x performance boost when you first ripped out your EDO and dropped in SDRAM? You were lucky to see a 1% performance delta.

Unfortunately, peak burst bandwidth does not have a very direct relationship to CPU performance! (Hint: Consider LATENCY!!!)
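One way to take the hint is to fold latency back into the bandwidth figure. The sketch below compares the two timing notations from above over a full cache line fill on a 66 MHz, 64-bit bus; pairing EDO with 5,2,2,2 and SDRAM with 7,1,1,1 is an illustrative assumption, not a benchmark:

```python
# Effective bandwidth of one cache line fill once latency clocks are included.
# Assumes a 66 MHz bus, a 32-byte cache line, and the timings from the text;
# the EDO/SDRAM pairing below is illustrative, not measured.

BUS_MHZ = 66.0
LINE_BYTES = 32

def effective_mb_s(timing):
    clocks = sum(timing)                 # latency clocks + burst clocks
    seconds = clocks / (BUS_MHZ * 1e6)
    return LINE_BYTES / seconds / 1e6

for name, timing in [("EDO   5,2,2,2", (5, 2, 2, 2)),
                     ("SDRAM 7,1,1,1", (7, 1, 1, 1))]:
    print(f"{name}: {effective_mb_s(timing):4.0f} MB/s over a full line fill")
```

Even though the SDRAM bursts twice as fast, the complete line fill comes out only about 10% quicker once the latency clocks are counted, and the CPU was really only waiting for the demand word in the first place.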