I have been trying to understand memory timing numbers and how they work with the Front Side Bus (FSB) and memory speeds to determine memory transfer rates to and from the CPU.
The following table shows the relationship between memory rates, FSB clock rates, and peak transfer rates.
Code :
DDR Speed | Memory clock | Cycle time | FSB Bus clock | Module name | Peak transfer rate
DDR, DDR2, and DDR3 are examples of synchronous dynamic random access memory (SDRAM). Synchronous means that the memory timing is driven by the FSB Memory clock rate (with Nehalem, the FSB is no longer involved with memory access). So the memory timing numbers for SDRAMs are in units of clock cycles.
The memory timing numbers are a measure of the latency (i.e. delay) between when a memory action is requested and when it will finish. There are four memory actions whose latencies are indicated by memory timings. From left to right, the integers denoting the latency in the number of memory cycles are:
Column address strobe latency - elapsed time in clock cycles between the moment a memory controller tells the memory module to access a particular column in a selected row, and the moment the data from the given array location is available on the module's output pins.
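Row to column address delay - number of clock cycles between activating (opening) a row and being able to issue a read or write to a column in that row
Row Precharge time - number of clock cycles between issuing the precharge command, which closes the currently open row, and being able to open the next row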
Row Active Time - the number of clock cycles taken between a bank active command and issuing the precharge command
So a memory timing of 7-7-7-20 means that it takes 7 clock cycles to perform each of the first three actions. The row active time (the 4th and last number) is approximately the sum of the first three numbers.
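To make the notation concrete, here is a minimal Python sketch that splits a timing string into its four numbers and compares the fourth with the sum of the first three. The names `Timings` and `parse_timings` are my own labels for illustration, not anything from a spec or tool.

```python
from typing import NamedTuple


class Timings(NamedTuple):
    cas: int  # Column address strobe latency (first number)
    rcd: int  # Row to column address delay (second number)
    rp: int   # Row precharge time (third number)
    ras: int  # Row active time (fourth number)


def parse_timings(text: str) -> Timings:
    """Split a string such as '7-7-7-20' into its four cycle counts."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) != 4:
        raise ValueError(f"expected four numbers, got {text!r}")
    return Timings(*parts)


t = parse_timings("7-7-7-20")
first_three = t.cas + t.rcd + t.rp
print(first_three, t.ras)  # 21 20
```

So for 7-7-7-20 the rule of thumb is off by one cycle, which is why I say "approximately".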
So how does DDR3-800 running at 6-6-6-15 timing compare to DDR3-1333 running at 9-9-9-24? The first chip has a slower clock rate (bad) but a shorter latency in cycles (good). The second chip has a faster clock rate (good) but a longer latency in cycles (bad).
CAS Latency for DDR3-800 6-6-6-15 = 6 / 100 MHz = 6*10**-8 seconds
CAS Latency for DDR3-1333 9-9-9-24 = 9 / 166 MHz = 5.4*10**-8 seconds
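In spite of its higher memory timing, DDR3-1333 9-9-9-24 is faster than the DDR3-800 6-6-6-15. (Strictly, DDR3 timings are counted in cycles of the I/O bus clock, which is four times the memory clock: 400 MHz and 667 MHz here. That makes the absolute latencies about 15 ns and 13.5 ns, but the ratio between the two modules is the same.)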
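The arithmetic is just cycles divided by clock frequency. A small Python sketch, with the clock passed in as a parameter so a different figure can be tried (the function name `latency_seconds` is mine, and 100 MHz and 166 MHz are simply the values used in the two lines above):

```python
def latency_seconds(cycles: int, clock_mhz: float) -> float:
    """Time taken by `cycles` clock cycles at a clock of `clock_mhz` MHz."""
    return cycles / (clock_mhz * 1e6)


ddr3_800 = latency_seconds(6, 100)    # CAS 6 at 100 MHz
ddr3_1333 = latency_seconds(9, 166)   # CAS 9 at 166 MHz

print(f"DDR3-800  CL6: {ddr3_800 * 1e9:.1f} ns")
print(f"DDR3-1333 CL9: {ddr3_1333 * 1e9:.1f} ns")
print("DDR3-1333 is faster:", ddr3_1333 < ddr3_800)
```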
CAS latency is the best case number. The Row Active Time (RAT) is the worst case number.
RAT Latency for DDR3-800 6-6-6-15 = 15 / 100 MHz = 1.5*10**-7 seconds
RAT Latency for DDR3-1333 9-9-9-24 = 24 / 166 MHz = 1.4*10**-7 seconds
So the advantage for the DDR3-1333 9-9-9-24 is smaller in the worst case. But if you are moving billions of bytes per second, a little saved on each access adds up to a big saving.
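To put a number on "smaller", this sketch works out the percentage advantage of the DDR3-1333 module for both the CAS and the RAT figures. The helper names are hypothetical, and the clock values are the same 100 MHz and 166 MHz used in the calculations above.

```python
def latency_ns(cycles: int, clock_mhz: float) -> float:
    """Latency in nanoseconds for `cycles` cycles at `clock_mhz` MHz."""
    return cycles / clock_mhz * 1000.0


def advantage_percent(slow_ns: float, fast_ns: float) -> float:
    """How much shorter `fast_ns` is than `slow_ns`, as a percentage."""
    return (slow_ns - fast_ns) / slow_ns * 100.0


cas_gain = advantage_percent(latency_ns(6, 100), latency_ns(9, 166))
rat_gain = advantage_percent(latency_ns(15, 100), latency_ns(24, 166))

print(f"CAS advantage: {cas_gain:.1f}%")  # 9.6%
print(f"RAT advantage: {rat_gain:.1f}%")  # 3.6%
```

Because this is a ratio, it does not change if both modules are measured against a clock scaled by the same factor.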
One other key point. A big difference between DDR2 and DDR3 is that DDR3 doubled the size of the data prefetch buffer from 4 bits per cycle to a full 8 bits (i.e. a byte) with each pass. That is a 100% increase.
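A rough sketch of what that doubling means for transfer rates, assuming the same 100 MHz memory clock for both generations and a standard 64-bit wide module (the bus width and the function names are my assumptions, not from the table above):

```python
def data_rate_mt_s(memory_clock_mhz: float, prefetch_bits: int) -> float:
    """Millions of transfers per second: memory clock times prefetch depth."""
    return memory_clock_mhz * prefetch_bits


def peak_mb_s(data_rate: float, bus_width_bits: int = 64) -> float:
    """Peak transfer rate in MB/s for a module of the given bus width."""
    return data_rate * bus_width_bits / 8


ddr2 = data_rate_mt_s(100, 4)  # 4-bit prefetch
ddr3 = data_rate_mt_s(100, 8)  # 8-bit prefetch

print(ddr2, peak_mb_s(ddr2))  # 400 3200.0
print(ddr3, peak_mb_s(ddr3))  # 800 6400.0
```

Same memory clock, twice the prefetch, twice the peak rate: 100 MHz gives DDR3-800 where DDR2 would give DDR2-400.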
A question that I still have is how does Intel's Quick Path Interconnect (QPI) technology affect all this? Quad cores trying to access shared memory have got to create some bottlenecks. Shared nothing architecture? A question for another night. In Nehalem, Intel has moved memory access responsibility to a new Integrated Memory Controller (IMC). The IMC communicates directly between the L3 shared cache and the DDR3 triple channel memory, potentially allowing three concurrent memory accesses. In a single quad-core CPU configuration, QPI provides a point-to-point connection between that CPU's L3 cache and the X58 IO Hub. The IO Hub handles communication with the PCIe 2.0 graphics card(s). In future multi-CPU configurations (i.e. servers) QPI would also directly link each CPU to every other CPU.
A QPI connection consists of two 20-pair point-to-point data links, one in each direction. This allows communication in both directions simultaneously. The old northbridge (prior to Nehalem) architecture defined one path, and communication could occur in only one direction at a time (not simultaneously). Talk about S-L-O-W, especially since memory requests for the CPU were competing on this same path with traffic from the PCI graphics card(s); traffic jam!
EDIT: Added info about Nehalem's QPI. Added information about QPI being full-duplex. Noted that with Nehalem, the FSB is no longer directly involved with memory clock rates.
Message edited by MikeJRamsey on 07-06-2009 at 03:34:48 AM
I have given this post a Nehalem flavor (sorry AMD guys). Continuing along that line, I have learned that the timings within an Intel i7 CPU are determined by the BCLK (base clock) frequency. The CPU, the IMC, and the QPI clock rates are derived from the BCLK.
The BCLK frequency is set by default to 133MHz. This says that if you do nothing, your DDR3-1600 memory runs not at the 200 MHz that you thought you were buying but at the same speed as that DDR3-1066 memory that you turned your nose up at. Don't believe me, ask Intel: http://www.intel.com/support/proce [...] 029913.htm
"What is the maximum frequency for DDR3 memory when used with Intel® Core™ i7 desktop processors?
These processors support DDR3 memory with a maximum frequency of 1066 MHz. If faster DDR3 memory is used (such as 1333 MHz or 1600 MHz), it will be down-clocked to operate at 1066 MHz."
And if the CAS timing on that DDR3-1600 stick was greater than 7, boy are you embarrassed.
We can also derive that the default memory multiplier for the i7 processors is 1066/133 = 8.
"Finally, there is the BCLK (base clock) to consider. The ultimate frequency of a Core i7 processor, its QPI link, and its memory speed are all derived from a BCLK frequency, which is set to 133MHz by default. Although the FSB is no more, it may help to think of the BCLK frequency as similar to the FSB clock. Raising the BCLK raises the CPU clock, QPI speed, and memory speed in lockstep. To keep all things running within stable limits, most X58-based motherboards for the Core i7 give users the ability to alter QPI and memory multipliers independently of the CPU, QPI, and memory clocks. All can be fine-tuned to some degree. Raising the base clock is how we overclocked our Core i7 920."
<Added by EDIT> If you own DDR3-1333 memory like I do, the memory multiplier has to go to 10.
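The multiplier arithmetic can be sketched in a few lines of Python. This assumes memory speed is simply BCLK times the memory multiplier, as described above; rounding to the nearest whole number is my simplification, and the function names are made up for the example. Which multipliers a given board actually offers is up to its BIOS.

```python
BCLK_MHZ = 133  # default base clock quoted above


def memory_speed(bclk_mhz: float, multiplier: int) -> float:
    """Memory speed obtained from a base clock and a memory multiplier."""
    return bclk_mhz * multiplier


def multiplier_for(rated_speed: float, bclk_mhz: float = BCLK_MHZ) -> int:
    """Nearest whole multiplier that reaches `rated_speed` at `bclk_mhz`."""
    return round(rated_speed / bclk_mhz)


print(multiplier_for(1066))  # 8
print(multiplier_for(1333))  # 10
print(multiplier_for(1600))  # 12
print(memory_speed(BCLK_MHZ, 8))  # 1064, i.e. the nominal "1066"
```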
Mike
Message edited by MikeJRamsey on 07-08-2009 at 03:42:50 AM