Understanding memory timings

July 5, 2009 6:05:19 AM

I have been trying to understand memory timing numbers and how they work with the Front Side Bus (FSB) and memory speeds to determine memory transfer rates to and from the CPU.

The following table shows the relationship between memory clock rates, bus clock rates, and peak transfer rates.

  DDR speed   Memory clock  Cycle time  I/O bus clock  Module name  Peak transfer rate
  DDR3-800    100 MHz       10 ns       400 MHz        PC3-6400      6400 MB/s
  DDR3-1066   133 MHz       7.5 ns      533 MHz        PC3-8500      8533 MB/s
  DDR3-1333   166 MHz       6 ns        667 MHz        PC3-10600    10667 MB/s
  DDR3-1600   200 MHz       5 ns        800 MHz        PC3-12800    12800 MB/s
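To see where the peak transfer rate column comes from, here is a minimal sketch of the arithmetic using the figures above: a standard 64-bit (8-byte wide) module moves data on both edges of the I/O bus clock, so peak rate = bus clock x 2 transfers x 8 bytes.

```python
# Peak transfer rate for a 64-bit (8-byte wide) DDR3 module:
#   data rate (MT/s) = 2 transfers per clock * I/O bus clock (MHz)
#   peak rate (MB/s) = data rate * 8 bytes per transfer
modules = {
    "DDR3-800":  400.0,     # I/O bus clock in MHz
    "DDR3-1066": 533.33,
    "DDR3-1333": 666.67,
    "DDR3-1600": 800.0,
}

for name, bus_mhz in modules.items():
    data_rate = 2 * bus_mhz          # mega-transfers per second (double data rate)
    peak_mb_s = data_rate * 8        # 8 bytes moved per transfer
    print(f"{name}: {peak_mb_s:.0f} MB/s")
```

Running this reproduces the last column of the table (6400, 8533, 10667, and 12800 MB/s).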


DDR, DDR2, and DDR3 are examples of synchronous dynamic random access memory (SDRAM). Synchronous means that memory operations are tied to a bus clock supplied by the memory controller (traditionally derived from the FSB; with Nehalem, the FSB is no longer involved in memory access). So the memory timing numbers for SDRAM are expressed in units of clock cycles of the memory's I/O bus clock.

The memory timing numbers are a measure of the latency (i.e. delay) between when a memory action is requested and when it finishes. There are four memory actions whose latencies are indicated by the memory timings. From left to right, the four integers give each latency as a number of clock cycles:

  • Column address strobe (CAS) latency - the number of clock cycles between the moment the memory controller asks for a particular column in the currently open row and the moment the data from that array location is available on the module's output pins.

  • Row address to column address delay (tRCD) - the number of clock cycles between opening (activating) a row and being able to issue a column access to it.

  • Row precharge time (tRP) - the number of clock cycles needed to close (precharge) the currently open row before a different row in the same bank can be opened.

  • Row active time (tRAS) - the minimum number of clock cycles between a bank active (row open) command and issuing the precharge command.

    So a memory timing of 7-7-7-20 means that it takes 7 clock cycles to perform each of the first three actions. The row active time (the 4th and last number) is approximately the sum of the first three numbers.

    So how does DDR3-800 running at 6-6-6-15 timing compare to DDR3-1333 running at 9-9-9-24? The first kit has a slower clock rate (bad) but fewer cycles of latency (good). The second kit has a faster clock rate (good) but more cycles of latency (bad).

  • CAS latency for DDR3-800 6-6-6-15 = 6 / 400 MHz = 15 ns (1.5*10**-8 seconds)

  • CAS latency for DDR3-1333 9-9-9-24 = 9 / 667 MHz = 13.5 ns (1.35*10**-8 seconds)

    In spite of its higher timing numbers, DDR3-1333 9-9-9-24 is faster than DDR3-800 6-6-6-15.

    CAS latency is the best case number. The Row Active Time (RAT) is the worst case number.

  • RAT latency for DDR3-800 6-6-6-15 = 15 / 400 MHz = 37.5 ns (3.75*10**-8 seconds)
  • RAT latency for DDR3-1333 9-9-9-24 = 24 / 667 MHz = 36 ns (3.6*10**-8 seconds)

    So here the advantage for the DDR3-1333 9-9-9-24 is smaller. But if you are moving billions of bytes per second, saving a little on every access adds up to a big savings.
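    Here is a small sketch of that cycles-to-nanoseconds conversion, using the two kits discussed above (the helper name is just for illustration):

```python
# Convert a memory timing given in I/O bus-clock cycles to nanoseconds.
def timing_ns(cycles, bus_clock_mhz):
    return cycles / bus_clock_mhz * 1000.0   # 1/MHz = microseconds, * 1000 = ns

kits = [
    ("DDR3-800  6-6-6-15", 400.0,  6, 15),
    ("DDR3-1333 9-9-9-24", 666.67, 9, 24),
]

for name, bus_mhz, cl, tras in kits:
    print(f"{name}: CAS = {timing_ns(cl, bus_mhz):.1f} ns, "
          f"row active = {timing_ns(tras, bus_mhz):.1f} ns")
```

    This prints 15.0 ns / 37.5 ns for the DDR3-800 kit and 13.5 ns / 36.0 ns for the DDR3-1333 kit, matching the figures above.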

    One other key point: a big difference between DDR2 and DDR3 is that DDR3 doubled the prefetch buffer from 4 bits to 8 bits per data pin, so each internal memory-array access delivers a full byte per pin to the I/O buffers.
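    A rough sketch of why the prefetch depth matters: the internal memory array runs much slower than the I/O bus, and the prefetch width bridges the gap. The DDR2 array clock below is my own comparison figure (DDR2-800 uses a 200 MHz internal clock), not something from the table above.

```python
# Per-pin data rate = internal array clock * prefetch depth.
# DDR3 fetches 8 bits per pin per array access (8n prefetch), DDR2 fetches 4 (4n).
def per_pin_rate_mt_s(array_clock_mhz, prefetch_bits):
    return array_clock_mhz * prefetch_bits

print(per_pin_rate_mt_s(100, 8))   # DDR3-800: 100 MHz array * 8n prefetch = 800 MT/s
print(per_pin_rate_mt_s(200, 4))   # DDR2-800: 200 MHz array * 4n prefetch = 800 MT/s
```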

    A question that I still have is how Intel's QuickPath Interconnect (QPI) technology affects all this. Four cores trying to access shared memory have got to create some bottlenecks. Shared-nothing architecture? A question for another night. In Nehalem, Intel has moved memory access responsibility to a new Integrated Memory Controller (IMC). The IMC communicates directly between the shared L3 cache and the DDR3 triple-channel memory, potentially allowing three concurrent memory accesses. In a single quad-core CPU configuration, QPI provides a point-to-point connection between that CPU's L3 cache and the X58 I/O Hub. The I/O Hub handles communication with the PCIe 2.0 graphics card(s). In future multi-CPU configurations (i.e. servers), QPI would also directly link each CPU to every other CPU.

    A QPI connection consists of two 20-lane point-to-point data links, one in each direction, which allows communication in both directions simultaneously. The old northbridge architecture (prior to Nehalem) defined one path, and communication could occur in only one direction at a time. Talk about S-L-O-W, especially since memory requests from the CPU were competing on this same path with traffic from the PCIe graphics card(s); traffic jam!
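    As a rough back-of-the-envelope check on QPI bandwidth, here is a sketch assuming the launch-era 6.4 GT/s link speed and 2 bytes of payload per transfer per direction (16 of the 20 lanes carry data); treat the numbers as illustrative.

```python
# QPI: two unidirectional 20-lane links; 16 of the 20 bits per transfer carry data,
# i.e. 2 payload bytes per transfer in each direction.
def qpi_bandwidth_gb_s(transfers_gt_s, payload_bytes=2):
    per_direction = transfers_gt_s * payload_bytes
    return per_direction, 2 * per_direction   # (one direction, both directions)

one_way, both_ways = qpi_bandwidth_gb_s(6.4)  # 6.4 GT/s QPI link
print(f"{one_way:.1f} GB/s per direction, {both_ways:.1f} GB/s total")
# -> 12.8 GB/s per direction, 25.6 GB/s total
```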

    Each Nehalem processor core has its own dedicated L1 and L2 cache. http://www.intel.com/Assets/PDF/manual/253665.pdf

    BTW, I will correct any mistakes that the community finds in the above analysis. I am trying to understand; no guarantee that I do understand.

    I used the following Wikipedia entries:
    http://en.wikipedia.org/wiki/SDRAM
    http://en.wikipedia.org/wiki/Front_side_bus
    http://en.wikipedia.org/wiki/Memory_timings
    http://en.wikipedia.org/wiki/CAS_latency
    http://en.wikipedia.org/wiki/Precharge_interval

    And this article from Benchmark Reviews which I highly recommend
    http://benchmarkreviews.com/index.php?option=com_conten...

    EDIT: Added info about Nehalem's QPI. Added information about QPI being full-duplex. Noted that with Nehalem, the FSB is no longer directly involved with memory clock rates.
    July 5, 2009 3:58:25 PM

    Good effort thanks.
    AC
    July 6, 2009 9:09:46 PM

    Nice post. I wish I had that link when I started.

    I have given this post a Nehalem flavor (sorry AMD guys). Continuing along that line, I have learned that the timings within an Intel i7 system are determined by the BCLK (base clock) frequency. The CPU, IMC, and QPI clock rates are all derived from the BCLK.

    The BCLK frequency is set to 133 MHz by default. That means that if you do nothing, your DDR3-1600 memory runs not at the 200 MHz memory clock you thought you were buying, but at the same speed as the DDR3-1066 memory you turned your nose up at. Don't believe me? Ask Intel: http://www.intel.com/support/processors/sb/CS-029913.ht...

    "What is the maximum frequency for DDR3 memory when used with Intel® Core™ i7 desktop processors?

    These processors support DDR3 memory with a maximum frequency of 1066 MHz. If faster DDR3 memory is used (such as 1333 MHz or 1600 MHz), it will be down-clocked to operate at 1066 MHz."

    And if the CAS timing on that DDR3-1600 stick was greater than 7, boy are you embarrassed. :cry: 

    We can also derive that the default memory multiplier for the i7 processors is 1066/133 = 8

    To get to the higher memory cycle rate that you paid for, you have to overclock. For example, here: http://www.computerpoweruser.com/editorial/article.asp?...

    "Finally, there is the BCLK (base clock) to consider. The ultimate frequency of a Core i7 processor, its QPI link, and its memory speed are all derived from a BCLK frequency, which is set to 133MHz by default. Although the FSB is no more, it may help to think of the BCLK frequency as similar to the FSB clock. Raising the BCLK raises the CPU clock, QPI speed, and memory speed in lockstep. To keep all things running within stable limits, most X58-based motherboards for the Core i7 give users the ability to alter QPI and memory multipliers independently of the CPU, QPI, and memory clocks. All can be fine-tuned to some degree. Raising the base clock is how we overclocked our Core i7 920."

    <Added by EDIT> If you own DDR3-1333 memory like I do, the memory multiplier has to go up to 10 (1333 / 133 ≈ 10).
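    A quick sketch of the BCLK-times-multiplier arithmetic described in the quote above (the helper function is just for illustration; real boards expose the multiplier as a BIOS ratio setting):

```python
# Effective DDR3 data rate on Core i7 = BCLK (MHz) * memory multiplier, in MT/s.
def ddr3_data_rate(bclk_mhz, memory_multiplier):
    return bclk_mhz * memory_multiplier

print(ddr3_data_rate(133.33, 8))    # default BCLK, default multiplier -> ~1066 MT/s
print(ddr3_data_rate(133.33, 10))   # default BCLK, multiplier 10      -> ~1333 MT/s
print(ddr3_data_rate(160.00, 8))    # raised BCLK (overclock)          -> 1280 MT/s
```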

    Mike
    July 11, 2009 8:20:18 PM

    cool! nice thread
    July 23, 2010 1:33:24 PM

    Very helpful. I looked for info on BCLK overclocking everywhere with no success.
    September 12, 2010 4:53:24 AM

    I want to ask about overclocking DDR3 vs. latencies. I want to build a gaming PC and want to know whether using DDR3 RAM with lower latencies (5-7-5 @ 1375 MHz) vs. (7-8-7-20 @ 2000 MHz) will help the computer's performance. The big problem is the money: the CL5 DDR3 is $400 for 1x1GB or $800 for 2x1GB, so how does the price bump work out? Will I be able to see a bandwidth increase worth all the money for CL5, or is the CL7 worth it? I've seen people benchmark the (7-8-7-20 @ 2000 MHz) kit at around 48-52 GB/s transfer rate. I think I'm asking: will the lower latencies show a performance boost?