DDR3-1333 Speed and Latency Shootout

Boot Straps, i.e., Intel's "Wrench in the Works"

The next step in complete memory testing is to find each module's highest-performance settings at any given clock speed by using its lowest stable latency values. This sounds simple enough, but it actually requires hours of stability testing on each module at each speed to ensure that results are repeatable.

Most of the modules we tested were able to reach 1600 MHz data rate. The ideal solution for testing these would be to use an FSB-1600 processor at memory data rates of 1600 MHz, 1333 MHz and 1066 MHz. Those data rates correspond to frequently used Intel chipset DRAM to FSB clock ratios of 2:1, 5:3, and 4:3. This should be simple!

Unfortunately, Intel doesn't provide "every available ratio" at "every available bus speed." The company instead picks memory speeds it thinks its buyers will need and supplies only the appropriate ratios to each FSB setting.

Intel X38 Chipset Memory Ratios
FSB Data Rate   1:1    6:5    5:4    4:3    3:2    8:5    5:3    2:1
FSB-800         N/A    N/A    N/A    N/A    N/A    N/A    667    800
FSB-1066        N/A    N/A    667    N/A    800    N/A    N/A    1066
FSB-1333        667    800    N/A    N/A    N/A    1066   N/A    1333
FSB-1600        800    N/A    N/A    1066   N/A    N/A    N/A    1600
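
The arithmetic behind the table can be sketched in a few lines of Python. This is only an illustration: the function name `memory_data_rate` is made up for this example, and it assumes the usual conventions that the FSB is quad-pumped (FSB-1600 means a 400 MHz clock) and that DDR memory transfers data twice per clock.

```python
from fractions import Fraction

def memory_data_rate(fsb_data_rate: int, ratio: str) -> float:
    """Memory data rate for a given FSB data rate and DRAM:FSB clock ratio.

    Assumes a quad-pumped FSB and double-data-rate memory.
    """
    fsb_clock = Fraction(fsb_data_rate, 4)           # FSB-1600 -> 400 MHz clock
    dram, fsb = (int(part) for part in ratio.split(":"))
    dram_clock = fsb_clock * Fraction(dram, fsb)     # apply the DRAM:FSB clock ratio
    return float(dram_clock * 2)                     # two transfers per DRAM clock

# The three ratios the FSB-1600 test plan calls for
for ratio in ("2:1", "5:3", "4:3"):
    print(ratio, round(memory_data_rate(1600, ratio)))
```

The results land on 1600, roughly 1333, and roughly 1067, the last of which is sold under the nominal DDR3-1066 label.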

To choose a ratio that Intel didn't bless for a given bus speed, a builder must select a different FSB speed and "overclock" it.

The dilemma concerns something experienced overclockers know as "Boot Straps." The chipset's Northbridge gets its own clock, based on a ratio of the FSB clock, and each Northbridge clock setting is represented by a boot strap. For example, the Northbridge to FSB ratio for FSB-800 is known as the "200 MHz Boot Strap," while the ratio for FSB-1600 is known as the "400 MHz Boot Strap," based on the clock rate of the FSB. Manually setting a 400 MHz FSB clock (FSB-1600) while using the boot strap for a 200 MHz FSB clock (FSB-800) will overclock the Northbridge by 100%.
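
As a rough sketch of that last calculation, the Northbridge overclock can be treated as the gap between the FSB clock actually set and the FSB clock the boot strap was meant for. The helper name `northbridge_overclock_pct` is hypothetical, and the linear scaling is an assumption taken from the 100% example above rather than from chipset documentation.

```python
def northbridge_overclock_pct(actual_fsb_clock_mhz: float, strap_mhz: float) -> float:
    """Percent by which the Northbridge runs above its intended clock.

    Assumes the Northbridge clock scales linearly with the FSB clock
    for a fixed boot strap.
    """
    return (actual_fsb_clock_mhz / strap_mhz - 1.0) * 100.0

print(northbridge_overclock_pct(400, 200))  # 400 MHz FSB on the 200 MHz strap
print(northbridge_overclock_pct(400, 400))  # matching strap: no overclock
```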

Intel X38 Chipset Memory Ratios (by "Boot Strap")
FSB Data Rate   Boot Strap   Memory Data Rate   Memory Clock   FSB Clock   DRAM:FSB Ratio
FSB-800         200          DDR2-667           333 MHz        200 MHz     5:3
FSB-800         200          DDR2-800           400 MHz        200 MHz     2:1
FSB-1066        266          DDR2-667           333 MHz        266 MHz     5:4
FSB-1066        266          DDR2-800           400 MHz        266 MHz     3:2
FSB-1066        266          DDR3-1066          533 MHz        266 MHz     2:1
FSB-1333        333          DDR2-667           333 MHz        333 MHz     1:1
FSB-1333        333          DDR2-800           400 MHz        333 MHz     6:5
FSB-1333        333          DDR3-1066          533 MHz        333 MHz     8:5
FSB-1333        333          DDR3-1333          667 MHz        333 MHz     2:1
FSB-1600        400          DDR2-800           400 MHz        400 MHz     1:1
FSB-1600        400          DDR3-1066          533 MHz        400 MHz     4:3
FSB-1600        400          DDR3-1600          800 MHz        400 MHz     2:1

Notice for example that since Intel no longer supports the use of DDR2-533 (266 MHz clock speed), the company no longer provides a 1:1 ratio for its 266 MHz clocked FSB-1066. Also notice that the X38 chipset does support an FSB-1600 boot strap, but this setting does not support the 5:3 ratio needed to use it with DDR3-1333. In order to enable the 5:3 DRAM to FSB ratio, a "200 MHz Boot Strap" must be used rather than the "400 MHz Boot Strap" native to FSB-1600.
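
The boot strap table can also be read as a simple lookup. The sketch below copies the ratios from the table into a dictionary and asks which boot straps offer a given DRAM:FSB ratio. The names `STRAP_RATIOS` and `straps_offering` are invented for this illustration and are not part of any BIOS or Intel tool.

```python
# DRAM:FSB ratios offered by each X38 boot strap (MHz), copied from the table above
STRAP_RATIOS = {
    200: {"5:3", "2:1"},
    266: {"5:4", "3:2", "2:1"},
    333: {"1:1", "6:5", "8:5", "2:1"},
    400: {"1:1", "4:3", "2:1"},
}

def straps_offering(ratio: str) -> list[int]:
    """Return the boot straps (in MHz) that provide the requested ratio."""
    return sorted(strap for strap, ratios in STRAP_RATIOS.items() if ratio in ratios)

print(straps_offering("5:3"))  # ratio needed for DDR3-1333 on a 400 MHz FSB clock
print(straps_offering("4:3"))  # ratio used for DDR3-1066 on a 400 MHz FSB clock
```

Only the 200 MHz boot strap shows up for 5:3, which is why DDR3-1333 on an FSB-1600 processor ends up forcing the mismatched strap.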

The effects of selecting the wrong boot strap cannot be over-emphasized: neither the P35 nor the X38 chipset can be overclocked by 100%, and even if it could, doing so would have a noticeable impact on total system performance.

This prevented us from using several "Native DDR3-1333" modules with an FSB-1600 processor on our Gigabyte X38T-DQ6 motherboard, because the board would automatically set the 400 MHz FSB clock and 5:3 DRAM:FSB ratio, which in turn forced the lower 200 MHz boot strap at the higher 400 MHz FSB clock. The result of this 100% Northbridge overclock was a failed boot.

So we can't recommend DDR3-1333 for use with FSB-1600 on the P35 chipset, but what about the X38? Our Asus Maximus Extreme set the correct 400 MHz boot strap, which thus eliminated the required 5:3 DRAM to FSB ratio, and all modules instead defaulted to DDR3-1066 speed.

Comment from the forums
  • dv8silencer
    I have a question: on your page 3 where you discuss the memory myth you do some calculations:

    "Because cycle time is the inverse of clock speed (1/2 of DDR data rates), the DDR-333 reference clock cycled every six nanoseconds, DDR2-667 every three nanoseconds and DDR3-1333 every 1.5 nanoseconds. Latency is measured in clock cycles, and two 6ns cycles occur in the same time as four 3ns cycles or eight 1.5ns cycles. If you still have your doubts, do the math!"

    Based on the cycle-based latencies of the DDR-333 (CAS 2), DDR2-667 (CAS 4), and DDR3-1333 (CAS 8), and their frequencies, you come to the conclusion that each of the memory types will retrieve memory in the same amount of time. The higher CAS values are offset by the frequencies of the newer technologies, so that even though DDR2 and DDR3 take more cycles, they also go through more cycles per unit time than DDR. How is it, then, that DDR2 and DDR3 technologies are "better" and provide more bandwidth if they provide data in the same amount of time? I do not know much about the technical details of how RAM works, and I have always had this question in mind.
  • Anonymous
    Latency = How fast you can get to the "goodies"
    Bandwidth = Rate at which you can get the "goodies"
  • Anonymous
    So, I have OCZ memory I can run stable at
    7-7-6-24-2T at 1333 MHz or
    9-9-9-24-2T at 1600 MHz
    This is with the FSB at 1600 MHz, unlinked. Is there a method to calculate the best setting without running hours of benchmarks?
  • Anonymous
    Sorry dude, but you are underestimating the ReaperX modules.
    I would really like to see what temperatures the other modules reached at
    a voltage of ~2.1 V. That does not mean the Platinum series doesn't perform, but I saw a ReaperX that easily reached 940 MHz at 1.9 V (EVP), which means nearly DDR-1900. That is something, and when it comes to stability and temperature over hours of operation, ReaperX beats them all.
  • Anonymous
    All SDRAM (including DDR variants) works more or less the same way: it is divided into banks, banks are divided into rows, and rows contain the data (as columns).
    First you issue a command to open a row (this is your latency), then in a row you can access any data you want at the rate of 1 datum per cycle with latency depending on pipelining.

    So for instance if you want to read 1 datum at address 0 it will take your CAS lat + 1 cycle.

    So for instance if you want to read 8 datums at address 0 it will take your CAS lat + 8 cycles.

    Since CPUs like to fill their cache lines with the next data that will probably be accessed they always read more than what you wanted anyway, so the extra throughput provided by higher clock speed helps.

    But if the CPU stalls waiting for RAM it is the latency that matters.
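
The thread's numbers can be checked with a short sketch. It follows the simplified model from the comment above (CAS latency plus one cycle per datum) and the cycle times quoted from the article, so it is an illustration of that reasoning rather than a precise DRAM timing model. The function names are made up for this example.

```python
def cycle_time_ns(data_rate: float) -> float:
    """Clock period in nanoseconds; the clock is half the DDR data rate."""
    clock_mhz = data_rate / 2
    return 1000.0 / clock_mhz

def read_time_ns(data_rate: float, cas: int, datums: int) -> float:
    """Time to read `datums` items under the simplified CAS + N cycles model."""
    return (cas + datums) * cycle_time_ns(data_rate)

# (label, data rate, CAS latency) for the three memory types in the question
modules = [("DDR-333", 333, 2), ("DDR2-667", 667, 4), ("DDR3-1333", 1333, 8)]
for name, rate, cas in modules:
    latency = cas * cycle_time_ns(rate)   # wait before the first datum
    burst = read_time_ns(rate, cas, 8)    # total time for 8 datums
    print(f"{name}: latency ~{latency:.0f} ns, 8 datums ~{burst:.0f} ns")
```

Under these assumptions all three types wait about 12 ns for the first datum, but the eight-datum read drops from about 60 ns to about 36 ns to about 24 ns, which is the bandwidth advantage the original question asks about.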
  • Anonymous
    What does the "s" in PC3-10600S mean?