DDR3-1333 Speed and Latency Shootout

Boot Straps, I.e., Intel's "Wrench In The Works"

The next step in complete memory testing is to find a module's highest performance settings at any given clock speed by using its lowest stable latency values. This sounds simple enough, but it actually requires hours of stability testing on each module and at each speed to ensure that results are repeatable.

Most of the modules we tested were able to reach a 1600 MHz data rate. The ideal solution for testing these would be to use an FSB-1600 processor at memory data rates of 1600 MHz, 1333 MHz and 1066 MHz. Those data rates correspond to frequently used Intel chipset DRAM to FSB clock ratios of 2:1, 5:3, and 4:3. This should be simple!
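The ratio arithmetic can be checked with a short sketch. The function name `dram_fsb_ratio` and the denominator cap of 10 are assumptions made for illustration. The sketch takes the memory clock as half the DDR data rate and the FSB clock as one quarter of the quad-pumped FSB data rate, then snaps the rounded marketing numbers to the nearest small ratio.

```python
from fractions import Fraction


def dram_fsb_ratio(memory_data_rate, fsb_data_rate, max_denominator=10):
    """Return the DRAM:FSB clock ratio as a small fraction.

    Assumptions (not from any Intel tool): the memory clock is half the
    DDR data rate, the FSB clock is a quarter of the FSB data rate, and
    a denominator cap of 10 is enough to absorb the rounding in names
    such as "1333" and "1066".
    """
    memory_clock = Fraction(memory_data_rate, 2)
    fsb_clock = Fraction(fsb_data_rate, 4)
    return (memory_clock / fsb_clock).limit_denominator(max_denominator)


# The three data rates mentioned above, paired with an FSB-1600 processor.
for rate in (1600, 1333, 1066):
    ratio = dram_fsb_ratio(rate, 1600)
    print(f"DDR3-{rate} on FSB-1600 -> {ratio.numerator}:{ratio.denominator}")
```

Run as written, the loop prints 2:1, 5:3 and 4:3 for the three data rates.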

Unfortunately, Intel doesn't provide "every available ratio" at "every available bus speed." The company instead picks memory speeds it thinks its buyers will need and supplies only the appropriate ratios to each FSB setting.

Intel X38 Chipset Memory Ratios
FSB Data Rate | 1:1 | 6:5 | 5:4 | 4:3 | 3:2 | 8:5 | 5:3 | 2:1

In order to choose a ratio that Intel didn't bless for any given bus speed, a builder must choose a different FSB speed and "overclock" it.

The dilemma concerns something experienced overclockers know as "Boot Straps." The chipset's Northbridge gets its own clock, based on a ratio of the FSB clock, and each Northbridge clock setting is represented by a boot strap. For example, the Northbridge to FSB ratio for FSB-800 is known as the "200 MHz Boot Strap," while the ratio for FSB-1600 is known as the "400 MHz Boot Strap," based on the clock rate of the FSB. Manually setting a 400 MHz FSB clock (FSB-1600) while using the boot strap for a 200 MHz FSB clock (FSB-800) will overclock the Northbridge by 100%.
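The 100% figure follows from simple proportion, sketched below. The function name is hypothetical, and the sketch assumes the Northbridge clock scales linearly with the FSB clock for a fixed boot strap, which is what the example in the text implies.

```python
def northbridge_overclock_percent(fsb_clock_mhz, boot_strap_mhz):
    """Percent by which the Northbridge runs above its intended clock.

    Assumes (for illustration) that a boot strap fixes the
    Northbridge:FSB ratio, so the Northbridge clock rises in direct
    proportion to the FSB clock actually applied.
    """
    return (fsb_clock_mhz / boot_strap_mhz - 1.0) * 100.0


# A 400 MHz FSB clock on the 200 MHz boot strap versus the matching strap.
print(northbridge_overclock_percent(400, 200))
print(northbridge_overclock_percent(400, 400))
```

The first call returns 100.0 and the second returns 0.0, matching the mismatched and matched cases described above.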

Intel X38 Chipset Memory Ratios (by "Boot Strap")
FSB Data Rate | Boot Strap | Memory Data Rate | Memory Clock | FSB Clock | DRAM:FSB Ratio
FSB-800 | 200 MHz | DDR2-667 | 333 MHz | 200 MHz | 5:3
FSB-800 | 200 MHz | DDR2-800 | 400 MHz | 200 MHz | 2:1
FSB-1066 | 266 MHz | DDR2-667 | 333 MHz | 266 MHz | 5:4
FSB-1066 | 266 MHz | DDR2-800 | 400 MHz | 266 MHz | 3:2
FSB-1066 | 266 MHz | DDR3-1066 | 533 MHz | 266 MHz | 2:1
FSB-1333 | 333 MHz | DDR2-667 | 333 MHz | 333 MHz | 1:1
FSB-1333 | 333 MHz | DDR2-800 | 400 MHz | 333 MHz | 6:5
FSB-1333 | 333 MHz | DDR3-1066 | 533 MHz | 333 MHz | 8:5
FSB-1333 | 333 MHz | DDR3-1333 | 667 MHz | 333 MHz | 2:1
FSB-1600 | 400 MHz | DDR2-800 | 400 MHz | 400 MHz | 1:1
FSB-1600 | 400 MHz | DDR3-1066 | 533 MHz | 400 MHz | 4:3
FSB-1600 | 400 MHz | DDR3-1600 | 800 MHz | 400 MHz | 2:1

Notice for example that since Intel no longer supports the use of DDR2-533 (266 MHz clock speed), the company no longer provides a 1:1 ratio for its 266 MHz clocked FSB-1066. Also notice that the X38 chipset does support an FSB-1600 boot strap, but this setting does not support the 5:3 ratio needed to use it with DDR3-1333. In order to enable the 5:3 DRAM to FSB ratio, a "200 MHz Boot Strap" must be used rather than the "400 MHz Boot Strap" native to FSB-1600.
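The lookup described above can be sketched as a small table search. The ratio sets are transcribed from the boot strap table in this article. The names `STRAP_RATIOS`, `straps_supporting` and `options_for` are hypothetical, and the overclock figure assumes the Northbridge clock scales linearly with the FSB clock for a fixed strap.

```python
# DRAM:FSB ratios offered under each boot strap (in MHz), transcribed
# from the X38 table above. Strap values are the nominal figures used
# in the article (266 and 333 stand for 266.67 and 333.33).
STRAP_RATIOS = {
    200: {"5:3", "2:1"},
    266: {"5:4", "3:2", "2:1"},
    333: {"1:1", "6:5", "8:5", "2:1"},
    400: {"1:1", "4:3", "2:1"},
}


def straps_supporting(ratio):
    """Return the boot straps (MHz) that offer the given DRAM:FSB ratio."""
    return sorted(s for s, ratios in STRAP_RATIOS.items() if ratio in ratios)


def options_for(ratio, fsb_clock_mhz):
    """List (strap, Northbridge overclock %) pairs for a ratio at an FSB clock.

    Illustrative assumption: the overclock is the FSB clock divided by
    the strap's nominal clock, minus one.
    """
    return [
        (strap, (fsb_clock_mhz / strap - 1.0) * 100.0)
        for strap in straps_supporting(ratio)
    ]


# DDR3-1333 on FSB-1600 needs 5:3, which only the 200 MHz strap provides.
print(options_for("5:3", 400))
# DDR3-1066 on FSB-1600 needs 4:3, which the native 400 MHz strap provides.
print(options_for("4:3", 400))
```

The only option returned for 5:3 at a 400 MHz FSB clock is the 200 MHz strap with a 100% Northbridge overclock, while 4:3 is available on the native strap with no overclock.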

The effects of selecting the wrong boot strap cannot be over-emphasized, as neither the P35 nor X38 chipsets can be overclocked by 100%, and even if they could, it would have a noticeable impact on total system performance.

This prevented us from using several "Native DDR3-1333" modules with an FSB-1600 processor on our Gigabyte X38T-DQ6 motherboard, because the board would automatically set the 400 MHz FSB clock and 5:3 DRAM:FSB ratio, which in turn forced the lower 200 MHz boot strap at the higher 400 MHz FSB clock. The result of this 100% Northbridge overclock was a failed boot.

So we can't recommend DDR3-1333 for use with FSB-1600 on the P35 chipset, but what about the X38? Our Asus Maximus Extreme set the correct 400 MHz boot strap, which thus eliminated the required 5:3 DRAM to FSB ratio, and all modules instead defaulted to DDR3-1066 speed.

Thomas Soderstrom
Thomas Soderstrom is a Senior Staff Editor at Tom's Hardware US. He tests and reviews cases, cooling, memory and motherboards.
  • dv8silencer
    I have a question: on your page 3 where you discuss the memory myth you do some calculations:

    "Because cycle time is the inverse of clock speed (1/2 of DDR data rates), the DDR-333 reference clock cycled every six nanoseconds, DDR2-667 every three nanoseconds and DDR3-1333 every 1.5 nanoseconds. Latency is measured in clock cycles, and two 6ns cycles occur in the same time as four 3ns cycles or eight 1.5ns cycles. If you still have your doubts, do the math!"

    Based off of the cycle-based latencies of the DDR-333 (CAS 2), DDR2-667 (CAS 4), and DDR3-1333 (CAS 8), and their frequencies, you come to the conclusion that each of the memory types will retrieve memory in the same amount of time. The higher CAS values are offset by the frequencies of the higher technologies so that even though the DDR2 and DDR3 take more cycles, they also go through more cycles per unit time than DDR. How is it then, that DDR2 and DDR3 technologies are "better" and provide more bandwidth if they provide data in the same amount of time? I do not know much about the technical details of how RAM works, and I have always had this question in mind.
  • Latency = How fast you can get to the "goodies"
    Bandwidth = Rate at which you can get the "goodies"
  • So, I have OCZ memory I can run stable at
    7-7-6-24-2t at 1333 MHz or
    9-9-9-24-2t at 1600 MHz
    This is with the FSB at 1600 MHz, unlinked. Is there a method to calculate the best setting without running hours of benchmarks?
  • Sorry dude, but you are underestimating the ReaperX modules.
    I would still like to see what temperatures the other modules reached at
    a voltage of ~2.1 V. That does not mean the Platinum series performs poorly, but I saw a ReaperX that easily reached 940 MHz at 1.9 V (EVP), which means nearly DDR-1900, and that is something. When it comes to stability and temperature over hours of operation, ReaperX beats them all.
  • All SDRAM (including DDR variants) works more or less the same way: it is divided into banks, banks are divided into rows, and rows contain the data (as columns).
    First you issue a command to open a row (this is your latency), then within that row you can access any data you want at the rate of 1 datum per cycle, with latency depending on pipelining.

    So for instance, if you want to read 1 datum at address 0, it will take your CAS latency + 1 cycle.

    If you want to read 8 data items at address 0, it will take your CAS latency + 8 cycles.

    Since CPUs like to fill their cache lines with the next data that will probably be accessed they always read more than what you wanted anyway, so the extra throughput provided by higher clock speed helps.

    But if the CPU stalls waiting for RAM it is the latency that matters.
  • What does the "s" in PC3-10600S mean?
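The latency and throughput arithmetic raised in the comments above can be sketched in a few lines. This is a first-order illustration only: the function names are hypothetical, the cycle time is taken as 2000 divided by the data rate (clock equals half the data rate), and `read_time_ns` uses the simplified "CAS latency + one cycle per item" model from the comment thread rather than a full DRAM timing model.

```python
def cycle_time_ns(data_rate):
    """Clock cycle time in ns, assuming clock (MHz) = data rate / 2."""
    return 2000.0 / data_rate


def cas_latency_ns(cas_cycles, data_rate):
    """CAS latency converted from clock cycles to nanoseconds."""
    return cas_cycles * cycle_time_ns(data_rate)


def read_time_ns(cas_cycles, items, data_rate):
    """Time to read `items` data from an open row.

    Uses the simplified model from the comments (CAS latency plus one
    cycle per item); real modules add other timings on top of this.
    """
    return (cas_cycles + items) * cycle_time_ns(data_rate)


# Same CAS latency in time, very different time for an 8-item read.
for name, cas, rate in (("DDR-333", 2, 333), ("DDR2-667", 4, 667),
                        ("DDR3-1333", 8, 1333)):
    print(f"{name}: CAS {cas_latency_ns(cas, rate):.2f} ns, "
          f"8 items {read_time_ns(cas, 8, rate):.2f} ns")

# The two stable OCZ settings quoted in the comments.
print(f"CAS 7 at 1333: {cas_latency_ns(7, 1333):.2f} ns")
print(f"CAS 9 at 1600: {cas_latency_ns(9, 1600):.2f} ns")
```

Under these assumptions all three memory generations show roughly 12 ns of CAS latency, yet the 8-item read drops from about 60 ns to about 24 ns, which is where the extra bandwidth comes from. For the two OCZ settings, CAS 7 at 1333 works out to about 10.5 ns against 11.25 ns for CAS 9 at 1600, so the slower setting has the lower first-access latency while the faster one moves data more quickly once the access starts. This narrows the choice but does not replace benchmarking, since the other timings and the workload also matter.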