How To Overclock RAM
Playing the Silicon Lottery
There is a lot of chip-to-chip variability when it comes to memory, more so for DDR4 than DDR3. Two identical modules from the same manufacturing batch may tolerate vastly different maximum voltages before becoming unstable, and sometimes the difference only appears in certain memory slots. Vendors do test the performance of each IC, but a memory chip is only guaranteed to perform at its advertised specifications. The differences show up when you try to overclock.
Different retail memory kits can all contain the same memory ICs, and there are chip-to-chip differences between ICs even if the batch number (when available) is the same.
Given how relatively inexpensive memory can be, serious overclockers generally purchase multiple kits, test each module, and select the best. The selection procedure involves placing the modules, one at a time, in the same DIMM slot with the same memory parameters, and finding the modules that run benchmarks consistently (SuperPi 32M is recommended here) at the lowest possible DRAM voltage. The modules that pass with the lowest voltage are the best pieces of silicon.
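The selection step above can be sketched as a small ranking script. This is a minimal illustration, not a prescribed tool: the log format, the module labels, and the voltages are all hypothetical, and each `True`/`False` is assumed to mean "passed every benchmark run at that voltage" in the same slot with the same parameters.

```python
from typing import Dict, List, Optional, Tuple


def lowest_stable_voltage(results: Dict[float, bool]) -> Optional[float]:
    """Return the lowest DRAM voltage at which a module passed, or None."""
    passing = [volts for volts, passed in results.items() if passed]
    return min(passing) if passing else None


def rank_modules(log: Dict[str, Dict[float, bool]]) -> List[Tuple[str, float]]:
    """Sort modules by lowest passing voltage; modules that never pass are dropped."""
    ranked = []
    for module, results in log.items():
        volts = lowest_stable_voltage(results)
        if volts is not None:
            ranked.append((module, volts))
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical test log: module label -> {DRAM voltage: passed all runs?}
log = {
    "A": {1.30: False, 1.32: True, 1.35: True},
    "B": {1.30: True, 1.32: True, 1.35: True},
    "C": {1.30: False, 1.32: False, 1.35: True},
    "D": {1.30: False, 1.32: False, 1.35: False},
}

ranking = rank_modules(log)
print(ranking)  # best module first
```

Module "D" never passes in this made-up log, so it is excluded from the ranking rather than listed last.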
After the best modules are selected, each is rotated into each DIMM slot on the motherboard and tested again to find which specific module performs best in which position. This last check should be performed even for a configuration that is restricted to an already-purchased kit. At this point, each DIMM should be physically labeled (a sticker works well) to identify the slot it belongs in; keeping a log of the lowest voltages for each and the parameters used for this test is very useful if long-term damage is suspected somewhere down the line.
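The slot-rotation test can be recorded and evaluated the same way. The sketch below assumes you have logged the lowest passing voltage for every module in every slot, and it picks the placement whose worst-case voltage is lowest (using the total as a tie-breaker). That scoring rule, the slot names, and the numbers are assumptions made for illustration; the article itself only says to find which module performs best in which position.

```python
from itertools import permutations
from typing import Dict, List


def best_assignment(min_voltage: Dict[str, Dict[str, float]],
                    slots: List[str]) -> Dict[str, str]:
    """Try every module-to-slot placement and return the best one as {slot: module}."""
    best_score = None
    best_layout = None
    for order in permutations(list(min_voltage), len(slots)):
        volts = [min_voltage[module][slot] for module, slot in zip(order, slots)]
        score = (max(volts), sum(volts))  # worst slot first, then total
        if best_score is None or score < best_score:
            best_score = score
            best_layout = dict(zip(slots, order))
    return best_layout


# Hypothetical log: module -> {slot: lowest passing DRAM voltage}
slots = ["A1", "B1"]
min_voltage = {
    "M1": {"A1": 1.32, "B1": 1.35},
    "M2": {"A1": 1.34, "B1": 1.33},
}

assignment = best_assignment(min_voltage, slots)
print(assignment)
```

Brute force is fine here because a desktop board has at most a handful of DIMM slots.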
Software Tools For Overclocking Memory
Parameters for memory overclocking can be changed via the motherboard firmware or through vendor-supplied software. Many motherboard vendors offer tuning utilities that incorporate stress testing and parameter manipulation, and there are freeware options like CPU-Z that can provide a quick system report and real-time measurement of each component’s operating frequencies.
Intel offers its Extreme Memory Profiles (XMP), which consist of pre-defined and validated overclocking settings that can be loaded via motherboard firmware or a vendor tuning utility. XMP allows the firmware/utility to automatically configure the DRAM voltage and latencies, and it can be a good option for those wishing to work with pre-optimized variables.
For stress-testing, the community's current favorite utility seems to be SuperPi, followed closely by Memtest86+. Both tools have extensive configuration options for running tests. Final benchmarking should be carried out using software that most closely mimics the application the system was intended for, like 3DMark for graphics applications and rendering, WinRAR, virtual machine performance, MATLAB’s memtest, etc.
Memory overclocking, like overclocking the processor, requires iterative tuning and patience. The general procedure is:
1. Confirm stability. You can use Memtest86+, SuperPi 32M, Intel’s Extreme Memory tool, or a motherboard/vendor-supplied software suite for this.
2. Note “good” default parameters (parameters that can be returned to if all hell breaks loose).
3. Confirm (via the motherboard firmware or software suite) that the memory frequency, timings/latency, and voltage values are the ones advertised by the memory vendor. Repeat Step 1 if any changes are made.
4. Set the memory multiplier to its maximum allowable value and repeat Step 1.
5. Increase the BCLK frequency by a small amount (1 MHz or so) and repeat Step 1. We veer away from optimizing the memory frequency further if the BCLK being used has been set with regard to processor overclocking considerations. If the memory is unstable with the maximum memory multiplier at this point, it is always possible to reduce BCLK, increase the CPU multiplier, or reduce the memory multiplier (or any combination of these parameters) to achieve system stability. You should adjust the VTT in tandem with the BCLK.
6. In case of problems (or just to see if it makes a difference), increase the DRAM voltage by a very small increment (0.01V, for example) and repeat Step 1.
7. Set the command rate (CMD) to 1 (the motherboard firmware may call this variable CR1/CR2 or T1/T2).
8. Tighten the primary memory timings and repeat Step 1. Ideally, the timings would be tightened (decreased) for better performance, but before that it is worth loosening (increasing) them to see if a higher BCLK or memory multiplier can be tolerated by the system with slightly increased latencies. Tightening begins with the primary timings and proceeds iteratively, lowering each number in the primary set and running a benchmark/stress test after each set of changes. Secondary and tertiary timings have a much smaller impact on overall performance, but they can be adjusted in the same manner.
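The BCLK and DRAM-voltage steps of the procedure above amount to a simple search loop: raise BCLK a notch, re-test, and nudge the voltage up only when the test fails. The sketch below is a rough model of that loop, not a tool that touches hardware. The `is_stable` callback stands in for a manual stress-test run, and the starting voltage, voltage ceiling, BCLK range, and whole-MHz steps are all hypothetical values chosen for the example. Voltages are kept in millivolts so that the 0.01V steps stay exact.

```python
from typing import Callable, Optional, Tuple


def tune_bclk(is_stable: Callable[[int, int], bool],
              bclk_start: int = 100, bclk_limit: int = 110,
              mv_start: int = 1350, mv_max: int = 1400,
              mv_step: int = 10) -> Optional[Tuple[int, int]]:
    """Return the last (BCLK in MHz, DRAM voltage in mV) pair that passed, or None."""
    best = None
    mv = mv_start
    for bclk in range(bclk_start, bclk_limit + 1):
        # Raise the voltage in small steps until this BCLK passes the stress test.
        while not is_stable(bclk, mv):
            mv += mv_step
            if mv > mv_max:
                return best  # stopping point: safe voltage ceiling reached
        best = (bclk, mv)
    return best  # stopping point: BCLK limit reached


def fake_rig(bclk: int, mv: int) -> bool:
    """Made-up stand-in for a stress test: each extra MHz needs 20 mV more."""
    return mv >= 1350 + 20 * (bclk - 100)


result = tune_bclk(fake_rig)
print(result)
```

In practice each call to `is_stable` is a full stress-test run, so the loop is executed by hand over hours; the code only makes the order of decisions and the stopping conditions explicit.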
[Figure: A SuperPi memory latency comparison of a single memory IC at various frequencies]
Given the number of variables and minute changes that can have large effects, a stopping point for the optimization can be any of the following:
- Maximum safe DRAM voltage reached
- Maximum BCLK reached, or an incrementally higher BCLK is unstable despite all other parameters being optimized.
- A maximum RAM frequency is reached (for DDR3; DDR4 doesn’t have a ceiling as of yet)
- The memory ICs refuse to boot because of thermal or overvoltage damage
We can also go in the opposite direction. You can underclock the memory for improved power savings; lower voltage will lower power consumption a bit, although the clock rate has a far bigger impact.
RTL And IOL: Raw RAM Performance Metrics
Advanced overclocking begins with looking at the memory configuration's real latency. RTL and IOL values can be used as fitness scores to optimize memory performance; the lower the better. These figures are given in clock cycles, so they should be converted to real time values (nanoseconds) to get a good idea of the actual latency, because the length of a clock cycle changes with the clock frequency.
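Converting a cycle count to real time is simple arithmetic. The helper below is a small sketch that assumes the figure is counted in memory clock cycles and that, as with all DDR memory, the clock runs at half the advertised data rate. The function name and the example kits are illustrative only.

```python
def cycles_to_ns(cycles: float, data_rate_mts: float) -> float:
    """Convert a latency in memory clock cycles to nanoseconds.

    data_rate_mts is the advertised DDR data rate in MT/s (e.g. 3200 for
    DDR4-3200); the memory clock is half of that because DDR transfers
    data twice per clock.
    """
    clock_mhz = data_rate_mts / 2
    return cycles * 1000 / clock_mhz


# Two illustrative configurations: a lower cycle count is not always faster.
ddr3_1600_cl9 = cycles_to_ns(9, 1600)
ddr4_3200_cl16 = cycles_to_ns(16, 3200)
print(ddr3_1600_cl9, ddr4_3200_cl16)
```

Here 9 cycles at DDR3-1600 works out to more real time than 16 cycles at DDR4-3200, which is why comparing raw cycle counts across frequencies is misleading.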
IOL, the Input Output Latency (also referred to as I/O Latency) is the time it takes the chip to send a response after the query comes in.
RTL, the Round Trip Latency, is the length of time it takes for a signal to be sent to the memory, plus the length of time it takes for an acknowledgment of that signal to be received from memory (the complete round-trip time taken for a signal to travel from point A to point B and then back to point A). RTL values are not always directly (manually) accessible, but they are a function of the IMC frequency, tCL, and the clock skew (mismatch) between the IMC and DRAM clocks. Our sister site AnandTech extrapolated a formula for predicting the RTL values. Of course, their formula requires some motherboard-specific modification, mostly based on the board layout, specifically the distance from the DIMM slots to the CPU.
These values may be directly accessible on some motherboards, in which case it makes sense to set them to some static “good” value to minimize instability issues, which have the potential to increase as more parameters (in addition to the BCLK, multiplier, and primary timings) are tweaked. Where the RTL Initial Value and IO Latency Offset are manually accessible, the easiest way to tighten the RTL/IO numbers is in two stages. First, set the RTL Initial Value to the lowest number that allows the memory to POST. Then, once the system has booted successfully, raise the IO Latency Offset one cycle at a time, rebooting after each change, until the lowest RTL/IO values are found.
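The two-stage RTL/IO procedure described above can be written down as a loop. This is only a sketch of the bookkeeping, with heavy assumptions: `posts` and `measure` are hypothetical stand-ins for "change the firmware setting, reboot, and read the result", the candidate ranges are invented, and treating the lowest RTL + IOL sum as "best" is this example's scoring choice rather than something the procedure specifies.

```python
from typing import Callable, Iterable, Optional, Tuple

Reading = Tuple[int, int]  # (RTL, IOL) in clock cycles, read back after a reboot


def tighten_rtl(posts: Callable[[int], bool],
                measure: Callable[[int, int], Optional[Reading]],
                rtl_candidates: Iterable[int],
                io_offsets: Iterable[int]) -> Optional[Tuple[int, int, Reading]]:
    """Return (RTL initial value, IO offset, (RTL, IOL)) for the tightest working setting."""
    # Stage 1: the lowest RTL Initial Value that still allows the memory to POST.
    rtl_init = next((r for r in sorted(rtl_candidates) if posts(r)), None)
    if rtl_init is None:
        return None
    # Stage 2: step the IO Latency Offset up one cycle at a time, keeping the best reading.
    best = None
    for offset in sorted(io_offsets):
        reading = measure(rtl_init, offset)  # None models a failed boot
        if reading is None:
            continue
        if best is None or sum(reading) < sum(best[2]):
            best = (rtl_init, offset, reading)
    return best


# Entirely made-up board behaviour, used only to exercise the loop.
fake_readings = {21: (50, 7), 22: (49, 6), 23: (48, 6), 24: None}
result = tighten_rtl(posts=lambda r: r >= 58,
                     measure=lambda r, off: fake_readings.get(off),
                     rtl_candidates=range(55, 65),
                     io_offsets=range(21, 25))
print(result)
```

Real boards expose these settings under vendor-specific names and per channel, so the log you keep by hand will look different; the point is to change one value per reboot and record what the firmware reports.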
There are multiple clocks in a system (CPU, IMC, memory, etc.), with a variable tick/tock initialization from start-up to start-up, a wide variety of signal pathways and variable environmental parameters, all of which combine to create a disparity (a skew) in the real arrival time of various signals at their destination. The pre-boot DDR calibration sequence introduces various delays between signals in order to achieve synchronicity. This is where DDR training kicks in; there are a number of patterns (either preset/provided by vendors, or custom-made) that test various signal/delay sets for the best possible ranges of these values. The accuracy of these delays determines the RTL/IOL, and ultimately influences memory performance. Since RTLs and IOLs are set at boot, training has a very real impact on the CAS latency.
Fast Boot settings either skip the memory training entirely or use a very rough-and-ready form of training. While this is good enough for normal purposes, fine-tuning memory parameters or benchmarking calls for the best possible training sequence, because the variable signal/delay accuracy from a sub-par training regime makes parameter comparisons questionable. The best sequence can be determined from literature, by comparing the RTL/IOL values that result from each test or, in the absence of additional data, by choosing the sequence that takes the longest time. Still, for enthusiasts looking for only moderate increases in memory performance, this step is generally optional.