
If you are reading this article, there is a good chance that at some point you have wondered what you should do to maximize memory performance. Should you go for memory that runs with tight timings at an average frequency, or for relaxed timings at high clock frequencies? In fact, you might ask whether the whole timings vs. clock speed discussion affects performance at all.
From a memory requirement standpoint, a pair of 1 GB memory modules is undoubtedly the best choice for a high end computer today. The main purpose of a comfortable RAM size is to offer sufficient memory for your applications, and to prevent Windows from swapping application data to the swap file on your (relatively slow) hard drive. In this context, neither the memory type nor the speed is as important as simply having enough. The only thing we recommend in terms of memory size is to use two modules, to allow you to run in high performance dual channel mode.
However, the current 1 GB DDR products don't support the tight timings or high clock frequencies of 512 MB DIMMs, so your choices between timings and clock frequencies are somewhat limited. Looking at DDR2 memory, which is going to dominate by the end of this year, the situation gets even worse. Here, memory timings are further relaxed for the sake of bumping up memory clock speed.
Even if you don't understand what memory timing parameters actually stand for, it is still easy to understand that smaller cycle times represent quicker operation. For example, DDR memory runs at speeds between CL2-2-2-5 and CL3-4-4-7 clock cycle settings, while DDR2 varies from CL3-2-2-8 to CL5-5-5-15. The latter effectively doubles the latencies when compared to quick DDR1 timings, which is why DDR2 memory must run at considerably faster clock speeds to outperform DDR1.
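To see why cycle counts alone can mislead, it helps to convert them into time. The following is a minimal sketch, not part of the original test setup: the helper name `latency_ns` and the example clock pairings are assumptions chosen for illustration. It relies only on the fact that one clock cycle lasts 1000 / (clock in MHz) nanoseconds, and that DDR400 runs at a 200 MHz memory clock while DDR2-800 runs at 400 MHz.

```python
def latency_ns(cycles: float, clock_mhz: float) -> float:
    """Convert a timing given in clock cycles to nanoseconds."""
    # One cycle lasts 1000 / clock_mhz nanoseconds.
    return cycles * 1000.0 / clock_mhz


# Illustrative pairings only; these are not measurements from this article.
examples = [
    ("DDR400, CL2", 2, 200.0),
    ("DDR400, CL3", 3, 200.0),
    ("DDR2-800, CL5", 5, 400.0),
]

for label, cas_cycles, clock_mhz in examples:
    print(f"{label}: {latency_ns(cas_cycles, clock_mhz):.1f} ns")
```

In this simplified view, a higher cycle count at a higher clock can still mean a similar or even shorter delay in absolute terms, which is the trade-off the rest of the article tries to measure.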
We picked an AMD Athlon 64 system for our detailed memory performance analysis, because the memory controller is part of the processor and is thus sensitive to memory speed and timing changes. Our intention is to show the performance differences when changing the following:
- CPU clock speed: from 1450 MHz to 2610 MHz;
- Memory clock speed: from 200 to 290 MHz; and
- Memory timings: from CL2.0-2-2-5 to CL3.0-4-4-7
We'll also examine performance scaling at different clock speeds and timings.
The following table shows the two most popular types of system memory chips for enthusiast systems, and their typical specifications:
Popular Enthusiast Memory Types
| Memory chips | Timings | Clock frequencies (DDR rating) | Voltage |
|---|---|---|---|
| Winbond BH-5/BH-6/UTT | CL1.5 or 2.0-2-2-5 | 400-500 MHz | 2.5 - 3.8 V |
| Samsung TCC5/TCCD | CL1.5 or 2.0-2-2-5 | 400 MHz | 2.5 - 2.9 or 3.0 V |
| Samsung TCC5/TCCD | CL2.5 or 3.0-4-4-7/8 | 500-600 MHz | 2.5 - 2.9 or 3.0 V |
As you can see, there are some differences between system memories depending on which chips are used. Samsung TCC5/TCCD could be described as flexible when compared to the Winbond chips, as they will work at low clock frequencies and tight timings, as well as at high clock frequencies and more relaxed timings. However, Winbond chips will reach higher clock frequencies than Samsung chips if you are dead set on using 2.0-2-2-5 timings; this assumes that you have a motherboard capable of supplying them with the voltage they need.
In practice, you might see 2x 512 MB Winbond memory reaching 250 MHz or as high as 270 MHz at CL2.0-2-2-5 timings, while 2x 512 MB Samsung memory might reach 300 or slightly more at CL2.5-4-3-7, CL2.5-4-4-8 or CL3.0-4-4-8.
Test Memory: G.Skill F1-4400DSU2-1 GBFC
Since we are comparing the relative impact of latency and clock frequency increases, we decided to use memory based on Samsung's TCC5/TCCD. This memory will allow us to use a wider range of settings.
Taiwanese memory maker G.Skill supplied us with two 512 MB sticks of its fairly new F1-4400DSU2-1 GBFC product. It is rated at PC4400 (DDR550) and CL2.5-4-4-8 timings, and is the company's attempt to bring enthusiast performance and overclocking to the mainstream market.
Performance will obviously vary from module to module, and will also depend on the motherboard, BIOS version, BIOS settings and CPU used. In our case, stable operation was possible in the range from 2.0-2-2-6 at 203 MHz to 2.5-4-3-7 at 300 MHz.
| Processor | |
|---|---|
| Socket 939 | AMD Opteron 165, Denmark core (1800 MHz, 2 x 1 MB L2 Cache) |
| Motherboard | |
| Socket 939 | DFI LANParty UT SLI-DR Expert, Rev. A3; BIOS: 12/07/2005; Chipset: Nvidia nForce4 |
| Memory | |
| DDR | G.Skill F1-4400DSU2-1 GBFC, 2x 512 MB DDR550 (275 MHz, CL 2.5-4-4-8 1T) |
| Common Hardware | |
| Graphics Card (PCIe) | Point of View 7800GT; GPU: Nvidia GeForce 7800GT (480 MHz); Memory: 256 MB GDDR3 (1180 MHz) |
| Hard Drive I | SATA Western Digital WD740, 74 GB, 8 MB Cache, 7200 RPM |
| Hard Drive II | SATA Maxtor Diamondmax 10, 300 GB, 16 MB Cache, 7200 RPM |
| Software | |
| Nvidia nForce 4 SLI | Forceware 6.70 |
| Nvidia Graphics | Detonator 81.98 |
| DirectX | Version: 9.0c (4.09.0000.0904) |
| OS | Windows XP, SP2 |
Benchmarks And Settings
| Benchmarks and Settings | |
|---|---|
| DirectX 9 | |
| 3DMark01 SE | Build 330; Resolution: 640x480, 32 Bit (to eliminate any graphics bottleneck); Default Benchmark Run |
| Synthetic | |
| SiSoftware Sandra 2005 | Version 2005.10.10.69 SR3; Memory Bandwidth Benchmark |
| Everest Ultimate Edition 2006 | Version 2.50.480; Memory Latency Benchmark |
| Applications | |
| SuperPI | Version: 1.1; 8M |
As most of you are probably aware, the MHz clock speed of the Athlon 64 and Opteron is the product of the HyperTransport speed (HTT) times the multiplier of the CPU. In the case of the Opteron 165 we used, the default multiplier is 9x and the default HTT speed is 200 MHz, resulting in a CPU speed of 1,800 MHz. The multiplier is only adjustable downwards, while the HTT speed can be adjusted both up and down. There is also a third adjustable factor: the memory speed divider. By default this is set to 1/1, which means that the memory clock frequency will be the same as the HTT speed (200 MHz system base speed equals 200 MHz DDR memory base clock).
Increasing the CPU multiplier would be the easiest way to overclock a processor, but it is factory locked for values higher than the default setting, so you can't. Thus, to overclock, you are going to have to increase your HTT speed. If you increase your HTT speed from 200 MHz to 300 MHz, and lower your CPU multiplier to 6 at the same time, you still get a CPU speed of 1,800 MHz. If your memory divider remains set at 1/1, this means that your system memory will also run at 300 MHz instead of its default speed of 200 MHz. Instead, you should set a memory divider of 2/3, which will bring the memory clock frequency back down to 200 MHz. You pretty much achieve nothing in doing this, but it shows that overclocking is quite flexible nowadays.
As you can see, overclocking your system memory (and not your CPU) requires an increased HTT speed together with a decreased CPU multiplier. Then, all you have to do is play around with memory speed divider settings.
There is one obstacle in the memory scaling benchmark effort, though. If you intend to increase your memory frequency by a small amount - for example, from 200 MHz to 205 MHz - you will have to increase the HTT base clock by this amount. This also means that your CPU clock speed will increase by the clock speed gain (5 MHz) times the CPU multiplier. In this case, CPU clock will rise from 1800 MHz to 1845 MHz.
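The relationship between HTT speed, CPU multiplier and memory divider described above can be sketched in a few lines. This is a simplified model that follows the text only; the function name `clocks` is hypothetical, and real hardware may round the resulting memory clock differently.

```python
from fractions import Fraction


def clocks(htt_mhz, multiplier, divider=Fraction(1, 1)):
    """Return (cpu_mhz, memory_mhz) for a given HTT speed, multiplier and divider.

    Simplified model from the text: CPU clock = HTT x multiplier,
    memory clock = HTT x divider.
    """
    cpu_mhz = htt_mhz * multiplier
    memory_mhz = float(htt_mhz * divider)
    return cpu_mhz, memory_mhz


print(clocks(200, 9))                  # default Opteron 165 settings
print(clocks(300, 6, Fraction(2, 3)))  # raised HTT, lowered multiplier, 2/3 divider
print(clocks(205, 9))                  # a 5 MHz HTT bump also raises the CPU clock
```

The first two calls give the same CPU and memory clocks, which is the "you achieve nothing" case from the text, while the third shows how a small memory speed increase at a 1/1 divider drags the CPU clock up with it.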
That makes things a bit more complicated for us. We don't want varying CPU speeds while we are analyzing memory performance, because it will have an impact on system memory performance results. Unfortunately, there is no way to isolate different system memory speeds and settings as the only factor impacting benchmark results.
To work around this, we first selected some benchmarks that are particularly dependent on CPU speed and system memory speed. We then ran them while changing only the CPU multiplier, isolating CPU clock frequency as the sole varying factor, so that its influence is known before we vary the memory settings.
As mentioned, the default speed for the Opteron 165 used in this article is 1800 MHz. It is, however, stable at speeds of up to 2610 MHz with a core voltage of 1.5 V, running an HTT speed of 290 MHz and the default multiplier of 9. The CPU probably has even more room to overclock if given better cooling.
We set the HTT base speed to 290 MHz and then changed the multiplier for every new test. The memory speed divider was kept at 1/1, resulting in a constant 290 MHz memory clock frequency with CL2.5-4-3-7 timings.




We were not surprised to see that SuperPI and 3DMark01 are CPU speed dependent, but were hoping that Everest and SiSoft Sandra would care more about the consistent memory speed than about the changing CPU speed. In an attempt to increase the importance of system memory and decrease the importance of CPU speed in the benchmarks, we disabled the L2 cache in separate runs. This resulted in roughly the same level of variance, though naturally with lower performance numbers.
By looking at the graphs, one could easily be fooled into believing that the effects of increasing the CPU clock frequency are bottlenecked by the 290 MHz memory speed at higher CPU speeds, because the performance increase shrinks as the CPU clock speed goes up. Keep in mind, though, that the CPU multipliers used are 5, 6, 7, 8, and 9, which gives us actual step-to-step clock frequency increases of 20%, 16.7%, 14.3% and 12.5%. In SuperPI, the resulting performance increases are 19%, 15.6%, 13.5% and 11.9%, so performance scales almost linearly with CPU clock.
As for the Everest memory latency tests, the results have at least two possible explanations. The latency at 2030 MHz was lower than expected, but we can't nail down exactly why this is the case. During testing, we ran into a number of irregular benchmark results when using certain multipliers and dividers, so it may be that certain asynchronous speed combinations simply produce better results. It could also be that the system memory actually is limiting the results, and that even if we could increase the CPU speed further, we might not get any lower latency out of the system than 37-40 ns.
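The arithmetic behind this comparison is easy to reproduce. The sketch below computes the relative clock increase for each multiplier step and divides the SuperPI gains quoted in the text by it; the function name `step_gains_pct` and the "ratio" column are our own illustrative additions.

```python
def step_gains_pct(multipliers):
    """Relative clock increase, in percent, for each step between adjacent multipliers."""
    return [(hi / lo - 1) * 100 for lo, hi in zip(multipliers, multipliers[1:])]


multipliers = [5, 6, 7, 8, 9]
superpi_gain_pct = [19.0, 15.6, 13.5, 11.9]  # SuperPI gains quoted in the text

for lo, hi, clock_gain, perf_gain in zip(
    multipliers, multipliers[1:], step_gains_pct(multipliers), superpi_gain_pct
):
    ratio = perf_gain / clock_gain  # 1.00 would be perfectly linear scaling
    print(f"{lo}x -> {hi}x: clock +{clock_gain:.1f}%, "
          f"SuperPI +{perf_gain:.1f}%, ratio {ratio:.2f}")
```

A ratio that stays roughly constant across all four steps suggests that the shrinking gains come from the shrinking relative clock steps, not from a memory bottleneck.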
Now that we know how much of an impact the CPU clock frequency has in our benchmarks, the next logical step is to find out how the memory clock frequency affects the results. We used CL2.5-4-3-7 timings and changed only the memory speed divider settings throughout the benchmark runs. For this test, disabling the L2 cache actually did produce some interesting results, so we decided to include those graphs.





As you can see, increasing memory clock speed actually helps squeeze the maximum performance out of the system. However, in SuperPI and 3DMark01 - which reflect real world performance - we see tiny 5.2% and 3.5% performance increases, respectively, from a big 45% increase in memory clock frequency. Bear in mind that this is 3DMark2001, which, in contrast to newer versions of 3DMark, actually is somewhat bottlenecked by CPU speed or even memory speed when used with a new, yet reasonably-priced, graphics card. This will not necessarily be the case with modern games or benchmarks.
Also, remember that we used the same timings for all memory speeds. This means that if you end up in a situation where you are choosing between DDR400 and DDR600, the effective difference will actually be quite small, since DDR400 can run tighter timings. Exactly how much of a difference this makes we'll examine shortly.
Now let's take a look at how performance scales when changing the system memory timings.
The tests were performed with a 290 MHz HTT base speed and a 2/3 memory speed divider, resulting in a 193 MHz memory clock - this equals DDR386. The ratio of this speed to 290 MHz (DDR580) is the same as that of DDR400 to DDR600, which makes the two pairs directly comparable.




The different timing settings produce nice scaling results in SuperPI, whereas in 3DMark01 the middle three results pretty much just represent normal benchmark run variations. Everest results look normal, and SiSoft Sandra doesn't really seem to care about timings.
What's of most interest here is to compare the most relaxed timings to the tightest timing settings. When keeping the timings and changing the system memory clock frequency from 200 to 290 MHz, we saw 5.2% and 3.5% better performance in SuperPI and 3DMark01. Now, when we're keeping the clock frequency at 193 MHz and changing the timings from CL3.0-4-4-7 to CL2.0-2-2-6 - settings you probably would use when running DDR600 and DDR400 respectively - we see a difference of 5.1% and 5.7%. Interestingly enough, the difference between the best and the worst times in SuperPI is 19 seconds in both cases.
It seems as if 3DMark01, at least, reacts worse to relaxed timings than to slow system memory clock frequencies. In any case, CL2.0-2-2-6 @ DDR386 got outperformed by CL2.5-4-3-7 @ DDR580 in all of our tests. The synchronous HTT/memory speeds when using DDR580 might play a role as well.
These timings can only really be reached at that speed when using 2x 512 or 2x 256 MB modules. For 2x 1 GB modules, if you can find any that actually run at these speeds, you will have to use CL3.0-3-3-7, or more likely, CL3.0-4-4-8.
In an attempt to project how memory at tight timings (Winbond) compares to memory at relaxed timings and higher clock frequencies (Samsung), we ran the benchmarks using CL2.0-2-2-6 and CL3.0-4-4-7 at different clock speeds. We experienced very strange performance scaling when using a 9x CPU multiplier and different memory speed dividers. The results didn't form a curve, so we had to settle with a 7x CPU multiplier, resulting in a CPU speed of 2,030 MHz.



We see here that at the different tested memory speeds, tight timings outperform relaxed timings by 3.2% to 4.1% in SuperPI, and 2.3% to 4.4% in 3DMark01. Unfortunately, our CPU speed of 2030 MHz is acting as a bottleneck when the memory frequency is increased, which masks part of the benefit of tightening the timings.
At a 1450 MHz core clock and 193 MHz memory frequency, the difference between tight and relaxed timings in 3DMark01 is 1.9%; at 2030 MHz it is 2.3%, and at 2610 MHz it is 5.7%. It's not unreasonable to then guess that if we were to double the increase in clock speed from 2030 to 3190 MHz (instead of to 2610 MHz), we might see a difference in the area of 8-12%, if the memory speed were kept the same.
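One way to arrive at a number in that range is a straight-line extrapolation through the last two measurements. The sketch below is only that: a rough guess under the assumption that the trend between 2030 and 2610 MHz continues unchanged. The function name `extrapolate` is hypothetical, and the much flatter step from 1450 to 2030 MHz shows that the real curve is not linear.

```python
def extrapolate(x0, y0, x1, y1, x):
    """Evaluate the straight line through (x0, y0) and (x1, y1) at x."""
    slope = (y1 - y0) / (x1 - x0)
    return y1 + slope * (x - x1)


# 3DMark01 gain from tight timings (percent) per CPU clock (MHz), as quoted in the text.
measured = {1450: 1.9, 2030: 2.3, 2610: 5.7}

# Assumption: the 2030 -> 2610 MHz trend simply continues to 3190 MHz.
guess = extrapolate(2030, measured[2030], 2610, measured[2610], 3190)
print(f"Projected gain at 3190 MHz: {guess:.1f}%")
```

Because only three data points exist and they do not lie on one line, this projection should be read as an order-of-magnitude estimate at best.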
What follows is a scaling trend curve based on both the results above and the ones we presented earlier in the benchmark section.

Based on a few assumptions, 270 MHz @ CL2.0-2-2-6 outperforms 320 MHz @ CL3.0-4-4-7 by two seconds in SuperPI 8M. Read more below.
This is what the scaling performance in SuperPI could possibly look like - it is purely hypothetical, of course. The graph above requires an explanation, and please don't be too picky, since it's not very scientific. Also, performance will probably be much worse at lower clock frequencies than this graph shows.
We used Excel to generate a logarithmic trend curve based on four real measurements for each of the relaxed and tight timing sets. We tested at 145, 169, 184, and 203 MHz - the highest we could reach with both timing sets. Using relaxed timings, we also got a fifth measurement at 290 MHz.
We then added a theoretical tight timing measurement at 290 MHz, calculated from the 4.6% better performance at 290 MHz compared to 203 MHz with relaxed timings that we measured at a 2030 MHz CPU speed. The theoretical 290 MHz measurement with tight timings could be either lower or higher, depending on how much of a bottleneck the CPU clock frequency turned out to be, as also noted earlier.
Based solely on the first scaling results, it is more likely that the advantage of using tight timings will be less than 4.6% at this low CPU clock frequency.
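For readers who want to repeat this kind of projection with their own results, a logarithmic trend curve of the form y = a * ln(x) + b can be fitted with ordinary least squares on ln(x). The sketch below is an illustration only: the function names are hypothetical, and the times are placeholder values generated from an arbitrary curve, not our measurements.

```python
import math


def fit_log_trend(xs, ys):
    """Least-squares fit of y = a * ln(x) + b; returns (a, b)."""
    log_xs = [math.log(x) for x in xs]
    n = len(xs)
    mean_lx = sum(log_xs) / n
    mean_y = sum(ys) / n
    sxx = sum((lx - mean_lx) ** 2 for lx in log_xs)
    sxy = sum((lx - mean_lx) * (y - mean_y) for lx, y in zip(log_xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_lx
    return a, b


def predict(a, b, x):
    """Evaluate the fitted trend curve at x."""
    return a * math.log(x) + b


# Memory clocks tested with both timing sets (MHz), from the text.
clocks_mhz = [145, 169, 184, 203]
# PLACEHOLDER times in seconds, generated from an arbitrary curve.
# These are NOT the article's measurements; substitute real results here.
placeholder_times = [predict(-40.0, 600.0, c) for c in clocks_mhz]

a, b = fit_log_trend(clocks_mhz, placeholder_times)
print(f"a = {a:.2f}, b = {b:.2f}, projected time at 290 MHz = {predict(a, b, 290):.1f} s")
```

With only four closely spaced samples, small measurement errors can shift the projected value at 290 MHz or beyond noticeably, which is why the projection deserves caution.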

Basing the equation only on the real measurement points, at 290 MHz the difference would only be 2.1%, and 320 MHz at CL3.0-4-4-7 would outperform 270 MHz at CL2.0-2-2-6 by 3 seconds. We think that the measured results are too far away from these high speeds to say that for certain with just four data samples, though.
At higher CPU clock frequencies, the importance of memory speed is larger both in SuperPI and in 3DMark01. In SuperPI, the gain from tight timings (at 203 MHz memory clock) was 3.2% at 2030 MHz CPU clock, and 5.1% at 2610 MHz. In 3DMark01 the gains were 2.3% and 5.7%.
The gains when going from 203 to 290 MHz were 4.6% at 2030 MHz and 5.2% at 2610 MHz in SuperPI. For 3DMark01 the gains were 3.5% and 5.7%. See the graph below for a better overview.

If we were to draw any conclusions from this graph, it would be that SuperPI prefers tight timings, while 3DMark01 prefers high clock frequencies.
We conclude the following from our testing:
- There are very small real-life differences in performance between low clock frequency/fast timing Winbond memory and high clock frequency/relaxed timing Samsung memory. This is true even for CPU/memory intensive applications such as 3DMark01 and SuperPI.
- To accurately answer the question we asked earlier in this article - namely, whether to go for tight timings or high clock frequencies - one should conduct the tests using a very fast CPU to eliminate bottlenecks. It is our opinion that even our overclocked 2610 MHz dual core Opteron wasn't really fast enough to do more than hint at a possible victory for tight timings at even higher CPU clock frequencies.
- Given the two facts above, our conclusion must be that our testing using only Samsung memory and extrapolation, instead of comparison to actual Winbond memory, does not result in data accurate enough to give an entirely foolproof answer. According to our calculations, the difference between CL2.0-2-2-6 at 270 MHz and CL3.0-4-4-7 at 320 MHz in SuperPI 8M ranged from 0.5% to 0.7% in either direction, depending on how performance scales.
- Testing indicated that tight timings become more important as CPU clock is raised. This could potentially lead to Winbond memory performing 1-2% better than Samsung memory at Athlon 64/Opteron CPU speeds over 3 GHz. Of course, Winbond is no longer an active player on the DDR1 market, so getting a hold of this kind of memory is so hard that the issue becomes moot.
- Depending on CPU clock frequency, tight timings have a performance advantage over relaxed timings in CPU/memory intensive applications, ranging from 2% at 2 GHz to 6% at 2.6 GHz.
- When leaving the timings untouched at a CPU clock of 2 GHz or 2.6 GHz respectively, DDR600 performs 2% or 5% better than DDR400 in CPU/memory intensive applications.
- For Samsung TCC5/TCCD memory, like the G.Skill F1-4400DSU2-1 GBFC used for this article, DDR600 at medium timings outperforms DDR400 at tight timings. DDR600 at relaxed timings performs about the same as DDR400 at tight timings (+/- 1%).
Final Conclusion

Unfortunately, we are limited in the conclusions we can draw from this article. We could of course follow up on the article by testing with a faster CPU and some Winbond memory. However, Winbond's high voltage memory chip series is discontinued, the performance difference seems to be minuscule, the transition to 2x 1 GB memory sticks is in progress, and AMD will soon introduce DDR2. Given all of these factors, it wouldn't make much sense to bother.
For the majority of users, the 1-2% benefit in performance when comparing Winbond to Samsung memory will go unnoticed. In fact, we would not advise anyone to spend a lot of money on the highest-end memory in the hope of improving computer performance by increasing memory speed. As noted earlier in the article, keeping the timings unchanged and at a steady CPU clock of 2.6 GHz, DDR600 performs 5% better than DDR400 in CPU/memory intensive applications. These are very weak gains given that there is a 50% increase in memory speed, and these gains are even smaller at lower CPU speeds. In modern games, which are mostly limited by the graphics card, the performance increase would be close to zero, as even big changes in CPU speed can go unnoticed.
The bottom line is that as long as you have enough memory - preferably 2 GB - the extra money you pay for more memory speed would be better invested in a faster graphics card. And if you don't play games, then the CPU and hard drive offer more room for improvement than the memory.