Bandwidth and Latency:
Frequently Asserted Objections 1 and 2
This post is the first of a series that will deal with the most frequently asserted objections (FAO) to Rambus and RDRAM.
FAO 1: Because of the narrow 16 bit data path used by RDRAM, the bandwidth of PC800, 1.6 GB/s, only equals that of 200 MHz DDR, and is less than that of 266 MHz DDR (2.1 GB/s).
FAO 2: RDRAM has 25% higher (or 50% higher, or twice as high...) latency than SDRAM.
The existence of a bandwidth "issue" is a triumph of the clever use of syntax to frame a topic. RDRAM has the elegant capability to easily use any multiple of a 16 bit data path (18 bits for error correcting code, ECC). The PlayStation2 and the Nintendo64 use a 32 bit data path, and thus have a peak bandwidth of 3.2 GB/s. The Pentium 4 also uses a 32 bit data path, and thus has a bandwidth much higher than 266 MHz DDR, while using a data path half the size. When Compaq releases the successor to the RISC computational champion, the Alpha, it will use RDRAM with a 128 bit data path (yielding a whopping 12.8 GB/s peak bandwidth). Actually, 12.8 GB/s RDRAM has already been introduced by Sitera, a part of Vitesse, and used in their IQPrism (1). The ultrahigh bandwidth of RDRAM used in the 128 bit data path mode has led to much speculation that future communications devices meeting the OC-192 standard will require RDRAM (2). Because the minimum RDRAM data path, 16 bits, is called a "Rambus channel", a 32 bit data path is called "dual channel" RDRAM, and so on. It is exactly the same RDRAM, however.
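The peak figures quoted above all come from the same arithmetic: data path width in bytes times transfers per second. Here is a minimal sketch of that calculation. It assumes a 64 bit data path for DDR (implied by the "half the size" comparison above) and treats the "MHz" ratings as millions of transfers per second; the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(width_bits, mtransfers_per_s):
    """Peak bandwidth in GB/s, with 1 GB taken as 10**9 bytes."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * mtransfers_per_s * 1e6 / 1e9


# (label, data path width in bits, millions of transfers per second)
# The 64 bit DDR width is an assumption, not stated outright in the post.
configs = [
    ("PC800 RDRAM, 1 channel (16 bit)", 16, 800),
    ("PC800 RDRAM, 2 channels (32 bit)", 32, 800),
    ("PC800 RDRAM, 8 channels (128 bit)", 128, 800),
    ("200 MHz DDR (64 bit)", 64, 200),
    ("266 MHz DDR (64 bit)", 64, 266),
]

for label, width, rate in configs:
    print(f"{label}: {peak_bandwidth_gb_s(width, rate):.1f} GB/s")
```

Under those assumptions this reproduces the 1.6, 3.2 and 12.8 GB/s figures for RDRAM and the 1.6 and 2.1 GB/s figures for DDR.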
In essence, a clear virtue, the ease with which the data path width of RDRAM can be adjusted, has, by clever framing of the issue, been turned into a supposed Rambus deficiency. Of course having an option is an advantage. For any given data path width chosen, however, RDRAM has by far the higher bandwidth. Even phone lines with 56K baud can be used to get the bandwidth of a single cable modem line, if you run 100 phone lines into a house. This is not a cost effective technique. Likewise, dozens of coaxial (copper) cable lines can produce the bandwidth of a single fiber optic line, but the latter is preferable in very high bandwidth situations. The size of the data path greatly affects the cost of the overall system. In particular, a data path of 128 bits (which advocates call "dual DDR", perhaps to hide its 128 bit complexity) would be quite expensive, if not actually impossible, in a desktop PC motherboard. The RDRAM versus DDR cost issue has to be treated within the context of total system cost. A future FAO will address total system costs.
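To put a number on the width argument, one can ask how many data lines each technology needs to reach a given peak bandwidth. This is only a sketch: it assumes every data line carries one bit per transfer, uses the same transfer rates as the peak figures quoted earlier (800 for PC800 RDRAM, 266 for DDR), and ignores control, address and ECC lines entirely. The helper names are hypothetical.

```python
def ceil_div(a, b):
    """Integer division rounded up (avoids floating point rounding)."""
    return -(-a // b)


def data_lines_needed(target_mb_per_s, mtransfers_per_s):
    """Smallest number of data lines that reaches the target peak bandwidth.

    Each line is assumed to move one bit per transfer, so one line
    delivers mtransfers_per_s megabits per second.
    """
    target_mbit_per_s = target_mb_per_s * 8
    return ceil_div(target_mbit_per_s, mtransfers_per_s)


target = 3200  # MB/s, the dual channel RDRAM peak figure
print("PC800 RDRAM data lines:", data_lines_needed(target, 800))
print("266 MHz DDR data lines:", data_lines_needed(target, 266))
```

With these assumptions RDRAM needs 32 data lines and 266 MHz DDR needs 97. If DDR widths only come in 64 bit steps, that rounds up to the 128 bit "dual DDR" arrangement mentioned above.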
The latency issue is an excellent example of why, when guessing the correct answer on a multiple choice test, the percentage play is to pick the longest answer. The correct answer usually involves more nuances and hedges than the wrong ones do. Simple answers are often given about RDRAM latency versus SDRAM latency, but they are misleading at best. For example, Samsung (3) gives nice simple numbers showing RDRAM to have lower latency than DDR or SDRAM. Hyundai also has a simple answer, but with the opposite result. In fact, latency depends on a number of factors, most importantly system load (4).
RDRAM has a variety of power saving modes (Intel will probably introduce a laptop using RDRAM next year, not for performance reasons but to save power). When a page of RDRAM memory has not been addressed for a long enough time, it drops to a very low state of alertness. The high latency Hyundai proclaims for RDRAM comes from this effect. (Incidentally, although the Intel P4 motherboard does not offer this option, some motherboards by third parties will probably offer the option of keeping the RDRAM always in a high state of alertness.)
Latency is not independent of load, however. The higher the load, that is, the more frequent the requests to memory, the more likely conflicts are to occur between address lines. Part of the higher cost of RDRAM stems from the larger number of banks used, which reduces addressing conflicts. Moreover, when the CPU changes from reading to writing, SDRAM (including DDR) has several dead cycles. RDRAM can change between reading and writing in a single cycle. Thus this transition creates "dead time" in SDRAM that is not present in RDRAM.
Thus under extremely light load conditions, RDRAM has a modestly higher latency (although this can be altered at the expense of increasing power consumption). Under heavy load conditions, the latency of SDRAM deteriorates rapidly. RDRAM holds up quite gracefully under heavy usage. Of course, by definition, under light usage, few accesses to memory are being made, hence latency (or memory performance in general) is unimportant. Under a heavy load, where memory performance is crucial to CPU performance, RDRAM has far lower latency than SDRAM.
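The read/write turnaround effect can be illustrated with a toy model. Everything numeric here is an assumption chosen for illustration, not a measured or datasheet value: a burst of 4 data cycles, 3 dead cycles per direction change for SDRAM (standing in for "several"), and 1 for RDRAM. The model covers only turnaround dead time, not bank conflicts or power state wake-up.

```python
def bus_efficiency(burst_cycles, turnaround_cycles, switch_fraction):
    """Fraction of bus cycles that carry data.

    switch_fraction is the share of bursts that are followed by a
    change of direction (read to write or write to read). Each such
    change costs turnaround_cycles cycles in which no data moves.
    """
    dead_cycles_per_burst = switch_fraction * turnaround_cycles
    return burst_cycles / (burst_cycles + dead_cycles_per_burst)


BURST = 4            # hypothetical burst length in cycles
SDRAM_TURNAROUND = 3  # hypothetical: "several" dead cycles
RDRAM_TURNAROUND = 1  # hypothetical: a single cycle

for switch_fraction in (0.0, 0.25, 0.5):
    sdram = bus_efficiency(BURST, SDRAM_TURNAROUND, switch_fraction)
    rdram = bus_efficiency(BURST, RDRAM_TURNAROUND, switch_fraction)
    print(f"switch fraction {switch_fraction:.2f}: "
          f"SDRAM {sdram:.0%}, RDRAM {rdram:.0%}")
```

The point of the sketch is the trend rather than the exact percentages: with no direction changes both buses are fully used, and the gap widens as reads and writes are mixed more heavily.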
The high latency of DDR under load, along with the clock cycles lost every time a switch is made between writing and reading or vice versa (5), greatly restricts the effectiveness of the memory. It is typically estimated that RDRAM in an ideal system could exploit about 90% of the nominal peak bandwidth, while DDR (like other forms of SDRAM) is restricted to about 60% of the nominal peak bandwidth. Thus, for a 32 bit RDRAM data path, effective bandwidth is about twice that of 266 MHz DDR, instead of just the 50% advantage one would estimate from the nominal bandwidths. Now that real world memory bandwidth scores are available for both the P4/RDRAM and 266 MHz DDR/760 systems (the latter actually using a form of low latency DDR, CL=2.5, unlikely to be widely available), results from SiSoft Sandra (5) and other results amply confirm this RDRAM superiority. In fact, in actual systems, running a battery of real world programs, the 32 bit path RDRAM has up to 3 times the effective bandwidth that 266 MHz DDR on a 266 MHz bus does.
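The "about twice" estimate follows directly from the 90% and 60% utilization figures. A short sketch of the arithmetic, using only numbers already quoted in this post (3.2 and 2.1 GB/s peak); the function name is made up for illustration:

```python
def effective_bandwidth_gb_s(peak_gb_s, utilization):
    """Usable bandwidth given a peak figure and a utilization fraction."""
    return peak_gb_s * utilization


rdram_peak, ddr_peak = 3.2, 2.1  # GB/s, 32 bit PC800 RDRAM vs 266 MHz DDR
rdram_eff = effective_bandwidth_gb_s(rdram_peak, 0.90)
ddr_eff = effective_bandwidth_gb_s(ddr_peak, 0.60)

print(f"nominal ratio:   {rdram_peak / ddr_peak:.2f}")
print(f"effective RDRAM: {rdram_eff:.2f} GB/s")
print(f"effective DDR:   {ddr_eff:.2f} GB/s")
print(f"effective ratio: {rdram_eff / ddr_eff:.2f}")
```

The nominal ratio comes out near 1.5 and the effective ratio near 2.3, which is where the estimate in the paragraph above comes from. The benchmark results cited are a separate matter and are not modeled here.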
One other point about latency: it is actually the overall system latency that matters. For example, synchronizing the FSB and the memory reduces latency, since the FSB is always ready to accept data just as the memory is available to hand it over. The i820, for example, suffered from a system latency, caused by the mismatch between the FSB and memory speeds, that was not intrinsic to either the FSB alone or the memory alone. (This is why the i820 beat the non-synchronized products it was first compared to, but lost to Tom Pabst's overclocked 440BX, and later to the i815. Both of the latter are synchronized.) Thus when Tom Pabst mentions that the 400 MHz FSB of the i850 is in "perfect harmony" with the 800 MHz RDRAM (which is actually 400 MHz DDR; all RDRAM is also DDR) he has a point. Any 266 MHz DDR system coupled to the 400 MHz FSB of the P4 would have several major problems. Besides the lower bandwidth and higher pincount, the non-commensurate speeds would introduce an additional system latency into the DDR P4 product.
The above exposition is likely too dry for many readers, so let me conclude with a sports analogy. Some players are very good at racking up statistics during "junk time" play, when nothing is on the line. When the pressure mounts, however, these players may wilt. RDRAM is a pressure player: the more heavily memory is stressed, the better it looks compared to the alternatives.
DDR is the king of junk time. Its latency looks good, so long as memory accesses are rare. Under pressure, DDR, like other forms of SDRAM, wilts rapidly.
This is all very interesting. But one question bothers me greatly. Everyone knows the tremendous demand for high memory bandwidth and low latency in the graphics cards market, and how competitive it is. So why has no one yet tried to produce an RDRAM graphics card? There must be a good reason.
So far, in many benchmarks where the application used is not biased (that is, neither SSE2 optimized nor AMD optimized), AMD's 1.3 GHz CPU with DDR RAM beats the P4/RDRAM combo by quite a gap. I wonder, then: if RDRAM is able to transfer such high bandwidth between the processor and the RAM, why isn't the 1.7 GHz P4, which already has a 0.4 GHz advantage, beating the 1.3 GHz Athlon?
That was excellent, well done, Raysonn. I now have the proper respect for RDRAM that it deserves. Benchmarks are not enough to prove real time performance. I think the real reason that RDRAM had such a poor reception and continuing bias was the price: two 128 MB modules cost something like $1000 when first introduced.
A company like Intel could not support such an important component, so closely related to the success of their own products (CPUs and chipsets), based on marketing tricks and computer industry politics. They still have, I believe, 80% of the market.
Therefore, RDRAM is the best and the future of memory technology until the next major development.