how to best use pc100 cl2 ram?
I have several sticks of 128 MB pc100 ram, some cl2 and some cl3 (and some i'm not sure). I ran FutureMark (MadOnion)'s PC2002 benchmark using each memory stick separately and the memory results for cl2 vs. cl3 differ by only about 2% (e.g. 999 vs 984). Is there something I can do to take better advantage of cl2, like some BIOS setting? Currently I have it set to the default, which I believe reads SPD information off the chip.
Hi. Will this help? Check out this site.
Table Of Contents
LostCircuits BIOS guide
What You Never Wanted To Know But Constantly Dared To Ask
(by MS, Timeless)
Advanced Chipset Features
Memory timing and performance settings
In most cases, the Advanced Chipset Features start out with the DRAM Timing settings. Most manufacturers have carried over this field from the days of fast page and EDO memory, which are still supported by some of the VIA chipsets but cannot be run on any of the modern boards, since the higher voltages for EDO are no longer supported by the power supply circuitry, even though some manuals state otherwise. Essentially, in almost all cases the entries use completely outdated terminology and, further, are either redundant or counterintuitive in that they suggest faster settings where, in fact, the opposite is the case.
Needless to say, there is quite a bit of confusion regarding this field, but I'll try to dissect the different settings as well as give an explanation of their functional significance. Before going into details, however, it is necessary to give a little background information about the parameters that contribute to DRAM timing as a general topic.
A short intro to SDRAM, penalty cycles and latencies
SDRAM (same as DDR) is not infinitely fast. DRAM consists of capacitors, gating transistors and bitlines and wordlines (data lines and address lines). Capacitors and lines need to be precharged and the address strobes need time to lock into the correct position in order to retrieve the data.
1. The typical cycle starts with a bank activate command that selects and activates one bank and row of memory through the input pins.
2. During the next cycle, the data is selected onto the data (or bit) lines and moves towards the sense amplifiers.
3. When the data bits reach the sense amplifiers the data is latched by an internal timing signal.
4. This process takes a length of time called the Row Address Strobe to Column Address Strobe delay (RAS-to-CAS delay), with a latency of usually two or three cycles.
5. After this delay, a read command can be issued along with the column address to select the address of the first word to be read from the sense amplifiers.
6. After the read command there is a CAS delay, or latency, while the data are selected from the sense amplifiers and clocked to the output pin. The CAS latency is typically 2 or 3 cycles. Once the data are released to the bus, another word is output every cycle until the data burst is complete.
7. Only after all the information has been output can the data be moved back from the sense amplifiers to the row of cells to restore its contents. Movement of the data back to the empty cells again takes 2 to 3 clock cycles.
8. Depending on the leaking or bleeding of the memory cells, the quality of the charge may have to be restored during a so-called refresh cycle. The need for a recharge is determined by a refresh controller, whereas the actual process of refreshing is monitored by the refresh counter. This refreshing requires an additional 7-10 clock cycles during which the data flow is interrupted and thus results in a performance hit.
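The steps above can be sketched as a simple cycle-count model. This is a sketch of my own for illustration (the function name and the 2-3 cycle figures are assumptions, not values from any specific chipset):

```python
def full_access_cycles(trcd, cl, burst_len):
    """Cycles from bank activate to the last data word of one burst read.

    trcd:      RAS-to-CAS delay (bank activate -> read command), typically 2-3
    cl:        CAS latency (read command -> first data word), typically 2-3
    burst_len: words in the burst; one word per cycle after the first
    """
    cycles = trcd           # bank activate -> read command accepted
    cycles += cl            # read command -> first word on the bus
    cycles += burst_len - 1 # remaining words stream out one per cycle
    return cycles

# A 2:2 part with a 4-word burst: 2 + 2 + 3 = 7 cycles
print(full_access_cycles(trcd=2, cl=2, burst_len=4))  # 7
```

Precharge and refresh cycles would come on top of this whenever a page has to be closed, as described in the following sections.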
The typical timing settings that determine performance are:
SDRAM CAS Latency, CAS-DL; often called DRAM cycle time (n cycles): number of cycles the column address strobe needs to select the correct address.
SDRAM RAS-to-CAS Delay, tRCD; often described as Bank X/Y DRAM timing (n cycles): the number of cycles from when a bank activate command is issued until a read or write command is accepted, that is, before the CAS becomes active. In other words, after a bank activate command, the RAS lines need to be precharged before a read command (specifying the column address) can be issued. This means that the data need to be moved out of the memory cells into the sense amps, which takes somewhere between 2 and 3 penalty cycles. It is important to know that tRCD only plays a minor role in the overall penalty since most reads occur as page hits, i.e. data are read out of a page that is already open (but see below). Unfortunately, in most BIOS, tRCD is not directly accessible, at least not under its real name, but is hidden in the Bank X/Y DRAM Timing field.
SDRAM RAS Precharge Delay, tRP (n cycles): the number of cycles necessary to move the data back to the cell of origin (close the bank / page) before the next bank activate command can be issued.
The weighting of different latencies
Somewhere between 30 and 60% of all read requests fall within the same page (or row) which is called a page hit. In this case, there is no need for the bank activate and tRCD, the data are already in page and the only thing that needs changing is the column address via the Column Address Strobe. Therefore, the CAS-latency becomes the most important factor in the performance of the main memory subsystem.
If the data requested are not found within the same page, the data need to be moved back to the memory cells and the bank will be closed. There are two different cases to be considered.
Either the data are located in the same bank but in a different row; in this case, a precharge command needs to be issued, the bank will be closed within two or three cycles (tRP), and a new bank activate command will open the correct row (tRCD). Subsequently, a read command will select the correct column address (CAS delay). In this case, the full number of penalty cycles for CAS-DL, RAS-to-CAS and precharge will pass until the next data are output. In the case of a 2:2:2 DIMM, this means 6 penalty cycles; with a 3:3:3 part, the latency increases to 9 cycles.
Or the requested data are located in a different bank; then it is not necessary to wait for the first bank to close and, thus, tRP can be skipped. Consequently, the latency only comprises tRCD and CAS-DL. Precharging of the first bank (closing the page) can then occur in the background of the RAS-to-CAS delay of the second bank.
It does get a bit more complicated than that. If data are contained within the same bank but in a different row, the bank needs to be closed and reactivated. In this case, the bank cycle time SDRAM tRC becomes a critical factor since every bank has a minimum time that it needs to stay open.
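The weighting described above can be put into numbers with a small sketch. The hit rate and the even split between same-bank and different-bank misses below are assumptions chosen for illustration only:

```python
def average_penalty(hit_rate, same_bank_rate, cl, trcd, trp):
    """Average latency cycles per access, weighting the three cases:
    page hit (CL only), page miss in the same bank (tRP + tRCD + CL),
    and page miss in a different bank (tRCD + CL, precharge hidden)."""
    miss_rate = 1.0 - hit_rate
    same = miss_rate * same_bank_rate          # miss, same bank
    other = miss_rate * (1.0 - same_bank_rate) # miss, different bank
    return (hit_rate * cl
            + same * (trp + trcd + cl)
            + other * (trcd + cl))

# 50% page hits, misses split evenly; 2:2:2 vs. 3:3:3 timings
print(average_penalty(0.5, 0.5, 2, 2, 2))  # 3.5 cycles on average
print(average_penalty(0.5, 0.5, 3, 3, 3))  # 5.25 cycles on average
```

Note how the CAS latency enters every one of the three cases, which is exactly why it dominates overall memory performance.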
Copyright 2002 LostCircuits
Memory Latencies And Their Hardware Basics
Memory latencies can be varied depending on the performance of the DRAM chips used: for CAS, tRCD and tRP, most BIOS offer means to raise or lower the latency cycles. Typical numbers in SDRAM are 2 and 3. As we know, CAS-2 means that the data will be output on the second rising clock edge after a read command; CAS-3 means that data output happens on the third rising clock edge after the read command.
The same goes for tRCD and tRP; however, there are some fundamental differences between those parameters. The RAS-to-CAS delay can be extended indefinitely, and that is what happens in reality in every row access. That is, once a page is open, the data from an entire row within the memory array have been read out into the primary sense amplifiers, where they are stored in the form of transistor states. That means that the data are stable until the page has to be closed again. The RAS-to-CAS delay, by definition, is only of importance on the first read command after a page has been opened. All subsequent "in-page" accesses can retrieve the same data regardless of how long the page has been open, which means that the time interval between the Row Address Strobe (RAS) selecting a page and the Column Address Strobe (CAS), in short the RAS-to-CAS delay, can be indefinitely long (within the refresh interval).
For CAS latency, the situation is entirely different in that the data, once they are retrieved from the sense amplifiers, are charges that are moving along the data lines towards the output pins and as such, they are very volatile. To extend the life span of the data beyond a single cycle, it is necessary to insert output buffers into the output path. The advantage is that the output buffers are pipeline stages that are in general faster than the sense amplifiers when it comes to releasing the data and furthermore, there is no more address strobe necessary to select the correct column address.
On a functional level, this has the consequence that the access time (tAC) for CAS-1 will always be the sum of the address strobe and the time it takes to spit out the data while the access time for CAS-2 and higher will only be the time required to move the data from their respective pipeline stage to the output pin on the module. Typical examples are in the order of 10 and 3.5 ns for tAC1 and tAC2,3, respectively.
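As a rough plausibility check, using the ballpark 10 ns and 3.5 ns figures quoted above (the function itself is my own illustration), one can test whether a given access time still fits into one clock period:

```python
def fits_in_cycle(tac_ns, clock_mhz):
    """True if the access time (tAC) fits within one clock period."""
    period_ns = 1000.0 / clock_mhz
    return tac_ns <= period_ns

# At 100 MHz (10 ns period), a 10 ns CAS-1 access is marginal at best;
# the 3.5 ns buffered path has plenty of headroom even at 133 MHz.
print(fits_in_cycle(10.0, 100))  # True (just barely)
print(fits_in_cycle(10.0, 133))  # False
print(fits_in_cycle(3.5, 133))   # True
```

This is the functional reason why unbuffered CAS-1 operation never scaled with clock frequency while the pipelined CAS-2/-3 paths did.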
The main problem with the use of intermediate buffers is that the number of pipeline stages inserted into the path will define the number of latency cycles, since the data are held within each stage for one clock cycle (in the case of DDR, it can also be 1/2 clock cycle). This implies that it is impossible to run a CAS-2 part at CAS-3. We do know, however, that almost every memory module rated at CAS-2 will be able to run at CAS-3 as well. The technology to enable flexible CAS latencies is the so-called programmable CAS latency, brought to the DRAM world by none other than Rambus Inc.
Programmable CAS latency can be achieved by inserting bypass switches into the output path between the Sense Amplifiers (SA) and the output pins to bypass the pipeline stages P1 and P2. The illustration shows an example of an EMS HSDRAM chip capable of running in CAS-1, -2 or -3 mode. Depending on the mode register set (MRS) command issued by the controller during initialization, the switches (S1 and S2) are left open (CAS-3) to force the data through the buffers, or closed to establish a fast bypass (CAS-1). Alternatively, one single switch can be opened so that the data still have to go through one buffer but can bypass the second for CAS-2 operation.
According to the above, only CAS latencies that are established in the form of pipeline stages can be programmed. This in turn means that regardless of whether a BIOS offers e.g. a CL-1.5 setting or a CL-3 setting, those settings cannot and will not work if the chips used on the modules only have the pipeline stages for CL-2 and CL-2.5 operation.
Once again, we also see rather often that BIOS settings are offered in the setup interface but never actually written to the chipset's PCI configuration registers. This is exploited by unscrupulous DRAM and module vendors to claim that their modules will, for example, work at CL-1.5, which is nothing but fraud and false advertising.
The minimum bank active time (tRAS) specifies the number of clock cycles needed after a bank activate command before a precharge can occur. In other words, after a page has been opened, it needs to stay open a minimum amount of time before it can be closed again. tRC specifies the minimum cycle time until the same bank can be reactivated. Since a precharge has a latency of 2 or 3 cycles, tRC is the sum of tRAS and the RAS precharge time (tRP). The Intel i815 chipset allows for [5T, 7T] and [7T, 9T], that is, 7 or 9 cycles bank cycle time, with tRP fixed at 2T. The VIA chipsets offer tRAS values of 5T and 6T and allow tRP to be set to 2 or 3 cycles, respectively, but these are generally not directly accessible; they are part of a cocktail of settings.
Most current high-end SDRAM is specified at about 50-60 ns bank cycle time. In turn, this means that, theoretically, at up to 133 MHz (7.5 ns clock cycle), it is possible to run at a tRC of 7T (7 x 7.5 ns = 52.5 ns). If the clock frequency is increased, the number of cycles has to be increased, too, in order to provide the required 50 ns. In other words, the theoretical limit of the memory speed is somewhere around 183 MHz at 9T (9 x 5.46 ns = 49.2 ns). Interestingly, in the early revisions of the i815 chipset boards, the bank cycle times were specified as [5T, 7T] and [6T, 8T], which would limit the memory bus to approximately 166 MHz.
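The arithmetic above can be reproduced in a few lines, assuming tRC = tRAS + tRP as described (the helper names are mine):

```python
def trc_cycles(tras, trp):
    """Bank cycle time in clocks: tRC = tRAS + tRP."""
    return tras + trp

def max_mhz(trc_ns, cycles):
    """Highest clock at which 'cycles' clocks still cover the chip's tRC."""
    min_period_ns = trc_ns / cycles
    return 1000.0 / min_period_ns

# i815-style 5T tRAS + 2T tRP = 7T bank cycle time
print(trc_cycles(5, 2))           # 7
# A 52.5 ns part at 7T tops out around 133 MHz;
# stretching to 9T pushes the ceiling toward ~183 MHz.
print(round(max_mhz(52.5, 7)))    # 133
print(round(max_mhz(49.2, 9)))    # 183
```

The same formula explains why the early [6T, 8T] settings capped the bus at roughly 166 MHz: 8 x 6 ns only just covers a ~48 ns part.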
For 100 MHz memory bus speed, in order to get best performance, the bank cycle time should be set to 5/7, for the 133 MHz memory bus, it needs to be set to 5/8 or else to 6/8, depending on how much overclocking is involved.
Why is there a minimum bank cycle time and what is tRAS violation?
After the RAS activates a bank, the data are latched onto the sense amplifiers. The signal is measured as the potential differential between two bitlines running in parallel, where one of them carries the signal and the other serves as the reference. This is not hardwired but works like line interleaving, where each line can be the signal while the other one is the reference.
The sense amplifiers sense the voltage differential (the charge released by the memory cell / capacitor onto one bitline) between the charged bitline and the reference bitline and amplify it. This signal can be relatively weak, but at the same time it also needs to be restored in the memory cell. This requires amplifying the signal a bit more (up to ca. 2 V). The bitlines themselves have a certain capacitance, which slows down the charging (on average around 30-40 ns).
If a precharge occurs (to wipe all the information from the bitlines for the next bank activate / row access) before the signal is strong enough to restore the original content of the memory cell, "tRAS is violated", resulting in loss or corruption of the data. Often enough, the corruption is not enough to crash a system immediately; however, once the data are written back to the HDD, the drive content is corrupted as well, which can cause failure of the operating system or even "bad sectors" on the hard disk drive.
To summarize, tRAS is the time necessary to develop the full charge on the bitlines and restore the data in the memory cells before a precharge can occur. A precharge is the command that closes the page or bank, and therefore tRAS is also defined as the minimum page open time. If one adds the precharge (tRP), one ends up with the total number of clocks required for opening and closing a bank, in other words the bank cycle time, or tRC.
Refresh Interval (15.6 µsec)
Because capacitors are leaky, it is necessary to restore their content at 64 msec intervals, as defined by JEDEC standards. The refresh works by reading the data into the sense amps and then, without outputting them, moving them back into the memory cells. A typical memory chip contains 4k or 8k rows (4096 or 8192 rows). Opening a row allows all cells within that row to be refreshed simultaneously; however, this also causes increased power consumption. This means that the best scheme is not to refresh all rows at once but to use an alternating refresh protocol, that is, to distribute the individual refreshes by dividing the 64 msec by the number of rows:
64000 µsec / 4096 = 15.6 µs
Consequently, a refresh command needs to occur every 15.6 µsec to service a single row. If there are more than 4k rows / chip, either 2 rows can be serviced with each command, or else, the refresh frequency needs to be doubled. Some BIOS offer the possibility to select the refresh frequency in µsec intervals. As a rule of thumb, for all current DIMMs the longest value of 15.6 µsec is adequate. As SDRAM densities will increase towards 1 GB / DIMM, it will become necessary to shorten the refresh interval since more address lines will need to be served.
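The division above generalizes directly to denser chips; a minimal sketch (function name my own):

```python
def refresh_interval_us(rows, retention_ms=64):
    """Distributed refresh: one refresh command per row within the
    64 ms retention window defined by the JEDEC standards."""
    return retention_ms * 1000.0 / rows

print(refresh_interval_us(rows=4096))  # 15.625 -> the familiar 15.6 usec
print(refresh_interval_us(rows=8192))  # 7.8125 -> refresh twice as often
```

This is why a BIOS refresh setting of 15.6 µsec is adequate for 4k-row chips, while higher-density parts with more rows need either double-row refresh commands or a shorter interval.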
SDRAM PH limit
Refreshing is, at present, an almost negligible factor (less than 1% performance hit); however, as explained above, with increasing DIMM density, refreshing will increasingly gain importance. As mentioned above, the need for refreshing originates in the fact that capacitors lose charge and, thus, the information will expire after a certain time. The same paradigm applies to an open page, since the sense amps can hold the high or low (1 or 0) of the information only for a limited time. In order to maintain the integrity of the data, because they also have to be restored to the original memory cells, it is necessary to limit the open time of a page. Some chipsets (BIOS) offer the option of setting the page hit limit (PH limit); in the case of the original AMD 751 Irongate North Bridge, this limit can be selected between 8 and 64 page hits before mandatory closing of the page.
SDRAM idle cycle limit
Some BIOS interfaces offer the option of specifying the SDRAM idle cycle limit. The idle cycle limit is the number of clock cycles that a page is allowed to stay open even if there is no access. The relevance of this setting is that in cases where intermittent accesses to the cache are made, or the CPU does not issue any read requests, the controller is still able to go back to the same page even after some idle cycles. Whether a shorter or longer idle cycle timer results in more performance depends on the application.
In Server-specific applications where random accesses prevail, a close-page policy is always of advantage, meaning that the idle counter should be set as short as possible.
In applications that use data streams, an open-page policy will yield the highest performance, that is, the counter should be set to a high value. There are trade-offs if the counter is set too high, since it may interfere with DRAM refresh. Empirically, we have found that a value between 16 and 64 cycles will give the highest performance in e.g. gaming applications.
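A toy model illustrates why the better policy depends on the access pattern. This is my own sketch, assuming 2:2:2 timings and only a single bank, and ignoring refresh:

```python
def total_cycles(accesses, policy, cl=2, trcd=2, trp=2):
    """Count latency cycles for a trace of row accesses to one bank under
    an 'open' (rows stay open) or 'close' (precharge after every access)
    page policy."""
    open_row = None
    cycles = 0
    for row in accesses:
        if policy == "close" or open_row is None:
            cycles += trcd + cl            # activate + read (precharge hidden)
        elif open_row == row:
            cycles += cl                   # page hit: CAS latency only
        else:
            cycles += trp + trcd + cl      # close, reopen, read
        open_row = None if policy == "close" else row
    return cycles

stream = [1, 1, 1, 1]   # streaming: the same row over and over
random = [1, 7, 3, 9]   # random rows
print(total_cycles(stream, "open"), total_cycles(stream, "close"))  # 10 16
print(total_cycles(random, "open"), total_cycles(random, "close"))  # 22 16
```

The open-page policy wins on the streaming trace and loses on the random one, matching the server-versus-gaming rule of thumb above.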
Older DRAM chips with limited density, that is 16 Mbit or less used a single block of DRAM array inside their chips. With the migration to 32 Mbit, this array was split into two internal blocks or banks and all DRAM chips of 64 Mbit or higher are composed of four internal banks or blocks of DRAM that are separated by the central routing of the core logic and I/O traces.
The splitting of the banks into four quarters also allows use of the so-called "bank-select" pins on the DRAM chips to open all four internal banks, which means that on a per-chip basis four pages can be open at any time. This, in turn, allows access to 4 times as many data without having to change the actual address of the data's location (the row and column addresses are shared between all four blocks). As a consequence, the controller can jump from one internal bank to the next in order to get the next set of data. This is called bank interleaving and has the additional advantage that each internal bank can be closed while data are still being output from the other banks. As a consequence, there will be no precharge latency in case of a page miss. Opening all four internal banks cannot be done at the same time, though; there is a one-cycle delay for every bank to which a bank activate command is issued. On the other hand, if the controller knows that the next set of data is going to be in a different bank, it can issue read commands to the next location without trashing the first bank's data burst. This way, there is the possibility of hopping from one bank to another with only one penalty cycle (bank-to-bank latency) between four-word bursts. In addition, as already mentioned, precharge or bank closing can run in the background of readouts from alternating banks. Settings supported are:
2-way interleaving (data are toggled between 2 banks, not applicable for most modern DRAMs)
4 way interleaving (data are toggled between 4 banks)
Note that many BIOS interfaces still call it 2-way or 4-way interleaving, which is wrong parlance, since the interleaving takes place between internal banks, not "ways". Moreover, a correct BIOS implementation should be able to read the memory's SPD, determine the DRAM size, and then offer the appropriate setting. In addition, it is hardly conceivable that there are any DIMMs in current circulation that are built on 32 Mbit technology; the maximum density of such a DIMM would be 64 MBytes.
Note also that with the next step in DRAM density, that is 1 Gbit chips, the die will be split into eight internal banks which then will enable 8-bank interleaving.
As nice as interleaving sounds, only streaming applications really take advantage of this feature. Specifically, any application heavily depending on the CPU cache will not be able to benefit from 4-bank interleaving, for the simple reason that the pages may have expired by the time the data from the cache are exhausted. In this case, bank interleaving may even cause a performance hit since a wrong bank may be open and must be closed before the next data access.
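A rough cycle-count sketch shows why streaming reads benefit from interleaving. The burst length, 2:2:2 timings and the one-cycle bank-to-bank penalty follow the description above; the model itself is my own simplification and ignores refresh:

```python
def burst_stream_cycles(n_bursts, interleaved, burst_len=4,
                        trp=2, trcd=2, cl=2):
    """Cycles to stream n_bursts consecutive bursts.

    Interleaved: alternate banks, so precharge/activate of the next bank
    hides behind the current burst and only a 1-cycle bank-to-bank penalty
    remains. Non-interleaved: pay tRP + tRCD + CL between bursts.
    Counts one cycle per data word for simplicity.
    """
    cycles = trcd + cl + burst_len  # first burst: activate + read + data
    for _ in range(n_bursts - 1):
        if interleaved:
            cycles += 1 + burst_len               # bank-to-bank penalty only
        else:
            cycles += trp + trcd + cl + burst_len # full reopen penalty
    return cycles

# Eight back-to-back 4-word bursts:
print(burst_stream_cycles(8, interleaved=True))   # 43 cycles
print(burst_stream_cycles(8, interleaved=False))  # 78 cycles
```

For a cache-bound workload with long gaps between bursts, however, the open banks may already have expired, which is exactly the caveat raised above.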
CAS-Delay (CAS latency, in most VIA chipset BIOS falsely described as DRAM cycle time)
In most BIOS, CAS latency has its own separate entry and, the lower the latency, the better the performance. CAS latency is the most important performance parameter in the memory subsystem, for the reasons outlined above.
SPD (Serial Presence Detect)
In order to standardize the SDRAM interface, the so-called Serial Presence Detect was introduced in 1998 as part of the PC100 specifications hammered out by Intel. The SPD is a small EEPROM located in the corner of the DIMM's PCB and contains all necessary specifications of the DIMM, including the speed of the individual components (such as CAS and bank cycle time), valid settings for the module, and the manufacturer's code. In addition, parameters like the width (64 bit for standard, 72 bit for ECC) are stored in the SPD chip. The SPD enables the BIOS to read the spec sheet of the DIMMs on boot-up and then adjust the memory timing parameters accordingly. Unfortunately, a "Memory Timing by SPD" option is not present in all cases. In other cases, the SPD recognition is done incorrectly, in that Timing by SPD only results in the slowest setting of 3:3:3. In still other cases, the BIOS reads the manufacturer code from the SPD and adjusts the timing according to some idea of the quality of the product without checking the actual performance parameters. This also means that if the BIOS doesn't know the manufacturer because the DIMMs have not been qualified, it will fall back, once again, to the slowest settings.
One potential drawback of the SPD is that it is not write protected and, thus, anyone with an EEPROM writer can reprogram it to show out-of-spec data that suggest better quality than is actually present. These are just a few reasons not to trust the SPD entirely and, rather, to use the manual settings as outlined above.
DRAM read latch delay (VIA chipset only)
If the memory clock frequency is increased, the cycle time gets shorter. Therefore, the data valid window (tDV), that is, the time during which the chipset can receive data from the DRAM, would come earlier (on an absolute time scale) and could possibly expire before the data from the DIMMs arrive. In order to compensate, tDV needs to be delayed, or moved further back in the cycle, to synchronize with the data output holding time of the DIMMs. In the case of the KT chipset, the situation is a bit more complicated, since the Athlon family uses what is called clock forwarding or clock-synchronous transfer; that is, a reference signal synchronizes all devices. Keep in mind that the KT chipset is a variation of the KX chipset and based on the timing parameters of the original Slot A Athlon. The socketed Thunderbird has shorter traces and, therefore, is able to receive data earlier relative to the originally established timing characteristics. Therefore, for example, the ASUS A7V BIOS even features a negative DRAM read latch delay of 0.1 ns, that is, the data valid window opens even before the reference signal arrives. With all DIMMs tested, though, the Auto setting worked as well as any manual setting.
Edited by fastingsetiman on 12/22/02 03:39 AM.