ThundersCorsa

Distinguished
Jul 17, 2002
2
0
18,510
I have heard that running 2 of the same DIMMs is better than running one, e.g. putting in 2 x 128MB is better than 1 x 256MB. How true is this?
Mikey
 

Oracle

Distinguished
Jan 29, 2002
622
0
18,980
On an nForce based Mobo, two is best.
On all the rest, one is best (except for RDRAM of course).

<font color=red>Floppy disk?!? What the heck's a floppy disk?!?</font color=red>
 

bum_jcrules

Distinguished
May 12, 2001
2,186
0
19,780
Mike. Welcome to THGC. Sorry for the rude delay in a response.

Running one memory module on SDR SDRAM, DDR SDRAM, or 32 bit RDRAM modules is better than two.

Here are two scenarios...

1. The farther the memory sits from the MCH and the CPU, the longer the signal takes to travel. It won't really be noticeable, but under load it will show up. (Latency is latency.)

2. With more memory modules you have more addressing issues to deal with. If a module tries to read an address and gets a misread, it has to try again. When the read locations are spread across multiple modules, it takes longer to get a good read, which increases latency.

There are others, but the main point is that there is an increased number of latencies.

<b>"If I melt dry ice in a bathtub, can I take a bath without getting wet?" - Steven Wright</b>
 

Oracle

Distinguished
Jan 29, 2002
622
0
18,980
Bum, you make an interesting point.
In reference to your second point, would that be a reason why nForce's TwinBank technology is unable to leave KT266A or KT333 in the dust?
On paper, TwinBank looks WAAAAY prettier than KT266A/333.

<font color=red>Floppy disk?!? What the heck's a floppy disk?!?</font color=red>
 

phsstpok

Splendid
Dec 31, 2007
5,600
1
25,780
The reason is FSB speed. nForce's memory subsystem has a theoretical bandwidth of 4.2 GB/sec, but the Front Side Bus is still limited to 2.1 GB/sec (not overclocked, of course). Both chipsets have the same FSB bandwidth limitation and thus pretty much the same throughput.

The cool thing about nForce is that you don't need PC2700 memory or higher to overclock. Hopefully, we'll see the practicality of this with nForce2, which is 200MHz capable. You'll be able to run the FSB at 200MHz (400MHz DDR) with only PC2100 memory. (Actually, you could use PC1600 memory if the proper ratios are supported.) I'm assuming a board without onboard video. Otherwise the video will steal half the memory bandwidth.

<b>I have so many cookies I now have a FAT problem!</b><P ID="edit"><FONT SIZE=-1><EM>Edited by phsstpok on 07/20/02 03:03 PM.</EM></FONT></P>
 

Oracle

Distinguished
Jan 29, 2002
622
0
18,980
Well, I'd like to agree with you, but I believe the theoretical bandwidth of 4.2 GB/s is the total of the two channels, which are each 2.1 GB/s. Each bank has its own channel to nForce's IGP (or SPP). So memory bandwidth is actually 2.1 GB/s (per channel).
Have a look here : http://www.nvidia.com/docs/IO/12/ATT/nForce_TwinBank_Memory_Architecture_Tech_Brief.pdf


<font color=red>Floppy disk?!? What the heck's a floppy disk?!?</font color=red>
 

bum_jcrules

Distinguished
May 12, 2001
2,186
0
19,780
That is the theoretical peak bandwidth to the MCH, which is on the Northbridge. However, that is then limited by the FSB and its bandwidth. AMD specified ~133 or 400/3 MHz. That FSB is on a 133MHz clock with two data pathways (double-pumped for an effective 266MHz speed). If you overclock the FSB to ~166 or 500/3 MHz, the theoretical bandwidth is then calculated at...

FSB at ~133.333 or 400/3 MHz

400/3 MHz x 8-Byte-wide pathway (64 bits) x 2 pathways = 6,400/3 MB/s or ~2,133.333 MB/s or ~2.133 GB/s.


FSB at ~166.667 or 500/3 MHz

500/3 MHz x 8-Byte-wide pathway (64 bits) x 2 pathways = 8,000/3 MB/s or ~2,666.667 MB/s or ~2.667 GB/s.


The FSB bandwidth is the limiting factor. The P4 uses an 8-Byte (64-bit) pathway, quad-pumped (4 transfers per clock), on a 400/3 MHz or ~133.333MHz base clock, for an effective ~533MHz signal.

So that would be...

400/3 MHz x 8 Bytes x 4 transfers = 12,800/3 MB/s or ~4,266.667 MB/s or ~4.267 GB/s.

So no matter how fast your RAM clock is, the FSB will limit the benefit of that higher memory speed and bandwidth. So for AMD systems... if you can achieve a 166MHz FSB, then PC2700 will be operating at the same speed as the FSB (1:1 ratio). Anything faster will only be a small fraction faster. It is limited by the FSB. It is not that the memory will not run faster; it is that the processor cannot receive or send data fast enough.
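The peak-bandwidth arithmetic above can be sketched in a few lines of Python (a minimal illustration; the clock, width, and pump figures are the ones quoted in this thread):

```python
# Peak bus bandwidth = base clock (MHz) x pathway width (bytes) x transfers per clock.
# Result is in MB/s.
def peak_bandwidth_mb(clock_mhz, width_bytes, transfers_per_clock):
    return clock_mhz * width_bytes * transfers_per_clock

# Athlon FSB: 8-byte (64-bit) pathway, double-pumped
athlon_133 = peak_bandwidth_mb(400 / 3, 8, 2)  # ~2133 MB/s, i.e. ~2.133 GB/s
athlon_166 = peak_bandwidth_mb(500 / 3, 8, 2)  # ~2667 MB/s, i.e. ~2.667 GB/s

# P4 FSB: 64-bit pathway, quad-pumped at ~133 MHz base clock
p4_533 = peak_bandwidth_mb(400 / 3, 8, 4)      # ~4267 MB/s, i.e. ~4.267 GB/s

print(round(athlon_133), round(athlon_166), round(p4_533))  # -> 2133 2667 4267
```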

Understand?


<b>"If I melt dry ice in a bathtub, can I take a bath without getting wet?" - Steven Wright</b>
 

Oracle

Distinguished
Jan 29, 2002
622
0
18,980
MCH. Northbridge. Right (my mistake, sorry!).
I know that PC2700 is not on par with a 133MHz (266 DDR) FSB, but PC2100 is. All I was saying is that nForce's memory bandwidth is actually 2.1 GB/s times 2 channels and not a straightforward 4.2 GB/s. So the FSB is not limiting anything when PC2100 modules are used in TwinBank configuration.
That's all.

And please... drop the sarcasm! Understand?
This is no place to disrespect people, but rather to help them and correct them when they're wrong!
Computing is complex and people are bound to make mistakes or ignore some information. I surely don't claim to know all there is about IT. I'm just a hobbyist who's really into IT and building systems. I just share here what I learned from experience or from knowledgeable sources (like you obviously are). This comment was not meant as an attack, but rather to get my point across that not all people here are IT experts. Some are novices, some are intermediate-level users, and some are experts. I come here to get answers and to give answers to the best of my knowledge and experience. That being said, forgive me if I misinterpreted your final comment, and accept my apologies. The last thing I want is to disrespect another member of this community or make an enemy out of him.
If what I stated above about memory and the FSB is incorrect in some way, please correct me.
Respectfully,
Yan.

<font color=red>Floppy disk?!? What the heck's a floppy disk?!?</font color=red>
 

phsstpok

Splendid
Dec 31, 2007
5,600
1
25,780
When there is no onboard video (as in the 415D), the bandwidth of both channels is available. Dual-channel bandwidth is cumulative in this case. (This is why the article mentions that peak bandwidth is 4.2 GB/sec. It would be a false representation if each channel were isolated and the bandwidth not cumulative.) Slower memory can be used because of the overriding throughput limitation of the FSB. PC1600 memory would yield a cumulative, dual-channel bandwidth of 3.2 GB/sec, which would mean that the FSB could be run as high as 200MHz (this is assuming nForce2, and that nForce2 could use a 4:2 CPU/memory ratio; I don't know that it can). At 200MHz the FSB bandwidth would be able to match the 3.2 GB/sec.
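The matching argument above amounts to simple multiplication; here is a hedged Python sketch (figures taken from this thread, leaving the 4:2 ratio question aside):

```python
# Dual-channel DDR bandwidth is cumulative when no integrated video is sharing it.
def ddr_channel_gb(clock_mhz):
    # 64-bit (8-byte) DDR channel: clock x 2 transfers x 8 bytes, in GB/s
    return clock_mhz * 2 * 8 / 1000

pc1600_dual = 2 * ddr_channel_gb(100)      # 2 x 1.6 = 3.2 GB/s cumulative
pc2100_dual = 2 * ddr_channel_gb(400 / 3)  # 2 x ~2.1 = ~4.2 GB/s (TwinBank peak)

fsb_200 = 200 * 2 * 8 / 1000               # double-pumped 200 MHz FSB: 3.2 GB/s

# Dual-channel PC1600 keeps pace with a 200 MHz FSB (both ~3.2 GB/s)
print(pc1600_dual, fsb_200)
```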

<b>I have so many cookies I now have a FAT problem!</b>
 

Oracle

Distinguished
Jan 29, 2002
622
0
18,980
Agreed!
Damn those integrated graphics chipsets!

<font color=red>Floppy disk?!? What the heck's a floppy disk?!?</font color=red>
 

bum_jcrules

Distinguished
May 12, 2001
2,186
0
19,780
Actually, the "Understand?" comment was to make sure you understood. (No sarcasm intended. I only use it with regulars that I know I can badger.) Some people inside the THGC don't understand the technical mumbo jumbo. I put that there just to say if you had more questions or if I was unclear, to point it out.

I apologize as well.


With that being said...

The Memory Controller Hub is integrated into the Northbridge of the motherboard. (I am not trying to talk down here...more for all that are reading along.) The MCH has clock generators that send out signals to the memory on the DIMMs at a specific speed. The MCH is the go between for the CPU and the memory modules. The CPU is connected to the MCH and the Northbridge through the FSB, Front Side Bus.

So back to what was stated above...

If the FSB is on a slower clock than the memory, any memory speed above the FSB's will yield only a limited, fractional improvement.

nForce uses the same FSB as all AMD chipsets. The Athlon, XP, and MP are designed around 100MHz and 133MHz FSB clock signals. Since the FSB has two data pathways, the effective speed is 200/266MHz. So until AMD moves to a faster FSB, improvements to any form of SDRAM (or even a switch to RDRAM) will not help the bandwidth problem.

This is why there have not been significant gains out of the nForce boards. The Athlon isn't looking for that additional bandwidth, because it only knows the FSB.

:smile:


<b>"If I melt dry ice in a bathtub, can I take a bath without getting wet?" - Steven Wright</b>
 

Oracle

Distinguished
Jan 29, 2002
622
0
18,980
I clearly understand the FSB concept. But you still haven't convinced me how 2 channels with a 2.1 GB/s flow each could only transport the same amount of data in the same amount of time as a single 2.1 GB/s channel. Take a single-lane road where it's not possible to drive faster than 100mph. You'd get half as many cars through as on a two-lane road with the same limit, wouldn't you? Or is that a bad analogy and I just don't get it?

Considering what you said, why haven't board makers constructed a board with an enhanced dedicated path to the memory (something like HyperTransport, MuTIOL or others, but with way more bandwidth)?
The way I understand it, and correct me if I'm wrong, all requests sent to the memory MUST go through the Northbridge via the FSB. Then the Northbridge communicates with the memory over the memory bus, and the answer comes back through the same path. Wouldn't a dedicated high-bandwidth (say 6.4 GB/s) path between Northbridge and memory be helpful? I won't believe nobody thought of that, or else it must simply be economically or technologically unfeasible on an 8-layer mobo.


<font color=red>Floppy disk?!? What the heck's a floppy disk?!?</font color=red>
 

bum_jcrules

Distinguished
May 12, 2001
2,186
0
19,780
You are completely correct in saying that higher bandwidth between the memory and the MCH is a good thing.

The FSB is the crux: that extra bandwidth is then lost going to the CPU.

<b>"If I melt dry ice in a bathtub, can I take a bath without getting wet?" - Steven Wright</b>
 

Oracle

Distinguished
Jan 29, 2002
622
0
18,980
Is the MCH the only element to communicate with the memory? Aren't the DMA devices "talking" directly to the memory as well or are they going through the MCH first?


<font color=red>Floppy disk?!? What the heck's a floppy disk?!?</font color=red>
 

bum_jcrules

Distinguished
May 12, 2001
2,186
0
19,780
Everything that goes to the memory modules flows through the memory controller. It has to translate the signal for the memory clock generator, which is then relayed to the memory bus.

Here are the steps involved in a memory read. (Basic layout)

1 CPU requests info.

2 Request travels down the FSB to the memory controller.

3 Signal is translated into a new signal for the modules.

4 Modules receive read request.

5 Modules send requested info.

6 Memory Controller converts the signal for the FSB.

7 Signal travels along the FSB to CPU.

Now that was a simplified step by step explanation.
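As an illustration only, the seven steps above can be mimicked with a toy Python model. (The names here are made up for the sketch; real hardware obviously does none of this in software.)

```python
# Toy walk-through of the read path: CPU -> FSB -> memory controller -> modules and back.
def memory_read(address, modules):
    # Steps 1-2: the CPU's request travels down the FSB to the memory controller.
    fsb_request = {"op": "read", "addr": address}
    # Step 3: the controller translates the FSB signal for the memory bus.
    mem_bus_request = ("READ", fsb_request["addr"])
    # Steps 4-5: the modules receive the read request and send the requested info.
    data = modules[mem_bus_request[1]]
    # Steps 6-7: the controller converts the reply and it travels up the FSB to the CPU.
    return data

ram = {0x2000: "requested info"}
print(memory_read(0x2000, ram))  # -> requested info
```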


Why is it translated???

Let me do this...I'll use RDRAM to give you an overview on how memory is setup.

On page four of the paper, <A HREF="http://www.rdram.com/downloads/DRCG_d_0056_V1_3.pdf" target="_new"><i>Direct Rambus® Clock Generator</i></A>, there is a diagram of the memory system layout. It is technical but I will explain the main pieces.

There are 4 pieces involved. Three pieces are the memory itself and the fourth piece is the System Clock.

1 <b>The System Clock is the key to the whole operation. </b>

The CPU, Memory, AGP bus, PCI Bus, etc. are all driven by the System Clock. For the memory in our case it drives the Controller and the DRCG, Direct Rambus® Clock Generator.


2 <b>Second is the Controller.</b>

It is an ASIC, pronounced "A-sick," which is short for Application-Specific Integrated Circuit. It is a chip designed for a specific application. ASICs perform better than general-purpose CPUs at a given task because ASICs are "hardwired" to do that specific job.

Inside the ASIC are the RMC, Rambus® Memory Controller (a.k.a. MCH), and the RAC, Rambus® Access Cell.

3 <b>The DRCG - Clock generator. </b> (I'll explain more later.)

4 <b>The RIMMs and RDRAMs. </b>

Here is the process for RDRAM.

The controller takes the system clock signal and uses a logic component to translate it to a frequency that the DRCG uses to send the signal to the RIMMs across the memory bus.

The modules on the RIMMs then send the signal back to the logic unit to be re-translated back to the FSB signal.

With PC1066 the clock speed is 1600/3 or ~533MHz. So if the FSB is running at 400/3 or ~133MHz, quad-pumped, it has an effective 533MHz signal. So the ratio through the memory controller is 1:1.

So back to the earlier discussion...

That explains why the gains are only fractionally better when the memory is faster than the FSB. Intel is trying to keep the 1:1 ratio. Right now AMD is using a 400/3 or ~133MHz signal for the FSB (133MHz x 2 = 266MHz effective bus), but the signal is still 133MHz. Using DDR400 (PC3200), the signal is 200MHz. So the ratio is 2:3 or ~0.667:1 with a 133MHz FSB and DDR400. If you increase the FSB to 166MHz, the ratio is then ~0.833:1. Much better. So an improvement in the FSB signal speed yields a better result. If the FSB-to-memory ratio were 1:1, there would be an almost seamless interaction with the memory. (The differences would be the latency cycles. Tiny numbers.)
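A quick Python check of the ratio arithmetic above (just the clock figures quoted in this thread, nothing more):

```python
# FSB-to-memory-clock ratio; 1:1 is the seamless case described above.
def fsb_mem_ratio(fsb_mhz, mem_clock_mhz):
    return fsb_mhz / mem_clock_mhz

print(round(fsb_mem_ratio(400 / 3, 200), 3))  # 133 MHz FSB + DDR400 -> 0.667
print(round(fsb_mem_ratio(500 / 3, 200), 3))  # 166 MHz FSB + DDR400 -> 0.833
print(fsb_mem_ratio(400 / 3, 400 / 3))        # 133 MHz FSB + DDR266 -> 1.0
```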

I hope that was clear...Getting technical makes it harder to explain.

<b>"If I melt dry ice in a bathtub, can I take a bath without getting wet?" - Steven Wright</b>
 

bum_jcrules

Distinguished
May 12, 2001
2,186
0
19,780
So for conceptual purposes, think of the FSB as a kink in a garden hose. It slows the flow of water down because the water has to slow down or be forced into a narrower pathway. So if the two buses are the same effective size and speed, the 1:1 ratio is bliss. No inefficiencies.


BTW: I am no rocket scientist... Just an idiot that likes computers. Look at my profile. It is just my hobby. I do, however, have friends that are rocket scientists, EEs, MEs, PhDs, etc. Like I said, I am just an idiot that likes to know how things work.

<b>"If I melt dry ice in a bathtub, can I take a bath without getting wet?" - Steven Wright</b>
 

Oracle

Distinguished
Jan 29, 2002
622
0
18,980
Your explanation was pretty clear to me. I do understand most of the tech gobbledygook and the concepts behind computing.
I clearly understand the basics of FSB, PCI, ISA (however old that is), USB (1.1 or 2), 802.11a (or b), IEEE 1394, SCSI, ATA (or ATAPI or E-IDE or UDMA), Serial ATA, RDRAM, DDR, and all that kind of stuff. I just don't know all of their specifics and deeper meanings.
Thanks for the info, though. It helped me get deeper into computing and better help friends who rely on me for advice.
Thanks!
-Yan

<font color=red>A platform is not an oil rig.</font color=red>
 

Oracle

Distinguished
Jan 29, 2002
622
0
18,980
BTW: I too have friends in the business who are robotics engineers, but still, I manage to know more than them at times. They are quite surprised when I explain to them an IEEE or JEDEC standard they never heard of before. ;-)


<font color=red>A platform is not an oil rig.</font color=red>
 

Stain

Distinguished
Jun 28, 2002
331
0
18,780
Okay, much of what was said here was beyond me, but is the gain that nForce gets because its ratio is closer to 1:1? Also, how does having DDR400 over DDR266 help if the FSB is still 133? Doesn't what you said mean that a 133MHz FSB limits your memory bandwidth to 2.1GB/s, and that having anything over that is not only useless but negatively affects system performance, as it wouldn't be 1:1?!

Information Overload
-going to read this thread again and hopefully soak up some more
 

bum_jcrules

Distinguished
May 12, 2001
2,186
0
19,780
1. "...is the gain that the nForce gets because it's ratio is closer to 1:1?"

No. nForce is still limited by the FSB at 100MHz or 133MHz. So it will only yield a fractional gain with any memory system with a signal faster than the 133MHz FSB. The system clock is driving all of it. AGP, CPU, PCI, etc. are all multiples of 100 or 133.

2. "Also how does having DDR400 over DDR266 help if FSB is still 133?"

DDR400 is on a 200MHz clock. So just like RDRAM, it has to be translated for the FSB. It is like the transmission on a car. The controller puts the FSB and the memory bus in harmony with each other. So if the speed to and from the memory is faster than the FSB, the overall process will improve, though only fractionally.

It is like two guys working together on an assembly line. Man #1 is waiting for parts from #2 to finish what #1 is working on. Man #1 is the FSB/CPU and #2 is the memory/memory bus. If #2 always has parts ready for #1, then #1 will do his job in an efficient manner. If, however, #2 has them organized and ready for #1 to use, the process can improve slightly. The whole operation is still dependent on #1.

3. "Doesn't what you said mean that having 133FSB limits your memory bandwidth to 2.1GB/s and having anything over that is not only useless but it negatively effect system performance as it wouldn't be 1:1?!"

First, the memory bandwidth is for the memory bus itself. So the FSB does not limit the memory bus. They are two different buses.

Secondly, it will not negatively affect the process, only fractionally improve it. This is why there have been gains with DDR333, DDR400, and dual-channel DDR systems. Those gains are only minimal at best. This is what I mean by fractional.

The closer the ratio between the FSB and the memory bus gets to 1:1, the better the system will operate. For a long time the ratio was 1:1: PC66 with a 66MHz FSB, PC100 with a 100MHz FSB, PC133, and even DDR266 with an effective 266MHz FSB. Until the ratio goes back to 1:1 there will only be fractional gains. Intel engineered systems with 1:1 ratios (an effective 533MHz, 133MHz quad-pumped, with PC1066 which uses a 533MHz clock). Hummmmm.... Dual channels with a DDR signal at 533MHz equals the bandwidth and clock speed of the FSB.

This could be one reason that AMD will integrate the memory controller into the Northbridge on the Hammer's die. Only time will tell on that CPU for sure.

<b>"If I melt dry ice in a bathtub, can I take a bath without getting wet?" - Steven Wright</b>
 

Oracle

Distinguished
Jan 29, 2002
622
0
18,980
"This could be one reason that the AMD will have the memory controller in the integrated northbridge on the Hammer's die. Only time will tell on that CPU for sure."

Just wait a minute here... If the Northbridge is integrated into the CPU (i.e., an on-die MCH) and the path to and from the memory is dedicated, wouldn't there be a MAJOR kickass boost in memory performance?
Let's suppose the CPU requires a 200MHz FSB for an effective 800MHz QDR FSB (I wouldn't expect anything less!) and a dedicated path to the memory. I'm drooling just thinking about the possibilities. A PC6400 module would make a major difference.


<font color=red>A platform is not an oil rig.</font color=red>
 

bum_jcrules

Distinguished
May 12, 2001
2,186
0
19,780
Hammer is going to put the controller on the die and build a direct line to the modules. So forget about the FSB affecting the memory. The system clock will still be the driving factor.

"The Hammer micro-architecture incorporates a dual-channel DDR DRAM controller with a 128-bit interface capable of supporting up to eight DDR DIMMs (four per channel) as seen in the Hammer micro-architectural diagram in Figure 2. The controller will be initially designed to support PC1600, PC2100, and PC2700 DDR memory using unbuffered or registered DIMMs. This translates into available bandwidth to the processor of potentially up to 5.3GB/s with PC2700 memory." - <i>AMD Eighth-Generation Processor Architecture</i> Page 4.

On page 5 it goes on to state...

"The integrated Hammer memory controller has an even more dramatic effect when powering multiprocessing systems. The controller results in an outstanding advance in x86 system architecture scalability by enabling “glueless” multiprocessing where the available memory bandwidth to the system scales with the number of processors. In Figure 3, an example of a four-processor multiprocessing system is shown. In this configuration, the system is able to support up to 32 DIMMs capable of delivering an extraordinary 21.3GB/s of available memory bandwidth to the system with PC2700 memory."

The reason is that on a 4-way system each processor die can access any DIMM. So 2 channels x 8 Bytes x 2 (DDR) x 500/3 MHz (~166.667MHz) = ~5.333GB/s per processor, and x 4 processors = ~21.333GB/s.

The only thing I am not sure about is how limiting HyperTransport will be for a multiprocessor system, e.g. a 4-way system. HyperTransport is a bi-directional, 16-bit, 1600MT/s (mega-transfers per second), 3.2GB/s data pathway. See page 7 for a block diagram.

On page 5 of the <i>AMD HyperTransport™ Technology I/O Link - A High-Bandwidth I/O Architecture</i>, it shows that the HyperTransport bus is 32 bits wide with a maximum of a 1.6GHz or 1600MHz signal. (1600MHz x 4 Bytes = 6.4GB/s. That is one direction; bi-directional would be 12.8GB/s.)

My questions are these...

1. Which is it in the core? 16-bit or 32-bit?

2. Does one use only one direction or both to figure the bandwidth for Hammer using HT?

[rant on]
Also, in the <i>"AMD’s Next Generation Microprocessor Architecture"</i> which was presented by Fred Weber back in October of 2001, one point was "Bandwidth and capacity grows with number of CPUs." This cannot be... But that is another story. [rant off]

I don't know exactly what to believe. Either way it will be a lot faster than you think. The FSB has nothing to do with memory bandwidth here.

For a single-chip setup, the peak bandwidth will be 5.3GB/s. For a multiprocessor system, say a 4-way system, it is capped at whatever HyperTransport will allow or what the memory will allow, whichever is smaller, or some combination of the two. That would be... 6.4GB/s (in one direction) x 2 HT data paths = 12.8GB/s for the HyperTransport. The memory bandwidth would be 5.3GB/s per CPU die (4 CPUs x 5.3GB/s = 21.2GB/s). That is the number AMD shows, but I am not sure it can reach that memory number.

I think it will be something like this.

5.3GB/s for each CPU plus what it can access from the other CPUs' DIMMs. So I think it would be 5.3GB/s (CPU direct)+ 12.8GB/s (Hypertransport) = 18.1GB/s.
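For what it's worth, the estimates above reduce to this arithmetic. (A sketch only; the figures come from the quoted AMD papers and the guesses in this post, and the "local plus HyperTransport" sum is this poster's assumption, not an AMD number.)

```python
# Per-CPU Hammer memory bandwidth: dual-channel PC2700 (DDR333, ~166.667 MHz clock).
pc2700_channel = (500 / 3) * 2 * 8 / 1000  # ~2.667 GB/s per 64-bit DDR333 channel
per_cpu = 2 * pc2700_channel               # ~5.33 GB/s ("5.3GB/s" in AMD's paper)
four_way = 4 * per_cpu                     # ~21.3 GB/s aggregate (AMD's figure)

# HyperTransport, per the 32-bit reading: 1600 MHz x 4 bytes per direction.
ht_one_way = 1600 * 4 / 1000               # 6.4 GB/s
ht_bidir = 2 * ht_one_way                  # 12.8 GB/s

# The post's guess: local bandwidth plus what HyperTransport can ferry from remote DIMMs.
guess = per_cpu + ht_bidir                 # ~18.1 GB/s
print(round(four_way, 1), round(ht_bidir, 1), round(guess, 1))
```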

Now again this is only my assumption based on what I have read and understand. So if someone out there has a better understanding or has a spin I am not seeing yet, please point it out.

<b>"If I melt dry ice in a bathtub, can I take a bath without getting wet?" - Steven Wright</b>