
Question about GDDR5 memory

June 10, 2012 4:18:13 PM

Hi, I'm trying to understand how GDDR5 memories work but there are a few things I do not understand.

Basically I've read that a single GDDR5 chip has a 32-bit data bus and data is transferred to the I/O bus at the WCK frequency (2*CK), 2 bits per cycle. So every WCK cycle two 32-bit words are available to be sent to the I/O bus.
So it's a memory that transfers 4 bits per bit line per CK cycle.
What I don't understand is the purpose of the 8-bit pre-fetch buffer if it's basically a DDR memory running at the actual WCK frequency or, equivalently, a DDR2 memory running at the CK internal frequency.
With WCK being double CK, it would only need a 2-bit pre-fetch buffer to achieve its data transfer rate, not an 8-bit buffer.

If someone could help with my doubts it would be greatly appreciated, thanks :)  :bounce:  :sol: 


June 10, 2012 7:11:26 PM

Just wondering, but why do you need to know all this?
June 10, 2012 9:50:01 PM

Raiddinn said:
Just wondering, but why do you need to know all this?


I'm studying computer architecture and digital electronics.
Unfortunately this stuff is considered pretty specialized, so it's not covered in standard books, and even on the internet I've not been able to find clear articles or white papers on the matter.
I've tried to read a few data sheets from a few memory manufacturers (Qimonda, Hynix, etc.) but they basically try their best not to explain exactly how things work :lol:  :lol:  Others (like Wikipedia) simply cut and paste at random from these same data sheets without actually explaining anything.

So I thought/hoped that there would be experts here that can help me with my doubts.
June 10, 2012 10:04:30 PM

To be completely honest with you, you probably know more about this stuff than 99% of people that offer support here.

We don't tend to be the sorts of people that work directly for Samsung, Crucial, Hynix, etc designing these things. More commonly we are just regular guys that don't even work in IT that happen to be particularly good at figuring out why computers aren't working.

I highly doubt that you will find the answer you seek on these boards. I would look for some sort of message boards specific to people who do computer hardware engineering if any exist.

As you mentioned, I am sure such things are closely guarded secrets of the handful of companies that actually have the expertise to design these sorts of chips in the first place.

Even half the RAM brands out there probably don't have people on staff who can adequately answer the question. Most brands just buy chips from a major manufacturer, stick them on a PCB, slap a label on it, and shove it on store shelves somewhere.
June 10, 2012 10:40:41 PM

I don't know every little detail, but GDDR5 has a quadruple data rate (or at least something akin to it), so you're right about it transmitting 4 bits per clock cycle per bit line. Each chip definitely has a 32-bit bus, and although I've never heard of any that deviate from this, I can't confirm that there are no GDDR5 chips with a wider or narrower bus.

Take this with a grain of salt, but Wiki has had a few half-decent pages about several memory interfaces. Maybe the GDDR5 wiki has the information that you are looking for. If not, it might at least have some good source articles. I admit that I've done some research on RAM technology in the past, but it's been a while.
June 10, 2012 10:59:22 PM

Also, each chip's actual frequency is a fraction of the frequency of the input/output bus. For GDDR5, that bus runs at a frequency (measured in MHz) that is one fourth of its transfer rate (measured in MT/s). I don't know the chip to input/output bus frequency ratio for GDDR5 for certain, but I do know it for DDR3 and DDR4, and I'm pretty sure that DDR3 and GDDR5 have the same ratio, 1 to 4. If you apply that to your expected 2-bit prefetch, then an 8-bit prefetch makes sense. I think that DDR4 and GDDR5 chips with the same bandwidth run at the same actual frequency. GDDR5 runs the input/output bus at four times the chip's internal frequency and one fourth of the transfer rate (so the transfer rate is sixteen times the chip frequency), while DDR4 runs the input/output bus at eight times the actual chip frequency and one half of the transfer rate (still sixteen times the actual chip frequency).

Basically, a DDR3 chip on a module marketed as 1600MHz is actually running at 200MHz (assuming that you run the module at the intended stock frequency), the input/output bus is running at 800MHz, and this bus transfers data on both the rising and falling edge of the clock, so you get 1600MT/s, marketed as 1600MHz so as not to confuse tech newbies. This I can confirm. I believe that GDDR5 does the same thing, just with a quadruple data rate on its input/output frequency instead of a double data rate, and DDR4 goes the simpler, cheaper route of doubling the input/output frequency instead of the data rate of that frequency to get similar performance per chip as GDDR5.

So, in order for GDDR5's input/output frequency to be four times greater than the internal frequency of each chip, it needs an 8 bit wide prefetch. A 2 bit wide prefetch would leave it more similar to the first generation DDR SDRAM interface, except with a quadruple data rate on the input/output bus instead of a double data rate, so it would perform more like DDR2.

I'm not a RAM engineer, so don't hold me to this, but I think that it at least gives you an idea for what to look for. Another thing that I can confirm is that GDDR5 is at least partially based on DDR3. If you find anything, would you mind PMing me or posting in this thread? I'd like to get caught up on this and you seem like you've at least got something going here.

DDR3-1600MHz effective
chip frequency = 200MHz
input/output frequency to processor = 800MHz
double data rate 800MHz = 1600MT/s

GDDR5-3200MHz effective
chip frequency = 200MHz
input/output frequency = 800MHz
quadruple data rate 800MHz = 3200MT/s

And for completeness's sake,
DDR4-3200MHz effective
chip frequency = 200MHz
input/output frequency = 1600MHz
double data rate 1600MHz = 3200MT/s
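
The three reference sheets above follow the same arithmetic, which can be sketched in a few lines of Python. This is only an illustration of the ratios as stated in this post (they are the poster's figures, not taken from a data sheet); the names `MemoryRatios` and `transfer_rate_mt_s` are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRatios:
    name: str
    io_per_cell: int       # I/O clock as a multiple of the chip frequency (per the post)
    transfers_per_io: int  # transfers per I/O clock cycle (2 = double, 4 = quadruple)


def transfer_rate_mt_s(chip_mhz, ratios):
    """Transfer rate in MT/s = chip frequency x I/O multiplier x transfers per I/O cycle."""
    return chip_mhz * ratios.io_per_cell * ratios.transfers_per_io


# Ratios exactly as listed in the reference sheets above (assumed, not verified).
RATIOS = [
    MemoryRatios("DDR3", io_per_cell=4, transfers_per_io=2),
    MemoryRatios("GDDR5", io_per_cell=4, transfers_per_io=4),
    MemoryRatios("DDR4", io_per_cell=8, transfers_per_io=2),
]

for r in RATIOS:
    chip_mhz = 200
    io_mhz = chip_mhz * r.io_per_cell
    print(f"{r.name}: chip {chip_mhz}MHz, I/O {io_mhz}MHz, "
          f"{transfer_rate_mt_s(chip_mhz, r)}MT/s")
```

With a 200MHz chip frequency this reproduces the 1600, 3200 and 3200MT/s figures from the sheets.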

The chip frequencies should be the frequency that the actual memory cells are running at, not necessarily the rest of the chip, but it explains the need for an 8-bit prefetch quite nicely IMO. I think that this is done because DRAM cells can't be clocked very high safely, so we do tricks like this to mitigate the problem that low reliable clock frequencies present.

EDIT: I corrected a few sentences that could have been misunderstood due to very poor wording and added a small reference sheet thingy after the last paragraph.
June 11, 2012 12:49:08 AM

blazorthon said:
Also, each chip's actual frequency is a fraction of the frequency of the input/output bus which runs at a frequency as measured in MHz that is one fourth of its transfer rate as measured in MT/s. I don't know the ratios for GDDR5, but I do know them for DDR3 and DDR4. I think that DDR4 and GDDR5 chips with the same bandwidth run at the same actual frequency, just with GDDR5 running the input/output bus at four times faster than the chip's internal frequency and one fourth of the transfer rate (transfer rate is sixteen times greater than the chip frequency) and DDR4 instead has the input/output bus running at eight times the actual chip frequency and one half of the transfer rate (still sixteen times the actual chip frequency).

Basically, a DDR3 chip on a module marketed as 1600MHz is actually running at 200MHz, the input/output bus is running at 800MHz, and this bus transfers data on both the rising and falling edge of the clock, so you get 1600MT/s, marketed as 1600MHz so as not to confuse tech newbies. This I can confirm. I believe that GDDR5 does the same thing, just with a quadruple data rate on its input/output frequency instead of a double data rate, and DDR4 goes the simpler, cheaper route of doubling the input/output frequency instead of the data rate of that frequency to get similar performance per chip as GDDR5.

So, in order for GDDR5's input/output frequency to be four times greater than the internal frequency of each chip, it needs an 8 bit wide prefetch. A 2 bit wide prefetch would leave it more similar to the first generation DDR SDRAM interface, except with a quadruple data rate on the input/output bus instead of a double data rate, so it would perform more like a less efficient form of DDR2.

I'm not a RAM engineer, so don't hold me to this, but I think that it at least gives you an idea for what to look for. Another thing that I can confirm is that GDDR5 is at least partially based on DDR3. If you find anything, would you mind PMing me or posting in this thread? I'd like to get caught up on this and you seem like you've at least got something going here.



Thank you for your reply. Based on what I know you're correct with the things you say and you even ended up with my same doubt! :D  Let me explain...

Basically, having a data rate that is double, quadruple or octuple the internal clock frequency (as in DDR, DDR2 and DDR3 respectively) relies on a technique called pre-fetch buffering.
The idea is always the same: to access a word in a memory I have to access a row which contains multiple words. When I access a row, I get simultaneous access to multiple words.
I can use a multiplexer with a column address to choose a single word out of the row and write it on the data bus, but that's not efficient, considering that the next word the CPU is probably going to need is the next one in the same row, which I already had available in the first place.
So we want to take advantage of the fact that we have multiple words available after a RAS signal. How?
Basically, once a full row is ready, in the next clock cycle we can take a few words (how many depends on the burst level we want to achieve) out of the row and write them simultaneously into a buffer which acts as a queue at the bit line level.
If we take 2 words simultaneously out of a row, we will need a 2-bit buffer for each bit line of the data bus.
If we take 4 words, a 4-bit buffer for each bit line.
8 words, an 8-bit buffer.
This queue will be emptied at a speed which is a multiple of the clock frequency, so that all the queued words will be out in a single clock cycle (in time to receive the next batch of words to be transmitted or to end the read operation).
So we'll get 2 words out in a single clock cycle for DDR, 4 for DDR2 and 8 for DDR3.
In a DDR memory the timings of the signals which control the queue buffer (or pre-fetch buffer) are such that the data transmissions on the bus are synchronized with both edges of the internal clock.
In DDR2, 4 bits are transmitted on each bit line every clock cycle. Those transmissions are synchronized with both edges of a signal which is double the frequency of the internal clock (alternatively they can be seen as synchronized with a single edge of a signal which is 4 times the frequency of the internal clock).
DDR3 is the same thing. The data transmissions happen on both edges of a clock signal with quadruple the frequency of the internal clock, or on single edges of a clock with 8x the internal frequency.
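
The queue idea described above can be illustrated with a toy Python model. It is only a sketch of the description in this post, not of any real chip: the 16-word row and the function names `prefetch_reads` and `internal_cycles` are assumptions made for the example.

```python
def prefetch_reads(row, prefetch):
    """Split an open row into bursts of `prefetch` adjacent words.

    Each burst models one internal clock cycle: the words are copied into the
    buffer in parallel, then shifted out one by one before the next cycle.
    """
    return [row[i:i + prefetch] for i in range(0, len(row), prefetch)]


def internal_cycles(row, prefetch):
    """Number of internal clock cycles needed to stream the whole row."""
    return len(prefetch_reads(row, prefetch))


row = list(range(16))  # hypothetical open row holding 16 words

for name, prefetch in [("DDR", 2), ("DDR2", 4), ("DDR3", 8)]:
    bursts = prefetch_reads(row, prefetch)
    print(f"{name}: {prefetch} words per internal cycle, "
          f"{internal_cycles(row, prefetch)} cycles, first burst {bursts[0]}")
```

The deeper the buffer, the fewer internal cycles are needed for the same row, which is why the bus has to run at a correspondingly higher multiple of the internal clock to drain it in time.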

Now on to the GDDR5.
As far as I could gather from the data sheets, this memory has a claimed DATA RATE (not the I/O frequency) of 4 times the internal clock frequency (which means that the I/O frequency is 2 times the internal clock frequency, but with transmissions on both edges).
In fact, to calculate the bandwidth of the GDDR5 memory used on graphics cards the formula is CK*4*(bus width/8).
The difference between this memory and DDR3 is that words are sent out of a row with a WCK signal which is double the internal clock, and the burst level is claimed to be 2x, so for each bit line, two bits are taken out of the memory in a WCK period, which is half the CK period.
So basically to me this memory seems just like a DDR memory where the (2) words are read out of the memory at double the clock frequency. So effectively it's a DDR2, with a data rate of 4 times the internal clock.
It would only need a 2-bit buffer to achieve that data rate....yet......it has an 8-bit buffer like DDR3, which seems useless to me if you only take two words simultaneously out of the memory banks. To take advantage of it we should take 8 words simultaneously in a WCK period. But that would lead to a data rate of 16X the internal clock, while the data rate is claimed to be just 4X.
So in this whole mess I guess that I got something wrong...maybe it's better to go to the beach :bounce:  :bounce:  :bounce:  :lol:  :lol:  :lol:  :lol: 
June 11, 2012 1:16:16 AM

blazorthon said:


GDDR5-3200MHz effective
chip frequency = 200MHz
input/output frequency = 800MHz
quadruple data rate 800MHz = 3200MT/s




I've just read your edit.

Afaik GDDR5 chips are not classified like that.
That would solve the problem :bounce:  It would be an 8-bit prefetch buffer being fed at twice the CK frequency, hence a 16X effective data rate.
But it's not like that. GDDR5 chips are classified like:
GDDR5 (internal CK) or GDDR5 (Gigabit/s = data rate per bit line).
Today's fastest GDDR5 chips are 32-bit chips running at 1.5 GHz or 6 Gigabit/s. On a 256-bit bus they provide an impressive 192 GB/s bandwidth.
An example from Wikipedia:
The WCK runs at twice the CK frequency. Taking a GDDR5 with 5 Gbit/s data rate per pin as an example, the CK clock runs with 1.25 GHz and WCK with 2.5 GHz


Btw as you can see effectively you have the same doubt as me :lol:  :lol:  :lol: 
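
The numbers quoted in this post can be checked with a short Python sketch. It uses only the relations stated in the thread (data rate per pin = 4 x CK, WCK = 2 x CK, bandwidth = CK*4*(bus width/8)); the function names are made up for the example.

```python
def gddr5_clocks_ghz(data_rate_gbps):
    """Return (CK, WCK) in GHz for a given per-pin data rate in Gbit/s."""
    ck = data_rate_gbps / 4  # data rate per pin is 4 x CK
    wck = 2 * ck             # WCK runs at twice the CK frequency
    return ck, wck


def bandwidth_gb_s(ck_ghz, bus_width_bits):
    """Bandwidth in GB/s using the formula CK * 4 * (bus width / 8)."""
    return ck_ghz * 4 * (bus_width_bits / 8)


print(gddr5_clocks_ghz(5.0))     # the Wikipedia example: 5 Gbit/s per pin
print(bandwidth_gb_s(1.5, 256))  # 1.5 GHz chips on a 256-bit bus
```

The first call gives CK = 1.25 GHz and WCK = 2.5 GHz, and the second gives 192 GB/s, matching the figures above.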
June 11, 2012 2:03:26 AM

You might be confusing internal clock with the memory clock.

Here's what I think:

The clock frequency of the actual memory in GDDR5-6GHz effective (such as on the GTX 680) is not 1500MHz, it is much lower than that. Not even SRAM cells can clock at 1500MHz, and I know that DRAM cells like GDDR5's top out at a quarter of it without some serious overclocking, and even then they can't get much higher.

We name DDR3 by double its input/output frequency, but the DRAM cells in DDR3-1600 actually only run at 200MHz and the input/output frequency is four times the frequency of the memory cells. I know that this is true.

DRAM cells can't really clock at over 400MHz easily (I know this is true), so we do tricks to get much higher bandwidth out of them. GDDR5's memory clock is a quarter of its CK clock, which is a quarter of its data rate. The clock frequency of the memory cells in GDDR5 6GHz effective is 375MHz, very close to that 400MHz barrier. We've made minor improvements over time, but DRAM cells' maximum frequency is still not even 500MHz reliably. It takes extreme cooling to take it that far and farther, stuff such as liquid nitrogen or, even better, liquid helium.

So, GDDR5 has an 8-deep prefetch so that its CK clock can be filled, due to its memory clock being a quarter of its CK clock. The data rate is 4 times the CK clock, but is 16 times the memory clock.

GDDR5-6GHz
memory cell clock frequency is 375MHz
CK clock frequency (aka chip's bus frequency) is 1500MHz
WCK clock frequency is 3000MHz
Data rate is 6000MT/s
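
The same ratios can be written as a small Python helper. Note that the divisors below are the ones claimed in this post (memory cells at 1/16 of the data rate, CK at 1/4, WCK at 1/2); they are this poster's assumptions rather than data sheet values, and `gddr5_clock_tree` is a made-up name.

```python
def gddr5_clock_tree(data_rate_mt_s):
    """Derive the clocks from the data rate using the ratios stated in this post."""
    return {
        "memory_cell_mhz": data_rate_mt_s / 16,  # poster's claim: data rate = 16 x cell clock
        "ck_mhz": data_rate_mt_s / 4,            # data rate = 4 x CK
        "wck_mhz": data_rate_mt_s / 2,           # WCK = 2 x CK
    }


for rate in (6000, 7200):
    print(rate, gddr5_clock_tree(rate))
```

For 6000MT/s this gives 375MHz, 1500MHz and 3000MHz as listed above; for 7200MT/s the memory cell figure comes out at 450MHz.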

The 400MHz barrier is still in effect for stock VRAM on all cards that I know of. We have made some improvement, but that can only be seen when you overclock the memory and find out that even today, going past the 450MHz mark (GDDR5-7.2GHz effective) is not really doable most of the time.

Sure, we don't classify GDDR5 by its actual memory clock, but that memory clock is still there and is very important. You need memory such as SRAM to break the 400MHz (nowadays, more like 450MHz) barrier by great margins. DRAM cells can't go very high because their capacitors take too long to charge and discharge, so they can't be clocked any higher than the capacitors can charge and discharge. SRAM can clock higher because it doesn't store data in slow capacitors, it uses a transistor flip-flop that can move electricity much faster and more reliably. SRAM, however, needs at least four transistors to do this and they tend to be kinda large transistors, so it can't be nearly as dense as DRAM made on a similar process node and is thus much more expensive and is then reserved for more specific uses where lower capacity is not much of a problem.

I hope that this clears things up.
June 11, 2012 12:43:32 PM

Thank you blazorthon,
I think that I found the solution to my problem.
As you said, my mistake was to consider the CK and WCK signals as internal frequencies.
Instead, CK and WCK are just timing signals fed externally by the memory controller to the memory module.
In the memory those frequencies are both divided by a factor of 4.
So CK/4 is the frequency of the memory cells, and WCK/4=CK/2 is the working frequency of the column multiplexers.
Each bit line has an 8-bit buffer which is fed at double the working frequency of the memory cells. So each CK cycle 4 bits are transmitted per bit line, or equivalently 2 bits every WCK cycle.
Basically GDDR5 has double the bandwidth of DDR3 while still using an 8-bit pre-fetch buffer, thanks to a higher (2X) column mux frequency.
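
As a sanity check, the model described above can be run through a few lines of Python to confirm that the buffer fill rate and the bus drain rate agree. This only encodes the assumptions made in this post (cells at CK/4, buffer refilled at twice the cell frequency, 8 bits per refill); `per_line_rates` is a made-up name.

```python
def per_line_rates(ck_mhz, prefetch_bits=8):
    """Per-bit-line rates under the model described in this post."""
    cell_mhz = ck_mhz / 4       # memory cells run at CK/4
    wck_mhz = 2 * ck_mhz        # WCK is double CK
    fill_mhz = 2 * cell_mhz     # buffer is fed at double the cell frequency (CK/2)
    mbit_s = prefetch_bits * fill_mhz  # bits entering the buffer per microsecond
    return {
        "cell_mhz": cell_mhz,
        "bits_per_ck": mbit_s / ck_mhz,
        "bits_per_wck": mbit_s / wck_mhz,
        "bits_per_cell_cycle": mbit_s / cell_mhz,
    }


rates = per_line_rates(1500)
print(rates)
# Under this model a DDR3-style buffer delivers 8 bits per cell cycle, so:
print("ratio vs 8 bits per cell cycle:", rates["bits_per_cell_cycle"] / 8)
```

For CK = 1500MHz this yields 4 bits per CK cycle, 2 bits per WCK cycle and 16 bits per memory cell cycle, i.e. twice the 8 bits per cell cycle of an 8-bit buffer filled once per cell cycle.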

Thanks again for your help. Of course it should have been obvious from the beginning that memories manufactured with the same process technology couldn't work at such different internal core frequencies.
But sometimes things get more complicated than they actually are by misleading labels, commercial names and such.
June 11, 2012 1:14:57 PM

Kenji83 said:
Thank you blazorthon,
I think that I found the solution to my problem.
As you said, my mistake was to consider the CK and WCK signals as internal frequencies.
Instead, CK and WCK are just timing signals fed externally by the memory controller to the memory module.
In the memory those frequencies are both divided by a factor of 4.
So CK/4 is the frequency of the memory cells, and WCK/4=CK/2 is the working frequency of the column multiplexers.
Each bit line has an 8-bit buffer which is fed at double the working frequency of the memory cells. So each CK cycle 4 bits are transmitted per bit line, or equivalently 2 bits every WCK cycle.
Basically GDDR5 has double the bandwidth of DDR3 while still using an 8-bit pre-fetch buffer, thanks to a higher (2X) column mux frequency.

Thanks again for your help. Of course it should have been obvious from the beginning that memories manufactured with the same process technology couldn't work at such different internal core frequencies.
But sometimes things get more complicated than they actually are by misleading labels, commercial names and such.


Any time :)  Thanks for the refresher course.