DDR vs GDDR vs SSD

apisorder

I was reviewing the technical details of some motherboards and Nvidia graphics cards, and a couple of questions struck me.

(1)
In my experience with system memory, the CPU has always been faster than RAM, much faster actually (if one can simply compare the clock speed of the CPU and its memory, that is), yet the GPU seems much slower than its graphics RAM. Is there a reason they are the way they are? In my simplified thinking, the closer the speed of the CPU/GPU to its memory, the better: less waiting involved.

(2)
Why isn't graphics RAM used for system memory too? The GPU also computes, of course, just with many more cores. I don't think cost is the reason: people are buying monitors that cost over $3,000 and running RAID arrays of enterprise SSDs (the Intel 750 1.2TB, for example).

(3)
Lastly, is it sensible to compare the IOPS of SSDs to CPU/GPU/memory speed?


Thanks.
 
1) I'm not sure what you mean; the clock speed of RAM is almost always slower than that of the parent processor (whether it be the GPU or CPU), and when it's not, it's at least comparable. Don't be deceived by manufacturers' claims of "6000MHz VRAM SPEED!!!!!!!"; they get that number by adding up the clock speed of each VRAM chip on the GPU.

2) VRAM is more or less the same kind of memory as RAM. The same type of memory chip is used. VRAM just needs to be on the GPU board so signals don't have to travel far from the GPU (and definitely so they don't have to go through the PCIe bus).

3) Not really.

Turns out I don't know that much about memory.

 
Very roughly
1) The CPU is indeed faster than the RAM; it has to wait for data to come from RAM if the data is not in the CPU's caches. I'd be interested to know why you say that the GPU is much slower than the GDDR. If it were "much slower," card makers wouldn't advertise the width and speed of their memory buses.

While the idea that the speeds should match is reasonable for many general applications, CPUs and GPUs are _very_ fast. Memory that fast is phenomenally expensive, and needs to be physically close to the processor to reduce delays. That's why a CPU die has a handful of registers (immediately available data), a larger L1 cache (slower), and an even larger L2 cache (slower still). Large-scale memory technology is simply slower than processing technology.
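A quick way to see that hierarchy is to time the same summation over a buffer that fits in the L1 cache versus one that only fits in main memory. This is a minimal sketch of my own (the buffer sizes, pass counts, and the use of std::accumulate are illustrative assumptions, not anything from this thread):

// Time repeated passes over a small (cache-resident) and a large (RAM-resident) buffer.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <vector>

static double sweep_gb_per_s(const std::vector<std::uint64_t>& buf, int passes) {
    volatile std::uint64_t sink = 0;          // keeps the compiler from deleting the loop
    auto t0 = std::chrono::steady_clock::now();
    for (int p = 0; p < passes; ++p)
        sink = sink + std::accumulate(buf.begin(), buf.end(), std::uint64_t{0});
    auto t1 = std::chrono::steady_clock::now();
    double secs  = std::chrono::duration<double>(t1 - t0).count();
    double bytes = double(buf.size()) * sizeof(std::uint64_t) * passes;
    return bytes / secs / 1e9;
}

int main() {
    std::vector<std::uint64_t> small_buf(2 * 1024, 1);          // 16 KB: fits in L1
    std::vector<std::uint64_t> large_buf(64 * 1024 * 1024, 1);  // 512 MB: far bigger than any cache
    std::printf("cache-resident sweep: ~%.1f GB/s\n", sweep_gb_per_s(small_buf, 200000));
    std::printf("RAM-resident sweep:   ~%.1f GB/s\n", sweep_gb_per_s(large_buf, 10));
}

On a typical desktop the first number should come out noticeably higher than the second; the gap is the cache hierarchy doing its job.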

2) GDDR is built specifically to support the access patterns most common in graphics tasks. DDR is optimized for the patterns most common in less parallel, more branch-heavy processing tasks. Each is better for the task it is used for. Even the version numbers are unrelated; GDDR3 memory is not from the same family as DDR3 memory.

3) Yes, in the sense that you can compare the weight of a tank to the weight of a Volkswagen. The tank is heavier, but that doesn't really mean anything. For comparing SSDs to DDR/GDDR, I would look at the cost per IOPS and the cost per gigabyte. DDR/GDDR is faster but more expensive. You could fill a box with DDR and present it as a disk drive, and it would be very fast indeed compared to any SSD. But at a ridiculous cost, and the memory would serve the machine better as main memory, accessed through a faster random-access mechanism than a disk interface.
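To put very rough numbers on that (ballpark prices from around the time of this thread, so treat them as assumptions rather than data): DDR4 ran on the order of $8-10 per GB, while a fast NVMe drive like the Intel 750 was closer to $1 per GB, so the RAM-as-a-disk box would cost roughly ten times more per gigabyte. What you would buy with that money is access latency measured in hundreds of nanoseconds instead of tens of microseconds, which is where the enormous IOPS advantage would come from.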
 
1) Making RAM faster means making it cost thousands of dollars per GB, and then you have the issue of transferring that much data. If your RAM ran at the same speed as a CPU's L1 cache (which is itself slower than the registers), you would need either an enormous number of pins or an enormously faster clock!
2) GDDR has different access patterns than DDR. It CAN be used for system memory (the PS4 is an example); it's just not worth it, because it's more expensive and pretty much requires the memory to be soldered to the board.
3) YES! That's the only comparison that actually makes sense. The gap is still orders of magnitude, though, even with x4 PCIe NVMe drives (see the rough numbers sketched below).
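To make the gap concrete, here is a back-of-envelope peak-bandwidth sketch. The specific parts (DDR4-2400 on one 64-bit channel, 8 Gbps GDDR5 on a 256-bit bus, and a PCIe 3.0 x4 NVMe link) are my own example choices, not anything specified earlier in the thread.

// Back-of-envelope peak-bandwidth arithmetic for three example links.
#include <cstdio>

int main() {
    double ddr4  = 2400e6 * (64 / 8.0);   // transfers per second * bytes per transfer
    double gddr5 = 8e9 * (256 / 8.0);     // 8 Gbps per pin on a 256-bit bus
    double pcie  = 4 * 0.985e9;           // four PCIe 3.0 lanes, ~0.985 GB/s each after 128b/130b encoding
    std::printf("DDR4-2400, 64-bit channel : %.1f GB/s\n", ddr4 / 1e9);
    std::printf("GDDR5 8 Gbps, 256-bit bus : %.1f GB/s\n", gddr5 / 1e9);
    std::printf("PCIe 3.0 x4 (NVMe link)   : %.1f GB/s\n", pcie / 1e9);
}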
 

apisorder

I did some Googling and in retrospect, my thinking really was simplified, perhaps even over-simplified. Feel free to correct me.

I read that DDR4's data transfer rate is derived from its memory clock and its I/O clock, but I thought memory was slow in comparison to the CPU because I assumed the actual memory speed was only half of the advertised number, i.e., PC-3600 is 1800 MHz times the double data rate (in comparison to a typical 3 GHz+ Intel CPU, for example). But it's actually much lower than that, right? (For example, for PC-2400, the memory clock is 2400/8 = 300 MHz.)
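If I have the DDR4 scheme right, the arithmetic for that PC-2400 example works out like this:

2400 MT/s advertised transfer rate
/ 2 (double data rate) = 1200 MHz I/O bus clock
/ 4 (the 8n prefetch) = 300 MHz internal memory array clock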

I thought GPUs were much slower than their GDDR(X) memory because I presumed the same thing: that a GDDR memory rated for an 8000 MHz clock actually runs at 4000 MHz (the double-data-rate part), which would still be twice as fast as a typical GPU clock of around 2 GHz. Frankly, even after reading the Wiki article I still don't quite understand it, but what I gather is that there are actually three clocks for GDDR5 memory, and it gives one example with a 5 Gbit/s data rate where the clocks run between 1.25 GHz (the command clock) and 2.5 GHz (the two write clocks), so maybe not that different from GPU speed after all. As for GDDR5X, I am just confused: if it really is twice as fast as GDDR5, does it have a command clock of 2.5 GHz and two write clocks at 5 GHz?
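Laying out the GDDR5 numbers from that example, if I am reading them right:

command clock (CK) = 1.25 GHz
write clocks (WCK) = 2 x CK = 2.5 GHz
data rate per pin = 2 x WCK = 4 x CK = 5 Gbit/s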

And by the way, I read that what really matters is the number of CUDA cores, the memory bandwidth, and the architecture, not the GPU core clock. Is that really so?
 
GPU memory clocks are false advertising most of the time. The actual chips on the graphics card board don't run anywhere near 8000 MHz or whatever; the manufacturer just adds up the speed of the individual chips. It's basically like advertising that a PC's RAM is clocked at 6400 MHz when in fact it's just got a couple of 3200 MHz sticks.

For GPU performance a lot of things matter. The number of cores (CUDA cores for Nvidia, Stream Processors for AMD) is not an absolute basis for comparison, as a single CUDA core differs between GPU architectures, so 1 Maxwell CUDA core =/= 1 Pascal CUDA core. In practice, more cores means more performance as long as you stay within the same architecture.

Memory bandwidth defines the rate at which data can be transferred to and from the VRAM. In practice, higher bandwidth means better performance at higher resolutions (or other workloads that will load up the VRAM such as large 3D animations).

The GPU core clock matters a lot, but generally all the cards from a specific GPU architecture have similar core clocks, so it ends up being mostly irrelevant when actually choosing between cards. One example where this wasn't the case, however, is the 980 Ti versus the Titan X. Both used the Maxwell architecture, but despite having fewer CUDA cores, the 980 Ti could perform a lot better in games because it could be overclocked a LOT more than the Titan X.
 

TJ Hooker

Umm, no. If a card says 8 GHz GDDR5, it means that the memory is running at an effective speed of 8 GHz. The actual clock rates may be lower because, as @apisorder mentioned, the actual memory clock is only half the transfer rate (due to DDR), and then there's the command clock, which is half of that again. But that's no different from normal DDR, where the advertised frequency is double the actual memory clock.
 
"Effective speed", also known to manufacturers as "the biggest number we can get away with advertising without getting sued". The actual VRAM does not run at 8GHz (to be fair RAM doesn't actually run at advertised speeds either but still).

Although this is pretty much like the FX 8-core argument, with the conclusion being that it doesn't actually matter.
 

TJ Hooker

Let's say you had a GDDR5 chip with a 1 bit interface advertised as running at 8 GHz. The throughput to that chip would be 8 Gbps. Now, labeling it as 8 GT/s or 8 Gbps may be technically more accurate than using Hz (and I've seen that done). But many people probably understand Hz better, so I don't fault graphics card makers for labelling it that way when it's accurate from a performance perspective.
 

scuzzycard

Actually GPUs have large caches, because as fast as GDDR5 and GDDR5X memory are, the access latency is still much too high without a multi-level cache system. The GTX 1080, for example, has 48 KB of L1 cache per SM and 2 MB of L2. Overall performance is a combination of the number of CUDA cores, memory bandwidth, architecture, and core clock. For example, the GM206 in the GTX 960 is essentially half of the GM204 in the GTX 980: it has half as many CUDA cores, ROPs and TMUs. Therefore, a GTX 960 would have to run at double the core clock of the 980 to match its performance (and it would still suffer a bit because its memory bandwidth is only half, 112 GB/s on a 128-bit bus versus 224 GB/s on a 256-bit bus).
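A rough sketch of that scaling, with reference clocks and memory speeds filled in from memory, so treat the exact figures as ballpark assumptions rather than official specs:

// Rough throughput arithmetic for the GTX 960 vs GTX 980 comparison above.
#include <cstdio>

struct Card { const char* name; int cores; double clock_ghz; int bus_bits; double gbps_per_pin; };

int main() {
    Card cards[] = {
        {"GTX 960 (GM206)", 1024, 1.13, 128, 7.0},
        {"GTX 980 (GM204)", 2048, 1.13, 256, 7.0},
    };
    for (const Card& c : cards) {
        double tflops = c.cores * 2.0 * c.clock_ghz / 1000.0;  // 2 FLOPs per core per cycle (FMA)
        double bw     = c.bus_bits / 8.0 * c.gbps_per_pin;     // GB/s
        std::printf("%-16s ~%.1f TFLOPS FP32, ~%.0f GB/s\n", c.name, tflops, bw);
    }
}

Same clock, half the cores and half the bus width, so roughly half the compute throughput and half the bandwidth, which is why the 960 would need about twice the clock to catch up.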
 

apisorder

So then, I understand that the architecture, the number of CUDA cores, the memory bandwidth, and the GPU core clock are all important, but what is the order of importance? (i.e., how should I prioritize them when comparing similar cards at similar cost?)

Thanks.
 


Solution

Like everything else, it depends on what the hell you're doing!

Case 1:
You are doing very simple additions, with the work perfectly spread across cores, but you are accessing memory locations outside the cache. You will be 100% memory limited.
Case 2:
You have highly branching code, something like if (number[i-1]) number[i]++; . You will be 100% frequency limited.
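A minimal sketch of those two cases (the buffer size and the exact loops are just illustrative choices):

// Minimal sketch: a memory-limited loop and a frequency/branch-limited loop.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t N = 16 * 1024 * 1024;   // far larger than the CPU caches

    // Case 1: simple additions over buffers that don't fit in cache; the cores
    // spend most of their time waiting on main memory.
    std::vector<std::uint32_t> a(N, 1), b(N, 2), c(N);
    for (std::size_t i = 0; i < N; ++i)
        c[i] = a[i] + b[i];

    // Case 2: every iteration depends on the previous one and branches on it;
    // throughput is set by how fast a single core runs, i.e. by clock speed.
    std::vector<std::uint32_t> number(N, 1);
    for (std::size_t i = 1; i < N; ++i)
        if (number[i - 1]) number[i]++;

    std::printf("%u %u\n", (unsigned)c[N - 1], (unsigned)number[N - 1]);  // keep results live
}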
 