Report: AMD Carrizo APUs To Get Stacked On-Die Memory
Is AMD's HSA arriving with force in the upcoming Carrizo APUs?
Perhaps it's an unusual source, but Bits 'n Chips from Italy has reported that AMD might be using stacked-DRAM memory in its upcoming Carrizo APUs. In itself, this shouldn't be particularly surprising – AMD is very actively pushing the Heterogeneous System Architecture.
The concept of the Heterogeneous System Architecture is that the CPU and GPU cores all have equal access to system memory: rather than each core being assigned a fixed region, both can address any part of memory at any time. This promises much higher performance, with the GPU handling highly parallelized tasks while the CPU handles serial ones. Such an architecture will really shine when some of the memory is on-die, delivering significantly lower latencies.
The Italian source indicates that stacked on-die memory will deliver more cost-effective performance than on-die L3 cache, presumably through more efficient use of the available hardware. The bulk of system memory would still reside in DDR3, mainly because the on-die memory would only amount to somewhere around 128 MB or 256 MB. The report also mentions the ability to stick with DDR3 for system memory, as opposed to the much more expensive DDR4.
According to the report, the Carrizo APUs will be fabricated on a 28 nm lithographic process, while the stacked-DRAM will be fabricated on a 20 nm process.
Other sources also indicate that some of the Carrizo APUs will have the FCH on-die. We wonder how long it will take before we have all the required parts on-die, essentially creating SoCs (systems-on-a-chip).
Follow Niels Broekhuijsen @NBroekhuijsen.
Intel has a tendency to coast when allowed. It was the Athlon that drove them to the Core microarchitecture and away from NetBurst. Now we have the Sandy-Ivy-Haswell coast again...
But their IGP does.
Most desktop applications require a balance between bandwidth, latency and processing power. Once you pass the typical bandwidth and latency requirements for typical workloads for a given architecture, benefits drop off sharply. Intel simply happens to be a few miles ahead of AMD at decoupling their CPUs from memory latency and bandwidth under most circumstances.
GPUs on the other hand are almost entirely dictated by bandwidth since almost every computational challenge GPUs face can be made easier and faster with more, faster memory to cache results and duplicate frequently accessed items across memory channels to accommodate more concurrent accesses.
The AMD + Radeon combination should have produced a really good product. We are waiting for a huge leap over the previous APU lineup.
Keep fighting back and bringing good competition. It will benefit all of us customers.
A good idea would be to slap 1 GB to 2 GB of GDDR5 in there over a 192- to 256-bit bus, so the GPU can use it as a framebuffer (think A10-8800K with a built-in R9 270) and the CPU can use it as L3 cache. Another cool thing is that you should be able to start your PC with all DIMM slots empty, since RAM is on-die.
Why a 192- to 256-bit bus? Because it needs to be as fast as possible. If it's on a 128-bit bus like system memory, the CPU will not be able to use it effectively as L3 cache. Why GDDR5? Because the L3 cache should be as fast as the CPU, so it will not slow it down. Think FX 8350: when you overclock the northbridge to CPU speed (and with it the L3 cache), you notice significant performance improvements in demanding tasks (especially FPU-related ones).
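The bus-width argument above can be sanity-checked with simple peak-bandwidth arithmetic. A minimal sketch, assuming an illustrative 6 GT/s GDDR5 data rate (not a confirmed Carrizo figure):

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gts: float) -> float:
    """Peak theoretical bandwidth in GB/s: bus width (bits) x data rate (GT/s) / 8."""
    return bus_width_bits * data_rate_gts / 8

# Hypothetical 6 GT/s GDDR5 at the bus widths discussed above:
for width in (128, 192, 256):
    print(f"{width}-bit: {peak_bandwidth_gbs(width, 6.0):.0f} GB/s")
# 128-bit: 96 GB/s, 192-bit: 144 GB/s, 256-bit: 192 GB/s
```

So going from a 128-bit to a 256-bit bus at the same data rate doubles the peak bandwidth, which is the commenter's core point.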
They moved away from it in favor of smaller, faster cache on the CPU itself, and now they are moving back to having RAM, probably running at half speed again, connected to the CPU package but not on the CPU die itself, to increase performance. Interesting how history is repeating itself.
128 MB may not sound like much, but it is large enough to cache most of the most frequently used textures for the IGP if you tune settings accordingly. Crystalwell has about half the latency of system RAM and four times as much bandwidth (50 GB/s read + 50 GB/s write, which is on par with 6 GT/s 128-bit GDDR5 without GDDR's extra latency). That should also be fairly useful for multitasking, since regularly accessed data evicted from the CPU caches by a context switch would not have to be fetched from system RAM again every time.
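The latency side of this argument can be illustrated with the standard average-memory-access-time formula. The nanosecond figures below are illustrative assumptions only (chosen so the on-package DRAM sits at roughly half the latency of system RAM, as the comment states), not measured values:

```python
def avg_access_ns(hit_rate: float, cache_ns: float, miss_ns: float) -> float:
    """Average access time: hits served by on-package DRAM, misses by system RAM."""
    return hit_rate * cache_ns + (1 - hit_rate) * miss_ns

# Assumed figures: ~40 ns on-package DRAM vs ~80 ns DDR3.
for hit_rate in (0.5, 0.8, 0.95):
    print(f"hit rate {hit_rate:.0%}: {avg_access_ns(hit_rate, 40.0, 80.0):.0f} ns")
```

Even a modest hit rate in the on-package DRAM pulls the average access time well below raw system-RAM latency, which is why a 128 MB cache can matter despite its small size.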
The frame buffer uses relatively little bandwidth compared to other more important things like the Z-buffer so it would be one of the last things that gets dumped in there when there is spare space.
Today's high-end GPUs may have a seemingly impressive 6GB RAM but what is most of that RAM used for? Resource duplication across multiple channels for read bandwidth multiplication.
What they're doing with stacked DRAM is basically the best of both worlds. It's faster than falling back to main memory (in both bandwidth and latency), so it can act as an L3 cache if necessary, but it will mainly benefit HSA and graphics. Invalid's post covers the graphics side of the equation pretty well.
Update: I meant Murphy's law*
The just-released 35 W FX-7600P Kaveri is pretty good for a gaming laptop. Does anyone know which laptop manufacturer is making a 17.3" model with it?
I should've patented it...
^.^