Perhaps it's an unusual source, but Bits 'n Chips from Italy has reported that AMD might be using stacked DRAM in its upcoming Carrizo APUs. In itself, this shouldn't be particularly surprising, given how actively AMD is pushing the Heterogeneous System Architecture (HSA).
The idea behind the Heterogeneous System Architecture is that the CPU and GPU cores all have equal access to system memory. Rather than each core being assigned its own portion of memory, both can address any part of it at any time and work on the same data together. This promises much higher performance, with the GPU handling highly parallel tasks and the CPU handling serial ones. Such an architecture should really shine when some of the memory is on-die, where it delivers significantly lower latency.
The Italian source indicates that stacked on-die memory would deliver more cost-effective performance than an on-die L3 cache, presumably because it makes more efficient use of the available hardware. The bulk of system memory would still be DDR3, since the on-die memory would only amount to somewhere around 128 MB or 256 MB. According to the report, this arrangement would also let AMD stick with DDR3 for system memory rather than moving to the much more expensive DDR4.
According to the report, the Carrizo APUs will be fabricated on a 28 nm lithographic process, while the stacked-DRAM will be fabricated on a 20 nm process.
Other sources also indicate that some Carrizo APUs will have the FCH (Fusion Controller Hub) on-die. We wonder how long it will be before all the required parts are on-die, essentially turning these chips into SoCs (Systems-on-a-Chip).
Follow Niels Broekhuijsen @NBroekhuijsen. Follow us @tomshardware, on Facebook and on Google+.
Intel has a tendency to coast when allowed. It was the Athlon that drove them to the Core microarchitecture and to abandon NetBurst. Now we have the Sandy Bridge/Ivy Bridge/Haswell coast again...
Most desktop applications require a balance between bandwidth, latency and processing power. Once an architecture exceeds the bandwidth and latency requirements of typical workloads, the benefits of further improvements drop off sharply. Intel simply happens to be a few miles ahead of AMD at decoupling its CPUs from memory latency and bandwidth under most circumstances.
GPU performance, on the other hand, is almost entirely dictated by bandwidth, since nearly every computational challenge GPUs face can be made easier and faster with more and faster memory: memory to cache results, and to duplicate frequently accessed data across memory channels so that more accesses can proceed concurrently.