
Report: AMD Carrizo APUs To Get Stacked On-Die Memory

By Niels Broekhuijsen | Source: Bits 'n Chips | 22 comments

Is AMD's HSA arriving with force in the upcoming Carrizo APUs?

Perhaps it's an unusual source, but Bits 'n Chips from Italy has reported that AMD might be using stacked DRAM in its upcoming Carrizo APUs. In itself, this shouldn't be particularly surprising: AMD is very actively pushing the Heterogeneous System Architecture.

The concept of the Heterogeneous System Architecture is that the CPU and GPU cores all have equal access to system memory: rather than each being assigned its own dedicated region, both can address any part of memory at any time and work on the same data together. This promises much higher performance, with the GPU handling highly parallelized tasks while the CPU takes care of serial tasks. Such an architecture will really shine when some of that memory is on-die, delivering significantly lower latencies.
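
To make the shared-memory idea concrete, here is a minimal sketch of what zero-copy CPU/GPU sharing looks like from software, using OpenCL 2.0 fine-grained shared virtual memory, one of the programming models associated with HSA-class hardware. This is an illustrative example, not AMD's own code: it assumes an OpenCL 2.0 capable driver and a device that supports fine-grained buffer SVM, the kernel and buffer size are made up for the demo, and error handling is omitted for brevity.

```c
/* Illustrative sketch: CPU and GPU touching the same allocation with no
 * explicit copies, via OpenCL 2.0 fine-grained SVM. Assumes an OpenCL 2.0
 * driver and an SVM-capable device; error checking omitted for brevity. */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void scale(__global float *data) {\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] *= 2.0f;  /* GPU works directly on the shared buffer */\n"
    "}\n";

int main(void)
{
    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueueWithProperties(ctx, dev, NULL, NULL);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);

    /* One allocation, visible to CPU and GPU at the same virtual address. */
    size_t n = 1024;
    float *data = (float *)clSVMAlloc(ctx,
        CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER,
        n * sizeof(float), 0);

    for (size_t i = 0; i < n; i++)       /* CPU initializes the data in place */
        data[i] = (float)i;

    clSetKernelArgSVMPointer(k, 0, data);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
    clFinish(q);

    printf("data[10] = %f\n", data[10]); /* CPU reads the GPU's result directly */

    clSVMFree(ctx, data);
    return 0;
}
```

The point is simply that there is no read/write buffer round trip: the GPU dispatch and the CPU loop operate on the same pointer, which is exactly the kind of access pattern that fast on-die memory would accelerate.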

The Italian source indicates that stacked on-die memory should deliver performance more cost-effectively than a large on-die L3 cache, no doubt because it makes more efficient use of the available hardware. The bulk of system memory would still reside in DDR3, mainly because the on-die memory would only add up to somewhere around 128 MB or 256 MB. The report also notes that this approach lets AMD stick with DDR3 for system memory rather than the much more expensive DDR4.

According to the report, the Carrizo APUs will be fabricated on a 28 nm lithographic process, while the stacked DRAM will be fabricated on a 20 nm process.

Other sources also indicate that some of the Carrizo APUs will have the FCH (Fusion Controller Hub) on-die. We wonder how long it will take before all the required parts are on-die, essentially creating true SoCs (systems-on-a-chip).

Follow Niels Broekhuijsen @NBroekhuijsen. Follow us @tomshardware, on Facebook and on Google+.

Comments
  • 6 Hide
    SteelCity1981 , July 15, 2014 7:39 AM
    Not surprising Carrizo APUs will stick with DDR3. AMD always waits a year or two for new gen DRAM prices to go down before using it.
  • 4 Hide
    InvalidError , July 15, 2014 7:42 AM
    Since many of Intel's roadmaps seemed to indicate Intel was planning to make their 128MB Crystalwell L4$ eDRAM standard across most of their lineup next year, I would have been more surprised if AMD did not announce something similar to avoid falling even further behind.
  • 1 Hide
    ykki , July 15, 2014 7:52 AM
    Your move, Intel.
  • 5 Hide
    Menigmand , July 15, 2014 8:07 AM
    They should call it AMD Chorrizo..
  • 5 Hide
    CaptainTom , July 15, 2014 8:10 AM
    Please god let it be so!
  • 12 Hide
    PEJUman , July 15, 2014 8:13 AM
    Most CPU benches for Intel do not seem to scale with memory bandwidth (at least when compared to AMD APUs). I think AMD processors would benefit a lot more from on-package DRAM (a la Crystalwell); who knows, maybe this will allow them to finally catch up to Intel again. We really need AMD.

    Intel has a tendency to coast when allowed. It was Athlon that drove them to the Core microarchitecture and away from NetBurst. Now we have the Sandy-Ivy-Haswell coast again...
  • 0 Hide
    InvalidError , July 15, 2014 9:02 AM
    Quote:
    Most CPU benches for intel does not seem to scale with memory bandwidth

    But their IGP does.

    Most desktop applications require a balance between bandwidth, latency and processing power. Once you pass the bandwidth and latency requirements of typical workloads on a given architecture, the benefits drop off sharply. Intel simply happens to be a few miles ahead of AMD at decoupling their CPUs from memory latency and bandwidth under most circumstances.

    GPUs, on the other hand, are almost entirely dictated by bandwidth, since almost every computational challenge GPUs face can be made easier and faster with more, faster memory in which to cache results and duplicate frequently accessed items across memory channels to accommodate more concurrent accesses.
  • 2 Hide
    danwat1234 , July 15, 2014 10:17 AM
    AMD needs to make some ~47W TDP mobile APUs, not just up to 35W.
  • 2 Hide
    hannibal , July 15, 2014 11:54 AM
    Yep... Expensive memory in a budget-class computer is not a smart idea ;-)
  • -2 Hide
    knowom , July 15, 2014 11:54 AM
    AMD still needs to get its ducks in a row in terms of power efficiency, because it's miles behind Intel on a clock-for-clock basis. With the right hardware and know-how, Intel CPUs simply run at much better voltages for their clock-for-clock performance output.
  • 0 Hide
    Drejeck , July 15, 2014 12:06 PM
    APUs are really interesting products, and they are really evolving into something completely unseen. The Xbox One with its ESRAM, and the Xbox 360 with its eDRAM, gave excellent results in performance/efficiency. Plus, they never managed to build a fast L3 cache, and the obvious workaround is a larger cache to avoid cache misses, plus more bandwidth, even if the customer is not going to buy that expensive 2400 CL9 RAM kit.
  • 3 Hide
    AMD Radeon , July 15, 2014 12:07 PM
    Good news from AMD :)

    The AMD + Radeon combo should result in a really good product. We are waiting for a huge leap over the previous APU lineup.

    Keep fighting back and bring good competition. It will benefit all of us customers.
  • -3 Hide
    razzb3d , July 15, 2014 4:10 PM
    I have a bad feeling about this. Sticking 64-256 MB of on-die memory in there, at say half the CPU speed, is pointless. It's too little for the GPU to use as a framebuffer and too slow for the CPU to use as L3 cache.

    A good idea would be to slap 1 GB to 2 GB of GDDR5 in there over a 192- to 256-bit bus, so the GPU can use it as a framebuffer (think an A10-8800K with a built-in R9 270) and the CPU part can use it as L3 cache. Another cool thing is that you should be able to start your PC with all DIMM slots empty, since the RAM is on-die.

    Why a 192- to 256-bit bus? Because it needs to be as fast as possible. If it's over a 128-bit bus like system memory, the CPU will not be able to use it effectively as L3 cache. Why GDDR5? Because the L3 cache should be as fast as the CPU, so it will not slow it down. Think FX-8350: when you OC the northbridge to CPU speed (and with it the L3 cache), you notice a significant performance improvement in demanding tasks (especially FPU-related ones).
  • 0 Hide
    IInuyasha74 , July 15, 2014 4:16 PM
    I find it funny that CPUs have sort of come full circle. CPUs around the time of the original Pentium had cache on the motherboard in addition to the RAM, and then the Slot 1 and Slot A parts had what was basically RAM connected to the CPU at half speed on the same board.

    They moved away from that in favor of smaller, faster cache on the CPU itself, and now they are moving back to having RAM, probably running at half speed again, connected to the CPU package but not on the CPU die itself to increase performance. Interesting how history is repeating itself.
  • 3 Hide
    InvalidError , July 15, 2014 6:37 PM
    Quote:
    I have a bad feeling about this. Sticking 64-256MB of on die memory, at say half CPU speed is pointless. It's too little for the GPU to use as a framebuffer and too slow for the CPU to use as L3 cache.

    128MB may not sound like much, but it is large enough to cache most of the frequently used textures for the IGP if you tune settings accordingly. Crystalwell has about half the latency of system RAM and 4X as much bandwidth (50GB/s read + 50GB/s write, which is on par with 6GT/s 128-bit GDDR5 without GDDR's extra latency). That should also be fairly useful for multitasking: regularly accessed data that gets evicted from the CPU caches does not have to be fetched from system RAM again every time a context switch flushes things out. (See the quick bandwidth arithmetic after the comment thread.)

    The frame buffer uses relatively little bandwidth compared to other more important things like the Z-buffer so it would be one of the last things that gets dumped in there when there is spare space.

    Today's high-end GPUs may have a seemingly impressive 6GB RAM but what is most of that RAM used for? Resource duplication across multiple channels for read bandwidth multiplication.
  • 2 Hide
    alextheblue , July 15, 2014 7:11 PM
    Quote:
    A good ideea would be to slap 1GB to 2GB of GDDR5 in there over a 192-256 bit bus, so the GPU can use that as a framebuffer (think A10-8800K with built in R9 270) and the CPU part can use that for L3 cache.
    You want the CPU to use GDDR5 as L3 cache? No. Just no. Caches need to have very low latency. GDDR5 has horrendously HIGH latency. Worse than regular ol' DDR3.

    What they're doing with stacked DRAM is basically the best of both worlds. It's faster than falling back to main memory (bandwidth AND latency), so it can act as an L3 cache if necessary, but it will mainly benefit HSA and graphics. Invalid's post covers the graphics side of the equation pretty well.
  • 0 Hide
    WINTERLORD , July 15, 2014 9:02 PM
    I wonder what they mean by much higher performance. Is Moore's Law coming into play, or will it be the same 15-30% increase every year like usual?
    Update: I meant Moore's Law.*
  • 0 Hide
    oj88 , July 15, 2014 11:09 PM
    Quote:
    AMD needs to make some ~47W TDP mobile APUs, not just up to 35W.


    The just-released 35W FX-7600P Kaveri is pretty good for a gaming laptop. Does anyone know which laptop manufacturer is making a 17.3" model with it?
  • 0 Hide
    falchard , July 16, 2014 12:24 AM
    Nice to see AMD utilizing tech they developed for Microsoft. 128MB is not much, but it's big enough to hold a 4K-resolution buffer. Obviously, usage depends on the game. If a game is all about the visuals and needs 8 GB of GDDR5 with 2048 stream processors at 1 GHz, that's a different beast to tackle. But if it's a strategy game with a lot of AI calls, or one that uses the memory to process physics, it could be quite something.
  • 0 Hide
    memadmax , July 16, 2014 2:47 AM
    Back in '95 I had the idea to stack CPUs for multi-core processing....

    I should've patented it...

    ^.^
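
As a footnote to the bandwidth figures InvalidError quotes above (50 GB/s read + 50 GB/s write, described as on par with 6 GT/s 128-bit GDDR5), here is a quick peak-bandwidth sanity check. This is back-of-the-envelope arithmetic only: dual-channel DDR3-1600 is assumed as the system-RAM baseline for comparison and is not a figure from the report.

```c
/* Back-of-the-envelope peak-bandwidth check of the figures quoted in the
 * comments. Peak bandwidth = transfer rate (GT/s) * bus width (bits) / 8.
 * Dual-channel DDR3-1600 is an assumed baseline, not an official spec. */
#include <stdio.h>

static double peak_gb_per_s(double gtransfers_per_s, int bus_width_bits)
{
    return gtransfers_per_s * bus_width_bits / 8.0;
}

int main(void)
{
    /* 6 GT/s GDDR5 on a 128-bit bus, as mentioned in the comment */
    printf("128-bit 6 GT/s GDDR5  : %5.1f GB/s\n", peak_gb_per_s(6.0, 128));

    /* Dual-channel DDR3-1600: 1.6 GT/s on an effective 128-bit bus */
    printf("Dual-channel DDR3-1600: %5.1f GB/s\n", peak_gb_per_s(1.6, 128));

    /* Crystalwell's quoted 50 GB/s read + 50 GB/s write, for comparison */
    printf("Crystalwell (quoted)  : %5.1f GB/s\n", 50.0 + 50.0);
    return 0;
}
```

That works out to roughly 96 GB/s for the GDDR5 configuration and about 25.6 GB/s for the DDR3-1600 baseline, so the quoted 100 GB/s combined figure being described as roughly 4X system RAM bandwidth checks out.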