Report: AMD Carrizo APUs To Get Stacked On-Die Memory
Tags:
- GPUs
- CPUs
- AMD
Is AMD's HSA arriving with force in the upcoming Carrizo APUs?
SteelCity1981
July 15, 2014 7:39:45 AM
ykki
July 15, 2014 7:52:49 AM
Menigmand
July 15, 2014 8:07:13 AM
PEJUman
July 15, 2014 8:13:39 AM
Most CPU benchmarks for Intel don't seem to scale with memory bandwidth (at least compared to AMD APUs). I think AMD processors would benefit a lot more from on-package DRAM (a la Crystalwell); who knows, maybe this will finally let them catch up to Intel again. We really need AMD.
Intel has a tendency to coast when allowed. It was Athlon that drove them to the Core microarchitecture and to abandon NetBurst. Now we have the Sandy-Ivy-Haswell coast again...
Score: 12
PEJUman said:
Most CPU benches for intel does not seem to scale with memory bandwidth

But their IGP does.
Most desktop applications require a balance between bandwidth, latency and processing power. Once you pass the typical bandwidth and latency requirements for typical workloads for a given architecture, benefits drop off sharply. Intel simply happens to be a few miles ahead of AMD at decoupling their CPUs from memory latency and bandwidth under most circumstances.
GPUs, on the other hand, are almost entirely dictated by bandwidth: nearly every computational challenge a GPU faces can be made easier and faster with more, faster memory to cache results and to duplicate frequently accessed items across memory channels for more concurrent accesses.
Score: 0
danwat1234
July 15, 2014 10:17:09 AM
hannibal
July 15, 2014 11:54:13 AM
knowom
July 15, 2014 11:54:42 AM
Drejeck
July 15, 2014 12:06:24 PM
APUs are really interesting products, and they are evolving into something completely unseen. The Xbox One experiment and the X360 with on-board SRAM gave excellent results in performance/efficiency. Plus, AMD never managed to build a fast L3 cache, and the obvious workaround is a larger cache to avoid cache misses, and more bandwidth, even if the customer isn't going to buy that expensive 2400 CL9 RAM kit.
Score: 0
razzb3d
July 15, 2014 4:10:16 PM
I have a bad feeling about this. Sticking 64-256MB of on-die memory at, say, half CPU speed is pointless. It's too little for the GPU to use as a framebuffer and too slow for the CPU to use as L3 cache.
A good idea would be to slap 1GB to 2GB of GDDR5 in there over a 192-256 bit bus, so the GPU can use it as a framebuffer (think A10-8800K with a built-in R9 270) and the CPU can use it as L3 cache. Another cool thing is that you should be able to start your PC with all DIMM slots empty, since the RAM is on-die.
Why a 192 to 256 bit bus? Because it needs to be as fast as possible. If it's over a 128-bit bus like system memory, the CPU won't be able to use it effectively as L3 cache. Why GDDR5? Because the L3 cache should be as fast as the CPU, so it doesn't slow it down. Think FX-8350: when you OC the northbridge to CPU speed (and with it the L3 cache), you notice a significant performance improvement in demanding tasks (especially FPU-related ones).
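A quick back-of-the-envelope check of the bus widths in this post, assuming a 6 GT/s GDDR5 speed grade (a common figure at the time; real throughput would be lower due to refresh and command overhead):

```python
def peak_bandwidth_gbps(bus_width_bits, transfer_rate_gtps=6.0):
    """Peak bandwidth in GB/s: (bus width in bytes) x (transfers per second)."""
    return (bus_width_bits / 8) * transfer_rate_gtps

# The three bus widths discussed, plus dual-channel DDR3-2133 for contrast.
for width in (128, 192, 256):
    print(f"{width}-bit GDDR5 bus: {peak_bandwidth_gbps(width):.0f} GB/s")
print(f"128-bit DDR3-2133:  {peak_bandwidth_gbps(128, 2.133):.0f} GB/s")
```

So a 256-bit GDDR5 interface would offer roughly 5-6x the peak bandwidth of a typical dual-channel DDR3 setup, which is the gap the post is getting at.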
Score: -3
I find it funny that CPUs have sort of come full circle. CPUs around the time of the original Pentium had cache on the motherboard in addition to the RAM, and then the Slot 1 and Slot A units had what was basically RAM connected to the CPU at half speed on the same board.
They moved away from that in favor of smaller, faster cache on the CPU itself, and now they are moving back to having RAM, probably running at half speed again, connected to the CPU package but not on the CPU die itself, to increase performance. Interesting how history repeats itself.
Score: 0
razzb3d said:
I have a bad feeling about this. Sticking 64-256MB of on die memory, at say half CPU speed is pointless. It's too little for the GPU to use as a framebuffer and too slow for the CPU to use as L3 cache.

128MB may not sound like much, but it is large enough to cache most of the most frequently used textures for the IGP if you tune settings accordingly. Crystalwell has about half the latency of system RAM and 4x as much bandwidth (50GB/s read + 50GB/s write, which is on par with 6GT/s 128-bit GDDR5, without GDDR's extra latency). That should also be fairly useful for multitasking, by sparing regularly accessed data that gets evicted from the CPU caches from having to be fetched from system RAM again every time a context switch flushes it out.
The frame buffer uses relatively little bandwidth compared to other more important things like the Z-buffer so it would be one of the last things that gets dumped in there when there is spare space.
Today's high-end GPUs may have a seemingly impressive 6GB RAM but what is most of that RAM used for? Resource duplication across multiple channels for read bandwidth multiplication.
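A sanity check on the Crystalwell comparison in this post (the figures are the post's, not official specs): 50 GB/s read plus 50 GB/s write, versus a 128-bit GDDR5 interface at 6 GT/s.

```python
crystalwell_total = 50 + 50            # GB/s, read + write paths combined
gddr5_128bit = (128 / 8) * 6           # 16 bytes per transfer x 6 GT/s, in GB/s
print(crystalwell_total, gddr5_128bit) # 100 vs 96 GB/s: roughly on par, as claimed
```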
Score: 3
alextheblue
July 15, 2014 7:11:55 PM
Quote:
A good idea would be to slap 1GB to 2GB of GDDR5 in there over a 192-256 bit bus, so the GPU can use that as a framebuffer (think A10-8800K with built in R9 270) and the CPU part can use that for L3 cache.

What they're doing with stacked DRAM is basically the best of both worlds. It's faster than falling back to main memory (bandwidth AND latency), so it can act as an L3 cache if necessary, but it will mainly benefit HSA and graphics. Invalid's post covers the graphics side of the equation pretty well.
Score: 2
WINTERLORD
July 15, 2014 9:02:10 PM
oj88
July 15, 2014 11:09:48 PM
falchard
July 16, 2014 12:24:08 AM
Nice to see AMD utilizing tech they developed for Microsoft. 128MB is not much, but it's big enough to hold a 4K-resolution buffer. Obviously usage depends on the game. If a game is all about the visuals and needs 8GB of GDDR5 with 2048 stream processors at 1GHz each, that's a different beast to tackle. But if it's a strategy game with a lot of AI calls, or the memory is used to process physics, it could be quite something.
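The "big enough to hold a 4K buffer" claim checks out with simple arithmetic, assuming a 32-bit (4-byte) color format; double buffering and a depth buffer are added here just for illustration:

```python
width, height, bytes_per_pixel = 3840, 2160, 4
framebuffer_mb = width * height * bytes_per_pixel / 2**20  # bytes -> MiB

print(f"single 4K color buffer: {framebuffer_mb:.1f} MB")      # ~31.6 MB
print(f"front + back + depth:   {3 * framebuffer_mb:.1f} MB")  # ~94.9 MB, still under 128 MB
```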
Score: 0
Drejeck
July 16, 2014 4:24:13 AM
Dunno why I can't quote Razzb3d. Anyway, GDDR5 latency is typically CAS 15, DDR3 typically CAS 9. In real-time terms the access latencies are quite similar, but the higher bandwidth favors larger chunks of data, which a CPU doesn't need. AMD has never built an L3 cache fast enough, but giving the caching algorithm more memory means fewer cache misses. I'd prefer static RAM for its higher performance. 32/64 MB would be sufficient with an efficient caching algorithm, but it costs more and probably has higher power consumption. GPUs are really latency tolerant. Bus width is not important for CPUs. Maybe you can't even imagine how fast an Intel L3 cache is, or an L1. Those caches are for repetitive tasks, not for storing data, frame buffering, or system memory (even if chips are getting closer to a SoC).
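The point that GDDR5 CL15 and DDR3 CL9 come out similar in wall-clock terms can be illustrated with assumed command clocks (1500 MHz for 6 GT/s GDDR5, 933 MHz for DDR3-1866; other speed grades shift the numbers):

```python
def cas_ns(cas_cycles, command_clock_mhz):
    """CAS latency in nanoseconds: cycles divided by command clock rate."""
    return cas_cycles / command_clock_mhz * 1000

print(f"GDDR5 6 GT/s, CL15: {cas_ns(15, 1500):.2f} ns")  # 10.00 ns
print(f"DDR3-1866,   CL9:   {cas_ns(9, 933):.2f} ns")    # ~9.65 ns
```

Despite the much larger cycle count, the absolute latencies land within a few percent of each other, because GDDR5's command clock runs proportionally faster.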
Score: 0
Quote:
APUs are really interesting products and they are really evolving into something completely unseen. The Xbox One experiment and the X360 with on board SRAM gave excellent results in performance/efficiency, plus, they never achieved to build fast L3 cache and the obvious workaround is a larger cache to avoid cache miss and larger bandwidth even if the client is not going to buy that expensive ram kit 2400 CL 9.

I wouldn't say unseen, considering that a shared CPU and GPU with common memory was the goal all along, with a very good idea of what the benefits are. Maybe unseen to the average person, but not to those who have been in the computer industry for 20+ years.
Also, sticking a fast buffer next to a processor isn't always the best solution to the problem of main memory being too slow. The small, fast buffer simply fills up, causing a bottleneck, which in gaming terms means massive stutter. Game developers aren't too happy with the Xbox One because of this; they have to spend expensive man-hours working around it, unlike on the PS4.
Score: 0