So, summarizing everything you've posted, the RAM is the problem...
Not really.
AMD has IP for stacked on-die "T-RAM", but the bottom line is that the graphics function is secondary to the "SIMD Engine" capability being developed for the stream processors.
Kaveri doubled the number of 256-bit 'Fusion Control Links' between the CPU cores and RAM. As part of HSA, AMD has developed 'Unified Memory Addressing', equally accessible by the CPU cores and the SIMD Engine 'CUs' (there are 64 Radeon stream processors in each CU). This actually ties in well with the serial nature of DDR4 and the further advancement of the AMD 'IOMMU' (look it up :lol:).
To make Unified Memory Addressing function at a high level, they added a 256-bit 'Radeon Control Link' whereby the SIMD CUs snoop the L2 cache of the Steamroller cores for 'coherency'.
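For a rough sense of scale, here's a back-of-the-envelope sketch in Python. The 64 stream processors per CU and the 256-bit link width are the figures quoted above; the function names, the link clock and the "one transfer per clock" assumption are mine, purely for illustration, and are not AMD specs.

```python
# Back-of-the-envelope numbers only; not AMD specifications.
STREAM_PROCESSORS_PER_CU = 64   # figure quoted in the post
LINK_WIDTH_BITS = 256           # width of one control link, per the post


def stream_processors(cu_count):
    """Total stream processors for a given number of CUs."""
    return cu_count * STREAM_PROCESSORS_PER_CU


def link_bytes_per_transfer(width_bits=LINK_WIDTH_BITS):
    """Bytes moved across one link in a single transfer."""
    return width_bits // 8


def link_bandwidth_gb_s(clock_mhz, links=1, width_bits=LINK_WIDTH_BITS):
    """Peak bandwidth in GB/s, ASSUMING one transfer per clock per link.

    clock_mhz is a hypothetical link clock; the post does not give one.
    """
    bytes_per_second = links * link_bytes_per_transfer(width_bits) * clock_mhz * 1e6
    return bytes_per_second / 1e9
```

So each 256-bit link moves 32 bytes per transfer, and adding a second link simply doubles whatever peak figure the first one gives at the same clock.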
It's safe to assume that as more CUs are added to the die, a second 256-bit Radeon Control Link will be added, and the question becomes ... whether it will be used for CPU core cache coherency or not. The CUs could have their own on-die T-RAM to accelerate SIMD instructions without having to bother with the IMC or IOMMU ... OR ....
DDR4 does away with the concept of memory 'channels.' The CUs conceivably could have as much address space "as required" by the task at hand --- 512MB .... 1GB .... 2GB ...