My number one question, leaving aside whether Z-RAM Gen2 can or can't be fast enough for today's CPUs, is: will Z-RAM Gen2 be able to offset Penryn's power consumption if AMD uses it for L2 and L3 cache (or possibly for L3 only)?
The reason I ask this without much worry about Z-RAM's speed is that in the server space, power consumption and core count may give better value than raw CPU clock speed in Hz.
Z-RAM Gen2 stores significantly more charge in the memory bitcell. The additional charge provides an order-of-magnitude improvement in both cell margin (the difference between a "1" and a "0") and in data retention time. This higher margin also provides much faster data read and write times, yet reduces power consumption significantly. As a result, Z-RAM Gen2 substantially broadens the range of applications able to take advantage of Z-RAM's density: these include high-performance applications requiring greater than 1GHz operation (when pipelined), and low-power applications requiring long battery life.
Z-RAM was already the densest memory technology in the world; Z-RAM Gen2, however, is now more than twice as fast and cuts memory read power by 75% and memory write power by an impressive 90%. It also exhibits extreme flexibility, since the technology can be 'tuned' for a very wide range of speed/power operating points, from ultra-low power to very high performance. Z-RAM Gen2 also exhibits an ultra-high density greater than 5 Mbits per mm^2 at 65nm, and greater than 10 Mbits per mm^2 at 45nm. This is effectively double the density of an eDRAM and up to six times the density of an SRAM. Other salient features include random array access greater than 400MHz, and very low active power consumption of under 10µW/MHz.
Don't know if this was posted already... but it looks like AMD is actually testing Z-RAM Gen2 on its chips now.
Sorta old news --- but Gen2 is still being talked about in the 400 MHz range....
Frankly, knowing AMD's endeavors in embedded and consumer electronics, I can see them leveraging SOI/Z-RAM for some very application-specific products right off the bat --- but for L3 cache, I think this is still too slow....
But I could be wrong.
Well, what it comes down to is: is 15 MB of 400 MHz L3 Z-RAM cache faster than having to hop to main RAM after you pass the 3 MB limit on the K10?
However, there is the latency piece --- even though it would run at the speed of the core, signal propagation, setting the bit, etc. would take more than one tick or tock... this is latency. The major portion of latency is the signal propagation to the bit cell --- this is why the general rule of thumb is: the larger the cache, the larger the latency. The reason? The fastest you can accurately ensure you get the data from the cache depends on the time it takes to get the last bit of data physically located farthest from the core. So a 2 MB cache may only take up 30% of the die area, while a 4 MB cache may take up 50% of the die area. The larger 4 MB cache will have transistors and bit cells physically farther from the core --- hence the delay getting the signal from the bit to the core will be longer: higher latency.
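A rough way to see that rule of thumb (a toy model, not real circuit numbers: assume worst-case wire delay scales with the linear dimension of the array, i.e. with the square root of its area, and area scales with capacity):

```python
import math

def relative_wire_delay(cache_mb, base_mb=2.0):
    """Toy model: the farthest bit cell sits roughly one side length away,
    and side length grows with sqrt(area) ~ sqrt(capacity)."""
    return math.sqrt(cache_mb / base_mb)

# Doubling a 2 MB cache to 4 MB stretches the farthest bit cell's
# wire (and its propagation delay) by about sqrt(2), i.e. ~1.41x.
print(relative_wire_delay(4.0))
```

So latency grows slower than capacity, but it does grow --- which is exactly why big L3s carry more latency than small L2s.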
There is more to it though.
Latency has an impact on performance only for sparse memory accesses.
For access to adjacent memory locations, SRAM (and even DRAM) supports burst modes where you can get a pipelined access which yields a throughput of 1 data element per clock, after locating the first element of the burst.
Since cache access is always performed per block (or cache line), the burst mode can be applied all the time and it hides most of the negative effects of (high) latency.
For example, let's say that you want to transfer a block of 128 bytes from the L3 to the L2 cache, on a 128-bit bus (16 bytes), and that the L3 cache has a latency of 20 clocks.
So with SRAM and a pipeline burst mode, you get the first 16 bytes after 20 clocks, but you get the rest in chunks of 16 bytes per clock.
Overall, you'd transfer your whole cache block in (1x20 + 7x1) = 27 clocks.
If your cache is running at, say, 3GHz, that's (27 / 3GHz) = 9ns.
Now let's take our Z-RAM running at 400MHz, and let's suppose that the latency is only 1 clock... still, to transfer our cache block, we'd need 8 clocks, but with a clock period of 2.5ns, that would require a whopping (8x2.5= ) 20ns to transfer, more than double what we get with high-frequency SRAM.
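The arithmetic above fits in a few lines (a back-of-the-envelope sketch using the numbers assumed in this thread --- 128-byte block, 16-byte bus, and the stated latencies/clocks):

```python
def transfer_time_ns(block_bytes, bus_bytes, latency_clocks, clock_ghz):
    """Time to move one cache block with a pipelined burst:
    pay the full latency for the first beat, then one beat per clock."""
    beats = block_bytes // bus_bytes           # 128 / 16 = 8 beats
    clocks = latency_clocks + (beats - 1)      # remaining beats stream out
    return clocks / clock_ghz                  # clock in GHz -> result in ns

# SRAM L3: 20-clock latency at 3 GHz -> (20 + 7) / 3 = 9 ns
print(transfer_time_ns(128, 16, 20, 3.0))
# Z-RAM at 400 MHz (0.4 GHz), 1-clock latency -> (1 + 7) / 0.4 = 20 ns
print(transfer_time_ns(128, 16, 1, 0.4))
```

Even with a near-zero latency, the slow clock dominates once you have to stream a whole cache line.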
Theoretically, you could just use a wider cache interface, or organize it in multiple banks to do some kind of interleaving, so yes you could get a higher throughput even from Z-Ram.
But since I'm not a "semiconductor guru", I have no idea about the feasibility of such a design.
I know Z-RAM is much denser and more energy-efficient than conventional cache (if my memory isn't tricking me by a large amount, I think it's something like 32 MB in about 50 mm^2); don't know if you can do that with eDRAM.
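That half-remembered figure is at least consistent with the quoted Gen2 density (assuming the 5 Mbit/mm^2 at 65nm number from the press text above):

```python
# Quoted Gen2 density: >5 Mbit per mm^2 at 65 nm (press-release figure)
density_mbit_per_mm2 = 5
area_mm2 = 50

capacity_mb = density_mbit_per_mm2 * area_mm2 / 8  # Mbit -> MB
print(capacity_mb)  # 31.25 MB, close to the "32 MB in ~50 mm^2" recalled above
```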