Mountain View (CA) - When we talk about processor performance, the discussion typically revolves around the depth of the pipeline, the number of cores, the size and type of the cache, or the clock speed. However, we rarely hear about the way a processor actually communicates between these components, and such technologies usually do not make it into marketing brochures. But Intel has an idea that could change this: the company is toying with the thought of integrating DRAM into the CPU.
One of the most important goals when designing a new chip is to keep the available processing units as busy as possible. One way to achieve this is to feed enough data into the cores as quickly as possible through improved inter-core communication. The progress from one processor generation to another is obvious: For example, while the 65nm Kentsfield quad-core provided a bandwidth of about 8GB/sec. to 9GB/sec., the 45nm Harpertown chip offers 18GB/sec. to 20GB/sec.
At last week’s Research@Intel Day event, we spotted a technology that holds the potential to multiply the available bandwidth within a processor. In our opinion, it is actually the most impressive research we saw on that day. You may not have heard about this technology because Intel did not specifically promote it and did not even mention it in the "Demo Cheat-Sheets" given out to journalists and analysts.
A small research team inside Intel succeeded in reducing DRAM cells to only two transistors and completely removing the capacitors. Conceivably, these two achievements could change the way we use DRAM in the future. For example, expensive and complex SRAM (static RAM) cells could be entirely removed from a CPU and replaced with DRAM.
In contrast to Intel’s two-transistor ("2T") DRAM bit cell, SRAM usually requires six transistors per stored bit. Of course, there is also 1T-SRAM (which uses only one transistor per bit), but this type is very rare (and used, for example, in Nintendo game consoles such as the GameCube and Wii).
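The transistor savings are straightforward to quantify. Here is a back-of-the-envelope sketch comparing the bit-cell transistor budgets of a 6T SRAM cache and the same capacity built from 2T DRAM cells; the 6MB cache size is our own illustrative assumption, not a figure from Intel:

```python
# Back-of-the-envelope comparison of bit-cell transistor counts:
# a conventional 6T SRAM cell vs. Intel's research 2T DRAM cell.
# The 6 MB cache capacity is an illustrative assumption.

TRANSISTORS_PER_BIT_SRAM = 6   # conventional SRAM cell
TRANSISTORS_PER_BIT_DRAM = 2   # Intel's 2T DRAM cell

cache_bytes = 6 * 1024 * 1024  # assumed 6 MB cache
cache_bits = cache_bytes * 8

sram_total = cache_bits * TRANSISTORS_PER_BIT_SRAM
dram_total = cache_bits * TRANSISTORS_PER_BIT_DRAM

print(f"6T SRAM cells: {sram_total / 1e6:.0f}M transistors")
print(f"2T DRAM cells: {dram_total / 1e6:.0f}M transistors")
print(f"Savings factor: {sram_total / dram_total:.0f}x")
```

For the bit cells alone, that is a 3x reduction in transistor count - before even counting the area saved by dropping the capacitor.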
SRAM has some advantages over DRAM, including lower power consumption, higher speed and no need to be refreshed. However, SRAM is known to be much more expensive than DRAM and not as dense.
Intel said that it was able to fine-tune its DRAM design and hit a physical clock of 2GHz using a 65nm manufacturing process. The resulting 2T-DRAM offers a stunning bandwidth of 128GB/sec. If Intel is successful in taking the clock speed up to the 3.2GHz of its QX9770/QX9775 processors, the bandwidth would climb to 204.8GB/sec. In other words, Intel would gain more than a 10x improvement over its current L2 cache technology.
Intel scientists believe they will be able to use 45nm high-k technology to match and exceed that clock speed. As a next step, DRAM cells are planned to be stacked onto Intel’s Terascale processors. The Terascale processor itself may migrate to a massive number of x86 mini-cores - which, sooner or later, could point to the successor of architectures such as Larrabee and Itanium.
In case you are wondering: Yes, it looks like Intel will introduce a CPU/GPU hybrid.
Seeing 32nm wafers at Intel’s Research Day was nice, but at the end of the day, 32nm is just another manufacturing process. In our opinion, DRAM on the processor is what would make the greatest difference in performance.