32-Core Processors: Intel Reaches For (The) Sun

Scaling Bottlenecks

While multiple processor cores on a single die communicate with each other directly, building multi-core processors by combining distinct processor dies forces them to communicate via the processor interface, which, for Intel's mainstream desktop and server parts, is the Front Side Bus. This has been criticized as a huge bottleneck for multi-core configurations ever since Intel released its first dual core Pentium D 800, aka Smithfield. Whenever one core accesses data located in the other die's L1 or L2 cache, the request must travel across the Front Side Bus, which eats into the available bus bandwidth.

For this very reason, the Core 2 generation implements a large, unified L2 cache that is shared by both cores. However, as soon as you pack two dual core dies onto one physical processor to build a quad core, the FSB bottleneck is back again - and it is probably even worse, as there are more cores fighting over more data in larger L2 caches. Intel's countermeasure is a bus clock speed upgrade: the server platform already runs a 333 MHz bus clock (FSB1333, quad-pumped), and the desktop platform will probably receive the same upgrade by the time the first quad core product hits the market.

The second bottleneck is the system's main memory. Its controller is not part of the processor, but resides in the chipset's northbridge on the motherboard. Again the Front Side Bus is used to connect the processor(s) to the motherboard core logic, which forces two or more cores to fight over memory access. AMD integrated the memory controller into its processors as early as 2003, which shortens the memory access path and improves performance, since the controller runs at full CPU core clock speed. The real advantage of on-die memory controllers becomes obvious in multiprocessor environments, where each CPU can access its own local memory at maximum bandwidth.

There is the issue of memory coherency, but the Opteron, for example, is smart enough to handle it in configurations of up to four processors. We believe there are two reasons why Intel has not integrated the memory controller. First, nobody embraces change before the business requires it. Second, there is a chipset business that Intel may want to defend: moving the memory controller into the processor would eliminate platform selling points such as compatibility, continuity and features that are exclusive to Intel platforms (think of I/OAT).

What If The Memory Controller Were Integrated?

At some point the memory controller simply has to be relocated into the processor, for the reasons described above. Adding bigger caches certainly helps, but if you have four or more processor cores working on your applications, you need to make sure they don't run out of data - who needs multi-lane freeways if there aren't sufficient entries and exits to access them?

In addition, 45 and 32 nm manufacturing processes will allow the RAM access logic to become part of the processor die at very little additional cost. So expect memory controllers to move into Intel processors in the future. I'm sure some of you feel compelled to point to AMD, which took this step as early as 2003. Well, I have to ask you to read on, as there is actually a concept behind that move - a concept that needs some more explanation.