IDF Spring 2006: Will Intel's Core Architecture Close the Technology Gap?

Core To The Rescue

We already mentioned the key goals that Intel set for the development of its next-generation micro architecture: a high number of instructions per clock cycle and record-setting energy efficiency (measured in energy per instruction). Three processor designs are derived from the same dual-core architecture: Conroe for the desktop, Merom for mobile systems and Woodcrest for servers. All three will be produced on a 65-nm process. While the three are technically almost identical, certain features and characteristics will be enabled for certain segments only. High clock speeds are something we will only see in the high-end desktop and maybe the server space. For all other applications, efficiency that does not depend on clock speed was the primary goal. This will be achieved by increasing pipeline throughput and bandwidth.
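To make the energy-per-instruction metric concrete, here is a minimal sketch of the arithmetic. The function name and all input figures are hypothetical and chosen only to keep the numbers easy to follow; they are not measurements of any Intel processor.

```python
def energy_per_instruction(power_watts, ipc, clock_hz):
    """Return the average energy per instruction in joules.

    Power is energy per second, and ipc * clock_hz is instructions per
    second, so dividing one by the other leaves energy per instruction.
    """
    instructions_per_second = ipc * clock_hz
    return power_watts / instructions_per_second


# Hypothetical figures for illustration only.
fast_clock = energy_per_instruction(power_watts=100.0, ipc=1.0, clock_hz=4.0e9)
wide_core = energy_per_instruction(power_watts=60.0, ipc=2.0, clock_hz=2.5e9)

print(f"High clock, low IPC: {fast_clock * 1e9:.1f} nJ per instruction")
print(f"Lower clock, high IPC: {wide_core * 1e9:.1f} nJ per instruction")
```

In this made-up comparison, the wider design retires more instructions per second at lower power, so its energy per instruction is less than half that of the high-clock design.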

The new micro architecture is now called Core Micro Architecture and is characterized by five key features: Wide Dynamic Execution, Advanced Digital Media Boost, Advanced Smart Cache, Smart Memory Access and Intelligent Power Capability.

Core Micro Architecture is an out-of-order design in which individual instructions are scheduled and staggered in a 14-stage pipeline. To increase instruction efficiency, Intel focused on making instruction execution more flexible. While that sounds easy, it conflicts with the requirement of IA (Intel Architecture) machines to maintain a clean memory ordering so that program semantics are preserved. A simple example: a store must complete before a later load from the same address, because the load has to return the current (latest) data.

Executing more instructions at the same time is also achieved with three ALUs (arithmetic logic units), which can process 128-bit wide SSE instructions in a single cycle. In addition, L2 cache improvements from the shared design, along with new prefetchers that work on the basis of memory disambiguation (fetching data that is not going to be modified by other queued instructions), help to feed the pipeline more efficiently.
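The idea behind memory disambiguation can be sketched with a toy model. This is not how Intel's hardware works: a real processor does not know every address up front and has to predict whether a load conflicts with an earlier store. The function name and the instruction format below are assumptions made purely for illustration.

```python
def loads_safe_to_hoist(ops):
    """Return the indexes of loads with no earlier store to the same address.

    ops is a list of ("store", address) or ("load", address) tuples in
    program order. A load that shares an address with an earlier store
    must wait for it; any other load could run ahead of the stores.
    """
    stored_addresses = set()
    safe = []
    for index, (kind, address) in enumerate(ops):
        if kind == "store":
            stored_addresses.add(address)
        elif kind == "load" and address not in stored_addresses:
            safe.append(index)
    return safe


program = [
    ("store", 0x100),
    ("load", 0x200),  # different address: may run ahead of the store
    ("load", 0x100),  # same address: must see the stored value first
]
print(loads_safe_to_hoist(program))  # prints [1]
```

Only the load at index 1 is independent of the queued store, so only that load could be moved forward without changing what the program observes.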

Critics might want to compare the Core architecture to the Pentium III now. However, Intel obviously built something completely new, because Core features inline decoding, which wasn't the case with the P3. Also, there are three ALUs, while the P3 had only one (NetBurst had two). Lastly, the trace cache introduced with NetBurst has been eliminated.

Intel built a radically new processor design from scratch, but took lots of ingredients from what it has learned with the Pentium M (Banias, Dothan), all for the sake of improving instruction-level performance while keeping thermals low. "We had to go back and carefully design a balanced machine," Intel mobility Vice President Mooly Eden said.

Merom will be available for Socket 479. Basically, current Napa systems can run Merom processors with nothing more than a BIOS update. Intel prefers to call this a 'Napa Refresh'.

Conroe will be deployed for Socket 775. It will require either the 975X chipset (for gaming) or the upcoming 965 chipset (for the digital home and office). Again, a BIOS update should be sufficient unless you want to use the upcoming Extreme Edition, which is expected to offer an FSB1333 system speed (a quad-pumped 333 MHz bus clock).
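FSB ratings count data transfers per second rather than clock cycles, so the arithmetic is worth spelling out. The sketch below assumes a quad-pumped bus (four transfers per clock) and a 64-bit data path, as on Intel's desktop front side bus of this era; neither figure comes from the IDF presentation, and the function names are hypothetical.

```python
def fsb_base_clock_mhz(transfer_rate_mt_s, transfers_per_clock=4):
    """Return the bus clock in MHz for a given transfer rate in MT/s.

    transfers_per_clock=4 is an assumption (quad-pumped bus).
    """
    return transfer_rate_mt_s / transfers_per_clock


def fsb_peak_bandwidth_gb_s(transfer_rate_mt_s, bus_width_bytes=8):
    """Return the theoretical peak bandwidth in GB/s (1 GB = 1000 MB).

    bus_width_bytes=8 is an assumption (64-bit data path).
    """
    return transfer_rate_mt_s * bus_width_bytes / 1000


for rating in (1066, 1333):
    clock = fsb_base_clock_mhz(rating)
    bandwidth = fsb_peak_bandwidth_gb_s(rating)
    print(f"FSB{rating}: {clock:.0f} MHz bus clock, {bandwidth:.1f} GB/s peak")
```

Under these assumptions, FSB1333 works out to a bus clock of roughly 333 MHz and a theoretical peak of about 10.7 GB/s.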

Woodcrest will be known as Xeon and will run on Bensley platforms, which are going to power Dempsey processors at up to 3.73 GHz starting next month.