
We already mentioned the key goals that Intel set for the development of its next-generation Micro Architecture: a high number of instructions per clock cycle and record-setting energy efficiency (measured in energy per instruction). Three processor designs are derived from the same dual-core architecture: Conroe for the desktop, Merom for mobile systems and Woodcrest for servers. All three will be produced with 65-nm process technology. While the three are technically almost identical, certain features and characteristics will be enabled for certain segments only. High clock speeds are something we will only see in the high-end desktop and maybe the server space. For all other applications, efficiency independent of clock speed was the primary goal. This will be achieved by increasing pipeline throughput and bandwidth.
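To make the energy-per-instruction metric concrete, here is a minimal Python sketch. It assumes the simple relation that energy per instruction equals power draw divided by instructions retired per second (IPC times clock speed). The function name and all numbers are hypothetical and chosen for illustration only; they are not Intel figures.

```python
def energy_per_instruction(power_watts, ipc, clock_hz):
    """Return joules per instruction, assuming EPI = power / (IPC * clock)."""
    instructions_per_second = ipc * clock_hz
    return power_watts / instructions_per_second


# Hypothetical chip A: high clock speed, low IPC, high power draw.
epi_a = energy_per_instruction(power_watts=100.0, ipc=1.0, clock_hz=3.0e9)

# Hypothetical chip B: lower clock speed, higher IPC, lower power draw.
epi_b = energy_per_instruction(power_watts=65.0, ipc=1.5, clock_hz=2.0e9)

# Both chips retire the same number of instructions per second here,
# but chip B spends less energy on each one.
print(f"A: {epi_a * 1e9:.1f} nJ per instruction")
print(f"B: {epi_b * 1e9:.1f} nJ per instruction")
```

In this made-up comparison both chips deliver the same throughput, which is why a design can improve efficiency without chasing clock speed.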
The new micro architecture is now called Core Micro Architecture and is characterized by five key features: Wide Dynamic Execution, Advanced Digital Media Boost, Advanced Smart Cache, Smart Memory Access and Intelligent Power Capability.
Core Micro Architecture is an out-of-order design in which individual instructions are scheduled and staggered across a 14-stage pipeline. To increase instruction efficiency, Intel focused on making instruction execution more flexible. While that sounds easy, it conflicts with the requirement of IA machines to maintain clean memory ordering so that program semantics are preserved. A simple example: a store operation needs to complete before a later load from the same location, because the load should see the current (latest) data.
Executing more instructions at the same time was also achieved within the three ALUs (Arithmetic Logic Units), which can process 128-bit-wide SSE instructions in a single cycle. In addition, L2 cache improvements, thanks to the shared design, and new prefetchers that work on the basis of memory disambiguation (prefetching data that is not going to be modified by other queued instructions) help to feed the pipeline more efficiently.
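The store-before-load rule and the idea behind memory disambiguation can be sketched in a few lines of Python. This is a deliberately conservative toy model, not a description of Intel's hardware: the function name, the queue layout and the rule that an unknown store address always blocks the load are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# A pending store is (address, value). An address of None means the
# store's target address has not been computed yet (assumed layout).
PendingStore = Tuple[Optional[int], int]


def can_hoist_load(load_addr: int, older_stores: List[PendingStore]) -> bool:
    """Return True if a load may run ahead of all older, unfinished stores.

    Conservative rule assumed here: the load has to wait if any older
    store writes to the same address or has an unknown address.
    """
    for store_addr, _value in older_stores:
        if store_addr is None or store_addr == load_addr:
            return False
    return True


# The load reads 0x100 and the only older store writes 0x200: no conflict.
print(can_hoist_load(0x100, [(0x200, 7)]))

# An older store writes the address being loaded: the load must wait
# so that it sees the latest data.
print(can_hoist_load(0x100, [(0x100, 7)]))
```

A design that only moves a load forward when this check passes keeps program semantics intact while still letting independent loads feed the pipeline earlier.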
Critics might want to compare the Core architecture to the Pentium III now. However, Intel obviously built something completely new, because Core features inline decoding, which wasn't the case with the P3. Also, there are three ALUs, while the P3 had only one (NetBurst had two). Lastly, the trace cache has been eliminated.




