Nehalem: An Overview
It’s difficult to give a single overview of an architecture like Nehalem, which is fundamentally modular. Intel’s engineers set out to design a set of building blocks that could be assembled, Lego-like, into the various versions of the architecture.
It is possible, though, to take a look at the flagship of the new architecture—the very high-end version that will be used in servers and high-performance workstations. At first glance, the specs will likely remind you of the Barcelona (K10) architecture from AMD. It is natively quad-core and has three levels of cache, a built-in memory controller, and a high-performance system of point-to-point interconnects for communicating with peripherals and with other CPUs in multiprocessor configurations. This suggests that it wasn’t AMD’s technological choices that were bad, but simply its implementation, which has not scaled well in its current form.
But Intel has done more than just revise its architecture by taking inspiration from its competitor’s innovations. With a budget of more than 700 million transistors (731 million, to be exact), the engineers were able to greatly improve certain characteristics of the execution core while adding new functionality. For example, simultaneous multi-threading (SMT), which had already appeared with the Pentium 4 "Northwood" under the name Hyper-Threading, has made its comeback. Combined with four physical cores, certain versions of Nehalem that incorporate two dies in a single package will be capable of executing up to 16 threads simultaneously. While this change appears simple at first glance, as we’ll see later on, it has a wide impact at several levels of the pipeline; many buffers need to be resized so that this mode of operation doesn’t hurt performance. As has been the case with each new architecture for several years now, Intel has also added new SSE instructions to Nehalem. The architecture supports SSE 4.2, components of which appear to be borrowed from AMD’s K10 micro-architecture.
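The 16-thread figure follows directly from the SMT arithmetic. A minimal sketch in Python (the core, SMT, and die counts are hard-coded from the figures in the text, not detected from real hardware):

```python
# Thread-count arithmetic for Nehalem packages (illustrative only;
# the constants come from the article, not from a hardware query).

CORES_PER_DIE = 4   # Nehalem is natively quad-core
SMT_WAYS = 2        # two hardware threads per core with SMT enabled

def logical_threads(dies: int) -> int:
    """Logical threads exposed by a package built from `dies` dies."""
    return dies * CORES_PER_DIE * SMT_WAYS

print(logical_threads(1))  # single-die package: 8 threads
print(logical_threads(2))  # dual-die package: 16 threads
```

Note that the operating system sees each SMT thread as a logical processor, which is why a dual-die part reports 16 CPUs even though it has only 8 physical cores.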
Now that you know the broad outlines of the new architecture, it’s time to take a more detailed look, starting with the front end of the pipeline—the part in charge of reading instructions from memory and preparing them for execution.