So, the front end hasn’t been profoundly overhauled; neither has the back end. It has exactly the same execution units as the most recent Core processors, but here again the engineers have worked on using them more efficiently.
With Nehalem, Hyper-Threading makes its great comeback. Introduced with the Northwood version of Intel’s NetBurst architecture, Hyper-Threading—also known outside the world of Intel as Simultaneous Multi-Threading (SMT)—is a means of exploiting thread parallelism to improve the use of a core’s execution units, making the core appear to be two cores at the application level.
In order to use parallel threads, certain resources—such as registers—must be duplicated. Other resources are shared by the two threads, and that includes all the out-of-order execution logic (the instruction reorder buffer, the execution units, and cache memory). A simple observation led to the introduction of SMT: the “wider” (meaning more execution units) and “deeper” (meaning more pipeline stages) processors become, the harder it is to extract enough parallelism to use all the execution units at each cycle. Where the Pentium 4 was very deep, with a pipeline having more than 20 stages, Nehalem is very wide. It has six execution units capable of executing three memory operations and three calculation operations. If the execution engine can’t find sufficient parallelism of instructions to take advantage of them all, “bubbles”—lost cycles—occur in the pipeline.
To remedy that situation, SMT looks for instruction parallelism in two threads instead of just one, with the goal of leaving as few units unused as possible. This approach can be extremely effective when the two threads are executing tasks that are highly separate. On the other hand, two threads involving intensive calculation, for example, will only increase the pressure on the same calculating units, putting them in competition with each other for access to the cache. It goes without saying that SMT is of no interest in this type of situation, and can even negatively impact performance.