Branch Predictors
The last improvement to the front end concerns the branch predictors. The efficiency of branch prediction becomes crucial in architectures that aim for high levels of instruction-level parallelism. A branch breaks that parallelism because the processor has to wait for the result of a preceding instruction (the branch condition) before it knows which instructions to fetch and execute next. Branch prediction guesses whether or not a branch will be taken and, if it will, quickly supplies the target address so that execution can continue without waiting. In its simplest form, this requires only an array of branches, the Branch Target Buffer (BTB), which records the outcome of each branch as execution progresses (Taken or Not Taken, plus the target address), and an algorithm that uses this history to predict the outcome of the next occurrence of that branch.
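To make the BTB-plus-algorithm idea concrete, here is a minimal sketch in Python. It assumes a textbook scheme (a direct-mapped table indexed by the branch address, with a 2-bit saturating counter per entry); this is not Intel's undisclosed algorithm, and the class names, table size and addresses are all hypothetical. `predict` is consulted when the branch is fetched, and `update` records the real outcome once the branch resolves.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class BTBEntry:
    tag: int          # full branch address, to detect aliasing in the slot
    target: int       # most recently observed target address
    counter: int      # 2-bit saturating counter: 0-1 predict Not Taken, 2-3 predict Taken


class BranchTargetBuffer:
    """Direct-mapped BTB with a 2-bit counter per entry (textbook scheme, not Intel's)."""

    def __init__(self, num_entries: int = 512) -> None:
        self.num_entries = num_entries          # hypothetical size
        self.slots: Dict[int, BTBEntry] = {}

    def _index(self, pc: int) -> int:
        return pc % self.num_entries

    def predict(self, pc: int) -> Tuple[bool, Optional[int]]:
        """Return (predicted_taken, predicted_target) for the branch at pc."""
        entry = self.slots.get(self._index(pc))
        if entry is None or entry.tag != pc:
            return False, None                  # BTB miss: assume Not Taken
        return entry.counter >= 2, entry.target

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the resolved outcome once the branch actually executes."""
        idx = self._index(pc)
        entry = self.slots.get(idx)
        if entry is None or entry.tag != pc:
            # Allocate (or replace an aliasing entry), starting weakly biased
            # toward the outcome just observed.
            self.slots[idx] = BTBEntry(pc, target, 2 if taken else 1)
        elif taken:
            entry.counter = min(entry.counter + 1, 3)
            entry.target = target
        else:
            entry.counter = max(entry.counter - 1, 0)


# Usage: a loop back-edge that is taken 9 times and then falls through.
btb = BranchTargetBuffer()
loop_branch, loop_top = 0x4010, 0x4000        # hypothetical addresses
mispredicts = 0
for i in range(10):
    taken = i < 9
    predicted_taken, _ = btb.predict(loop_branch)
    if predicted_taken != taken:
        mispredicts += 1
    btb.update(loop_branch, taken, loop_top)
print(mispredicts)  # 2: the first (cold BTB) and the loop exit
```

The 2-bit counter is what keeps a single loop exit from flipping the prediction: after the exit the counter only drops from 3 to 2, so the next run of the loop is still predicted Taken.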
Intel hasn’t provided details on the algorithm used by its new predictors, but it is known that they now have two levels. The first level is unchanged from the Conroe architecture, but a second, slower level that can store more branch history has been added. According to Intel, this configuration improves branch prediction for applications with very large code footprints, such as databases. This is more evidence of Nehalem’s server orientation. Another improvement concerns the Return Stack Buffer (RSB), which stores the return addresses of functions when they are called. In certain cases this buffer can overflow, which leads to mispredicted returns. To limit that possibility, AMD increased the size of its buffer to 24 entries, whereas with Nehalem Intel has introduced a renaming system for this buffer.
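Why overflow hurts can be shown with a small model. The sketch below assumes a simplified circular return stack: each call pushes a return address, each return pops the newest one as the prediction, and a push into a full buffer overwrites the oldest entry. It compares a hypothetical 12-entry buffer with a 24-entry one (the size cited for AMD) on a hypothetical chain of 20 nested calls; the function names and addresses are illustrative only, and Intel's renaming scheme is not modeled because its details aren't public.

```python
from typing import List, Optional


class ReturnStackBuffer:
    """Fixed-size circular return stack (simplified model, not a specific CPU's design)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots: List[Optional[int]] = [None] * size
        self.top = 0  # next free slot; wraps around modulo size

    def on_call(self, return_addr: int) -> None:
        # When the buffer is full, this push silently overwrites the oldest
        # entry: that is the overflow described in the text.
        self.slots[self.top] = return_addr
        self.top = (self.top + 1) % self.size

    def predict_return(self) -> Optional[int]:
        # Pop the most recently pushed return address as the prediction.
        self.top = (self.top - 1) % self.size
        return self.slots[self.top]


def count_return_mispredicts(rsb_size: int, call_depth: int) -> int:
    """Make call_depth nested calls, then unwind them all, counting wrong predictions."""
    rsb = ReturnStackBuffer(rsb_size)
    return_addrs = [0x1000 + 4 * d for d in range(call_depth)]  # hypothetical addresses
    for addr in return_addrs:
        rsb.on_call(addr)
    mispredicts = 0
    for actual in reversed(return_addrs):   # returns happen in LIFO order
        if rsb.predict_return() != actual:
            mispredicts += 1
    return mispredicts


print(count_return_mispredicts(rsb_size=12, call_depth=20))  # 8
print(count_return_mispredicts(rsb_size=24, call_depth=20))  # 0
```

With 12 entries only the 12 most recent return addresses survive, so the last 8 returns of the unwind are predicted wrongly; with 24 entries the whole chain fits and every return is predicted correctly.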