IDF Spring 2006: Will Intel's Core Architecture Close the Technology Gap?

Wide Dynamic Execution

Wide Dynamic Execution is Intel's umbrella term for the improvements it made to execution width (four instructions per clock cycle rather than three) and to the efficiency of micro-op processing.

As the image above shows, the greater execution width of four (in places even five) is maintained along the whole execution path, which amounts to an increase in internal bandwidth. In other words, the processor can fetch, dispatch, execute and retire four instructions simultaneously.
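To put the width increase into numbers, here is a minimal back-of-the-envelope sketch. It assumes an idealized pipeline that completes exactly `width` instructions every clock cycle, with no dependencies, cache misses or branch mispredictions, so real-world gains will be smaller. The function name and the instruction count are made up for illustration.

```python
import math

def min_cycles(num_instructions: int, width: int) -> int:
    """Ideal lower bound on cycles if `width` instructions complete per clock."""
    return math.ceil(num_instructions / width)

# Hypothetical workload of 1,200 instructions.
three_wide = min_cycles(1200, 3)   # three-wide design
four_wide = min_cycles(1200, 4)    # four-wide design, as in Core

# Best-case speedup from the extra width alone.
speedup = three_wide / four_wide
print(three_wide, four_wide, round(speedup, 2))
```

Under these assumptions the wider design needs a quarter fewer cycles, which is the upper limit of what the extra width alone can deliver.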

In addition, the Core architecture adopts the techniques that the Pentium M uses to reduce the total number of micro-ops. Micro-ops are the simpler internal operations into which x86 instructions are broken down so that the processor can execute them. Two of these can be fused into a single micro-op in order to save time (and energy). According to Intel, roughly every tenth instruction can be merged with another one using Micro Ops Fusion.
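The effect of fusing micro-ops can be illustrated with a small counting sketch. The decode tables below are a simplified assumption, not Intel's actual decoder behavior: they model an instruction with a memory operand ("load_op") and a store as two micro-ops each when unfused, and as one micro-op each when fused. The instruction stream is invented for the example.

```python
# Hypothetical decode tables: micro-ops issued per instruction kind.
UOPS_UNFUSED = {"reg_op": 1, "load_op": 2, "store": 2}
UOPS_FUSED = {"reg_op": 1, "load_op": 1, "store": 1}

def count_uops(stream, table):
    """Total micro-ops the given stream decodes into under one table."""
    return sum(table[kind] for kind in stream)

# Made-up instruction mix for illustration.
stream = ["reg_op", "load_op", "reg_op", "store", "reg_op"]

unfused = count_uops(stream, UOPS_UNFUSED)
fused = count_uops(stream, UOPS_FUSED)
print(unfused, fused)
```

Fewer micro-ops for the same instructions means fewer slots occupied in the out-of-order machinery, which is where the time and energy savings come from.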

The idea of fusion has also been applied at the instruction level: two separate x86 instructions (e.g. a compare and a conditional jump) can be merged for decoding and execution. This feature, called Macro Ops Fusion, carries through to the ALUs, which can execute either a fused pair of instructions or an ordinary instruction in a single cycle.
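The compare-and-jump case described above can be sketched as a simple pass over an instruction list. This is only an illustration of the idea: the mnemonic sets are an assumption, the real hardware is more restrictive about which pairs qualify, and the function name and sample program are hypothetical.

```python
# Assumed sets of fusable mnemonics; the actual rules are narrower.
COMPARES = {"cmp", "test"}
COND_JUMPS = {"je", "jne", "jl", "jg", "jle", "jge"}

def macro_fuse(instructions):
    """Merge each compare that is directly followed by a conditional jump."""
    fused = []
    i = 0
    while i < len(instructions):
        cur = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if (nxt is not None
                and cur.split()[0] in COMPARES
                and nxt.split()[0] in COND_JUMPS):
            fused.append(f"{cur} + {nxt}")  # one fused operation
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused

program = ["mov eax, [esi]", "cmp eax, ebx", "jne loop", "add ecx, 1"]
result = macro_fuse(program)
print(result)
```

In this sample, four instructions leave the decoder as three operations, so the fused pair takes up only one slot on the way through the pipeline.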

Together, the two fusion mechanisms can considerably increase the efficiency of each core. Think of it as a kind of compression at the instruction or micro-op level.