The Execution Ports Continued
Now, to get the actual 'work done', the OPs have to be dispatched to the execution ports. Here's where Athlon really shines. Pentium III has only 5 execution ports (two of which are dedicated to memory stores), while Athlon comes with no fewer than 9. This means that Pentium III can dispatch at most 5 OPs per clock, whereas Athlon can dispatch up to 9 at the same time.

Let's get back to the execution units for a moment. Pentium III has 11 of them. Three of its five ports each serve a single unit: the two address generation units (load address and store address) and the store data unit. Then there is execution port 0, which hosts IEU (integer execution unit) 0, the integer shifter, MMX execution unit 0, the SSE multiplier, FADD (floating-point add), FMUL (floating-point multiply) and FDIV (floating-point divide); the last of these is not pipelined. Execution port 1 hosts IEU 1, MMX 1 and the main SSE execution unit. These execution units can all work more or less in parallel, and most of them are pipelined. That still doesn't change the fact that Athlon can dispatch almost twice as many OPs at the same time, because it has 9 ports instead of only 5.

Athlon's execution units are the following. There are three IEUs, each with its own port, so that three integer OPs can be executed at the same time. Athlon comes with three parallel AGUs (address generation units) as well, which also have their own ports. Then there are the three FP/multimedia ports: one used for FSTORE (storing floating-point data), one used for FADD, MMX 0 and 3DNow! 0, and another used for FMUL, MMX 1 and 3DNow! 1.
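For readers who like to see the bookkeeping spelled out, here is a rough sketch of the two port layouts described above as plain Python dictionaries. The port labels (such as "integer 0" or "load address") and the helper names `peak_dispatch` and `ports_hosting` are made up for illustration and are not Intel's or AMD's terminology. The model only assumes what the text states: each port accepts at most one OP per clock.

```python
# Port layouts as described in the text. Port labels are illustrative only.
PENTIUM_III_PORTS = {
    "port 0": ["IEU 0", "Integer Shifter", "MMX 0", "SSE Multiplier",
               "FADD", "FMUL", "FDIV"],
    "port 1": ["IEU 1", "MMX 1", "SSE"],
    "load address": ["AGU (load)"],
    "store address": ["AGU (store)"],
    "store data": ["Store Data"],
}

ATHLON_PORTS = {
    "integer 0": ["IEU"],
    "integer 1": ["IEU"],
    "integer 2": ["IEU"],
    "address 0": ["AGU"],
    "address 1": ["AGU"],
    "address 2": ["AGU"],
    "fp 0": ["FSTORE"],
    "fp 1": ["FADD", "MMX 0", "3DNow! 0"],
    "fp 2": ["FMUL", "MMX 1", "3DNow! 1"],
}


def peak_dispatch(ports):
    """Upper bound on OPs dispatched per clock: one OP per port."""
    return len(ports)


def ports_hosting(ports, unit_prefix):
    """Names of the ports that host a unit whose name starts with unit_prefix."""
    return [name for name, units in ports.items()
            if any(unit.startswith(unit_prefix) for unit in units)]


if __name__ == "__main__":
    p3 = peak_dispatch(PENTIUM_III_PORTS)
    k7 = peak_dispatch(ATHLON_PORTS)
    print(f"Pentium III: {p3} OPs per clock at most")
    print(f"Athlon:      {k7} OPs per clock at most")
    print(f"Ratio:       {k7 / p3:.1f}x")
    print("Integer-capable ports, Pentium III:",
          ports_hosting(PENTIUM_III_PORTS, "IEU"))
    print("Integer-capable ports, Athlon:     ",
          ports_hosting(ATHLON_PORTS, "IEU"))
```

Note that these are peak figures. How many OPs actually get dispatched in a given clock depends on the instruction mix, since two OPs that need units behind the same port cannot go out together.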
To summarize, this is definitely performance advantage No. 2: the parallelism inside Athlon is clearly ahead of Pentium III's.
You'll find most of this, and a bit more, summarized on this AMD slide.