The Instruction Control Unit
As the processor block diagrams show, the stage that follows instruction decoding is, in Athlon's case, the Instruction Control Unit. This unit can hold up to 72 MOps before they are dispatched to the schedulers; because a MOp can equal an x86 instruction, Athlon can have up to 72 instructions in flight. That is a lot more than the 20 µ-Ops that Intel's Reservation Station can hold: assuming an average of, say, 1.5 µ-Ops per instruction, the P6 architecture has approximately 13 instructions in flight. This is already the next advantage of Athlon over PIII, but let's not even count that. The next step is where it gets really interesting.
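To make the comparison concrete, here is a minimal sketch of the arithmetic. The buffer sizes and the 1.5 µ-Ops-per-instruction average come from the text; treating every MOp as exactly one x86 instruction is the best case for Athlon, and the function and constant names are made up for illustration.

```python
# Rough in-flight instruction estimate from the figures quoted in the text.
ATHLON_MOPS = 72            # entries in Athlon's Instruction Control Unit
P6_UOPS = 20                # entries in the P6 Reservation Station
MOPS_PER_INSTRUCTION = 1.0  # best case: one MOp equals one x86 instruction
UOPS_PER_INSTRUCTION = 1.5  # assumed average, as in the text


def in_flight_instructions(entries, ops_per_instruction):
    """Estimate how many x86 instructions fit into a buffer of `entries` ops."""
    return entries / ops_per_instruction


athlon = in_flight_instructions(ATHLON_MOPS, MOPS_PER_INSTRUCTION)
p6 = in_flight_instructions(P6_UOPS, UOPS_PER_INSTRUCTION)
ratio = athlon / p6

print(f"Athlon: up to {athlon:.0f} instructions in flight")
print(f"P6:     about {p6:.0f} instructions in flight")
print(f"Ratio:  roughly {ratio:.1f}x")
```

If real code needs more than one MOp per instruction on average, the Athlon figure shrinks accordingly, so the ratio is an upper bound under these assumptions.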
The Execution Ports
You will certainly agree that the most important thing a microprocessor has to do is actually execute the instructions of the software it is running, so it is about time we got to this stage. You cannot really see it in the block diagrams, but Pentium III has 11 (+1) parallel execution units, and Athlon has even more. Those units execute the OPs, and since so many of them work in parallel, you can imagine why we are talking about 'out-of-order' execution here: executing one OP after another would obviously make no use of parallel execution units. To make sure that out-of-order execution actually works, Intel uses the 'Renamer & Allocator' as well as the 'Reorder-Unit'. The 'Integer/FP Renamer/Allocator' sits in front of the Reservation Station and, as its name says, is responsible for integer as well as FP and multimedia OPs. Athlon handles this in a somewhat more sophisticated way. The units that take care of out-of-order execution are the Integer Scheduler and the FP Scheduler, which can hold a quite impressive number of OPs (18 and 36, respectively).
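Why parallel units call for out-of-order execution can be shown with a toy model. The sketch below is not how either processor works internally: it assumes every OP takes one cycle, ignores register renaming and the real scheduler sizes, and uses an invented six-OP program with invented unit counts. It only compares issuing strictly in program order (stopping at the first OP that cannot issue) with issuing any OP whose inputs are ready.

```python
# Toy issue model: each op is (name, unit type, names of ops it depends on).
OPS = [
    ("a", "int", []),
    ("b", "int", ["a"]),
    ("c", "int", ["b"]),
    ("d", "fp", []),
    ("e", "fp", ["d"]),
    ("f", "int", []),
]
UNITS = {"int": 2, "fp": 1}  # hypothetical number of units per type


def simulate(ops, units, out_of_order):
    """Return the number of cycles needed to run all ops (one cycle each)."""
    done = set()          # ops whose results are available
    pending = list(ops)   # ops not yet issued, in program order
    cycles = 0
    while pending:
        cycles += 1
        free = dict(units)  # units available in this cycle
        issued = []
        for op in pending:
            name, unit, deps = op
            ready = all(d in done for d in deps)
            if ready and free[unit] > 0:
                free[unit] -= 1
                issued.append(op)
            elif not out_of_order:
                break  # in-order: a stalled op blocks everything behind it
        if not issued:
            raise ValueError("no op can issue; check the dependencies")
        for op in issued:
            pending.remove(op)
            done.add(op[0])  # results become visible in the next cycle
    return cycles


print("in-order cycles:    ", simulate(OPS, UNITS, out_of_order=False))
print("out-of-order cycles:", simulate(OPS, UNITS, out_of_order=True))
```

In this model the in-order run leaves units idle whenever the next OP in program order is waiting on a result, while the out-of-order run fills those slots with independent OPs from further down the list.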
Athlon's Integer Execution Path
Athlon's Floating Point and Multimedia Execution Path