The Pipeline Continued
The next thing we want to look at is the branch prediction of the two contestants. In the block diagrams of the two CPUs you'll find that Athlon has a BTB (branch target buffer) with no fewer than 2048 entries, which means that Athlon can store 2048 different branch target addresses. Its BHT (branch history table) holds 4096 entries. Compare this with Pentium III's Dynamic Branch Predictor, which has only 512 entries. AMD claims that Athlon predicts branches correctly 95% of the time, which is very high. Intel's Pentium III is estimated to predict correctly 90-92% of the time.
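To make the idea of a branch history table more concrete, here is a minimal sketch of a textbook 2-bit saturating counter predictor. It is not a description of what Athlon or Pentium III actually implement; the class name, the simple modulo indexing and the example trace are all assumptions made for illustration. Only the table size of 4096 entries is taken from the text above.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter predictor (illustrative only)."""

    def __init__(self, entries=4096):
        self.entries = entries
        # Counter values: 0-1 predict "not taken", 2-3 predict "taken".
        # Every counter starts at 1 (weakly not taken), an arbitrary choice.
        self.counters = [1] * entries

    def _index(self, address):
        # Assumption: index the table with the low bits of the branch address.
        return address % self.entries

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        # Move the counter one step toward the actual outcome, saturating at 0 and 3.
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of branches in `trace` that the predictor guesses correctly."""
    correct = 0
    for address, taken in trace:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct / len(trace)


# Hypothetical trace: one loop branch, taken nine times and then not taken
# once on loop exit, repeated ten times.
loop_trace = [(0x400, i % 10 != 9) for i in range(100)]
print(f"accuracy: {accuracy(TwoBitPredictor(), loop_trace):.0%}")
```

Even this simple scheme gets most loop branches right, because a single loop exit does not flip a strongly "taken" counter. A larger table mainly helps when many different branches would otherwise share the same entry.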
While we are on the subject of buffers, jumps and predictions, we shouldn't forget another nice way of saving execution time: the return stack, first introduced by Cyrix several years ago. Anyone who knows a little about machine code or assembler programming will remember that each time a function, procedure or other subroutine is called, the program counter is pushed onto the stack. Once the subroutine is finished, the processor pops that address from the stack and returns to where it came from. These stack operations are convenient, since they don't require special CPU registers, but the normal stack lives in main memory, so they are comparatively slow. The 'return stack' is a small, special storage area inside the processor that can be accessed very quickly. By keeping return addresses there, the processor can speed up the call to and return from a subroutine quite nicely. Athlon is equipped with a whopping 12-entry return stack. Pentium III's return stack is not documented, but it's probably less than half the size of Athlon's.
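The behaviour of such a return stack can be sketched in a few lines. This is a simplified model under stated assumptions, not AMD's documented design: the class and method names are made up, and dropping the oldest entry on overflow is just one plausible policy. Only the depth of 12 comes from the text.

```python
class ReturnStack:
    """Simplified model of a small on-chip return address stack."""

    def __init__(self, depth=12):
        self.depth = depth
        self.entries = []

    def on_call(self, return_address):
        # Assumption: when the stack is full, the oldest entry is discarded.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def on_return(self):
        # Return the remembered address, or None if the stack is empty and
        # the processor would have to fall back on the stack in memory.
        return self.entries.pop() if self.entries else None


# Hypothetical example: 14 nested calls against a 12-entry return stack.
rs = ReturnStack(depth=12)
addresses = [0x1000 + 4 * n for n in range(14)]
for addr in addresses:
    rs.on_call(addr)

hits = 0
for addr in reversed(addresses):
    if rs.on_return() == addr:
        hits += 1
print(f"{hits} of {len(addresses)} returns served from the return stack")
```

With 14 nested calls, the 12 innermost returns are served from the fast on-chip stack and the two outermost are not, which illustrates why a deeper return stack helps code with deeply nested subroutine calls.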
Taken together, the length of Athlon's integer and floating-point pipelines, its excellent branch prediction unit and the 12-entry return stack earn it performance advantage point No. 3.