Upgrading And Repairing PCs 21st Edition: Processor Features

Dynamic Execution

First used in the P6 (or sixth-generation) processors, dynamic execution enables the processor to execute more instructions in parallel, so tasks are completed more quickly. The technology is composed of three main elements:

  • Multiple branch prediction—Predicts the flow of the program through several branches
  • Dataflow analysis—Schedules instructions to be executed when ready, independent of their order in the original program
  • Speculative execution—Increases the rate of execution by looking ahead of the program counter and executing instructions that are likely to be necessary

Branch Prediction

Branch prediction is a feature formerly found only in high-end mainframe processors. It enables the processor to keep the instruction pipeline full while running at a high rate of speed. A special fetch/decode unit in the processor uses a highly optimized branch-prediction algorithm to predict the direction and outcome of the instructions being executed through multiple levels of branches, calls, and returns. It is similar to a chess player working out multiple strategies in advance of game play by predicting the opponent’s strategy several moves into the future. Because the outcome of each branch is predicted in advance, the instructions that follow it can be fetched and executed without waiting for the branch to resolve.
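As a rough illustration of the idea, the sketch below models a textbook two-bit saturating counter kept per branch. This is far simpler than the multi-level predictor described above and is not the P6's actual algorithm; the class name, the branch address 0x400, and the loop trace are all made up for the example.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch address (illustrative model only)."""

    def __init__(self):
        self.counters = {}  # branch address -> counter value in 0..3

    def predict(self, address):
        # Values 2 and 3 mean "predict taken"; unseen branches start at 2.
        return self.counters.get(address, 2) >= 2

    def update(self, address, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = self.counters.get(address, 2)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.counters[address] = counter


def run_trace(predictor, trace):
    """Return how many (address, taken) outcomes were predicted correctly."""
    correct = 0
    for address, taken in trace:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct


# Hypothetical loop branch: taken 9 times, then falls through; repeated 3 times.
loop_trace = [(0x400, i % 10 != 9) for i in range(30)]
hits = run_trace(TwoBitPredictor(), loop_trace)
print(f"{hits} of {len(loop_trace)} predictions correct")
```

With this trace the only misses are the three loop exits, because a single not-taken outcome is not enough to flip the counter away from "taken".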

Dataflow Analysis

Dataflow analysis studies the flow of data through the processor to detect any opportunities for out-of-order instruction execution. A special dispatch/execute unit in the processor monitors many instructions and can execute these instructions in an order that optimizes the use of the multiple superscalar execution units. The resulting out-of-order execution of instructions can keep the execution units busy even when cache misses and other data-dependent instructions might otherwise hold things up.
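The scheduling idea can be sketched in a few lines. The model below is an assumption-laden toy, not a description of the P6 hardware: it assumes two execution units, that each register is written by only one instruction, that producers appear before their consumers, and that every latency is at least one cycle. The instruction names, registers, and latencies are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str        # register written by this instruction
    sources: tuple   # registers read by this instruction
    latency: int     # cycles until the result is available (>= 1)


def schedule(program, units=2):
    """Issue each instruction once its operands are ready; return {name: issue cycle}."""
    written = {ins.dest for ins in program}  # registers produced inside the program
    ready_at = {}                            # register -> cycle its value is available
    issued = {}
    waiting = list(program)                  # kept in program order, oldest first
    cycle = 0
    while waiting:
        free = units
        for ins in list(waiting):
            if free == 0:
                break
            # A source that no instruction writes is treated as already available.
            operands_ready = all(
                src not in written or ready_at.get(src, float("inf")) <= cycle
                for src in ins.sources
            )
            if operands_ready:
                issued[ins.name] = cycle
                ready_at[ins.dest] = cycle + ins.latency
                waiting.remove(ins)
                free -= 1
        cycle += 1
    return issued


program = [
    Instr("load", "r1", ("addr",), 4),     # slow, as if it missed the cache
    Instr("add", "r2", ("r1", "r3"), 1),   # must wait for the load
    Instr("mul", "r4", ("r5", "r6"), 2),   # independent of the load
    Instr("sub", "r7", ("r4", "r8"), 1),   # depends only on mul
]
issue_cycles = schedule(program)
print(issue_cycles)
```

In this example mul and sub issue ahead of add even though add comes earlier in the program, which is the kind of reordering that keeps the execution units busy while the load is outstanding.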

Speculative Execution

Speculative execution is the processor’s capability to execute instructions in advance of the actual program counter. The processor’s dispatch/execute unit uses dataflow analysis to execute all available instructions in the instruction pool and store the results in temporary registers. A retirement unit then searches the instruction pool for completed instructions that are no longer data dependent on other instructions and that have no unresolved branch predictions. If any such completed instructions are found, the retirement unit commits their results to the permanent machine state (the architectural registers and memory) in the order the instructions were originally issued. They are then retired from the pool.
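The retirement step can be sketched with a small queue, loosely modeled on what is usually called a reorder buffer. This is a simplified illustration rather than the P6's actual design; the class, its method names, and the register names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str
    dest: str
    value: Optional[int] = None
    done: bool = False


class ReorderBuffer:
    def __init__(self):
        self.pool = deque()   # instructions in program order, oldest on the left
        self.registers = {}   # committed (architectural) state
        self.retired = []     # instruction names in retirement order

    def issue(self, name, dest):
        entry = Entry(name, dest)
        self.pool.append(entry)
        return entry

    def complete(self, entry, value):
        # The result stays in the entry (a temporary register) until retirement.
        entry.value = value
        entry.done = True

    def retire(self):
        # Only the oldest entry may retire, so a finished younger
        # instruction waits behind an unfinished older one.
        while self.pool and self.pool[0].done:
            entry = self.pool.popleft()
            self.registers[entry.dest] = entry.value
            self.retired.append(entry.name)

    def squash_after(self, entry):
        # On a mispredicted branch, discard every instruction younger than entry.
        while self.pool and self.pool[-1] is not entry:
            self.pool.pop()


rob = ReorderBuffer()
load = rob.issue("load", "r1")
add = rob.issue("add", "r2")
mul = rob.issue("mul", "r4")

rob.complete(mul, 42)   # the youngest instruction finishes first
rob.retire()
print(rob.retired)      # still empty: the older load has not completed

rob.complete(load, 7)
rob.complete(add, 10)
rob.retire()
print(rob.retired)      # program order, despite out-of-order completion
```

Keeping speculative results out of the committed registers until retirement is what lets squash_after throw away work from a wrong prediction without any visible effect on the program.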

Dynamic execution essentially removes the constraint and dependency on linear instruction sequencing. By promoting out-of-order instruction execution, it can keep the execution units working rather than waiting for data from memory. Even though instructions can be predicted and executed out of order, the results are committed in the original order so they don’t disrupt or change program flow. This enables the P6 to run existing Intel architecture software exactly as the P5 (Pentium) and previous processors did, just a whole lot more quickly!

This thread is closed for comments
  • k1114
    Keep it coming.
  • renzhe
    9412 pins; imagine that.
  • ta152h
    Ugggh, got to page two before being disgusted this time. This author is back to writing fiction.

    The Pentium (5th generation, in case the author didn't know, thus the "Pent"), DID execute x86 instructions. It was the Pentium Pro that didn't. That was the sixth generation.

    CISC and RISC are not arbitrary terms, and RISC is better when you have a lot of memory, that's why Intel and AMD use it for x86. They can't execute x86 instructions effectively, so they break it down to RISC type operations, and then execute it. They pay the penalty of adding additional stages in the pipeline which slows down the processor (greater branch mispredict penalty), adds size, and uses power. If they are equal, why would anyone take this penalty?

    Being superscalar has nothing to do with being RISC or CISC. Admittedly, the terms aren't carved in stone, and the term can be misleading, as it's not necessarily the number of instructions that defines RISC. Even so, there are clear differences. RISC has fixed length instructions. CISC generally does not. RISC has much simpler memory addressing modes. The main difference is, RISC does not have microcoding to execute instructions - everything is done in hardware. Obviously, this strongly implies much simpler, easier to execute instructions, which make it superior today. However, code density is less for RISC, and that was very important in the 70s and early 80s when memory was not so large. Even now, better density means better performance, since you'll hit the faster caches more often.

    This article is also wrong about 3D Now! It was not introduced as an alternative to SSE, SSE was introduced as an alternative to 3D Now!, which predated SSE. In reality, 3D Now! was released because the largest difference between the K6 and Intel processors was floating point. Games, or other software that could use 3D Now!, rather than relying entirely on x87 instructions, could show marked performance improvement for the K6-2. It was relatively small to implement, and in the correct workloads could show dramatic improvements. But, of course, almost no one used it.

    The remarks about the dual bus are inaccurate. The reason was that motherboard bus speeds were not able to keep up with microprocessor speeds (starting with the 486DX2). Intel suffered the much slower bus speed to the L2 cache on the Pentium and Pentium MMX, but moved the L2 cache on the same processor package (but not on the same die) with the Pentium Pro. The purpose of having the separate buses was that one could access the L2 cache at a much higher speed; it wasn't limited to the 66 MHz bus speed of the motherboard. The Pentium Pro was never intended to be mainstream, and was too expensive, so Intel moved the L2 cache onto the Slot 1 cartridge, and ran it at half bus speed, which in any case was still much faster than the memory bus.

    That was the main reason they went to the two buses.

    That was as far as I bothered to read this. It's a pity people can't actually do fact checking when they write books, and make up weird stories that only have a passing resemblance to reality.

    And then act like someone winning this misinformation is lucky. Good grief, what a perverse world ...
  • Reynod
    ta152h sir you are correct.
  • spookyman
    Yes you are correct on the bus issue. VESA local bus was designed to overcome the limitations of the ISA bus.

    The reason Intel went with a slot design for the Pentium 2 was to prevent AMD from using it. You can patent and trademark a slot design.

    As for the Pentium Pro, it had issues handling 16-bit x86 instruction sets. The solution was to program around it. There was an inherent computational flaw with the Pentium Pro too.
  • Kraszmyl
    I don't think there is a single page that isn't piled with inaccurate or incomplete information.......this is perhaps the worst thing I've ever read on tomshardware and I don't see how you let it get published.
  • therogerwilco
    Kinda nice for generic info, was hoping for more explanation of some of the finer points of cpu architecture
  • Reynod
    Perhaps the most important thing to note from this is just how clever some of our users are ... so get into the forums and help out the n00bs with their problems guys !!

  • Sprongy
    Not to be anal but aren't all Core i3 processors, dual cores (2). Some have Hyper-Threading to make it like 4 cores. The last chart above should read Core i3 - 2 cores. Just saying...
  • ingtar33
    Sprongy said:
    Not to be anal but aren't all Core i3 processors, dual cores (2). Some have Hyper-Threading to make it like 4 cores. The last chart above should read Core i3 - 2 cores. Just saying...

    not on mobile. some mobile i3s are single core, same with the mobile i5s... those are all dual core... with hyperthreading.

    there are even dual core i5s in haswell on the desktop. (they are the ones with a (t) after the number)
  • rolli59
    Well although it is full of minor misinformation it is a good insight for a reader that does not know much about it.
  • ezorb
    please stop putting this crap on the site, it's better to post nothing on a slow Friday
  • Nintendo Maniac 64
    Llano is not based on Bulldozer but rather is based on a slightly improved K10 (typically dubbed "K10.5").
  • ronch79
    Do AMD processors also feature reprogrammable microcode? I'm using an FX-8350 and before it I was using a Phenom II X4 925 (unlocked X3 720).
  • turboflame
    Yeah, this wasn't particularly well researched. Quite a few minor mistakes, not to mention it reads like an Intel advertisement, with AMD's contribution to modern PCs being either downplayed or omitted entirely.
  • Geef
    After seeing that story they had up a couple days ago about HUBS where the person actually talked about what SWITCHES do, not hubs.
    Since then I make sure I come into Tomshardware articles expecting stuff to be incorrect. It makes me sad, I used to come here for new tech info but now I'm not so sure...
  • catfishtx
    I worked for Intel during the time period that they released the Pentium MMX processors. They told us that MMX stood for Multi Media eXtensions.
  • falcosoft
    "Note: Most applications that formerly used floating-point math now use MMX/SSE instructions instead. These instructions are faster and more accurate than x87 floating-point math."

    Quite the contrary, x87 CAN BE more accurate than SSE but not the other way around. x87 knows and uses 80 bit floating point data internally while SSE (and AVX) can only use 64 bit floating point data. This sentence will be true if 128 bit precision is implemented in the future.