Intel's Pentium Performance Hangs on a Hyper-Thread

The Pipeline

Intel is happy to note that the Pentium 4's 20-stage pipeline is deep enough that the resources at each stage can be replicated, divided, or shared when the processor handles more than one thread at a time.

Take the pipeline's front end, which is responsible for delivering instructions to the later pipe stages. The OS schedules and dispatches threads of code to each logical processor. When no thread is dispatched to a logical processor, that logical processor is kept idle.

When a thread is scheduled and dispatched to a logical processor, HT uses whatever processor resources are needed to execute that thread.

When a second thread is scheduled and dispatched to the other logical processor, resources are replicated, divided, or shared as needed to execute it. As each thread finishes, the operating system idles the unused logical processor, freeing its resources for the logical processor that is still running.
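The dispatch-and-idle behaviour described above can be pictured with a small Python model. This is only a sketch under stated assumptions: the class name, the method names, and the figure of eight buffer entries are hypothetical and do not come from Intel. It also models only the "divided" kind of resource, not the replicated or shared kinds.

```python
class PhysicalProcessor:
    """Toy model of one HT-enabled physical processor (hypothetical, not Intel's design)."""

    def __init__(self, partitioned_entries=8, logical_count=2):
        # partitioned_entries is an assumed size for some divided resource.
        self.partitioned_entries = partitioned_entries
        self.active = [False] * logical_count

    def dispatch(self, logical_id):
        # The OS schedules a thread on this logical processor.
        self.active[logical_id] = True

    def finish(self, logical_id):
        # The thread ends and the OS idles this logical processor.
        self.active[logical_id] = False

    def entries_per_thread(self):
        # Split the divided resource evenly among running logical processors.
        running = sum(self.active)
        if running == 0:
            return {}
        share = self.partitioned_entries // running
        return {i: share for i, on in enumerate(self.active) if on}


cpu = PhysicalProcessor()
cpu.dispatch(0)
print(cpu.entries_per_thread())  # one thread running: it gets every entry
cpu.dispatch(1)
print(cpu.entries_per_thread())  # two threads running: the entries are divided
cpu.finish(0)
print(cpu.entries_per_thread())  # the remaining thread gets everything back
```

The point of the model is the last step: once one logical processor goes idle, the divided resource reverts in full to the thread that is still running.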

To optimize performance in multi-processor systems with HT, the OS can be configured to schedule and dispatch threads to separate physical processors first, and only then to the second logical processor on a physical processor that is already busy. Threads on separate physical processors do not have to compete for the same execution resources.
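That scheduling preference amounts to a simple ordering of dispatch slots. The Python sketch below is illustrative only: the function name and the (physical, logical) numbering are assumptions made for this example, not an actual OS scheduler interface.

```python
def dispatch_order(physical, logical_per_physical=2):
    """Return (physical, logical) slots in the order an HT-aware scheduler might fill them.

    Hypothetical helper: every physical processor receives one thread
    before any physical processor receives a second.
    """
    return [
        (p, l)
        for l in range(logical_per_physical)  # outer loop: logical slot index
        for p in range(physical)              # inner loop: each physical processor
    ]


# Two physical processors, two logical processors each.
print(dispatch_order(2))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

With two physical processors, the first two threads land on different chips, and only the third and fourth threads double up on a chip that is already busy.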