Intel's Pentium Performance Hangs on a Hyper-Thread

A History Of Parallelism

Parallelism and multi-threaded computation, the concepts on which HT is based, are of course anything but new. In fact, Intel's Xeon server processor has employed HT since its debut earlier this year, and thread-level parallelism is now used in processors from Intel, Sun, IBM, and Compaq.

By definition, parallelism boosts performance by performing independent tasks simultaneously. Since the mid-1990s, Intel has employed processor parallelism to milk as much performance as possible from its processors' silicon for server applications.

Specifically, HT is based on Thread-Level Parallelism (TLP). One approach to TLP, switch-on-event multithreading, switches use of the chip's resources from the currently executing thread to a new thread when the current thread initiates a long-latency operation, such as a cache miss. This reduces the likelihood of long pipeline stalls by letting the second thread execute while the first thread's long-latency operation completes.

Switching processor resources from one thread to another incurs a performance penalty, however. The current thread's instructions must be flushed or drained from the pipeline, its architectural state must be preserved, the new logical processor must be activated, and instructions from the new thread must be fed to the processor's resources. These steps can take up to 40 clock cycles to complete.
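The switching trade-off described above can be sketched as a toy cost model. This is a minimal illustration under stated assumptions, not a description of any real processor: the switch penalty is taken as the 40-cycle worst case cited above and is assumed to be paid once per switch, and the stall latencies in the example are hypothetical. The names `cycles_recovered` and `should_switch` are illustrative only.

```python
"""Toy cost model of switch-on-event multithreading (illustrative assumptions only)."""

# Assumed worst-case cost of one thread switch, taken from the
# "up to 40 clock cycles" figure in the text.
SWITCH_PENALTY = 40


def cycles_recovered(stall_cycles: int, switch_penalty: int = SWITCH_PENALTY) -> int:
    """Cycles of useful work a second thread gets during one stall.

    Assumes the switch penalty is paid once, at the start of the stall,
    and that the second thread then runs until the stall resolves.
    A negative result is clamped to zero: switching would not pay off.
    """
    return max(0, stall_cycles - switch_penalty)


def should_switch(stall_cycles: int, switch_penalty: int = SWITCH_PENALTY) -> bool:
    """Switch only when the stall lasts longer than the switch itself."""
    return stall_cycles > switch_penalty


if __name__ == "__main__":
    # Hypothetical latencies: a short stall and a long one (e.g. a trip to memory).
    for stall in (20, 300):
        print(f"stall={stall:>3} cycles  switch={should_switch(stall)}  "
              f"recovered={cycles_recovered(stall)}")
```

With these assumed numbers, the 20-cycle stall is not worth a switch, while the 300-cycle stall leaves 260 cycles of useful work for the second thread.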

With HT, however, multiprocessor-capable software applications can run unmodified with twice as many logical processors at their disposal. Each logical processor can respond to interrupts independently. One logical processor can track one software thread while the second logical processor simultaneously tracks another. Because the two threads share one set of execution resources, HT can put to work resources that would otherwise sit idle if only one thread were executing. The result is higher utilization of the execution resources within each physical processor package.
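From software's point of view, the logical processors simply appear as additional CPUs. The sketch below is a hypothetical example, not anything Intel provides. It sizes a worker pool from Python's `os.cpu_count()`, which reports logical CPUs, so the same unmodified program would spread its work across both logical processors of an HT-enabled chip. `parallel_sum` and `checksum` are illustrative names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def logical_processor_count() -> int:
    """Number of logical CPUs the OS reports (falls back to 1 if unknown).

    With HT enabled, each physical processor typically shows up as two
    logical CPUs, so this count doubles without any change to the program.
    """
    return os.cpu_count() or 1


def checksum(block: range) -> int:
    """Stand-in work item: sum a block of integers."""
    return sum(block)


def parallel_sum(n: int) -> int:
    """Split 0..n-1 into one block per logical processor and sum the blocks."""
    if n <= 0:
        return 0
    workers = logical_processor_count()
    step = -(-n // workers)  # ceiling division so every value is covered
    blocks = [range(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(checksum, blocks))


if __name__ == "__main__":
    print("logical processors:", logical_processor_count())
    print("sum:", parallel_sum(1_000_000))
```

In CPython, threads running pure-Python code are serialized by the global interpreter lock, so this sketch shows how the pool is sized rather than delivering a real speedup. A CPU-bound program would typically use `ProcessPoolExecutor` or native threads instead.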

For example, one logical processor can execute a floating-point operation while the other executes an addition and a load. HT is complementary to MP-based systems because the operating system can schedule separate threads to execute simultaneously not only on each physical processor but also on each logical processor.
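The floating-point versus addition-and-load example can be made concrete with a toy issue model. It assumes a hypothetical, heavily simplified processor with exactly one floating-point unit, one ALU, and one load unit, each accepting one instruction per cycle. Real Pentium 4 dispatch is considerably more complex. Instruction streams are lists of unit names, and `cycles_to_finish` is an illustrative helper, not an Intel interface.

```python
from collections import deque

# Hypothetical, simplified execution units; each accepts one instruction per cycle.
UNITS = ("fp", "alu", "load")


def cycles_to_finish(threads: list[list[str]]) -> int:
    """Cycles needed to issue every instruction of every thread.

    Each thread is a list of unit names in program order. Threads issue
    in order: within a cycle, a thread stops at its first instruction
    whose unit is already taken. Priority rotates each cycle so that no
    thread is always served first.
    """
    for ops in threads:
        for op in ops:
            if op not in UNITS:
                raise ValueError(f"unknown execution unit: {op!r}")
    queues = [deque(ops) for ops in threads]
    cycles = 0
    while any(queues):
        busy: set[str] = set()  # units already claimed this cycle
        first = cycles % len(queues)
        for k in range(len(queues)):
            queue = queues[(first + k) % len(queues)]
            while queue and queue[0] not in busy:
                busy.add(queue.popleft())
        cycles += 1
    return cycles


if __name__ == "__main__":
    thread_a = ["fp", "fp", "fp"]              # floating-point work
    thread_b = ["alu", "load", "alu", "load"]  # additions and loads
    back_to_back = cycles_to_finish([thread_a]) + cycles_to_finish([thread_b])
    shared = cycles_to_finish([thread_a, thread_b])
    print("back to back:", back_to_back)  # 5
    print("shared units:", shared)        # 3
```

Run back to back, the two streams need 5 cycles. Sharing one set of units, they finish in 3, because the floating-point unit and the ALU and load units do useful work in the same cycles.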

This improves overall performance and system responsiveness because, with twice as many logical processors available, parallel threads can be dispatched sooner. Even so, the two logical processors still share one set of execution resources, so adding another physical processor, with its own dedicated execution resources, will typically deliver greater performance. In other words, HT complements multiprocessing by offering greater parallelism within each processor in the system, but it is not a replacement for dual-processor or multiprocessor systems.
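The point that HT complements but does not replace a second physical processor can be illustrated with an idealized resource-bound model. It assumes one unit of each type per processor, perfect scheduling, and no dependencies, latencies, or cache effects, so each figure is only a lower bound on cycles. The workloads and function names are hypothetical.

```python
from collections import Counter


def resource_bound(ops: Counter) -> int:
    """Minimum cycles to issue `ops` with one unit of each type per cycle."""
    return max(ops.values(), default=0)


def one_at_a_time(threads: list[list[str]]) -> int:
    """One processor, no HT: threads run back to back."""
    return sum(resource_bound(Counter(t)) for t in threads)


def hyper_threaded(threads: list[list[str]]) -> int:
    """One HT processor: all threads share a single set of units."""
    combined = sum((Counter(t) for t in threads), Counter())
    return resource_bound(combined)


def two_processors(threads: list[list[str]]) -> int:
    """Assumes one physical processor, with its own units, per thread."""
    return max((resource_bound(Counter(t)) for t in threads), default=0)


if __name__ == "__main__":
    workloads = {
        "complementary": [["fp"] * 4, ["alu"] * 4],
        "contending": [["fp"] * 4, ["fp"] * 4],
        "mixed": [["fp"] * 4 + ["alu"] * 2, ["alu"] * 4 + ["fp"]],
    }
    for name, threads in workloads.items():
        print(f"{name:<13} one-at-a-time={one_at_a_time(threads)} "
              f"hyper-threaded={hyper_threaded(threads)} "
              f"two-processors={two_processors(threads)}")
```

Under these assumptions the three workloads take 8, 4, and 4 cycles (complementary), 8, 8, and 4 (contending), and 8, 6, and 4 (mixed). Threads that need different units gain as much from HT as from a second processor, threads that contend for the same unit gain nothing from HT, and only the extra physical processor adds units to go around.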