How Hyper-Threading Works
While the Pentium III had a 10-stage instruction pipeline, the Pentium 4 processor increased pipeline length to 20 stages with the Willamette (180nm) and Northwood (130nm) cores. The subsequent Prescott core (90nm) went further still with a 31-stage pipeline. The last of its kind, Cedar Mill (65nm), maintained this execution pipeline structure.
The basic idea behind an instruction pipeline is to split instruction processing into a series of smaller steps that can overlap. Adding more stages reduces the work done per stage, which allows higher clock speeds and, in turn, higher execution throughput. However, leaving the pipeline partially empty or loaded with the wrong instructions leads to performance penalties. Program branches are the most critical factor, as the CPU's branch prediction unit has to guess which branch will be followed in order to load the appropriate instructions. When it guesses wrong, the instructions already in flight must be discarded, and the longer the pipeline, the more work is lost.
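To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch in Python. It is not a model of any real Intel core. It assumes an ideal rate of one instruction per cycle and that each mispredicted branch wastes about one cycle per pipeline stage. The function name, the branch fraction, and the misprediction rate are illustrative assumptions, not measurements.

```python
def effective_cpi(pipeline_stages, branch_fraction, mispredict_rate, base_cpi=1.0):
    """Average cycles per instruction under a simple flush-on-mispredict model.

    Assumption: every mispredicted branch costs roughly one cycle per
    pipeline stage, because the instructions queued up behind the branch
    are thrown away and the pipeline has to be refilled.
    """
    penalty_cycles = pipeline_stages
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Illustrative values only: 20% of instructions are branches,
    # and 5% of those branches are mispredicted.
    for name, stages in [("Pentium III", 10), ("Northwood", 20), ("Prescott", 31)]:
        cpi = effective_cpi(stages, branch_fraction=0.20, mispredict_rate=0.05)
        print(f"{name:12s} {stages:2d} stages -> {cpi:.2f} cycles per instruction")
```

Under these assumed rates, the same misprediction behavior costs the 31-stage design about three times as many wasted cycles as the 10-stage design, which is why long pipelines lean so heavily on accurate branch prediction.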
The 31-stage pipelines of Prescott and Cedar Mill in particular depended on high workload efficiency. Intel therefore added a "replay unit," which allowed the processor to intercept operations that had been mistakenly sent for execution and replay them once the proper execution conditions were met. A side effect of the replay system was that some applications would actually slow down with Hyper-Threading enabled, as replayed operations tied up execution resources and detracted from the second thread's performance. At the time, Hyper-Threading's value had to be called into question, since it was sometimes a benefit and sometimes a detriment.
Today’s implementation of Hyper-Threading is similar in that it presents each physical core to the operating system as a pair of logical processors. If the current task leaves execution resources unused, the processor’s scheduler can execute instructions from the other thread to increase efficiency or to fill the gaps left by stalls from branch mispredictions, cache misses, or data dependencies.
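The effect of filling stall cycles with work from a second thread can be sketched with a toy simulation. This is a deliberately simplified illustration, not a description of Intel's hardware: it assumes a core that can issue one instruction per cycle, and it represents each thread as a list of stall lengths (a hypothetical encoding chosen for this example).

```python
def run_cycles(threads):
    """Return the cycles one single-issue core needs to finish all threads.

    Each thread is a list of stall lengths: an entry n means "issue one
    instruction, then wait n cycles (for example on a cache miss) before
    this thread can issue again". Thread 0 gets priority when both are ready.
    """
    pending = [list(t) for t in threads]  # remaining instructions per thread
    ready_at = [0] * len(threads)         # first cycle each thread may issue again
    cycle = 0
    while any(pending):
        for i, queue in enumerate(pending):
            if queue and ready_at[i] <= cycle:
                stall = queue.pop(0)
                ready_at[i] = cycle + 1 + stall
                break                     # only one instruction issues per cycle
        cycle += 1
    # A thread is finished once its last instruction's stall has elapsed.
    return max(ready_at, default=0)


if __name__ == "__main__":
    thread_a = [0, 3, 0, 3]
    thread_b = [3, 0, 3, 0]
    one_after_another = run_cycles([thread_a]) + run_cycles([thread_b])
    sharing_the_core = run_cycles([thread_a, thread_b])
    print("back to back:", one_after_another, "cycles")
    print("interleaved: ", sharing_the_core, "cycles")
```

In this toy model the two threads take 20 cycles when run one after the other but 13 when they share the core, because one thread issues while the other is waiting. Real gains depend on how often the threads actually leave resources idle and how much they compete for the same ones.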
Beyond an HT-equipped processor, which we take for granted here, all you need to use Hyper-Threading is a platform with BIOS support and a compatible operating system. This has been the case since the days of Windows NT.
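One way to check whether the operating system actually sees the extra logical processors is to compare logical processor and physical core counts. The sketch below assumes a Linux system on x86, where /proc/cpuinfo typically lists "processor", "physical id", and "core id" fields for each logical processor. The function name and the sample text are made up for illustration, and other platforms expose this information differently.

```python
def count_processors(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo-style text.

    Assumes the x86 Linux layout: one blank-line-separated block per logical
    processor, with "physical id" (socket) and "core id" fields in each block.
    """
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        if "core id" in fields:
            # Two logical processors on the same core share this pair.
            cores.add((fields.get("physical id", "0"), fields["core id"]))
    return logical, len(cores)


# Hand-written sample: one socket, two cores, two logical processors per core.
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 1

processor   : 2
physical id : 0
core id     : 0

processor   : 3
physical id : 0
core id     : 1
"""

if __name__ == "__main__":
    logical, physical = count_processors(SAMPLE)
    print(f"{logical} logical processors on {physical} physical cores")
```

On a real Linux machine, the same function could be fed the contents of /proc/cpuinfo instead of the sample. Twice as many logical processors as physical cores suggests that Hyper-Threading is enabled and visible to the operating system.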
In the past, we’ve seen Hyper-Threading provide additional performance, but it also clearly added to power consumption (even if, according to Intel, it's a cheap addition in terms of die area). Heavily threaded applications and workloads typically take better advantage of many cores and multiple threads than mainstream software that is less optimized for multiple threads.