When Intel designed Nehalem and its successors, it had two goals in mind: boost the performance of single- and lightly-threaded apps, and boost the performance of heavily multithreaded apps. The guiding principle was that if a feature didn't improve performance by at least twice as much as it cost in extra power or die area, it didn't make the grade into the design. For the first goal, Intel used Turbo Boost to raise the clocks on the one or two cores actually doing work, while downclocking or powering off the cores that had no threads to run.
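That trade can be sketched with a toy model: the clock multiplier rises by a few "bins" of the 133 MHz Nehalem base clock depending on how many cores are active. The multiplier and bin counts below are hypothetical placeholders, not taken from any specific SKU's datasheet.

```python
BCLK_MHZ = 133   # Nehalem-era base clock
BASE_MULT = 20   # hypothetical base multiplier -> 2660 MHz

# Hypothetical turbo bins: extra multiplier steps allowed,
# keyed by how many cores are currently active.
TURBO_BINS = {1: 2, 2: 1, 3: 1, 4: 1}

def core_clock_mhz(active_cores: int) -> int:
    """Clock of each active core under this toy Turbo Boost model."""
    if active_cores not in TURBO_BINS:
        raise ValueError("active_cores must be 1..4")
    return BCLK_MHZ * (BASE_MULT + TURBO_BINS[active_cores])

for n in sorted(TURBO_BINS):
    print(f"{n} active core(s): {core_clock_mhz(n)} MHz each")
```

The shape matches the idea in the paragraph above: with one busy core the chip spends its thermal headroom on extra bins for that core; with all cores busy, each gets less headroom.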
For Hyper-Threading (HT), Intel spent something under 5% of extra die area duplicating registers and other per-thread core resources, so that one physical core could track two threads at once and present itself to the OS as two logical cores. According to Intel, this yields up to a 30% performance increase when the first thread leaves a certain fraction of its clock cycles free (i.e., cycles where that thread isn't actually doing anything except waiting on memory or on input from another thread). IOW, instead of letting a high-powered core slack off during those stalls, Intel decided to put it to use running a second thread in the otherwise-wasted cycles.
Where HT doesn't help, or actually decreases performance, is when the first thread has few or no free clock cycles to give up. Force the core to share with a second thread anyway and the first thread slows down. And if the second thread is also a heavy thread (no free clocks of its own), it won't see anywhere near the benefit it would get from running on its own physical core.
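Both regimes fall out of a simple issue-slot model. This is my own back-of-the-envelope sketch, not Intel's numbers: each thread alone uses some fraction of the core's issue slots, SMT lets the second thread fill the first one's stall slots, and the core can never exceed 100% utilization.

```python
def smt_throughput(stall_frac_a: float, stall_frac_b: float):
    """Toy model of two threads sharing one SMT core.

    Returns (thread A's solo utilization, combined utilization of the
    shared core), both as fractions of the core's peak issue rate.
    """
    solo_a = 1.0 - stall_frac_a   # slots thread A can use on its own
    solo_b = 1.0 - stall_frac_b   # slots thread B can use on its own
    combined = min(1.0, solo_a + solo_b)  # core caps out at 100%
    return solo_a, combined

# Stall-heavy threads: B fills A's idle slots, total throughput jumps.
print(smt_throughput(0.4, 0.4))   # core goes from 60% busy to 100% busy

# Compute-bound threads: no spare slots, so the two threads just split
# the core and each runs at roughly half its solo speed.
print(smt_throughput(0.0, 0.0))   # combined is capped at 100%
```

When stalls are plentiful, the combined throughput beats a single thread by a wide margin for almost no hardware cost; when both threads are compute-bound, the cap bites and each thread would be better off on its own physical core, which is exactly the trade-off described above.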
So with Nehalem and later designs, Intel covered both ends of the thread spectrum: a clock boost when only a thread or two is running, and extra logical cores when many are.