From what I know, here goes (somebody feel free to correct me if I goof up):
1. The CPU must be able to have its multiplier and Vcore changed while running. In other words, it must be SpeedStep-capable. Pentium III-M, Pentium 4-M, Pentium M, Core Solo/Duo, Pentium 4 6xx, and Pentium D 830, 840, and 840EE chips support this. (Most AMD Athlon 64s do too; AMD calls it Cool'n'Quiet, but it works the same way.)
2. The chipset must be able to change the multiplier and Vcore while the chip is running. Most chipsets past Intel's 810 and NVIDIA's nForce4 support this. I think ATI's chipsets may as well.
3. The BIOS generally offers a frontend to enable/disable SpeedStep in the chipset.
4. The OS is where most of the work happens. The system is polled for load every X milliseconds and the multiplier/Vcore are adjusted to suit the load. Rising load = multiplier and Vcore go up, falling load = multiplier and Vcore come down. This polling can happen either in the kernel or in userspace, depending on the OS you're running and the utility used to scale the frequency. The scaling pattern is determined by how the OS's SpeedStep governors are set up: in Windows XP, through the Power Options control panel or a 3rd-party program; in Linux, through a utility like powernow, powersave, or cpudyn.
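
To make step 4 a bit more concrete, here's a rough Python sketch of what a governor's polling loop boils down to. Everything in it is made up for illustration: the multiplier/Vcore table, the 80%/30% thresholds, and the names next_pstate and simulate are my own assumptions, not what any real governor or CPU uses. It only models the decision, it doesn't touch real hardware.

```python
# Hypothetical table of (multiplier, Vcore in volts), slowest state first.
# The numbers are invented for illustration, not taken from any datasheet.
PSTATES = [(6, 0.988), (8, 1.116), (10, 1.228), (12, 1.356)]

UP_THRESHOLD = 0.80    # assumed: go to full speed when load exceeds 80%
DOWN_THRESHOLD = 0.30  # assumed: step down when load falls below 30%


def next_pstate(index, load):
    """Pick the state index to use after one polling interval.

    index: current position in PSTATES
    load:  CPU utilisation over the last interval, from 0.0 to 1.0
    """
    if load > UP_THRESHOLD:
        return len(PSTATES) - 1   # high load: jump straight to the top state
    if load < DOWN_THRESHOLD and index > 0:
        return index - 1          # low load: drop one state at a time
    return index                  # in between: leave things alone


def simulate(loads, index=0):
    """Feed a series of load samples through the policy, return multipliers."""
    history = []
    for load in loads:
        index = next_pstate(index, load)
        history.append(PSTATES[index][0])
    return history


if __name__ == "__main__":
    # One sample per polling interval: idle, a burst, then idle again.
    print(simulate([0.1, 0.9, 0.5, 0.2, 0.2, 0.2, 0.2]))
```

Jumping straight to the top state on high load and stepping down gradually is just one possible policy (it keeps the machine responsive at the cost of some power). A different governor could step up one state at a time or simply pin the chip at one end of the table.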