Efficiency Features On-Board
There is a lot to tell about Sandy Bridge when it comes to power and efficiency. Although this is the first Intel consumer processor that fully integrates a graphics unit on the processor die, the architecture was in fact built to be very modular.
All Sandy Bridge CPUs are split into three sections, each with its own power and frequency domain. The first is the System Agent, which includes the memory controller and PCI Express. The second contains the cores with the shared L3 cache (now called the last-level cache) and the ring bus. The third is the graphics unit. Each of these units is scalable as well, meaning that six or eight cores (instead of two or four) are already planned for higher-end platforms.
Flexible TDP Usage
A key to high power efficiency is Sandy Bridge’s ability to let each of these CPU sections use a large portion of the thermal and power envelopes when needed and where possible, while the others switch into a lower power state. No other platform currently shows such a significant difference between idle power and peak load power. Let me give you an example. The Core i7-2600K system with processor graphics requires only 32 W at idle, but jumps to 136 W once we switch on Prime95. This is a 4.25x jump in system power consumption. If we imagine a dual-socket, eight-core system subject to the same increase, the absolute numbers would be pretty incredible. Lastly, let’s recall that not only the cores but also the graphics unit have a dynamic frequency range, which can reach 1100 MHz on Core i5 and up to 1350 MHz on Core i7.
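For readers who want to reproduce the idle-to-load arithmetic, here is a minimal Python sketch. The two wattage figures are the ones quoted above; the function names are our own and purely illustrative.

```python
# System power figures quoted in the text (watts).
IDLE_W = 32.0
LOAD_W = 136.0


def power_ratio(idle_w: float, load_w: float) -> float:
    """Return load power as a multiple of idle power."""
    if idle_w <= 0:
        raise ValueError("idle power must be positive")
    return load_w / idle_w


def power_delta(idle_w: float, load_w: float) -> float:
    """Return the absolute increase from idle to load, in watts."""
    return load_w - idle_w


print(f"ratio: {power_ratio(IDLE_W, LOAD_W):.2f}x")
print(f"delta: {power_delta(IDLE_W, LOAD_W):.0f} W")
```

The ratio works out to 4.25, which corresponds to an absolute increase of 104 W for this system.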
Turbo Boost 2.0
One of the key features that helps improve overall efficiency, linked closely to TDP utilization, is the second-generation Turbo Boost technology. Depending on the processor model, it triggers a temporary increase of core clock speeds (up to the maximum Turbo Boost frequency) if there is significant load. Unlike earlier implementations, Turbo Boost 2.0 can involve all available cores, and it will increase the clock speed for as long as the thermal and power envelopes aren’t fully utilized. In real life, this means that a Sandy Bridge-based Core i5 or i7 will run at an increased clock speed for a limited period of time. Once the thermal and power limits are reached, the processor reduces the clock speed until power and clock speed even out. In the worst case, this is the processor’s rated clock speed.
Turbo Boost alone does not necessarily increase efficiency, but a system with idle power as low as the Sandy Bridge generation's should stay in that low-power state for as long as possible. This means that it should finish pending workloads as quickly as possible, so it can return to idle sooner.
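The trade-off described here can be sketched with a simple energy model: over a fixed time window, the system spends part of the time busy at load power and the rest at idle power. The Python sketch below is only an illustration under stated assumptions. The 32 W idle and 136 W load figures come from our measurements above, while the 60-second window, the 120 W non-Turbo load power, and both run times are made-up values, not measurements.

```python
def window_energy_j(busy_s: float, busy_w: float,
                    idle_w: float, window_s: float) -> float:
    """Energy in joules over a fixed window: a busy phase, then idle."""
    if not 0 <= busy_s <= window_s:
        raise ValueError("busy time must fit inside the window")
    return busy_w * busy_s + idle_w * (window_s - busy_s)


IDLE_W = 32.0     # idle system power quoted in the text
WINDOW_S = 60.0   # assumed observation window

# Hypothetical run without Turbo: slower, lower load power (assumed values).
base_j = window_energy_j(busy_s=20.0, busy_w=120.0,
                         idle_w=IDLE_W, window_s=WINDOW_S)

# Hypothetical run with Turbo: faster, at the 136 W load power from the text.
turbo_j = window_energy_j(busy_s=16.0, busy_w=136.0,
                          idle_w=IDLE_W, window_s=WINDOW_S)

print(f"without Turbo: {base_j:.0f} J, with Turbo: {turbo_j:.0f} J")
```

In this model the faster run only wins if the extra power above idle, multiplied by the shorter run time, is smaller than the same product for the slower run. With other inputs the slower run can come out ahead, which is the sense in which Turbo Boost alone does not guarantee better efficiency.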
Modern 32 nm Manufacturing
Although this is not an efficiency-oriented feature by itself, the 32 nm manufacturing process is definitely a key enabler for maximum performance per watt. Smaller gates and transistors translate into lower operating voltage and less power consumption. Ultimately, this allows Intel to use its transistor budget and die area flexibly and in a smart manner. This is where the paradigm change is happening: simply adding cores is a brute-force approach, similar to cranking up clock speeds. It increases power consumption, and performance does not scale linearly with it. We know for a fact that, given the threading optimizations available in software today, eight cores will not provide twice the performance of four cores. However, power consumption will be more than 2x higher. As a result, Intel's engineers had to put a lot of thought into determining the functions they wanted to accelerate in hardware, implementing improvements as efficiently as possible while maintaining or reducing power consumption. The trick is not just to improve performance per unit of power, but also to consider performance per square millimeter.
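One common way to illustrate why doubling the core count does not double performance is Amdahl's law, which we bring in here as our own illustration; it is not taken from Intel's material. The sketch below assumes a workload whose parallel fraction is 0.9, a value picked purely for the example.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Speedup over one core when only part of the work runs in parallel."""
    if cores < 1:
        raise ValueError("need at least one core")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


PARALLEL_FRACTION = 0.9  # assumed share of the workload that scales with cores

four = amdahl_speedup(4, PARALLEL_FRACTION)
eight = amdahl_speedup(8, PARALLEL_FRACTION)

print(f"4 cores: {four:.2f}x, 8 cores: {eight:.2f}x")
print(f"gain from 4 to 8 cores: {eight / four:.2f}x")
```

Under this assumption, going from four to eight cores yields roughly a 1.5x gain rather than 2x. Only a perfectly parallel workload (a fraction of 1.0) would reach the full doubling.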
Cool and Really Cool
The Intel briefing documents on Sandy Bridge describe features that the company calls either cool or really cool. A lot of the optimization effort for Sandy Bridge went into finding microarchitecture enhancements that provide a better-than-linear improvement in performance/power. A ‘cool’ feature improves performance while keeping the corresponding increase in power consumption linear at worst. Ideally, power consumption should increase less than performance.
A ‘really cool’ feature has an even more significant impact, as it gains performance while actually reducing power consumption. This applies to the upgraded branch predictor paired with the decoded µops cache, which allows the decoders to stay off more often. All other core improvements are referred to as cool.
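To make the two categories concrete, here is a small Python sketch that sorts a feature by its relative performance gain and power change. The function name, the percentage-based inputs, and the "neither" label are our own assumptions for illustration; Intel's documents describe the categories only in prose.

```python
def classify_feature(perf_gain_pct: float, power_change_pct: float) -> str:
    """Classify a feature using the 'cool' / 'really cool' definitions above.

    perf_gain_pct:    relative performance improvement, in percent
    power_change_pct: relative power change, in percent (negative = savings)
    """
    if perf_gain_pct <= 0:
        return "neither"       # no performance gain, so neither label applies
    if power_change_pct < 0:
        return "really cool"   # faster while actually using less power
    if power_change_pct <= perf_gain_pct:
        return "cool"          # power grows no faster than performance
    return "neither"           # power grows faster than performance


# Hypothetical example values, not Intel data.
print(classify_feature(5.0, 3.0))
print(classify_feature(5.0, -2.0))
print(classify_feature(5.0, 8.0))
```

The middle case is the pattern the text attributes to the branch predictor and decoded µops cache: performance goes up while power goes down.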