Each Bulldozer module is on its own clock domain, meaning multiple modules can operate at different frequencies simultaneously. This is new functionality compared to Phenom II, which ran all of its cores at the same speed (but had a number of intermediate p-states from which to choose). It isn't new to AMD, though: the company tried this approach on its original Phenom.
If you remember back far enough, separate clock domains caused a problem with Phenom processors in Windows Vista when Cool’n’Quiet was enabled. Through a process called migration, the operating system’s scheduler would move threads between cores in an effort to maintain symmetry under load. Why? From my Intel Lynnfield launch story:
“In order to maintain the symmetry of a system under full load, you don’t want I/O to become dependent on just one core. If you keep threads rotating between cores running at their maximum performance (this whole concept goes out the window when you start talking about spinning cores down), you get better responsiveness.
This was an implementation decision made during Microsoft’s Windows NT kernel design, and based on our experiences with both processor vendors' hardware, it wasn't considered a "feature" to either company. Of course, it affected Intel in a much different way than AMD. The problem Intel had in Vista was one of power consumption. For every migration, you had to write-combine the Nehalem architecture’s L3 cache, which cost power.
This changes with Windows 7 and a feature called ideal core. If a task’s needs are being addressed by one core, the operating system will let you stay there. This means two things to Intel: first, you don’t use power on the migration, and second, idle cores are able to remain in a C6 state. Purportedly, this migration fix alone will yield an extra 10 to 15 minutes of battery life on Nehalem-based notebooks, though this won’t become a major issue until the mobile dual-core Arrandale launches later this year. Perhaps more interesting, though, is that processors without C6 will not realize this gain (including AMD’s CPUs).”
So, while Phenom may have been a bit before its time given Vista’s scheduler, Windows 7 should handle AMD’s design in a more elegant manner. But even beyond that, Larry Hewitt, chief SoC engineer for Zambezi, Interlagos, and Valencia, says that the time it takes Bulldozer to spin up from its minimum p-state is less than it was on Phenom.
Naturally we wanted to put Larry’s claim to the test. You can’t see this in the graph up above, but the Phenom II, which fixed Phenom’s migration problem by running cores at the same speed, sees no performance difference in PCMark 7 with Cool’n’Quiet on or off, just as we'd expect. The same goes for FX-8150, confirming that Zambezi and Windows 7 behave themselves. What’s really interesting, though, is how effective AMD’s power-oriented optimizations to Bulldozer really are. The blue and green lines are the FX and Phenom II X6 with CnQ turned on. Black and red are the same two chips with CnQ turned off (again, respectively).
We find that the Phenom II X6 averages 204 W of system power use with CnQ turned off and 191 W with it turned on, a 13 W difference. The FX-8150 averages the same 191 W with CnQ enabled, but it jumps up to 240 W with the feature turned off. On average, CnQ cuts the FX-8150's system power use by an astounding 49 W during the benchmark run, without negatively affecting performance!
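For readers who want to check the arithmetic, here is a minimal sketch that recomputes the savings from the four averages quoted above. The dictionary layout and the `cnq_savings` helper are illustrative names of our own, not part of any measurement tool, and the percentages are simply derived from those same four numbers.

```python
# Average system power (watts) during the PCMark 7 run, as quoted in the text.
avg_power_w = {
    "Phenom II X6": {"cnq_off": 204, "cnq_on": 191},
    "FX-8150": {"cnq_off": 240, "cnq_on": 191},
}


def cnq_savings(readings):
    """Return (watts saved, percent saved relative to the CnQ-off average)."""
    saved = readings["cnq_off"] - readings["cnq_on"]
    percent = 100 * saved / readings["cnq_off"]
    return saved, percent


for chip, readings in avg_power_w.items():
    saved, percent = cnq_savings(readings)
    print(f"{chip}: {saved} W saved ({percent:.1f}% of the CnQ-off average)")
```

Run as written, this reports 13 W (about 6.4%) for the Phenom II X6 and 49 W (about 20.4%) for the FX-8150.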
The integrated northbridge and L3 cache complex is on its own clock domain. Additionally, it has its own power domain. Power gating, which was introduced by Intel in its Nehalem design but only just implemented by AMD in the Llano-based APUs, is purportedly used extensively in this SoC to minimize leakage current when parts of the chip aren’t in use.
As with Llano, Zambezi/Valencia/Interlagos-based chips support the Core C6 state, where a Bulldozer module’s cache is flushed, its context is exported to system memory, and all voltage is removed from it. The result is that, for every module you can put to sleep, power consumption and heat dissipation both drop. This is doubly beneficial taken in context with the Windows 7 scheduler change I just mentioned, which should allow idle Bulldozer modules to stay off longer (this has to happen at the module level, not the core level).
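To make the module-level requirement concrete, here is a toy sketch of which modules would be candidates for C6 given a set of busy cores. It is not AMD's gating logic. It assumes a four-module, eight-core part like the FX-8150, a hypothetical numbering in which cores 2m and 2m+1 belong to module m, and the simplification that a module qualifies only when both of its cores are idle. The `gateable_modules` function is our own illustrative name.

```python
CORES_PER_MODULE = 2  # each Bulldozer module pairs two integer cores


def gateable_modules(busy_cores, num_modules=4):
    """Return indices of modules whose cores are all idle (C6 candidates).

    Assumes cores 2*m and 2*m + 1 belong to module m (hypothetical numbering).
    """
    busy_modules = {core // CORES_PER_MODULE for core in busy_cores}
    return [m for m in range(num_modules) if m not in busy_modules]


# Two busy threads sharing module 0 leave three modules free to sleep.
print(gateable_modules({0, 1}))  # [1, 2, 3]

# The same two threads spread across modules 0 and 1 leave only two.
print(gateable_modules({0, 2}))  # [2, 3]
```

The comparison shows why a scheduler that keeps a thread where it is, rather than rotating it across cores, gives more modules the chance to stay powered down.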
C1E support isn’t anything new for AMD, but it’s improved in that all of the Bulldozer modules can be power gated while the northbridge, HyperTransport links, and DRAM are all dropped into a very low power state.