AMD's Bulldozer Architecture: Overclocking Efficiency Explored

The Bulldozer Platform: Using FX-8150 To Test

Within the Zambezi line-up, three models employ two Bulldozer modules totaling four cores, one includes three modules adding up to six cores, and three are fully featured four-module SKUs boasting eight cores. We're using the flagship FX-8150 for testing here. It currently sells for about $270 and, again, drops into the Socket AM3+ interface.

As you no doubt already know, AMD counts cores differently from Intel (and even from its own previous architectures). In contrast to chip-level multiprocessing, where each core is complete and distinct, AMD uses the phrase chip-level multi-threading and draws a distinction between modules and cores. The emphasis here is on efficiency in a multi-core design, positing that the days of single cores operating on their own are over. Rather than cramming as many cores as possible onto a piece of silicon and brute-forcing the performance story, AMD tries to strike a balance by duplicating only key resources, theoretically adding complexity where it'll be best utilized and avoiding waste in less sensitive parts of the chip.

Again, from our launch coverage of FX-8150:

...the Bulldozer module doesn’t incorporate two complete cores. Instead, it shares certain parts of what we’d expect to find as dedicated resources in a typical execution core, including instruction fetch and decode stages, floating-point units, and the L2 cache.

According to Mike Butler, chief architect behind Bulldozer, this is justifiable because traditional cores operating in a power-constrained environment don’t make optimal use of thermal headroom. That completely makes sense; when you’re trying to pack as many cores into a server as possible, you want to bias in favor of the resources most likely to be used most often, and avoid chewing up die space/power with components that can be shared without negatively impacting performance too severely.

...but simultaneously optimizing for performance and power necessitates sharing of certain resources.

The decision to share only bites you in the butt when both threads need the same resources, at which point performance drops relative to chip-level multiprocessing. But AMD is optimistic: last August, when it started releasing architectural details at the Hot Chips conference, it estimated that a Bulldozer module could average 80% of two complete cores, while only affecting die space minimally. As a result, in heavily-threaded environments, a Bulldozer-based processor should deliver significant efficiency improvements.
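To put that 80% estimate in perspective, here is a back-of-the-envelope sketch in Python. It assumes the figure applies uniformly to every module and that throughput adds up linearly across modules, neither of which AMD claims; the function name and constant are ours, purely for illustration.

```python
# Back-of-the-envelope estimate, not a benchmark.
# Assumption: every module delivers AMD's estimated 80% of two complete
# cores when both of its threads are busy, and throughput adds up linearly.

MODULE_EFFICIENCY = 0.80  # AMD's Hot Chips estimate quoted in the text


def core_equivalents(modules: int, efficiency: float = MODULE_EFFICIENCY) -> float:
    """Return how many 'complete core' equivalents a module count works out to."""
    return modules * 2 * efficiency


if __name__ == "__main__":
    for modules in (2, 3, 4):
        cores = modules * 2
        print(f"{modules} modules ({cores} cores): "
              f"~{core_equivalents(modules):.1f} complete-core equivalents")
```

Under those assumptions, a four-module, eight-core chip would behave like roughly 6.4 complete cores in a fully threaded workload.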

This also means AMD has to redefine what actually constitutes a core. To best accommodate its Bulldozer module, the company is saying that anything with its own integer execution pipelines qualifies as a core (no surprise there, right?), if only because most processor workloads emphasize integer math. I don’t personally have any problem with that definition. But if sharing resources negatively impacts per-cycle performance, then AMD necessarily has to lean on higher clocks or a greater emphasis on threading in order to compensate.
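The clock-speed side of that trade-off can be sketched with the usual first-order model, in which single-thread performance is proportional to instructions per cycle multiplied by frequency. The 10% per-cycle deficit below is a hypothetical number chosen for illustration, not a measurement, and the function name is ours.

```python
# First-order model: performance ~ IPC * clock frequency.
# The IPC ratio used in the example is hypothetical, not a measured value.


def required_clock_ghz(base_clock_ghz: float, ipc_ratio: float) -> float:
    """Clock needed to match a reference core running at base_clock_ghz.

    ipc_ratio is this core's per-cycle performance relative to the
    reference (for example, 0.9 means 10% less work per cycle).
    """
    if not 0 < ipc_ratio <= 1:
        raise ValueError("ipc_ratio must be in the range (0, 1]")
    return base_clock_ghz / ipc_ratio


if __name__ == "__main__":
    # Hypothetical: 10% lower per-cycle performance at a 3.6 GHz reference.
    print(f"{required_clock_ghz(3.6, 0.9):.2f} GHz needed to break even")
```

In this simplified model, a core that does 10% less work per cycle needs about 4.0 GHz to match a 3.6 GHz reference, before accounting for the extra power that higher clocks demand.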

Learning To Share

Of course, AMD’s architects were careful in deciding which parts of the core could be shared, keeping power and efficiency in mind. As an example, following a branch misprediction, the front-end of a conventional core has to be flushed, wasting both bandwidth and power. Sharing that hardware between two cores helps improve the utilization of those resources. AMD also looked for areas where it could “afford” to share without hurting the timing of critical paths, hence the shared floating-point scheduler, which wasn’t considered to be as latency-sensitive as the integer units.

To the operating system, the resulting module appears as a pair of cores, similar to how a Hyper-Threaded core would appear. AMD is naturally eager to dispel the idea that Bulldozer will behave anything like Hyper-Threading (or SMT), claiming that its design facilitates better scalability than two threads sharing one physical core. Again, that makes sense—a Bulldozer module really can’t be characterized as a single core because many of its resources are, in fact, duplicated.
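As a rough illustration of what the operating system sees, the sketch below groups a flat list of core IDs back into modules. It assumes the two cores of a module are numbered consecutively (0 and 1, 2 and 3, and so on), which is our assumption rather than something stated here; real enumeration depends on the operating system and firmware.

```python
# Illustrative only: maps OS-visible core IDs back to Bulldozer modules.
# Assumption: sibling cores in a module have consecutive IDs.

CORES_PER_MODULE = 2


def group_into_modules(core_count: int) -> list[tuple[int, ...]]:
    """Return core IDs grouped per module, e.g. [(0, 1), (2, 3), ...]."""
    if core_count % CORES_PER_MODULE != 0:
        raise ValueError("core count must be a multiple of cores per module")
    return [
        tuple(range(first, first + CORES_PER_MODULE))
        for first in range(0, core_count, CORES_PER_MODULE)
    ]


if __name__ == "__main__":
    # An FX-8150 presents eight cores to the operating system.
    for module_id, cores in enumerate(group_into_modules(8)):
        print(f"module {module_id}: cores {cores}")
```

Two threads placed on the same tuple compete for that module's shared front end, floating-point hardware, and L2 cache, while threads on different tuples do not.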

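| Model | Base Clock | Turbo-Core Clock | Max. Turbo Clock | TDP | Cores | Total L2 Cache | L3 Cache | North Bridge Freq. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FX-8150 | 3.6 GHz | 3.9 GHz | 4.2 GHz | 125 W | 8 | 8 MB | 8 MB | 2.2 GHz |
| FX-8120 | 3.1 GHz | 3.4 GHz | 4.0 GHz | 125 / 95 W | 8 | 8 MB | 8 MB | 2.2 GHz |
| FX-8100 | 2.8 GHz | 3.1 GHz | 3.7 GHz | 95 W | 8 | 8 MB | 8 MB | 2.0 GHz |
| FX-6100 | 3.3 GHz | 3.6 GHz | 3.9 GHz | 95 W | 6 | 6 MB | 8 MB | 2.0 GHz |
| FX-4170 | 4.2 GHz | - | 4.3 GHz | 125 W | 4 | 4 MB | 8 MB | 2.2 GHz |
| FX-B4150 | 3.8 GHz | 3.9 GHz | 4.0 GHz | 95 W | 4 | 4 MB | 8 MB | 2.2 GHz |
| FX-4100 | 3.6 GHz | 3.7 GHz | 3.8 GHz | 95 W | 4 | 4 MB | 8 MB | 2.0 GHz |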
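A few derived figures fall out of the spec table above. The short Python sketch below computes the module count, the shared L2 per module, and the maximum turbo headroom over base clock for each model. The figures are copied from the table; the helper name and the two-cores-per-module constant are ours.

```python
# Figures copied from the Zambezi spec table.
# (model, base GHz, max turbo GHz, cores, total L2 in MB)
ZAMBEZI = [
    ("FX-8150", 3.6, 4.2, 8, 8),
    ("FX-8120", 3.1, 4.0, 8, 8),
    ("FX-8100", 2.8, 3.7, 8, 8),
    ("FX-6100", 3.3, 3.9, 6, 6),
    ("FX-4170", 4.2, 4.3, 4, 4),
    ("FX-B4150", 3.8, 4.0, 4, 4),
    ("FX-4100", 3.6, 3.8, 4, 4),
]

CORES_PER_MODULE = 2  # each Bulldozer module exposes two integer cores


def derive(base_ghz, max_turbo_ghz, cores, l2_mb):
    """Return (modules, L2 MB per module, max turbo headroom in percent)."""
    modules = cores // CORES_PER_MODULE
    l2_per_module = l2_mb / modules
    headroom_pct = (max_turbo_ghz / base_ghz - 1) * 100
    return modules, l2_per_module, headroom_pct


if __name__ == "__main__":
    for model, base, turbo, cores, l2 in ZAMBEZI:
        modules, l2_each, headroom = derive(base, turbo, cores, l2)
        print(f"{model:9s} {modules} modules, {l2_each:.0f} MB L2 per module, "
              f"+{headroom:.1f}% max turbo over base")
```

Every model works out to 2 MB of L2 per module, which matches the description of the L2 cache as a per-module shared resource.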
  • aznshinobi
    Reading the conclusion paragraph, I'd have to agree. I think they probably would've been better off using the STARS arch and just die shrinking it to 32nm.
  • Darkerson
    I know I have been critical in my comments here and there, but I really do hope Bulldozer helps AMD learn and refine Piledriver and future CPUs so that they are all better as a result. I know I will be skipping BD, but that doesn't mean I don't ever want to use AMD again. I will always root for the underdog, in hopes that we have another Athlon 64 on our hands.
  • hellfire24
    gulftown=expensive and useless.
    Sandybridges=king of the hill(price to performance)
    Sandybridge-E=expensive sandybridge.
    Bulldozer=budget cpu with multitasking capabilities.
  • deadon2
    Fehh... did my build on a 990fx platform with a 955be CPU. Runs plenty fast, and I can upgrade the AM3+ in a year when AMD gets it right.

    Although I appreciate the work done on this article...

    Nothing to see here folks, move along...
  • dontcrosthestreams
    im just fine with my 110$ 955be.... 29 deg idle at 3.7ghz
  • noob2222
    Is that a typo on page 7 and 8? "Clock Frequency: 4.5 GHz, Multiplier: 22.5x, CPU Voltage: 1.428 V" cpu-z shows 1.380? page 8 cpu z shows 1.44 and not 1.5.

    As for my own efficiency testing, I achieved 1.375V (cpu z), 4.4GHz out of my 8120 with ease. I upped the NB to 1.115V (+.015V), which added more stability, and clocked the NB to 2600 to match HTT, which brought another 1GB/s on Sandra's memory test. All without disabling C1E or C3 states.

    Would be nice to see some followups with memory testing, BD responds to fast speeds. Hard to read since its in a different language but the graphs are easy enough to see
    http://www.planet3dnow.de/vbulletin/showthread.php?t=401023&garpg=13
  • Tom's Hardware finds that overclocking increases speed, power requirements. Film at 11.
  • de5_Roy
    yay! another efficiency article from toms. :love:
    sad to see amd's claims about efficiency turn out to be (much) less than accurate.
    some people are definitely gonna complain about the ram used (ddr3 1333) and windows 8 or lack of highly threaded benchmarks like truecrypt encryption or pov ray tracing (as if those are always used by regular users lol) and stuff.
    undervolting does look promising...but it doesn't seem to make any difference compared to sandy bridge systems. worse, bulldozer needs voltage increase to get more clockspeed.. i guess it will be more evident in fx 4100 and 6100 where substantial core voltage increase is necessary to get stock sandy bridge level performance out of them. that's just disappointing.
  • memadmax
    It seems to me that Bulldozer is either a AMD bastard child chip, or it's a first gen chip of which subsequent generations of the architecture will be playing "catch up" performance wise. Otherwise, it's typical AMD trying to be efficient rather than a heavy hitter.

    But if you ask me, this is a "defensive" chip in the processor wars. And no war has been won playing defense.
  • shinkueagle
    memadmax wrote: It seems to me that Bulldozer is either a AMD bastard child chip, or it's a first gen chip of which subsequent generations of the architecture will be playing "catch up" performance wise. Otherwise, it's typical AMD trying to be efficient rather than a heavy hitter. But if you ask me, this is a "defensive" chip in the processor wars. And no war has been won playing defense.
    Meaning this war is a TOTAL loss to AMD... SADLY... AMD - ABSURDLY MORONIC DEVICES.