Single Floating-Point Unit, AVX Performance, And L2

AMD Bulldozer Review: FX-8150 Gets Tested
Two Cores, One FPU

The shared floating-point unit is separate from the integer pipelines. So, when operations hit the dispatch interface at the end of the decode stage on their way to the integer units, any floating-point operations in that stream go to the floating-point scheduler instead. There, they compete for resources and bandwidth regardless of the thread to which they belong.

As you can see in the diagram below, AMD’s floating-point logic is different from the integer side. Its purpose is execution-only; it reports its completion and exception information back to the parent integer core, which is responsible for instruction retirement.

The floating-point unit features two MMX pipelines and a pair of 128-bit fused multiply-accumulate (FMAC) units. Those FMAC pipes support four-operand (FMA4) instructions, which give you a non-destructive result: the destination is separate from the three source operands, so none of them is overwritten. Intel plans to incorporate the three-operand format (FMA3) in its Haswell micro-architecture (the one to follow Ivy Bridge). AMD says it’ll also support FMA3 in the successor to Bulldozer, called Piledriver, expected in 2012.
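To make the "non-destructive" distinction concrete, here is a minimal sketch in Python. It is a toy register-file model, not real machine code: the register names and the helper functions `fma4` and `fma3` are hypothetical, and the three-operand ordering shown is only one possible choice.

```python
# Toy register-file model of the two FMA operand formats (illustrative only).
# Hardware FMA rounds once; Python's a * b + c rounds twice. That difference
# does not matter here because the example only shows operand handling.

def fma4(regs, dest, a, b, c):
    """Four-operand form: dest = a * b + c, all three sources preserved."""
    regs[dest] = regs[a] * regs[b] + regs[c]


def fma3(regs, a, b, c):
    """Three-operand form (assumed ordering): a = a * b + c, overwriting a source."""
    regs[a] = regs[a] * regs[b] + regs[c]


# Four operands: the result lands in x3 and x0..x2 keep their values.
regs4 = {"x0": 2.0, "x1": 3.0, "x2": 4.0, "x3": 0.0}
fma4(regs4, "x3", "x0", "x1", "x2")

# Three operands: the result replaces x0, so the original 2.0 is gone.
regs3 = {"x0": 2.0, "x1": 3.0, "x2": 4.0}
fma3(regs3, "x0", "x1", "x2")
```

With the three-operand form, code that still needs the overwritten input has to copy it to another register first.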

Any time we see vendors take divergent paths like this, we have to wonder how it’ll affect developers. So, we asked Adrian Silasi of SiSoftware what he expected to happen, and he pointed out that most developers won’t want to implement three code paths (one for AVX-only, one for AVX plus FMA3, and one for AVX plus FMA4). This makes good sense. And when you consider that few applications exploit AVX today and none of them utilize FMA, AMD should be in a better position to support all three paths when Piledriver becomes a reality.
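The three code paths can be pictured as a runtime dispatch on the CPU's feature flags. The sketch below is an assumption-laden illustration: the function `select_code_path`, the flag strings, the `"baseline"` fallback, and the preference for FMA3 when both FMA variants are present are all hypothetical and not taken from any real application.

```python
def select_code_path(features):
    """Pick one of the code paths discussed above from a set of feature flags.

    `features` is a set of lowercase flag names (hypothetical naming).
    """
    if "avx" not in features:
        return "baseline"   # assumed non-AVX fallback, not one of the three paths
    if "fma3" in features:
        return "avx_fma3"   # assumed preference when both FMA forms exist
    if "fma4" in features:
        return "avx_fma4"
    return "avx_only"


# Feature sets as described in the article (AVX on both, FMA4 and XOP on Bulldozer,
# FMA3 added with Piledriver).
sandy_bridge = {"avx"}
bulldozer = {"avx", "fma4", "xop"}
piledriver = {"avx", "fma3", "fma4", "xop"}

paths = {
    "sandy_bridge": select_code_path(sandy_bridge),
    "bulldozer": select_code_path(bulldozer),
    "piledriver": select_code_path(piledriver),
}
```

Each extra branch is another routine to write, test, and maintain, which is the cost Silasi is pointing at.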

The more pressing question today is how Bulldozer’s AVX support will size up to Intel’s. Sandy Bridge gives you two 256-bit AVX operations per clock, while Bulldozer facilitates one.
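As a back-of-the-envelope illustration, the per-clock figures above translate into element counts as follows. This is a sketch of peak arithmetic only, using nothing but the numbers quoted in the paragraph; it ignores clock speed, core and module counts, memory bandwidth, and instruction mix. The helper name `lanes_per_clock` is hypothetical.

```python
AVX_WIDTH_BITS = 256  # width of one AVX operation


def lanes_per_clock(ops_per_clock, element_bits):
    """Peak number of elements processed per clock for a given element size."""
    return ops_per_clock * (AVX_WIDTH_BITS // element_bits)


# 32-bit single-precision elements, using the per-clock figures quoted above.
sandy_bridge_fp32 = lanes_per_clock(2, 32)
bulldozer_fp32 = lanes_per_clock(1, 32)

# 64-bit double-precision elements.
sandy_bridge_fp64 = lanes_per_clock(2, 64)
bulldozer_fp64 = lanes_per_clock(1, 64)
```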

Leading up to this launch, I started talking to Noel Borthwick, a talented musician and CTO of Cakewalk, Inc., about his company’s work optimizing Sonar X1 for AVX. According to a whitepaper that Noel co-authored, AVX instruction support helps reduce the CPU load imposed by audio bit-depth conversions, which are performed while streaming audio buffers through the playback graph, rendering, and mixing. Common conversions include 24-bit integer to 32-bit floating-point, 64-bit double-precision conversion, and 32-bit float to 64-bit double-precision.
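For readers unfamiliar with these conversions, here is a minimal scalar sketch of two of them in Python. It is not Cakewalk's code and uses no AVX; the function names simply echo the operation names, and the packed little-endian sample layout and the 2^23 scale factor are assumptions.

```python
import struct


def int24_to_float64(packed):
    """Convert packed little-endian signed 24-bit samples to floats in [-1.0, 1.0).

    Python floats are 64-bit doubles, so the result is double precision.
    """
    if len(packed) % 3 != 0:
        raise ValueError("buffer length must be a multiple of 3 bytes")
    scale = float(1 << 23)  # assumed full-scale value for 24-bit audio
    return [
        int.from_bytes(packed[i:i + 3], "little", signed=True) / scale
        for i in range(0, len(packed), 3)
    ]


def float32_to_float64(packed):
    """Widen packed little-endian 32-bit floats to Python's 64-bit floats."""
    if len(packed) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    count = len(packed) // 4
    return list(struct.unpack(f"<{count}f", packed))


# Three 24-bit samples: half scale, negative full scale, and zero.
samples24 = bytes([0x00, 0x00, 0x40, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00])
converted24 = int24_to_float64(samples24)

# Two 32-bit float samples that are exactly representable.
samples32 = struct.pack("<2f", 0.5, -1.25)
converted32 = float32_to_float64(samples32)
```

A vectorized routine does the same work on many samples per instruction, which is where wide AVX registers help.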

To that end, Noel sent over the binary for a test application that compares two of Cakewalk’s AVX-optimized routines to the unoptimized version. Both AMD and Intel have access to this very same metric, so its results shouldn’t come as a surprise to either company.

Architecture        | Operation             | Result (CPU Cycles Gained/Lost)
AMD Bulldozer       | Copy Int24toFloat64   | 61% Gain
AMD Bulldozer       | Copy Float32toFloat64 | 77% Loss
Intel Sandy Bridge  | Copy Int24toFloat64   | 69% Gain
Intel Sandy Bridge  | Copy Float32toFloat64 | 14% Gain


In the Copy Int24toFloat64 operation, Intel’s Core i7-2600K sees a 69% gain, while AMD’s FX-8150 realizes an also-impressive 61% gain. What does “a gain” actually constitute? We’re talking about the number of CPU cycles that AVX helps reduce, yielding an increase in potential processor bandwidth. Phrased differently, Sandy Bridge cuts the number of cycles by 1.69x, while Bulldozer reduces them by 1.61x.
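The percentage can be read two ways, and the arithmetic differs noticeably between them. The sketch below works through both readings; the helper names are hypothetical, and which reading the test application actually reports is not stated here, although the "1.69x" phrasing matches the first.

```python
def cycles_remaining_if_speedup(gain_pct):
    """Reading A: the gain is a speedup factor (69% means 1.69x faster).

    Returns the fraction of the original cycle count still needed.
    """
    return 1.0 / (1.0 + gain_pct / 100.0)


def speedup_if_cycle_reduction(gain_pct):
    """Reading B: the gain is the share of cycles eliminated (69% fewer cycles).

    Returns the resulting speedup factor.
    """
    return 1.0 / (1.0 - gain_pct / 100.0)


# Int24toFloat64 results from the table above.
for name, gain in (("Sandy Bridge", 69), ("Bulldozer", 61)):
    remaining = cycles_remaining_if_speedup(gain)
    speedup = speedup_if_cycle_reduction(gain)
    print(f"{name}: reading A leaves {remaining:.0%} of the cycles, "
          f"reading B implies a {speedup:.2f}x speedup")
```

Under reading A, a 69% gain leaves roughly 59% of the original cycles; under reading B, the same figure would imply a speedup of about 3.2x.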

On the other hand, in the Copy Float32toFloat64 operation, Core i7-2600K realizes a 14% gain as FX-8150 suffers a 77% loss. In trying to explain that loss, it seems that either Cakewalk’s vectorization code or, less likely, Microsoft’s native Visual Studio 2010 intrinsics aren’t optimized for AMD’s architecture. In either case, an application patch or Visual Studio service pack could be needed.

If you flip to the Sandra 2011 results, you’ll see that AVX support does help FX-8150’s integer and floating-point performance. Sandy Bridge simply realizes a much larger floating-point gain in this synthetic metric.

Just before we wrapped up testing, AMD forwarded along two versions of x264, the software library behind front-ends like HandBrake (you’ll see us test the latest version of HandBrake shortly). These builds incorporate support for AVX and XOP instructions, the latter of which is exclusive to AMD’s architecture.

I modified Tech ARP’s x264 HD Benchmark 4.0 to utilize each of the new code paths, plus CPU-Z 1.58 for system information, and ran FX-8150 through the pair, along with Core i5-2500K through the AVX-optimized build.

The results between AMD’s AVX and XOP code paths are pretty similar. Intel manages to finish the first pass faster, but AMD delivers better performance on the second pass.

Now, bear in mind that the number of AVX-optimized tests is small. It’s going to take a lot of software development work before we get a clearer picture of how AVX instruction support affects each of these architectures.

Sharing The L2

We already mentioned the shared L2 TLB responsible for servicing instruction- (front-end) and data-side (integer core) requests. However, there’s also a unified L2 cache shared between the two cores. This repository is 2 MB per module, giving you 8 MB of total L2 on a four-module FX-8000-series processor.

AMD says the Bulldozer module’s data prefetcher is also the product of significant power and silicon investment, which it gets away with by amortizing the cost across both cores.
