
More On Bulldozer

AMD’s Bulldozer And Bobcat Architectures Pave The Way
In reality, much of what AMD is discussing at Hot Chips is already known, resulting in a much-regurgitated slide deck covering the Bulldozer and Bobcat architectures.

Much of the company’s emphasis is on Bulldozer and its approach to threading. AMD draws a clear distinction between conventional simultaneous multi-threading (SMT, productized by Intel as Hyper-Threading) and chip-level multi-processing (CMP), employed by the six-core Thuban design, for example, where one core operates on one thread.

CMP is pretty straightforward. You replicate physical cores to scale performance in threaded software. It’s a brute-force approach that yields the best performance, but becomes very expensive for a manufacturer bumping up against the limits of its process technology, especially if execution resources are left idle. This is the exact reason we often recommend quick quad-core processors over slower six-core CPUs for gaming. Unless your workload is properly optimized for parallelism, CMP results in over-provisioning, and the higher clock rates of less-complex dual- and quad-core designs yield better performance.

Intel combats this with Hyper-Threading, which allows each physical core to work on two threads. Over-provisioning is assumed, meaning you rely on under-utilization to extract additional performance from each core. This is a relatively inexpensive technology. But it’s also quite limited in the benefits it offers. Some workloads don’t see any speed-up from Hyper-Threading. Others barely crack double-digit performance gains.  

AMD is trying to define a third approach to threading it calls Two Strong Threads. Whereas Hyper-Threading only duplicates architectural states, the Bulldozer design shares the front-end (fetch/decode) and back-end of the core (through a shared L2 cache), but duplicates integer schedulers and execution pipelines, offering dedicated hardware to each of two threads.

The pair of threads share a floating point scheduler with two 128-bit fused multiply-accumulate-capable units. Consequently, it’s clear that AMD’s emphasis here is integer performance, which makes sense given the company’s Fusion initiative and impending plan to have GPU resources handle floating-point work. Just bear in mind that the first Bulldozer-powered processors won’t be APUs. Despite the fact that it is sharing FP resources here, AMD remains confident in its balance between dedicated and shared components.

None of that is new, though. AMD talked about it all back in November of last year.

Ahead of the Hot Chips presentation, we had the opportunity to refresh what we knew about Bulldozer with Dina McKinney, corporate vice president of design engineering at AMD. According to Dina, the company’s Two Strong Threads approach achieves somewhere in the neighborhood of 80% of the performance you’d see from simply replicating cores. At the same time, sharing some resources helps cut back on power use and die space.

This development, along with a shift to 32 nm SOI manufacturing, is leading AMD to estimate a 33% increase in core count and a 50% increase in throughput (suggesting significant IPC gains) in the same power envelope as Magny-Cours-based Opteron processors. The projections here are based on simulated comparisons between today’s 12-core Opteron 6100-series chips and upcoming 16-core Bulldozer-based models, currently code-named Interlagos.
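To put those projections in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the figures quoted above (12 cores, 16 cores, a 50% throughput increase) and assumes, purely for illustration, that throughput scales linearly with core count. The variable names are our own. Also note that the per-core figure it produces could come from IPC, clock rate, or both; AMD's projection doesn't break that down.

```python
# Figures quoted in the article: 12-core Opteron 6100 vs. 16-core Interlagos.
old_cores = 12
new_cores = 16
throughput_gain = 0.50  # AMD's projected increase in total throughput

# Relative increase in core count.
core_count_gain = new_cores / old_cores - 1

# Assumption (ours, not AMD's): throughput scales linearly with core count,
# so whatever is left after dividing out the extra cores is a per-core gain.
per_core_gain = (1 + throughput_gain) / (new_cores / old_cores) - 1

print(f"core count gain: {core_count_gain:.1%}")                   # 33.3%
print(f"implied per-core throughput gain: {per_core_gain:.1%}")    # 12.5%
```

Under that assumption, roughly two-thirds of the projected gain comes from having more cores, with the remainder coming from each core doing more work in the same power envelope.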

Duplicated execution resources lead AMD to call this a dual-core implementation.

Now, one of the concerns I’ve seen brought up regarding AMD’s taxonomy is that a Bulldozer module looks like an SMT-enabled single-core processor. Only, instead of duplicating registers to store the architectural state, AMD gives each thread its own instruction window and dedicated pipelines. In talking back and forth with AMD’s John Fruehe, it’s clear that the company thinks that the duplication of integer schedulers and corresponding pipelines (disregarding the other shared components) makes each Bulldozer module a dual-core design, distinguishing it from SMT as it’s associated with Hyper-Threading. That gets a little marketing-heavy for me, but I can certainly respect that we’re looking at an architecture that’ll do much more for performance than Hyper-Threading in parallelized workloads.

I was also curious how Bulldozer modules are expected to interact with Windows 7. Intel and Microsoft put a deliberate effort into optimizing for Hyper-Threading. The operating system’s scheduler knows the difference between a physical core and a Hyper-Threaded core. If it has two threads to schedule, Windows 7 and Server 2008 R2 use two physical cores. The alternative—scheduling two threads to the same physical, Hyper-Threaded core—would naturally sacrifice performance. Because Bulldozer modules are still sharing resources, it’d stand to reason that a four-module Zambezi CPU would be best served by similarly handling two threads using different modules. Though AMD wasn’t able to address how it’ll handle this interaction, it assures me that it’s working with OS vendors on optimizations that’ll be ready for Bulldozer’s release.
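The placement policy described above, spreading threads across modules before doubling up inside any one module, can be sketched in a few lines of Python. This is a hypothetical illustration only: `place_threads` is our own name, it is not how Windows actually schedules, and AMD hasn't disclosed how the real interaction will work. The four-module, two-cores-per-module defaults reflect the Zambezi layout mentioned in the text.

```python
def place_threads(num_threads, num_modules=4, cores_per_module=2):
    """Return (module, core) slots, using one core in every module
    before placing a second thread in any module."""
    # Ordering by core index first means slot order is:
    # (m0,c0), (m1,c0), (m2,c0), (m3,c0), (m0,c1), ...
    slots = [(m, c) for c in range(cores_per_module) for m in range(num_modules)]
    if num_threads > len(slots):
        raise ValueError("more threads than integer cores")
    return slots[:num_threads]

print(place_threads(2))  # [(0, 0), (1, 0)]
print(place_threads(5))  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1)]
```

With two threads, each lands in its own module and shares nothing. Only with the fifth thread does a module have to host two threads and split its shared front-end, FP scheduler, and L2.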

Zambezi, based on Bulldozer, might just look like this.

I also asked John about the front-end’s instruction/cycle capabilities and the shared L2’s capacity configuration, but neither of those details is available yet. What he could tell me was that the 128-bit FP units are symmetrical, and that, on any cycle, either integer core can dispatch a 256-bit AVX instruction (assuming software compiled to support AVX), or both integer cores can each dispatch a single 128-bit instruction at the same time.
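One way to picture that sharing arrangement is as a toy capacity check in Python. This is a simplified model of our own, not AMD's scheduler logic: it assumes the two 128-bit units can be ganged for one 256-bit operation or used independently, and it ignores everything else about the pipeline. The function names are hypothetical.

```python
FP_UNITS = 2           # two symmetrical FP units per module (from the article)
UNIT_WIDTH_BITS = 128  # each unit is 128 bits wide (from the article)

def units_needed(width_bits):
    """Number of 128-bit units one instruction of this width occupies."""
    return -(-width_bits // UNIT_WIDTH_BITS)  # ceiling division

def fits_in_one_cycle(requests):
    """requests: instruction widths in bits, one entry per integer core
    that dispatches an FP instruction this cycle."""
    return sum(units_needed(w) for w in requests) <= FP_UNITS

print(fits_in_one_cycle([256]))       # True: one core, one 256-bit AVX instruction
print(fits_in_one_cycle([128, 128]))  # True: both cores, one 128-bit instruction each
print(fits_in_one_cycle([256, 128]))  # False in this toy model: three units' worth of work
```

The last case is an inference from the model rather than something AMD stated; it simply illustrates why two threads running 256-bit AVX code would have to take turns on the shared FP hardware.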

In addition, John clarified how each integer unit’s pipelines are oriented. Whereas K10 enables three pipelines shared between ALUs and AGUs (effectively 1.5 of each), Bulldozer increases this number to four pipelines—two dedicated AGU and two dedicated ALU. The L1 cache configuration is a bit different, too. Whereas K10 offered 64 KB of L1 instruction and 64 KB of L1 data cache per core, Bulldozer enables 16 KB of L1 data cache per core and 64 KB of 2-way L1 instruction cache per module. It remains to be seen how the smaller L1 affects performance.
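For a sense of scale, the L1 figures above can be tallied with a few lines of Python. The numbers come straight from the paragraph; the variable names and the "per-core equivalent" figure (which simply splits the shared instruction cache evenly between a module's two cores) are our own simplifications.

```python
# K10: 64 KB L1 instruction + 64 KB L1 data, both per core.
k10_l1_per_core_kb = 64 + 64

# Bulldozer: 64 KB L1 instruction per module, 16 KB L1 data per core,
# two integer cores per module.
bd_l1_per_module_kb = 64 + 2 * 16
bd_l1_per_core_kb = bd_l1_per_module_kb / 2  # even split of the shared I-cache

# How much smaller the per-core data cache becomes.
data_cache_ratio = 64 / 16

print(k10_l1_per_core_kb)   # 128
print(bd_l1_per_module_kb)  # 96
print(bd_l1_per_core_kb)    # 48.0
print(data_cache_ratio)     # 4.0
```

In other words, a two-core Bulldozer module carries less total L1 than a single K10 core, and each core's data cache is a quarter of what K10 offered.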

Top Comments
  • 36 Hide
    luke904 , August 24, 2010 7:53 AM
    ok everyone... I swear to god if AMD pulls this one off and bitch slaps Intel again... PARTY AT MY HOUSE!!!
  • 15 Hide
    randomizer , August 24, 2010 10:56 AM
    blink180heights: still won't be a core i7 920

    Really? Here I was thinking that's exactly what AMD was building.
  • 13 Hide
    buzznut , August 24, 2010 7:34 AM
    Thanks for the great article and valuable information. Someone scolded me for saying I wanted to wait for a bulldozer processor to upgrade. Thanks for clearing this up. Yes, Bulldozer is what I'm waiting for, prolly a Zamboni would be nice.
    By the way, just who the hell comes up with these ridiculous names? I personally think manufacturers would sell more units if they weren't so confusing.
Other Comments
  • 4 Hide
    notty22 , August 24, 2010 6:59 AM
    This will help spur a tick or a tock from Intel :) 
  • 12 Hide
    tacoslave , August 24, 2010 7:23 AM
    dogman_1234: No new news here, but I cannot wait until AMD releases "Bulldozer"! The thing that got me was the use of the cores. Really, all AMD needs now is a Commercial. Anybody?


    Put it in a mac, the sheeple will eat this Sh1t up.
  • 11 Hide
    Judguh , August 24, 2010 7:49 AM
    dogman_1234: No new news here, but I cannot wait until AMD releases "Bulldozer"! The thing that got me was the use of the cores. Really, all AMD needs now is a Commercial. Anybody?

    I'd have to say I partially agree with you. I see way more Intel commercials (many) than AMD (none). My next build: Bulldozer :D 
  • 2 Hide
    SpadeM , August 24, 2010 7:50 AM
    Quote:
    Details being discussed today include a dual-issue x86 decoder and out-of-order execution, perhaps enabling a performance advantage compared to Intel’s Atom CPUs.


    Not quite, the out of order execution WILL enable a performance advantage compared to Atom, + the added bonus that the AMD GPU on the Ontario platform (if similar or better in performance to the ION) WILL again make for a better platform as a hole.

    On the Bulldozer side, Power Gating and Turbo for modules, TST > SMT should be something to look forward.

    PS: On the commercials side I'm looking forward to something like this:
    http://www.youtube.com/watch?v=MK0hU0OYvCI
  • 3 Hide
    joytech22 , August 24, 2010 7:57 AM
    Excellent!! i skipped the Istanbul CPU's because i'm waiting to score on a "Scorpius" platform! and i know 100% that i will be able to afford it on release!
  • 9 Hide
    thomaseron , August 24, 2010 9:10 AM
    Hopefully, Bulldozer will easily outperform my 955BE. So when the time comes, Bulldozer, Northern islands and about 8GB RAM on a new motherboard. :-) I hope Bulldozer kicks the living daylight out of intel for like one or two generations, so we can get som balance on the market.
  • 1 Hide
    thomaseron , August 24, 2010 9:13 AM
    SpadeM: Not quite, the out of order execution WILL enable a performance advantage compared to Atom, + the added bonus that the AMD GPU on the Ontario platform (if similar or better in performance to the ION) WILL again make for a better platform as a hole. On the Bulldozer side, Power Gating and Turbo for modules, TST > SMT should be something to look forward. PS: On the commercials side I'm looking forward to something like this: http://www.youtube.com/watch?v=MK0hU0OYvCI


    I totaly agree on the commersial bit. :) 
  • 3 Hide
    XxOsurfer3xX , August 24, 2010 9:43 AM
    Cannot wait for Zambezi i was thinking of buying a 1095T, now im going to wait and see... C'mon AMD surprise everybody with your new architecture!
  • 2 Hide
    L0tus , August 24, 2010 9:43 AM
    luke904: ok everyone... I swear to god if AMD pulls this one off and bitch slaps Intel again... PARTY AT MY HOUSE!!!


    Yeh but Intel also has its new 2011 line-up in the works. I really want AMD to do well but Intel has one hell of a lead in terms of clock-vs-clock performance. But i'm still hopeful...so go AMD!
  • 0 Hide
    Anonymous , August 24, 2010 10:50 AM
    AMD had a similar lead before 'core' !
  • 3 Hide
    decode , August 24, 2010 11:20 AM
    Thanks for that, I think that clarify's a few things, not fully. But enough for me to make limited sense of it ;) 
  • 0 Hide
    Reynod , August 24, 2010 11:28 AM
    To quote another user "the future is AMDeal".

    Intel will need to buy a decent Graphics design team to keep up.

    Goodby NVidia ... your about to be assimilated into the collective.

    Chris, I might have predicted this a few times before ...and got it wrong too.

    Great article btw.
  • 0 Hide
    ta152h , August 24, 2010 11:41 AM
    The main question is, when AMD is saying their 16 "core" processor is 50% faster than their 12 "core" processor, are they saying 16 "real" cores, or eight of these units with two execution engines.

    It's good to see they got away from having ports with both AGUs and ALUs. Hopefully they copied more from Intel than that.

  • 0 Hide
    nforce4max , August 24, 2010 12:00 PM
    I can't wait but hate having to upgrade my board and ram. :( 

    AM2+ user :/ 
  • 0 Hide
    ares1214 , August 24, 2010 12:02 PM
    From this, which i saw a long time ago, so not much new, got me all excited for nothing :(  But from this, i dont expect Bulldozer to compete with Sandy Bridge, or the i7. I expect it to be competing with the 980x. Intels charts have that as top dog up until Q3 2011. Therefore, we can guess SB will be more on a middle range. Bulldozer on the other hand, more specifically Scorpius, will be AMD's highest performing chips. Id expect them to take on the 980x, not really intels lower end. Some other chips will be made to do that. And to be honest, I think AMD has a winner :) 
  • 0 Hide
    jj463rd , August 24, 2010 12:14 PM
    blink180heights: still won't be a core i7 920

    From the looks of it Zambezi sounds like it would outperform the i7 920 if it just used 4 cores except that it will have 8 cores (4 modules).It looks like it might even match or maybe outperform the 6 core i7-980X Gulftown monster.That is if AMD can really execute this properly.
    Will Intel have faster CPU's by then? Probably.
    The real question is will software catch up with this new hardware?
    PC Gaming will be great really great with high FPS with new GPU's