Another Bulldozer thread - what was AMD TRYING to do?

November 11, 2011 4:20:04 AM

Hi all.

Just want to know what AMD was thinking when they made Bulldozer.
For argument's sake, let's assume there is a clear long-term path for this architecture and that a whole lot of people worked hard to make it great.
It seems that a long time ago AMD knew many-core and performance per watt were the future. Bulldozer was built around that idea. OK.
But when the benchmarks came out, high-CPU-utilization apps like Handbrake did (a little) better on Sandy Bridge. Also, performance per watt wasn't any better than on previous-generation chips. This was supposed to be Bulldozer's strong area and it still didn't do well.
So, what's going on?

What will Bulldozer look like after all the performance is UN-corked?
There must be something AMD was aiming for.

November 11, 2011 4:37:21 AM

I would like to know this too. I generally like what AMD does, but I'm a bit confused by Bulldozer, and I don't know all of the ins and outs of the technology they put into the latest chips.

Right now I have a really hard time recommending AMD for builds, unless the budget is really tight and around the $400-500 range.
November 11, 2011 5:05:35 AM

They were aiming for servers. There, based on the power envelope at ~2-2.5GHz, Bulldozer is a win: 16 cores, and more throughput in the same power envelope as the previous 12-core MCMs.

They hyped up the desktop parts but failed to deliver. If you notice, idle power is better than the previous generation, but load power is ridiculous, so Bulldozer will not scale high enough in clock speed for the desktop without pushing power requirements and heat beyond what is acceptable.
November 11, 2011 5:06:42 AM

lunyone said:
Right now I have a really hard time recommending AMD for builds, unless the budget is really tight and around the $400-500 range.

You consider that a tight budget? I want your job.
November 11, 2011 5:23:25 AM

enzo matrix said:
You consider that a tight budget? I want your job.

That includes the OS, monitor, and sometimes a keyboard and mouse too. It all depends on the person's needs and funds.
November 11, 2011 6:06:41 AM

enzo matrix said:
They were aiming for servers. There, based on the power envelope at ~2-2.5GHz, Bulldozer is a win: 16 cores, and more throughput in the same power envelope as the previous 12-core MCMs.

They hyped up the desktop parts but failed to deliver. If you notice, idle power is better than the previous generation, but load power is ridiculous, so Bulldozer will not scale high enough in clock speed for the desktop without pushing power requirements and heat beyond what is acceptable.


This is interesting, please expand on it if you can.
November 11, 2011 10:45:23 AM

BD was clearly optimised for the server:

*)A much more effective SMT implementation than Intel's HTT
*)Focus on Integer performance
*)Better long-term scaling
*)High clocks

The reason BD fails on the desktop is that, unlike on servers, most software can't take advantage of BD's extra scaling. Throw in the fact that clock speeds didn't significantly improve while IPC took a major hit, and you see why BD simply isn't that good a performer.

I again note, BD's design lent itself to high clocks/low IPC. This was a decision that was probably made YEARS before the first chips were even made. I strongly suspect AMD thought they could clock BD a lot higher than they were able to.
November 11, 2011 2:22:10 PM

enzo matrix said:
They were aiming for servers. There, based on the power envelope at ~2-2.5GHz, Bulldozer is a win: 16 cores, and more throughput in the same power envelope as the previous 12-core MCMs.

They hyped up the desktop parts but failed to deliver. If you notice, idle power is better than the previous generation, but load power is ridiculous, so Bulldozer will not scale high enough in clock speed for the desktop without pushing power requirements and heat beyond what is acceptable.


I agree about BD being targeted mainly at servers. Are there any Interlagos vs. Magny-Cours comparisons out yet? I haven't seen any, so AFAIK all we have is AMD's latest statement of 35% more throughput with 33% more cores (sounds like a tie to me :p ).
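
The "sounds like a tie" remark can be checked with a quick calculation. Here is a minimal Python sketch that uses only the 35% and 33% figures quoted above; the function name `per_core_ratio` is made up for illustration, and it assumes throughput scales linearly with core count, which is a simplification.

```python
def per_core_ratio(throughput_gain, core_gain):
    """Relative per-core throughput of the new part vs. the old one.

    Both arguments are fractional gains, e.g. 0.35 for "35% more".
    """
    return (1 + throughput_gain) / (1 + core_gain)


# AMD's quoted figures: 35% more throughput with 33% more cores (16 vs. 12).
ratio = per_core_ratio(0.35, 0.33)
print(f"per-core throughput ratio: {ratio:.3f}")
```

A ratio this close to 1.0 means each core is doing about the same work as before, so nearly all of the claimed gain would come from the extra cores.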

As to why AMD did this, I think it was mainly to distinguish themselves from Intel. Look at AMD's CPU history since Core 2 came out and demolished K8: AMD insisted on designing Barcelona as a "native" quad-core vs. Intel's "double cheezeburger" MCM approach. That cost them a lengthy delay and underperformance when Barcie came out (very similar to what happened with BD), and in their rush to finally get Barcie out the door, the TLB bug sneaked in (similar to BD's cache problems, I guess). Barcie's 65nm process node also initially had a lot of problems with being too slow and too hot, apparently the same as GF's 32nm. This gave Intel almost a whole year to sell their highly successful "glued-together" C2Q quads, and was one of the reasons why AMD lost billions and billions over the next 4 years. It wasn't until the Phenom II CPUs were released that AMD corrected the Barcie problems, although the Phenom I CPUs weren't bad; they just weren't competitive with Intel's at the time.
November 11, 2011 4:09:47 PM

^^ True on all fronts.
November 11, 2011 4:46:41 PM

I feel that AMD tried a different approach in an attempt to make something "Extraordinary and Unique" that would help re-establish them as a serious competitor to Intel. Sadly, I feel they seriously underestimated the development time and expense, and as a result were left with an unfinished, rushed, underperforming product. If they had released BD at the time the idea was conceived, it probably would have achieved its goals, but it ended up being at least a couple of years behind other offerings in many areas. Could it end up a very good product with tweaks and revisions? Perhaps, but I feel they will not catch up at this point with this product, and AMD will continue its long-term financial downward spiral. It's simply too little, too late.
November 12, 2011 4:31:26 PM

^ Well, I wouldn't write off either AMD or BD at this point. After all, they could possibly raise more money in the form of a stock purchase from ATIC, the Abu Dhabi consortium that bought AMD's fabs and formed GlobalFoundries, if they need it. I think it would be a difficult sale for AMD at this point, with unfavorable conditions, but possible. Plus, with the layoffs, they will save over $100M next year.

IIRC AMD first mentioned Bulldozer some 5 years ago; it was supposed to be a 45nm design at the time. After the Barcelona problems, both Bulldozer and Bobcat disappeared off AMD's roadmaps for a year or so. They re-emerged a couple of years ago I think, this time with BD on 32nm and Bobcat on TSMC's 40nm process.

And AMD did fix Barcelona's problems: Phenom IIs are currently excellent value for the $$. So there is hope for Bulldozer as well. However, there are some hints about AMD changing direction sometime in the near future, maybe at their next financial meeting in January I think. Some think AMD might abandon desktop and go with mobile and cellphone/tablet CPUs.
November 12, 2011 9:19:13 PM

enewmen said:
Hi all.

Just want to know what AMD was thinking when they made Bulldozer.
For argument's sake, let's assume there is a clear long-term path for this architecture and that a whole lot of people worked hard to make it great.
It seems that a long time ago AMD knew many-core and performance per watt were the future. Bulldozer was built around that idea. OK.
But when the benchmarks came out, high-CPU-utilization apps like Handbrake did (a little) better on Sandy Bridge. Also, performance per watt wasn't any better than on previous-generation chips. This was supposed to be Bulldozer's strong area and it still didn't do well.
So, what's going on?

What will Bulldozer look like after all the performance is UN-corked?
There must be something AMD was aiming for.


I'll answer the second question first: what's going on? There are three to four things I can see:

1a. The current Windows scheduler is not optimized at all for the Bulldozer module setup. This is demonstrated by recent Linux kernel patches notably improving Bulldozer performance, and also Bulldozer performing better relative to Thuban on Linux compared to on Windows.

1b. Most Windows programs are not well-optimized for Bulldozer, particularly benchmarks compiled with Intel's ICC. ICC still severely limits what SIMD instructions can be used on non-Intel chips despite the FTC telling Intel to cut that crap out. Bulldozer should be able to run the same SIMD-enhanced code paths as Sandy Bridge but instead gets limited to SSE2.

2. The L1 caches are likely a bit too small at 16 KB, coupled with L2 and L3 caches with high latencies. The cache gets thrashed due to 1a, and the caches being small/slow makes this worse.

3. The GlobalFoundries 32 nm SOI process is brand-new and relatively unoptimized, so the chips run a bit hot and clock a bit slow compared to what they will when GF gets more experience with the process. AMD/GF has a track record of improving processes quite a bit as they use them, so I'd expect Bulldozers to look better with subsequent steppings.
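
To illustrate point 1a, here is a rough sketch of what "module-aware" thread placement could mean. It assumes, purely for illustration, a 4-module/8-core chip where cores 2k and 2k+1 share a module; the function names are hypothetical, and this is not how any real Windows or Linux scheduler is implemented. The idea is simply to give each thread its own module before doubling up.

```python
def placement_order(n_modules, cores_per_module=2):
    """Core IDs in the order a module-aware policy might fill them.

    Assumption: core IDs are numbered so that consecutive IDs share a
    module (0 and 1 in module 0, 2 and 3 in module 1, and so on).
    """
    order = []
    # Fill the first core of every module, then the second core of every module.
    for slot in range(cores_per_module):
        for module in range(n_modules):
            order.append(module * cores_per_module + slot)
    return order


def assign(n_threads, n_modules=4):
    """Pick cores for n_threads busy threads, spreading across modules first."""
    return placement_order(n_modules)[:n_threads]


print(assign(4))  # [0, 2, 4, 6]: one thread per module, no shared front end
print(assign(6))  # [0, 2, 4, 6, 1, 3]: modules 0 and 1 now run two threads each
```

Whether spreading threads out like this is actually the best policy depends on the workload; packing two threads into one module can leave other modules idle, so treat this as one possible policy rather than the answer.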

The first question: what was AMD thinking? They were thinking "server." Bulldozer is a high-thread-count, high-bandwidth, high-throughput processor. That is great for a server, and the server Bulldozers will likely do very well. However, it's not necessarily the best approach for high-performance desktop, since those guys want a small number of cores with maximum single-thread performance. Platform bandwidth and throughput don't really mean much there. AMD probably looked at their portfolio and the computer market and thought that low-power, mainstream laptop/desktop, and server were the markets to go for, because performance desktop is a small market and not worth spending the money to dedicate a uarch to. Bobcat is the low-power arch, Stars (Husky) in Llano is good for laptop and mainstream desktop, and Bulldozer is for servers.
November 13, 2011 12:29:07 PM

MU_Engineer said:
I'll answer the second question first: what's going on? There are three to four things I can see:

1a. The current Windows scheduler is not optimized at all for the Bulldozer module setup. This is demonstrated by recent Linux kernel patches notably improving Bulldozer performance, and also Bulldozer performing better relative to Thuban on Linux compared to on Windows.


IIRC there were at least 2 BD reviews where they used a beta of Win8 with its improved scheduler that is BD-aware, and they noted a small (up to 5%) performance increase. Of course that was a beta and not shipping product but most people don't expect a big boost when Win8 does ship.

Quote:
1b. Most Windows programs are not well-optimized for Bulldozer, particularly benchmarks compiled with Intel's ICC. ICC still severely limits what SIMD instructions can be used on non-Intel chips despite the FTC telling Intel to cut that crap out. Bulldozer should be able to run the same SIMD-enhanced code paths as Sandy Bridge but instead gets limited to SSE2.


I thought Intel's compiler only accounted for something like 5-10% of the market. If true, then I don't see much of an effect.

Anyway, to rehash the already-stale argument, AMD had 5 years to work with the software devs to optimize for BD. To me, their product launch is sorta like a car company selling their brand-new sports car without any tires, because they forgot to go to the tire manufacturer and get some :p
November 13, 2011 12:59:04 PM

Here is a better explanation from a review, which reiterates what enzo has stated:

AMD FX-8150 – why so bad?
Apart from the idle power draw of the FX-8150 – which we’ll point once again is an excellent achievement by AMD considering that the FX-8150 is a high-performance desktop part and its rival Core i5-2500K and Core i7-2600K are both essentially power-efficient laptop processors that have been beefed up a little for desktop PCs – the results show AMD’s latest CPU to be awful at everyday, consumer applications.

It’s a lack of single-threaded performance that holds the FX-8150 back – its efforts in our single-threaded image editing test were dire compared to every other processor on test. Even worse, this supposedly 8-core CPU running at 3.6GHz was hardly much faster than a six-core Phenom II X6 1100T running at 3.3GHz in heavily multi-threaded applications that saturate all available execution cores. In Cinebench R11.5 and WPrime – applications where a 8-core CPU should dominate a 6-core (let alone a quad-core) – we saw a lack of performance.

The answer, we think, comes from Bulldozer’s history. We started this review with a brief history lesson for a reason: we really believe that Bulldozer was intended for servers and workstations, not desktop PC running consumer applications. The lack of grunt-per-core doesn’t matter too much in a server or workstation, as most professional applications are n-threaded and balance that load evenly to saturate every core available. Furthermore, it’s widely assumed that there will be an Opteron based on the Bulldozer design that incorporates eight modules, for 16 execution cores. Bulldozer, we believe, is built for massive parallelism.


http://www.bit-tech.net/hardware/cpus/2011/10/12/amd-fx...
November 15, 2011 1:46:11 AM

Thanks for the posts!

I never expected Bulldozer to have good single-thread performance. But with 8 cores, I thought it would do much better in high-CPU-load apps. With 16 cores @ 2.5 GHz, I think I'll finally see the performance I was looking for in a high-thread/high-load environment. Other performance factors will get uncorked, but that may take a year or more.
Too bad there was no repeat of "The AMD Hammer x2 hits HARD!" in the news.
November 19, 2011 7:27:26 AM

Well, the server benchmarks are up:
http://www.anandtech.com/show/5058/amds-opteron-interla...

It seems to be equal to or better than Magny-Cours in virtually all tasks, with an exceptional increase in encryption and decryption performance. Sadly, I cannot see the performance and power efficiency gains being worth the upgrade cost for many. That's just my opinion based on the applications and testing methods in this review, though.
November 19, 2011 2:11:11 PM

enzo matrix said:
Well, the server benchmarks are up:
http://www.anandtech.com/show/5058/amds-opteron-interla...

It seems to be equal to or better than Magny-Cours in virtually all tasks, with an exceptional increase in encryption and decryption performance. Sadly, I cannot see the performance and power efficiency gains being worth the upgrade cost for many. That's just my opinion based on the applications and testing methods in this review, though.


I thought Interlagos was a drop-in & BIOS flash replacement for MC?

Anyway, we'll see in January how AMD fares this quarter in server sales. If the downward trend continues from the current 4.9% of the market, despite Interlagos being available for the full quarter, then I'd say it was a failure.
November 19, 2011 7:39:08 PM

Quote:
I thought Interlagos was a drop-in & BIOS flash replacement for MC?

Yes, it is. Even so, the processors themselves still cost money, though Interlagos is definitely the obvious choice over the old 12-cores in new systems.