
Bulldozer releasing in Q2 2011, and some more info

November 9, 2010 9:39:32 PM

Well, here's the post from JF:
http://blogs.amd.com/work/2010/11/09/server-highlights-...

What I took note of most were these facts: the 8-core BD chip has a smaller die than the 6-core Thuban, it has a larger and faster L3 cache and a better memory controller with 50% more throughput, and there's a 500 MHz Turbo Core boost across all threads.
November 9, 2010 10:21:48 PM

yannifb said:
Well, here's the post from JF:
http://blogs.amd.com/work/2010/11/09/server-highlights-...

What I took note of most were these facts: the 8-core BD chip has a smaller die than the 6-core Thuban, it has a larger and faster L3 cache and a better memory controller with 50% more throughput, and there's a 500 MHz Turbo Core boost across all threads.


I'm clearing some desk room now. TB on all threads? Sweet!
November 9, 2010 10:28:01 PM

Just waiting on the low-power notebook parts; I'll try to keep my Atom running till then...
November 9, 2010 10:53:53 PM

Quote:
What I took note of most were these facts: the 8-core BD chip has a smaller die than the 6-core Thuban


Probably because what makes up a "core" changes when you talk about BD. I'm still worried that not having as many FP cores is going to hurt them. As for everything else, it reeks of marketing speak. Larger/faster L3/memory is nice. But if it's not faster than SB, who really cares? Then again, it doesn't need to be faster as long as it's competitive.
November 10, 2010 12:39:35 AM

Actually, in 128-bit mode, we have 1 FP per core. Just like Intel. When it comes to doing 256-bit AVX, we share resources. Just like Intel. We combine 2 FPs; they combine an FP and integer pipelines to get there.

If we don't have "real cores" because we share, then neither does Intel.

This "not real cores" argument is getting tired; it's not getting any traction, and people realize that. Both AMD and Intel are getting creative to try to bring new technologies to market. Those who resist technology changes will never get any innovation. Both companies are innovating, and it is time to look forward, not backward.
Anonymous
November 10, 2010 1:31:50 AM


I was going to go out and snap up a Sandy Bridge platform once it becomes available in January...

Now that AMD is saying Bulldozer is coming out in Q2, I guess I will wait a bit longer.

What I'm really waiting for is single-thread performance numbers vs. SB, as most apps I use are still single-threaded. From what I see, BD does have a chance to beat SB in multithreaded apps, but my impression is that SB would still destroy BD in single-thread performance...

The only reason I'm holding off till Q2 is that I watched a tech interview on YouTube with someone from AMD saying you could somehow "put more resources" into a single-threaded app... This gives me hope for BD single-thread performance and is a good enough reason to hold off until Q2.

(putting more resources into a single thread, at 2:30)

http://www.youtube.com/watch?v=wxbG2AmdMNY
November 10, 2010 3:20:21 AM

Quote:
Actually, in 128-bit mode, we have 1 FP per core. Just like Intel. When it comes to doing 256-bit AVX, we share resources. Just like Intel. We combine 2 FPs; they combine an FP and integer pipelines to get there.


I don't understand; can you educate me, please?

Quote:
This "not real cores" argument is getting tired; it's not getting any traction, and people realize that. Both AMD and Intel are getting creative to try to bring new technologies to market. Those who resist technology changes will never get any innovation. Both companies are innovating, and it is time to look forward, not backward.


All I said was that the die is smaller because of how a "core" has changed. An 8-core BD can be smaller than a 6-core K10 (?) because of the missing FP cores. By no means do I think this is a bad idea. I do wonder in which cases it will hurt compared to the old chips, but I think it will end up being a great thing, more so once Fusion is working properly and all FP math can be passed to the GPU part of the chip.

I feel that very exciting times are coming, once it's all worked out, that is.
November 10, 2010 10:27:33 AM

Intel has a 128-bit FPU. When they have to do AVX, they steal the other 128 bits (to get to 256-bit) from the integer registers.

The FPUs are actually not where the big die savings come from; more of it is in the front end. Current designs have 1 128-bit FPU per core, and Bulldozer will have 1 128-bit FPU per core as well, so we have not reduced size based on the FPU. We reduced size through the reduction in transistor size (45nm to 32nm) and the front-end optimizations.

November 10, 2010 10:43:06 AM

Quote:
Current designs have 1 128-bit FPU per Int core, Bulldozer will have 1 128-bit FPU per bulldozer core as well


Fixed? That's what I mean by the reduction in the number of FP cores.
November 10, 2010 8:43:46 PM

4745454b said:
Quote:
Current designs have 1 128-bit FPU per Int core, Bulldozer will have 1 128-bit FPU per bulldozer core as well


Fixed? That's what I mean by the reduction in the number of FP cores.

No. If you look at the diagrams, each half of a Bulldozer core has its own 128-bit FPU, and the two are right next to each other, so they can be combined into a single 256-bit unit. So, depending on how it is working, Bulldozer will have either one 128-bit FPU per 'small core' or one 256-bit FPU per two 'small cores', the two small cores being the two parts of the larger unit Bulldozer is built from; I forget what it's called.
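A toy model of the pairing idea in the post above, assuming each 128-bit unit works on four 32-bit floats: a 256-bit (8-lane) add can be done by handing the low half to one unit and the high half to the other, and the result matches a single 8-lane add. The function names are hypothetical and the model says nothing about how the hardware actually schedules the work; Python floats also stand in for single-precision values here.

```python
from typing import List

LANES_128 = 4  # 128 bits / 32-bit single-precision floats


def fpu128_add(a: List[float], b: List[float]) -> List[float]:
    """Hypothetical 128-bit unit: adds exactly four lanes."""
    assert len(a) == len(b) == LANES_128
    return [x + y for x, y in zip(a, b)]


def add256_on_paired_units(a: List[float], b: List[float]) -> List[float]:
    """A 256-bit (8-lane) add split across two 128-bit units."""
    assert len(a) == len(b) == 2 * LANES_128
    low = fpu128_add(a[:LANES_128], b[:LANES_128])   # first unit: lanes 0-3
    high = fpu128_add(a[LANES_128:], b[LANES_128:])  # second unit: lanes 4-7
    return low + high  # list concatenation joins the two halves


a = [float(i) for i in range(8)]
b = [10.0] * 8
print(add256_on_paired_units(a, b))
```

Splitting the vector in halves is the simplest way to show why two adjacent 128-bit units can stand in for one 256-bit unit: each lane is independent, so which unit handles it does not change the answer.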
November 10, 2010 10:20:32 PM

Magny Cours is a 12-core processor with 12 128-bit FPUs.

Interlagos is a 16-core processor with 16 128-bit FPUs.

That is a 33% increase in FP units and a 1:1 core-to-FPU ratio.
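A quick check of the arithmetic in the post above, using only the counts given there (12 cores and 12 128-bit FPUs for Magny Cours, 16 and 16 for Interlagos):

```python
def pct_increase(old: int, new: int) -> float:
    """Percentage increase from old to new."""
    return (new - old) / old * 100


# Counts as stated in the post: one 128-bit FPU per core on both parts.
magny_cours = {"cores": 12, "fpus_128": 12}
interlagos = {"cores": 16, "fpus_128": 16}

increase = pct_increase(magny_cours["fpus_128"], interlagos["fpus_128"])
ratio = interlagos["cores"] / interlagos["fpus_128"]
print(f"FP units: +{increase:.0f}%, core:FPU ratio {ratio:.0f}:1")
```

This prints `FP units: +33%, core:FPU ratio 1:1`; the exact increase is 4/12, or about 33.3%.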
November 10, 2010 11:45:24 PM

Quote:
So, depending on how it is working, Bulldozer will have either one 128-bit FPU per 'small core' or one 256-bit FPU per two 'small cores', the two small cores being the two parts of the larger unit Bulldozer is built from; I forget what it's called.


Thanks, that might be the education I needed. It at least makes some sense and jibes with what we have been hearing. So again, my question becomes: how and/or when will this come back to bite AMD? Is most of the time spent working in 128-bit, with 256-bit hardly ever used? I would imagine that most of the time is spent doing FP math. BTW, Intel won't have the problem of pulling in a second unit to do 256-bit, as SB will have 256-bit units.

http://electronicdesign.com/article/digital/intel_s_avx...
November 10, 2010 11:58:38 PM

FP can be either 128-bit (supported today by essentially all software) or 256-bit once the new platforms come out.

But to take advantage of 256-bit, your applications need to be recompiled.

In 128-bit mode:
SB = 8 128-bit FPUs
BD = 16 128-bit FPUs

In 256-bit mode:
SB = 8 256-bit FPUs
BD = 8 256-bit FPUs

So, actually, the one it "comes back to bite" is Intel, on existing legacy code and on future non-AVX code.
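The counts in the list above follow from a simple rule as it is described in this thread: Sandy Bridge has one FPU per core that runs 128-bit or 256-bit work, while Bulldozer has one 128-bit FPU per core and pairs two of them for 256-bit work. A sketch of that rule, assuming the 16-core BD vs. 8-core SB server comparison that JF clarifies later in the thread; the function name is hypothetical.

```python
def fpu_count(arch: str, cores: int, width_bits: int) -> int:
    """Number of FP units available at the given width, per the thread.

    "SB": one FPU per core, usable as 128-bit or 256-bit.
    "BD": one 128-bit FPU per core; two are merged for 256-bit work.
    """
    if width_bits not in (128, 256):
        raise ValueError("width_bits must be 128 or 256")
    if arch == "SB":
        return cores
    if arch == "BD":
        return cores if width_bits == 128 else cores // 2
    raise ValueError(f"unknown arch: {arch}")


# Server parts compared in the thread: 8-core SB vs. 16-core BD.
for width in (128, 256):
    print(f"{width}-bit: SB = {fpu_count('SB', 8, width)}, "
          f"BD = {fpu_count('BD', 16, width)}")
```

Passing the client core counts (4 for SB, 8 for BD) halves every number, which is the "you net out the same" point made further down.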
November 11, 2010 12:11:47 AM

Quote:
In 128-bit mode:
SB = 8 128-bit FPUs
BD = 16 128-bit FPUs


16? An 8-"core" BD CPU will have 4 BD modules. That means 8 integer cores and 8 128-bit FP cores, or 4 256-bit ones. Where do you get 16 cores from?
November 11, 2010 12:26:18 AM

From a 16-core server product vs. their (top) 8-core server product.

Client SB will be 4-core and client BD will be 8-core, so cut all of those numbers in half and you net out the same.
November 11, 2010 1:44:22 AM

jf-amd said:
From a 16-core server product vs. their (top) 8-core server product.

Client SB will be 4-core and client BD will be 8-core, so cut all of those numbers in half and you net out the same.

Cool. Thanks for explaining!
November 11, 2010 3:09:41 AM

I got the feeling you were talking about server chips. Are you talking about a 1P, 2P, or 4P server? Without knowing how many packages you are plugging into the server, it's hard to figure out what you're talking about. I certainly hope you are comparing the same number of packages, and not comparing a 2P BD to a 1P SB.
November 11, 2010 10:22:30 AM

Processor to processor.

Our 16-core 2P product will compete with their 8-core 2P product. I always do apples to apples comparisons.
November 11, 2010 10:55:29 AM

So AMD is going to release an 8-module BD server CPU? 16 integer cores and 16 128-bit FPUs / 8 256-bit FPUs?

I wonder if Intel has found a way to "fuse" two 128-bit calculations into a 256-bit one, or in other words, to do two 128-bit calculations per tick. If they have, then there is no advantage.

For the record, I happen to love AMD. I'm sorry if I came off as someone who didn't. I seriously hope AMD can get the performance crown, even if just for a bit.
November 11, 2010 3:08:58 PM

jf-amd said:
From a 16-core server product vs. their (top) 8-core server product.

Client SB will be 4-core and client BD will be 8-core, so cut all of those numbers in half and you net out the same.


JF, can you comment on some of the rumors going around that BD needs 2 clock cycles to perform a 256-bit AVX instruction, vs. Sandy Bridge's 1 clock cycle? If true, then it seems to me BD's throughput will be around half that of SB.
November 11, 2010 4:58:53 PM

4745454b said:
So AMD is going to release an 8-module BD server CPU? 16 integer cores and 16 128-bit FPUs / 8 256-bit FPUs?

I wonder if Intel has found a way to "fuse" two 128-bit calculations into a 256-bit one, or in other words, to do two 128-bit calculations per tick. If they have, then there is no advantage.

For the record, I happen to love AMD. I'm sorry if I came off as someone who didn't. I seriously hope AMD can get the performance crown, even if just for a bit.


No. The only way to do that is with AVX, which is how both of us are getting to 256-bit. They don't have the registers to do that in 128-bit mode, because both 128-bit executions would be looking for the same ports.

That is the whole idea behind AVX and 256-bit FP.
November 11, 2010 9:50:37 PM

I think that on the client side, BD will be amazing for gaming. As JF mentioned in one of his blogs, when a program doesn't use all of the integer cores, a single module can act as a more powerful core, which is good for games that only use up to 4 threads (since client BD will have 4 modules). As for games that do use 8 cores, I think BD will shine there too. Plus, since games are mostly integer work (I think), that helps.
November 13, 2010 4:17:34 AM

I still feel torn. I am trying to decide whether to just go for the Intel i7 now during Black Friday or wait for AMD's Bulldozer. There are no specs to compare its performance against the hex-core Intel. And there was a comparison of the AMD hex-core to the Intel quad-core, where Intel still slightly outperformed AMD. Does anyone know of any expected performance difference vs. their current chips?
November 13, 2010 7:30:50 AM

We can guess, but it's really just a guess. We know more about the performance of SB than BD. We know some things about BD, but there is so much we don't know that affects the performance of the chip that it could go either way. I personally believe it will be better than the PhII, but not quite enough to catch SB on the high end.
November 13, 2010 6:35:07 PM

jf-amd said:
Magny Cours is a 12-core processor with 12 128-bit FPUs.

Interlagos is a 16-core processor with 16 128-bit FPUs.

That is a 33% increase in FP units and a 1:1 core-to-FPU ratio.

So, do you have any figures for the performance increase vs. the current Phenom II X6? And any idea of the price range for the high-end processor that will compete with the i7's performance?
November 13, 2010 11:13:05 PM

No performance until launch. That is standard company policy.
November 14, 2010 1:56:53 AM

I certainly appreciate Mr. Fruehe's time and information, but I'm not clear on one aspect. Am I correct in assuming that the vast majority of current software is not optimized for the 256-bit aspect of Bulldozer? Would the maximum performance of the Bulldozer CPU be available only to software written specifically for it?
November 14, 2010 2:10:05 AM

According to the information he has given in this thread, Bulldozer will be just as effective in 128-bit mode as in 256-bit mode.
November 14, 2010 2:12:20 AM

Thank you for the clarification. That, however, raises the question of what the advantages of 256-bit mode are.
November 14, 2010 10:59:18 AM

Actually, no software is optimized for 256-bit FP today because nobody has the hardware. Once platforms are out, you will see applications start to come online.

Look at the move to 64-bit. First the platforms got there. Then the OS vendors got there, some at launch, some after the hardware was available. Then the applications got there.

With AVX it will be the same thing. Apps that are very FP heavy will probably migrate first, the others will follow eventually.

Single-precision FP values are actually 32 bits wide. 128-bit FP lets you handle 4 of them at once; 256-bit will let you handle 8 at once.

The fact that we can do 256-bit FPU by merging FP units means that our 256-bit throughput and 128-bit throughput will be the same, just executed in different ways.

In simple terms, we can execute 64 single-precision floating-point operations whether we are in 128-bit or 256-bit mode.

Sandy Bridge (on the server side) will be able to do 32 single-precision operations in 128-bit mode and 64 in 256-bit mode.
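The arithmetic behind these figures, written out: a single-precision value is 32 bits, so a 128-bit unit holds 4 values and a 256-bit unit holds 8. Multiplying lanes by the number of units per mode (the server counts listed earlier in the thread: 16 128-bit FPUs or 8 merged 256-bit FPUs for BD, 8 FPUs for SB) gives the totals quoted above. These are counts of values the units work on at once, not measured performance.

```python
SP_BITS = 32  # width of a single-precision float


def lanes(width_bits: int) -> int:
    """Single-precision values that fit in a vector of this width."""
    return width_bits // SP_BITS


def sp_ops(units: int, width_bits: int) -> int:
    """Single-precision values processed at once across all units."""
    return units * lanes(width_bits)


# Unit counts per mode, as listed earlier in the thread (server parts).
bd = {128: 16, 256: 8}  # 16 128-bit FPUs, merged in pairs for 256-bit
sb = {128: 8, 256: 8}   # 8 FPUs, used as 128-bit or 256-bit

for width in (128, 256):
    print(f"{width}-bit: BD = {sp_ops(bd[width], width)}, "
          f"SB = {sp_ops(sb[width], width)}")
```

This prints 64 vs. 32 for 128-bit mode and 64 vs. 64 for 256-bit mode, matching the post.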
November 14, 2010 1:54:36 PM

I would like to thank you for taking the time to answer my query; I appreciate your time and effort. I always buy close to the top-of-the-line processor for my own system and never keep a CPU for more than two years, so it's important for me to know that I will be able to gain its advantages during the life span of that system. I'm really not clear on the FMA3 incompatibilities between future Intel offerings and Bulldozer, but from what I understand, the implementation Intel is taking will be different from Bulldozer's. I'm hardly an expert in these matters, so my question may seem improper: will Bulldozer require different software than, say, Ivy Bridge? If so, to what extent? Thank you.
November 14, 2010 7:07:44 PM

AMD is utilizing FMA4 in 2011. Intel will (supposedly) add FMA3 with Haswell, which is sometime well after us (I don't know the year).

FMA is fused multiply-accumulate, so it needs 4 variables: A = B + C x D.

With FMA4, you utilize 4 registers, so the operation is "non-destructive", meaning it can all be done in one cycle.

FMA3 only utilizes 3 registers. This makes it easier to program, but unfortunately, using only 3 registers means that it is "destructive": it has to erase one register and replace it with a new variable. Because it has to go through the destructive step, it takes an extra cycle to complete an FMA instruction.
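To make the "destructive" vs. "non-destructive" distinction concrete, here is a toy register-file model. It assumes the general operand shapes of the two extensions: FMA4 writes a × b + c into a fourth, separate destination register, while FMA3 overwrites one of its three operands with the result. The function names are hypothetical, and the model only shows which register gets clobbered; it does not model timing, and Python's `a * b + c` rounds twice where a real fused operation rounds once.

```python
from typing import Dict

Regs = Dict[str, float]


def fma4(regs: Regs, dst: str, a: str, b: str, c: str) -> None:
    """Toy FMA4: dst = a * b + c, with dst separate from all three sources."""
    regs[dst] = regs[a] * regs[b] + regs[c]


def fma3(regs: Regs, acc: str, a: str, b: str) -> None:
    """Toy FMA3: acc = a * b + acc; the old value of acc is overwritten."""
    regs[acc] = regs[a] * regs[b] + regs[acc]


regs4 = {"r0": 0.0, "r1": 2.0, "r2": 3.0, "r3": 4.0}
fma4(regs4, "r0", "r1", "r2", "r3")
print(regs4)  # r0 = 2*3 + 4 = 10; r1, r2, r3 keep their values

regs3 = {"r1": 2.0, "r2": 3.0, "r3": 4.0}
fma3(regs3, "r3", "r1", "r2")
print(regs3)  # r3 = 2*3 + 4 = 10; the original 4.0 is gone
```

In this model, a program that still needs the old accumulator value under FMA3 would have to copy it to another register first; that extra copy is the "destructive" cost in concrete terms.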

So, AMD is doing the harder one earlier; Intel is opting for the easier (and less powerful) one later.

But to keep compatibility, we will also support FMA3 in the future, at about the same time Intel supports it. So, theoretically, software should be fine. Software compiled specifically for Intel will only have FMA3 support in the future. Software compiled for AMD will support both FMA3 and FMA4.
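One way software can "support both" is to pick a code path at run time based on what the CPU reports. A minimal sketch, assuming the feature names Linux shows in `/proc/cpuinfo` (`fma` for FMA3, `fma4` for FMA4); natively compiled code would normally query CPUID directly instead. The path labels are hypothetical placeholders, and preferring FMA4 when both are present is an arbitrary choice for illustration.

```python
from typing import Iterable, Set


def pick_fma_path(cpu_flags: Iterable[str]) -> str:
    """Choose a code path from CPU feature flags (Linux /proc/cpuinfo names).

    The returned strings are hypothetical labels for separately compiled paths.
    """
    flags = set(cpu_flags)
    if "fma4" in flags:
        return "fma4_path"
    if "fma" in flags:       # Linux reports FMA3 as plain "fma"
        return "fma3_path"
    return "mul_add_path"    # separate multiply and add, no FMA


def read_linux_cpu_flags(path: str = "/proc/cpuinfo") -> Set[str]:
    """Read the first 'flags' line from /proc/cpuinfo (x86 Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


print(pick_fma_path({"sse2", "avx", "fma4"}))
print(pick_fma_path({"sse2", "avx", "fma"}))
print(pick_fma_path({"sse2"}))
```

Keeping the choice in one small function means a binary built with both paths runs on either vendor's chip, which is the compatibility story described in the post.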

I am actually not sure that Ivy Bridge will have any FMA support.
November 16, 2010 12:46:23 PM

Thank you again for your reply. I'm still not clear on one aspect, though. Let's take a Core i7-2600 (which I plan to buy within days of its release) and a first-edition Bulldozer. Am I correct in assuming that the software (let's say Photoshop CS5) will run at its maximum capacity on the Intel, but will require programming modifications to do so on the AMD?
November 16, 2010 2:09:04 PM

No. Intel is adding new instructions as well. Both platforms would require software modification to support 256-bit floating point (AVX).
November 16, 2010 2:11:49 PM

Thank you for the clarification. I appreciate your time and effort.
November 16, 2010 11:10:24 PM

jf-amd said:
Processor to processor.

Our 16-core 2P product will compete with their 8-core 2P product. I always do apples to apples comparisons.


I would assume you are comparing it to current Intel 8-core parts, seeing as we have very little on Intel's next-gen server parts besides the fact that they plan a 10-core (based on Westmere EX, I would assume).

It would be nice to get some numbers, though. I think not having them hurts AMD in the long run, since some IT departments look toward the future when planning and sometimes won't want to wait.
November 17, 2010 12:43:37 AM

No, in 2011 Intel will have Sandy Bridge in the 2P space with 8 cores and Nehalem EX in the 4P space with 8 cores, unless they pull in Westmere EX with 10 cores.

As for numbers, those IT customers get benchmark comparisons under NDA already.

We don't put them out in public until launch.
November 17, 2010 4:00:17 AM

http://www.bit-tech.net/news/hardware/2010/09/14/intel-...

As I said, Westmere EX. BUT this is also a 4P part. It's set to fit in the LGA 1567 socket, so it shouldn't be any trouble for BD server parts unless Intel decides to make it 2P, which I doubt.

Still, I wouldn't hold Intel to its word any more than I do AMD or any other company. For all we know, Intel could be hiding something else. After all, they did throw out that 48-core chip based on Terascale without anyone really expecting it:

http://www.engadget.com/2010/04/10/intels-48-core-proce...

Still, you never know. Companies like to be tricky, especially tech companies.
November 17, 2010 10:36:22 AM

Highly unlikely. In this business you work on roadmaps, and roadmaps become pretty public no matter how you try to keep them confidential. You can't get OEMs to build platforms around a processor unless you are disclosing everything to them.

For instance, if I came in at the end of the launch process and brought in a new bin speed 500MHz above my top speed a week before launch, the OEMs would not touch the part, because it has not been through their qualification process and all of their testing.

Yeah, Intel sprung a 48-core product on the world last year. Here is a complete list of all of the platforms you can buy it in:




There are no surprises in this business in terms of parts. There are surprises in schedules, surprises in performance and surprises in price.

There aren't even surprises in features, because if we want the features supported by software, we have to start telling people years ahead of time. And word gets out.
November 17, 2010 7:56:28 PM

^Oh, I know you cannot buy the 48-core CPU. It's been sent out for testing and is geared towards cloud computing. All I was saying is that although roadmaps are the norm for Intel and AMD, they are not 100% bound to them.

And considering that SB is already out to devs, I don't think it would be improbable for them to push more cores or anything else off the same arch. Most software that can utilize more than 4 cores tends to be optimized for quite a few more.

Still, I wouldn't be surprised if either has a surprise in store. I think it would be nice, TBH. Sometimes it's nicer to see something random like that than to see something months ahead and go through the BS hype.
November 17, 2010 9:07:39 PM

Anything inside of 12 months is locked down solid because you have masks and validation to deal with.

The fastest we ever moved was ~20 months in terms of adding a new product to the roadmap and getting it to market.

Typically you tape out ~12-16 months prior to launch, and at that point everything is locked. There is nothing added, only things that the vendor has been quiet on - but they were there the whole time.

Nobody can add more cores in less than 18 months. In this business you place your bets very early.