AMD to bring SMT/CMT Solution to Bulldozer

Fuell

Distinguished
Aug 29, 2010
20
0
18,510
I will not be bringing you any new information really, but the pieces are there. Just read.

AMD will likely implement their own form of SMT in Bulldozer, and it will be close to the CMT solution rumored early on in the year.

I'm sure those of you reading this are anxiously awaiting legit new information on Bulldozer that can give us a better guess at what kind of performance to expect. There was a lot of talk about AMD finally implementing SMT, but in their own way. That way was rumored to be CMT (Cluster-based Multi Threading). I will assume that those reading here are at least a little familiar with SMT and CMT. (If not, Google is your friend.)
As I said, there were many rumors going on early this year with speculation about AMD using CMT, and how it would be done and perform. But with new information released, AMD hasn't given us a definitive answer, but they have strongly indicated that there will be 1 thread per core. I was disappointed when I read that. Until I realized something...

Well, it seems like BD will have a form of SMT, but it will not run more threads than cores. Oh, and the performance of the SMT solution should be much more scalable than Intel's when relating threads to performance. So, how will AMD do this?

It's very simple actually. Readers here know very little about BD really, but that's because AMD is very tight-lipped about this new arch, and for good reason. It will be a monster! (Imho)
BD will feature up to 8 cores per CPU for desktop, which means 4 modules with 2 cores per module. AMD has also stated outright that each core will run a single thread, no more. So how could they do SMT/CMT? The answer could easily be found in Thuban.

Thuban is the 6 core desktop chip in the Phenom II line of CPUs from AMD, and currently AMD's flagship line. Each 6 core CPU has some new features not found in Phenom II X4 CPUs. One of these is an early version of Turbo Core (basically Cool'n'Quiet in reverse). This is where 3 or more cores basically shut down when not being used so that the cores running the few active threads can be overclocked for better low thread performance. Basically, if you need to run 2 threads, rather than waste 4 of those 6 cores by having them do nothing, you shut down the idle cores to overclock the used cores, thus increasing performance inside the same power consumption and thermal levels. This is where the modular design of BD, along with the turbo core feature, come together to form AMD's very nice solution to SMT.

As we all know, CPUs have hit a speed wall at about 4GHz, so we then decided that the best way to achieve more performance is to add multiple cores. The problem, though, is that not all software is threaded to take advantage of the extra cores, so performance is stuck at the number of cores/threads the software actually uses and the speed of those cores.

This is where it gets exciting to me.
AMD is clearly aware that software for mainstream applications is not threaded beyond a few cores, and in most cases a quad core is more than you will ever need. This will be the case for some time, till higher core count CPUs become the norm and software vendors are forced to thread more to increase performance (no more relying on hardware alone to get performance). Bulldozer won't bring us 16 threads per 8 core CPU, but it could in theory run 8 threads at turbo core speeds. This is where it gets blurry in describing this to the average Joe. Sure, an 8 core CPU can run 8 threads anyway, so why should I care, right? Let's say there's a BD 8 core 2.8GHz CPU with a turbo rating of 3.4GHz. In standard practice the AMD CPU would use its 8 real cores to process 8 threads. AMD's SMT/CMT solution will, in a way, run 8 threads on 4 turbo'd cores. Huh?

Since each BD module shares its resources, and the resources allow for 2 threads, they can "turn off" one core in each module, turbo the core, and run 2 threads across (because each BD module will have the resources to accommodate 2 cores remember). This means that rather than run 8 threads on 8 cores at 2.8ghz, you'll basically be running 8 threads on 4 cores at 3.4ghz, in similar power consumption and thermal levels.

So with software playing catch-up in threading to accommodate higher core count CPUs, BD will launch at a time when quad core thread support should be on its way to becoming the norm. Software that threads beyond 4 cores will likely be rare and used by far fewer people. AMD needs to target the maximum number of customers to maximize their profit and gain market share.

If benchmarking software is threaded the way it is today, benchmarks might favor SB noticeably, but in real world performance and in the 2-4 thread "sweet spot", BD's modular design and new form of Turbo Core should be more than a match for Intel. SB isn't doing anything major in terms of changes from Nehalem, so performance increases will likely be modest. Bulldozer, on the other hand, is a completely new arch with a very unique way of doing things. And if everything goes well for AMD, this should be the answer they were looking for.

And yes, I am aware that many of you have likely figured this out and I am not the first to post this, but to be honest, I haven't been able to come across anything myself that spells it out like this. Correct me if I am wrong. But for those of you who didn't see this info elsewhere, I hope this has been a good read.

No new info so no sources. All facts above are pretty common knowledge now; only my opinion of the BD SMT/CMT implementation is any different, and that's just what I think. So the only source for new info is my mind. Yay!

------------------------

Edit> After thinking about this I asked myself, "then why would they even run it at 8 core stock speeds?" Well, I think that is due to power consumption/thermal levels. When I first wrote this post I had it in my head that both modes would use basically the same power and generate about the same heat. I don't think this will be the case, however. Maybe running at stock speeds they can maintain a lower TDP and produce less heat. But by using the Turbo mentioned above, one core is almost turned off, only doing what it needs to do to allow the module to still process 2 threads. The other core, which will be the main functioning core in the module in Turbo, will draw extra power and generate more heat. This would be a reasonable explanation for having 2 modes: stock mode to pass 8 threads at lower speeds with lower TDP and heat (I'll use 95W as an example), and Turbo mode, which will be much faster but will increase TDP and heat (125W, for example). A CPU that can be tuned for the job at hand sounds like the best of both worlds.

Bulldozer will be the Jack of all Trades, but the Master of None.
 
you shut idle cores to overclock the used cores, thus increasing performance inside the same power consumption and thermal levels. This is where the modular design of BD, along with the turbo core feature come together to form AMD's very nice solution to SMT.
How is that remotely related to SMT?

Since each BD module shares its resources, and the resources allow for 2 threads, they can "turn off" one core in each module, turbo the core, and run 2 threads across (because each BD module will have the resources to accommodate 2 cores remember). This means that rather than run 8 threads on 8 cores at 2.8ghz, you'll basically be running 8 threads on 4 cores at 3.4ghz, in the same power consumption and thermal levels.
In standard practice the AMD CPU would use its 8 real cores to process 8 threads. AMD's SMT/CMT solution will, in a way, run 8 threads on 4 turbo'd cores. Huh?
I'd love to see how you explain that being able to increase performance with 8 threads. Sounds to me like it would decrease performance vs 8 threads on 8 cores. If I am wrong, then what is the point of having 8 cores and why doesn't BD just always run in the way you describe? Because according to you, it would increase performance in both multi-threading and single threading.
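One rough way to check this objection is to compare aggregate clock throughput under the simplest possible model, where work done scales linearly with clock speed and every thread needs a full core's worth of resources. This is only a sketch: the 2.8GHz and 3.4GHz figures are the hypothetical numbers from the original post, and `aggregate_ghz` is a made-up helper for illustration, not anything AMD has described.

```python
def aggregate_ghz(active_cores: int, clock_ghz: float) -> float:
    """Total core-GHz available, assuming throughput scales linearly with clock."""
    return active_cores * clock_ghz


# Hypothetical figures taken from the original post.
stock = aggregate_ghz(active_cores=8, clock_ghz=2.8)  # 8 threads on 8 cores
turbo = aggregate_ghz(active_cores=4, clock_ghz=3.4)  # 8 threads squeezed onto 4 cores

# Clock each of 4 cores would need in order to match 8 cores at stock speed.
break_even_ghz = stock / 4

print(f"8 cores @ 2.8 GHz: {stock:.1f} core-GHz")
print(f"4 cores @ 3.4 GHz: {turbo:.1f} core-GHz")
print(f"break-even clock for 4 cores: {break_even_ghz:.1f} GHz")
```

Under this naive model the four turbo'd cores would have to run at about 5.6GHz to keep up with eight cores at 2.8GHz, which is why the "8 threads on 4 cores" idea looks like a loss for fully threaded work.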

As we all know, CPUs have hit a speed wall at about 4GHz
What are you talking about? The "wall" was about 2.5GHz when AMD came out with their first dual cores. About 3.0GHz in 2007 and about 4.0GHz today. That means there is no wall so far. Also, GHz is not really a measure of performance; it is only related. Don't forget IPC, which has been improved at a much faster rate than clockspeed.

Maybe I'm just misunderstanding what you are trying to describe.
 
OP:
AMD has also stated outright that each core will run a single thread, no more.

and

Since each BD module shares its resources, and the resources allow for 2 threads, they can "turn off" one core in each module, turbo the core, and run 2 threads across (because each BD module will have the resources to accommodate 2 cores remember).

Assuming even AMD has not yet found a way to actually run a thread on a "turned off" core, then I don't follow how it can run 8 threads on 4 cores, given the first statement.

Admittedly I haven't paid a lot of attention to AMD's CMT vs. SMT statements, but from what I've gleaned here & there, it seems AMD is using 2 full integer pipes in each module, as opposed to Intel's solution of just adding extra hardware to enable a single integer core to switch to another thread. AMD's full extra pipe is an additional 12% die space per module, vs. Intel's <5% per core (module). As for performance, AMD is promising a lot more than a 12% boost in multithreaded apps, whereas Intel's solution seems to get between -5% and +25%, depending on how many free clock cycles are in each thread being executed.

However in the future when more software is better threaded, I would think the solution would be to throw more complete cores (or AMD modules) at it. This is why AMD introduced the 6-core 1090T Thuban, and Intel the 6-core Westmere CPUs, for those who can use the extra full cores on the apps they run.
 
Though, this halfway solution is a good choice for thermals, power usage, die size etc., plus it may be very versatile for many workloads. Somewhere I saw a graph showing its FP climbing higher than previous generations, meaning more of a bump than even the Int.
 
Guest
For some reason Toms won't send me a conf email... had to make a new account to reply...

Ok, first off I don't claim this is fact, it's my wording of how I think AMD will address the SMT issue. I'll also say that it's obvious that I have a hard time putting my words on paper... that's why I rambled the same stuff over and over.

Enzo: When I said "turn off" I didn't mean it actually turned off literally... that's why I used the quotes. It'll be similar to Thuban I think, where the other cores underclock I guess...
I'm not saying that this is SMT either, but AMD's answer to Intel's SMT solution. As fazer stated, Intel's SMT adds only a small percentage of extra performance to each core that has SMT. For example, let's say Intel's SMT (HT) makes each core work like 1.2 cores. I'm assuming that with AMD's solution using modules, they will have better scalability. Instead of using SMT for a 0.2 boost, AMD could OC 1 core in the module while underclocking the 2nd. That enables the OC'd core to have the resources needed to pass 2 threads. But since the core is overclocked, it's faster, and I'd bet that a single core running 2 threads at 2.8GHz in SMT is significantly slower than 1 module running 2 threads at 3.4GHz. (Again, both threads won't run at the given GHz, but the hardware is there to make this perform faster: 2 physical cores to handle 2 threads, with one core handling most of the resources in Turbo mode and the other core underclocked but providing the hardware to allow 2 fully functioning integer pipelines.)
This obviously won't translate to 2.0 scalability compared to 1.2 for SMT, but I'd think it would be significantly better than 1.2.

I think you made your second point before I edited again. Basically the Turbo mode would have the same threads, but essentially running at higher speeds while drawing more power and causing more heat. Ideal for gaming or other intense programs that need good performance for only a short period of time... Basically like how someone overclocks for gaming, but runs a lower clock for daily use like browsing... that sort of thing.

As to fasers, like I said, I didn't mean that 1 of 2 cores would literally go to 0 and shut off... I was just wording it that way for the regular Joes... thought I made that clear, especially with the Thuban reference... in actuality, one core OC's and handles most of the module, the other core underclocks to save power/heat while leaving the ability for 2 threads.

I dunno, maybe I'm just confused as I've never been into this kind of thing before... never even knew what an integer pipeline was till a couple of months ago... learning as I go, I'm just addicted to learning about this BD arch, and CPU engineering in general.

I wrote this at work and now I'm trying to get myself back into the same mindset... was a long day.

Basically 2 cores per mod: 1 underclocks to save power/heat, the other overclocks to have a faster core while pulling the mod's resources. The OC'd core runs the show, the UC'd core allows 2 threads. So without enabling SMT in the literal sense, they still have an SMT solution, while using real cores. So I'm kinda hoping that the module is designed so that the OC'd core running most of the module will run everything at the higher clock rate while the UC'd core is simply there to enable 2 threads on physical hardware. My whole theory banks on that hope. As I said, I'm not an expert by any means.
So I guess my question back at you guys would be: if the OC'd core took the module's resources and used the second core to hardware-enable a second thread, would both threads be able to run at the OC'd core's speed? Or is each individual thread tied performance-wise to its own physical core?

But yea, just wanna say I'm no expert and none of this is being offered as an explanation to the BD module design, more of my own curious idea of how it could work and me asking if this makes sense... I'm starting to think it might not heh
 
Guest
Sorry, forgot to address this:

"As we all know, CPU's have hit a speed wall at about 4GHZ
What are you talking about? The "wall" was about 2.5GHz when AMD came out with their first dual cores. About 3.0GHz in 2007 and about 4.0GHz today. That means there is no wall so far. Also, GHz is not really a measure of performance; it is only related. Don't forget IPC, which has been improved at a much faster rate than clockspeed."

I didn't say or mean the 4GHz wall was hit and then we went to dual core... but I'd use 4GHz as my solid number as we currently have one heck of a time getting around that range per core. But as for there being no wall at all, there most certainly is. With current technology it's too difficult to run cores at such high speeds as it will require too much power and produce too much heat to be cost effective for mainstream use. Single cores could get to 3+ when duals were released IIRC, but duals weren't immediately available at those speeds. It took time and engineering to get the process down right and to improve performance. When duals were at 3+ and quads were released, they were at around 2GHz to 2.5-ish... So we now have singles, duals, triples, quads and hexas on the market, but how many are rated above 4GHz out of the box?
That is essentially why AMD decided to go with multi cores while Intel tried to get us flaming hot 10GHz CPUs with a TDP of 1K. There are limits to how much power you can/should draw and how much heat is acceptable, and this is what is causing the wall. Might be exaggerating a bit, but you know what I'm saying...
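The power side of this wall can be sketched with the standard first-order rule that dynamic CPU power scales roughly with capacitance × voltage² × frequency. The assumption below that voltage has to rise in proportion to clock is a simplification for illustration only (real chips have a voltage floor, leakage, and binning), and `relative_dynamic_power` is a hypothetical helper, not a vendor formula.

```python
def relative_dynamic_power(freq_ratio: float, voltage_ratio: float) -> float:
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f with C held constant."""
    return (voltage_ratio ** 2) * freq_ratio


# Going from 2.8 GHz to 4.0 GHz on the same design (illustrative numbers).
freq_ratio = 4.0 / 2.8

# Best case: the higher clock is reached without raising voltage.
same_voltage = relative_dynamic_power(freq_ratio, voltage_ratio=1.0)

# Pessimistic assumption: voltage must scale in step with frequency.
scaled_voltage = relative_dynamic_power(freq_ratio, voltage_ratio=freq_ratio)

print(f"clock up {freq_ratio:.2f}x, same voltage:   {same_voltage:.2f}x power")
print(f"clock up {freq_ratio:.2f}x, voltage scaled: {scaled_voltage:.2f}x power")
```

With voltage scaling along, a roughly 1.43x clock bump costs close to 2.9x the dynamic power in this model, which is the kind of growth that makes extra cores more attractive than extra GHz.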

And I haven't forgotten IPC, and I know there's more to CPU performance than the rated MHz. I simply tried to keep it simple, and not focus on every little detail. Cause I'm not writing a review/preview or anything like that, just voicing my ideas on what I think the main feature of BD's arch might be about... in a very simple nutshell.
 
^ OK, that makes more sense now. However I thought AMD could only turbo or downclock whole modules, not parts of one. Seems to me there might be some sync issues with the decoders and out-of-order hardware if they have to service an integer core at 3+ GHz and another at 1GHz.
 
Guest
"^ OK, that makes more sense now. However I thought AMD could only turbo or downclock whole modules, not parts of one. Seems to me there might be some sync issues with the decoders and out-of-order hardware if they have to service an integer core at 3+ GHz and another at 1GHz. "

This is where I'm mostly lost at how it will work. I certainly get your points, and they potentially throw my whole theory out the window. But if the OC'd core is controlling the resources of the module, couldn't the second UC'd core just be there for the extra int pipeline while the performance aspect of the work is handled by the much faster OC'd core?

I'm JUST getting into CPU engineering as I said, so although I understand how having 2 real cores with 2 int pipelines is much faster than 1 with SMT, I don't know the exact science behind it, therefore I have no idea if I'm making sense.

So I'm back at my other question: if 1 core took control of the module's resources at a higher clock rate due to turbo, would the other core be able to just provide the second int pipeline, so the first core essentially has 2 pipelines to work with? Or would underclocking the other core negatively affect the int pipeline performance?

And I thought I read somewhere that since each module shared resources, that each core could be clocked differently depending on the needs. Is this not the case? I could swear I read somewhere John Fruehe said individual core control is possible, just like how I can clock each of my 1090T's cores now. Maybe I mixed up a few diff things I read and am running in the totally wrong direction.... I'm starting to feel like I'm in Gr.10 Accounting again where on a major test I messed up the second number in a massive chart which threw off every single calculation I did... Although the tech told me if that number was right I would have gotten a 92%, instead I walked away with 0.05%. Thats right, not 1/2 a percent, but have a single decimal percentage. I should get that framed :)
 

jf-amd

Distinguished
Mar 3, 2010
238
0
18,690
I don't believe the original poster has it right.

If the module is only supporting one thread, then the other core is idle. This is not as much about Turbo CORE boost and more about resources. In this environment the first thread has access to 100% of the shared L2 cache, so this can bring a good bump in performance.

Each core runs at stock speeds but can move up due to Turbo CORE, but we are not giving away details on that yet; I anticipate that will be around launch because that is the kind of data that helps the competitor figure out the performance of the processor.
 
Guest
So the module design is more about minimizing costs while enabling more cores to be added as they share resources? And instead of all the mumbo jumbo I said, it's simply to increase lower thread performance?
I guess it runs the 8 threads at the estimated 1.8 rather than having Intel's SMT of 1.2.

So it's not that AMD is trying to make a full 8 core monster but an 8 thread monster, due to running them through 8 real cores/8 int pipelines?
If that's the case, it makes a lot more sense, and I've been way, way off.

I think I get it now. Each module is not 2 cores sharing resources but rather a hardware enabled single core with another core/int pipeline for CMT. So the 8 core/8 thread BD CPU would act essentially how an Intel quad core would work if SMT worked like 1.8, while dropping the negative aspects of Intel's current SMT implementation that can actually "trip" the core and slow performance. The extra core/int pipeline is just to keep the flow going uninterrupted. If that's the case, this sounds impressive.

If it works anything like it's expected to, I would assume it would crush a Nehalem quad. Maybe not crush, but be obviously in front. And with SB supposedly being a small improvement over Nehalem, I'm really excited about how these will compare in the real world.
 

yannifb

Distinguished
Jun 25, 2009
1,106
2
19,310
I have a feeling Bulldozer will have pretty high stock clocks. At this "Bulldozer 20 questions" thing by Fruehe on AMD's website, he specifically said that since BD's approach takes less die space and because of its power saving features, Bulldozer will have much higher clock speeds than current AMD offerings. That was close to word for word of what he said. And if he means higher stock clocks than chips like the 965, then that's going to be interesting.
 
I didn't say or mean the 4GHz wall was hit and then we went to dual core... but I'd use 4GHz as my solid number as we currently have one heck of a time getting around that range per core. But as for there being no wall at all, there most certainly is. With current technology it's too difficult to run cores at such high speeds as it will require too much power and produce too much heat to be cost effective for mainstream use.
Ahhh. I misunderstood what you meant by wall. I thought you meant that even with the next CPU architecture we'd still be at a 4GHz wall.

So this image is a single Bulldozer module (i.e., two cores). I'm not sure, but from what I gathered, the two FP schedulers could act as a single 256 bit scheduler? Dunno, it's been a while since I read up on BD. I remember either AnandTech or bit-tech having a very detailed article on the Bulldozer architecture not too long ago.
Here it is:
http://www.anandtech.com/show/3863/amd-discloses-bobcat-bulldozer-architectures-at-hot-chips-2010/5

amd_2010_bulldozer-1023x575.png
 

Fuell

Distinguished
Aug 29, 2010
20
0
18,510
After reading a lot more stuff about the new arch I think I know what BD actually is now... not what I posted in the orig. Not really sure how I came to think that way... thought I had a eureka moment when it was more like a highway pileup of info that somehow turned into what I thought was a real idea...
I know full well now that BD will have a turbo feature, but I'm not sure how it's being implemented at this moment (haven't read any new articles in a week).
I also know it's basically targeted at a 4 core/8 thread Intel CPU, as BD's design is to group 2 integer cores together in a module with shared resources. The shared resources cut down on redundancy and allow you to add a full 2nd integer core per "traditional CPU core" to have the physical resources to push 2 threads, while only increasing the size by a fraction (more performance with less die space).
So instead of being like Intel and using a real core and a "pretend" core to run 2 threads, AMD will use one "traditional CPU core" with a second integer core attached to physically push a second thread. And while not scaling as well as 2 physical "traditional" cores, it should scale far better than Intel's 1 real/1 fake.
Intel - 1 "traditional" core / 2 threads : 1.2-1.3 performance (should be tweaked upward for SB)
AMD - 1 "traditional" core / 2 physical integer cores running 2 threads : 1.6-1.8 performance (speculation, but an educated guess on the limited info available)
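To put those two ranges side by side, here is a minimal sketch that multiplies them out for a 4 core/8 thread chip against a 4 module/8 thread chip. The scaling factors are the speculative ones listed above, `effective_cores` is a hypothetical helper, and the model deliberately ignores any difference in per-core IPC or clock speed between the two designs.

```python
def effective_cores(units: int, scaling: float) -> float:
    """Core-equivalents when each unit running 2 threads delivers `scaling` x one thread."""
    return units * scaling


UNITS = 4  # 4 Intel cores with SMT vs 4 Bulldozer modules

# Speculative scaling ranges quoted in the post above.
intel_range = (effective_cores(UNITS, 1.2), effective_cores(UNITS, 1.3))
amd_range = (effective_cores(UNITS, 1.6), effective_cores(UNITS, 1.8))

# Worst case for AMD is its low end vs Intel's high end, and vice versa.
worst_ratio = amd_range[0] / intel_range[1]
best_ratio = amd_range[1] / intel_range[0]

print(f"Intel 4C/8T: {intel_range[0]:.1f} to {intel_range[1]:.1f} core-equivalents")
print(f"AMD 4M/8T:   {amd_range[0]:.1f} to {amd_range[1]:.1f} core-equivalents")
print(f"AMD advantage: {worst_ratio:.2f}x to {best_ratio:.2f}x")
```

On these guesses the module design comes out somewhere between about 1.23x and 1.5x ahead in fully threaded work, but only if a single Bulldozer thread is as fast as a single Intel thread, which nobody outside AMD knows yet.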
Intel has boasted SMT in the form of HT for a while now and loves to throw it in AMD's face. But it's going to kick them in the butt big time, cause AMD wasn't sitting there doing nothing while Intel continued to taunt AMD about lack of threading. AMD watched, they examined, and they found a better way, a much better way. And after a long wait we will see it.

AMD is always playing catch-up. BD should really be facing Nehalem, but AMD fell behind and now has to face SB. But AMD didn't just upgrade their arch or go for the easy gains, cause for Intel, most of the low hanging fruit has been picked. AMD knew this, and came up with what I feel is a revolutionary way of designing CPUs. Modules are the future. Although each int core doesn't have its own full resources to call it a CPU core as we know it today, it really doesn't need them. AMD found a great middle ground between SMT and more cores, and it's a beautiful one at that.
Before BD came along and changed the way we look at a CPU (even more so), we might have looked at it as an 8 core BD vs a 4 core SB with SMT. When you think about it from that perspective, and think about the speculated 1.2-1.3 of Intel's SMT vs the 1.6-1.8 of AMD's extra int core per "CPU core", you get a very one sided battle.
If it turns out anywhere close to as much in AMD's favour as it looks, AMD will have a huge winner on its hands.
I HOPE this is another Athlon 64 X2.
How AMD is even close in this fight is beyond me. Think about how Intel brought us x86 and how AMD was picked to be the #2 supplier, and how Intel did everything they could right from the start to handicap AMD using both legal and illegal methods, all of which were unethical given the agreement. Think about how much marketshare Intel illegally built up; that marketshare cannot just be given back to AMD, so it's become a permanent handicap that AMD has had to innovate its way out of.
For everything that has happened, with AMD being a generation behind, we're still looking at BD as being a contender against SB, if not better. We'll wait for real samples to say which is better though (I gotta say I'm leaning AMD though to be honest, but so few facts are out there).
How can AMD even compete with so much holding them back? I don't know. But this to me demonstrates that AMD is a much better company. Not only do they have a smaller marketshare, they have a much smaller company, smaller R&D, fewer engineers, fewer fabs, less of everything, yet they stay neck and neck with a much richer, resource heavy Intel.
If you look at how much time and money each company spent and what they are producing for the price/time/resources, Intel should be years and years ahead, but they aren't. In fact they had to play catchup on more than one occasion, and that is funny to me.
If BD is what we hope it will be, Intel SHOULD be releasing a 16 core/32 thread 6.5GHz per core CPU. Not really, but I like to take shots at Intel. But to be clear, I'm not an AMD fanboy, I just hate Intel's business practices.
 

Fuell

Distinguished
Aug 29, 2010
20
0
18,510
I think most of the people on forums like this think that way, ares, but we do like to discuss the possibilities or try to analyze the small amount of info we do have. I might make a lot of assumptions or guesses like many others, but I fully disclose that it's pure speculation or opinion.

But just because we don't know all the facts doesn't mean we should just shut our traps :p It's fun, and even if you're totally off, the discussions help people better understand the engineering aspects. I'd never even read about FPUs or int cores or int pipelines or anything till BD caught my eye. It just seemed so cool to me, I had to learn more.

I'll agree that most of the talking is speculation or guessing, educated guess or not, but it's far from useless.
 

ares1214

Splendid
Oh don't get me wrong, it is fun and all. However, it seems every so often everybody changes their mind, like BD went from being praised as an architectural masterpiece to maybe not as good as Intel in just a week. Maybe because of the Sandy Bridge info and the news, more like gossip, of it being delayed. The best person to ask is JF, and he has said FP, while it may look weak, will be massively improved. From all the info we have now, we know more or less for sure:

FP will be a big jump. Keep an eye out for a Bulldozer blog about FP in the upcoming weeks. I am about halfway through it at this point. I have some engineers that are working with me on it because floating point micro ops are not my strong point, I am not an engineer.

■ It's 32nm
■ 4 modules, 8 threads, for more or less 6 full cores
■ Likely fairly high clock speed due to pipes
■ AM3+
■ Seems to be refined Turbo Core
■ Really quite good thermals and power consumption
■ Sampling begins 2010 Q4, release sometime in 2011
■ And we know we don't know too much

Beyond those, we really don't know much at all, and we don't even know enough to make accurate predictions from just a photoshopped die shot and a few arch slides.

 

ares1214

Splendid
Just like with Hyper-Threading. Sure, you have 8 threads, but those 8 threads are sharing the resources of 4 cores. So while it is 8 threads, it's closer to 5-6 core performance, and only in the few things that utilize it and utilize it well. Here, you have 2 integer schedulers and 1 FP scheduler. With 4 modules you would think: 4 modules, 2 cores each, 8 cores. Not so. Each module isn't really 2 full cores, as it is just a different (and far better) take on HT. The two cores share L2 cache, front end engine, and FP. Therefore a "dual core" module is really sharing resources, and therefore doesn't really equal 2 cores. 1 module looks to be no less than 1.5 real cores equivalent, maybe getting up to 1.75. Do the math, and with 4 modules you get basically 6-7 core performance. However, it is a much more efficient way of doing things. While it isn't exactly a true octo core, it cuts down on costs a lot, as well as heat and energy consumption. Other things can be done to get octo core performance. Like I said, we don't know enough about this to say. At first glance, you would think it would be weak on FP, as it has 1 FP scheduler for 2 integer schedulers. Now Fruehe is saying it will actually be strong on FP, which one would think makes no sense. It all depends on a lot of things. If you ask me, AMD hit this one out of the ballpark in theory; they just have to get it out there, and get it at a good price. All we can do is wait, see benchmarks, and wait for further explanation.
 

ares1214

Splendid
Also, AMD was going into some fairly crazy talk, and I think this is what you were getting at. We all know Turbo Core shuts down x cores and bumps frequency up by x amount. What AMD was talking about was how several parts of the CPU remain idle at points in time. AMD wanted to put these together and do some complicated form of shutting down parts of the CPU, making a 4 module/8 thread setup transform into a super 4 core setup by disabling parts of the modules, then combining them together. I'll try to find the link to what AMD was saying, but it's a proposition that could dramatically increase efficiency.
 

Fuell

Distinguished
Aug 29, 2010
20
0
18,510
Ok, I wasn't sure how you meant 6 cores. But I get what you're saying. Same thing I think, just different wording. I basically sum it up as a module = 1 core with physical SMT. That's how I'd explain it to someone who only knows the basics.
The problem is AMD has changed the way we view the CPU with BD so much that it's hard to make a universal explanation of how this will work.

I think it's closer to 8 cores than 4 cores with SMT, if you had to pick one of the two. Because although each module doesn't have the resources to run 2 independent cores, I don't think that's necessary. We all know that not every part of the core is in constant use; some parts get next to no use.

Right now it's hard to identify what actually is a "core". AMD decided that since most calculations are integer based, the integer unit is the best choice. So AMD looked at what was needed resource-wise for integer "cores". Using this approach you can merge the resources of 2 integer cores, as having separate resources for each integer core (IC) was redundant and wasted die space. In merging the resources of 2 integer cores you decrease the amount of die space required per IC dramatically while maintaining very high throughput. This means that each IC is getting the resources it needs almost all the time, while working and acting similar to 2 cores.
2 ICs sharing resources allows for 2 threads to be passed simultaneously, behaving close to 2 separate cores.
Intel uses a single IC with its own resources to force 2 threads across. And although it can work at a decent gain, it can be almost useless or even cause decreases in performance.

It's like trying to get 2 people from A to B using bikes. Sure you can put two people on a single bike and get both people to B quicker than 1 at a time, but one or both riders could fall off or slow down the trip.
But AMD decided that they don't need 2 bikes to get to B, so they designed a double bike. Now both riders get from A to B much faster, but not quite as fast as if each had their own bike.
But if you look at what's expected from both CPUs (SB and BD), it's hard to think of a 4 core/8 thread SB competing with a 4 module/8 integer core BD.
Based on that, BD should thread better almost guaranteed, while Intel may or may not have a decent lead in clock for clock performance. If AMD can stay close in c4c they should win. But that's just my opinion based on speculation from limited information.

So for the record I THINK BD will beat SB soundly, but I'll hold my tongue till the facts are out.
 

bobdozer

Distinguished
Aug 25, 2010
214
0
18,690
Saying 6 cores is false, off by a long long way. AMD has said numerous times that the 2nd BD core in a module is worth 80% of the first when resources are being shared.

That means 8 cores is worth 7.2 cores, using up the die space of 5 cores. It's NOT hyperthreading and never will be.
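The arithmetic behind that claim can be sketched as follows. The 80% figure is the one quoted in this post; the 12% extra die space per module is the separate estimate given earlier in the thread, and treating a module without its second integer core as "one core" of area is an assumption made here purely for illustration.

```python
MODULES = 4
SECOND_CORE_WORTH = 0.80      # claim in the post above: 2nd core is worth 80% of the first
EXTRA_AREA_PER_MODULE = 0.12  # earlier estimate in this thread: +12% die space per module

# Performance in "full core" equivalents when both cores in every module are busy.
core_equivalents = MODULES * (1 + SECOND_CORE_WORTH)

# Die area in "single core" units (assumption: module minus 2nd int core = 1 core).
area_in_cores = MODULES * (1 + EXTRA_AREA_PER_MODULE)

# Throughput per unit of area relative to plain single-threaded cores.
perf_per_area = core_equivalents / area_in_cores

print(f"performance: {core_equivalents:.1f} core-equivalents")
print(f"die area:    {area_in_cores:.2f} single-core areas")
print(f"throughput per unit area vs plain cores: {perf_per_area:.2f}x")
```

The 7.2 figure matches the post exactly. The area comes out at about 4.5 cores with the 12% estimate, a little under the 5 cores quoted above, so the two posts are clearly working from slightly different area numbers.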