I won't really be bringing you any new information, but the pieces are all there. Just read.
AMD will likely implement its own form of SMT in Bulldozer, and it will be close to the CMT solution rumored earlier in the year.
I'm sure those of you reading this are anxiously awaiting legitimate new information on Bulldozer that could give us a better idea of what performance to expect. There has been a lot of talk about AMD finally implementing SMT, but in its own way, and that way was rumored to be CMT (Cluster-based Multi-Threading). I will assume readers here are at least a little familiar with SMT and CMT. (If not, Google is your friend.)
As I said, there were many rumors early this year speculating about whether AMD would use CMT and how it would be implemented and perform. With the new information released, AMD still hasn't given us a definitive answer, but it has strongly indicated that there will be one thread per core. I was disappointed when I read that, until I realized something...
Well, it seems BD will have a form of SMT, but it will not run more threads than it has cores. Oh, and this SMT solution should scale performance with thread count much better than Intel's. So how will AMD do this?
It's very simple, actually. Readers here know very little about BD, but that's because AMD is very tight-lipped about this new architecture, and for good reason: it will be a monster! (IMHO)
BD will feature up to 8 cores per CPU on the desktop, which means 4 modules with 2 cores per module. AMD has also stated outright that each core will run a single thread, no more. So how could they do SMT/CMT? The answer can be found in Thuban.
Thuban is the 6-core desktop chip in AMD's Phenom II line of CPUs, and currently AMD's flagship. Each 6-core CPU has some features not found in the Phenom II X4 CPUs, among them an early version of Turbo Core (basically Cool'n'Quiet in reverse). When 3 or more cores are idle, they are essentially shut down so that the cores actually running threads can be overclocked for better lightly threaded performance. Basically, if you only need to run 2 threads, rather than leaving 4 of the 6 cores doing nothing, you put the idle cores to sleep and overclock the cores in use, increasing performance within the same power consumption and thermal limits. This is where BD's modular design and the Turbo Core feature come together to form AMD's very nice solution to SMT.
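To make the Thuban behaviour a bit more concrete, here is a toy Python model of the rule described above: if 3 or more of the 6 cores are idle, the busy cores run at a boosted clock, otherwise everything stays at the base clock. The function name `core_clock_ghz` and the 3.2/3.6 GHz clocks are assumptions picked for illustration; real Turbo Core firmware also weighs power and temperature, which this sketch ignores.

```python
# Toy model of Thuban-style Turbo Core as described in the post.
# ASSUMPTION: the "boost when 3 or more cores are idle" rule and the
# example clocks are illustrative only, not AMD's actual firmware logic.

TOTAL_CORES = 6
BASE_GHZ = 3.2   # hypothetical base clock
TURBO_GHZ = 3.6  # hypothetical boosted clock
MIN_IDLE_FOR_TURBO = 3


def core_clock_ghz(active_threads: int) -> float:
    """Return the clock the busy cores would run at for a given thread count."""
    if not 0 <= active_threads <= TOTAL_CORES:
        raise ValueError("thread count must be between 0 and 6")
    idle_cores = TOTAL_CORES - active_threads
    if idle_cores >= MIN_IDLE_FOR_TURBO:
        return TURBO_GHZ
    return BASE_GHZ


for threads in range(1, TOTAL_CORES + 1):
    print(f"{threads} thread(s): {core_clock_ghz(threads)} GHz")
```

With these numbers, 1 to 3 threads get the boosted clock and 4 to 6 threads stay at the base clock.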
As we all know, CPUs have hit a clock-speed wall at about 4 GHz, so the industry decided that the best way to get more performance was to add more cores. The problem, though, is that not all software is threaded to take advantage of the extra cores, so performance is limited by the number of threads the software actually uses and the speed of the cores running them.
This is where it gets exciting to me.
AMD is clearly aware that mainstream applications are not threaded beyond a few cores. In most cases a quad core is more than you will ever need, and this will remain true for some time, until higher-core-count CPUs become the norm and software vendors are forced to thread more to increase performance (no more relying on hardware alone). Bulldozer won't bring us 16 threads on an 8-core CPU, but it could in theory run 8 threads at Turbo Core speeds. This is where it gets hard to explain to the average Joe. Sure, an 8-core CPU can run 8 threads anyway, so why should I care, right? Let's say there's an 8-core BD CPU at 2.8 GHz with a turbo rating of 3.4 GHz. Normally the CPU would use its 8 real cores to process 8 threads. AMD's SMT/CMT solution will, in a way, run 8 threads on 4 turbo'd cores. Huh?
Since the two cores in each BD module share resources, and those resources are sized for 2 threads, AMD could "turn off" one core in each module, turbo the other, and still run 2 threads through the module (remember, each BD module has the resources to accommodate 2 cores). This means that rather than running 8 threads on 8 cores at 2.8 GHz, you'd basically be running 8 threads on 4 cores at 3.4 GHz, at similar power consumption and thermal levels.
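The clock arithmetic behind this scenario can be sketched in a few lines of Python. It only compares the two modes using the hypothetical 2.8/3.4 GHz part from the previous paragraph; it does not model how much throughput two threads actually get when they share one module's resources, so treat it as a back-of-the-envelope check, not a performance estimate.

```python
# Clock comparison for the hypothetical 8-core BD part described above.
# ASSUMPTION: 2.8 GHz stock / 3.4 GHz turbo are the post's example numbers,
# not real specifications; shared-resource effects are ignored entirely.

MODULES = 4
CORES_PER_MODULE = 2
THREADS = 8
STOCK_GHZ = 2.8
TURBO_GHZ = 3.4

stock_mode = {"active_cores": MODULES * CORES_PER_MODULE, "clock_ghz": STOCK_GHZ}
turbo_mode = {"active_cores": MODULES, "clock_ghz": TURBO_GHZ}

threads_per_module = THREADS / MODULES    # 2 threads per module in both modes
clock_uplift = TURBO_GHZ / STOCK_GHZ - 1  # fraction the clock rises in turbo mode

print(f"Stock: {THREADS} threads on {stock_mode['active_cores']} cores at {STOCK_GHZ} GHz")
print(f"Turbo: {THREADS} threads on {turbo_mode['active_cores']} cores at {TURBO_GHZ} GHz")
print(f"Threads per module: {threads_per_module:.0f}")
print(f"Clock uplift in turbo mode: {clock_uplift:.1%}")
```

With these example numbers the turbo'd cores run about 21.4% faster, and each module still handles 2 threads in either mode.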
So with software playing catch-up in threading for higher-core-count CPUs, BD will launch at a time when quad-core threading is on its way to becoming the norm, while software using more than 4 cores will likely be rare and used by far fewer people. AMD needs to target the largest possible number of customers to maximize profit and gain market share.
If benchmarking software stays threaded the way it is today, benchmarks might noticeably favor SB (Sandy Bridge), but in real-world performance and in the 2-4 thread "sweet spot", BD's modular design and new form of Turbo Core should be more than a match for Intel. SB isn't a major change from Nehalem, so its performance gains will likely be modest. Bulldozer, on the other hand, is a completely new architecture with a very unusual way of doing things. If everything goes well for AMD, this should be the answer they were looking for.
And yes, I know many of you have likely figured this out already and I am not the first to post it, but to be honest, I haven't come across anything that spells it out like this. Correct me if I am wrong. For those of you who hadn't seen this elsewhere, I hope it has been a good read.
There's no new info, so there are no sources: all the facts above are pretty common knowledge by now, and only my take on the BD SMT/CMT implementation is different, which is just what I think. So the only source for the new info is my mind. Yay!
------------------------
Edit> After thinking about this, I asked myself, "Then why would they even run it at 8-core stock speeds?" I think it comes down to power consumption and thermals. When I first wrote this post, I had it in my head that both modes would use basically the same power and generate about the same heat. I don't think that will be the case, though. Maybe at stock speeds they can maintain a lower TDP and produce less heat. In the Turbo mode described above, however, one core is almost turned off, doing only what it must to let the module still process 2 threads, while the other core, the main working core of the module, draws extra power and generates more heat. That would be a reasonable explanation for having 2 modes: stock mode to run 8 threads at lower speeds with lower TDP and heat (I'll use 95 W as an example), and Turbo mode as a much faster mode at a higher TDP and heat output (125 W, for example). A CPU that can be tuned for the job at hand sounds like the best of both worlds.
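Here is a toy Python sketch of the two-mode idea in this edit: given a power budget, pick the fastest mode that fits. The 95 W and 125 W figures are the example numbers above and the clocks reuse the hypothetical 2.8/3.4 GHz part; the `Mode` table and the `pick_mode` function are made up for illustration and say nothing about how AMD actually manages power.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mode:
    name: str
    active_cores: int
    clock_ghz: float
    tdp_watts: int


# HYPOTHETICAL modes built from the example numbers in the post.
MODES = [
    Mode("stock", active_cores=8, clock_ghz=2.8, tdp_watts=95),
    Mode("turbo", active_cores=4, clock_ghz=3.4, tdp_watts=125),
]


def pick_mode(power_budget_watts: int) -> Optional[Mode]:
    """Return the highest-clocked mode whose TDP fits the budget, or None."""
    fitting = [m for m in MODES if m.tdp_watts <= power_budget_watts]
    if not fitting:
        return None
    return max(fitting, key=lambda m: m.clock_ghz)


for budget in (80, 95, 125):
    mode = pick_mode(budget)
    print(budget, "W ->", mode.name if mode else "no mode fits")
```

Picking by clock speed is just the simplest policy that matches the "tuned for the job at hand" idea; a real scheduler would also look at how many threads are actually busy.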
Bulldozer will be the Jack of all Trades, but the Master of None.