
Intel Silvermont Architecture: Does This Atom Change It All?

We sat in on a deep-dive of Silvermont, the architecture powering Intel's next-generation Atom processors. Manufactured at 22 nm, armed with an out-of-order execution engine, and optimized for power, will this be what buries the ARM-based competition?

  1. Nice article as always C.A. I would really like to see this chip on a smartphone. If the performance and power utilisation is as good as it looks then Qualcomm will really feel the heat. Intel has the money and R&D to pull off a big move and compete. Time will tell.
  2. I wonder if there are any plans to release Windows Phone 8 smartphones with these SoCs over the next 12-24 months? That would really solidify the eco-system for both Intel and Microsoft in one fell swoop.
  3. Much needed upgrades in here. Hopefully they also deliver what they promise in these slides. Any devices out this year, or do we have to wait until 2014 to see something based on these? Very promising indeed! A Windows Pro tablet based on these at a decent price would be the first candidate to start a good move to Windows-based tablets. Then there would be three good alternatives in tablets.
  4. bulldozer!
    .. is the first thing that came to my mind when i started reading about the cores. but it's not exactly like bd, it's different. still.. it made me chuckle. amd deserves the credit.

    i wonder if future intel cpus ($330+ core i7) will have the same core system instead of htt.... :whistle: :ange: :lol:

    edit2: rodney dangerfield FTW! \o/
  5. de5_Roy said:
    bulldozer!.. is the first thing that came to my mind when i started reading about the cores. but it's not exactly like bd, it's different. still.. it made me chuckle. amd deserves the credit.


    Well, it's just the cache that's shared in this one, no actual execution resources.
  6. Finally intel is getting serious. Ditching hyperthreading is the best thing they could have possibly done. Now with OoO and real cores these Atoms look pretty powerful. They will probably beat Kabini no problem, with higher clocks and slightly less IPC. The 22nm trigate will drop power consumption, especially without the shitty hyperthreading in the way.
  7. esrever said:
    Finally intel is getting serious. Ditching hyperthreading is the best thing they could have possibly done. Now with OoO and real cores these Atoms look pretty powerful. They will probably beat Kabini no problem, with higher clocks and slightly less IPC. The 22nm trigate will drop power consumption, especially without the shitty hyperthreading in the way.

    i noticed the lack of information on the integrated graphics part. having a powerful cpu isn't enough for atom. the gpu part has always been the weakest point for intel. kabini otoh, will have gcn-based, hsa enabled, low power igpu.
  8. So they still have an off die memory controller. I would have thought they would have moved that on die by now.

    Any more info on this "system agent" and IDI? I'm also surprised the cores can't talk directly to each other. If you want to use many small cores to tackle a problem together that's fine. But give them the ability to do it quickly.

    It seems Intel is getting the ball rolling on their smaller chips. I just hope that when they finally do they ditch the Atom name. Bad chips, get a new name for those that aren't.
  9. de5_Roy said:
    i noticed the lack of information on the integrated graphics part. having a powerful cpu isn't enough for atom. the gpu part has always been the weakest point for intel. kabini otoh, will have gcn-based, hsa enabled, low power igpu.

    Too true. Not a single mention of it probably means it won't be anything to brag about. Intel isn't really the type of company that likes to hide breakthroughs anywhere. I'm expecting them to finally be able to do 1080p tablets and that's about it.
  10. No, it won't, regardless of what Intel's press release says. If I've learned anything in the past few years, it's to never take what Intel says in its PR at face value, because it never turns out to be true.

    Silvermont may arrive a few months before the 20nm process for ARM chips is ready, but will that be enough, considering Intel's chips cost 2-3x more than the ARM equivalent? Probably not.
  11. I think the article said something about it using the IGP from the IB chips. If so, that should be more than enough on a phone.
  12. Contrary to what most enthusiasts believe, the current Atom processors on the market have more than enough horsepower, when paired with an HDD and a gig or two of RAM, for grandma to check her email and figure out how to facebook.

    They're not a bad processor, you just need to properly implement them. Of course they won't work if you demand split-second responsiveness or are looking to play games, but for somebody looking to set up a basic Windows or Linux box they're more than acceptable.
  13. That's a bit like saying AMD's CPUs are good enough. Yes, both are mostly true. But both also ignore the other solutions that are available. Why buy an Atom that performs similarly to other chips, but because of higher power usage will drain a battery and only last 75% as long as other SoCs? If Atom really were "good enough", then Intel could just put it out and not worry about it anymore. But they didn't, because I'm sure even Intel knows that at this point they aren't there yet.
  14. I've never seen a problem with AMD processors when it comes to reliability. Yes, maybe they can't quite compete with Intel in performance, but I have noticed AMD is beginning to close the gap some. If anybody has seen an AMD roadmap of their upcoming advancements, then they know AMD is about to pull a big rabbit out of the hat. Not to mention they give Intel a run for the money in price to performance. And no, I am not an AMD fanboy. I use Intel in all three of my home PCs, but I am pulling for AMD. We need AMD to keep Intel in check, and I have been impressed lately by Piledriver's improvements over Bulldozer and AMD's future potential.
  15. With a low power chip on 14nm the Atom could have tons of applications for things that aren't just PCs or consoles. I'd like to see this chip put to use on things like auto navigation systems, smart phones, tablets, things of that nature. Intel could own the market with the right applications.
  16. Silvermont is releasing a half-year later than Jaguar, and it will still be poorer GPU-wise and likely only marginally better CPU-wise. Meanwhile, both AMD and ARM licensees will already be on their next platform.

    Assuming we see Jaguar-based SoCs in some decent tablets after June, I'm definitely going to be picking one of those up.

    Knock off the personal attacks now, or bans will be issued. - G
  17. This makes me wonder if companies that make in-house SoCs (I guess Apple specifically, since Samsung also sells them to others while Apple just does it for themselves) will ever switch their mobile devices to Intel if they just can't match the performance per watt of this and future Atom cores.
  18. I only wish this shared power thing between CPU and GPU were available on Haswell. Haswell has a high-performance GPU, but I don't need the GPU. It would be nice if Haswell automatically turbo-boosted the CPU when the GPU is off.
  19. 4745454b said:
    So they still have an off die memory controller. I would have thought they would have moved that on die by now. Any more info on this "system agent" and IDI? I'm also surprised the cores can't talk directly to each other. If you want to use many small cores to tackle a problem together that's fine. But give them the ability to do it quickly. It seems Intel is getting the ball rolling on their smaller chips. I just hope that when they finally do they ditch the Atom name. Bad chips, get a new name for those that aren't.


    Don't be ignorant.

    "Incidentally, Intel identifies its IDI as one of the keys to the modularity of the Nehalem/Westmere generation, and it’d seem that a lot of work from the “big” core space is affecting Atom here today."

    You could consider previous Atoms as off-die, but it won't be in Silvermont.
  20. kyuuketsuki said:
    People, stop feeding the troll (maddoctor). Silvermont is releasing a half-year later than Jaguar, and it will still be poorer GPU-wise and likely only marginally better CPU-wise. Meanwhile, both AMD and ARM licensees will already be on their next platform. Assuming we see Jaguar-based SoCs in some decent tablets after June, I'm definitely going to be picking one of those up.


    Really?

    http://www.notebookcheck.net/Review-Fujitsu-Stylistic-Q572-Tablet.91078.0.html

    Atom outperforms the Hondo chip by 30%
    Battery life of Atom is 2x at same battery capacity
    Needs active cooling

    AMD is expecting 15% IPC gain with Temash (Jaguar's tablet version) but still clocked at 1GHz. I'd expect even Clover Trail to have an advantage over Temash.
  21. Quote:
    "Incidentally, Intel identifies its IDI as one of the keys to the modularity of the Nehalem/Westmere generation, and it’d seem that a lot of work from the “big” core space is affecting Atom here today."

    You could consider previous Atoms as off-die, but it won't be in Silvermont.


    Care to expand on that? I guess I'm too ignorant to see the point you were trying to make.

    Quote:
    Yes, you are right that is why I like you because you are always using Intel's product


    I'm insulted. I actually love AMD. Used them for many years. Their CPUs do make sense in some situations and your posts are not appreciated.
  22. You rang? ;) Try to keep the threads civil please.
  23. internetlad said:
    Contrary to what most enthusiasts believe, the current Atom processors on the market are more than enough horsepower when paired with an HDD and a gig or two of ram for grandma to check her email and figure out how to facebook. They're not a bad processor, you just need to properly implement them. Of course they won't work if you demand split-second responsiveness or are looking to play games, but for somebody looking to set up a basic windows or linux box they're more than acceptable.


    Not really. Youtube videos at 480p do not play smoothly. Multitasking is horrid. Scrolling through complex web pages is not smooth, which would irritate any user. Hopefully with 22nm/14nm Atom it'll be better.
  24. DavidC1 said:
    Really?
    http://www.notebookcheck.net/Revie [...] 078.0.html
    Atom outperforms the Hondo chip by 30%
    Battery life of Atom is 2x at same battery capacity
    Needs active cooling
    AMD is expecting 15% IPC gain with Temash (Jaguar's tablet version) but still clocked at 1GHz. I'd expect even Clover Trail to have an advantage over Temash.

    The Z-60 beat all the Atoms in every single useful CPU benchmark in the link you gave. PCMark is endorsed by Intel; the numbers don't mean jack shit compared to actual use. Also, AMD quotes a 50-100% increase in performance from Hondo to Temash due to adding 2 more cores. Please stop spouting bull.
  25. Quote:
    Also AMD quotes a 50-100% increase in performance from hondo to temash due to adding 2 more cores.


    LOL, details, details. Considering some of the stuff AMD has said, I'm not sure what to believe from them anymore. But then that's why we have review sites.
  26. maddoctor said:
    Yes, you are right that is why I like you because you are always using Intel's product and even you have not and vow to not buy any AMD based product in your lifetime. Even I'm using Intel Pentium based notebook because it's better than unreliable buggy crap like current AMD A* processors. That is why when I'm looking for notebook, I will look for Intel's blue sticker in the laptop, because it always superior.

    ... how old are you?
  27. Is that sans2212 again?
  28. So, Java and C# apps would work fine on an Intel SoC; apps using native languages would at least have to be recompiled and distributed as a separate bundle from the ARM ones. Too much hassle for not enough customers. I guess Ubuntu or Win 8 is Intel's best bet.
  29. de5_Roy said:
    bulldozer!.. is the first thing came to my mind when i started reading about the cores. but it's not exactly like bd, it's different. still.. it made me chuckle. amd deserves the credit. i wonder if future intel cpus ($330+ core i7) will have the same core system instead of htt.... edit2: rodney dangerfield FTW! \o/

    The shared L2 cache resembles the Kentsfield processors exactly, and given that Conroe was released 5 years before Bulldozer, I think Intel deserves the credit.
  30. de5_Roy said:
    i wonder if future intel cpus ($330+ core i7) will have the same core system instead of htt....

    Why would Intel ever drop HyperThreading? It provides ~30% extra performance for ~5% more power and transistors by enabling more efficient use of existing computing resources in heavily threaded code. It is one of the most cost-effective and power-efficient tweaks in modern CPUs and GPUs.

    My bet is that it will go the other way around: Intel putting HT in Atoms and increasing the number of threads per core on i7/Xeon. I bet Atom will get HT within the next two years.

    Extracting more performance out of a given core's execution units per clock through simultaneous multi-threading is much simpler than extracting more performance out of a single instruction stream with deep out-of-order execution. AMD and ARM will likely get on-board the SMT train eventually.
  31. I had to use IE9's compatibility mode to finally log in to this page. Chrome and IE9's normal modes wouldn't work.

    Site needs serious work, it's like Win 8 currently: Change for change's sake with core functionality completely disrupted.

    Anyway, on topic, you might also want to read AnandTech for more detail on this. He's also covered graphics.

    On Jaguar (my thoughts): It's not low enough for phones yet. Tablets, yes, but not phones. Also, having solid GPU performance is fine, but the CPU needs to be there too. Remember, this is mobile, efficiency is paramount. And you're not going to see 170 GB/s on a standard Jaguar-containing SoC.

    With Merrifield/Bay Trail, I think 10-20GB/s memory bandwidth might be a reality. With Broadwell, Intel might end up bringing Core to Atom more directly and integrating it with the main tick-tock cycle...ARM will be on 20nm when Intel's going to be doing 14nm....and when Skylake hits, ARM would still be on 16nm at the most.

    2014 marks the beginning of the end of ARM's domination in mobile. 2015 onwards x86, both Intel and AMD will have a solid lead. I think the only real competition will be from Nvidia, and I don't expect Qualcomm to not fight back. Samsung, Apple will probably exit the market and use Intel's stuff.

    Anyway, AT link:
    http://www.anandtech.com/show/6936/intels-silvermont-architecture-revealed-getting-serious-about-mobile

    Quote:
    What we know about Baytrail is that it will be a quad-core implementation of Silvermont paired with Intel’s own Gen 7 graphics. Although we don’t know clock speeds, we do know that Baytrail’s GPU core will feature 4 EUs - 1/4 the number used in Ivy Bridge’s Gen7 implementation (Intel HD 4000). Ultimately we can’t know how fast the GPU will be until we know clock speeds, but I wouldn’t be too surprised to see something at or around where the iPad 4’s GPU is today.

    Quote:
    Intel’s eDRAM approach to scaling Haswell graphics (and CPU) performance has huge implications in mobile. I wouldn’t expect eDRAM enabled mobile SoCs based on Silvermont, but I wouldn’t be too surprised to see something at 14nm.

    Quote:
    On single threaded performance, you should expect a 2.4GHz Silvermont to perform like a 1.2GHz Penryn. To put it in perspective of actual systems, we’re talking about around the level of performance of an 11-inch Core 2 Duo MacBook Air from 2010. Keep in mind, I’m talking about single threaded performance here. In heavily threaded applications, a quad-core Silvermont should be able to bat even further up the Penryn line. Intel is able to do all of this with only a 2-wide machine (lower IPC, but much higher frequency thanks to 22nm).

    Quote:
    That’s the end of the Intel data, but I have some thoughts to add. First of all, based on what I’ve seen and heard from third parties working on Baytrail designs - the performance claims of being 2x the speed of Clovertrail are valid. Compared to the two Cortex A15 designs I’ve tested (Exynos 5250, dual-core A15 @ 1.7GHz and Exynos 5410 quad-core A15 @ 1.6GHz), quad-core Silvermont also comes out way ahead. Intel’s claims of a 60% performance advantage, at minimum, compared to the quad-core competition seems spot on based on the numbers I’ve seen.


    Also: Tech Report: http://techreport.com/review/24767/the-next-atom-intel-silvermont-architecture-revealed
  32. This dude usually has some pretty in-depth stuff as well:
    http://www.realworldtech.com/silvermont/
  33. Just so everybody knows, we need AMD to continue to operate and produce their products, and maybe even improve the quality a bit, because if they don't and the only one left standing is Intel, then we are going to pay a lot more for their CPUs, which are already expensive. Right now if you want a top of the line CPU from Intel, it's going to cost you $1000; I can't imagine what that price will jump to with no AMD in the picture.

    Maddoctor, you need to stop with your negative comments about AMD or I will start to think that you're trolling. So far no AMD fans have responded to this post, and when one does there will be a flame war and you will be the cause of it.
  34. If I understand Chris' last article, didn't Intel say their research showed that some changes that made the chips perform better in real life actually made them worse in benchmarks, and vice-versa? If that's the case, Intel will have to make a big push to show the devices' responsiveness instead of benchmark numbers.
  35. RedJaron said:
    didn't Intel say their research showed that some changes that made the chips perform better in real life actually make them worse in benchmarks, and vice-versa?

    That is the same as with any other form of benchmarking. Different benchmarks have different characteristics which may or may not be representative of real-world computing loads. Each architectural change may have positive effects on some real and synthetic workloads while having adverse effects on others.

    Not every architectural tweak translates into a win across the board under all circumstances so the objective is to achieve the biggest gains where they matter most.
  36. InvalidError said:
    RedJaron said:
    didn't Intel say their research showed that some changes that made the chips perform better in real life actually make them worse in benchmarks, and vice-versa?

    That is the same as with any other form of benchmarking. Different benchmarks have different characteristics which may or may not be representative of real-world computing loads. Each architectural change may have positive effects on some real and synthetic workloads while having adverse effects on others.

    Not every architectural tweak translates into a win across the board under all circumstances so the objective is to achieve the biggest gains where they matter most.


    Of course. But to the mindless, uneducated masses, the numbers mean the most, even if they don't know what the numbers actually mean. A few years ago a lot of people panned Windows Phone 7 because they were all single-core CPUs. However, since the OS was designed and optimized for single cores, it was every bit as fast and responsive as multi-core Android phones. If all the general public sees are lopsided benchmark numbers or that the Atoms still aren't quad-core, it won't matter how much performance they really have, they won't sell. Intel will have a big marketing/educating challenge to overcome.

    Personally, the idea of a full x86 phone is incredibly appealing to me.
  37. Quote:
    didn't Intel say their research showed that some changes that made the chips perform better in real life actually make them worse in benchmarks, and vice-versa?


    I don't know if that's true or not. If true, it's a problem with the synthetic benchie, as it obviously isn't doing what real programs do. I've said this before and I'll say it again: I don't even look at the synthetic benchmark page(s) on new parts. I don't care how many bungholio marks your new part can make. All that matters to me is speed. And when the 2900XT came out and we all saw massive synthetic scores higher than the 8800 Ultra but gaming scores lower than the 8800GT, I stopped paying attention to synthetic scores. You don't play 3DMark or superpi. Why do you care what those results are?
  38. Quote:
    > Since AMD already published the Software Optimization Guide, you can make the comparison
    > yourself, or just assume that usually Jaguar have a bit more than Silvermont, for example:

    Umm. You missed the most important number. L2 size/latency.

    Intel tends to do a good memory subsystem, and people seem to always dismiss that. The numbers I've seen for Jaguar are a 24 cycle load-use latency of the 512kB L2. David claims (I think) 14 cycles for the 1MB shared L2 on Silvermont. That looks like a big advantage for Silvermont.

    Things like ROB sizes are almost unimportant compared to the really fundamental things. I'm a huge proponent of OoO, but it doesn't even need to be all that deep to get the low-hanging fruit. You need to have good caches: low latency and reasonable associativity. If those are bad, no amount of core improvements will ever make up for it (see the majority of the ARM cores), and if the caches are really good, you can make do with shallower queues.

    (Side note: I'd like to see actual benchmark numbers for the Silvermont cache accesses. Sometimes CPU people quote the latency after the L1 miss (rather than the full load-to-use latency), sometimes they quote the "n-1" number of cycles, sometimes it turns out that pointer following adds another few cycles that they don't mention, yadda yadda, just to make their numbers look better than they are).

    So if the Silvermont 14 cycles are for "14 cycles after the L1 miss", then that is much worse than if it's a true "14 cycle pointer-to-pointer chasing" latency. But I'm assuming it's true load-to-use latency for now, since that's in the same ballpark that Intel did back in the Merom/Yonah days.

    Linus (Torvalds)

    http://www.realworldtech.com/forum/?threadid=133235&curpostid=133311
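The load-to-use latency Linus is talking about is typically measured with a pointer-chasing microbenchmark: each load depends on the previous one, so the time per step approximates the full pointer-to-pointer latency. A rough sketch of the idea in Python (illustrative only; serious measurements are done in C with cycle counters, and the function names here are made up):

```python
import random
import time

def build_chain(n):
    """Build a random cyclic pointer chain: chain[i] holds the index of
    the next element, forming one n-element cycle in shuffled order so
    hardware prefetchers can't guess the next load."""
    order = list(range(n))
    random.shuffle(order)
    chain = [0] * n
    for i in range(n):
        chain[order[i]] = order[(i + 1) % n]
    return chain

def chase(chain, steps):
    """Follow the chain for `steps` dependent loads and return average
    time per load; each iteration must wait for the previous load to
    finish, which is exactly the load-to-use dependency."""
    idx = 0
    start = time.perf_counter()
    for _ in range(steps):
        idx = chain[idx]
    return (time.perf_counter() - start) / steps

small = build_chain(1 << 10)   # would fit in L1 on real hardware
large = build_chain(1 << 20)   # would spill well past a 1MB L2
print(f"small chain: {chase(small, 100_000) * 1e9:.0f} ns/step")
print(f"large chain: {chase(large, 100_000) * 1e9:.0f} ns/step")
```

In Python the interpreter overhead dominates the absolute numbers; the same structure in C exposes the L1/L2/DRAM latency steps much more cleanly.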
  39. I'd have to agree. AMD's Athlon with the IMC was faster than the P4 in part because, by no longer using an FSB, it was able to get info from the cache MUCH faster than Intel could. Along with the much shorter pipeline, their chips were quite a bit better.

    Problem is, Intel learned a lot from tweaking the FSB, so that by the time they started using the IMC theirs was a lot better than AMD's. And for some reason AMD hasn't really brought the latency down much at all. At least not the last time I checked.
  40. I owned 2 netbooks with Atom CPUs. From my experience, they were really crappy. If super crappy got 50% better, then it's still kinda crappy. Although any improvement is a good improvement in my mind. I'll just make sure to steer clear of Atom CPUs (and all the junk surrounding them) until they are PROVEN to be competitive not only in power consumption and performance, but price as well.
  41. InvalidError said:
    Why would Intel ever drop HyperThreading? It provides ~30% extra performance for ~5% more power and transistors by enabling more efficient use of existing computing resources in heavily threaded code. It is one of the most cost-effective and power-efficient tweaks in modern CPUs and GPUs. ...


    It's amazing how often people criticise the modern HT implementation, which is strange because, as you say, in heavily threaded code it can be extremely effective (not always - AE is an example, so I've read - but usually). Best example I came across recently was my 4.3GHz i7 870 giving a much higher 3DMark11 Physics score (9752) than a 4.5GHz 3570K (9297); now that was a surprise, ie. the HT made all the difference (compare any 870 to a 760 at the same clock, similar effect). Of course if a task doesn't benefit from HT, then turning it off gives more thermal headroom and a higher oc (eg. another 870 I have can do 4.1 with HT on vs. 4.44 with HT off).

    If Intel does ever add HT to Atom, what would be good is if each core could exploit HT on the fly, ie. use it if a task gains from its use, turn it off if not and thus save power as well. Hmm, dynamic HT... bit too complex perhaps.

    Ian.
  42. mapesdhs said:
    If Intel does ever add HT to Atom, what would be good is if each core could exploit HT on the fly, ie. use it if a task gains from its use, turn it off if not and thus save power as well. Hmm, dynamic HT... bit too complex perhaps.

    The way I understand it, most of the extra power use with HT simply comes from increased activity in decode and execution units. If HT is enabled but the extra thread is sleeping (not decoding or issuing any instructions), it does not cause extra CPU activity and does not use any more power than disabling it. So HT is already sort-of dynamic.

    The main thing it would need to be more power-efficient is a kernel driver or hack to analyze each thread's typical CPU requirements to sort out which ones are worth waking a CPU core for and which ones should be scheduled on secondary threads as much as possible.
  43. 4745454b said:
    Quote:
    didn't Intel say their research showed that some changes that made the chips perform better in real life actually make them worse in benchmarks, and vice-versa?


    I don't know if that's true or not. If true, it's a problem with the synthetic benchie, as it obviously isn't doing what real programs do. I've said this before and I'll say it again: I don't even look at the synthetic benchmark page(s) on new parts. I don't care how many bungholio marks your new part can make. All that matters to me is speed. And when the 2900XT came out and we all saw massive synthetic scores higher than the 8800 Ultra but gaming scores lower than the 8800GT, I stopped paying attention to synthetic scores. You don't play 3DMark or superpi. Why do you care what those results are?


    Read the whole thing I wrote. I didn't say the chip would suck because of benchmarks, or that I give extra weight to synthetic benchmarks. I said since so many other people do base their judgment on those numbers, Intel may have a harder time marketing these chips against Tegra and Snapdragon.
  44. I gotcha. I was trying to say I wish more people would look at ACTUAL programs rather than just some stupid number of marks. But of course that takes time...
  45. Ahh, gotcha, and yes, I would like to see the same.
  46. About HT: 1) Current gen Atom already has HT

    2) Silvermont won't get HT because Intel figured moving to OoOE would use the same die space and power, but provide much better performance.

    3) In Core, you have enough space and power to do both HT and OoOE.
  47. ojas said:
    3) In Core, you have enough space and power to do both HT and OoOE.

    Not much of an argument considering how little extra power and space HT needs vs how much computing power it adds: ~30% more performance for ~5% more of any one other parameter, making it ~6X more efficient than adding an equivalent amount of brute-force processing power through extra cores or higher clocks.

    HT is more efficient than OoOE but requires sufficiently threaded workload to actually leverage. HT's real problem is that most software today is still fundamentally single-threaded so most programs benefit a lot more from single-threaded performance optimizations than extra cores or SMT.
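The cost/benefit arithmetic behind that ~6X figure is simple enough to spell out (a back-of-the-envelope sketch using only the ballpark numbers quoted in this thread):

```python
# Ballpark figures from the discussion above:
# HT adds ~30% throughput for ~5% extra power/area;
# a whole extra core adds ~100% throughput for ~100% extra power/area.
ht_gain, ht_cost = 0.30, 0.05
core_gain, core_cost = 1.00, 1.00

ht_eff = ht_gain / ht_cost          # throughput gained per unit of cost
core_eff = core_gain / core_cost

print(f"HT:         {ht_eff:.1f} gain per unit cost")
print(f"Extra core: {core_eff:.1f} gain per unit cost")
print(f"HT is ~{ht_eff / core_eff:.0f}x more cost-efficient")
# → HT is ~6x more cost-efficient
```

Of course that payoff only materializes when there is a second runnable thread, which is exactly the caveat made above.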
  48. InvalidError said:
    ojas said:
    3) In Core, you have enough space and power to do both HT and OoOE.

    Not much of an argument considering how little extra power and space HT needs vs how much computing power it adds: ~30% more performance for ~5% more of any one other parameter, making it ~6X more efficient than adding an equivalent amount of brute-force processing power through extra cores or higher clocks.

    HT is more efficient than OoOE but requires sufficiently threaded workload to actually leverage. HT's real problem is that most software today is still fundamentally single-threaded so most programs benefit a lot more from single-threaded performance optimizations than extra cores or SMT.

    Well...
    Quote:
    The move to an OoO paradigm generally comes with penalties to die area and power consumption, which is one reason the earliest mobile CPU architectures were in-order designs. The ARM11, ARM’s Cortex A8, Intel’s original Atom (Bonnell) and Qualcomm’s Scorpion core were all in-order. As performance demands continued to go up and with new, smaller/lower power transistors, all of the players here started introducing OoO variants of their architectures. Although often referred to as out of order designs, ARM’s Cortex A9 and Qualcomm’s Krait 200/300 are mildly OoO compared to Cortex A15. Intel’s Silvermont joins the ranks of the Cortex A15 as a fully out of order design by modern day standards. The move to OoO alone should be good for around a 30% increase in single threaded performance vs. Bonnell.

    Quote:
    Previous versions of Atom used Hyper Threading to get good utilization of execution resources. Hyper Threading had a power penalty associated with it, but the performance uplift was enough to justify it. At 22nm, Intel had enough die area (thanks to transistor scaling) to just add in more cores rather than rely on HT for better threaded performance so Hyper Threading was out. The power savings Intel got from getting rid of Hyper Threading were then allocated to making Silvermont an out-of-order design, which in turn helped drive up efficient use of the execution resources without HT. It turns out that at 22nm the die area Intel would’ve spent on enabling HT was roughly the same as Silvermont’s re-order buffer and OoO logic, so there wasn’t even an area penalty for the move.


    That was AnandTech.

    Quote:
    Intel claims that the core and system level microarchitectural improvements will yield 50% higher IPC for Silvermont versus the previous generation. Comparing the two microarchitectures in Figure 7, that is a highly plausible claim as out-of-order scheduling alone should be worth 30% and perhaps 5-10% for better branch prediction. Additionally, the 22nm process technology has reduced Vmin by around 100mV, and increased clock frequencies by roughly 20-30%. In total, this paints a very attractive picture for Silvermont.

    Quote:
    One of the biggest advantages of Silvermont’s out-of-order scheduling is that the whole concept of ‘ports’ or ‘execution pipes’ becomes irrelevant. The microarchitecture will schedule instructions in a nearly optimal fashion, and easily tolerate poorly written code. Of course, the resources for out-of-order execution are not free from a power or area standpoint. However, Intel’s architects found that the area overhead for dynamic scheduling is comparable to the cost for multi-threading, with far better single threaded performance.

    Quote:
    The microarchitecture of Silvermont is conceptually and practically quite different from Haswell and other high performance Intel cores. The latter decode x86 instructions into simpler µops, which are subsequently the basis for execution. Silvermont tracks and executes macro operations that correspond very closely to the original x86 instructions, a concept that is present in Saltwell. This different approach is driven by power consumption and efficiency concerns and manifests in a number of implementation details. Other divergences between Haswell-style out-of-order execution and Silvermont are dedicated architectural register files and the distributed approach to scheduling.


    That was from Real World Tech.
    http://www.realworldtech.com/silvermont/

    What I have to say is this: the HT you're looking at is based on an OoO pipeline, while the original Bonnell/Saltwell pipeline was in-order.
    From whatever testing I've done of the Xolo X900, and the other benchmarks I've read for Clover Trail and Medfield, HT had little effect for Atom (this includes Linpack, which showed no improvement under ICS or Gingerbread for Medfield).
  49. ojas said:
    What I have to say is this: the HT you're looking at is based on an OoO pipeline, while the original Bonnell/Saltwell pipeline was in-order.

    The general principles behind HT/SMT remain the same regardless of the presence of OoOE or superscalar execution: provide one or more alternate independent instruction stream(s) to help fill execution slots when execution is about to stall from lack of eligible instructions.

    The main problem with OoOE is that its complexity increases almost exponentially with look-ahead depth, while the chances of finding extra instructions to squeeze in decrease as valuable, expensive resources get tied up behind conditional branches, mispredicts, speculative execution, cache misses, memory fetches, etc. OoOE cannot do much about any of those efficiently, so the cost-versus-benefit of pursuing deeper OoOE keeps getting worse as you attempt to extract more ILP per thread.
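That diminishing-returns effect can be sketched with a toy model (my own illustration, not a real pipeline): each instruction depends on one earlier instruction at a random distance, and we count how many instructions in a lookahead window are actually ready to issue. Doubling the window yields far less than double the ready instructions.

```python
import random

random.seed(1)

def expected_ready(window, trials=5_000, max_dep_dist=32):
    """Average number of issue-ready instructions found in a lookahead
    window, assuming each instruction depends on one earlier instruction
    at a uniform random distance. Toy model, not real hardware."""
    total = 0
    for _ in range(trials):
        deps = [random.randint(1, max_dep_dist) for _ in range(window)]
        # The instruction at offset k is ready if its producer lies
        # before the window (distance > k), i.e. it already completed.
        total += sum(1 for k, d in enumerate(deps) if d > k)
    return total / trials

for w in (4, 8, 16, 32):
    r = expected_ready(w)
    print(f"window {w:2d}: {r:5.2f} ready ({r / w:.0%} of slots)")
```

The fraction of the window that is ready shrinks as the window grows, while the scheduling hardware needed to track it keeps growing, which is exactly the worsening cost/benefit curve described above.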

    With SMT, on the other hand, the costs are almost directly proportional to the hardware thread count, and the chances of finding instructions to fill ports also increase linearly with the number of active threads loaded on each core.
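A toy simulation makes the slot-filling argument concrete (again my own sketch, with made-up stall probabilities, not a model of any real core): each thread has a ready instruction with some probability per cycle, and more hardware threads mean the issue slot goes idle less often.

```python
import random

random.seed(42)

# Toy model of SMT filling issue slots: each cycle, a thread either has
# a ready instruction or is stalled (e.g. on a cache miss). With more
# hardware threads, the single issue slot idles less often.

def utilization(n_threads, stall_prob=0.4, cycles=100_000):
    busy = 0
    for _ in range(cycles):
        ready = [random.random() > stall_prob for _ in range(n_threads)]
        if any(ready):   # issue from any thread with work available
            busy += 1
    return busy / cycles

print(f"1 thread : {utilization(1):.2f}")   # ~0.60  (1 - 0.4)
print(f"2 threads: {utilization(2):.2f}")   # ~0.84  (1 - 0.4**2)
```

With independent stalls, idle probability drops geometrically with thread count, which is why SMT's benefit scales roughly linearly with threads while its hardware cost does too.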

    SMT and OoOE are not mutually exclusive. They are synergistic: SMT drastically reduces the depth/complexity of OoOE required to keep most ports busy on every clock tick under sufficiently threaded workloads (may span multiple applications) and OoOE increases the likelihood of finding something to do in each individual hardware thread. Both contribute significantly to extracting the most work out of the least surface area and power, which would be a significant advantage on platforms with limited power supply if pervasive threading became more common.

    Another nice benefit of HT on a limited power budget is that you do not need to rely as much on branch prediction, speculative execution and various other expensive (in both power and area) tricks. You can simply execute instructions from other threads and hope dependencies will start resolving themselves before every thread gets stuck, which means less wasted work and power. That was the goal behind the original Atom, though reality disagreed with Intel's vision of how good an idea that was.

    HT becomes more effective than increased OoOE once you have enough execution resources to actually start worrying about OoOE failing to keep them busy most of the time - that's where the OoOE and support structures start ballooning out of proportion like they do on mainstream desktop/laptop CPUs. If you look at chips designed for massive parallelism (GP/GPUs, Xeon Phi, UltraSparc T5, etc.), they tend to have much more primitive OoOE (sometimes none whatsoever) than chips designed with somewhat of an obsession for single-threaded performance, like Haswell.

    The optimal balance between multi-core, OoOE and HT/SMT is all about the availability of threaded system workload (not necessarily all from a single game/application) or lack thereof. The balance just happens to still be heavily weighted in single-threaded performance's favor.

    The main problem with the old Atom: nobody cares how much more efficient an SMT design can be if it thoroughly sucks at extremely common single-threaded tasks like parsing HTML and calculating layouts to render web pages. Not having OoOE left the old Atom severely crippled in that department, which is clearly not acceptable if Intel wants to gain market share in mobile devices.

    Give it maybe two years. Atom will likely grow to quad-port execution, and HT will come back to help keep those ports full without expanding the OoOE circuitry too much or sacrificing the all-so-important single-threaded performance.