AMD Talks Steamroller: 15% Improvement Over Piledriver
At today's Hot Chips Symposium, Mark Papermaster, AMD's Senior Vice President and CTO, talks about the upcoming "Steamroller" microarchitecture.
We are getting our first look at "Steamroller", which is the core for the "Kaveri" APU, among others. AMD expects a 15 percent improvement in performance per watt over the "Piledriver" core. The gains come from design-level improvements rather than process-level improvements.
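To make the 15 percent performance-per-watt figure concrete, here is a minimal sketch of what it would mean at the two extremes: same power budget, or same performance target. It assumes the quoted gain applies uniformly across the operating range, which AMD has not stated; the function names are made up for illustration.

```python
def performance_at_equal_power(perf_per_watt_gain: float) -> float:
    """Relative performance if the power budget stays the same."""
    return 1.0 + perf_per_watt_gain


def power_at_equal_performance(perf_per_watt_gain: float) -> float:
    """Relative power needed to deliver the same performance as before."""
    return 1.0 / (1.0 + perf_per_watt_gain)


# Assumption: the 15 percent figure from the article holds uniformly.
gain = 0.15
print(f"Equal power:       {performance_at_equal_power(gain):.2f}x performance")
print(f"Equal performance: {power_at_equal_performance(gain):.2f}x power")
```

Under that assumption, a 15 percent gain in performance per watt works out to roughly 13 percent less power at the same performance, not 15 percent less.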
The design-level improvements target three goals: feeding the cores faster, improving single-core execution, and pushing performance per watt.
To "Feed the Cores Faster", AMD has increased the instruction cache size, enhanced instruction prefetch, and made dispatch more efficient. In addition, Steamroller has dedicated decode hardware for each integer pipe. These design improvements have resulted in a 30 percent reduction in i-cache misses, a 25 percent increase in max-width dispatches per thread, and a 20 percent reduction in mispredicted branches, which results in a 30 percent increase in overall ops delivered per cycle.
Steamroller improves single-core execution by increasing integer execution bandwidth and decreasing average load latency. Integer execution bandwidth benefits from the "Feed the Cores Faster" improvements, along with more register resources (at the same latency) and intelligent scheduling. Average load latency is decreased not just by minimizing latency but also by faster handling of data cache misses and accelerated store-to-load forwarding. These design improvements have resulted in a 5 to 10 percent increase in scheduling efficiency, along with major improvements in store handling.
As with any design improvement, companies are trying to get more performance out of the process with equal or lower power requirements. We are seeing this not only with processors but with graphics cards as well. AMD improves Steamroller's performance per watt through power optimization, a floating point rebalance, and dynamic resizing of the L2 cache. Dynamic resizing lets the shared L2 cache work in an adaptive mode based on workload. The floating point rebalance streamlines the execution hardware and adjusts it to application trends, which adds to the efficiency of the design. In addition, the power optimization offers lower average dynamic power and is optimized for loop behaviors.
Look for more details from AMD during the Hot Chips Symposium on its Surround Computing and High Density (Thin) Libraries.




I want AMD to be able to give Intel a run for its money performance-wise. Doesn't have to match the i-xxs, just provide a decent, reliable and affordable option.
Cheers.
It sounds like they've learned a lot from Bulldozer. If they can keep pulling off 15-20% improvements in 10-12 month intervals, then perhaps AMD can still be a viable competitor to Intel.
I want AMD to be able to give Intel a run for its money performance-wise. Doesn't have to match the i-xxs, just provide a decent, reliable and affordable option.
Cheers.
It sounds like they've learned a lot from Bulldozer. If they can keep pulling off 15-20% improvements in 10-12 month intervals, then perhaps AMD can still be a viable competitor to Intel.
Good troll there...
Even if this generation does fail, I have a feeling they will be doing something right in the next 5 years.
It sounds like they've learned a lot from Bulldozer. If they can keep pulling off 15-20% improvements in 10-12 month intervals, then perhaps AMD can still be a viable competitor to Intel.
The trouble is, Intel is not standing still. If the improvements in Intel's subsequent gen chips keep pace with AMD's, then the performance gains in AMD chips will be a moot point. In theory, Intel may remain the better buy.
Don't get me wrong, I'd like to see AMD maintain viability because of the competition it gives Intel. However, at this point, AMD is a reasonable margin behind Intel and has a lot of ground to make up. Many hardware sites, including this one, maintain that Intel currently has the "value buy" processors at most price points.
I am holding out hope that AMD's rehiring of the guy behind K6 and AMD64 means that they are going back to the drawing board for future generation processors that have yet to be announced. If AMD learned its lesson that marketing hype alone cannot make a great processor, then they might just keep such a plan quiet.
So where's the fingers crossed smilie??
In the meantime, it will be a while before I build again, so I will keep an eye on developments, though without plans to buy.
Better start learning CPU gibberish
As always, if the end result means "faster" then I'm a happy reader.
I'm still waiting for AMD to release something that runs 15% faster than my Phenom II 980BE. Sorry, I have been a long time fan of AMD, but I call BS on this until I see benchies!
Look at Trinity benchmarks. That's Piledriver without any L3 and it beats Bulldozer. Piledriver on the desktop is probably a considerably better improvement than AMD's 10-15% number.
Probably 32nm. AMD doesn't need to move on, because they can simply keep fixing Bulldozer's many design implementation flaws, and AMD can't move on until other fab companies are ready for that anyway. Unlike Intel, AMD has to rely on other companies doing their job to get a new process node ready.
More performance per watt means more thermal headroom and that probably means more overclocking potential. AMD is likely to increase stock clocks as necessary anyway even if they don't improve performance per Hz by much. Unlike Netburst, AMD's modular architecture is actually good at what it is intended to do and can clock extremely high without a problem while having a lot more performance at a given clock frequency than Netburst did. AMD can increase the clocks every time that they decrease power consumption and they can do other things too. Performance, not just power efficiency, will almost definitely continue improving from now on with AMD.