AMD Talks Steamroller: 15% Improvement Over Piledriver

We are getting our first look at "Steamroller", the core for the "Kaveri" APU, among others. AMD expects a 15 percent improvement in performance per watt over the "Piledriver" core. The gains come from design-level changes rather than process-level improvements.
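To put the 15 percent figure in concrete terms, here is a minimal arithmetic sketch in Python. It assumes performance per watt is simply performance divided by power; the function name and the baseline numbers are hypothetical and are not AMD data.

```python
def apply_perf_per_watt_gain(perf, watts, gain=0.15):
    """Show two ways a performance-per-watt gain can be spent.

    perf and watts are hypothetical baseline values in arbitrary units.
    Returns (performance at the same power, power at the same performance).
    """
    base_ppw = perf / watts                # baseline performance per watt
    new_ppw = base_ppw * (1.0 + gain)      # performance per watt after the gain
    perf_at_same_power = new_ppw * watts   # keep power fixed, gain performance
    watts_at_same_perf = perf / new_ppw    # keep performance fixed, save power
    return perf_at_same_power, watts_at_same_perf


# Hypothetical baseline: 100 units of performance at 100 watts.
faster, cooler = apply_perf_per_watt_gain(100.0, 100.0)
print(f"same power: {faster:.1f} perf units, same perf: {cooler:.1f} W")
```

In other words, a 15 percent gain in performance per watt could show up as roughly 15 percent more performance at the same power, or as roughly 13 percent less power at the same performance, or as some mix of the two.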

The design-level improvements pursue three goals: feeding the cores faster, improving single-core execution, and pushing performance per watt.

To "Feed the Cores Faster", AMD has increased the instruction cache size, enhanced instruction prefetch, and made dispatch more efficient. In addition, Steamroller has a dedicated decoder for each integer pipe. According to AMD, these design changes yield a 30 percent reduction in i-cache misses, a 25 percent increase in max-width dispatches per thread, and a 20 percent reduction in mispredicted branches, which together result in a 30 percent increase in overall ops delivered per cycle.

Steamroller improves single-core execution by increasing integer execution bandwidth and decreasing average load latency. Integer execution bandwidth benefits from the "Feed the Cores Faster" improvements, along with more register resources (at the same latency) and intelligent scheduling. Average load latency falls through faster handling of data cache misses and accelerated store-to-load forwarding. These design changes have resulted in a 5 to 10 percent increase in scheduling efficiency, along with major improvements in store handling.

As with any design improvement, companies are trying to get more performance out of the process with equal or lower power requirements. We are seeing this not only with processors but with graphics cards as well. AMD improves Steamroller's performance per watt through power optimization, a floating point rebalance, and dynamic resizing of the L2 cache. Dynamic resizing lets the shared L2 cache work in an adaptive mode based on workload. The floating point rebalance streamlines the execution hardware and adjusts to application trends, which adds to the efficiency of the design. In addition, the power optimization offers lower average dynamic power and is tuned for loop behaviors.

Look for more details from AMD during the Hot Chips Symposium on its Surround Computing and High Density (Thin) Libraries.

    Top Comments
  • Maxx_Power
    That would be nice. Hopefully it will be launched on time.
    34
  • fedelm
    First and foremost, let's hope Piledriver can deliver on the desktop, especially after the bulldozergate. If it does, then this is excellent news.

    I want AMD to be able to give Intel a run for its money performance-wise. Doesn't have to match the i-xxs, just provide a decent, reliable and affordable option.

    Cheers.
    31
  • Pinhedd
    Good.

    It sounds like they've learned a lot from Bulldozer. If they can keep pulling off 15-20% improvements in 10-12 month intervals, then perhaps AMD can still be a viable competitor to Intel.
    30