Send In The Marketers
Here’s the part of the discussion where our more technical readers may let out a collective groan. Last generation, AMD referred to its x86 cores and graphics shaders independently. The A10-6800K had four cores (actually, two Piledriver modules with four distinct integer clusters) and 384 shaders.
This time around, the company takes the fundamental graphics building block—the Compute Unit—which is replicated over and over to give us GCN-based GPUs like Hawaii with up to 2816 shaders, and dubs it a Compute Core. By definition, a Compute Core is HSA-enabled, programmable, and capable of running at least one process in its own context and virtual memory space, independent of other cores.

Of course, this gives AMD the ability to sum its CPU and GPU resources, yielding Kaveri-based APUs with eight and 12 compute cores, all with access to the same unified coherent memory. That’s compelling nomenclature when your competition is selling dual- and quad-core mainstream processors. Fortunately, the company’s legal department insists on a specific breakdown of CPU and GPU resources any time core count is used to describe a Kaveri-based APU.
AMD (validly) posits that it wants the technical community to think of up to 12 threads running concurrently, which is why it talks about Kaveri as a 12-core device. The APU does, in fact, address parallelism in a new and interesting way. We simply want to see the company use its messaging for good. At a time when AMD talks about CPUs in terms of their highest Turbo Core frequencies and rates high-end GPUs for clock rates they simply cannot sustain, the mainstream customers most likely to buy an APU aren’t going to understand the advanced implications of its nomenclature.
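For readers who want to see how the marketing math works, here is a minimal sketch of the Compute Core tally. The function names are ours, not AMD's. The 4 CPU + 8 GPU split for the 12-core A10-7850K is AMD's published breakdown, and 64 shaders per Compute Unit is the standard GCN figure; neither is spelled out above, so treat both as inputs we are supplying.

```python
# Shaders per GCN Compute Unit. Assumption supplied by us: 64 is the
# standard GCN figure, not a number quoted in the text above.
SHADERS_PER_CU = 64


def compute_cores(cpu_cores, gpu_cus):
    """AMD's Compute Core count: x86 cores plus GPU Compute Units."""
    return cpu_cores + gpu_cus


def describe(name, cpu_cores, gpu_cus):
    """Spell out the CPU/GPU breakdown alongside the summed figure."""
    total = compute_cores(cpu_cores, gpu_cus)
    shaders = gpu_cus * SHADERS_PER_CU
    return (f"{name}: {total} Compute Cores "
            f"({cpu_cores} CPU + {gpu_cus} GPU), {shaders} shaders")


# A10-7850K: 4 CPU + 8 GPU is AMD's published split (assumption here).
print(describe("A10-7850K", 4, 8))

# Working backward from Hawaii's 2816 shaders to its Compute Unit count.
hawaii_cus = 2816 // SHADERS_PER_CU
print(f"Hawaii: {hawaii_cus} Compute Units")
```

The point of the breakdown string is the same one AMD's legal department makes: the summed number only means something when the CPU and GPU halves are listed next to it.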
A New x86 Architecture: The First Steamroller CPU

At least most folks seem comfortable with the definition of AMD’s module-based approach to x86 cores, right? Kaveri represents the first outing for the Steamroller architecture, succeeding the Piledriver design at the heart of AMD’s Richland APUs. Although some of those previous-gen parts sported one module (or two cores), the just-introduced Kaveri models include two. AMD calls this a four-core configuration, though we know each module exposes two integer clusters and a shared floating-point unit.
Back when AMD introduced the Bulldozer architecture, we immediately took note of the big step back in per-cycle performance. Piledriver helped a little, but IPC remained painfully low compared to Intel’s Sandy Bridge, Ivy Bridge, and Haswell architectures. Steamroller was designed to help make up some of the difference, and engineers claim instruction throughput is up as much as 20%. Unfortunately, manufacturing decisions temper that gain.

The changes made to Steamroller predominantly improve efficiency at the front end of the pipeline to minimize stalls and, according to AMD, get single-threaded performance back up to more competitive levels. The L1 instruction cache, previously 64 KB and two-way set associative, is now 96 KB and three-way set associative, which AMD says reduces misses by 30%. AMD’s engineers similarly went after mispredicted branches by increasing the L2 branch target buffer from 5000 to 10,000 entries and augmenting the branch predictor itself. Instruction scheduling is made 5 to 10% more efficient through a jump to 48 entries (from 40). And company reps say that both integer clusters can access the microcode ROM simultaneously now, where they couldn’t before. Steamroller can issue two stores at once; the Piledriver architecture would only do one. Finally, the load/store units in each integer cluster feature ~20%-larger queues, further benefiting efficiency.
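A quick back-of-the-envelope sketch helps put those cache and scheduler numbers in context. It uses the textbook relationship sets = size / (ways × line size). The 64-byte line size is our assumption (AMD's figures above don't state it), and the helper names are hypothetical.

```python
def cache_sets(size_kb, ways, line_bytes=64):
    """Number of sets in a set-associative cache.

    line_bytes=64 is an assumption on our part; the article's figures
    only give total size and associativity.
    """
    size_bytes = size_kb * 1024
    if size_bytes % (ways * line_bytes) != 0:
        raise ValueError("size is not a whole number of sets")
    return size_bytes // (ways * line_bytes)


def pct_increase(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) * 100 / old


piledriver_sets = cache_sets(64, 2)    # 64 KB, two-way
steamroller_sets = cache_sets(96, 3)   # 96 KB, three-way

print(f"Piledriver L1I sets:  {piledriver_sets}")
print(f"Steamroller L1I sets: {steamroller_sets}")
print(f"Scheduler entries:    +{pct_increase(40, 48):.0f}%")
print(f"L2 BTB entries:       +{pct_increase(5000, 10000):.0f}%")
```

Under that line-size assumption, both caches work out to the same number of sets, which suggests the extra 32 KB arrives as one additional way rather than a reorganized index. The scheduler grows by 20% in entries, against the 5 to 10% efficiency gain quoted for it.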
To test AMD’s claims, I dialed in a Core i5-4670K, A10-6800K, and A10-7850K to exactly 4 GHz, then ran our single-threaded iTunes and LAME benchmarks.

In iTunes, Steamroller gets exactly zero benefit. The Haswell-based Core i5 is naturally quite a bit faster. LAME actually reflects a tiny gain, but again, Intel’s architecture enjoys a commanding lead.
Frustrated at the lack of single-core speed-up, I decided to add our threaded 3ds Max 2013 render project. Only then, after spinning up both Steamroller modules, does the architecture demonstrate significantly better results. At 4 GHz, the A10-7850K is 22% faster than the A10-6800K. Some of that is eroded in practice by the Richland-based APU’s higher shipping clocks. However, it does appear that improvements made to Steamroller show up selectively, depending on the workload.
Of course, the other part of this story will be the adoption of HSA and Mantle. In this regard, I think AMD is playing its cards right. If you want to provide incentive for game developers to invest in developing for Mantle, that economic incentive is not going to come from providing a high-end part that tries to compete with high-end discrete GPUs. That economic incentive, and I believe it's huge, is in lowering the cost of entry to play your game.
With the A8-7600, I believe AMD is providing a tremendous market opportunity and incentive if, with the combination of Kaveri plus embedded technologies (Mantle & True Audio), you can provide a playable gaming environment for the mass market. Admittedly, it may not be a "playable gaming environment" from an enthusiast standpoint, but as an entry point, it is quite good enough. It will be important for AMD to show that the release of Mantle for BF4 impacts performance for the Kaveri APUs in particular. More specifically, they will need to show that Mantle makes BF4 playable on a 7600. If they are successful in that regard, then I think they may really have something exciting here.
I'm hoping AMD is successful in this, because it's obvious that the desktop CPU performance race has reached a point of diminishing returns. Kudos to AMD for potentially changing the game in the industry.
All that said, they screwed up the pricing for the high-end. It needs to be $30 cheaper, and what is even the point of the 7700K? The 7850K at ~$145 and the 7600 where it is would have made much more sense if they want to incent adoption of this technology. The other point is they need to get motherboard manufacturers on-board with bringing more ITX FM2+ motherboards to market at different price points.
I got the opposite impression. Which graph are you looking at?
Given that AM3+ looks like it's done, it would have been nice to see a 6-core chip. Still, one of these may end up in my next laptop.
I really like where AMD is going (HSA, GCN and TrueAudio). Too bad the manufacturing process of GlobalFoundries just can't match Intel's.
Also, it would be interesting to see the new Bay Trail Pentium or Celeron CPUs (whichever is closer in performance) in the Efficiency graphs.
28nm SHP from GlobalFoundries. AMD bought over $1 billion worth of wafers from them in December...
I guess you have been reading the articles from a year ago about AMD still using TSMC despite promises of GlobalFoundries' new 28nm SHP process.
Yesterday there was an HD 7770 priced so low that you could get that and an FX 6300 for like $5 more than what Newegg is asking for the 7850K. You can get an HD 7750 in that general price range with an FX 6300 now. On the desktop, APUs still hold no appeal to me at all. In mobile, they have promise for sure.