Piledriver: Half Of The Trinity Story
AMD is eager to deemphasize the importance of x86 performance, instead focusing on the potential of workloads accelerated by its powerful graphics architecture. The company willingly dubs its implementation “good enough,” pointing out that basic productivity-oriented workloads reliant on user input aren’t sped up at all by a faster CPU.
On the other side of the fence, synthetic benchmarks and diagnostics easily quantify the potential delta between architectures like Ivy Bridge and Bulldozer.
As with most debates, the truth lies somewhere in the middle. Many (if not most) of the benchmarks in our suite measure the alacrity of x86 computing resources in a very real-world way. Others focus more intently on graphics performance. And we’re increasingly adding tests able to leverage what AMD calls heterogeneous computing—improving performance by drawing from multiple subsystems concurrently.
The point is that x86 cores are still first-class citizens in the APU world, and there is such a thing as performance that’s not good enough. That’s part of the reason why so many of us want to know how the Piledriver architecture improves upon Bulldozer. So let’s get that out of the way first.
We took the A10-5800K, set it to 3.8 GHz, and turned off Turbo Core and any power-saving feature that would spin the chip down. Then we took the FX-8150, overclocked it to 3.8 GHz, and disabled all of the same features. By running a single-threaded workload like iTunes, we could neutralize the difference in core count (though, if anything, FX could have benefited from its 8 MB L3). Nevertheless, Piledriver clearly completes our workload faster, yielding a 15% improvement, per clock cycle, over Bulldozer.
Turning off two of the FX-8150's four Bulldozer modules, which leaves it with the same two-module configuration as the A10-5800K, gives us the opportunity to run a threaded workload like 3ds Max without slanting the result toward Bulldozer. And once again, the Piledriver-based APU wins by roughly 15%.
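For readers who want to repeat this kind of clock-for-clock comparison on their own hardware, the arithmetic can be sketched as follows. This is a minimal illustration, not our test harness: the function name and the sample run times are hypothetical placeholders rather than our measured results, and it assumes the percentage is expressed as a throughput gain (work per second per GHz) on a fixed workload.

```python
def per_clock_gain(baseline_seconds, new_seconds, baseline_ghz, new_ghz):
    """Fractional per-clock gain of the new chip over the baseline.

    Both chips are assumed to run the same fixed workload, so
    performance is proportional to 1 / completion time. Dividing
    by clock rate gives performance per GHz.
    """
    if min(baseline_seconds, new_seconds, baseline_ghz, new_ghz) <= 0:
        raise ValueError("times and clock rates must be positive")
    baseline_perf_per_ghz = 1.0 / (baseline_seconds * baseline_ghz)
    new_perf_per_ghz = 1.0 / (new_seconds * new_ghz)
    return new_perf_per_ghz / baseline_perf_per_ghz - 1.0


# Hypothetical run times, NOT measured values: with both chips locked
# to 3.8 GHz, the clock terms cancel and only the time ratio matters.
gain = per_clock_gain(baseline_seconds=115.0, new_seconds=100.0,
                      baseline_ghz=3.8, new_ghz=3.8)
print(f"Per-clock gain: {gain:.1%}")
```

Locking both processors to the same frequency, as we did, makes the clock terms drop out. Keeping them in the formula simply lets the same calculation cover chips that cannot be set to matching speeds.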
For comparison, Ivy Bridge was only about 4% faster at a given clock rate than Sandy Bridge. So, while we’re fairly certain that a Piledriver-based FX wouldn’t overtake the newest Core i7s, it should be more competitive than today’s Bulldozer-based CPUs. Where does the speed-up come from? It doesn't appear to be cache latency; Sandra shows the same results for Bulldozer and Piledriver.
As for its role in Trinity, the benchmarks will show that the Piledriver architecture generally outperforms Llano’s Stars design, particularly in applications that emphasize integer math. When you start taxing Piledriver’s shared floating-point resources, older Llano-based APUs still wind up delivering better performance, though generally by slim margins.