The very foundation of AMD’s current x86 architecture was covered in great depth back when I reviewed the FX-8150 (AMD Bulldozer Review: FX-8150 Gets Tested). All of those tenets carry over to the company’s Piledriver update. However, we know that AMD’s engineers learned a number of lessons as they took the original Bulldozer concept from theories and diagrams to actual silicon. We also know that process technology evolved over the last year, even if the company continues to use a 32 nm node for manufacturing its Vishera-based CPUs. It should come as no surprise, then, that today’s reformulation is the result of several tweaks flagged for improvement a long time ago.
Front-End Improvements
In the days that followed AMD’s Bulldozer introduction, branch prediction was suggested as one of the architecture’s possible weaknesses. The module concept involves certain shared resources feeding two execution threads, and the architects attempted to minimize bottlenecks in the front-end by implementing one prediction queue per thread behind a 512-entry L1 and 5000-entry L2 branch target buffer. For Piledriver, the company claims the accuracy of its predictor is better.
Piledriver adds support for a couple of ISA extensions that we first discussed in our mobile Trinity-based APU coverage. Fused multiply-add was introduced a year ago in Bulldozer. That specific version was called FMA4, though, and allowed an instruction to have four operands. But Intel only plans to support a simpler three-operand FMA3 instruction set in its upcoming Haswell architecture, so AMD preempts that addition with Piledriver. The other extension, F16C, enables converting up to four half-precision values to single-precision floating-point values at a time. Intel’s Ivy Bridge architecture already includes this, so its implementation on Piledriver simply catches AMD up. Not that Bulldozer was suffering without FMA3/F16C; compiler support was only added in Visual Studio 2012.
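To make the operand-count difference and the half-precision conversion concrete, here is a minimal Python sketch. It is only a model: the register dictionary and the function names (`fma4`, `fma3`, `halfs_to_floats`, `floats_to_halfs`) are hypothetical, Python does not fuse the multiply and add into a single rounding step, and Python floats are double-precision rather than single-precision. The `fma3` function is modeled on the "231" operand ordering, where the destination doubles as the addend.

```python
import struct

def fma4(regs, dst, a, b, c):
    """FMA4-style: four operands, so the destination is separate
    and all three sources keep their values."""
    regs[dst] = regs[a] * regs[b] + regs[c]

def fma3(regs, acc, a, b):
    """FMA3-style ('231' ordering): acc = a * b + acc.
    The destination is also a source, so its old value is overwritten."""
    regs[acc] = regs[a] * regs[b] + regs[acc]

def floats_to_halfs(values):
    """Pack floats as little-endian IEEE 754 half-precision (2 bytes each)."""
    return struct.pack(f"<{len(values)}e", *values)

def halfs_to_floats(packed):
    """Unpack little-endian half-precision values back into Python floats."""
    count = len(packed) // 2
    return struct.unpack(f"<{count}e", packed)

# Operand handling: illustrative register names and values.
regs = {"r0": 0.0, "r1": 2.0, "r2": 3.0, "r3": 1.0}
fma4(regs, "r0", "r1", "r2", "r3")   # r0 = 2*3 + 1; r3 is untouched
fma3(regs, "r3", "r1", "r2")         # r3 = 2*3 + r3; old r3 is lost
print(regs)

# Four half-precision values occupy 8 bytes and widen without loss.
packed = floats_to_halfs([1.0, 0.5, -2.0, 65504.0])
print(len(packed), halfs_to_floats(packed))

# Narrowing to half precision does lose accuracy for most values.
print(halfs_to_floats(floats_to_halfs([0.1]))[0])
```

The practical consequence of the three-operand form is that a compiler must copy the accumulator first if its old value is still needed, whereas the four-operand form leaves every source intact.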
Inside The Integer Cluster
The two integer clusters in each compute module each feature an out-of-order load/store unit capable of two 128-bit loads/cycle or one 128-bit store/cycle. AMD discovered that there were certain cases where Bulldozer wouldn’t catch store data already in a register file. Rectifying this allows instructions to be fed into the integer clusters more quickly.
Within each integer core, we’re still dealing with two execution units and two address generation units (referred to simply as AGens). Those AGens are more capable this time around in that they’re able to perform MOV instructions. When AGen activity is light, the architecture will shift MOVs over to those pipes.
One of the most notable changes is a larger translation lookaside buffer for the L1 data cache, which grows from 32 entries to 64. Because the L2 TLB has a fairly high 20-cycle latency, improving the hit rate in the L1 TLB can yield significant performance gains in workloads that touch large data structures. This is particularly important in the server space, but AMD’s architects say they noticed certain games demonstrating sensitivity to this too, which isn’t something they had expected.
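Why doubling the entry count can matter so much is easier to see with a toy simulation. The sketch below assumes a fully associative TLB with least-recently-used replacement and a synthetic workload that repeatedly sweeps 48 pages; neither assumption comes from AMD, and `tlb_hit_rate` is a hypothetical helper. The only figures taken from the text are the 32- and 64-entry sizes and the 20-cycle L2 TLB latency, and the penalty estimate further assumes every L1 TLB miss hits in the L2 TLB.

```python
from collections import OrderedDict

def tlb_hit_rate(page_trace, entries):
    """Return the hit rate of a fully associative, LRU-replaced TLB
    (an assumed organization) over a sequence of page numbers."""
    tlb = OrderedDict()          # keys are page numbers, ordered by recency
    hits = 0
    total = 0
    for page in page_trace:
        total += 1
        if page in tlb:
            hits += 1
            tlb.move_to_end(page)        # mark as most recently used
        else:
            if len(tlb) >= entries:
                tlb.popitem(last=False)  # evict the least recently used page
            tlb[page] = True
    return hits / total

L2_TLB_LATENCY = 20  # cycles, from the article

# Synthetic workload: sweep 48 distinct pages, 100 times over.
trace = [page for _ in range(100) for page in range(48)]

for entries in (32, 64):
    rate = tlb_hit_rate(trace, entries)
    # Average extra cycles per access, assuming each miss costs one L2 TLB lookup.
    penalty = (1.0 - rate) * L2_TLB_LATENCY
    print(f"{entries} entries: hit rate {rate:.2%}, avg penalty {penalty:.1f} cycles")
```

With this particular trace, the 32-entry configuration thrashes because the working set of 48 pages never fits, while the 64-entry configuration misses only on the first sweep. Real workloads fall somewhere between those extremes, so the numbers are illustrative rather than predictive.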
L2 Cache Optimizations
Hardware prefetching into the L2 is improved as well. Minimum latency doesn’t change, which is why cache latency doesn’t look any better in our Sandra 2013 benchmark. However, as the prefetcher and L2 are used more effectively, average latency (much more difficult to measure with a diagnostic) should be expected to drop, AMD claims. The same Sandra 2013 module also reflects very little change in L3 latency, and Vishera’s architects confirm that no changes were made to the L3 cache shared by all modules on an FX package.
Putting It All Together: Five Architectures At 4 GHz
What effect do all of those adjustments have on Piledriver's per-cycle performance? We ran five different architectures at 4 GHz to compare their relative results.

In iTunes, which we know to be single-threaded, the FX-8350 demonstrates significant gains over the Bulldozer-based FX-8150. But a Phenom II X6 1100T operating at the same frequency is still faster. And that's before we look at the Sandy and Ivy Bridge architectures, which jump way out in front of anything from AMD.

Notice that the Core i7 is listed as a quad-core CPU capable of addressing four threads. I disabled Hyper-Threading in this test to isolate core performance. Had it been turned on, Intel's client flagship would have likely finished in first place.
Nevertheless, we're most interested in the gain realized by shifting from FX-8150 to FX-8350, and it is significant. Again, though, Thuban's six cores manage to outmaneuver Vishera's quad-module configuration. AMD is using a clock rate advantage to keep this latest architecture in front of its older design. Thuban really doesn't want to run at such high frequencies, even as it's able to get more done per cycle.




I now really don't see people purchasing it though....people will be buying the 8320.
If more games / daily use apps start using more cores these new AMD's could really take off.
Thanks for the review.
Btw Chris, how many cups of joe did you have to drink for the overclocking testing?
5-12% performance increase 12% less power - sound familiar?
the only difference this time was less hype before the release. (lesson well learned AMD!)
You seem to forget that unlike Intel's Ivy compared to Sandy, Vishera versus Zambezi leaves Vishera the superior overclocker as well as cooler-running and with superior overclocking price/performance ratios. There's also the fact that AMD did this on the same process node, not that that matters as anything other than a foot note.
One really big one. Kept me up till 5AM this morning ;-)
Anyway, it's a good upgrade for owners with an AM3+ board... (including me)
If you are paying that much, why would you let it sit idle? Turn it off instead!
But wow! at only 195$ this 8350 looks like a clear winner! Nice Comeback AMD !
It really isn't a cut & dry black & white situation. Depends on the workloads and purpose...
For now I'll pass. If it was truly under $200 I would consider it for my next low-end system, but so far the price is well over $200 and not worth it.
amd fx-8350 for $219.99
http://www.newegg.com/Product/Product.aspx?Item=N82E16819113284&name=Processors-Desktops
intel i5-3470 for $199.99
http://www.newegg.com/Product/Product.aspx?Item=N82E16819115234
intel i5-3570 for $214.99
http://www.newegg.com/Product/Product.aspx?Item=N82E16819115233
intel i7-3770 for $299.99
http://www.newegg.com/Product/Product.aspx?Item=N82E16819116502