The CPU Side: An All-New Piledriver Core

AMD A10-4600M Review: Mobile Trinity Gets Tested
An APU is an amalgamation of x86 cores and graphics resources. So, let’s start by exploring the component of the die traditionally referred to as the CPU.

When Llano was introduced a year ago, we already knew that its Stars architecture was on its last legs. AMD’s plans for the future clearly centered on Bulldozer, a design that wouldn’t make it into a desktop-oriented product until last October.

Well, the situation is reversed for Trinity’s introduction. This time, AMD’s most modern processor architecture is being shown off in an APU, and a mobile APU at that. Dubbed Piledriver, it is an update to Bulldozer that won’t find its way onto the desktop until later in 2012.

What are the main differences between the Husky cores in AMD’s Llano architecture and the Piledriver-based cores in Trinity? Whereas a quad-core APU built on the Llano design employs four distinct execution cores, quad-core Trinity chips feature two Bulldozer modules. Each module boasts two integer cores. However, they share some of the resources that you’d typically find duplicated on more traditional multi-core implementations, such as the fetch and decode stages, floating point units, and L2 cache. Again, you can read more about the Bulldozer architecture in AMD Bulldozer Review: FX-8150 Gets Tested.

The most obvious difference between AMD’s desktop FX processors and the CPU component of Trinity is cache. While each of the APU’s modules still has 2 MB of shared L2, Trinity lacks FX’s 8 MB shared L3. That leaves this dual-module design with 4 MB of L2 and no L3, matching Llano’s total on-die cache.
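To make the cache comparison concrete, here is a minimal sketch of the arithmetic. The Trinity figures and the 8 MB L3 come from the text above; Llano’s 1 MB of L2 per core and the FX-8150’s four modules are assumptions brought in for illustration, and the function name is hypothetical.

```python
def total_cache_mb(l2_blocks, l2_per_block_mb, l3_mb=0):
    """Return total on-die L2 + L3 in MB for a chip with `l2_blocks` L2 caches."""
    return l2_blocks * l2_per_block_mb + l3_mb

# Trinity: two modules, each with 2 MB of shared L2, no L3 (from the text).
trinity_mb = total_cache_mb(l2_blocks=2, l2_per_block_mb=2)

# Llano: four cores, each with its own 1 MB L2 (assumed, not stated above).
llano_mb = total_cache_mb(l2_blocks=4, l2_per_block_mb=1)

# FX-8150: four modules (assumed) with 2 MB of L2 each, plus the 8 MB shared L3.
fx_mb = total_cache_mb(l2_blocks=4, l2_per_block_mb=2, l3_mb=8)

print(f"Trinity: {trinity_mb} MB, Llano: {llano_mb} MB, FX-8150: {fx_mb} MB")
```

Under those assumptions, Trinity and Llano land on the same total even though the cache is organized differently: two larger shared pools versus four smaller private ones.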

AMD engineers made it clear that one of their main design goals for Piledriver was to improve IPC compared to Bulldozer. We knew this as far back as AMD’s original Bulldozer briefing, so it’s not a surprise. With FX, we saw that the architecture gave up significant per-clock performance compared to its predecessor, and that clearly needed to be addressed. The engineering team didn’t use just one magic bullet in its quest, but rather a variety of strategies that result in improved performance per clock.

Here are the main improvements implemented in the Piledriver core:

First, the branch predictor was significantly revamped and split into a two-level structure. Keeping the instruction pipeline flowing is a critical job when performance is the target, and while AMD didn’t disclose anything more specific, it did make it clear that branch prediction plays a significant role.

In addition, engineers increased the size of the instruction window to allow a larger group of instructions to be processed; this improves performance and helps process operating system-level code more efficiently. New ISA instructions were also added, including fused multiply-add (FMA3) and 16-bit floating-point conversion (F16C). The Bulldozer architecture already supported FMA4, so the inclusion of FMA3 enables support for a capability that Intel will introduce in its next-gen architecture as well. According to AMD, instruction execution times were improved, resulting in faster floating-point and integer divides, as well as faster calls and returns, which are critical for getting in and out of subroutines quickly. Page translation has also been improved and optimized.
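For readers unfamiliar with what these two instruction families do, the sketch below models their numerical behavior in plain Python. It does not execute FMA3 or F16C instructions; it only emulates the rounding involved, and the function names and input values are hypothetical. A fused multiply-add computes a*b+c with a single rounding at the end, whereas separate multiply and add instructions round twice. F16C converts between 32-bit and 16-bit floating-point formats; here the 16-bit side is modeled with the `struct` module’s half-precision `e` format.

```python
import struct
from fractions import Fraction

def unfused_multiply_add(a, b, c):
    """Multiply, round to double, then add and round again (two roundings)."""
    return a * b + c

def fused_multiply_add(a, b, c):
    """Emulate a fused multiply-add: exact a*b+c, rounded once at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

def to_half_and_back(x):
    """Round x to IEEE 754 half precision and convert it back to a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Inputs chosen so that the intermediate product cannot be held in a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print("unfused:", unfused_multiply_add(a, b, c))  # the tiny term is rounded away
print("fused:  ", fused_multiply_add(a, b, c))    # the tiny term survives
print("0.1 as half precision:", to_half_and_back(0.1))
```

The fused result keeps a small term that the two-step version loses, which is why FMA is attractive for accuracy as well as throughput. The half-precision round trip shows the cost of the narrower format: values such as 0.1 come back slightly changed.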

The memory subsystem is another key component of performance, and we saw early on that high cache latencies were one of Bulldozer’s key weaknesses. AMD engineers claim to have invested a lot of effort to improve Piledriver’s L2 cache and hardware prefetcher, purportedly reducing latencies when memory is read. Stream prediction is also said to be significantly better than in the previous generation of APUs.

The Load/Store unit has also been targeted as a place where latency can be reduced, so store-to-load reordering has been improved with follow-up reads to better anticipate compiler requests and reduce load latency. The L1 translation lookaside buffer (TLB) has been doubled to 64 entries, so more address translations can be served from the L1 TLB without incurring the added latency of a miss. Finally, both the integer and floating-point schedulers have been improved to better utilize all of the hardware units that Piledriver has to offer.
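One way to picture what doubling the L1 TLB buys is its reach, meaning the amount of memory whose translations it can hold at once. The sketch below is a rough illustration only: the 4 KB page size is an assumption (AMD did not specify it here), the 32-entry baseline is inferred from "doubled to 64," and the function name is hypothetical.

```python
def tlb_reach_kb(entries, page_size_kb=4):
    """Return how much memory (in KB) a TLB covers: one page per entry."""
    return entries * page_size_kb

# Baseline inferred from the text: 64 entries is described as a doubling.
previous_reach_kb = tlb_reach_kb(entries=32)

# Piledriver's L1 TLB, assuming 4 KB pages.
piledriver_reach_kb = tlb_reach_kb(entries=64)

print(f"32 entries: {previous_reach_kb} KB, 64 entries: {piledriver_reach_kb} KB")
```

With larger pages the reach grows proportionally, so the real-world benefit depends on the page sizes an operating system and application actually use.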

With improvements in clock rate (something we’ll talk about a little later), AMD claims its Trinity-based A10-5800K offers a 26% improvement over the Llano-based A8-3850 on the desktop, and that its A10-4600M shows a 29% improvement over the A8-3500M in notebooks.

Those are pretty aggressive improvements, and we’ll be keeping them in mind as we run through our tests. But first, let’s take a look at the graphics segment of Trinity.  

Other Comments
  • 32 Hide
    JAYDEEJOHN , May 15, 2012 2:47 PM
    Hope it's only the beginning of much more
  • -4 Hide
    Anonymous , May 15, 2012 3:08 PM
    Based on this, gaming is much better than old i5, but everything else including application performance is still better on the old Sandy architecture. I'm not really sure why I would buy a Trinity other than for a casual gaming laptop. Unfortunately, budget says that my laptops have to be used for business first, play time later.
  • 8 Hide
    beenthere , May 15, 2012 3:11 PM
    Nice to see that Trinity and AMD have delivered the goods. I want a Trinity powered Ultrathin. Intel can stick their crap where the Sun don't shine.

    BTW, Charlie @ SemiAccurate is not an AMD fanbois IME. He just calls it like it is. Reality bites sometimes be it Nvidia, AMD or Intel's problems. Denial never changes reality. It is what it is.
  • 41 Hide
    cleeve , May 15, 2012 3:18 PM
    duckwithnukes: Where is the Intel HD 4000 vs. AMD Trinity comparison? Lazy reviewing at its finest.


    A10-4600M laptops will be in the $600-$700 neighborhood, and we're still waiting for Ivy Bridge Core i5 to arrive in this price range.

    We go over this. We also talk about how we'll do a follow up as soon as an appropriate product is available.

    You need to read for it to make sense.
  • -2 Hide
    Anonymous , May 15, 2012 3:19 PM
    FlippyFlap, Apple doesn't use AMD and an HD4000 can power a retina display. I'm sure Apple has worked with Intel engineers to get the drivers right for retina displays which is HD4000's problem. HD4000 is still lacking in terms of driver support (one can see that from the OpenCL benches around the net where only 1/2 get accelerated on HD4000). When the drivers work right, there isn't much difference between Ivy and Trinity.
  • 12 Hide
    Anonymous , May 15, 2012 3:21 PM
    I agree with Cleeve and I personally hate comparing a reference system to a selling system anyway. Review 2 actual selling systems with similar parts and that gives you the benchmark.
  • 15 Hide
    DRosencraft , May 15, 2012 3:29 PM
    This looks like a very nice effort from AMD. I really, really need to replace my notebook. It's a six year old Toshiba Satellite with an AMD 1.9 GHz Turion 64 X2 with integrated X2100 graphics.... yeah. Ancient now, I know. I've been trying to figure out a sweet spot in power since my needs are kind of complex. Typically I don't need it to do much more than handle MSOffice and web surfing. But I also tend to use it for video gaming when an interesting game comes around and some work in PaintShop when I'm out of the house, or don't feel like sitting at my desktop. This may be a little closer to what I'd like. It would be nice to get a notebook that combines this with a really good discrete card (sort of like how some MacBook Pros have their dual graphics setup). Nevertheless, Trinity looks to be just about enough power and performance, but the question is price. If tradition holds, it should be a good price competitor with Intel, which is the most important part, otherwise I'd just buy a Core i7 already.

    In a related question, do Trinity's details and specs lead to any conclusions about what Piledriver desktop processors will be like?
  • -7 Hide
    neoverdugo , May 15, 2012 3:33 PM
    So this means that AMD can kick Intel's ass in the GPU department for the moment, while AMD suffers greatly in the CPU portion of the APU battle. Didn't I say before that Intel is trying to make a (proprietary) Intel-only PC with no third-party strings attached? We all know that there is no competition in the CPU battle when it comes to Intel. Still, I would like to see the morons at Intel drop the price of their hardware once and for all and take ridiculously low-end hardware out of production.
  • 2 Hide
    dgingeri , May 15, 2012 3:54 PM
    No WoW benchmarks this time? I was wondering if this might make a good laptop for WoW, but you guys failed me. :( 
  • 34 Hide
    fazers_on_stun , May 15, 2012 3:58 PM
    ^ ^ Anandtech reviewed the A10-4600M with its HD7660G igp vs. the i7-3720QM with its HD4K igp and over a total of 15 games, the 7660G averaged 20% faster than the HD4K. Against the Llano 6620G igp, it was just short of 20% faster. Against the HD3K (Sandy Bridge igp), it was a whopping 80% faster. So, the conclusion is that if you want mobile gaming on a budget laptop, Trinity is the way to go...
  • 4 Hide
    blazorthon , May 15, 2012 4:04 PM
    AMD is stuck with ~1333 MT/s for this, so they get screwed over in the reviews because they are stuck with lower-frequency RAM... Hopefully, this problem will be fixed and they will be able to use 1600 MT/s and 1866 MT/s with the notebooks that hit the market. Honestly, I'm a little underwhelmed by Trinity... I was hoping for more. Granted, it is on the same process, so being significantly faster while using less power than its predecessor on that same process is a pretty substantial gain, but still... I was expecting a little more. It might just be the memory frequency problem.
  • 11 Hide
    Anonymous , May 15, 2012 4:05 PM
    neoverdugo

    Just a side note: what you described is not an APU, it's a CPU with on-die gfx. AMD's APUs have not hit their full stride yet. Once we have a mature implementation of GPU-assisted processing (opencl, directcl, et al.), the disparity may become significantly smaller. AMD's strategy was always to leverage the massive computing power of the gfx core to bolster CPU performance in areas other than gaming. Unfortunately, there was a fragmentation of the market with competing standards. Once all that mess gets sorted out, AMD can really flex the power of the APU.
  • 12 Hide
    cleeve , May 15, 2012 4:24 PM
    Quote:
    AMD is stuck with~1333MT/s for this, so they get screwed over in the reviews because they are stuck with lower frequency RAM...


    Hey Blaze:

    Llano's BIOS was uncooperative and limited the memory to 1333, but Trinity was benched at 1600 MHz. :) 

    - Cleeve
  • 12 Hide
    Wisecracker , May 15, 2012 4:28 PM

    Cleeve: A10-4600M laptops will be in the $600-$700 neighborhood, and we're still waiting for Ivy Bridge Core i5 to arrive in this price range. We go over this. We also talk about how we'll do a follow up as soon as an appropriate product is available. You need to read for it to make sense.


    I hope you are wrong :) 

    A10-4600M laptops in $600-$700 neighborhood in dual graphics with a Radeon HD 7670M, please.



  • -8 Hide
    Anonymous , May 15, 2012 4:29 PM
    Actually, only the A10 gets the 7660G igp; the rest of the line gets reduced versions, like the A8 with the 7640G, while all the mobile Ivy Bridge versions are equipped with HD 4000. A review on computerbase.de shows that HD 4000 totally outperforms the 7640G. So if you want mobile gaming on an AMD laptop, A10 is the only way to go.
  • 11 Hide
    CaedenV , May 15, 2012 4:45 PM
    So with HD3000 being ~1/2 the GPU horse power of Trinity, and HD4000 being ~2x as powerful as HD3000 I guess that Intel will be slightly behind Trinity on gaming, while still holding the crown for all other performance metrics. All that is left to be seen is what kind of premium you will have to pay for the new IB laptops compared to the Trinity ones. Can't wait to see a review of both platforms in a head-to-head competition! I'm still an Intel fan boy at heart, but I would love nothing more than for AMD to give Intel a run for their money again :) 

    @Flap
    It is not hard to push high resolution displays for most things. People have been using extremely old Matrox GPUs (g450 and g550) to do 4 high res monitors for ~10 years now with no issues. The problem comes when you want to game on that high resolution screen, and honestly neither side has a good solution for that yet. But at the same time, Macs are really not made with games in mind (other than web content which I am sure both Trinity and HD4000 would be more than capable of displaying).

    @article
    AMD is absolutely right; there are uses of a product that cannot be measured by benchmarks. However, the more interesting thing to me is what we are seeing in the desktop game benchmarks, that is slowly reaching into other areas of processing (and what we have seen in media playback benchmarks for years... or rather why we no longer have media playback benchmarks), where there is a level of speed impracticality.
    For gaming on a 60FPS monitor, it no longer matters if you are running 61+FPS because you simply do not see it, and anything above 30FPS is generally considered 'acceptable'.
    For office work on an SSD it does not matter if it takes your computer .5sec to open Word on a 5 year old PC, or .2sec on a new PC because there is simply no time for the human mind to react so quickly to move from the mouse to the keyboard and start typing. And anything slower than an SSD will rely on the bottleneck of the HDD anyways, making the CPU a moot point.
    The same goes for browsing the web where your internet speed is so slow (even on 'fast' internet connections) that there is no practical/perceivable difference between running an old system vs a brand spanking new system (much less AMD vs Intel).
    Media playback is another area where so long as you reach the requisite 12-30fps (depending on the source material) it does not matter if you are running on an Atom, or a high end dual 2011 platform. There is simply no difference so long as you reach a specific threshold of 'good enough' for the specific application.
    For larger projects of video editing, 3D design, mass data compression, etc., there is still a need for benchmarks, but the markets that need these high demand applications for everyday use are willing to shell out the money for whatever is fastest because the lost productivity time is much more expensive than the hardware investment (and 'the fastest' hardware is not expensive like it used to be for end-user workstations).
    The point is that we need to find a new way to benchmark that looks at threshold requirements like we do with gaming benchmarks where there is a threshold of usefulness, and a threshold of imperceptible performance gains, and then finding a way to compare the relative usefulness of 'unbenchmarkable' feature sets (Like the value of CUDA vs Direct Compute, hardware based acceleration for specific software titles, and proprietary features such as Intel's Lightpeak/Thunderbolt technology). I think it means an evolution of doing hardware-centric benchmarks to more use-centric benchmarks, and even specific title benchmarks.
    As an example: What does it look like to use Adobe premiere on an AMD or Intel platform of similar cost? What features are available on one platform over the other? What performance gains are made by adding an SSD/RAID or dedicated GPU to the system? And which platforms use these additions most effectively? What types of tasks run better or worse on each platform (Is one better at specific filters than others? Is one better for production use while the other is better at exporting a final product?)?
    We are getting to a point where what matters more is the feature set/limitations of the motherboard and platform, than the speed of any individual component on the platform when it comes to the final experience of the end user. There is still a need for specific part reviews, but AMD is right; the individual parts many times do not paint an accurate picture for the speed or usefulness of a platform, and it is a trend that will only become more pronounced with time.
  • -9 Hide
    blazorthon , May 15, 2012 4:46 PM
    Cleeve: Hey Blaze: Llano's BIOS was uncooperative and limited the memory to 1333, but Trinity was benched at 1600 MHz. - Cleeve


    Well, that's even worse. Trinity just doesn't seem like a good enough leap over Llano.
  • 8 Hide
    blazorthon , May 15, 2012 4:48 PM
    The noob: Actually just A10 get the 7660G igp, the rest of the line get reduced version like A8 with 7640G while all the ivy mobile version equipped with HD 4000. A review on computerbase.de shows that HD 4000 totally outperform 7640G. So if you want mobile gaming on amd laptop, A10 is the only way to go.


    Intel's graphics gets weaker on lower end models... For example, the HD 3000 on the i7s is FAR faster than the HD 3000 on the i3s and is considerably faster than the HD 3000 on the i5s (although even within each family, there can be differences, all of this is because although they have the same graphics hardware, the clock frequency of the IGP differs). The same is probably true for the HD 4000. The cheaper i5s and i3s will probably have weaker graphics performance than the top i5s and the i7s do.
  • 0 Hide
    deanjo , May 15, 2012 4:58 PM
    Quote:
    and the x264 front-end HandBrake are all able to take advantage of AMD’s programmable shader hardware and fixed-function VCE logic for accelerated video transcoding.


    Ummm, no it can't. Handbrake is 100% cpu.