AMD's Kabini: Jaguar And GCN Come Together In A 15 W APU

Jaguar: A Low-Power x86 Core

We've already introduced you to a number of AMD's APU designs, which combine general-purpose and graphics processing resources onto a single die. First it was Llano in the mobile space with The AMD A8-3500M APU Review: Llano Is Unleashed. Then it was Trinity on the desktop in AMD Trinity On The Desktop: A10, A8, And A6 Get Benchmarked! But both of those APU designs followed AMD's more performance-oriented roadmap with the Stars- and Piledriver-derived CPU architectures.

For an example of the company's low-power efforts, we have to go all the way back to January of 2011 for ASRock's E350M1: AMD's Brazos Platform Hits The Desktop First. The Brazos platform came armed with a Zacate APU. Within Zacate, AMD integrated two 1.6 GHz Bobcat-based x86 cores and its Cedar (Radeon HD 5450ish) GPU. 

The Jaguar architecture we're looking at today is an iterative improvement over Bobcat. In approaching Jaguar, AMD says it had three design goals. First, improve IPC. Bobcat was (in)famously slow, barely outperforming Intel's 2008-era Atom 330. Second, bring the ISA's functionality up to more modern standards, introducing instruction sets like SSE4.1/4.2 and AVX. Third, augment portability for the future, making Jaguar easier to take to new process technologies and fab partners.

As end users, that last point isn't our problem. The modern list of features is nice, but once you know what Jaguar supports, it's easy to anticipate the gains in specific, optimized workloads. AMD's efforts to improve IPC are much more interesting, though.

Let's start with the basics. Jaguar (as it shows up in the SoCs we're talking about today) is available in dual- and quad-core configurations. Bobcat-based SoCs were limited to dual-core arrangements. The quad-core variants based on Jaguar require active cooling, while the dual-core chips should run cool enough for passive cooling. 

The CPU core is manufactured using 28 nm technology, and AMD's chief technology officer, Joe Macri, points out that the x86 design team leveraged some of the software tools used to build GPUs, squeezing more resources into a smaller area than the more custom previous-generation cores allowed. As a result, each Jaguar core occupies 3.1 square millimeters of die space. That's notably smaller than the 4.9 square millimeters each Bobcat core monopolized.
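To put those two area figures in perspective, here is a quick arithmetic sketch. It uses only the 3.1 and 4.9 square-millimeter numbers quoted above; the function and variable names are ours, and the quad-versus-dual comparison is purely illustrative.

```python
def area_reduction(old_mm2: float, new_mm2: float) -> float:
    """Fraction of die area saved when going from the old core to the new one."""
    return (old_mm2 - new_mm2) / old_mm2


BOBCAT_CORE_MM2 = 4.9  # per-core area quoted in the article
JAGUAR_CORE_MM2 = 3.1  # per-core area quoted in the article

saving = area_reduction(BOBCAT_CORE_MM2, JAGUAR_CORE_MM2)
quad_jaguar_mm2 = 4 * JAGUAR_CORE_MM2   # core area only, no cache or uncore
dual_bobcat_mm2 = 2 * BOBCAT_CORE_MM2   # core area only, no cache or uncore

print(f"Each Jaguar core is {saving:.1%} smaller than a Bobcat core")
print(f"Four Jaguar cores: {quad_jaguar_mm2:.1f} mm^2")
print(f"Two Bobcat cores:  {dual_bobcat_mm2:.1f} mm^2")
```

In other words, a quad-core Jaguar cluster spends only modestly more silicon on cores than a dual-core Bobcat design did.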

Now, where does Jaguar improve over Bobcat? In the front end, Jaguar's instruction cache offers similar throughput, though it delivers this bandwidth at a lower power cost thanks to a selective read process that only activates one-fourth of the banks. A 4x32B loop buffer is also added; when the execution pipelines can use information stored there, the instruction cache can stay powered down, yielding the double benefit of lower power and lower latency.

In addition, the instruction buffer is about 30% larger than it was on Bobcat, circumventing some of the hit you might take after a cache miss.

Finally, the execution pipeline grows by one decode stage. As we saw so painfully when Intel introduced Pentium 4, longer pipelines are actually detrimental to IPC. However, breaking the pipeline up into more stages does help the design scale to higher frequencies. The assumption is that AMD is countering the IPC hit with higher clock rates.

The integer pipeline is augmented with a divider unit pulled over from Llano's Stars architecture and modified for Jaguar. Support for a number of familiar complex operation (COP) instructions is included, in addition to hardware CRC units, to help the efficiency of the CPU's x86 code execution. Schedulers and re-order buffers are anywhere from 30 to 70% larger, improving the parallelism of code executed out of order.

The L2 cache and its interface with the execution cores are completely redesigned. The cache is now a shared 2 MB pool (broken up into 512 KB banks) with 16-way associativity, rather than 512 KB dedicated to each core. AMD says this is a nod to efficiency, as software can take advantage of a little or a lot, depending on a thread's needs.
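The geometry implied by those numbers can be worked out with the standard set-associative formula. The sketch below is ours, not AMD's: the 64-byte line size is an assumption (the article does not state it), and so is the idea that the sets are split evenly across the banks.

```python
KIB = 1024


def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    return size_bytes // (ways * line_bytes)


L2_SIZE = 2 * 1024 * KIB  # 2 MB shared L2 (from the article)
BANK_SIZE = 512 * KIB     # 512 KB per bank (from the article)
WAYS = 16                 # 16-way associative (from the article)
LINE_BYTES = 64           # ASSUMPTION: line size is not given in the article

banks = L2_SIZE // BANK_SIZE
total_sets = cache_sets(L2_SIZE, WAYS, LINE_BYTES)
# ASSUMPTION: sets are distributed evenly across the banks.
sets_per_bank = total_sets // banks

print(f"{banks} banks, {total_sets} sets total, {sets_per_bank} sets per bank")
```

Under those assumptions, any one thread can draw on the whole shared pool, where Bobcat's dedicated arrangement capped each core at its own 512 KB.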

Bobcat's L2 cache ran at half of the CPU's clock rate. Jaguar's interface runs at full processor frequency. Pre-fetching is improved; AMD's algorithm pays better attention to data patterns, assisting the predictor in making better choices. Sixteen additional L2 snoop entries act as a probe filter to avoid look-ups whenever possible, again, saving power and improving latencies. According to AMD, its shared L2 is one of the greatest contributors to IPC improvements in Jaguar compared to Bobcat.

The load/store unit, which sits between the execution pipeline and the L2 cache, and the data cache are both improved to help make AMD's L2 enhancements more tangible. Jaguar combines loads, utilizing a much bigger buffer to avoid store data shuffling and perform load bypasses at lower latencies.

AMD's changes to Jaguar add up to a 22% single-threaded IPC increase over Bobcat, the company says. That's a per-clock improvement, so optimizations for clock rate should push that number upwards as this architecture hits higher frequencies. Naturally, we'll be putting those claims to the test in just a few pages...
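Since single-threaded performance scales roughly with IPC multiplied by clock rate, the interaction between AMD's claim and frequency is easy to sketch. The snippet below is a deliberately simplified model of our own: it takes the 22% figure and Zacate's 1.6 GHz Bobcat clock from this article, treats the 2.0 GHz Jaguar clock as a hypothetical value chosen for illustration, and ignores memory, thermal, and workload effects.

```python
IPC_GAIN = 1.22   # AMD's claimed single-threaded IPC uplift over Bobcat
BOBCAT_GHZ = 1.6  # Zacate's Bobcat clock, cited earlier in the article


def relative_performance(jaguar_ghz: float) -> float:
    """Estimated Jaguar speed relative to a 1.6 GHz Bobcat (IPC x clock model)."""
    return IPC_GAIN * jaguar_ghz / BOBCAT_GHZ


# 1.6 GHz isolates the IPC gain; 2.0 GHz is a HYPOTHETICAL clock for illustration.
for ghz in (1.6, 2.0):
    ratio = relative_performance(ghz)
    print(f"Jaguar at {ghz:.1f} GHz: about {ratio:.2f}x a 1.6 GHz Bobcat")
```

At equal clocks the model simply returns the claimed 22%; any frequency headroom multiplies on top of that, which is why the extra pipeline stage matters.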

  • zeek the geek
    This is what we expect on the new consoles, I sure as heck can't wait to see what improvements we'll have on games ported over to PC. I'm tired of these makeshift ports... Glad to see AMD has their hands in the console field, now maybe we'll see a huge influx of cash on their end to help improve their line and drivers that will give Nvidia a good run for so we can see "OUR money" go to good use. To better technology and innovation!
  • slomo4sho
    With Haswell around the corner claiming models with TDP of 15, 13.5, and 10 watts, the lack of performance in this chipset is discouraging to say the least.
  • dragonsqrrl
    This is the best CPU architecture to come out of AMD in a very long time. It has so many things going for it in comparison to the current competition from Atom. Far superior overall performance, improved power consumption and FP performance over its predecessor (weak points of Brazos), much better graphics performance, broader x86 instruction support, and an actual process advantage (28nm vs 32nm). AMD has a huge opportunity here, and I sure hope they capitalize on it quickly because it won't last long. Atoms based on Intel's upcoming Silvermont architecture will likely outperform Jaguar and reverse most of the advantages AMD currently has.
  • BringMeAnother
    It's performing well in all the wrong areas. If I'm going to play games, I'd rather play with at least high settings with decent resolution. I'm perfectly willing to give up mobility for a gaming machine.
  • mcx2500
    Given that the AMD Temash and Kabinis are priced in the range of Atoms, it is illustrative that the Tom's reviewer used two Pentium and i3 CPUs that cost over $130 and $200 respectively.

    To see the Intel chips utilizing dramatically more watts than the Kabini brings up issues discovered by other reviewers. Just look at the graph of the i3-3217u rated at "17 watt TDP" playing F1-2012 at what is 100% or nearly 35 watts! This means that AMD Kabini A6-5200 which is being released in June will outperform Intel's $225+ i3-3217u for price-performance per watt, you can bet on it.

    While running the range of applications, the AMD Kabini remained cool while the Intel chips heat up dramatically. This heat has to be dissipated from the laptop and it takes a toll on both the machine and user.

    HP just announced 10 point touchscreen laptops that utilize AMD Jaguar Kabinis for a breakthrough price of $399 and that is just a start of a flood of good old competition (hello AMD Kaveri APU Xmas).
  • dragonsqrrl
    mcx2500: To see the Intel chips utilizing dramatically more watts than the Kabini brings up issues discovered by other reviewers. Just look at the graph of the i3-3217u rated at "17 watt TDP" playing F1-2012 at what is 100% or nearly 35 watts!
    This is because the i3-3217u is not an SOC, it's just an ULV dual core Ivy Bridge. Many of the controllers and other supporting hardware are located off die on the mother board, which increases power consumption over the CPU/GPU's rated 17W TDP.

    Kabini will have to compete with Intel's upcoming ULV Haswell, which will go as low as ~10W TDP and will be an SOC. This is why I said in my previous comment that I feel AMD has a rare advantage right now and a narrow window of opportunity to make an impact. Jaguar will overlap Silvermont on the low end of its TDP range, and Haswell on its upper end. Both will likely outperform it in their given segments.
  • cleeve
    mcx2500: Given that the AMD Temash and Kabinis are priced in the range of Atoms, it is illustrative that the Tom's reviewer used two Pentium and i3 CPUs that cost over $130 and $200 respectively.
    AMD told us the Kabini laptop they gave us would be priced $500 on the market, and that cheaper versions would be as low as $350.

    We used the cheapest comparison laptops we could find. The only thing it illustrates is that we were trying to give Kabini the best chance of strutting its stuff.
  • amdfangirl
    AMD Kabini follows the idea of a tablet - people buy them because they are good enough. That's what is causing the downturn in the PC industry. With the performance advantage over ARM chips and Intel Atom, I really see this as a viable alternative in netbooks and Windows tablets.

    AMD Kabini sleekbook. I am just drooling at the idea of that.
  • amdfangirl
    dragonsqrrl: Kabini will have to compete with Intel's upcoming ULV Haswell, which will go as low as ~10W TDP and will be an SOC
    No, Kabini competes in the Intel Atom price range like its predecessor, AMD Brazos.

    Sure they compete in a similar TDP range, but you wouldn't expect people to compare the chips that go into $999 ultrabooks with chips that will (ultimately) go into the same form factor as them, but are priced at <$400.

    ULV processors from Intel are priced at a premium - because Intel is unchallenged in that space. AMD would be insane to try and price Kabini anywhere near IVB or Haswell ULV parts, because AMD will never win by overpricing their products.

    "There's no such thing as a bad product, just a bad price point"
    Edit: Not entirely sure why my comment got cut off, but here it is. Please note this comparison was made about the ultraportable area of the market, where the main concerns are weight, screen size and battery life. If we start comparing a CPU designed for primarily 11.6" or 10.1" screens with say 35W CPUs in a 15" form factor, you've lost the whole point of the comparison: you're doing ultraportable vs. desktop replacements. Sure, if a manufacturer wants to put Kabini in a 15" form factor then it's fair game, but for the majority of Kabini chips, we'll see them in ultraportables, not desktop replacements.
  • ta152h
    Comparing Kabini with SB/IB is like comparing a four cylinder car with an eight cylinder car. It's plain silly, and kind of obnoxious.

    This was a poor review because of the choice made there. I think a lot of people were curious about how improved it was over the Bobcat. No data. How about the Atom? No data. Let's just compare it with chips the Piledriver competes with, instead of those it does. It makes no sense.

    In case you guys haven't figured it out, Piledriver is the competitor for SB/IB, not Kabini. Two different markets. That you justify this so poorly by saying one particular notebook would cost x amount of dollars, is borderline insane. From one notebook, which are based on things other than the cost of the processor as well, you would assume all will cost the same? Strange.

    The comparisons with SB/IB aren't worthless, but they should have been in addition to the processors in their market, and also with AMD's Trinity line. Maybe four or five processors, instead of just two that are addressing a higher performance market, and architecturally quite close.

    You lost this one to other sites. Normally, especially when Chris writes them, Tom's ends up having the best information. Not this time. Not even close.