AMD Trinity On The Desktop: A10, A8, And A6 Get Benchmarked!
1. Trinity: Coming Soon To A Desktop Near You

This story, a preview of AMD's Trinity-based A10, A8, and A6 families, was originally published on June 14, 2012. It was not condoned, supported or sponsored in any way by AMD. The piece appears here, unchanged, with the same information presented nearly four months ago.

About a month ago, AMD took the wraps off of its Trinity-based APU. The launch was hotly anticipated, and all eyes turned to see how the second-generation amalgamation of x86 cores and graphics processing resources performed. Why were so many enthusiasts interested in a decidedly mainstream piece of hardware?

Let’s just say Trinity’s composition is…unique.

AMD’s new APU is the first component sporting its Piledriver x86 architecture. After the disappointment that was Bulldozer, which we first evaluated in FX-8150, hopes had to be pinned on a follow-up. And Trinity has it. Back when we were first briefed on Bulldozer, AMD showed us roadmaps with a new architectural revision pushing 10-15%-higher performance each year. Now, power users want to know if Trinity’s Piledriver-based cores deliver on the company's promise.

Moreover, Trinity employs a newer graphics architecture than Llano. Instead of the VLIW5 arrangement, which also sat at the heart of Radeon HD 6800 and older GPUs, it utilizes the VLIW4 design that went into AMD’s Radeon HD 6900-series cards. Everything after the 6900s swapped over to Graphics Core Next, so VLIW4 isn’t a very prolific implementation. But it’s supposed to be more efficient. Naturally, then, we all want to see how Trinity’s on-die GPU compares to what came before.

We’re Going Mobile, Mav

There was just one problem with last month’s introduction: it only covered the mobile implementation of Trinity.

That was the right move for AMD, no question. It doesn’t take a page of analysis to figure out how putting a CPU and GPU on the same piece of silicon can help address the physical, thermal, and power-oriented issues that laptop manufacturers have to overcome as they design new products.

But enthusiasts were left with questions. Most obviously, how might Piledriver be expected to behave in an FX-branded device? Does it ameliorate Bulldozer’s weaknesses? Given a similar 100 W ceiling on the desktop and the same 32 nm manufacturing process, does Piledriver/VLIW4 deliver an appreciable benefit compared to Stars/VLIW5?

Answering those questions requires the freedom to tweak and tune around in a motherboard BIOS. So, we got our hands on a trio of Trinity-based desktop APUs and set out to preview their performance.

I say preview because hardware based on the Trinity design isn’t going to be available in the channel until later this year. It has been reported that there are still a lot of Llano-based APUs out there, which AMD needs to sell off. So, it’s making Trinity available to OEMs designing notebooks and desktops in time for back-to-school. But you won’t be able to buy these chips for a while. Moreover, the motherboards supporting them with Socket FM2 interfaces aren’t fully-baked, either.

Meet The Desktop Trinity Line-Up


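| Model | Radeon HD | GPU (MHz) | Shaders | TDP | Cores | Base CPU (GHz) | Turbo Core (GHz) | L2 Cache | Unlocked |
|---|---|---|---|---|---|---|---|---|---|
| A10-5800K | 7660D | 800 | 384 | 100 W | 4 | 3.8 | 4.2 | 4 MB | Yes |
| A10-5700 | 7660D | 760 | 384 | 65 W | 4 | 3.4 | 4.0 | 4 MB | No |
| A8-5600K | 7560D | 760 | 256 | 100 W | 4 | 3.6 | 3.9 | 4 MB | Yes |
| A8-5500 | 7560D | 760 | 256 | 65 W | 4 | 3.2 | 3.7 | 4 MB | No |
| A6-5400K | 7540D |  | 192 | 65 W | 2 | 3.6 | 3.8 | 1 MB | Yes |
| A4-5300 | 7480D |  | 128 | 65 W | 2 |  |  | 1 MB | No |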


Of the six models purportedly planned, we have three of them: the A10-5800K, the A8-5600K, and the A6-5400K.

The A10-5800K will be AMD’s flagship. A pair of Piledriver modules technically makes this a quad-core APU, though, as we know, each module shares certain resources. The top-end A10 operates at a 3.8 GHz base clock that scales up to 4.2 GHz via Turbo Core, though our sample spent most of its time at 4 GHz (an intermediate P-state). Each of the -5800K’s Piledriver modules has its own 2 MB shared L2 cache, adding up to 4 MB across the chip. And AMD arms its two A10 APUs with Radeon HD 7660D graphics—a 384-shader engine operating at 800 MHz on the -5800K (and, reportedly, 760 MHz on the -5700).   

A small step down, the A8-5600K also leverages two Piledriver modules and 4 MB of total L2 cache (none of the Trinity-based APUs have L3). Its base frequency is 3.6 GHz with a 3.9 GHz Turbo Core ceiling. Both of the A8s come with Radeon HD 7560D graphics (256 shaders running at 760 MHz).

AMD’s A6-5400K represents a more significant departure from the other K-series SKUs. To begin, it bears a 65 W TDP, instead of 100 W. It’s also a single-module APU armed with two integer cores and a single floating-point unit. And instead of a shared 2 MB L2 cache, the -5400K is trimmed down to 1 MB of shared space. Radeon HD 7540D graphics are composed of 192 shader cores operating at an undisclosed frequency.

2. Piledriver: Half Of The Trinity Story

AMD is eager to deemphasize the importance of x86 performance, instead focusing on the potential of workloads accelerated by its powerful graphics architecture. The company willingly dubs its implementation “good enough,” pointing out that basic productivity-oriented workloads reliant on user input aren’t sped up at all by a faster CPU.

On the other side of the fence, synthetic benchmarks and diagnostics easily quantify the potential delta between architectures like Ivy Bridge and Bulldozer.

As with most debates, the truth lies somewhere in the middle. Many (if not most) of the benchmarks in our suite measure the alacrity of x86 computing resources in a very real-world way. Others focus more intently on graphics performance. And we’re increasingly adding tests able to leverage what AMD calls heterogeneous computing—improving performance by drawing from multiple subsystems concurrently.

The point is that x86 cores are still first-class citizens in the APU world, and there is such a thing as performance that’s not good enough. That’s part of the reason why so many of us want to know how the Piledriver architecture improves upon Bulldozer. So let’s get that out of the way first.

We took the A10-5800K, set it to 3.8 GHz, turned off Turbo Core and any power-saving feature that’d spin the chip down. Then, we took FX-8150, overclocked it to 3.8 GHz, and disabled all of the same features. By running a single-threaded workload like iTunes, we could neutralize the difference in core count (though, if anything, FX could have benefited from its 8 MB L3). Nevertheless, Piledriver clearly completes our workload much faster, yielding a 15% improvement, per clock cycle, over Bulldozer.

Turning off two of FX-8150's Bulldozer modules gives us the opportunity to run a threaded workload like 3ds Max without slanting the result toward Bulldozer. And once again, the Piledriver-based APU wins by roughly 15%.
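The clock-for-clock comparison above boils down to simple arithmetic, sketched here in Python. The function name and the timings in the example are hypothetical placeholders rather than our measured results, and the sketch assumes the quoted improvement is a throughput gain (work completed per cycle).

```python
def per_clock_speedup(time_a_s, clock_a_ghz, time_b_s, clock_b_ghz):
    """Fractional per-clock advantage of chip B over chip A on the same workload.

    Work done per cycle is proportional to 1 / (run time * clock rate),
    so the ratio of "cycles spent" gives the per-clock difference.
    """
    cycles_a = time_a_s * clock_a_ghz
    cycles_b = time_b_s * clock_b_ghz
    return cycles_a / cycles_b - 1.0


if __name__ == "__main__":
    # Hypothetical timings (seconds), both chips locked to 3.8 GHz.
    gain = per_clock_speedup(115.0, 3.8, 100.0, 3.8)
    print(f"per-clock gain: {gain:.1%}")
```

Because both chips run at the same frequency in this test, the clock terms cancel and the result is just the ratio of the two run times.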

Ivy Bridge was only about 4% faster at a given clock rate than Sandy Bridge. So, while we’re fairly certain that a Piledriver-based FX wouldn’t overtake the newest Core i7s, it should be more competitive than today’s Bulldozer-based CPUs. Where does the speed-up come from? It doesn't appear to be cache latency; Sandra shows the same results for Bulldozer and Piledriver.

As far as its role in Trinity, the benchmarks will show that the Piledriver architecture generally outperforms Llano’s Stars design, particularly in applications that emphasize integer math. When you start taxing Piledriver’s shared floating-point resources, older Llano-based APUs still wind up delivering better performance, though generally by slim margins.

3. Turbo Core Finds Its Way Into APUs

Whereas AMD launched Llano with a limited number of SKUs, none of which were unlocked or equipped with Turbo Core functionality, all six of the Trinity-based APUs currently being discussed are equipped with Turbo Core, and three of them are unlocked.

Of course, because an APU contains x86 and graphics compute units, managing performance against thermal output requires communication between all of the chip's functional components. Each Piledriver module has its own power monitor, which reports to a manager in the on-die northbridge. The monitors keep tabs on power consumption in a deterministic fashion, based on the module’s activity. The GPU has a power monitor of its own, which also measures power use based on each compute unit’s activity.

Using data from those three monitors (on the four-core APUs), the northbridge integrates power over time. When consumption is less than TDP, active resources are able to run faster up to the available power headroom. As with FX, Turbo Core supports two levels of boosted P-states, which can be used to increase CPU frequency by any number of 100 MHz increments.

The behavior of Turbo Core is now being referred to as bi-directional, since data from multiple sources must be read as input before output (frequency control) can be applied. As a result, the APU operates at different clock rates depending on whether the workload is lightly threaded, heavily threaded, or GPU-intensive.  
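To make the idea concrete, here is a deliberately simplified toy model of that feedback loop in Python. It is not AMD's algorithm: the class name, the averaging window, the one-step-per-sample policy, and the wattage figures are all assumptions made for illustration. Only the overall shape (two module monitors plus a GPU monitor feeding a manager that compares averaged power against TDP) and the A10-5800K's 3.8, 4.0, and 4.2 GHz states come from the text.

```python
from collections import deque


class ToyTurboGovernor:
    """Toy model: average reported power over a window and pick a P-state."""

    def __init__(self, tdp_watts, base_mhz, boost_mhz, window=8):
        self.tdp_watts = tdp_watts
        # P-states ordered from base clock to highest boost state.
        self.p_states = [base_mhz, *boost_mhz]
        # Sliding window of total power samples (hypothetical length).
        self.samples = deque(maxlen=window)
        self.level = 0

    def update(self, module_watts, gpu_watts):
        """Feed one sample per power monitor; return the chosen CPU clock in MHz."""
        self.samples.append(sum(module_watts) + gpu_watts)
        average = sum(self.samples) / len(self.samples)
        if average < self.tdp_watts and self.level < len(self.p_states) - 1:
            self.level += 1  # headroom available: step up one P-state
        elif average > self.tdp_watts and self.level > 0:
            self.level -= 1  # over budget: step back down
        return self.p_states[self.level]


if __name__ == "__main__":
    governor = ToyTurboGovernor(100, 3800, [4000, 4200])
    # Lightly threaded load (made-up wattages): clocks climb.
    print(governor.update([20, 5], 10))
    print(governor.update([20, 5], 10))
    # Sustained CPU + GPU load (made-up wattages): clocks fall back to base.
    for _ in range(10):
        clock = governor.update([40, 40], 60)
    print(clock)
```

The point of the sketch is the dependency: CPU frequency is a function of what both the x86 modules and the GPU are drawing, which is why a GPU-heavy workload can hold the CPU cores at a lower clock.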

It’s easiest to measure the effect of Turbo Core in a lightly-threaded application that leaves the most thermal headroom on the table. A run through our same iTunes conversion shows a 4% speed-up attributable to AMD’s third-gen Turbo Core technology.

Overclocking A10-5800K

Of course, there are going to be enthusiasts who, rather than allow Turbo Core to push performance modestly, wish to get more aggressive on their own.

AMD recognizes this, giving its K-series parts unlocked multiplier ratios for easier access to higher clock rates without affecting the frequency (and consequently, the stability) of other on-board subsystems. Because these APUs incorporate a graphics engine as well, though, that component's performance is also adjustable.

Despite the early status of our FM2-equipped motherboards, we managed to crank our A10-5800K up to 4.5 GHz using core voltage settings as high as 1.5 V. Windows would start to boot at 4.6 and 4.7 GHz, but never made it to the desktop before locking up. AMD says that, using a different platform, it's able to hit 4.8 GHz on air cooling. As we get closer to channel availability, we have to guess that platform vendors will better-optimize the overclocking headroom of K-series SKUs, particularly since there will be three of them at launch.

AMD also makes it a point to note that overclocking its x86 cores is far less effective than tuning the graphics engine. And while the Radeon HD 7660D on our A10-5800K is set to operate at 800 MHz by default, it's running beyond 1 GHz in AMD's lab. Unfortunately, while our ASRock-based platform has a field for graphics overclocking, setting it higher currently doesn't seem to affect performance. We'll need to revisit overclocking later this year.

4. Graphics: Fewer Shaders, Better Efficiency

As Don pointed out in his mobile A10-4600M coverage, shifting Trinity to the VLIW4 architecture first found in AMD’s Radeon HD 6900-series cards allows it to deploy fewer shaders, but take better advantage of them. By then turning clock rate up or down (depending on thermal headroom), it’s able to improve performance without adding a lot of complexity to the APU itself.

At its most feature-complete, Trinity’s Devastator graphics processor includes as many as six SIMD engines, each with four texture units and 16 thread processors. There are four ALUs in each thread processor, adding up to 384 total shaders and 24 texture units.

Sliding down AMD’s stack, SIMD engines are gradually turned off and clock rates are dialed down to create differentiation. The A10s both have 384 shaders. The A8s lose two SIMDs (and eight texture units), creating a 256-shader component. AMD’s A6-5400K sports three SIMDs, totaling 192 shaders. And the A4 looks to be a 128-shader offering.
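The shader math above is easy to check with a short Python sketch. The per-SIMD figures come straight from the text; the function names are ours, and the SIMD counts for the A4 (and the texture unit counts for the A6 and A4) are inferred from the shader totals rather than confirmed by AMD.

```python
# Per-SIMD figures quoted in the text for Trinity's VLIW4 graphics engine.
THREAD_PROCESSORS_PER_SIMD = 16
ALUS_PER_THREAD_PROCESSOR = 4
TEXTURE_UNITS_PER_SIMD = 4


def gpu_resources(simd_engines: int) -> tuple[int, int]:
    """Return (shaders, texture_units) for a number of active SIMD engines."""
    shaders = simd_engines * THREAD_PROCESSORS_PER_SIMD * ALUS_PER_THREAD_PROCESSOR
    texture_units = simd_engines * TEXTURE_UNITS_PER_SIMD
    return shaders, texture_units


def simds_for(shaders: int) -> int:
    """Infer how many SIMD engines a given shader count implies."""
    per_simd = THREAD_PROCESSORS_PER_SIMD * ALUS_PER_THREAD_PROCESSOR
    if shaders % per_simd:
        raise ValueError("shader count is not a whole number of SIMD engines")
    return shaders // per_simd


if __name__ == "__main__":
    for family, shaders in [("A10", 384), ("A8", 256), ("A6", 192), ("A4", 128)]:
        simds = simds_for(shaders)
        print(family, simds, gpu_resources(simds))
```

Each SIMD engine contributes 64 shaders, so every step down the stack corresponds to a whole number of disabled engines.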

Dual Graphics

Like Llano, Trinity supports Dual Graphics configurations—cooperative rendering using the on-die Radeon engine and a discrete card of roughly comparable potency. Although I don’t have any of the models AMD lists in its support matrix, I did discover that a Radeon HD 6670 does the trick as well.

Our baseline blue bars represent the Radeon HD 7660D built into the A10-5800K APU. The green bars are our Radeon HD 6670 on its own. And the red bar is both graphics engines working cooperatively.

At 1280x720 there isn’t enough graphics load to let Dual Graphics shine before this platform runs into a processor bottleneck. The gap opens up at 1680x1050, though, and continues to show off the benefit of Dual Graphics at 1920x1080, where the integrated Radeon HD 7660D and Radeon HD 6670 average almost 100 FPS.

Desktop Configurations

| Discrete Graphics Code-Name | Radeon Product Name | Recommended Memory | A6-Series (HD 7540D) | A8-Series (HD 7560D) | A10-Series (HD 7660D) |
|---|---|---|---|---|---|
| Turks XT | HD 7670 | GDDR5 | Discrete Available | Discrete Recommended | Discrete Recommended |
| Turks Pro | HD 7570 | GDDR5 | Discrete Available | Discrete Recommended | Discrete Recommended |
| Turks Pro | HD 7570 | DDR3 | Discrete Recommended | Discrete Recommended | Discrete Recommended |
| Caicos XT | HD 7470 | DDR3 | Discrete Recommended | Discrete Available | Discrete Available |
| Caicos Pro | HD 7450 | DDR3 | Discrete Available | No Discrete (APU-Only) | No Discrete (APU-Only) |

All-in-One Configurations

| Discrete Graphics Code-Name | Radeon Product Name | Recommended Memory | A6-Series (HD 7540D) | A8-Series (HD 7560D) | A10-Series (HD 7660D) |
|---|---|---|---|---|---|
| Onega LP | HD 7670A | GDDR5 | Discrete Recommended | Discrete Recommended | Discrete Recommended |
| Onega LP | HD 7650A | DDR3 | Discrete Recommended | Discrete Recommended | Discrete Recommended |
| Caspian XT | HD 7470A | DDR3 | Discrete Recommended | Discrete Available | Discrete Available |
| Caspian Pro | HD 7450A | DDR3 | Discrete Available | No Discrete (APU-Only) | No Discrete (APU-Only) |
| Cedar | HD 7350A | DDR3 | No Discrete (APU-Only) | No Discrete (APU-Only) | No Discrete (APU-Only) |


We also benchmarked Batman: Arkham City using the Low quality setting preset and found that performance actually slid backward with Dual Graphics enabled. The version of the driver we tested, however, still doesn't offer Dual Graphics support in anything but DirectX 11 applications. You don't get DX 11 features in Batman until you choose a higher preset. Fortunately, AMD has a beta driver (Catalyst 12.6) that adds DirectX 9 compatibility, too. We didn't have time to give it a spin before today's piece, but it's something we'll follow up on.

What we do know from WoW is that Dual Graphics has serious potential on this platform. Notably, the setup process is significantly easier than it was back when I evaluated Llano. You simply plug in the discrete card, keep your display attached to the on-board GPU, and install the drivers. Dual Graphics even gets enabled automatically now.

Display Connectivity

Another advantage that Trinity holds over Llano is Eyefinity support. The new APU includes four display controllers, whereas its predecessor only included two. That means you can do three- or four-way monitor configurations, providing you use the right combination of outputs. Four screens, for example, require DisplayPort 1.2 multi-streaming on at least one output. A trio of screens is easier. You can use two non-DisplayPort panels, plus one DisplayPort- or VGA-equipped screen as the third.

5. Memory Bandwidth Scaling: Feed The Beast

It used to be that memory controllers built into chipset northbridges played a key role in determining system performance, making the modules you’d drop in an important consideration as well. But processor-based controllers have done a lot to maximize throughput. Two-, three-, and four-channel implementations designed to address high-end server workloads are generally overkill on the desktop.

As a result, it’s been a really long time since we’ve encountered a processor architecture starved for bandwidth. That’s bad news for memory vendors, who charge a premium for modules rated for higher data rates at lower latencies. After all, if cheap DDR3-1333 gets the job done, why worry about a kit capable of DDR3-2800?

The answer is integrated graphics.

Intel’s HD Graphics 4000 engine is fast enough to reflect moderate scaling as memory bandwidth increases. Before that, the Llano-based A8 APUs also demonstrated acute sensitivity to system memory data rates, justifying higher-end modules. And now, with Trinity, we get a purportedly higher-end graphics processor that’ll undoubtedly need to be fed even faster in order to realize its potential.

In response, AMD officially adds support for up to DDR3-2133 with one module per channel, or DDR3-1866 in one- and two-module-per-channel configurations. In comparison, Llano topped out at DDR3-1600. (Update: AMD clarifies that desktop Trinity-based APUs will max out with DDR3-1866 support).
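For context, the theoretical ceiling at each of those data rates is straightforward to work out. The Python sketch below assumes a standard 64-bit DDR3 channel and Trinity's dual-channel controller; the function name is ours, and the results are peak figures, not what Sandra will actually measure.

```python
BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(data_rate_mts: int, channels: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s for a DDR3 data rate given in MT/s."""
    return data_rate_mts * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    for rate in (1333, 1600, 1866, 2133):
        print(f"DDR3-{rate}: {peak_bandwidth_gbs(rate):.1f} GB/s peak")
```

Dual-channel DDR3-1600 tops out at 25.6 GB/s on paper, so stepping up to DDR3-1866 adds roughly 17% of theoretical headroom for the graphics engine to draw on.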

In the SiSoft Sandra tests you’ll see shortly, the Trinity-based APUs yield less memory bandwidth from our 16 GB DDR3-1600 kit than Llano. But let’s see what happens when we replace those modules with Kingston’s new KHX2800OC12D3T1K2/4GX kit of two DDR3-2800 modules.

Although we’re only using two modules for this specific test, the ASRock FM2A75 Pro4 motherboard serving as our test platform won’t boot above DDR3-1866, limiting the scope of our early testing. Attempts to run at DDR3-2133 using manual parameters, or DDR3-2800/2666 using pre-programmed settings, simply wouldn’t work.

With that said, bandwidth doesn’t scale linearly, and gains are already tapering off by 1866 MT/s. We do, however, see AMD’s A10-5800K pick up modest bandwidth as data rates go up. How do those numbers correspond to a real-world gaming scenario?

Just as we were expecting, feeding Trinity with the right memory kit is going to make a huge difference in graphics-bound applications, in particular. And while the increases in average frame rate are great in World of Warcraft, the minimum frame rate numbers are even more meaningful. When performance dips to 21 FPS, it’s a totally different experience compared to 41 FPS.  
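One way to see why the minimum matters so much is to convert frame rates into frame times. This is a minimal Python sketch using the two minimums quoted above; the function name is ours.

```python
def frame_time_ms(fps: float) -> float:
    """Time spent on each frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    for fps in (21, 41):
        print(f"{fps} FPS -> {frame_time_ms(fps):.1f} ms per frame")
```

At 21 FPS each frame lingers for about 47.6 ms, versus about 24.4 ms at 41 FPS, so the slowest moments take nearly twice as long to draw.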

The situation isn’t as compelling in WinRAR, which has historically demonstrated more sensitivity to memory bandwidth than any other application in our suite. Performance scales well through DDR3-1333, but as timings have to be loosened, higher latencies counteract the bandwidth increase, and WinRAR hits a ceiling of sorts.

Of course, the thing to remember is that this is a preview. It seems pretty clear that Trinity-based APUs are going to benefit from fast memory kits able to feed their Radeon graphics engines. What we don’t yet know, though, is whether motherboard vendors will be able to tune their firmware for stable operation at even higher data rates, allowing us to push some of the newest enthusiast-oriented kits.

6. Socket Compatibility And The A85X FCH

Want Trinity? You Need A New Motherboard

Perhaps the biggest downer for early adopters of AMD’s Fusion initiative is the quickness with which the company is deprecating support for the Socket FM1 interface used to enable desktop-class Llano APUs. In much the same way that Intel replaced LGA 1156 with a very similarly-sized LGA 1155, AMD’s existing 905-pin socket is giving way to a 904-pin one.

Presumably, changes to the FM2 interface came about due to power delivery, since the PCIe and DDR3 I/Os shouldn’t be any different. Whatever the reason, though, Llano-based APUs won’t drop into FM2-equipped boards, and Trinity-based APUs won’t work in platforms with Socket FM1. As you can see in the image above, Socket FM2, on the left, and FM1, on the right, are keyed completely differently.

Meet The New A85X FCH

Although Trinity-based APUs are not socket-compatible with Llano, there’s nothing precluding motherboard vendors from attaching existing Fusion Controller Hubs to the new processor’s four-lane UMI interface. We actually have two FM2-equipped motherboards in the lab: ASRock’s FM2A75 Pro4 and a platform based on A85X, formerly referred to as Hudson-D4.

In reality, the two chipsets are pretty hard to tell apart. Basically, A85X gives you eight SATA 6Gb/s-capable ports, RAID 5 support, and the ability to divide the APU’s 16 lanes of PCI Express 2.0 into a pair of x8 links.
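As a rough guide to what that x8/x8 split means for bandwidth, the Python sketch below uses the standard PCI Express 2.0 figures (5 GT/s per lane with 8b/10b encoding). The constant and function names are ours, and the numbers are theoretical per-direction peaks.

```python
# PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding, which leaves
# about 500 MB/s of payload bandwidth per lane, per direction.
PCIE2_MB_PER_LANE = 500


def link_bandwidth_gbs(lanes: int) -> float:
    """Theoretical per-direction bandwidth of a PCIe 2.0 link in GB/s."""
    return lanes * PCIE2_MB_PER_LANE / 1000


if __name__ == "__main__":
    print("x16:", link_bandwidth_gbs(16), "GB/s")
    print("x8: ", link_bandwidth_gbs(8), "GB/s")
```

Splitting the APU's 16 lanes gives each of the two slots 4 GB/s in each direction, half of what a single x16 link offers.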

Otherwise, you’re looking at the same combination of USB 3.0 and 2.0 ports (4 + 10), the same four-lane Unified Media Interface, four lanes of second-gen PCIe, and four-channel audio (along with FIS-based switching, mSATA support, legacy PCI, and so on). AMD has not yet added PCI Express 3.0 support to any of its platforms, and isn’t expected to for some time.

More than likely, you’ll look to A75-based boards with Socket FM2 interfaces to save a little money, or A85-based platforms as a more feature-complete step up.

7. Test Setup And Benchmarks
Test Hardware
Processors
AMD A10-5800K (Trinity) 3.8 GHz (19 * 200 MHz), Four Cores, Socket FM2, 4 MB Total L2 Cache, Turbo Core enabled, Power-savings enabled

AMD A8-5600K (Trinity) 3.6 GHz (18 * 200 MHz), Four Cores, Socket FM2, 4 MB Total L2 Cache, Turbo Core enabled, Power-savings enabled

AMD A6-5400K (Trinity) 3.6 GHz (18 * 200 MHz), Two Cores, Socket FM2, 1 MB Total L2 Cache, Turbo Core enabled, Power-savings enabled

AMD A8-3850 (Llano) 2.9 GHz (14.5 * 200 MHz), Four Cores, Socket FM1, 4 MB Total L2 Cache, Power-savings enabled

AMD FX-8150 (Zambezi) 3.6 GHz (18 * 200 MHz), Eight Cores, Socket AM3+, 8 MB Shared L3 Cache, Turbo Core enabled, Power-savings enabled
Thermal Paste
Zalman ZM-STG1
Motherboard
ASRock FM2A75 Pro4 (Socket FM2) AMD A75 FCH, Beta BIOS

ASRock A75 Extreme6 (Socket FM1) AMD A75 FCH, BIOS v.2.00

Asus Sabertooth 990FX (Socket AM3+) AMD 990FX/SB950, BIOS 1208
Memory
G.Skill 16 GB (4 x 4 GB) DDR3-1600, F3-12800CL9Q2-32GBZL @ 9-9-9-24 and 1.5 V

Kingston 4 GB (2 x 2 GB) DDR3-2800, KHX2800OCC12D3T1K2/4GX @ 1.5 V
Hard Drive
Intel SSD 510 250 GB, SATA 6 Gb/s for gaming tests

Intel SSD 520 240 GB, SATA 6 Gb/s for productivity/content creation tests
Graphics
AMD Radeon HD 7660D

AMD Radeon HD 7560D

AMD Radeon HD 7540D

AMD Radeon HD 6670

Nvidia GeForce GTS 450
Power Supply
Cooler Master UCP-1000 W
System Software And Drivers
Operating System
Windows 7 Ultimate 64-bit
DirectX
DirectX 11
Graphics Driver
HD Graphics Driver For Windows 7 (15.26.8.64.2696)

ASRock sent us its upcoming FM2A75 Pro4 to use as a test platform. As its name suggests, the FM2A75 employs the older A75 FCH, which will likely become an approach some manufacturers use to maintain lower prices on FM2-equipped motherboards. We also have an A85-based board in the lab from another vendor, but its BIOS wasn't quite as far along.

You'll have to pardon the lack of comparison data. AMD never sent an A8-3870K to our SoCal lab, and my closest Intel-based competition is a Core i3-2105, which sells for quite a bit more than the A8-3850 in the charts on-hand. So, I took to Newegg and ordered an A8-3870K and a Core i3-2100, which make for a more balanced match-up. Both are on the test bench behind me right now, and we'll be publishing a video in the next few days with our findings. Not knowing how much the A10-5800K will cost, I'll try to get the Core i3-2105's HD Graphics 3000 included as well.

Game Benchmarks And Settings
Batman: Arkham City
Game Settings: Lowest Quality Settings, Anti-Aliasing: Disabled, V-sync: Disabled, DirectX 11 Mode, 1280x720 / 1680x1050 / 1920x1080, Built-in Benchmark
The Elder Scrolls V: Skyrim
Game Settings: Medium Quality Settings, FXAA disabled, V-sync: Disabled, 1280x720 / 1680x1050 / 1920x1080, 25-second playback, Fraps
World of Warcraft: Cataclysm
Game Settings: Good Quality Settings, Anti-Aliasing: 1x AA, V-sync: Disabled, 1280x720 / 1680x1050 / 1920x1080, Demo: Crushblow to The Krazzworks, DirectX 11, 64-bit Binary
Diablo III
Game Settings: Low Quality Settings, Anti-Aliasing: Disabled, V-sync: Disabled, 1280x720 / 1680x1050 / 1920x1080, The Siege Of Bastion's Keep, 120-second playback, Fraps
Audio Benchmarks and Settings
iTunes
Version: 10.4.10, 64-bit
Audio CD ("Terminator II" SE), 53 min., Convert to AAC audio format
Lame MP3
Version 3.98.3
Audio CD "Terminator II SE", 53 min, convert WAV to MP3 audio format, Command: -b 160 --nores (160 Kb/s)
Video Benchmarks and Settings
HandBrake CLI
Version: 0.9.5
Video: Big Buck Bunny (720x480, 23.972 frames) 5 Minutes, Audio: Dolby Digital, 48 000 Hz, Six-Channel, English, to Video: AVC Audio: AC3 Audio2: AAC (High Profile)
MainConcept Reference v2.2
Version: 2.2.0.5440
MPEG-2 to H.264, MainConcept H.264/AVC Codec, 28 sec HDTV 1920x1080 (MPEG-2), Audio: MPEG-2 (44.1 kHz, 2 Channel, 16-Bit, 224 Kb/s), Codec: H.264 Pro, Mode: PAL 50i (25 FPS), Profile: H.264 BD HDMV
Application Benchmarks and Settings
WinRAR
Version: 4.11
RAR, Syntax "winrar a -r -m3", Benchmark: 2010-THG-Workload
WinZip 16.5
Version: 16.5
WinZip GUI, Benchmark: 2010-THG-Workload
7-Zip
Version 9.22 beta
LZMA2, Syntax "a -t7z -r -m0=LZMA2 -mx=5", Benchmark: 2010-THG-Workload
Adobe Premiere Pro CS 5.5
Paladin Sequence to H.264 Blu-ray
Output 1920x1080, Maximum Quality, Mercury Playback Engine: Software Mode
Adobe After Effects CS 6
Version: CS5.5
Tom's Hardware Workload, SD project with three picture-in-picture frames, source video at 720p, Render Multiple Frames Simultaneously
Adobe Photoshop CS 6 (64-Bit)
Version: 11
Filtering a 16 MB TIF (15 000x7266), Filters: Radial Blur (Amount: 10, Method: zoom, Quality: good), Shape Blur (Radius: 46 px; custom shape: Trademark symbol), Median (Radius: 1px), Polar Coordinates (Rectangular to Polar)
ABBYY FineReader
Version: 10 Professional Build (10.0.102.82)
Read PDF save to Doc, Source: Political Economy (J. Broadhurst 1842) 111 Pages
3ds Max 2012
Version: 10 x64
Rendering Space Flyby Mentalray (SPECapc_3dsmax9), Frame: 248, Resolution: 1440 x 1080
Adobe Acrobat X Professional
PDF Document Creation (Print) from Microsoft PowerPoint 2010
SolidWorks 2010
PhotoView 360
Render 01-Lighter Explode.SLDASM (SolidMuse.com)
Image Output Resolution: 1920x1080, Render: Preview Quality “Good”, Final Render Quality “Best”
Visual Studio 2010
Compile Chrome project (1/31/2012) with devenv.com /build Release
Synthetic Benchmarks and Settings
PCMark 7
Version: 1.0.4
3DMark 11
Version 1.0.3
SiSoftware Sandra 2012 SP4a
CPU Test=CPU Arithmetic/Multimedia, Memory Test=Bandwidth Benchmark, Cryptography, Cache Latency
8. Benchmark Results: 3DMark 11

Although it comes equipped with fewer shaders than the Llano-based A8-3850, AMD’s upcoming A10-5800K appears to serve up superior performance as a result of its more utilizable (hey, that’s actually a word) architecture and higher operating frequencies. Our early estimate grants the beefiest Trinity-based chip a 20% advantage in 3DMark 11.

The A8-5600K, on the other hand, is almost exactly as fast as the A8-3850, which might be a little disappointing for anyone assuming the step from A8-3850 to A8-5600K should yield better performance.

Expectedly, the A6 trails behind a ways. And although I hate to drag Ivy Bridge into this mainstream match-up between AMD APUs, in referencing back to my Core i7-3770K launch coverage, I did notice that my A8-3850 result was just one point away from the one I generated for today’s piece. More interesting, HD Graphics 4000 scored 769 points in the suite test. That’s lower than the dual-core A6.

At least from the standpoint of graphics performance, AMD seems to be in a good place.

9. Benchmark Results: Sandra 2012

We know that AMD isn't particularly fond of diagnostics like Sandra, which aren’t indicative of real-world alacrity. But it does help us analyze our results by exposing potential strengths and weaknesses.

Llano doesn’t have a lot of the ISA enhancements included in Trinity. However, its efficient architecture facilitates solid performance, despite a 2.9 GHz clock rate.

Support for AVX helps bolster Trinity’s floating-point performance, despite the fact that its two Piledriver modules share FP resources. The more substantial gain happens in integer throughput, which benefits from higher clock rates and four distinct cores.

Trinity includes acceleration for AES encryption and decryption, and the performance of that feature is closely tied to available memory bandwidth. Llano does not support those additional instructions, which is why it lands at the bottom of this chart for AES throughput.

Despite common data rates and timing settings, the Llano-based APU gets more out of its dual-channel DDR3 memory controller than Trinity. AMD’s newer design technically supports higher settings, though, meaning you should be able to get up to DDR3-2133 using one slot per channel.

10. Benchmark Results: Adobe CS5 And 6

Our well-threaded Photoshop CS6 benchmark definitely appreciates the quad-core APUs, favoring two Trinity-based chips over Llano. It’s only when one Piledriver module is stripped from the design that performance plummets.

This is one of those very real-world tests that AMD likes to talk about—applying filters to your work in Photoshop can clearly keep you waiting a while. And if you go the dual-core route, you’ll literally be waiting around twice as long for the task to finish up.

I’d hope that any serious video editor using Premiere Pro already knows the application’s GPU acceleration (enabled via CUDA) is the way to go. If not, though, this threaded app rewards additional CPU cores.

Once again we have a very real-world piece of software being used in a very practical way: rendering a finished project. Surely, AMD would agree that the dual-core 65 W APU wasn’t designed for this sort of workload, as it’s simply decimated.

The quad-core APUs do quite a bit better, and the Piledriver architecture easily leverages its clock rate advantage and improved IPC to maneuver around Llano.

But because I already drew one reference to how much better AMD’s integrated Radeon graphics are than Intel’s HD Graphics, I feel it’s equally important to point out Ivy Bridge’s superiority in x86-based workloads. The $210 Core i5-3550 gets this job done in less than half of the time.

Our After Effects workload doesn’t take nearly as long. However, it reflects a similar performance story. Two Trinity-based APU models outpace the A8-3850, while the dual-core A6 trails by a large margin.

11. Benchmark Results: Content Creation

3ds Max is known as a floating-point-intensive application, and it’s in this test that Piledriver’s shared resources hurt its performance relative to Llano. A8-3850 isn’t even the quickest previous-gen APU available, and it’s still faster than the unreleased A10-5800K in this real-world metric.

Fortunately, APUs aren’t being positioned as great solutions for workstations. Nevertheless, 3ds Max serves as a reminder that AMD’s newest architecture makes certain compromises that affect the behavior of some applications positively and others negatively.

The same goes for SolidWorks. Although the Trinity-based parts don’t trail by much, you wouldn’t expect a flagship A10 to lose out to last year’s A8. But when you’re pushing floating-point-heavy math, that’s the trade-off you’re going to see.

12. Benchmark Results: Productivity

The Trinity design earns a slim victory in our optical character recognition benchmark, so long as you’re looking at the quad-core implementation. Losing two cores (or a single Piledriver module) hampers performance in a serious way.

Llano assumes a lead in Fritz. Our audience likes to see Fritz included in our suite, but unlike some of our more applicable benchmarks, a loss in Fritz isn’t particularly concerning for us, given its fairly synthetic outcome.

In contrast, compiling Google Chrome in Visual Studio 2010 is the very definition of real-world. And while we can fairly easily dismiss the results from Fritz, the fact that the second-fastest Llano-based APU outmaneuvers the soon-to-be flagship A10 isn’t something you can argue away using references to heterogeneous computing. If you already own Llano, it's going to be hard to compel an upgrade based on results like these.

Another very real-world workload, printing a PowerPoint file to PDF format favors the Trinity-based desktop chips in a very big way. The fact that the A6 APU outmaneuvers the A8 is a good indication that this test is single-threaded, and that its Turbo Core feature is pushing performance up.

In any case, the tweaks made to Piledriver help lock down a commanding victory over Llano.

13. Benchmark Results: Media Encoding

Two Trinity-based chips manage to slip past Llano. The third, A6-5400K, lags behind as a result of its dual-core architecture. Overall, though, it’s a strong showing for the Piledriver-based Trinity APUs.

We see a similar story in HandBrake, where higher frequencies help the Piledriver architecture overcome Llano’s superior IPC throughput.
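To put a rough number on that trade-off, here is a minimal sketch that treats throughput as simply IPC multiplied by clock rate. The clock figures are assumptions taken from the published base specifications (3.8 GHz for A10-5800K, 2.9 GHz for A8-3850) rather than anything measured on this page, and the model ignores Turbo Core, memory bandwidth, and threading behavior entirely.

```python
def relative_throughput(ipc_ratio: float, clock_a_ghz: float, clock_b_ghz: float) -> float:
    """Throughput of chip A relative to chip B, assuming perf ~ IPC x clock.

    ipc_ratio is chip A's IPC divided by chip B's IPC.
    """
    return ipc_ratio * clock_a_ghz / clock_b_ghz


def break_even_ipc_ratio(clock_a_ghz: float, clock_b_ghz: float) -> float:
    """IPC ratio (A over B) at which chip A exactly ties chip B."""
    return clock_b_ghz / clock_a_ghz


# Assumed base clocks from published specs, not from this article's charts.
TRINITY_GHZ = 3.8  # A10-5800K
LLANO_GHZ = 2.9    # A8-3850

tie_point = break_even_ipc_ratio(TRINITY_GHZ, LLANO_GHZ)
print(f"Trinity ties Llano with {tie_point:.3f}x of Llano's IPC")
print(f"That is an IPC deficit of up to {(1 - tie_point) * 100:.1f}%")
```

Under those assumptions, the Piledriver-based chip could give up close to a quarter of Llano’s per-clock throughput and still break even, which is consistent with clock rate carrying the result here.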

An integer-based single-threaded workload like Lame should make very effective use of the resources Trinity has to offer. And indeed, we see all three upcoming APUs blow past Llano. When you look back at our tests of FX-8150 (particularly its results on this page of our Core i7-3770K story), these new Piledriver-based APUs are actually cutting through this workload faster than Bulldozer (and indeed, Phenom II X6 1100T, which accelerates up to 3.7 GHz).

The same holds true in iTunes. Finally, AMD has forward progress on its hands in the x86-based testing, and in exactly the kind of lightly-threaded workloads that previously embarrassed the Bulldozer design. Although we know from comparative testing that Intel is still going to put down better numbers in a metric like this, it’s at least good to see AMD delivering on its promises of better per-clock performance from Piledriver.

14. Benchmark Results: File Compression

This one’s all-new. Corel revamped the WinZip engine, better-optimizing it for multi-core processors (in addition to exposing support for OpenCL on AMD GPUs, which we'll test shortly). We’re certainly not fans of a company using an open standard to lock out competition. Further, we know that AMD was instrumental in exposing this functionality, and that’s the justification given for preventing Intel or Nvidia from benefiting from it.

Nevertheless, there’s a clear improvement from Trinity compared to Llano in this integer math-dominant benchmark. The dual-core A6 suffers, as we’d expect.

Our more familiar WinRAR test also favors Trinity’s four integer cores over Llano’s quad-core configuration.

You can really look at results like these, along with WinZip’s, and see how big of a difference there is between apps that lean more heavily on integer-based code versus floating-point math. When you broaden the comparison criteria to include Intel, Trinity doesn’t look as stellar in WinRAR. However, if you’re lining it up against Llano, the speed-up is certainly measurable.

The outcome in 7-Zip is much closer, though at least the two Trinity-based APUs wind up on top (narrow though the victory may be). 7-Zip is clearly very well-threaded. Switching from the quad-core A10 and A8 to the dual-core A6 decimates performance. AMD marketing can say a lot of things about good-enough performance, but when you’re waiting an extra three or four minutes for a simple, relatively small compression workload to wrap up, there is unquestionably something to be said for owning hardware that’s better than passable.

15. Benchmark Results: Batman: Arkham City

The preceding pages comprehensively previewed the performance of Trinity’s x86 cores in a number of single-/multi-threaded apps with a reliance on both integer- and floating-point-heavy code. We know definitively that this new APU design is faster than Llano in most workloads, but slower in a few (particularly threaded apps that tax the shared floating-point resources of each Piledriver module). But what about graphics?

Batman: Arkham City is our first outing with Trinity’s 3D component, and the results here are compelling indeed. At 1280x720, A10-5800K sees a greater-than 20% speed-up compared to A8-3850, the more efficient Radeon HD 7660D engine also enjoying the benefit of a higher frequency.

It’s only a bummer that 1680x1050 and 1920x1080 are too demanding for AMD’s highest-end implementation at High quality settings (sans DirectX 11 features).

16. Benchmark Results: World Of Warcraft: Cataclysm

More interested in playable frame rates, we dialed WoW back to the Good quality preset.

The result is relatively smooth performance all the way through 1920x1080 on AMD’s A10-5800K (running a full 25% faster than A8-3850, mind you).

Throughout testing, even the A8-5600K’s less powerful Radeon HD 7560D manages to beat the currently-available A8-3850 with Radeon HD 6550D graphics.

17. Benchmark Results: The Elder Scrolls V: Skyrim

Skyrim is somewhat of an enigma. It’s not a very graphically-demanding game, but it does express a propensity for fast processors. We’re only using the Medium quality preset here. But it’s really only at 1280x720 where the game is smooth enough to be considered playable—and that’s only counting the two higher-end Trinity-based APUs.

Stepping down to the Low quality setting yields significantly higher frame rates. That’s what I used in the Intel Core i7-3770K review. After that piece, though, I decided that it’s simply not worth sacrificing all semblance of quality just to get decent performance. Personally, if I can’t play a game and have it look decent, it’s time to upgrade. In this title, you might want to consider something discrete.

18. Benchmark Results: Diablo III

I’ve been playing a little too much Diablo, so it’s pretty easy to take an Inferno-geared Wizard into Normal mode and steamroll a benchmark run of my own creation.

The experience wasn’t enjoyable at 1920x1080 on any APU. The minimum frame rate dips are just too low. Playing at 1680x1050 was a little more viable. However, I had to turn everything down to Low quality just to average more than 40 FPS on the A10-5800K. Although that’s 16% faster than A8-3850, it’s really only at 1280x720 where performance is truly fluid.

At a suitably-low resolution, you’re getting as much as 30% more performance from A10-5800K than A8-3850. Surely, that’d be quite a bit of fun in an HTPC. Again, though, you’d find me using Dual Graphics, at least, if gaming alacrity were really a requirement.

19. Benchmark Results: OpenCL

Alright. We know that Piledriver represents a respectable improvement over Bulldozer, lending Trinity competitive performance versus its previous-generation Llano-based APUs. And we now know that the more efficient VLIW4 architecture, coupled with higher clock rates, translates into anywhere from 15 to 30%-higher frame rates in a number of mainstream games.

But AMD is trumpeting this message of heterogeneous computing—exploiting processing resources, wherever they may be, to maximize performance. We’ve been working on a series of stories with AMD to quantify the effects of open standards like DirectCompute and OpenCL in different software environments, but it remains a challenge to benchmark some of the applications currently being optimized to exploit the hardware AMD is developing.

We’ve really done video transcoding to death. Although we haven’t yet circled back to cover the quality implications of Intel’s second-gen Quick Sync implementation, Nvidia’s NVEnc, or AMD’s VCE, we know that Ivy Bridge’s fixed-function logic is some of the fastest we’ve tested. Moreover, we still haven’t seen VCE enabled in an optimized application (though UVD3 and VCE are fixed-function components of Trinity).

Short of titles like MediaConverter and MediaEspresso, we’ve been at a loss for incorporating productivity-oriented software into our benchmark suite. That’s starting to change more quickly, as companies like Adobe tie OpenCL support into their offerings. Perhaps the biggest win thus far for AMD is Corel’s WinZip 16.5. I mentioned a few pages ago that Corel is deliberately locking out Intel and Nvidia, and I don’t particularly approve of that. However, the compression utility is still immensely popular, making it a great example of how graphics hardware can be applied to a workload not previously associated with graphics.

I have FX-8150 in there so you can see how long it takes the eight-core chip to finish a workload that’s now supposedly optimized for parallelized hardware.

As you can see, though, enabling OpenCL acceleration has a huge impact on performance. What once took 2:11 on the A10-5800K only takes 1:28 when the APU’s Devastator graphics core contributes to the effort. That’s a 32.8% improvement, and likely what AMD is hoping to see across the board as software developers begin figuring out how much of their code can be sped up using graphics resources.
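For anyone who wants to check the math, the sketch below reproduces that figure from the two run times quoted above. The function names are just illustrative. It also expresses the same result as a throughput multiplier, which is a different way of stating the gain and should not be confused with the time saved.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert an 'M:SS' string such as '2:11' into whole seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def time_saved_pct(before_s: float, after_s: float) -> float:
    """Percentage reduction in completion time."""
    return (before_s - after_s) / before_s * 100


def speedup(before_s: float, after_s: float) -> float:
    """How many times more work gets done per unit of time."""
    return before_s / after_s


cpu_only = to_seconds("2:11")     # A10-5800K, OpenCL disabled
with_opencl = to_seconds("1:28")  # A10-5800K, OpenCL enabled

print(f"Time saved: {time_saved_pct(cpu_only, with_opencl):.1f}%")
print(f"Throughput: {speedup(cpu_only, with_opencl):.2f}x")
```

The 32.8% figure is the reduction in completion time (43 of 131 seconds). Expressed as throughput, the same result works out to roughly 1.49x.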

LuxMark, which centers on the SmallLuxGPU2 rendering engine, is another OpenCL-based measurement tool we’ve been using.

In it, we see an A10-5800K trailing a discrete GeForce GTS 450 graphics card in an FX-8150-based machine.

Remember, Trinity employs AMD’s VLIW4 architecture rather than GCN, and it is GCN that bolsters compute performance substantially. As such, it’s not surprising to see the Llano-based A8-3850 outrun the A8-5600K, which has fewer shaders. The next-gen APU family, Kaveri, will employ GCN, though.

We also had plans to run Musemage 1.9, introduced to us in William Van Winkle’s most recent exploration of GPU-accelerated image editing apps. However, the software’s licensing scheme is such that a license is revoked after three hardware changes. Paraken Technology, the company responsible for Musemage, sent over a handful of licenses to use, but I didn't have time to get everything set back up again. We do plan to test Musemage going forward, though.

20. Power

Power consumption is monitored throughout our testing. And because our benchmark suite is scripted, it’s easy to subject each APU to the same workload and evaluate consumption during the entire run.

Our power data is interesting, though I’d caution against taking it as gospel. The motherboard vendors we’ve talked to indicate that Turbo Core functionality isn’t working perfectly yet.

With that said, A8-3850, A10-5800K, and A8-5600K are all rated for 100 W, while the A6-5400K has a 65 W thermal design power.

We immediately see that the Llano-based A8 (the yellow line) doesn’t drop to as low an idle power consumption number as the Trinity-based chips. Otherwise, the three 100 W APUs all appear to place relatively close to each other.

Run the averages, and you actually see the trio end up within 4 W of each other. A8-5600K averages 101 W total system power use, while the A10 lands at 105 W. The Llano-based APU is in between.

The little A6-5400K-based machine averages just 83 W of power consumption through our benchmark suite. But look at that green line. It takes so long to complete testing that power use over time ends up being worse than the quad-core chips.
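The reasoning there is that total energy is average power multiplied by run time, so a lower-wattage machine only comes out ahead if it doesn't take proportionally longer to finish. The sketch below uses the average system power figures quoted above to work out the break-even point. Run durations aren't listed in the text, so none are assumed, and the function names are illustrative only.

```python
def energy_wh(avg_power_w: float, duration_s: float) -> float:
    """Energy consumed in watt-hours, given average power and run time."""
    return avg_power_w * duration_s / 3600.0


def break_even_duration_ratio(low_power_w: float, high_power_w: float) -> float:
    """How many times longer the lower-power system can run before it
    has consumed as much energy as the higher-power system."""
    return high_power_w / low_power_w


# Average total system power through the benchmark suite, as quoted above.
A6_5400K_W = 83.0
QUAD_CORE_W = {"A8-5600K": 101.0, "A10-5800K": 105.0}

for name, watts in QUAD_CORE_W.items():
    ratio = break_even_duration_ratio(A6_5400K_W, watts)
    print(f"A6-5400K uses more energy than {name} "
          f"once its run is more than {ratio:.2f}x as long")
```

In other words, the dual-core chip has to finish the suite within roughly a fifth to a quarter more time than the quad-core parts to break even on energy, and the green line suggests it doesn't.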

21. Trinity On The Desktop: Already Announced, But Enthusiasts Must Wait

At this year’s Computex, AMD announced that Trinity-based APUs for the desktop are already shipping in machines from Acer, Asus, HP, and Lenovo. The chips just aren’t available in the channel yet. And as a result, we don’t yet know what any of these processors are going to cost. Not that it matters—this is a preview and we’re not here to pass judgment. Motherboard BIOS bugs are still being worked out and drivers are still not quite complete.

What we’re left with, then, are initial impressions.

Let’s start with the Piledriver architecture, which everyone is hoping will show up in a desktop-class CPU sooner rather than later. Our per-clock cycle testing suggests that the revised design, as it’s implemented on Trinity, is as much as 15% faster than Bulldozer. A quad-core Trinity-based chip will still trail a quad-core Llano APU if you hit it with a floating-point-heavy workload, but that’s to be expected, given that each of the two Piledriver modules shares a single floating-point unit between its two integer cores. Fortunately for AMD, most of what we use to test taxes the architecture’s four integer cores.

A majority of our benchmarks favor Trinity over Llano thanks to IPC improvements and significantly higher clock rates. Piledriver still gives up significant instructions-per-cycle throughput compared to the older Stars design, but is better able to compensate than Bulldozer. The result, then, is modest x86 performance. It’s better than Bulldozer, but only a slight step up from what you get from Llano. And that’s if we ignore the competition entirely. I didn’t have an appropriately-priced Intel chip to test, but just received a Core i3-2100 from Newegg that comes close to matching an A8-3870K’s price tag. Tests commence on that tonight.

How about Trinity’s built-in graphics component? Clearly, this is one of AMD’s greatest strengths. We know from our Core i7-3770K review that HD Graphics 4000 can’t even keep up with Llano. Pile on frame rates that are 20 to 25% higher than the first-gen APU and you have the prelude to a blowout favoring AMD's Trinity. Of course, there aren’t any Intel processors with HD Graphics 4000 selling where we’d expect to find these upcoming APUs, making HD Graphics 2000 or 3000 a more realistic comparison. We’ll see how that Core i3-2100 sizes up, but the outcome of those benchmarks is a foregone conclusion.

Finally, how do CPU and GPU come together to enhance this second-generation effort in the way AMD suggests they should, if they do at all? That’s a question driven less by hardware implementation and more by execution in the ecosystem. Are there more applications available today able to leverage graphics processing power? Decidedly, yes. Is the number large enough that we’re able to pepper our suite with optimized titles? Unfortunately not. There are other titles out there, but benchmarking them isn’t always easy, though that’s something we’re working to address.

One of the most notable names in our list of metrics, WinZip, does benefit from acceleration by virtue of OpenCL, and we are able to gauge its performance. The speed-up seen in that benchmark is profound, particularly (and somewhat ironically) on the Llano-based APU.

Are We Heading Into A New Era?

I was around reviewing CPUs back when single-core processors started giving way to dual-core chips. Back then, nothing was optimized for threading aside from server-oriented apps. Software developers had to reorient themselves before multi-core desktop CPUs made sense. But it happened. Just look at how much of our suite favors the quad-core APUs over A6-5400K. Scaling clock rate indefinitely proved impossible, so AMD and Intel went wide instead.

The same re-orientation has to happen before the idea that you buy a graphics card for more than gaming is really true (I don’t think it is yet, despite AMD’s claims). That process is happening right now, though, and you can see the momentum building. We expect to see companies like CyberLink on the bleeding edge of technology because it gives them a competitive advantage amongst early adopters. Corel and Adobe typically don’t adopt a technology until it’s ready for prime time. And yet, here they are.

By the time AMD’s third-generation APU, Kaveri, is ready (2013, the company says), we’ll be looking at x86 cores based on Steamroller, the Graphics Core Next architecture, and HSA enhancements that allow the GPU to access CPU memory. Look at the difference in the software infrastructure between Llano’s introduction on the desktop one year ago and today’s preview. If that was just the tip of the iceberg, I can only imagine what top-tier developers will be doing with our graphics processors in another year’s time.

That’s a particularly long way off for a channel looking at old stock of Llano and still unable to buy Trinity, though. We’ll just have to defer final judgment until AMD sees fit to start offering Trinity-based APUs to do-it-yourselfers. A few more months, we're hearing.

Follow Chris Angelini on Twitter