OpenCL In Action: Post-Processing Apps, Accelerated

Benchmark Results: vReveal On The A8-3850 With Discrete Graphics

Now we edge closer to the big question of the day: is AMD’s APU platform capable enough to tackle the same demanding tasks for which we usually recommend FX and Core i5/i7 processors? Let’s warm up by examining how the CPU component of the chip fares when backed by discrete graphics.

In this context, looking at our CPU-only software results should be the most telling. Sure enough, the non-accelerated 480p test shows the FX-8150 enjoying 13% lower CPU utilization than the A8, while the 1080p clip lets the FX cruise along at 22% lower utilization. With an almost 70% load on the A8, we’d be reluctant to even attempt this operation while multitasking in the real world.

However, notice that utilization also leaps on the A8 even when GPU acceleration is turned on. Whereas the 1080p load on our FX-8150 and Radeon HD 5870 platform was 10%, it jumps to 38% on an A8-3850 and Radeon HD 5870. This is the most compelling evidence that, even in the world of OpenCL and hardware-based acceleration, it still takes a balanced combination of CPU and GPU to yield the best experience. Offloading a task to the GPU doesn't mean the host processor is completely off the hook. Clearly, MotionDSP still relies on a capable host processor for part of its workflow.
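To picture where that host-side work comes from, consider the shape of a typical per-frame OpenCL host loop. What follows is a minimal sketch, not MotionDSP's actual code; the kernel name "enhance_frame" and the buffer handling are illustrative. Even with the filter itself running on the GPU, the CPU still copies each decoded frame into device memory, dispatches the kernel, and reads the result back for encoding or display.

    /* Sketch of a per-frame OpenCL host loop (illustrative, not vReveal's
     * code). The kernel "enhance_frame" stands in for one video effect. */
    #include <CL/cl.h>
    #include <stddef.h>

    void process_frame(cl_command_queue queue, cl_kernel enhance_frame,
                       cl_mem d_in, cl_mem d_out,
                       const unsigned char *src, unsigned char *dst,
                       size_t frame_bytes, size_t pixel_count)
    {
        /* Host CPU copies the decoded frame into device memory. */
        clEnqueueWriteBuffer(queue, d_in, CL_TRUE, 0, frame_bytes, src,
                             0, NULL, NULL);

        /* Host CPU binds arguments and dispatches the kernel to the GPU. */
        clSetKernelArg(enhance_frame, 0, sizeof(cl_mem), &d_in);
        clSetKernelArg(enhance_frame, 1, sizeof(cl_mem), &d_out);
        clEnqueueNDRangeKernel(queue, enhance_frame, 1, NULL, &pixel_count,
                               NULL, 0, NULL, NULL);

        /* Host CPU reads the processed frame back for encoding/display. */
        clEnqueueReadBuffer(queue, d_out, CL_TRUE, 0, frame_bytes, dst,
                            0, NULL, NULL);
    }

At 30 frames per second, that transfer-and-dispatch loop runs 30 times every second on the host, which is one reason a faster CPU still matters even when the heavy math happens on the GPU.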

With only one effect active, our job still renders at full speed in all cases except 1080p in software mode. Despite the significant difference in CPU utilization, the gap in render performance between our two CPUs is only 6%.

With six effects enabled, we again see the FX offer a slight utilization advantage over the A8 when running in software. But given that all numbers are at least 67%, we’d say the benefits of the FX here are nominal.

With GPU-assist turned on, all of our utilization percentages plummet. However, the A8 still shows twice as much utilization as the FX, predictably indicating that the CPU-side portion of the workload hits AMD's A8 harder than the flagship FX.

Reinforcing the point, we see the A8 take a 25% render speed hit compared to the FX when relying on a Radeon HD 5870 for acceleration. This is interesting because, while the utilization numbers alone don't make the FX look much better in a processor-to-processor comparison, these real-world numbers show that the FX can still yield significant time savings in high-load scenarios.

No surprises here. The A8 APU incurs 15% to 20% higher utilization than the FX.

More important, while the FX-8150 and Radeon HD 7970 combination demonstrates a 3.5x CPU utilization benefit from enabling graphics acceleration at 1080p, the A8-3850 enjoys less than a 2x benefit. Again, we see the pitfall of pairing too powerful a graphics card with a processor that can't always keep up. Balance is critical when you want to maximize performance, and it really takes a higher-end CPU to fully exploit what a potent graphics card offers.

The 1080p accelerated test shows our A8 running at 27% utilization with a Radeon HD 5870 plugged in. That jumps to 40% with a Radeon HD 7970 installed. And yet, when we look at render performance, the Radeon HD 7970 has a 24% advantage, managing near-real-time output. What does all of that mean?

It'd appear that the 7970 is doing more work than the 5870, in turn creating a larger workload for the A8. Although you end up seeing higher CPU utilization, the end result is an experience that's closer to ideal. That is to say, close-to-real-time rendering is possible here where it wasn't with the Radeon HD 5870.

Comments
  • DjEaZy: ... OpenCL FTW!!!
  • amuffin: Will there be an OpenCL vs CUDA article coming out anytime soon?
  • Hmmm... how do I win a 7970 for OpenCL tasks?
  • Quote (DjEaZy): ... OpenCL FTW!!!

    You're welcome.

    --Apple
  • Quote (amuffin): Will there be an OpenCL vs CUDA article coming out anytime soon?

    At the core, they are very similar. I'm sure that Nvidia's toolchain for CUDA and OpenCL shares a common backend, at least. Any differences between versions of an app coded for CUDA vs. OpenCL will have a lot more to do with the amount of effort its developers spent optimizing it.
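    For instance, here's the same trivial SAXPY kernel in both dialects (a quick sketch, not from any shipping app); the differences are mostly keywords and how a thread finds its index:

        /* OpenCL C: */
        __kernel void saxpy(float a, __global const float *x, __global float *y)
        {
            size_t i = get_global_id(0);   /* this work-item's global index */
            y[i] = a * x[i] + y[i];
        }

        /* CUDA C equivalent:
        __global__ void saxpy(float a, const float *x, float *y)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            y[i] = a * x[i] + y[i];
        }
        */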
  • Fun fact: the president of Khronos (the industry consortium behind OpenCL, OpenGL, etc.) and the chair of its OpenCL working group is an Nvidia VP.

    Here's a document laying out the parallels between CUDA and OpenCL (it's an OpenCL Jump Start Guide for existing CUDA developers):

    NVIDIA OpenCL JumpStart Guide


    I think they tried to make sure that OpenCL would fit their existing technologies, in order to give them an edge on delivering better support, sooner.
  • Quote (bit_user): I think they tried to make sure that OpenCL would fit their existing technologies, in order to give them an edge on delivering better support, sooner.

    Well, Nvidia did work very closely with Apple during the development of OpenCL.
  • At last, an article to point to for people who love shoving a GTX 580 in the same box as a Celeron.
  • Regarding your testing of the APU without a discrete GPU, you wrote:

    Quote:
    However, the performance chart tells the second half of the story. Pushing CPU usage down is great at 480p, where host processing and graphics working together manage real-time rendering of six effects. But at 1080p, the two subsystems are collaboratively stuck at 29% of real-time. That's less than half of what the Radeon HD 5870 was able to do matched up to AMD's APU. For serious compute workloads, the sheer complexity of a discrete GPU is undeniably superior.


    While the discrete GPU is superior, the architecture isn't all that different. I suspect the larger issue with regard to performance was stated in the interview earlier:

    Quote:
    TH: Specifically, what aspects of your software wouldn’t be possible without GPU-based acceleration? NB: ...you are also solving a bandwidth bottleneck problem. ... It’s a very memory- or bandwidth-intensive problem to even a larger degree than it is a compute-bound problem. ... It’s almost an order of magnitude difference between the memory bandwidth on these two [CPU/GPU] devices.


    APUs may be bottlenecked simply because they have to share CPU-level memory bandwidth.

    While an APU's memory bandwidth will never approach a discrete card's, I am curious to see whether overclocking an APU's memory will make a noticeable difference in performance. Intuition says it won't close the gap, and given the low-end compute performance, it may not make a difference at all. However, it would help to characterize the APU's performance balance a little better. That is: does it make sense to put more GPU muscle in an APU, or is the GPU portion constrained by memory bandwidth?

    In any case, this is a great article. I look forward to the rest of the series.
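    As a rough sketch of the bandwidth math (the frame format, frame rate, and per-effect traffic below are assumptions for illustration, not measured numbers):

        /* Back-of-the-envelope bandwidth floor for six per-frame effects.
         * Assumes 1080p RGBA at 30 fps, with each effect reading and
         * writing every pixel once; real multi-tap filters touch far more. */
        #include <stdio.h>

        int main(void)
        {
            double bytes_per_frame = 1920.0 * 1080.0 * 4.0;   /* ~8.3 MB */
            double traffic = bytes_per_frame * 30.0 /* fps */
                           * 6.0 /* effects */ * 2.0 /* read + write */;
            printf("~%.1f GB/s minimum\n", traffic / 1e9);    /* ~3.0 GB/s */
            return 0;
        }

    That ~3 GB/s floor balloons once each filter reads a neighborhood of pixels per output, so it pressures the roughly 20 GB/s of dual-channel DDR3 long before it troubles the ~150 GB/s of a Radeon HD 5870.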
  • What about power consumption? It's fine if we can lower CPU load, but not much of a win if total power consumption increases.
  • Quote (deanjo): You're welcome. --Apple

    ... not just Apple... ok, they started it, but it's cross-platform...
  • Looking forward to this 9-part series.
  • Ever since AMD announced the Fusion concept, I understood that this is what they had in mind. And that's the reason I believe AMD is more on the right track than Intel, despite the opposite appearing to be true. Just imagine OpenCL in wide use, and look at the APU-only benchmarks versus Sandy Bridge.

    Of course, Intel has the resources to play catch-up real quick, or, if they want, to just buy Nvidia (the horror!).

    Really looking forward to the other parts of this article!
  • Quote (DjEaZy): ... not just Apple... ok, they started it, but it's cross-platform...

    Umm, ya, pretty much "just Apple": from creation, to the open standard proposal, to getting it accepted, to influencing the hardware vendors to support it. Apple designed it to be cross-platform to begin with; that was kind of the whole idea behind it.
  • Since memory sharing seems to be a bottleneck, why not incorporate two separate memory controllers, each with its own lane to separate RAM chips? Imagine being able to upgrade your VRAM with a chip upgrade, like back in the old days.
  • Glad to see AMD hit it this time....
  • William, on the page "Benchmark Results: ArcSoft Total Media Theatre SimHD": after enabling GPU acceleration, most configurations actually show increased CPU utilization. That seems counterintuitive; can you explain why?
  • And that is what an APU should be about: graphics cores accelerating CPU cores. I just hope that more and more apps will take advantage of GPU cores.
  • Please label the x-axis on the graphs. The numbers do not mean much if we do not know what they are referring to.
  • Quote (JPForums): APUs may be bottlenecked simply because they have to share CPU-level memory bandwidth.
    Not just the sharing, but less bandwidth overall.

    Quote (JPForums): I am curious to see whether overclocking an APU's memory will make a noticeable difference in performance.
    I'm sure it would, in most cases. Memory usage often depends on the type of workload and the kinds of memory optimizations done by the developers. Since discrete GPUs typically have so much bandwidth, developers tend not to optimize for lower-bandwidth APUs. Furthermore, in most cases there's only so much a developer can do to work around memory bandwidth limitations.

    Memory bandwidth is the biggest drawback of APUs. It's the reason I don't see the GPU add-in card disappearing anytime soon. At least, not until the industry closes the gap between CPU and GPU memory speeds.
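    To illustrate the kind of optimization I mean, here's a minimal, hypothetical OpenCL kernel that stages a row tile in __local memory so each pixel is fetched from DRAM once instead of once per filter tap. It assumes the host launches it with a work-group size of exactly 64:

        /* 3-tap horizontal blur with a __local staging tile (illustrative).
         * Assumes a local work size of exactly 64 work-items. */
        __kernel void blur3_row(__global const float *in,
                                __global float *out,
                                int width)
        {
            __local float tile[64 + 2];          /* 64 pixels + 1-pixel halo */
            int gid = get_global_id(0);
            int lid = get_local_id(0);

            tile[lid + 1] = in[gid];             /* one global read per pixel */
            if (lid == 0)
                tile[0] = (gid > 0) ? in[gid - 1] : in[gid];
            if (lid == 63)
                tile[65] = (gid < width - 1) ? in[gid + 1] : in[gid];
            barrier(CLK_LOCAL_MEM_FENCE);        /* wait for the tile to fill */

            /* All three taps now come from fast local memory, not DRAM. */
            out[gid] = (tile[lid] + tile[lid + 1] + tile[lid + 2]) / 3.0f;
        }

    On a discrete card with 150+ GB/s, skipping this kind of tiling barely shows; on an APU sharing system DDR3, it can be the difference between real-time and not.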