OpenCL In Action: Post-Processing Apps, Accelerated

Tom's Hardware Q&A With MotionDSP, Continued

TH: Not that long ago, we were seeing test results that showed how users got most of their performance pop simply from having GPU-based acceleration, not necessarily from dropping a small fortune on a top-end card with GPGPU support. Has that changed? Because with APUs, that support is essentially now built-in.

SV: What’s changed is that companies like AMD are now shipping these Fusion chips that combine the GPU and the CPU on the same die. That makes a difference. You’re right: before, if you had a discrete GPU, you’d get the most bang for your buck just from having the GPU there. But having that discrete GPU meant buying a higher-end laptop or desktop. What’s cool about the Fusion chips is that you get a pretty darn good integrated GPU in, say, the Llano platform. At a $500 or $600 price point, a consumer can get a laptop that seriously kicks ass with our software. Before, they would have had to buy a $1000 laptop with a discrete GPU to get the same level of performance.

NB: Let’s try to provide a different metric. A high-end, modern CPU from AMD or Intel retails in the $200 to $300 range. You can outperform that CPU by a factor of 3x to 5x with a GPU that costs about the same: run our software on a discrete GPU that costs about $300 and you’ll get a three to five times higher frame rate. Now let’s say you want to spend $1000 on a CPU. As I said, there are cases where massive multi-frame processing simply cannot happen on that CPU because of memory bandwidth and compute bottlenecks. You’re hitting the limits, and it becomes irrelevant how much money you’re spending on the CPU. You may be stuck at a ceiling of six or seven frames per second and you can’t go any faster, just because you’re hitting inherent limits on the CPU that the GPU doesn’t have.

MotionDSP's Nik Bozinovic

SV: Nik and I are saying two different things. I’m saying on the consumer side, on a $500 or $600 laptop with Fusion chips like Llano, you’re getting the performance that recently cost $1000 in a laptop. And on the discrete side, he’s saying that for the same price, you’re going to get like five to eight times the performance per dollar from a GPU compared to spending that money on a CPU.

NB: If you go to standard-definition video, that’s where CPUs in general fare better against GPUs. But everybody shoots 720p or 1080p now with their Android phones and iPhones, so the difference between the CPU and the GPU becomes much, much larger. As Sean said, an off-the-shelf laptop you buy for $600 is going to give you the performance of a $2000 desktop from before the heterogeneous computing era, simply because back then you didn't have the power of parallel computing at your fingertips.

TH: We’ve got OpenCL in play from AMD, Nvidia, and Intel now. Nvidia is also still advocating CUDA as a closer-to-the-hardware solution, and Intel's approach seems to be running OpenCL on its CPU cores. As a developer, you obviously need to support as broad a range of hardware as possible, but technically, how much of a difference do these rival strategies make?

NB: From an ISV perspective, we would like to have one standard that runs everywhere. CUDA, being a vendor-specific technology, is less appealing to us than OpenCL at this time. Now, it’s a fact that CUDA was the first one out there; Nvidia's the incumbent. But we’ve honestly been impressed by the speed of development on OpenCL-based tools and the entire chain, from the SDK to the tools to the runtime and how things run in the driver. We’ve seen very significant commitment, and I’m talking from an ISV perspective, where things were not so pretty even 12 or 18 months ago. It was very hard or impossible at that time for us to promise to deliver a solid product using heterogeneous computing on OpenCL. That changed completely over the last year. Right now, AMD is the vendor driving OpenCL development and implementation. We don't know what Intel is going to achieve with Ivy Bridge, but ARM has announced that it's going to dip its toes into the OpenCL waters.

Comments from the forums
  • DjEaZy
    ... OpenCL FTW!!!
  • amuffin
    Will there be an OpenCL vs. CUDA article coming out anytime soon? :ange:
  • Anonymous
    Hmmm... how do I win a 7970 for OpenCL tasks?
  • deanjo
    DjEaZy: ... OpenCL FTW!!!

    You're welcome.

    --Apple
  • bit_user
    amuffin: Will there be an OpenCL vs. CUDA article coming out anytime soon?
    At the core, they are very similar. I'm sure that Nvidia's toolchains for CUDA and OpenCL share a common backend, at least. Any differences between versions of an app coded for CUDA vs. OpenCL will have a lot more to do with how much effort its developers spent optimizing each one.
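
    To illustrate how close the two are at the kernel level, here is a minimal sketch of the same vector-scaling kernel written in both dialects. The kernel name, parameters, and the operation itself are hypothetical examples, not code from MotionDSP or from this article; the point is that only the qualifiers and the thread-indexing call differ.

    // CUDA C kernel: each thread scales one pixel value
    __global__ void scale_frame(float *pixels, float gain, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
        if (i < n)
            pixels[i] *= gain;
    }

    // Equivalent OpenCL C kernel: same structure, different keywords
    __kernel void scale_frame(__global float *pixels, float gain, int n)
    {
        int i = get_global_id(0);                       /* global work-item index */
        if (i < n)
            pixels[i] *= gain;
    }

    The host-side APIs diverge more (CUDA's runtime API versus OpenCL's explicit contexts, command queues, and runtime program compilation), but device code like this maps almost one to one between the two.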
  • bit_user
    Fun fact: the president of Khronos (the industry consortium behind OpenCL, OpenGL, etc.) and the chair of its OpenCL working group is an Nvidia VP.

    Here's a document drawing parallels between CUDA and OpenCL (it's an OpenCL Jump Start Guide for existing CUDA developers):

    NVIDIA OpenCL JumpStart Guide

    I think they tried to make sure that OpenCL would fit their existing technologies, in order to give themselves an edge on delivering better support, sooner.
  • deanjo
    bit_user: I think they tried to make sure that OpenCL would fit their existing technologies, in order to give them an edge on delivering better support, sooner.

    Well, Nvidia did work very closely with Apple during the development of OpenCL.
  • nevertell
    At last, an article to point to for people who love shoving a GTX 580 into the same box as a Celeron.
  • JPForums
    Regarding the test of the APU without a discrete GPU, you wrote:

    Quote:
    However, the performance chart tells the second half of the story. Pushing CPU usage down is great at 480p, where host processing and graphics working together manage real-time rendering of six effects. But at 1080p, the two subsystems are collaboratively stuck at 29% of real-time. That's less than half of what the Radeon HD 5870 was able to do matched up to AMD's APU. For serious compute workloads, the sheer complexity of a discrete GPU is undeniably superior.


    While the discrete GPU is superior, the architecture isn't all that different. I suspect the larger performance issue was stated earlier in the interview:

    Quote:
    TH: Specifically, what aspects of your software wouldn’t be possible without GPU-based acceleration?

    NB: ...you are also solving a bandwidth bottleneck problem. ... It’s a very memory- or bandwidth-intensive problem to even a larger degree than it is a compute-bound problem. ... It’s almost an order of magnitude difference between the memory bandwidth on these two [CPU/GPU] devices.


    APUs may be bottlenecked simply because they have to share CPU-level memory bandwidth.

    While the APU's memory bandwidth will never approach a discrete card's, I am curious to see whether overclocking the memory attached to an APU makes a noticeable difference in performance. Intuition says it will never approach a discrete card, and given the low-end compute performance, it may not make a difference at all. However, it would help characterize the APU's performance balance a little better. That is, does it make sense to put more GPU muscle in an APU, or is the GPU portion constrained by memory bandwidth?

    In any case, this is a great article. I look forward to the rest of the series.
  • Anonymous
    What about power consumption? It's fine if we can lower the CPU load, but not so great if total power consumption increases.
  • DjEaZy
    deanjo: You're welcome. --Apple

    ... not just Apple... OK, they started it, but it's cross-platform...
  • mayankleoboy1
    Looking forward to this 9-part series.
  • salgado18
    Ever since AMD announced the Fusion concept, I understood that this is what they had in mind. And that's the reason I believe AMD is more on the right track than Intel, despite it looking like the opposite is true. Just imagine OpenCL being widely used, and look at the APU-only benchmarks versus Sandy Bridge.

    Of course, Intel has the resources to play catch-up real quick or, if they want, just buy Nvidia. (The horror!)

    Really looking forward to the other parts of this article!
  • deanjo
    DjEaZy: ... not just Apple... OK, they started it, but it's cross-platform...

    Umm, yeah, it was pretty much "just Apple" from creation, to the open-standard proposal, to getting it accepted, to influencing the hardware vendors to support it. Apple designed it to be cross-platform to begin with; that was kind of the whole idea behind it.
  • memadmax
    Since memory sharing seems to be a bottleneck, why not incorporate two separate memory controllers, each with its own lane to separate RAM chips? Imagine being able to upgrade your VRAM with a chip upgrade, like back in the old days.
  • Anonymous
    Glad to see AMD hit it this time....
  • Th-z
    William, on the page "Benchmark Results: ArcSoft Total Media Theatre SimHD", most of the results actually show increased CPU utilization after enabling GPU acceleration. That seems counter-intuitive; can you explain why?
  • tmk221
    And that is what the APU should be about: graphics cores accelerating CPU cores. I just hope that more and more apps will take advantage of GPU cores.
  • razor512
    Please label the x-axis on the graphs. The numbers do not mean much if we do not know what they refer to.
  • bit_user
    JPForums: APUs may be bottlenecked simply because they have to share CPU-level memory bandwidth.
    Not just the sharing; there's also less of it overall.

    Quote:
    I am curious to see whether overclocking the memory attached to an APU makes a noticeable difference in performance.
    I'm sure it would, in most cases. How much memory bandwidth an application needs often depends on the type of workload and the kinds of memory optimizations done by the developers. Since discrete GPUs typically have so much bandwidth, developers tend not to optimize for lower-bandwidth APUs. Furthermore, in most cases there's only so much a developer can do to work around memory bandwidth limitations.

    Memory bandwidth is the biggest drawback of APUs. It's the reason I don't see the GPU add-in card disappearing anytime soon. At least, not until the industry closes the gap between CPU and GPU memory speeds.
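
    As a rough illustration of that bandwidth gap, here is a back-of-the-envelope sketch in C. The figures are assumptions for typical 2011-era parts (dual-channel DDR3-1866 feeding a Llano APU, and the Radeon HD 5870's 256-bit GDDR5 interface running at 4.8 Gb/s per pin); they are not measurements from this article's test systems.

    /* Peak theoretical memory bandwidth = bus width in bytes x transfer rate. */
    #include <stdio.h>

    int main(void)
    {
        /* Assumed: two 64-bit channels of DDR3-1866, shared between CPU and GPU. */
        double apu_gbs  = 2.0 * (64.0 / 8.0) * 1866e6 / 1e9;   /* ~29.9 GB/s  */

        /* Assumed: Radeon HD 5870, 256-bit GDDR5 at 4.8 Gb/s per pin, dedicated. */
        double dgpu_gbs = (256.0 / 8.0) * 4.8e9 / 1e9;         /* ~153.6 GB/s */

        printf("APU system memory : %6.1f GB/s\n", apu_gbs);
        printf("Discrete HD 5870  : %6.1f GB/s\n", dgpu_gbs);
        printf("Ratio             : %6.1fx\n", dgpu_gbs / apu_gbs);
        return 0;
    }

    Even before the CPU takes its share, the integrated GPU sees roughly a fifth of the discrete card's peak bandwidth, which is consistent with the bandwidth-bound behavior described above.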