Test Setup And Benchmarks
In planning this series, we asked ourselves what readers really need to know about exploiting DirectCompute/OpenCL acceleration. Is it difficult? No. The drivers enable functionality by default, and most applications able to leverage the improvements have what amounts to an on/off switch to either use the feature or not. We’re stumped as to why anyone would disable acceleration, but it does make our job of testing the features much easier.
Right out of the gate, we're testing two OpenCL-enabled post-processing applications: ArcSoft’s Total Media Theater (TMT) 5.2 (in pre-release at our time of testing) and MotionDSP’s vReveal.
The SimHD component of TMT now uses OpenCL and GPU-based processing to interpolate standard-definition video (480p) to near-HD levels (720p) in real-time. To test this, we played a DVD copy of Minority Report with (GPU) and without (CPU) OpenCL enabled. We ran in comparison split-screen mode with SD on the left half of the image and near-HD on the right half. ArcSoft provides four main features within SimHD—upscaling, dynamic lighting, denoise, and smoothness—but we only tested with the first three enabled and set to maximum. The smoothness option was not available when testing in a CPU-only processing scenario, so we omitted it to have consistency across our test parameters. Also, we only tested SimHD with AMD's Radeon HD 5870 because, as of our testing, Total Media Theater wouldn't cooperate with AMD's Radeon HD 7970. That's not surprising, considering we also had trouble getting GPU-accelerated compute working in CyberLink's Media Espresso in our AMD Radeon HD 7950 Review: Up Against GeForce GTX 580 coverage.
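To put the upscaling workload in perspective, here is a rough sketch of how many more pixels SimHD has to produce per frame. The frame sizes are assumptions, not figures from our testing: 720x480 for NTSC DVD video and 1280x720 for 720p. Treating DVD pixels as square is also a simplification.

```python
# Assumed frame sizes (not taken from the article's test data):
# NTSC DVD video is stored at 720x480; 720p is 1280x720.
SD_FRAME = (720, 480)
HD720_FRAME = (1280, 720)


def pixel_count(size):
    """Return the number of pixels in a (width, height) frame."""
    width, height = size
    return width * height


def upscale_ratio(src, dst):
    """Return how many output pixels are produced per input pixel."""
    return pixel_count(dst) / pixel_count(src)


ratio = upscale_ratio(SD_FRAME, HD720_FRAME)
print(f"SD pixels per frame:   {pixel_count(SD_FRAME):,}")
print(f"720p pixels per frame: {pixel_count(HD720_FRAME):,}")
print(f"Output/input ratio:    {ratio:.2f}x")
```

Under these assumptions, every source pixel turns into roughly 2.7 output pixels, before the lighting and denoise passes add their own work.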
MotionDSP’s vReveal gained fame as one of the first and best consumer-oriented applications for fixing shaky video. The amount of processing required to pull this effect off in real-time is formidable, since several frames of video must be analyzed at once, and many features must be tracked and recompiled across those frames. At 1080p, this load can cripple some systems. Today, vReveal 3 also includes several additional features, including sharpening, brightening, and noise cleaning, all of which can run concurrently with stabilization. Ideally, you'd have all of this rendering in real-time.
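A quick back-of-the-envelope sketch shows why real-time processing at 1080p is so much harder than at 480p. The frame rate (30 fps) and the frame sizes (640x480 and 1920x1080) are assumptions chosen for illustration; the actual clips may differ.

```python
def frame_budget_ms(fps):
    """Time available to process one frame if playback is to stay real-time."""
    return 1000.0 / fps


def pixel_rate(width, height, fps):
    """Pixels that must be processed per second at a given size and rate."""
    return width * height * fps


# Assumed values, not measurements from the article.
FPS = 30
rate_480p = pixel_rate(640, 480, FPS)
rate_1080p = pixel_rate(1920, 1080, FPS)

print(f"Per-frame budget at {FPS} fps: {frame_budget_ms(FPS):.1f} ms")
print(f"480p pixel rate:  {rate_480p:,} px/s")
print(f"1080p pixel rate: {rate_1080p:,} px/s")
print(f"1080p is {rate_1080p / rate_480p:.2f}x the work per second")
```

The time budget per frame is the same at both resolutions, so the extra pixels at 1080p have to be absorbed entirely by faster hardware, and that is before counting the multi-frame analysis stabilization needs.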
In our tests, we used two sample video clips that ship with the downloadable binary of vReveal 3: the “Barcelona” file at 480p and the “San Francisco” file at 1080p. We tested these twice, once with only basic stabilization enabled and then in a more demanding configuration with five effects piled on. We then tested these in CPU-only, APU-accelerated, and two different discrete GPU-based configurations.
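The vReveal test plan described above can be sketched as a simple cross product. This assumes every clip was run with every effect setting on every hardware configuration; the labels are illustrative, and the discrete GPU entries are generic placeholders rather than confirmed pairings.

```python
from itertools import product

# Labels are illustrative; the full cross product is an assumption.
clips = ["Barcelona (480p)", "San Francisco (1080p)"]
effect_sets = ["stabilization only", "five effects"]
hardware = ["CPU only", "APU", "discrete GPU 1", "discrete GPU 2"]

# Every combination of clip, effect load, and hardware configuration.
runs = list(product(clips, effect_sets, hardware))

for clip, effects, device in runs:
    print(f"{clip:24} | {effects:18} | {device}")

print(f"Total runs: {len(runs)}")
```

Laying the matrix out this way makes it easy to see which single variable changes between any two results being compared.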
| Test Hardware | |
|---|---|
| Test System 1 | |
| Processor | AMD FX-8150 (Zambezi) 3.6 GHz, Socket AM3+, 8 MB Shared L3 Cache, Turbo Core enabled, 125 W |
| Motherboard | Asus Crosshair V Formula (Socket AM3+), AMD 990FX/SB950 |
| Memory | 8 GB (2 x 4 GB) AMD Performance Memory AE34G1609U2 (1600 MT/s, 8-9-8-24) |
| SSD | 240 GB Patriot Wildfire SATA 6Gb/s |
| Graphics | AMD Radeon HD 7970 3 GB |
| | AMD Radeon HD 5870 1 GB |
| Power Supply | PC Power & Cooling Turbo-Cool 860 W |
| Operating System | Windows 7 Professional, 64-bit |
| Test System 2 | |
| Processor | AMD A8-3850 (Llano) 2.9 GHz, Socket FM1, 4 MB L2 Cache, 100 W, Radeon HD 6550D Graphics |
| Motherboard | Gigabyte A75-UD4H (Socket FM1), AMD A75 FCH |
| Memory | 8 GB (2 x 4 GB) AMD Performance Memory AE34G1609U2 (1600 MT/s, 8-9-8-24) |
| SSD | 240 GB Patriot Wildfire SATA 6Gb/s |
| Graphics | AMD Radeon HD 7970 3 GB |
| | AMD Radeon HD 5870 1 GB |
| Power Supply | PC Power & Cooling Turbo-Cool 860 W |
| Operating System | Windows 7 Professional, 64-bit |
| Test System 3 | |
| Platform | Gateway NV55S05u |
| Processor | AMD A8-3500M (Llano), 1.5 GHz, Socket FS1, 4 MB L2 Cache, 35 W, Radeon HD 6620G Graphics |
| Memory | 4 GB Elpida PC3-10600S-9-10-F2 + 2 GB Hynix PC3-10600S-9-10-B1 |
| Hard Drive | Western Digital Scorpio Blue 640 GB, 5400 RPM, 8 MB Cache, SATA 3Gb/s |
| Operating System | Windows 7 Home Premium, 64-bit |
| Test System 4 | |
| Platform | HP Pavilion dv6 |
| Processor | Intel Core i5-2410M (Sandy Bridge), 2.3 GHz, Socket G2, 3 MB Shared L3 Cache, 35 W, HD Graphics 3000 |
| Memory | 4 GB Samsung PC3-10600S-09-10-ZZZ |
| Hard Drive | Seagate Momentus 7200.4 500 GB, 7200 RPM, 16 MB Cache, SATA 3Gb/s |
| Operating System | Windows 7 Professional, 64-bit |

... OpenCL FTW!!!
Will there be an OpenCL vs. CUDA article coming out anytime soon?
Hmmm...how do I win a 7970 for OpenCl tasks?
... OpenCL FTW!!!
You're welcome.
--Apple
Will there be an OpenCL vs. CUDA article coming out anytime soon?
At the core, they are very similar. I'm sure that Nvidia's toolchain for CUDA and OpenCL share a common backend, at least. Any differences between versions of an app coded for CUDA vs OpenCL will have a lot more to do with the amount of effort spent by its developers optimizing it.
Fun fact: the president of Khronos (the industry consortium behind OpenCL, OpenGL, etc.) and chair of its OpenCL working group is an Nvidia VP.
Here's a document paralleling the similarities between CUDA and OpenCL (it's an OpenCL Jump Start Guide for existing CUDA developers):
NVIDIA OpenCL JumpStart Guide
I think they tried to make sure that OpenCL would fit their existing technologies, in order to give them an edge on delivering better support, sooner.
I think they tried to make sure that OpenCL would fit their existing technologies, in order to give them an edge on delivering better support, sooner.
Well, Nvidia did work very closely with Apple during the development of OpenCL.
At last, an article to point to for people who love shoving a gtx 580 in the same box with a celeron.
In regards to testing the APU w/o discrete GPU you wrote:
While the discrete GPU is superior, the architecture isn't all that different. I suspect the larger issue in regards to performance was stated in the interview earlier:
NB: ...you are also solving a bandwidth bottleneck problem. ... It’s a very memory- or bandwidth-intensive problem to even a larger degree than it is a compute-bound problem. ... It’s almost an order of magnitude difference between the memory bandwidth on these two [CPU/GPU] devices.
APUs may be bottlenecked simply because they have to share CPU level memory bandwidth.
While the APU memory bandwidth will never approach a discrete card, I am curious to see whether overclocking memory on an APU will make a noticeable difference in performance. Intuition says that it will never approach a discrete card and, given the low-end compute performance, it may not make a difference at all. However, it would help to characterize the APU's performance balance a little better, i.e., does it make sense to push more GPU muscle on an APU, or is the GPU portion constrained by the memory bandwidth?
In any case, this is a great article. I look forward to the rest of the series.
What about power consumption? It's fine if we can lower CPU load, but not that much if the total power consumption increases.
You're welcome. --Apple
... not just apple... ok, they started, but it's cross platform...
looking forward to this 9 part series
Ever since AMD announced the Fusion concept, I understood that is what they had in mind. And that's the reason I believe AMD is more on the right track than Intel, despite looking like the opposite is true. Just imagine if OpenCL is widely used, and look at the APU-only benchmarks versus the Sandy Bridge.
Of course, Intel has the resources to play catch-up real quick, or, if they want, just buy nVidia. (the horror!)
Really looking forward to the other parts of this article!
... not just apple... ok, they started, but it's cross platform...
Umm, ya pretty much "just apple" from creation to the open standard proposal to getting it accepted, to the influencing of the hardware vendors to support it. Apple designed it so that it would be cross-platform to begin with, that was kind of the whole idea behind it.
Since memory sharing seems to be a bottleneck, why not incorporate two separate memory controllers, each with its own lane to separate RAM chips? Imagine being able to upgrade your VRAM with a chip upgrade like back in the old days.
Glad to see AMD hit it this time....
William, on page "Benchmark Results: ArcSoft Total Media Theatre SimHD". After enabling GPU acceleration, most actually have their CPU utilizations increased. It seems counter-intuitive, can you explain why?
And that is what APU should be about. Graphics cores should accelerate cpu cores. I just hope that more and more apps will take advantage of gpu cores.
Please label the X axis on the graphs. The numbers do not mean much if we do not know what they are referring to.
APUs may be bottlenecked simply because they have to share CPU level memory bandwidth.
Not just the sharing, but less overall.
Memory bandwidth is the biggest drawback of APUs. It's the reason I don't see the GPU add-in card disappearing anytime soon. At least, not until the industry closes the gap between CPU and GPU memory speeds.
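For a rough sense of the bandwidth gap raised in the comments above, here is a sketch that computes theoretical peak bandwidth from transfer rate and bus width. The DDR3 rate comes from the test hardware table (1600 MT/s); the dual-channel 128-bit bus and the graphics card figures (384-bit at 5500 MT/s for the Radeon HD 7970, 256-bit at 4800 MT/s for the Radeon HD 5870) are reference specifications assumed here, not measurements from this article.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, bus_width_bits):
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


# Assumed configurations; see the lead-in for where each figure comes from.
configs = {
    "Dual-channel DDR3-1600 (APU/CPU)": (1600, 128),
    "Radeon HD 5870 (GDDR5)": (4800, 256),
    "Radeon HD 7970 (GDDR5)": (5500, 384),
}

bandwidth = {name: peak_bandwidth_gbs(*spec) for name, spec in configs.items()}

for name, gbs in bandwidth.items():
    print(f"{name:34} {gbs:6.1f} GB/s")
```

Under these assumptions the Radeon HD 7970 has roughly ten times the peak bandwidth of the shared system memory, which lines up with the "almost an order of magnitude" remark quoted earlier in the thread. Real-world throughput will be lower on both sides.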