Full Blu-ray Transcoding Speed: APP Versus CUDA Versus Quick Sync
We now know that there are clear differences between hardware-accelerated decoders, and even software-based decoders. But what about encoding? That's what started this foray into image quality after all.
Specification | AMD Radeon HD 6970 | Nvidia GeForce GTX 580 |
---|---|---|
Manufacturing Process | 40 nm TSMC | 40 nm TSMC |
Die Size | 389 mm² | 520 mm² |
Transistors | 2.64 billion | 3 billion |
Engine Clock | 880 MHz | 772 MHz |
Stream Processors / CUDA Cores | 1536 | 512 |
Compute Performance | 2.7 TFLOPS | 1.58 TFLOPS |
Texture Units | 96 | 64 |
Texture Fillrate | 84.5 Gtex/s | 49.4 Gtex/s |
ROPs | 32 | 48 |
Pixel Fillrate | 28.2 Gpix/s | 37.1 Gpix/s |
Frame Buffer | 2 GB GDDR5 | 1.5 GB GDDR5 |
Memory Clock | 1375 MHz | 1002 MHz |
Memory Bandwidth | 176 GB/s (256-bit) | 192 GB/s (384-bit) |
Maximum Board Power | 250 W | 244 W |
We're using the best consumer cards money can buy, AMD's Radeon HD 6970 and Nvidia's GeForce GTX 580.
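The derived rows in the table (compute performance, fillrates, memory bandwidth) follow from the unit counts and clocks. Here is a minimal Python sketch of that arithmetic, assuming the usual peak-rate formulas: two FLOPs per core per clock, one texel or pixel per unit per clock, and four GDDR5 transfers per memory clock. One input is not in the table: the GeForce GTX 580 runs its CUDA cores at a 1544 MHz shader clock, twice the listed engine clock. The function names are ours, for illustration only.

```python
# Back-of-the-envelope check of the spec table. Inputs are copied from the
# table unless a comment marks them as an assumption.

def tflops(cores, shader_clock_mhz, flops_per_clock=2):
    """Peak single-precision rate, counting one multiply-add as 2 FLOPs."""
    return cores * flops_per_clock * shader_clock_mhz * 1e6 / 1e12

def fillrate_g(units, engine_clock_mhz):
    """Peak fillrate in Gtex/s or Gpix/s: one texel/pixel per unit per clock."""
    return units * engine_clock_mhz / 1000

def bandwidth_gbs(bus_bits, memory_clock_mhz, transfers_per_clock=4):
    """Peak memory bandwidth in GB/s (GDDR5: 4 transfers per listed clock)."""
    return bus_bits / 8 * memory_clock_mhz * 1e6 * transfers_per_clock / 1e9

GTX_580_SHADER_CLOCK_MHZ = 1544  # assumption: not in the table (2 x 772 MHz)

print("HD 6970 compute:", round(tflops(1536, 880), 2), "TFLOPS")
print("GTX 580 compute:", round(tflops(512, GTX_580_SHADER_CLOCK_MHZ), 2), "TFLOPS")
print("HD 6970 texture:", round(fillrate_g(96, 880), 1), "Gtex/s")
print("GTX 580 texture:", round(fillrate_g(64, 772), 1), "Gtex/s")
print("HD 6970 pixel:  ", round(fillrate_g(32, 880), 1), "Gpix/s")
print("GTX 580 pixel:  ", round(fillrate_g(48, 772), 1), "Gpix/s")
print("HD 6970 memory: ", round(bandwidth_gbs(256, 1375)), "GB/s")
print("GTX 580 memory: ", round(bandwidth_gbs(384, 1002)), "GB/s")
```

These are theoretical peaks, not measured results, which is part of why the Radeon's large paper advantage in compute does not automatically translate into faster transcodes.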
Full BDAV, 31.2 GB H.264 BDAV, HH:MM:SS | AMD | Nvidia | Intel Performance | Intel Quality |
---|---|---|---|---|
Hardware Decode & Hardware/GPGPU Encode | 1:24:00 | 0:49:34 | 0:19:35 | 0:23:22 |
Hardware Decode & Software Encode | 0:47:55 | 0:49:38 | 0:35:21 | 0:46:13 |
Software Decode & GPGPU/Hardware Encode | 1:01:17 | 0:50:21 | 0:48:17 | 0:48:41 |
Software Decode & Encode | 1:04:26 | 1:04:22 | 0:55:38 | 1:05:20 |
When it comes to transcoding entire videos, MediaEspresso was the only program that would accept our full 31.2 GB unprotected Blu-ray Iron Man movie. MediaConverter 7 and Badaboom both returned audio codec errors, as neither application recognizes TrueHD. Separately, it is important to point out that if you use Quick Sync, you are forced to choose a Performance or Quality setting. This choice is unavailable if you are running an Nvidia- or AMD-based card.
According to our benchmarks, the biggest bottleneck really occurs at the decode stage. If you enable APP or CUDA encoding, there is a small gain (more so for CUDA), but the biggest benefit is when you turn on hardware-accelerated decoding. Enabling APP encoding and UVD 3 on the Radeon card actually appears to be the worst thing you can do for performance. Every other combination of settings is faster, including software-only. With CUDA, we get a measly four-second gain with both hardware settings enabled (versus PureVideo-only).
Intel's Quick Sync hardware demonstrates much more impressive numbers. However, the Quality setting scales much less aggressively. What we're seeing there is the effect of using a lower bitrate for software-based encoding.
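To put the full-movie results side by side, the times can be converted to seconds and expressed as a speedup over each platform's software-only run. The following is a minimal Python sketch; the times are copied from the table above, while the dictionary keys and function names are our own shorthand.

```python
def to_seconds(hms):
    """Convert 'H:MM:SS' (or 'MM:SS') to a number of seconds."""
    total = 0
    for part in hms.split(":"):
        total = total * 60 + int(part)
    return total

# Keys: decode path + encode path ("hw" = hardware/GPGPU, "sw" = software).
FULL_MOVIE = {
    "AMD": {"hw_dec+hw_enc": "1:24:00", "hw_dec+sw_enc": "0:47:55",
            "sw_dec+hw_enc": "1:01:17", "sw_dec+sw_enc": "1:04:26"},
    "Nvidia": {"hw_dec+hw_enc": "0:49:34", "hw_dec+sw_enc": "0:49:38",
               "sw_dec+hw_enc": "0:50:21", "sw_dec+sw_enc": "1:04:22"},
    "Intel Performance": {"hw_dec+hw_enc": "0:19:35", "hw_dec+sw_enc": "0:35:21",
                          "sw_dec+hw_enc": "0:48:17", "sw_dec+sw_enc": "0:55:38"},
    "Intel Quality": {"hw_dec+hw_enc": "0:23:22", "hw_dec+sw_enc": "0:46:13",
                      "sw_dec+hw_enc": "0:48:41", "sw_dec+sw_enc": "1:05:20"},
}

def speedup_vs_software(times):
    """Speedup of each mode relative to software decode + software encode."""
    baseline = to_seconds(times["sw_dec+sw_enc"])
    return {mode: round(baseline / to_seconds(t), 2) for mode, t in times.items()}

for platform, times in FULL_MOVIE.items():
    print(platform, speedup_vs_software(times))
```

A value below 1.0 means the mode is slower than the software-only run, which is the case for the Radeon with both UVD 3 and APP enabled.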
665 MB H.264 BDAV/M2TS Transcode, MM:SS
AMD Radeon HD 6970

Transcode Application | MediaEspresso | MediaConverter |
---|---|---|
Hardware Decode & APP Encode | 2:29 | 1:40 |
Hardware Decode & Software Encode | 2:28 | - |
Software Decode & APP Encode | 1:57 | - |
Software Decode & Encode | 2:41 | 1:22 |

Nvidia GeForce GTX 580

Transcode Application | MediaEspresso | MediaConverter |
---|---|---|
Hardware Decode & CUDA Encode | 1:37 | 1:06 |
Hardware Decode & Software Encode | 1:50 | - |
Software Decode & CUDA Encode | 2:02 | - |
Software Decode & Encode | 2:41 | 1:22 |

Intel HD Graphics 3000 (Core i5-2500K)

Transcode Application | MediaEspresso Performance | MediaEspresso Quality | MediaConverter |
---|---|---|---|
Hardware Decode & Quick Sync Encode | 0:46 | 0:56 | 1:09 |
Hardware Decode & Software Encode | 1:26 | 2:22 | - |
Software Decode & Quick Sync Encode | 2:10 | 2:07 | - |
Software Decode & Encode | 2:10 | 2:43 | 1:24 |
Even though our 665 MB H.264/AC3 BDAV clip has the same bitrate as our 31.2 GB movie, we see hardware-accelerated encoding yielding much better results on the Radeon HD 6970 and GeForce GTX 580. Intel's Quick Sync-enabled Core i5-2500K only sees a substantial gain using the Quality setting and encode acceleration.
Note that ArcSoft's MediaConverter uses either hardware- or software-based transcoding. It doesn't offer the granular control enabled by CyberLink. Overall, we see much faster transcode times with MediaConverter, though. If we just go by the numbers, it appears that MediaConverter is better optimized for multithreaded performance as well. In the software-only case (1:22 versus 2:41), we are talking about more than a full minute of improvement over the times seen in MediaEspresso.
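The gap between the two applications can be worked out directly from the small-clip tables. Here is a short Python sketch; only rows that have a MediaConverter result are included, the Intel row uses MediaEspresso's Performance setting, and the labels and function names are ours.

```python
def seconds(mm_ss):
    """Convert 'MM:SS' to a number of seconds."""
    minutes, secs = mm_ss.split(":")
    return int(minutes) * 60 + int(secs)

# (MediaEspresso time, MediaConverter time), copied from the tables.
SMALL_CLIP = {
    "Radeon HD 6970, hardware decode + APP encode": ("2:29", "1:40"),
    "GeForce GTX 580, hardware decode + CUDA encode": ("1:37", "1:06"),
    "HD Graphics 3000, hardware decode + Quick Sync encode": ("0:46", "1:09"),
    "Software decode + encode (Radeon/GeForce runs)": ("2:41", "1:22"),
}

def seconds_saved(pair):
    """Seconds MediaConverter saves over MediaEspresso (negative = slower)."""
    espresso, converter = pair
    return seconds(espresso) - seconds(converter)

for label, pair in SMALL_CLIP.items():
    print(f"{label}: {seconds_saved(pair):+d} s")
```

The minute-plus advantage shows up on the software-only path; with Quick Sync, MediaEspresso's Performance setting is the faster of the two.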
Using this smaller clip, AMD's APP is finally faster than the pure software route, but not by much. Unfortunately, using APP in MediaConverter is still slower than using the CPU-only option. The suggestion here is that you'd see better results if you were stuck on an older machine with a slower processor and a modern graphics card. The new Core i5-2500K is simply too fast.
Overall, Quick Sync soundly trounces CUDA and APP for encoding and decoding performance. But the differences between AMD and Nvidia are less apparent given such mixed results. Is this simply the result of using high-bitrate Blu-ray source files?
spoiled1 Tom,
You have been around for over a decade, and you still haven't figured out the basics of web interfaces.
When I want to open an image in a new tab using Ctrl+Click, that's what I want to do, I do not want to move away from my current page.
Please fix your links.
Thanks -
spammit omgf, ^^^this^^^.
I signed up just to agree with this. I've been reading this site for over 5 years and I have hoped and hoped that this site would change to accommodate the user, but, clearly, that's not going to happen. Not to mention all the spelling and grammar mistakes in the recent year. (Don't know about this article, didn't read it all).
I didn't even finish reading the article and looking at the comparisons because of the problem sploiled1 mentioned. I don't want to click on a single image 4 times to see it fullsize, and I certainly don't want to do it 4 times (mind you, you'd have to open the article 4 separate times) in order to compare the images side by side (alt-tab, etc).
Just abysmal. -
cpy THW have worst image presentation ever, you can't even load multiple images so you can compare them in different tabs, could you do direct links to images instead of this bad design? -
ProDigit10 I would say not long from here we'll see encoders doing video parallel encoding by loading pieces between keyframes. keyframes are tiny jpegs inserted in a movie preferably when a scenery change happens that is greater than what a motion codec would be able to morph the existing screen into.
The data between keyframes can easily be encoded in a parallel pipeline or thread of a cpu or gpu.
Even on mobile platforms integrated graphics have more than 4 shader units, so I suspect even on mobile graphics cards you could run as much as 8 or more threads on encoding (depending on the gpu, between 400 and 800 Mhz), that would be equal to encoding a single thread video at the speed of a cpu encoding with speed of 1,6-6,4GHz, not to mention the laptop or mobile device still has at least one extra thread on the CPU to run the program, and operating system, as well as arrange the threads and be responsible for the reading and writing of data, while the other thread(s) of a CPU could help out the gpu in encoding video.
The only issue here would be B-frames, but for fast encoding video you could give up 5-15MB video on a 700MB file due to no B-frame support, if it could save you time by processing threads in parallel. -
intelx first thanks for the article i been looking for this, but your gallery really sucks, i mean it takes me good 5 mins just to get 3 pics next to each other to compare , the gallery should be updated to something else for fast viewing. -
_Pez_ Ups ! for tom's hardware's web page :P, Fix your links. :) !. And I agree with them; spoiled1 and spammit. -
AppleBlowsDonkeyBalls I agree. Tom's needs to figure out how to properly make images accessible to the readers. -
kikireeki spoiled1 wrote: "Tom, You have been around for over a decade, and you still haven't figured out the basics of web interfaces. When I want to open an image in a new tab using Ctrl+Click, that's what I want to do, I do not want to move away from my current page. Please fix your links. Thanks"
and to make things even worse, the new page will show you the picture with the same thumbnail size and you have to click on it again to see the full image size, brilliant! -
acku Apologies to all. There are things I can control in the presentation of an article and things that I cannot, but everyone here has given fair criticism. I agree that right click and opening to a new window is an important feature for articles on image quality. I'll make sure Chris continues to push the subject with the right people.
Web dev is a separate department, so we have no ability to influence the speed at which a feature is implemented. In the meantime, I've uploaded all the pictures to ZumoDrive. It's packed as a single download. http://www.zumodrive.com/share/anjfN2YwMW
Remember to view pictures in the native resolution to avoid scalers.
Cheers
Andrew Ku
TomsHardware.com -
Reynod An excellent read though Andrew.
Please give us an update in a few months to see if there has been any noticeable improvements ... keep your base files for reference.
I would imagine Quicksynch is now a major plus for those interested in rendering ... and AMD and NVidia have some work to do.
I appreciate the time and effort you put into the research and the depth of the article.
Thanks,
:)