
Full Blu-ray Transcoding Speed: APP Versus CUDA Versus Quick Sync

Video Transcoding Examined: AMD, Intel, And Nvidia In-Depth

We now know that there are clear differences between hardware-accelerated decoders, and even between software-based decoders. But what about encoding? That's what started this foray into image quality, after all.


Specification                     AMD Radeon HD 6970    Nvidia GeForce GTX 580
Manufacturing Process             40 nm TSMC            40 nm TSMC
Die Size                          389 mm²               520 mm²
Transistors                       2.64 billion          3 billion
Engine Clock                      880 MHz               772 MHz
Stream Processors / CUDA Cores    1536                  512
Compute Performance               2.7 TFLOPS            1.58 TFLOPS
Texture Units                     96                    64
Texture Fillrate                  84.5 Gtex/s           49.4 Gtex/s
ROPs                              32                    48
Pixel Fillrate                    28.2 Gpix/s           37.1 Gpix/s
Frame Buffer                      2 GB GDDR5            1.5 GB GDDR5
Memory Clock                      1375 MHz              1002 MHz
Memory Bandwidth                  176 GB/s (256-bit)    192 GB/s (384-bit)
Maximum Board Power               250 W                 244 W
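Most of the throughput rows in the spec table follow directly from unit counts and clocks. The sketch below reproduces them under the usual conventions, which are assumptions rather than figures from the table: one multiply-add (2 FLOPs) per shader per clock; GDDR5 moving data at four times the listed memory clock; and, for the GTX 580, a shader ("hot") clock of 1544 MHz, twice the 772 MHz engine clock. That shader clock does not appear in the table.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    shaders: int             # stream processors / CUDA cores
    shader_clock_mhz: float  # assumption for GTX 580: 2x engine clock
    engine_clock_mhz: float
    texture_units: int
    rops: int
    mem_clock_mhz: float     # memory clock as listed in the table
    bus_width_bits: int

    def tflops(self) -> float:
        # 2 FLOPs (one multiply-add) per shader per clock; MHz -> TFLOPS
        return self.shaders * self.shader_clock_mhz * 2 / 1e6

    def texture_fill_gtex(self) -> float:
        # one texel per texture unit per engine clock
        return self.texture_units * self.engine_clock_mhz / 1e3

    def pixel_fill_gpix(self) -> float:
        # one pixel per ROP per engine clock
        return self.rops * self.engine_clock_mhz / 1e3

    def bandwidth_gbs(self) -> float:
        # GDDR5 assumption: effective transfer rate is 4x the listed clock
        effective_mtps = self.mem_clock_mhz * 4
        return effective_mtps * self.bus_width_bits / 8 / 1e3


hd6970 = Gpu("Radeon HD 6970", 1536, 880, 880, 96, 32, 1375, 256)
gtx580 = Gpu("GeForce GTX 580", 512, 1544, 772, 64, 48, 1002, 384)

for g in (hd6970, gtx580):
    print(f"{g.name}: {g.tflops():.2f} TFLOPS, "
          f"{g.texture_fill_gtex():.1f} Gtex/s, "
          f"{g.pixel_fill_gpix():.1f} Gpix/s, "
          f"{g.bandwidth_gbs():.0f} GB/s")
```

Under these assumptions, the computed values round to the table's figures: 2.70 vs. 1.58 TFLOPS, 84.5 vs. 49.4 Gtex/s, 28.2 vs. 37.1 Gpix/s, and 176 vs. 192 GB/s. Theoretical peaks like these say little about transcoding speed on their own, which the results below make clear.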


We're using the best consumer cards money can buy, AMD's Radeon HD 6970 and Nvidia's GeForce GTX 580.

Full BDAV: 31.2 GB H.264 BDAV (HH:MM:SS)

Configuration                              AMD        Nvidia     Intel Performance    Intel Quality
Hardware Decode & Hardware/GPGPU Encode    1:24:00    0:49:34    0:19:35              0:23:22
Hardware Decode & Software Encode          0:47:55    0:49:38    0:35:21              0:46:13
Software Decode & GPGPU/Hardware Encode    1:01:17    0:50:21    0:48:17              0:48:41
Software Decode & Encode                   1:04:26    1:04:22    0:55:38              1:05:20


When it came to transcoding entire videos, MediaEspresso was the only program that would accept our full, unprotected 31.2 GB Blu-ray copy of Iron Man. MediaConverter 7 and Badaboom both gave us audio codec errors because neither program recognizes TrueHD. Separately, if you use Quick Sync, you are forced to choose either a Performance or a Quality setting. That option is unavailable if you are running an Nvidia- or AMD-based card.

According to our benchmarks, the biggest bottleneck occurs at the decode stage. Enabling APP or CUDA encoding yields a small gain (more so for CUDA), but the biggest benefit comes from turning on hardware-accelerated decoding. On the Radeon card, enabling APP encoding together with UVD 3 actually appears to be the worst thing you can do for performance: every other combination of settings is faster, including software-only. With CUDA, enabling both hardware settings gains a measly four seconds over PureVideo hardware decoding alone.

Intel's Quick Sync hardware demonstrates much more impressive numbers. However, the quality setting yields much less aggressive scaling. What we're seeing there is the effect of using a lower bitrate for software-based encoding.
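To express the full-movie times in relative terms, the sketch below converts each HH:MM:SS cell to seconds. It then divides each column's software-only time by each accelerated time, so values above 1.0 are faster than software-only and values below 1.0 are slower. Each column uses its own software-only row as the baseline, because the table reports a separate software-only time for each platform and Intel setting. The labels are shorthand for the table's rows.

```python
# Times from the full-movie table (HH:MM:SS), listed in the table's row order:
# HW decode + HW/GPGPU encode, HW decode + SW encode,
# SW decode + GPGPU/HW encode, SW decode + SW encode (baseline)
RESULTS = {
    "AMD": ["1:24:00", "0:47:55", "1:01:17", "1:04:26"],
    "Nvidia": ["0:49:34", "0:49:38", "0:50:21", "1:04:22"],
    "Intel Performance": ["0:19:35", "0:35:21", "0:48:17", "0:55:38"],
    "Intel Quality": ["0:23:22", "0:46:13", "0:48:41", "1:05:20"],
}
LABELS = ["HW dec + HW enc", "HW dec + SW enc", "SW dec + HW enc"]


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


def speedups(times):
    """Speedup of each accelerated path relative to the software-only run."""
    baseline = to_seconds(times[-1])
    return [baseline / to_seconds(t) for t in times[:-1]]


for platform, times in RESULTS.items():
    ratios = ", ".join(f"{lbl}: {r:.2f}x" for lbl, r in zip(LABELS, speedups(times)))
    print(f"{platform}: {ratios}")
```

On these numbers, the Radeon's hardware decode plus APP path comes out at about 0.77x, slower than software-only. Quick Sync in Performance mode reaches about 2.84x.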

665 MB H.264 BDAV/M2TS Transcode (MM:SS)

AMD Radeon HD 6970
Configuration                          MediaEspresso    MediaConverter
Hardware Decode & APP Encode           2:29             1:40
Hardware Decode & Software Encode      2:28             -
Software Decode & APP Encode           1:57             -
Software Decode & Encode               2:41             1:22

Nvidia GeForce GTX 580
Configuration                          MediaEspresso    MediaConverter
Hardware Decode & CUDA Encode          1:37             1:06
Hardware Decode & Software Encode      1:50             -
Software Decode & CUDA Encode          2:02             -
Software Decode & Encode               2:41             1:22

Intel HD Graphics 3000 (Core i5-2500K)
Configuration                          MediaEspresso Performance    MediaEspresso Quality    MediaConverter
Hardware Decode & Quick Sync Encode    0:46                         0:56                     1:09
Hardware Decode & Software Encode      1:26                         2:22                     -
Software Decode & Quick Sync Encode    2:10                         2:07                     -
Software Decode & Encode               2:10                         2:43                     1:24


Even though our 665 MB H.264/AC3 BDAV clip has the same bitrate as our 31.2 GB movie, hardware-accelerated encoding yields much better results here on the Radeon HD 6970 and GeForce GTX 580. Intel's Quick Sync-enabled Core i5-2500K only sees a substantial gain using the quality setting and encode acceleration.

Note that ArcSoft's MediaConverter runs a transcode either entirely in hardware or entirely in software. It doesn't offer the granular decode/encode control that CyberLink's MediaEspresso provides. Overall, though, we see much faster transcode times with MediaConverter. Going by the numbers alone, MediaConverter also appears to be better optimized for multithreaded performance. We are talking about more than a full minute of improvement over the times seen in MediaEspresso.

With this smaller clip, AMD's APP is finally faster than the pure software route, but not by much. Unfortunately, using APP in MediaConverter is still slower than using the CPU-only option. This suggests that APP would pay off more on an older machine pairing a slower processor with a modern graphics card. The new Core i5-2500K is simply too fast.
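The size of the gains and losses described above can be checked directly against the Radeon HD 6970 rows of the 665 MB clip table. The sketch below computes seconds saved as slower time minus faster time, so a negative result means the "accelerated" path was actually slower. The variable names are illustrative.

```python
def to_seconds(mmss: str) -> int:
    """Convert an MM:SS string to seconds."""
    minutes, seconds = (int(x) for x in mmss.split(":"))
    return minutes * 60 + seconds


def delta(baseline: str, candidate: str) -> int:
    """Seconds saved by `candidate` versus `baseline` (negative = slower)."""
    return to_seconds(baseline) - to_seconds(candidate)


# Radeon HD 6970 cells from the 665 MB clip table (MM:SS)
me_software, me_hw_app, me_sw_app = "2:41", "2:29", "1:57"  # MediaEspresso
mc_software, mc_app = "1:22", "1:40"                        # MediaConverter

print(delta(me_software, me_hw_app))    # APP with hardware decode vs. software-only
print(delta(me_software, me_sw_app))    # APP with software decode vs. software-only
print(delta(mc_software, mc_app))       # APP vs. CPU-only inside MediaConverter
print(delta(me_software, mc_software))  # MediaConverter vs. MediaEspresso, CPU-only
```

The four results are 12, 44, -18, and 79 seconds. APP with hardware decoding saves only a few seconds in MediaEspresso and loses time in MediaConverter. The CPU-only gap between the two applications is the "more than a full minute" noted above.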

Overall, Quick Sync soundly trounces CUDA and APP for encoding and decoding performance. But the differences between AMD and Nvidia are less apparent given such mixed results. Is this simply the result of using high bitrate Blu-ray source files?

Top Comments
  • 28
    spoiled1, February 7, 2011 3:39 AM
    Tom,
    You have been around for over a decade, and you still haven't figured out the basics of web interfaces.

    When I want to open an image in a new tab using Ctrl+Click, that's what I want to do; I do not want to move away from my current page.

    Please fix your links.
    Thanks
  • 19
    spammit, February 7, 2011 4:11 AM
    omgf, ^^^this^^^.

    I signed up just to agree with this. I've been reading this site for over 5 years and I have hoped and hoped that this site would change to accommodate the user, but, clearly, that's not going to happen. Not to mention all the spelling and grammar mistakes in the past year. (Don't know about this article, didn't read it all.)

    I didn't even finish reading the article and looking at the comparisons because of the problem spoiled1 mentioned. I don't want to click on a single image 4 times to see it full size, and I certainly don't want to do that for 4 different images (mind you, you'd have to open the article 4 separate times) in order to compare them side by side (alt-tab, etc.).

    Just abysmal.
  • 17
    cpy, February 7, 2011 4:30 AM
    THW has the worst image presentation ever. You can't even load multiple images so you can compare them in different tabs. Could you do direct links to images instead of this bad design?
Other Comments
  • 4
    ProDigit10, February 7, 2011 4:53 AM
    I would say that before long we'll see encoders doing parallel video encoding by loading the pieces between keyframes. Keyframes are tiny JPEGs inserted into a movie, preferably when a scene change happens that is greater than what a motion codec would be able to morph the existing screen into.
    The data between keyframes can easily be encoded in a parallel pipeline or thread of a CPU or GPU.
    Even on mobile platforms, integrated graphics have more than 4 shader units, so I suspect even on mobile graphics cards you could run 8 or more encoding threads (depending on the GPU, at between 400 and 800 MHz). That would be equal to encoding a single-threaded video on a CPU running at 1.6-6.4 GHz. Not to mention that the laptop or mobile device still has at least one extra CPU thread to run the program and operating system, arrange the threads, and handle reading and writing data, while the other CPU thread(s) could help the GPU encode video.

    The only issue here would be B-frames, but for fast video encoding you could give up 5-15 MB of video on a 700 MB file due to no B-frame support, if it could save you time by processing threads in parallel.
  • 7
    intelx, February 7, 2011 6:04 AM
    First, thanks for the article; I've been looking for this. But your gallery really sucks. I mean, it takes me a good 5 minutes just to get 3 pics next to each other to compare. The gallery should be updated to something else for fast viewing.
  • 7
    _Pez_, February 7, 2011 6:09 AM
    Ups! for Tom's Hardware's web page :p Fix your links. :) And I agree with spoiled1 and spammit.
  • 8
    AppleBlowsDonkeyBalls, February 7, 2011 6:12 AM
    I agree. Tom's needs to figure out how to properly make images accessible to the readers.
  • 7
    kikireeki, February 7, 2011 9:49 AM
    Quote (spoiled1):
    Tom,
    You have been around for over a decade, and you still haven't figured out the basics of web interfaces.

    When I want to open an image in a new tab using Ctrl+Click, that's what I want to do, I do not want to move away from my current page.

    Please fix your links.
    Thanks


    And to make things even worse, the new page will show you the picture at the same thumbnail size, and you have to click on it again to see the full-size image. Brilliant!
  • 6
    acku, February 7, 2011 10:31 AM
    Apologies to all. There are things I can control in the presentation of an article and things that I cannot, but everyone here has given fair criticism. I agree that right-clicking and opening an image in a new window is an important feature for articles on image quality. I'll make sure Chris continues to push the subject with the right people.

    Web dev is a separate department, so we have no ability to influence the speed at which a feature is implemented. In the meantime, I've uploaded all the pictures to ZumoDrive, packed as a single download: http://www.zumodrive.com/share/anjfN2YwMW

    Remember to view the pictures at their native resolution to avoid scaling.

    Cheers
    Andrew Ku
    TomsHardware.com
  • 4
    Reynod, February 7, 2011 10:41 AM
    An excellent read, though, Andrew.

    Please give us an update in a few months to see if there have been any noticeable improvements ... keep your base files for reference.

    I would imagine Quick Sync is now a major plus for those interested in rendering ... and AMD and Nvidia have some work to do.

    I appreciate the time and effort you put into the research and the depth of the article.

    Thanks,

    :) 
  • -1
    acku, February 7, 2011 10:54 AM
    Quote:
    An excellent read, though, Andrew.

    Please give us an update in a few months to see if there have been any noticeable improvements ... keep your base files for reference.

    I would imagine Quick Sync is now a major plus for those interested in rendering ... and AMD and Nvidia have some work to do.

    I appreciate the time and effort you put into the research and the depth of the article.

    Thanks,

    :) 


    Will do, but I think this article sums everything up in a way that will stay relevant for months to come. (Well, that's my hope, anyway.) "In a worst-case scenario, hardware acceleration gives you 75% of the quality and a minor speed up versus processor-only transcoding. In a best-case scenario, you are getting 99% of the quality, and running up to 400% faster than a processor working on its own." The difference is that in a few months, the worst case will likely be up to 80%, 90%, or even 99%.

    There is always going to be some sort of trade-off, and for the majority of us, 99% quality preservation at 4x the speed is well worth it. The problem is that there is virtually no way to compare transcoding software, or even GPGPU hardware (or software), without introducing new variables into testing. You need to accept all the variables and treat the problem like a puzzle grid.

    I would add that there is so much more to image quality than what we talked about. We didn't even discuss LCD hardware or color space. I think this article changes the game a bit. We have gotten so used to seeing tearing, blocking, or some other video artifact and simply blaming the video encoder without a second thought.

    If you read many of the Sandy Bridge articles on the web, people were simply saying "that video looks fuzzy" in very specific cases and then labeling Quick Sync or CUDA poor at transcoding as a result. While the video they saw was fuzzy, that doesn't automatically make it a transcoding error. It could have been a renderer or decoder problem. For example, if the bitrate dropped off suddenly, it's possible that a specific decoder wasn't capable of keeping up. This was a major point we were trying to make. Those automatic claims are invalid if the testers didn't cross-check the problem to isolate decoders and renderers.

    Hell, you can't even rely on the same transcode path. If you rerun a transcode, the randomness (due to parallelism) can cause a visible error you didn't see in the first transcode, even if you use the same hardware and software configuration.

    Cheers,
    Andrew Ku
    TomsHardware.com
  • -4
    Miharu, February 7, 2011 11:34 AM
    Hi Toms,
    Before you wrote this article, I had never heard of any of the 3 programs you're talking about.
    I figure you're talking about new software supporting the iPhone.

    New software... which probably isn't optimized for every solution.
    So I just imagine you didn't think about this before writing this article.

    Come back with x264 and MainConcept H.264 analysis and benchmarks. Perhaps I'll read you this time.
  • 1
    acku, February 7, 2011 11:43 AM
    Quote:
    Hi Toms,
    Before you wrote this article, I had never heard of any of the 3 programs you're talking about.
    I figure you're talking about new software supporting the iPhone.

    New software... which probably isn't optimized for every solution.
    So I just imagine you didn't think about this before writing this article.

    Come back with x264 and MainConcept H.264 analysis and benchmarks. Perhaps I'll read you this time.


    When it comes to GPGPU transcoding, these are the three software titles at the forefront. MainConcept only recently finished a CUDA encoder, in August. Elemental coded its own back in 2008. They were the first, and they are just as valid as MainConcept. If you follow insider industry news (like streamingmedia.com, which is read by people who create video for the masses, such as Hulu's Eric Feng), then you know that Elemental's software is used by ABC, Big Ten Network, CBS Interactive, National Geographic, and PBS. Hell, MainConcept's Quick Sync encoder is still in beta as of this month: http://www.mainconcept.com/press/single-view/article/updated-mainconceptTM-h264avc-encoder-sdk-for-intelR-quick-sync-video.html ArcSoft and CyberLink were Intel's launch partners to demo Quick Sync; read any of the Sandy Bridge reviews.

    Cheers,
    Andrew Ku
    TomsHardware

  • 0
    Anonymous, February 7, 2011 12:07 PM
    Thanks for the work put into the article. Since I'm very new to all this, however, I think it may have gone over my head :)

    I am in the market for a new 'budget PC' and leaning toward an Intel i5-2500K with an Nvidia GTS 450 graphics card. The system should be aimed at producing great video quality at reasonable speed.

    I'm not sure if I interpreted the results correctly, but it seems I would not need to get the Nvidia card after all, since software encoding produces better results and the HD 3000 would suffice? Any advice would be greatly appreciated.

    Thanks!
    Amien
  • 0
    acku, February 7, 2011 12:21 PM
    Quote:
    Thanks for the work put into the article. Since I'm very new to all this, however, I think it may have gone over my head :)

    I am in the market for a new 'budget PC' and leaning toward an Intel i5-2500K with an Nvidia GTS 450 graphics card. The system should be aimed at producing great video quality at reasonable speed.

    I'm not sure if I interpreted the results correctly, but it seems I would not need to get the Nvidia card after all, since software encoding produces better results and the HD 3000 would suffice? Any advice would be greatly appreciated.

    Thanks!
    Amien


    Quick Sync basically fills the same role as GPGPU encoding; it's just done with fixed-function hardware. If you aren't a crazy kook about image quality (and I mean at the extreme end: using SpectraCal to calibrate your HDTV, only watching TV reruns on Blu-ray, etc.), don't worry about software encoding. If you are willing to give up that 1% (best-case scenario) or ~25% (worst case), Quick Sync on the new Sandy Bridge chips will give you up to a 4x speed bump. Remember that we used a GTX 580. It has 512 CUDA cores; the GTS 450 only has 192. If you bought that graphics card, you wouldn't see the same transcoding performance we did with the 580. Plus, transcoding with CUDA or APP uses the GPU for processing, and that is going to burn into your power bill. Quick Sync uses fixed-function hardware, so it's always going to be the most power-efficient option, even more so than a pure software route.

    As I see it, forget the Nvidia card (unless you are gaming). The i5-2500K will still give you two options: Quick Sync or full software encoding. Remember, though, that you need software that actually uses Quick Sync to transcode. It isn't an automatic feature of every transcoding application.

    Good luck on your build. I'd ping Don (who does our best CPUs and graphics cards for the money guides) if you have more questions on specific components.

    Cheers,
    Andrew Ku
    TomsHardware.com
  • 0
    Miharu, February 7, 2011 12:56 PM
    Andrew, is there any advantage to using Intel HD 3000 together with an ATI or Nvidia chip for GPGPU?
    I don't think the drivers currently support that kind of thing... or any encoding software?

    What do you think?

    Thank you
  • 1
    acku, February 7, 2011 12:59 PM
    Quote:
    Andrew, is there any advantage to using Intel HD 3000 together with an ATI or Nvidia chip for GPGPU?
    I don't think the drivers currently support that kind of thing... or any encoding software?

    What do you think?

    Thank you


    You can only choose one encoder: Quick Sync, APP, or CUDA. You can't do combos. Remember that Intel HD 3000 is the graphics side; Quick Sync is a separate logic circuit, even though it's on the same die. I'll add that Quick Sync is disabled if you use a discrete graphics card.
  • 0
    cknobman, February 7, 2011 1:13 PM
    I have always just been happy using HandBrake for all my video encoding needs and have never been dissatisfied.

    I usually don't get in that big of a hurry and have never noticed anything terrible when watching the output, but then...

    I'm not a videophile.
  • 0
    amien, February 7, 2011 2:03 PM
    Thanks very much for that info. I'll be using Premiere Pro CS5, so I'm not sure whether that supports Quick Sync.

    Out of interest, what card was used (if any) in the cpu benchmarks at