Video Transcoding Examined: AMD, Intel, And Nvidia In-Depth

Page 13 of 13:

Final Words

It's time to wrap up this crazy foray into image quality. If you didn't bother with all of the technical bits, here are the summaries:

What looks the best? APP, CUDA, or Quick Sync?

Image quality isn't an exact science. We have all grown accustomed to decoders that are designed to mask errors in bad video. Your Radeon's UVD or your GeForce's PureVideo logic circuit purposely tries to fix bad video when it can.

Ideally, you want to use the same software decoder in all of your testing. But just as important, you need a decoder that doesn't mask any errors if you're trying to draw a comparison between the quality of different video files. Consumer-grade decoders like MediaFoundation, CyberLink, Arcsoft, and Badaboom are all designed with error correction in mind. They are trying to make your viewing experience better, which is great. But if you are doing image quality analysis, you're in an entirely different arena.

At the moment, there is no clear winner here. Video transcoded in APP doesn't look good (in our opinion) in Arcsoft's MediaConverter. Intel looks grainy in Badaboom (Elemental has an excuse; it's still in alpha), and CUDA has some serious issues in MediaEspresso. Furthermore, I'm not sure there is ever going to be a point where you can make a clear judgment across the board, considering so much relies on software programmers. The hardware could be good, the reference library could be robust, and you could still have a human-introduced programming issue that screws up the video output.

How do I tell badly-transcoded video from good video?

As an everyday user, you can't really make that call. Consider multiple video players, which use different decoders and encoders. Take Apple, for example. It uses a Broadcom video decoder, along with CoreAnimation, in its media devices. But RIM, Nokia, and Samsung all have their own decoders and renderers. Typically, the industry tests by isolating individual variables. For example, if you are testing scalers, you only change the screen resolution. If you are testing encoder quality, you should only change the bitrate. Then you use a single software decoder to play back the video. If there is a problem, you switch it up by encoding the source using multiple encoders and trying combinations to narrow it down.

The only way you can tell a badly-transcoded video from a good one is to compare it to the source in multiple video players over multiple graphics configurations. Even on a single configuration, bad video should appear uniformly poor on a majority of players. Good video will simply be good regardless. If you aren't willing to spend a whole month comparing quality, your only task is making sure transcoded video looks good to you in the software players and hardware devices you use.
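For readers who want a number to go with the eyeball test, here is a minimal sketch of one common objective metric, PSNR (peak signal-to-noise ratio), computed between a source frame and two transcoded versions of it. This is an illustration only, not the method used for the comparisons in this article. The function name, the flat list-of-luma-samples frame layout, and the sample values are all assumptions made for the example, and it presumes both encodes were decoded by the same software decoder, as described above.

```python
import math

def psnr(reference, candidate, peak=255):
    """Return PSNR in dB between two equally sized frames.

    Frames are assumed (for this sketch) to be flat sequences of
    8-bit luma samples. Higher values mean the candidate is closer
    to the reference; identical frames give infinity.
    """
    if len(reference) != len(candidate):
        raise ValueError("frames must have the same number of samples")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean squared error over all samples.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)

# Hypothetical sample values standing in for real decoded frames.
source   = [52, 55, 61, 66, 70, 61, 64, 73]
encode_a = [52, 55, 60, 66, 70, 62, 64, 73]  # small errors
encode_b = [50, 58, 57, 70, 66, 65, 60, 77]  # larger errors

for name, frame in (("encode_a", encode_a), ("encode_b", encode_b)):
    print(name, round(psnr(source, frame), 2), "dB")
```

A higher score only says that one encode is numerically closer to the source. It does not replace looking at the video, because a metric like this can miss artifacts that are obvious to the eye.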

What should I use for encoding?

Do you use Spectral to calibrate your HDTV? Do you agonize at night over motion compensation? Do you only re-watch your favorite TV shows once they've been released on Blu-ray, instead of streaming them over Netflix or Hulu because you look down on the picture quality? If you are a quality nut, then you will want to stick with software-based decode and encode, keeping hardware acceleration from touching your content entirely. It is the one constant, reliable setting for video quality.

Take a deep breath and put this in context, though. In a worst-case scenario, hardware acceleration gives you 75% of the quality and a minor speed-up versus processor-only transcoding. In a best-case scenario, you are getting 99% of the quality, and running up to 400% faster than a processor working on its own. If you're in a rush, or don't care about the quality of videos you move over to your 960x640 iPhone 4, that trade-off isn't as bad as it might otherwise seem for a purist. No matter what, there is always going to be an inverse relationship between speed and quality.
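To put the best-case figure in wall-clock terms, here is a small back-of-the-envelope sketch. It assumes "N% faster" means throughput of (1 + N/100) times the CPU-only rate, so 400% faster is read as five times the speed. The function name and the 60-minute job are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def accelerated_minutes(cpu_minutes, percent_faster):
    """Estimate the accelerated run time for a job.

    Assumes 'percent_faster' is a throughput gain, so 400 means the
    accelerated path does five times the work per minute.
    """
    if percent_faster < 0:
        raise ValueError("percent_faster must not be negative")
    return cpu_minutes / (1 + percent_faster / 100)

# Hypothetical job: 60 minutes on the CPU alone.
print(accelerated_minutes(60, 400))  # best case quoted above
print(accelerated_minutes(60, 0))    # no speed-up at all
```

Under that reading, an hour-long CPU-only transcode drops to 12 minutes in the best case, which is the kind of saving you are weighing against the quality loss.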

If you are a mobile user transcoding on battery power, Quick Sync is the only way to go right now (desktop folks are less likely to use it because of Intel's limitation that discrete graphics make Quick Sync unavailable). GPU-based transcoding still involves general-purpose hardware, and spinning up a Radeon HD 6970M or GeForce GTX 480M is going to gobble up battery life.

Will transcoding on the GPU ever be as good as it is on the CPU?

I posed this question to people within the industry. Why does hardware-accelerated encoding (even Intel's Quick Sync) provide such inconsistent results? We got back two answers. Some people compared the situation to the emergence of threaded computing. They made the point that it took years for us to finally see apps that were optimized to take advantage of more than one core. It is their belief that, if we wait long enough, we will see more robust reference libraries. Intel, Nvidia, and AMD have simply focused on speed up until now because that is what we see in benchmarks. In response to Intel's jab at CUDA transcoding, one person remarked, "that may have been true three years ago when things were just getting started, but now your conclusion (Andrew) is that these things (quality output) are very close today." (Chris: In my own defense, the results of my Brazos coverage speak volumes. The resulting video quality is completely unacceptable in a living room setting, and I'd personally go with CPU-based transcoding to avoid those visual artifacts.)

Another person concurred, saying, "give them time and they will get quality down." Both individuals have a fair point. Indeed, Sam Blackman, Elemental's CEO, is insistent that hardware-accelerated encoders will be just as good as their CPU counterparts given time. He further stated, "if you look at where we were with Badaboom two years ago versus where we are now, the amount of progress is so significant that it's pretty clear that the trajectory of GPGPU encoders is one that is going to surpass CPU only in the very near future, if it hasn't already."

Others pointed out that H.264 is an inherently serial codec, and so is every other codec in use. People in this camp believe that transcoding on graphics processors will approach the CPU, but it will never be the same. There are many enhancements in modern encoders that rely on the serial nature of the video data. Many of these enhancements needed to be stripped out to achieve parallelism. One person said that, until we have a codec that is optimized for parallelism, we aren't going to go very far on the GPU, at least in regards to transcoding. Both sides are probably right to a degree.

The serial nature of every available codec today happens to be one of the biggest bottlenecks affecting how much transcoding occurs on the GPU. Ideally, what you really want is to have a macroblock be more dependent on its neighbor temporally, rather than spatially. That way you can process more frames in parallel. Unfortunately, this type of codec doesn't exist yet. Many in the industry want to see a new codec designed with parallel computing in mind. We hope this is a possibility that H.265 (HEVC) explores.
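To make the spatial-dependency bottleneck concrete, here is a minimal sketch of a "wavefront" schedule within a single frame. It is a deliberately simplified model, not how any encoder discussed here actually works: it assumes each macroblock can only start once its left, top-left, top, and top-right neighbors are finished, and it ignores entropy coding, deblocking, and everything else that adds further ordering constraints. The function name and the frame dimensions are assumptions for illustration.

```python
def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblocks into steps that can run in parallel.

    Assumed dependency model: macroblock (x, y) needs (x-1, y),
    (x-1, y-1), (x, y-1) and (x+1, y-1) to be done first. Under
    that model, every macroblock with the same value of x + 2*y
    is independent of the others, so each such group is one step.
    """
    steps = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            steps.setdefault(x + 2 * y, []).append((x, y))
    return [steps[key] for key in sorted(steps)]

# A 1920x1080 frame coded as 16x16 macroblocks: 120 columns and
# 68 rows (the coded height is padded to 1088 lines).
schedule = wavefront_schedule(120, 68)
total = sum(len(step) for step in schedule)
print("macroblocks:", total)
print("sequential steps:", len(schedule))
print("widest step:", max(len(step) for step in schedule))
print("average parallelism:", round(total / len(schedule), 1))
```

Even in this optimistic model, the frame still needs a couple hundred strictly ordered steps, and no step keeps more than a few dozen macroblocks busy at once. That is a small amount of work for a GPU with hundreds of shader units, which is one way to see why people want a codec designed with parallelism in mind.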

The game is on. Let the best GPGPU-based (or fixed-function hardware) encoder win.

*A small footnote: I only talked about image quality with respect to H.264, as it is the most pervasive video codec today. However, VC-1 and MPEG-2 decoding could be a different story. Furthermore, when it comes to video, it would be foolish to not also roll audio into the discussion. We wanted to keep things "simple," so that didn't fit into today's context.

52 Comments

spoiled1

Tom,
You have been around for over a decade, and you still haven't figured out the basics of web interfaces.

When I want to open an image in a new tab using Ctrl+Click, that's what I want to do, I do not want to move away from my current page.

Please fix your links.
Thanks
Reply
spammit

omgf, ^^^this^^^.

I signed up just to agree with this. I've been reading this site for over 5 years and I have hoped and hoped that this site would change to accommodate the user, but, clearly, that's not going to happen. Not to mention all the spelling and grammar mistakes in the recent year. (Don't know about this article, didn't read it all).

I didn't even finish reading the article and looking at the comparisons because of the problem sploiled1 mentioned. I don't want to click on a single image 4 times to see it fullsize, and I certainly don't want to do it 4 times (mind you, you'd have to open the article 4 separate times) in order to compare the images side by side (alt-tab, etc).

Just abysmal.
Reply
cpy

THW have worst image presentation ever, you can't even load multiple images so you can compare them in different tabs, could you do direct links to images instead of this bad design?
Reply
ProDigit10

I would say not long from here we'll see encoders doing video parallel encoding by loading pieces between keyframes. keyframes are tiny jpegs inserted in a movie preferably when a scenery change happens that is greater than what a motion codec would be able to morph the existing screen into.
The data between keyframes can easily be encoded in a parallel pipeline or thread of a cpu or gpu.
Even on mobile platforms integrated graphics have more than 4 shader units, so I suspect even on mobile graphics cards you could run as much as 8 or more threads on encoding (depending on the gpu, between 400 and 800 Mhz), that would be equal to encoding a single thread video at the speed of a cpu encoding with speed of 1,6-6,4GHz, not to mention the laptop or mobile device still has at least one extra thread on the CPU to run the program, and operating system, as well as arrange the threads and be responsible for the reading and writing of data, while the other thread(s) of a CPU could help out the gpu in encoding video.

The only issue here would be B-frames, but for fast encoding video you could give up 5-15MB video on a 700MB file due to no B-frame support, if it could save you time by processing threads in parallel.
Reply
intelx

first thanks for the article i been looking for this, but your gallery really sucks, i mean it takes me good 5 mins just to get 3 pics next to each other to compare , the gallery should be updated to something else for fast viewing.
Reply
_Pez_

Ups ! for tom's hardware's web page :P, Fix your links. :) !. And I agree with them; spoiled1 and spammit.
Reply
AppleBlowsDonkeyBalls

I agree. Tom's needs to figure out how to properly make images accessible to the readers.
Reply
kikireeki

spoiled1 wrote: "Tom, You have been around for over a decade, and you still haven't figured out the basics of web interfaces. When I want to open an image in a new tab using Ctrl+Click, that's what I want to do, I do not want to move away from my current page. Please fix your links. Thanks"
and to make things even worse, the new page will show you the picture with the same thumbnail size and you have to click on it again to see the full image size, brilliant!
Reply
acku

Apologies to all. There are things I can control in the presentation of an article and things that I cannot, but everyone here has given fair criticism. I agree that right click and opening to a new window is an important feature for articles on image quality. I'll make sure Chris continues to push the subject with the right people.

Web dev is a separate department, so we have no ability to influence the speed at which a feature is implemented. In the meantime, I've uploaded all the pictures to ZumoDrive. It's packed as a single download. http://www.zumodrive.com/share/anjfN2YwMW

Remember to view pictures in the native resolution to avoid scalers.

Cheers
Andrew Ku
TomsHardware.com
Reply
Reynod

An excellent read though Andrew.

Please give us an update in a few months to see if there has been any noticeable improvements ... keep your base files for reference.

I would imagine Quicksynch is now a major plus for those interested in rendering ... and AMD and NVidia have some work to do.

I appreciate the time and effort you put into the research and the depth of the article.

Thanks,

:)
Reply
