Video Transcoding Examined: AMD, Intel, And Nvidia In-Depth

Hardware Decoder Quality: Examined

Part of what makes transcoding a difficult subject to tackle is that we are dealing with different decoder hardware. Even though a program like Windows Media Player 12 or PowerDVD may use near-universal API calls for DXVA (used for hardware-accelerated decoding), the hardware actually doing the processing is designed differently. For that reason alone, we want to pay some attention to decoder output.

There are a couple of problems to overcome here. When you play back a video, you need to grab the exact same frame on every platform for comparison, because two sequential frames in the same clip can differ substantially. In VLC this is easy: you can jump to a specific time and take a screenshot using the built-in capture function. However, that limits you to VLC's decoder, which only uses the decode portion of the pipeline. It doesn't utilize motion compensation or frequency transformation acceleration, even when you enable hardware-based decoding. Plus, it doesn't support Intel's implementation, which would leave us with only an AMD versus Nvidia versus software decoder battle.
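To put a number on how far apart two captures are, a mean absolute difference per color channel is one simple option. The sketch below is only an illustration and not the method used for this article: it assumes each capture has already been loaded into a flat list of 8-bit (R, G, B) tuples, and the name `mean_abs_diff` is our own.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-channel difference between two captures.

    Assumption: each frame is a flat list of (R, G, B) tuples with 8-bit
    values, and both captures have the same dimensions.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("captures must contain the same number of pixels")
    if not frame_a:
        raise ValueError("captures must not be empty")
    total = sum(
        abs(channel_a - channel_b)
        for pixel_a, pixel_b in zip(frame_a, frame_b)
        for channel_a, channel_b in zip(pixel_a, pixel_b)
    )
    # Three channels per pixel.
    return total / (3 * len(frame_a))
```

A result of 0 means the two captures are identical. Captures taken one frame apart in a high-motion scene would be expected to score far higher than two decoders showing the same frame.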

So, we need to find a way to use something like WMP12 and still capture from the video renderer. There is actually an easy solution to the last part. WMP12, PowerDVD, and other recent video players all use a new MediaFoundation component, known as the Enhanced Video Renderer (EVR). Normally, when you take a screen capture of WMP12, you get nothing, since it comes in as an overlay. If you download the DirectX 11 SDK, there is a setting within the control panel to enable screen capture, which solves the problem of examining decoder hardware. Now, this does use the older DirectDraw code to dump a still frame, but that happens after it has been rendered by EVR. At this point, the video is paused and is no longer behaving like a streaming data set. DirectDraw is simply allowing us to dump that as part of our regular screen capture. Since we are dealing with a static frame, what we have captured is representative of what you see on the screen.

We can address the problem of capturing specific scenes by using an unprotected copy of a Blu-ray-based workingprint segment from Iron Man. Before a movie is completely finished, directors and video editors use this rough cut to help them add effects, animation, insert footage, remove footage, or redub audio. Workingprints contain time codes to the centisecond. This is a property we can exploit to ensure that we consistently extract the same frame for analysis. This isn't without complications. You need to be in a hyper-reaction mode to pause on the right time code, and after countless hours of caffeine-fueled mouse clicks, we were able to capture everything we needed.
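A quick calculation shows why centisecond time codes are fine-grained enough. The sketch below assumes film-rate video at 24000/1001 (roughly 23.976) frames per second and a hypothetical `HH:MM:SS.cc` time code layout; the actual layout on the workingprint may differ, and the function names are ours.

```python
from fractions import Fraction

# Assumption: film-rate Blu-ray video, 24000/1001 frames per second.
FPS = Fraction(24000, 1001)


def parse_timecode(timecode):
    """Convert a hypothetical 'HH:MM:SS.cc' time code into exact seconds."""
    clock, centiseconds = timecode.split(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    whole = hours * 3600 + minutes * 60 + seconds
    return Fraction(whole) + Fraction(int(centiseconds), 100)


def frame_index(timecode, fps=FPS):
    """Zero-based index of the frame on screen at the given time code."""
    return int(parse_timecode(timecode) * fps)


def same_frame(timecode_a, timecode_b, fps=FPS):
    """True if both time codes fall within the same frame."""
    return frame_index(timecode_a, fps) == frame_index(timecode_b, fps)
```

At this rate each frame stays on screen for about 4.17 centiseconds, so any two distinct frames always carry different centisecond readings. Matching the time code across captures is therefore enough to confirm that they show the same frame.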

The last issue we have to deal with is scaling. When you watch a Blu-ray movie on a screen larger or smaller than 1920x1080, the video data gets piped through a scaler in order to match the display's resolution. In order to minimize the effect of scaling, we play our video back on a native 1920x1080 screen. Smart decoders will bypass the scalers at native resolutions. Some decoders don't, and that is often one point where image quality can be improved. Furthermore, decoders are inherently different in the very way they take in data. Some may prefer an 8x8 block; others may use a different block size for processing. What we care about is the final output. That is what is going to show us the real-world difference, if there is one.

Hardware Decoder Quality: Are They The Same?

As far as color space manipulation is concerned, AMD actively changes the color profile of video during playback. Transcoded video is unaffected because you don't play back the video during a transcode, which means you aren't sending the data stream to the renderer. Nvidia and Intel default to the color profile of the application. I intentionally leave the manipulation by AMD enabled so you can see the difference.

Aside from saturating the image, UVD 3 still looks worse than Intel's Clear Video HD (CVT) and the fourth generation of PureVideo (VPDAU4). You can specifically see this around the blades of the helicopter to the far-right of the image. In the Intel and Nvidia shots, there is no aliasing around the edges, and the workers on the truck are less blurred, which means Intel's and Nvidia's motion compensation algorithms are coping with the camera panning motion more accurately.

This next scene is where Rhodes is driving toward the battle between Obadiah and Tony near the end of the film. This is high motion to the extreme, and these three frames were some of the most difficult to capture. It is nearly impossible to tell Nvidia and Intel apart. AMD sticks out like a sore thumb simply because the high brightness setting causes a halo effect on the street lamp. Even when we turn off color manipulation, there is a very slight degree of graininess present near the front bumper of the car.

Even with color manipulation, in dark scenes we're hard-pressed to tell AMD apart at first glance. Overall, all three companies show much more consistent results here. The only obvious detail we noticed was that Nvidia looks slightly lighter under Iron Man and the car than in our Intel shot. AMD is noticeably brighter in these two areas, but that is to be expected.

In bright scenes it is harder to distinguish the quality of UVD 3 unless you turn off color manipulation. When you do, you notice that the Nvidia and AMD solutions show the background officer slightly grainier. Even with color settings handed over to the application's control, you can still see that some color management occurs beyond what the viewer can control. If you overlay the Nvidia and Intel shots over one another, you can tell that Nvidia is a shade lighter.
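The "shade lighter" observation can be checked numerically by comparing the average luma of two captures. This is a minimal sketch rather than the procedure we followed: it assumes each capture is a flat list of 8-bit (R, G, B) tuples, uses the Rec. 709 luma weights, and the function names are hypothetical.

```python
# Rec. 709 luma weights for red, green, and blue.
LUMA_WEIGHTS = (0.2126, 0.7152, 0.0722)


def luma(pixel):
    """Weighted brightness of one (R, G, B) pixel."""
    return sum(weight * channel for weight, channel in zip(LUMA_WEIGHTS, pixel))


def mean_luma(frame):
    """Average luma over a capture given as a flat list of (R, G, B) tuples."""
    if not frame:
        raise ValueError("capture must not be empty")
    return sum(luma(pixel) for pixel in frame) / len(frame)


def luma_offset(frame_a, frame_b):
    """Positive when frame_a is brighter on average than frame_b."""
    if len(frame_a) != len(frame_b):
        raise ValueError("captures must contain the same number of pixels")
    return mean_luma(frame_a) - mean_luma(frame_b)
```

A small positive offset for one capture over another would be consistent with it looking a shade lighter, although an average over the whole frame cannot show where in the image the difference sits.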

In this last scene, the camera is slowly panning as Gwyneth looks up. AMD's color saturation makes much more sense here, as the scene appears more vivid. With that said, the brightness is very aggressive and tends to wash out a lot of detail. Nvidia is overly sharp. You get more detail, but overall, the picture looks grainier. Intel is the one with the upper hand here; its image shows a good balance between detail and sharpness.

  • spoiled1
    Tom,
    You have been around for over a decade, and you still haven't figured out the basics of web interfaces.

    When I want to open an image in a new tab using Ctrl+Click, that's what I want to do, I do not want to move away from my current page.

    Please fix your links.
    Thanks
  • spammit
    omgf, ^^^this^^^.

    I signed up just to agree with this. I've been reading this site for over 5 years and I have hoped and hoped that this site would change to accommodate the user, but, clearly, that's not going to happen. Not to mention all the spelling and grammar mistakes in the recent year. (Don't know about this article, didn't read it all).

    I didn't even finish reading the article and looking at the comparisons because of the problem sploiled1 mentioned. I don't want to click on a single image 4 times to see it fullsize, and I certainly don't want to do it 4 times (mind you, you'd have to open the article 4 separate times) in order to compare the images side by side (alt-tab, etc).

    Just abysmal.
  • cpy
    THW have worst image presentation ever, you can't even load multiple images so you can compare them in different tabs, could you do direct links to images instead of this bad design?
  • ProDigit10
    I would say not long from here we'll see encoders doing video parallel encoding by loading pieces between keyframes. keyframes are tiny jpegs inserted in a movie preferably when a scenery change happens that is greater than what a motion codec would be able to morph the existing screen into.
    The data between keyframes can easily be encoded in a parallel pipeline or thread of a cpu or gpu.
    Even on mobile platforms integrated graphics have more than 4 shader units, so I suspect even on mobile graphics cards you could run as much as 8 or more threads on encoding (depending on the gpu, between 400 and 800 Mhz), that would be equal to encoding a single thread video at the speed of a cpu encoding with speed of 1,6-6,4GHz, not to mention the laptop or mobile device still has at least one extra thread on the CPU to run the program, and operating system, as well as arrange the threads and be responsible for the reading and writing of data, while the other thread(s) of a CPU could help out the gpu in encoding video.

    The only issue here would be B-frames, but for fast encoding video you could give up 5-15MB video on a 700MB file due to no B-frame support, if it could save you time by processing threads in parallel.
  • intelx
    first thanks for the article i been looking for this, but your gallery really sucks, i mean it takes me good 5 mins just to get 3 pics next to each other to compare , the gallery should be updated to something else for fast viewing.
  • _Pez_
    Ups ! for tom's hardware's web page :P, Fix your links. :) !. And I agree with them; spoiled1 and spammit.
  • AppleBlowsDonkeyBalls
    I agree. Tom's needs to figure out how to properly make images accessible to the readers.
  • kikireeki
    spoiled1: "Tom, You have been around for over a decade, and you still haven't figured out the basics of web interfaces. When I want to open an image in a new tab using Ctrl+Click, that's what I want to do, I do not want to move away from my current page. Please fix your links. Thanks"
    and to make things even worse, the new page will show you the picture with the same thumbnail size and you have to click on it again to see the full image size, brilliant!
  • acku
    Apologies to all. There are things I can control in the presentation of an article and things that I cannot, but everyone here has given fair criticism. I agree that right click and opening to a new window is an important feature for articles on image quality. I'll make sure Chris continues to push the subject with the right people.

    Web dev is a separate department, so we have no ability to influence the speed at which a feature is implemented. In the meantime, I've uploaded all the pictures to ZumoDrive. It's packed as a single download. http://www.zumodrive.com/share/anjfN2YwMW

    Remember to view pictures in the native resolution to avoid scalers.

    Cheers
    Andrew Ku
    TomsHardware.com
  • Reynod
    An excellent read though Andrew.

    Please give us an update in a few months to see if there has been any noticeable improvements ... keep your base files for reference.

    I would imagine Quicksynch is now a major plus for those interested in rendering ... and AMD and NVidia have some work to do.

    I appreciate the time and effort you put into the research and the depth of the article.

    Thanks,

    :)