Full Blu-ray Transcoding Speed: APP Versus CUDA Versus Quick Sync

We now know that there are clear differences between hardware-accelerated decoders, and even between software-based decoders. But what about encoding? That's what started this foray into image quality, after all.

| | AMD Radeon HD 6970 | Nvidia GeForce GTX 580 |
|---|---|---|
| Manufacturing Process | 40 nm TSMC | 40 nm TSMC |
| Die Size | 389 mm² | 520 mm² |
| Transistors | 2.64 billion | 3 billion |
| Engine Clock | 880 MHz | 772 MHz |
| Stream Processors / CUDA Cores | 1536 | 512 |
| Compute Performance | 2.7 TFLOPS | 1.58 TFLOPS |
| Texture Units | 96 | 64 |
| Texture Fillrate | 84.5 Gtex/s | 49.4 Gtex/s |
| ROPs | 32 | 48 |
| Pixel Fillrate | 28.2 Gpix/s | 37.1 Gpix/s |
| Frame Buffer | 2 GB GDDR5 | 1.5 GB GDDR5 |
| Memory Clock | 1375 MHz | 1002 MHz |
| Memory Bandwidth | 176 GB/s (256-bit) | 192 GB/s (384-bit) |
| Maximum Board Power | 250 W | 244 W |
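The derived rows in this table (compute performance, fillrates, and memory bandwidth) follow from unit counts and clocks. The Python sketch below reproduces them under assumptions the table itself does not spell out: each shader performs one multiply-add (two FLOPs) per clock, GDDR5 moves four data transfers per listed memory clock, and the GTX 580's CUDA cores run at a 1544 MHz shader clock, twice its 772 MHz engine clock. The `Gpu` class and its method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    """Spec-sheet inputs for one card (hypothetical helper, not a vendor API)."""
    name: str
    engine_mhz: float
    shader_mhz: float      # assumption: equals engine clock on Cayman, 2x on GF110
    shaders: int
    texture_units: int
    rops: int
    memory_mhz: float
    bus_width_bits: int

    def tflops(self) -> float:
        # Assumption: peak = shaders x 2 FLOPs (one multiply-add) x shader clock
        return self.shaders * 2 * self.shader_mhz / 1e6

    def texture_fill_gtex(self) -> float:
        # One texel per texture unit per engine clock
        return self.texture_units * self.engine_mhz / 1e3

    def pixel_fill_gpix(self) -> float:
        # One pixel per ROP per engine clock
        return self.rops * self.engine_mhz / 1e3

    def bandwidth_gbs(self) -> float:
        # Assumption: GDDR5 makes 4 transfers per listed memory clock;
        # bus width is converted from bits to bytes
        return self.memory_mhz * 4 * (self.bus_width_bits / 8) / 1e3


cards = [
    Gpu("Radeon HD 6970", 880, 880, 1536, 96, 32, 1375, 256),
    Gpu("GeForce GTX 580", 772, 1544, 512, 64, 48, 1002, 384),
]

for gpu in cards:
    print(f"{gpu.name}: {gpu.tflops():.2f} TFLOPS, "
          f"{gpu.texture_fill_gtex():.1f} Gtex/s, "
          f"{gpu.pixel_fill_gpix():.1f} Gpix/s, "
          f"{gpu.bandwidth_gbs():.0f} GB/s")
```

The point of the exercise is that the Radeon's higher theoretical compute comes from three times as many (simpler) shaders, while the GeForce leads on ROP throughput and memory bandwidth thanks to its wider bus.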

We're using the best consumer cards money can buy, AMD's Radeon HD 6970 and Nvidia's GeForce GTX 580.

Full BDAV, 31.2 GB H.264 (HH:MM:SS)

| Settings | AMD | Nvidia | Intel Performance | Intel Quality |
|---|---|---|---|---|
| Hardware Decode & Hardware/GPGPU Encode | 1:24:00 | 0:49:34 | 0:19:35 | 0:23:22 |
| Hardware Decode & Software Encode | 0:47:55 | 0:49:38 | 0:35:21 | 0:46:13 |
| Software Decode & GPGPU/Hardware Encode | 1:01:17 | 0:50:21 | 0:48:17 | 0:48:41 |
| Software Decode & Encode | 1:04:26 | 1:04:22 | 0:55:38 | 1:05:20 |

When it comes to transcoding entire videos, MediaEspresso was the only program that would accept our full 31.2 GB unprotected Blu-ray copy of Iron Man. MediaConverter 7 and Badaboom both returned audio codec errors, as neither application recognizes Dolby TrueHD. Separately, it is important to point out that Quick Sync forces you to choose a Performance or Quality setting. That choice is unavailable if you are running an Nvidia- or AMD-based card.

According to our benchmarks, the biggest bottleneck is the decode stage. If you enable APP or CUDA encoding alone, there is a small gain (more so for CUDA), but the biggest benefit comes from turning on hardware-accelerated decoding. On the Radeon card, enabling APP encoding together with UVD 3 decoding actually appears to be the worst thing you can do for performance: at 1:24:00, every other combination of settings is faster, including software-only. With CUDA, enabling both hardware settings yields a measly four-second gain over PureVideo decoding alone.
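To make the comparison above concrete, the following Python sketch converts the AMD and Nvidia times from the full-movie table into seconds and reports how much time each mode saves relative to the software-only run (negative means slower). The mode labels and helper names are shorthand introduced here, not MediaEspresso's own terminology.

```python
def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string from the table into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fmt(seconds: int) -> str:
    """Format a signed number of seconds as +M:SS or -M:SS."""
    sign = "-" if seconds < 0 else "+"
    seconds = abs(seconds)
    return f"{sign}{seconds // 60}:{seconds % 60:02d}"


# Times copied from the 31.2 GB full-movie table (MediaEspresso)
FULL_MOVIE = {
    "AMD": {
        "HW decode + APP encode": "1:24:00",
        "HW decode + SW encode": "0:47:55",
        "SW decode + APP encode": "1:01:17",
        "SW decode + SW encode": "1:04:26",
    },
    "Nvidia": {
        "HW decode + CUDA encode": "0:49:34",
        "HW decode + SW encode": "0:49:38",
        "SW decode + CUDA encode": "0:50:21",
        "SW decode + SW encode": "1:04:22",
    },
}

BASELINE = "SW decode + SW encode"


def savings(times: dict) -> dict:
    """Seconds saved by each mode versus the software-only baseline."""
    base = to_seconds(times[BASELINE])
    return {mode: base - to_seconds(t)
            for mode, t in times.items() if mode != BASELINE}


for vendor, times in FULL_MOVIE.items():
    for mode, saved in savings(times).items():
        print(f"{vendor:6} {mode:24} {fmt(saved)} vs. software only")
```

Laid out this way, the Radeon's combined UVD 3 plus APP run is the only entry that loses time against the software-only baseline, while the GeForce's three accelerated modes land within about 50 seconds of each other.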

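Intel's Quick Sync hardware demonstrates much more impressive numbers, finishing the movie in 0:19:35 (Performance) and 0:23:22 (Quality) with both decode and encode accelerated. However, the Quality setting scales much less aggressively, most visibly with hardware decode and software encode (0:46:13 versus 0:35:21 in Performance mode). What we're seeing there is the effect of using a lower bitrate for software-based encoding.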

665 MB H.264 BDAV/M2TS Transcode (MM:SS)

AMD Radeon HD 6970

| Transcode Application | MediaEspresso | MediaConverter |
|---|---|---|
| Hardware Decode & APP Encode | 2:29 | 1:40 |
| Hardware Decode & Software Encode | 2:28 | - |
| Software Decode & APP Encode | 1:57 | - |
| Software Decode & Encode | 2:41 | 1:22 |

Nvidia GeForce GTX 580

| Transcode Application | MediaEspresso | MediaConverter |
|---|---|---|
| Hardware Decode & CUDA Encode | 1:37 | 1:06 |
| Hardware Decode & Software Encode | 1:50 | - |
| Software Decode & CUDA Encode | 2:02 | - |
| Software Decode & Encode | 2:41 | 1:22 |

Intel HD Graphics 3000 (Core i5-2500K)

| Transcoding Application | MediaEspresso Performance | MediaEspresso Quality | MediaConverter |
|---|---|---|---|
| Hardware Decode & Quick Sync Encode | 0:46 | 0:56 | 1:09 |
| Hardware Decode & Software Encode | 1:26 | 2:22 | - |
| Software Decode & Quick Sync Encode | 2:10 | 2:07 | - |
| Software Decode & Encode | 2:10 | 2:43 | 1:24 |

Even though our 665 MB H.264/AC3 BDAV clip is encoded at the same bitrate as our 31.2 GB movie, hardware-accelerated encoding yields much better results on the Radeon HD 6970 and GeForce GTX 580. Intel's Quick Sync-enabled Core i5-2500K only sees a substantial gain from encode acceleration alone under the Quality setting (2:07 versus 2:43); in Performance mode, software decoding with Quick Sync encoding ties the software-only run at 2:10.
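The following Python sketch quantifies the encode-only comparison for the 665 MB clip in MediaEspresso: for each platform, it takes the software-decode-plus-accelerated-encode time and the software-only time from the tables and computes the fraction of the software-only time saved. The platform labels and function names are shorthand introduced here.

```python
def mmss_to_seconds(mmss: str) -> int:
    """Convert an M:SS string from the clip tables into seconds."""
    minutes, seconds = (int(part) for part in mmss.split(":"))
    return minutes * 60 + seconds


# (software decode + accelerated encode, software decode + software encode),
# MediaEspresso times copied from the 665 MB clip tables
CLIP_MEDIAESPRESSO = {
    "Radeon HD 6970 (APP)": ("1:57", "2:41"),
    "GeForce GTX 580 (CUDA)": ("2:02", "2:41"),
    "Core i5-2500K Performance (Quick Sync)": ("2:10", "2:10"),
    "Core i5-2500K Quality (Quick Sync)": ("2:07", "2:43"),
}


def encode_only_gain(accelerated: str, software: str) -> float:
    """Fraction of the software-only time saved by accelerating just the encode."""
    sw = mmss_to_seconds(software)
    return (sw - mmss_to_seconds(accelerated)) / sw


for platform, (accel, sw) in CLIP_MEDIAESPRESSO.items():
    print(f"{platform:40} {encode_only_gain(accel, sw):6.1%}")
```

Expressing the gain as a fraction rather than in raw seconds keeps the Intel Quality row comparable, since its software-only baseline (2:43) differs from the Performance one (2:10).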

Note that ArcSoft's MediaConverter uses either hardware- or software-based transcoding; it doesn't offer the granular decode and encode control found in CyberLink's MediaEspresso. Overall, though, we see much faster transcode times with MediaConverter. If we just go by the numbers, MediaConverter also appears to be better optimized for multithreaded performance: in software-only mode it finishes the clip in 1:22, more than a full minute ahead of MediaEspresso's 2:41.

Using this smaller clip, AMD's APP is finally faster than the pure software route in MediaEspresso, but not by much. Unfortunately, using APP in MediaConverter is still slower than the CPU-only option (1:40 versus 1:22). The suggestion here is that you'd see better results on an older machine pairing a slower processor with a modern graphics card. The new Core i5-2500K is simply too fast.

Overall, Quick Sync soundly trounces CUDA and APP for encoding and decoding performance. But the differences between AMD and Nvidia are less apparent given such mixed results. Is this simply the result of using high-bitrate Blu-ray source files?