Next-Gen Video Encoding: x265 Tackles HEVC/H.265
1. Introducing HEVC And x265

So much of what we do at Tom’s Hardware depends on an evolving benchmark suite. Sometimes I put up news stories or Twitter posts asking for what you want to see from our reviews, and we’ve added a ton of testing based on that feedback. But we also keep up with industry trends and adopt testing for taxing new technologies as soon as we can.

Now, you’re already familiar with the H.264 video codec, which is instrumental in compressing high-definition video for distribution. Most of the devices you watch movies on employ fixed-function logic to accelerate decoding of H.264-based content, minimizing the host processor workload and, at least on mobile devices, extending battery life. But high-quality software-based encoding can still be pretty taxing, which is why we have Adobe’s Media Encoder, HandBrake, and TotalCode Studio in our standard benchmark suite.

What’s the point of three different benchmarks that involve H.264? As it turns out, each encoding algorithm is different, and at a given quality level, bit rate can vary quite a bit. The following chart, which comes from a comparison conducted by Lomonosov Moscow State University’s Graphics and Media Lab, demonstrates the x264 encoder’s efficiency compared to other popular options.

x264 benefits from years of development and optimization. It’s freely available under the terms of the GNU GPL for internal use, or you can license it commercially if your company is concerned about linking proprietary applications to GPL code. So, big companies like Netflix, Hulu, Amazon, and YouTube are leveraging it to get more quality from lower-bit-rate files, preserving bandwidth and delivering a better experience. Meanwhile, enthusiasts and power users get to use it at home without paying anything, and open source front-ends like HandBrake employ it for H.264-based encoding.

But of course, we’re entering this era of higher-definition displays, higher dynamic range, and larger color space, all of which have to be represented by more data. That means larger video files if you want better quality. You can already see how streaming the nicest-looking content is getting increasingly bandwidth-intensive. Fortunately, the standard for H.264’s successor, High Efficiency Video Coding, was recently published. It’s more computationally intensive, but should increase coding efficiency dramatically compared to H.264.

Instead of H.264’s 16x16-pixel macroblocks, HEVC employs something called a Coding Tree Unit that can be as large as 64x64, describing less complex areas more efficiently. Even so, 1080p encodes are expected to be five to 10 times more taxing, while 4K video multiplies those demands by another 4 to 16x. Fortunately, a lot of effort went into making sure that encoding can be parallelized, and I’ll illustrate the impact of this shortly.
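To put those block sizes in perspective, here is a minimal sketch that counts how many blocks it takes to cover one frame. It assumes a 1920x1080 frame and treats 4K as 3840x2160 (my assumption; the text above doesn't name a specific 4K resolution). The function name is purely illustrative and has nothing to do with x265's internals.

```python
import math

def block_count(width, height, block_size):
    """Number of block_size x block_size blocks needed to cover a frame.

    Partial blocks at the right and bottom edges still count as whole blocks.
    """
    return math.ceil(width / block_size) * math.ceil(height / block_size)

WIDTH, HEIGHT = 1920, 1080  # 1080p frame

macroblocks = block_count(WIDTH, HEIGHT, 16)  # H.264 macroblock size
ctus = block_count(WIDTH, HEIGHT, 64)         # largest HEVC Coding Tree Unit

print(f"16x16 macroblocks per 1080p frame: {macroblocks}")
print(f"64x64 CTUs per 1080p frame:        {ctus}")

# Assumed 4K resolution: 3840x2160, i.e. four times the pixels of 1080p.
print(f"4K-to-1080p pixel ratio: {(3840 * 2160) // (WIDTH * HEIGHT)}x")
```

A 1080p frame needs 8160 macroblocks but only 510 of the largest CTUs, which is where the efficiency in flat, low-detail regions comes from. The 4x pixel ratio also lines up with the low end of the 4 to 16x estimate for 4K.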

How, you ask? Today, MulticoreWare (the company responsible for creating an OpenCL-accelerated version of x264 for Telestream’s Episode Encoder) is making pre-alpha code for its HEVC encoder available at Bitbucket. Its commercially-funded project began earlier this year, and it’ll employ the same business model as x264, meaning you can download and compile x265 under the GNU GPL as well. Leveraging source code from x264 (and indeed, with that project’s lead developer as an adviser), MulticoreWare is hoping to see x265 become a true successor.

2. Benchmarking Pre-Alpha x265

We got our hands on the early code and were able to run some preliminary numbers. Again, the developers say their code is still very early. It’s currently x86-only (albeit with support for advanced instruction sets like AVX2), lacks B-frame support (preventing it from achieving maximum compression), and doesn’t yet include some of the optimizations taken from x264 (like look-ahead and rate control). However, the features it does offer are good enough for some Core i7-4770K benchmarking as a precursor to what you might see in an upcoming processor review.

In the interest of transparency, we're using the following benchmark command:

x265 --input Kimono1_1920x1080_24.yuv --width 1920 --height 1080 --rate 24 -f 240 -o q24_Kimono1.out --rect --max-merge 1 --hash 1 --wpp --gops 4 --tu-intra-depth 1 --tu-inter-depth 2 --no-tskip

We ran it with quantization parameters between 24 and 42. It's notable that we're employing GOP (Group of Pictures)-level parallelism to keep our quad-core Core i7-4770K busy. I'm also adding the --cpuid switch to control the instruction sets being used.

The first observation we’re able to make is that x265 currently sees the most benefit from optimizations related to SSE3 and then SSE4.1. Gains attributable to AVX and AVX2 are still relatively small. According to the lead developer, the one part of the encoder utilizing AVX2, motion search, isn’t particularly performance-critical.

The x265 team is hoping to achieve real-time 1080p30 encoding on a Xeon-based server with 16 physical cores by next month. At this early stage, however, we’re still under 4 FPS using a quad-core Core i7-4770K.

Performance scales up as the quantization parameter rises, at the expense of quality. Naturally, as you approach the maximum QP of 51 (we’re only testing up to 42), bit rates drop and the encoder’s speed increases.
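If you're wondering why a higher QP means smaller files, it helps to know that in both H.264 and HEVC the quantizer step size roughly doubles every time QP goes up by 6. The sketch below uses the common approximation Qstep = 2^((QP - 4) / 6). It is an illustration of that relationship only, not code from x265, and the function name is my own.

```python
def qstep(qp):
    """Approximate quantizer step size for a given QP (valid range 0..51)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be between 0 and 51")
    # Step size doubles for every increase of 6 in QP; QP 4 maps to a step of 1.
    return 2 ** ((qp - 4) / 6)

base = qstep(24)  # lowest QP in our test range
for qp in (24, 30, 36, 42, 51):
    step = qstep(qp)
    print(f"QP {qp:2d}: step ~{step:7.1f} ({step / base:5.1f}x the QP 24 step)")
```

Going from QP 24 to QP 42 is three doublings, so the coarsest setting we test quantizes with a step roughly eight times larger than the finest one. Fewer surviving coefficients means fewer bits to write and less work for the encoder.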

Although the instruction set in use affects performance and encode time, the bit rate at a given QP does not vary, with one exception. For some reason, we observed a very slightly higher rate with AVX2 enabled, which is why you see the red line peeking out above the other overlapped results.

You can draw conclusions about video quality two ways: objectively, using mathematical models, or subjectively, by looking at two clips and creating your own evaluation. PSNR, or peak signal-to-noise ratio, is the most commonly used objective technique. It expresses, in decibels, the ratio between the maximum possible signal value and the error introduced by compressing the reference image (in our case a raw YUV file). This isn’t a perfect representation of quality, but, in general, a higher PSNR corresponds to an output more representative of the original.
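For the curious, PSNR is simple enough to compute by hand. Here's a minimal sketch using the standard definition, 10 * log10(MAX^2 / MSE), on 8-bit samples. The sample values are made up purely to exercise the function; they are not taken from our Kimono1 encodes, and real tools work on whole frames rather than eight numbers.

```python
import math

def psnr(reference, compressed, max_value=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(compressed):
        raise ValueError("inputs must have the same number of samples")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")
    # Mean squared error between the reference and the compressed samples.
    mse = sum((r - c) ** 2 for r, c in zip(reference, compressed)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no error at all
    return 10 * math.log10(max_value ** 2 / mse)

# Made-up 8-bit luma samples, for illustration only.
original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [53, 55, 60, 67, 70, 62, 64, 72]

print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

Because the scale is logarithmic, every halving of the mean squared error adds about 3 dB, which is why small-looking PSNR differences between encoders can still matter.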

Again, we see consistency across each instruction set, except for AVX2, which dips a bit at the highest QPs.

3. x265 Versus x264 And CPU Utilization

The idea is that HEVC should allow you to encode video at similar quality levels and lower bit rates. MulticoreWare is currently estimating 25 to 35% lower bit rates at a given PSNR. Its developers are also adamant that encoding efficiency will improve further as they add the features still missing from the pre-alpha code, such as B-frames, look-ahead, and rate control.

With that in mind, let’s take a quick peek at how pre-alpha x265 stacks up against the well-optimized x264. For this one, we're using the following command:

x264 --preset placebo --sar 1920:1080 --fps 24 --frames 500 --psnr -o x264Kimonoq24.264 Kimono1_1920x1080_24.yuv

Again, we ran it with quantization parameters between 24 and 42. We could have used the --tune psnr switch to generate higher values, though this negatively affects subjective quality compared to the settings used here.

At each point on the curve, you can either get better quality for a given bit rate from x265, or the same quality at a lower bit rate. 
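One way to turn "same quality at a lower bit rate" into a number is to pick a PSNR, read the bit rate off each encoder's curve, and compare. The sketch below does that with linear interpolation on a logarithmic rate axis. Everything in it is an assumption for illustration: the function names are mine, and the two curves are placeholder values, not measurements from our x264 and x265 runs.

```python
import math

def bitrate_at_psnr(curve, target_psnr):
    """Interpolate the bit rate (kbps) at target_psnr on a rate-distortion curve.

    curve is a list of (bitrate_kbps, psnr_db) points. Interpolation is linear
    in log(bitrate) between the two measured points that bracket the target.
    """
    points = sorted(curve, key=lambda p: p[1])
    for (r0, q0), (r1, q1) in zip(points, points[1:]):
        if q0 <= target_psnr <= q1:
            if q1 == q0:
                return r0  # degenerate segment: both points share one PSNR
            t = (target_psnr - q0) / (q1 - q0)
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    raise ValueError("target PSNR lies outside the measured curve")

def bitrate_saving(reference_curve, test_curve, target_psnr):
    """Percentage bit rate saved by test_curve versus reference_curve."""
    ref = bitrate_at_psnr(reference_curve, target_psnr)
    test = bitrate_at_psnr(test_curve, target_psnr)
    return 100 * (1 - test / ref)

# Placeholder curves, NOT data from this article: (bitrate_kbps, psnr_db).
curve_a = [(1000, 34.0), (2000, 36.0), (4000, 38.0), (8000, 40.0)]
curve_b = [(700, 34.0), (1400, 36.0), (2800, 38.0), (5600, 40.0)]

print(f"Saving at 37 dB: {bitrate_saving(curve_a, curve_b, 37.0):.1f}%")
```

Comparing at a single PSNR only tells you about one point. Averaging the saving across the whole overlapping quality range gives a fairer picture when the two curves aren't parallel.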

Our test workload, encoded by x264

Same workload with GOP-level parallelism enabled in x265

We know that other HEVC encoders will emerge, and our benchmark suite will likely evolve to phase out the several H.264-based tests as this happens. But more immediately, we’re excited to have x265 as the first in our test lab, taxing high-end hardware more intensively than most of the other metrics we use. In fact, I screen-capped both workloads in action and consistently saw x265 pegging our Haswell-based test mule at 100% utilization, while x264 bounced around quite a bit more.

By the time you read this, MulticoreWare should have more information up at x265.org.