Next-Gen Video Encoding: x265 Tackles HEVC/H.265

Benchmarking Pre-Alpha x265

We got our hands on the early code and were able to run some preliminary numbers. Again, the developers say their code is still very early. It’s currently x86-only (albeit with support for advanced instruction sets like AVX2), lacks B-frame support (preventing it from achieving maximum compression), and doesn’t yet include some of the optimizations taken from x264 (like look-ahead and rate control). However, the features it does offer are good enough for some Core i7-4770K benchmarking as a precursor to what you might see in an upcoming processor review.

In the interest of transparency, we're using the following benchmark command: x265 --input Kimono1_1920x1080_24.yuv --width 1920 --height 1080 --rate 24 -f 240 -o q24_Kimono1.out --rect --max-merge 1 --hash 1 --wpp --gops 4 --tu-intra-depth 1 --tu-inter-depth 2 --no-tskip, with quantization parameters between 24 and 42. It's notable that we're employing GOP (group of pictures)-level parallelism to keep our quad-core -4770K busy. We're also adding the --cpuid switch to control the instruction sets being used.
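For readers who want to script a similar sweep, here is a minimal sketch that only builds and prints the command lines. It makes several assumptions that are not in the article: the QP is passed with a --qp flag, the output file name tracks the QP, and the sweep steps by 6. The --cpuid switch is left out because its accepted values are not stated above.

```python
import shlex


def build_command(qp, qp_flag="--qp"):
    """Return the argv list for one encode at the given QP.

    All flags except the last pair are copied from the command quoted above.
    qp_flag is an ASSUMPTION: the article does not say how the QP was passed.
    """
    return [
        "x265", "--input", "Kimono1_1920x1080_24.yuv",
        "--width", "1920", "--height", "1080", "--rate", "24",
        "-f", "240", "-o", f"q{qp}_Kimono1.out",
        "--rect", "--max-merge", "1", "--hash", "1", "--wpp", "--gops", "4",
        "--tu-intra-depth", "1", "--tu-inter-depth", "2", "--no-tskip",
        qp_flag, str(qp),
    ]


def build_sweep(qps):
    """Return one command per QP value."""
    return [build_command(qp) for qp in qps]


if __name__ == "__main__":
    # Hypothetical sweep: the article only says QPs ranged from 24 to 42.
    for argv in build_sweep(range(24, 43, 6)):
        print(shlex.join(argv))
```

Printing the commands rather than running them keeps the sketch independent of any particular x265 build.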

The first observation we’re able to make is that x265 currently sees the most benefit from optimizations related to SSE3 and then SSE4.1. Gains attributable to AVX and AVX2 are still relatively small. According to the lead developer, the one part of the encoder utilizing AVX2, motion search, isn’t particularly performance-critical.

The x265 team is hoping to achieve real-time 1080p30 encoding on a Xeon-based server with 16 physical cores by next month. At this early stage, however, we’re still under 4 FPS using a quad-core -4770K.
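To put that gap in perspective, the following back-of-the-envelope sketch estimates how much faster the encoder would need to become. It assumes throughput scales linearly with core count and ignores clock-speed and architectural differences between the two machines, so treat the result as a rough lower bound rather than a prediction.

```python
TARGET_FPS = 30.0    # real-time 1080p30 goal stated by the x265 team
MEASURED_FPS = 4.0   # upper bound: we measured "under 4 FPS"
TEST_CORES = 4       # Core i7-4770K
TARGET_CORES = 16    # Xeon-based server named in the goal


def required_speedup(target_fps, measured_fps):
    """Overall throughput multiplier needed to reach the target."""
    return target_fps / measured_fps


def per_core_speedup(total_speedup, cores_now, cores_target):
    """Speedup each core must deliver, ASSUMING linear scaling with cores."""
    return total_speedup / (cores_target / cores_now)


total = required_speedup(TARGET_FPS, MEASURED_FPS)
per_core = per_core_speedup(total, TEST_CORES, TARGET_CORES)

print(f"Overall speedup needed: at least {total:.2f}x")
print(f"Per-core speedup needed (linear scaling): at least {per_core:.3f}x")
```

With these inputs, the overall gap is at least 7.5x, of which a fourfold increase in cores would cover a factor of four under ideal scaling.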

Performance scales up as the quantization parameter rises, at the expense of quality. Naturally, as you approach the maximum QP of 51 (we’re only testing up to 42), bit rates drop and the encoder’s speed increases.
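As a rough illustration of why bit rates fall as QP rises: in H.264 and HEVC, the quantizer step size approximately doubles for every increase of 6 in QP. The sketch below uses the common approximation Qstep = 2^((QP - 4) / 6). The function name is ours, and the encoder's actual integer scaling tables are not reproduced here.

```python
def approx_qstep(qp):
    """Approximate quantizer step size for a given QP.

    Uses the widely quoted rule of thumb that the step size doubles
    every 6 QP and equals 1.0 at QP 4. Real encoders use integer
    scaling tables, so this is an approximation only.
    """
    return 2.0 ** ((qp - 4) / 6.0)


if __name__ == "__main__":
    for qp in (24, 30, 36, 42):
        print(f"QP {qp}: step size ~{approx_qstep(qp):.1f}")
    ratio = approx_qstep(42) / approx_qstep(24)
    print(f"Step size at QP 42 is ~{ratio:.0f}x the step size at QP 24")
```

Across our tested range of 24 to 42, the step size grows by roughly a factor of eight, so far fewer distinct coefficient values survive quantization at the top of the range.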

Although QP changes affect performance and encode time, the bit rate at a given QP does not vary from one instruction set to the next, with one exception. For some reason, we observed a very slightly higher bit rate with AVX2 enabled, which is why you see the red line peeking out above the other overlapped results.

You can draw conclusions about video quality in two ways: objectively, using mathematical models, or subjectively, by looking at two clips and making your own evaluation. PSNR, or peak signal-to-noise ratio, is the most commonly used objective technique. It describes the ratio between the peak signal of the reference image (in our case, a raw YUV file) and the error introduced by compressing it. This isn’t a perfect representation of quality, but, in general, a higher PSNR corresponds to an output more representative of the original.
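For reference, PSNR is conventionally computed as 10 * log10(MAX^2 / MSE), where MAX is the largest possible sample value (255 for 8-bit video) and MSE is the mean squared error between the reference and the decoded samples. The sketch below works on plain lists of sample values. The function names and sample data are illustrative, and this is not the code x265 uses internally.

```python
import math


def mean_squared_error(reference, decoded):
    """Mean squared difference between two equal-length sample sequences."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return total / len(reference)


def psnr(reference, decoded, max_value=255):
    """PSNR in dB; returns infinity when the two inputs are identical."""
    mse = mean_squared_error(reference, decoded)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical 8-bit luma samples, not taken from the Kimono1 clip.
    original = [52, 55, 61, 59]
    compressed = [54, 55, 60, 58]
    print(f"PSNR: {psnr(original, compressed):.2f} dB")
```

In practice the calculation is run per plane (Y, U, and V) over every frame, and the per-frame results are then averaged.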

Again, we see consistency across each instruction set, except for AVX2, which dips a bit at the highest QPs.

32 comments
  • Jindrich Makovicka
    In addition to PSNR comparison, I'd be much more interested in the SSIM metric, which is better suited for codecs using psychovisual optimizations.

    PSNR can be usable when testing varying parameters for one codec, but not so much when comparing two completely different codecs.
    3
  • CaedenV
    Nice intro to the new codec!
    And to think that this is unoptimized... Once this is finalized it will really blow 264 out of the water and open new doors for 4K content streaming, or 1080p streaming with much better detail and contrast. This is especially important with the jump to 4K video. The 16x16 grouping limit on x264 is great for 1080p, but with 4K and 8K coming down the pipe in the industry we need something better. The issue is that we really do not have many more objects on the screen than we did back in the days of 480i video; it is merely that each object is more detailed. Funny thing is that a given object will typically have more homogeneous data across its surface area, and when you jump from 1080p to 4K (or 8K as is being done for movies) it takes a lot more 16x16 groupings, which may all relay the same information if they are describing a large simple object. Moving up to 64x64 alone allows for 8K groupings that take up the same percentage of the screen as 16x16 groupings do in 1080p.
    4
  • nibir2011
    Considering the CPU load, I think it won't be a viable solution for almost any home user within the next 2-3 years unless CPUs get exceptionally fast.

    Of course then we have the Quantum Computer. ;)
    -3
  • Shawna593767
    Quantum computers aren't fast enough for this; they get their speed by doing fewer calculations.
    For instance a faster per clock x86 computer might have to do say 10 million calculations to find something, whereas the quantum computer is slower per clock but would only need 100,000 calculations.
    4
  • Cryio
    At the rate Intel is NOT improving their CPUs, quantum computers are far, far away
    3
  • nibir2011
    694354 said:
    Quantum computers aren't fast enough for this; they get their speed by doing fewer calculations. For instance a faster per clock x86 computer might have to do say 10 million calculations to find something, whereas the quantum computer is slower per clock but would only need 100,000 calculations.



    Well, a practical quantum computer does not exist. lol

    I think that is not the case with calculation; I think what you mean is accuracy. The number of calculations won't be different; it will be how many times the same calculations need to be done. In theory, a quantum computer [whatever qubits] should be able to make perfect calculations, as it can get all the possible results by parallelism of bits [long stuff]. A normal CPU can't do that; it has to evaluate each result separately. So a quantum computer is far more efficient than any traditional CPU. Speed is different; it depends on both algorithm and architecture. Quantum algorithms are in their infancy. Last year, maybe, a quantum algorithm for finding primes was theorized. I do not know if we will see a quantum computer capable of doing what regular computers do in the next 30 years.

    thanks
    0
  • InvalidError
    Most of the 10-bit HDR files I have seen seem to be smaller than their 8-bit encodes for a given quality. I'm guessing this is due to lower quantization error: less bandwidth wasted on fixing color and other cumulative errors and noise over time.
    2
  • ddpruitt
    I know it's a minor detail but it's important:

    H.264 and H.265 are NOT encoding standards, they are DECODING standards. The standards don't care how the video is encoded, just how it's decoded. I think it should be made clear because the article implies they are encoding standards and people incorrectly assume one implies the other. x264 and x265 are just open source encoders that encode to formats that can be decoded properly by H.264 and H.265.

    x264 has noticeable issues with blacks, they tend to come out grey. I would like to see if x265 resolves the problem. I would also like to see benchmarks on the decoding end (CPU Load, power usage, etc) as I see this becoming an issue in the future with streaming video on mobile devices and laptops.
    7
  • chuyayala
    I truly hope this is optimized for OpenCL encoding in the future.
    1
  • Nintendo Maniac 64
    You guys should really include VP9 in here as well, since unlike VP8 it's actually competitive according to the most recent testing done on the Doom9 forums, though apparently the reference encoder's 2-pass mode is uber slow.
    1
  • InvalidError
    764073 said:
    I know it's a minor detail but it's important: H.264 and H.265 are NOT encoding standards, they are DECODING standards.

    h264/h265 are methods of storing compressed video.

    While the exact method of encoding and decoding it are at the individual algorithm developers' sole discretion, the structures and core algorithms related to how information ultimately needs to be structured dictate a fair chunk of how BOTH the encode and decode algorithms need to work.

    Most of the decode steps are almost exact inverse transforms of their corresponding encode steps. That's why you have reference encoders and decoders to prove that every encoding step can be reversed by its corresponding decoding step. Saying that encode is unrelated to decode is very naive; they are very closely related - at least in reference implementations.

    Some programmers may find shortcuts through the reference designs or ways to combine multiple steps into one or find other ways to achieve the same result for a given step or group of steps but the overall encode and decode algorithms usually retain the same general flow.
    3
  • ElMoIsEviL
    694354 said:
    Quantum computers aren't fast enough for this; they get their speed by doing fewer calculations. For instance a faster per clock x86 computer might have to do say 10 million calculations to find something, whereas the quantum computer is slower per clock but would only need 100,000 calculations.


    Quantum Computers do less calculations by virtue of being able to calculate every possible answer simultaneously (as the amount of Qbits rises the amount of possible solutions entertained in a Quantum State rises exponentially). Seems to me that this would be perfect when it comes to the concept of having to do multiple passes in terms of video encoding. You'd only have to do a single pass and during that single pass any possible outcome in terms of frame IQ can be processed simultaneously ensuring that each frame kept is the perfect frame free of errors.

    It would be quite super.
    0
  • nevilence
    Quote:
    Quantum Computers do less calculations by virtue of being able to calculate every possible answer simultaneously.

    That doesn't sound right at all? I may be wrong here, but qbits are the same in concept as a normal bit, except instead of two states it can have three, i.e. 0, 1 or 0/1. That doesn't calculate every possible answer, just one of three possible states? So two qbits can give 9 different states as opposed to classical bits, which would only give 4. Again, I may have my wires crossed here. Just sounds off to me
    0
  • jkflipflop98
    Basic quantum theory is very difficult to grasp, and understanding the operation of a quantum computer is beyond most humans. There are only a handful of people on this planet that can knowingly speak on the subject, and I'll almost guarantee none of them visit this site.
    2
  • nevilence
    48056 said:
    Basic quantum theory is very difficult to grasp, and understanding the operation of a quantum computer is beyond most humans. There are only a handful of people on this planet that can knowingly speak on the subject, and I'll almost guarantee none of them visit this site.


    It's only difficult if I observe myself trying to grasp it. Wait, does that make me a wave or a particle?
    2
  • Achoo22
    Quote:
    Basic quantum theory is very difficult to grasp, and understanding the operation of a quantum computer is beyond most humans. There are only a handful of people on this planet that can knowingly speak on the subject, and I'll almost guarantee none of them visit this site.


    So? Just as there are plenty of drivers that aren't very knowledgeable about internal combustion engines, there are plenty of expert programmers that have little knowledge of current computer organization at its lowest levels.
    0
  • Sagittaire
    Well, I know x264 very well, and I am currently testing H.264 vs. H.265. Your test has problems:
    - If you run a PSNR test, you must use the best possible settings for x264 and use --tune psnr.
    - You use quantizer mode for both x264 and x265, but x264 has much more advanced rate control in CRF mode.
    - You don't use the best possible x264 version. 10-bit x264 produces better results (at least 0.2 dB at the same bit rate).

    If you combine all these tweaks, I can say that x264 with the best settings certainly produces the same PSNR as x265. H.265 is superior to H.264, no doubt about that. But after reading your article, I can say that x265 is not really superior to x264 at this time (for PSNR results, at least).
    0
  • ElMoIsEviL
    45371 said:
    That doesn't sound right at all? I may be wrong here, but qbits are the same in concept as a normal bit, except instead of two states it can have three, i.e. 0, 1 or 0/1. That doesn't calculate every possible answer, just one of three possible states? So two qbits can give 9 different states as opposed to classical bits, which would only give 4. Again, I may have my wires crossed here. Just sounds off to me


    Quote:
    A quantum computer can be in many states simultaneously, which in turn means that it can, in some sense, perform many different calculations at the same time. To be precise, a quantum computer with four qubits could be in 2^4 (ie, 16) different states at a time. As you add qubits, the number of possible states rises exponentially. A 16-bit quantum machine can be in 2^16, or 65,536, states at once, while a 128-qubit device could occupy 3.4 x 10^38 different configurations, a colossal number which, if written out in longhand, would have 39 digits. Having been put into a delicate quantum state, a quantum computer can thus examine billions of possible answers simultaneously. (One way of thinking about this is that the machine has co-operated with versions of itself in parallel universes.)


    Hope that answers your query.
    0
  • nevilence
    The figures you are talking about are just the number of classical bits represented by qbits. Yes, 16 qbits may represent what 65k+ classical bits can, but that doesn't mean it examines all answers simultaneously. The act of measuring the spin up or down of the qbit gives a very clearly defined state; only before observation is there the possibility of "unlimited" states.

    Well, that is at least my understanding; mind you, I ain't no quantum master either. The YouTube link is my source. It's a very good watch.

    http://www.youtube.com/watch?v=g_IaVepNDT4
    0
  • Filiprino
    x265 is still unoptimized. If the performance improves, and at the same bitrate as x264 you can get rid of banding in dark scenes, enabling higher contrast, it's a welcome codec. I still do not have a CPU that can do x264 1080p in real time, and you come with this codec eating as much as 10x more cycles. I hope performance improves a lot :(
    0