'The biggest speedup I've seen so far' — FFmpeg devs boast of another 100x leap thanks to handwritten assembly code

FFmpeg update can make some operations 100x faster
(Image credit: FFmpeg)

The developers behind the FFmpeg project are again claiming major performance uplifts delivered by wielding the art of handwritten assembly code. With the latest patch applied, users should see a “100x speedup” in the cross-platform open-source media transcoding application. However, the developers were quick to clarify that the 100x claim applies to just a single function, “not the whole of FFmpeg.”

“The biggest speedup I've seen so far”

Last November, we reported on an FFmpeg performance boost that could speed certain operations by up to 94x. The latest handwritten assembly patch boosts the app’s ‘rangedetect8_avx512’ performance by 100.73x. If your modern processor doesn’t support AVX512, you should still see a 65.63x uplift with the rangedetect8_avx2 code path.

Where will you feel these speed increases? In some follow-up tweets, the FFmpeg developers admit that “It's a single function that's now 100x faster, not the whole of FFmpeg.” They later elaborated that the functionality, which might enjoy a 100x speed boost depending upon your system, was “an obscure filter.”

The obscurity of the function means it hadn’t been prioritized by the devs until now. But we also gather that the filter code was rewritten around SIMD (Single Instruction, Multiple Data) instructions, which apply one operation to many data elements at once, for vastly improved parallel processing on today’s powerful chips.
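To give a feel for what SIMD changes, here is a minimal sketch in C. It assumes, going only by the function's name, that rangedetect8 scans 8-bit samples for their smallest and largest values; the article does not confirm this. The names ByteRange, range_scalar, and range_simd are hypothetical, and the sketch uses SSE2 compiler intrinsics (16 bytes per step) rather than the handwritten AVX2 and AVX-512 assembly in the actual patch, so it illustrates the concept and is not FFmpeg's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics, baseline on x86-64 */
#endif

/* Hypothetical result type: smallest and largest byte seen. */
typedef struct {
    uint8_t min;
    uint8_t max;
} ByteRange;

/* Scalar reference: examines one byte per loop iteration. */
static ByteRange range_scalar(const uint8_t *data, size_t n)
{
    assert(n > 0);
    ByteRange r = { data[0], data[0] };
    for (size_t i = 1; i < n; i++) {
        if (data[i] < r.min) r.min = data[i];
        if (data[i] > r.max) r.max = data[i];
    }
    return r;
}

/* SIMD sketch: examines 16 bytes per iteration where SSE2 is available,
 * and falls back to the scalar loop on other targets. */
static ByteRange range_simd(const uint8_t *data, size_t n)
{
    assert(n > 0);
#if defined(__SSE2__)
    ByteRange r = { data[0], data[0] };
    size_t i = 0;
    if (n >= 16) {
        /* Each of the 16 lanes tracks its own running min and max. */
        __m128i vmin = _mm_loadu_si128((const __m128i *)data);
        __m128i vmax = vmin;
        for (i = 16; i + 16 <= n; i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
            vmin = _mm_min_epu8(vmin, v); /* 16 unsigned byte mins at once */
            vmax = _mm_max_epu8(vmax, v); /* 16 unsigned byte maxes at once */
        }
        /* Reduce the 16 per-lane results to a single min and max. */
        uint8_t lo[16], hi[16];
        _mm_storeu_si128((__m128i *)lo, vmin);
        _mm_storeu_si128((__m128i *)hi, vmax);
        for (int k = 0; k < 16; k++) {
            if (lo[k] < r.min) r.min = lo[k];
            if (hi[k] > r.max) r.max = hi[k];
        }
    }
    /* Handle the bytes that did not fill a whole 16-byte block. */
    for (; i < n; i++) {
        if (data[i] < r.min) r.min = data[i];
        if (data[i] > r.max) r.max = data[i];
    }
    return r;
#else
    return range_scalar(data, n);
#endif
}
```

Both functions return the same result for any non-empty buffer; the SIMD version simply does 16 comparisons per instruction. AVX2 and AVX-512 widen that to 32 and 64 bytes per instruction, which is one reason a simple scanning loop like this can speed up so dramatically.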

Evidently, compilers – programs that take higher-level language code and spit out assembly (machine) code – are still not competitive with handwritten assembly. Or you could say, “register allocator sucks on compilers,” as FFmpeg tweeted today.

Assembly language evangelists

In the golden age of home computing in the 1980s and 1990s, when fixed-spec systems had lifecycles measured in half-decades and strictly limited processing resources, handwritten assembly code optimizations played a larger part in the business of speeding up computers, games, and other software.

FFmpeg is perhaps one of the few ‘assembly evangelists’ remaining. The dev team even runs a ‘school.’

FFmpeg tools and libraries run across Linux, Mac OS X, Microsoft Windows, the BSDs, Solaris, and more. VLC, one of the most popular video players, uses the libavcodec and libavformat libraries from the FFmpeg project.

Mark Tyson
News Editor

Mark Tyson is a news editor at Tom's Hardware. He enjoys covering the full breadth of PC tech, from business and semiconductor design to products approaching the edge of reason.

  • DS426
    Reminds me of the Voxel Space engine used in circa 1993 for the original Comanche PC game that was entirely written in Assembly as it needed to perform well even without GPU acceleration.
  • ex_bubblehead
    There's still a lot to be said for hand optimised machine language code.
  • bit_user
    The article said:
    Last November, we reported on an FFmpeg performance boost that could speed certain operations by up to 94x.
    That speedup was reproducible only if you compiled the totally unoptimized, generic C implementation in debug mode.

    When I tried compiling it in release mode and using clang instead of gcc, I got over 50% as fast as the hand-written assembly, without any changes to the generic C sources. Upon reading the sources, it's clear that the C could've been written more optimally, likely yielding further improvements - and I'm not even talking about using any AVX512 intrinsics!

    I will be taking a look at this latest patch, when I have a chance.

    P.S. thanks for actually linking the patch, this time. Last time, the patch wasn't in ffmpeg, but rather someone posted a slide from an x265 presentation on the ffmpeg Twitter account. Took me a while to figure that out.
  • bit_user
    ex_bubblehead said:
    There's still a lot to be said for hand optimised machine language code.
    Oh, but they never even tried using C with intrinsics. Whenever they optimize something, they go straight to assembly. So, we don't even know how well-optimized C compares.

    In the last thread, someone claimed compilers wouldn't be smart enough to fuse two separate operations into a single AVX-512 instruction, which I subsequently demonstrated clang/LLVM doing. I've been quite impressed by its autovectorization. It won't restructure your code to be vector-friendly, but it seems to do a good job of more straight-forward vectorization tasks.

    If I get a chance to fiddle with this patch, I'll post my findings here.
  • bit_user
    DS426 said:
    Reminds me of the Voxel Space engine used in circa 1993 for the original Comanche PC game that was entirely written in Assembly as it needed to perform well even without GPU acceleration.
    In 1993, compilers weren't nearly as sophisticated and the first PC 3D graphics accelerator cards didn't yet exist. Nvidia's NV1 didn't launch until May, 1995. Cards based on 3D Labs' Glint 300SX also showed up about the same time.

    BTW, I'm sure plenty of other 3D games from that time used assembly language. It wouldn't surprise me to learn that Wolfenstein and Doom both did. Quake was worked on by the author of the book Zen of Assembly Language. I read his columns in Dr. Dobbs Journal, back in the day, and still have a copy of his book Zen of Code Optimization floating around, somewhere.
  • DS426
    bit_user said:
    In 1993, compilers weren't nearly as sophisticated and the first PC 3D graphics accelerator cards didn't yet exist. Nvidia's NV1 didn't launch until May, 1995. Cards based on 3D Labs' Glint 300SX also showed up about the same time.

    BTW, I'm sure plenty of other 3D games from that time used assembly language. It wouldn't surprise me to learn that Wolfenstein and Doom both did. Quake was worked on by the author of the book Zen of Assembly Language. I read his columns in Dr. Dobbs Journal, back in the day, and still have a copy of his book Zen of Code Optimization floating around, somewhere.
    Great point! It sounds wild today but probably wasn't very rare back then. As for graphics, yep, that design decision makes complete sense (I would say not even a decision as I don't think there was another feasible option) given the goings-on of that time. Even 3dfx' Voodoo accelerator didn't come out commercially until 1996.