Intel Releases Fast AV1 Video Encoder for CPUs

(Image credit: AOMedia)

As one of the founding members of the Alliance for Open Media (AOMedia), Intel has done a lot to promote the AV1 codec and make it more accessible to content creators, providers, and end-users. Intel was the first to offer hardware AV1 decoding, with its Xe-LP GPUs in 2020. This week it released version 1.0 of its speedy open-source Scalable Video Technology AV1 (SVT-AV1) encoder and decoder for CPUs. SVT-AV1 works with all modern x86 processors.

The open-source AV1 video codec was designed for ultra-high-definition resolutions, wide color gamut, and high dynamic range. AOMedia said in 2018 that AV1 was 30% more efficient than existing codecs (meaning primarily H.265/HEVC, which is designed for similar 4K+ content), which is a big deal. But one problem with highly efficient codecs is that they are extremely hungry for compute resources and generally need hardware acceleration to run at usable speeds. Meanwhile, modern CPUs have plenty of cores and new instructions that can be applied to decoding and encoding, which is precisely what SVT-AV1 takes advantage of.

SVT-AV1 is a scalable, standard-agnostic encoder/decoder library that can take advantage of the multi-core nature of modern CPUs and their AVX2 instructions. Version 1.0 also adds further AVX2 optimizations to boost performance, image quality improvements, fast-decode support for more preset levels, and S-frame support, Phoronix reports.
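To see whether a given Linux machine offers the two things the library leans on (several cores and AVX2), a quick check along the following lines may help. This is only a minimal sketch: it parses the `flags` line of `/proc/cpuinfo`, which is a Linux-specific convention, and the helper names `cpu_flags` and `has_avx2` are made up for illustration. It is not how SVT-AV1 itself detects CPU features.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the feature flags found in /proc/cpuinfo-style text as a set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # The first "flags" line is enough; all cores normally match.
            return set(value.split())
    return set()


def has_avx2(cpuinfo_text):
    """True if the 'avx2' flag is listed in the given cpuinfo text."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or the file is unreadable
    print("logical cores:", os.cpu_count())
    print("AVX2 reported:", has_avx2(text))
```

On macOS and Windows the same information has to come from other sources, so the sketch simply reports no AVX2 there rather than guessing.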

Intel's SVT-AV1 libraries are supported on modern x86 machines (Intel 5th Generation Core 'Broadwell' and higher) running Apple's macOS, Microsoft's Windows, and Linux. 

Intel and Netflix initially started the SVT-AV1 project to develop a production-quality AV1 encoder with performance levels suited to a range of applications, from premium video-on-demand to real-time and live encoding/transcoding. In August 2020, the SVT-AV1 encoder/decoder library was adopted by AOMedia's Software Implementation Working Group (SIWG) to make AV1 more popular. The SVT-AV1 version 1.0 release marks a milestone in the development of the encoder/decoder libraries.

The 1.0 release of the SVT-AV1 encoder/decoder libraries is good news for content creators and end-users. For companies like Netflix, however, Intel now also offers Arctic Sound-M accelerators based on DG2 silicon, which can handle eight simultaneous 4K streams and support hardware-accelerated AV1 encoding and decoding.

(Image credit: Igor's Lab)

The single-tile Intel Arctic Sound 1T features an Xe-HP GPU with 384 EUs and 16GB of HBM2E memory, offering a peak bandwidth of up to 716 GB/s (which probably means that we are dealing with two stacks of HBM2E that use a 2048-bit interface). The accelerator is a short single-slot full-height card rated for a 150W TDP.  
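The two-stack guess can be sanity-checked with simple arithmetic: peak bandwidth equals bus width times per-pin data rate, divided by eight bits per byte. The sketch below works backwards from the quoted 716 GB/s. It assumes the standard 1024-bit interface per HBM2/HBM2E stack; the per-pin rate it prints is derived from the article's figures, not an Intel specification, and the function names are illustrative.

```python
STACK_WIDTH_BITS = 1024  # interface width of one HBM2/HBM2E stack


def peak_bandwidth_gb_s(bus_width_bits, pin_rate_gbps):
    """Peak bandwidth in GB/s for a bus width (bits) and per-pin rate (Gbit/s)."""
    return bus_width_bits * pin_rate_gbps / 8


def implied_pin_rate_gbps(bandwidth_gb_s, bus_width_bits):
    """Per-pin data rate (Gbit/s) implied by a peak bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


if __name__ == "__main__":
    quoted_bandwidth = 716  # GB/s, figure quoted for the single-tile card
    for stacks in (1, 2, 4):
        bus = stacks * STACK_WIDTH_BITS
        rate = implied_pin_rate_gbps(quoted_bandwidth, bus)
        print(f"{stacks} stack(s), {bus}-bit bus: {rate:.2f} Gbit/s per pin")
```

With two stacks the implied rate comes out at about 2.8 Gbit/s per pin, whereas a single stack would need roughly twice that, which is why the 2048-bit reading looks the most plausible.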

Intel's Arctic Sound 2T card carries an Xe-HP GPU with two tiles, 960 EUs (480×2 to be more accurate), and 32GB of HBM2E DRAM. The accelerator uses a full-length, full-height (FLFH) form factor and is rated for a 300W TDP delivered using one eight-pin power connector. (One thing to keep in mind is that Igor's Lab edited the images of the cards to protect the source.)

(Image credit: Intel)

Intel's Xe-HP architecture is a far cry from the company's Xe-LP architecture we know from the Iris Xe consumer-grade GPUs. The Xe-HP card supports more floating-point formats (e.g., FP16, FP32, FP64 for general purpose, bfloat16 format for AI/ML computing), more compute-specific instructions, DP4A convolution instruction for deep learning, and Intel's XMX extensions.

The datacenter-oriented Xe-HP GPUs use all-new execution units (EUs) with various IPC improvements, feature HBM2E memory support, and are made using Intel's performance-optimized 10nm SuperFin process technology. In short, the Xe-HP is not the Xe-LP or Xe-HPG on steroids, but something completely different. 

(Image credit: Intel)

Intel has now allowed some of its customers to preview its Arctic Sound compute cards carrying single-tile and dual-tile Xe-HP implementations. Intel announced a quad-tile Xe-HP implementation last year and even demonstrated one such accelerator in action, offering over 42 FP32 TFLOPS of performance. However, the company is either not ready to sample it right now or is only sampling it with select customers.

Intel's Xe-HP plans are not completely clear as the company has never detailed them. Meanwhile, the EU count of these two cards is somewhat lower than expected (assuming that one Xe-HP tile features 512 EUs). At present, we don't know how old these cards are and which configurations Intel plans to ship.
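The gap between the reported and expected EU counts is easy to put in numbers. The sketch below takes the article's working assumption of 512 EUs per full tile and computes the enabled share for each card. As a rough cross-check, it also estimates the clock implied by the 42 FP32 TFLOPS quad-tile demo, assuming a fully enabled part, eight FP32 lanes per EU, and two FLOPs per lane per clock (one fused multiply-add). Those per-EU figures are assumptions carried over from how Xe EUs are usually described, not confirmed Xe-HP specifications.

```python
FULL_TILE_EUS = 512  # assumed size of a full Xe-HP tile, as supposed in the text


def enabled_fraction(total_eus, tiles, full_tile_eus=FULL_TILE_EUS):
    """Share of EUs enabled relative to fully populated tiles."""
    return total_eus / (tiles * full_tile_eus)


def implied_clock_ghz(tflops, total_eus, lanes_per_eu=8, flops_per_lane=2):
    """Clock (GHz) needed to reach `tflops` under the assumed per-EU throughput."""
    flops_per_clock = total_eus * lanes_per_eu * flops_per_lane
    return tflops * 1e12 / flops_per_clock / 1e9


if __name__ == "__main__":
    print(f"1T card: {enabled_fraction(384, 1):.2%} of a full tile")
    print(f"2T card: {enabled_fraction(960, 2):.2%} of two full tiles")
    clock = implied_clock_ghz(42, 4 * FULL_TILE_EUS)
    print(f"quad-tile demo: about {clock:.2f} GHz under these assumptions")
```

Under these assumptions the single-tile card has three quarters of a tile enabled and the dual-tile card just under 94%, which is consistent with early or cut-down silicon but does not prove it.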

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.