Intel's New AVX10 Brings AVX-512 Capabilities to E-Cores
Future hybrid chips will have the goodness of AVX-512 features on both P- and E-cores.
Intel posted its new APX (Advanced Performance Extensions) today and also disclosed the new AVX10 (listed at the bottom of the same page), which will bring unified support for AVX-512 capabilities to both P-cores and E-cores for the first time. This evolution of the AVX instruction set will help Intel sidestep the severe issues it encountered with the x86 hybrid architecture found in its Alder Lake and Raptor Lake processors.
However, the new AVX10 ISA won't be supported with Intel's current-gen CPUs — it's slated to arrive in future chips. Intel says that AVX10 will be its vector ISA of choice moving into the future for both consumer and server processors.
Intel AVX10 (Advanced Vector Extensions 10)
At its most basic level, AVX10 will allow Intel's chips that have both E-cores and P-cores to still support AVX-512 capabilities, though 512-bit instructions can only run on P-cores. Meanwhile, converged 256-bit AVX10 instructions can run on either the P-cores or the E-cores, so the full chip retains support for AVX-512 capabilities.
As such, Intel won't have to disable support for 512-bit vectors as it did when it disabled AVX-512 for both Alder Lake and Raptor Lake.
Diving deeper, the AVX10 (Advanced Vector Extensions 10) ISA is a superset of AVX-512 and comes with all of the features of the AVX-512 ISA for processors with both 256-bit and 512-bit vector register sizes.
The converged AVX10 ISA will include "AVX-512 vector instructions with an AVX512VL feature flag, a maximum vector register length of 256 bits, as well as eight 32-bit mask registers and new versions of 256-bit instructions supporting embedded rounding," and this version will run on both P-cores and E-cores.
However, the E-cores will be limited to the converged AVX10's maximum 256-bit vector length, while P-cores can use 512-bit vectors. This feels akin to Arm's support for variable vector widths with SVE.
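The 32-bit mask size in Intel's description follows from simple lane arithmetic: a mask carries one bit per vector element, so a 256-bit vector of 8-bit elements needs 32 mask bits, while a 512-bit vector of the same elements needs 64. The sketch below models that arithmetic and a merge-masked add in plain Python. The function names are hypothetical, and nothing here uses real intrinsics or models integer wraparound.

```python
def mask_bits_needed(vector_bits: int, element_bits: int) -> int:
    """Return the lane count, which is also the number of mask bits needed."""
    if vector_bits % element_bits:
        raise ValueError("element width must divide vector width")
    return vector_bits // element_bits


def masked_add(dst, a, b, mask: int):
    """Model a merge-masked, lane-wise add.

    Lane i receives a[i] + b[i] when bit i of mask is set; otherwise the
    old destination value dst[i] is kept unchanged.
    """
    if not (len(dst) == len(a) == len(b)):
        raise ValueError("all operands must have the same lane count")
    return [
        a[i] + b[i] if (mask >> i) & 1 else dst[i]
        for i in range(len(dst))
    ]
```

Calling `mask_bits_needed(256, 8)` gives the 32 bits quoted above, and `masked_add` shows how a single mask register selects which lanes an instruction updates.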
Intel says that existing applications will provide the same level of performance with AVX10 as they did with AVX-512, at least at the same vector lengths. Intel also claims:
- Intel AVX2-compiled applications, re-compiled to Intel AVX10, should realize performance gains without the need for additional software tuning.
- Intel AVX2 applications sensitive to vector register pressure will gain the most performance due to the 16 additional vector registers and new instructions.
- Highly-threaded vectorizable applications are likely to achieve higher aggregate throughput when running on E-core-based Intel Xeon processors or on Intel products with performance hybrid architecture.
Intel will support AVX10 version 1 (AVX10.1) beginning with its sixth-gen Xeon "Granite Rapids" chips, but that generation will only support 512-bit vector instructions, and not the new converged 256-bit vector instructions. Instead, this first gen will serve as the transition chip from AVX-512 to AVX10.
Chips arriving after Granite Rapids will support AVX10.2, which adds support for the converged 256-bit vector lengths and other new features, like new AI data types and conversions, data movement optimizations, and standards support. All future Xeon processors will continue fully supporting all AVX-512 instructions to ensure that legacy apps function normally.
To address developer feedback (obviously negative), Intel also plans to significantly simplify its AVX10 enumeration methods compared to AVX-512. Intel also plans to ensure that each move to a new AVX10 revision has enough new instructions and capabilities to merit a change, thus reducing version and enumeration bloat.
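As a rough illustration of what a version-based enumeration could look like to software, the Python sketch below decodes a single register value into a version number and a maximum vector length. The bit layout is an assumption based on our reading of Intel's AVX10 architecture specification (version in EBX bits 7:0 of CPUID leaf 0x24, sub-leaf 0, with bits 16, 17 and 18 flagging 128-, 256- and 512-bit support) and should be checked against the current document. The names `Avx10Info` and `decode_avx10_ebx` are hypothetical, and the code does not execute CPUID itself.

```python
from dataclasses import dataclass

# Assumed layout of EBX from CPUID leaf 0x24, sub-leaf 0 (verify against Intel's spec).
VERSION_MASK = 0xFF   # bits 7:0, AVX10 version number
VL128_BIT = 1 << 16   # 128-bit vector support
VL256_BIT = 1 << 17   # 256-bit vector support
VL512_BIT = 1 << 18   # 512-bit vector support


@dataclass(frozen=True)
class Avx10Info:
    version: int
    max_vector_bits: int


def decode_avx10_ebx(ebx: int) -> Avx10Info:
    """Decode a raw EBX value into a version and the widest supported vector length."""
    version = ebx & VERSION_MASK
    if ebx & VL512_BIT:
        max_bits = 512
    elif ebx & VL256_BIT:
        max_bits = 256
    elif ebx & VL128_BIT:
        max_bits = 128
    else:
        max_bits = 0  # no vector length reported
    return Avx10Info(version=version, max_vector_bits=max_bits)
```

Under these assumptions, software would need just two values (a version and a vector length) rather than a long list of individual AVX-512 feature flags.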
Intel will freeze the AVX-512 ISA when AVX10 debuts, and all future use of AVX-512 instructions will occur through the AVX10 ISA. Meanwhile, AMX will be unaffected.
Intel APX (Advanced Performance Extensions)
Intel also announced the new APX (Advanced Performance Extensions) today (not to be confused with the old-school iAPX 432).
Intel claims APX-compiled code contains 10% fewer loads and 20% fewer stores than the same code compiled for an Intel 64 baseline. Intel also says that register accesses are both faster and consume significantly less dynamic power than complex load and store operations. Interestingly, the new APX finds a new use for the 128B area that was left unused when Intel abandoned MPX back in 2019, and repurposes it for XSAVE.
Here are APX's top-level features:
- 16 additional general-purpose registers (GPRs) R16–R31, also referred to as Extended GPRs (EGPRs)
- Three-operand instruction formats with a new data destination (NDD) register for many integer instructions
- Conditional ISA improvements: New conditional load, store and compare instructions, combined with an option for the compiler to suppress the status flags writes of common instructions
- Optimized register state save/restore operations
- A new 64-bit absolute direct jump instruction
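To make the three-operand NDD item in the list above more concrete: a classic two-operand x86 integer instruction overwrites one of its sources, so keeping both inputs alive costs an extra register copy. The toy Python model below compares the two forms. The instruction tuples and the `run` helper are hypothetical and bear no relation to real APX encodings.

```python
def run(program, regs):
    """Execute toy instructions on a copy of the register file and return it.

    Supported forms (illustrative only):
      ("mov", dst, src)         dst = src
      ("add", dst, src)         dst = dst + src    (two-operand, destructive)
      ("add", dst, src1, src2)  dst = src1 + src2  (three-operand, NDD-style)
    """
    regs = dict(regs)  # leave the caller's register file untouched
    for op, *args in program:
        if op == "mov" and len(args) == 2:
            dst, src = args
            regs[dst] = regs[src]
        elif op == "add" and len(args) == 2:
            dst, src = args
            regs[dst] = regs[dst] + regs[src]
        elif op == "add" and len(args) == 3:
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unsupported instruction: {(op, *args)}")
    return regs


# rcx = rax + rbx while preserving both rax and rbx.
LEGACY = [("mov", "rcx", "rax"), ("add", "rcx", "rbx")]  # copy, then destructive add
NDD = [("add", "rcx", "rax", "rbx")]                     # one three-operand add
```

Both sequences leave the same register contents, but the NDD-style version needs one instruction instead of two.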
Intel claims it has implemented APX in such a way that it will not impact the silicon area or power consumption of the CPU core. You can read much more about APX here, and Intel has a list of resources for both APX and AVX10 at the bottom of the linked page.
APX and AVX10 come on the heels of Intel's recent announcement that it is investigating slimming down the Intel 64 architecture to a simplified version of x86 named x86S.
Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.
-
Kamen Rider Blade
Is it me, or should this have been called AVX 4 instead of AVX 10?
There was AVX or
AVX 1
AVX 2 (Mostly 256b)
AVX 3 (512b Extensions)
This update should've been called AVX 4?
-
cyrusfox
I am just glad they have a strategy to unify ISA with the new hybrid chips. Seems really myopic this wasn't solved ages ago. So horrible to have to fuse off capable silicon features due to not working out the details how small and large cores would work together.
-
Metal Messiah.
Kamen Rider Blade said: "Is it me, or should this have been called AVX 4 instead of AVX 10?
There was AVX or
AVX 1
AVX 2 (Mostly 256b)
AVX 3 (512b Extensions)
This update should've been called AVX 4?"
There is no such thing as AVX3. We only have AVX, AVX2 and AVX-512 x86 ISAs. AVX10 is just a superset of AVX-512. AVX10 will enable AVX-512 capabilities across both Performance and Efficient core designs with hybrid processors.
AVX10 contains all the richness of AVX-512 and additional features/capabilities while being able to work for both P and E cores, respectively.
Edit:
AVX10 actually has 2 subsets, AVX10/256, similar to AVX2, and AVX10/512 which is similar to AVX-512.
-
thestryker
cyrusfox said: "I am just glad they have a strategy to unify ISA with the new hybrid chips. Seems really myopic this wasn't solved ages ago. So horrible to have to fuse off capable silicon features due to not working out the details how small and large cores would work together."
I think this is mostly the manufacturing delays coming into play. RPL was never originally supposed to exist, and now we're getting a refresh of it.
I'm curious if MTL has any implementation (like ADL did with AVX 512) since the cores in it and Granite Rapids are the same.
-
hotaru251
didnt ppl get workarounds to run avx-512 on their hybrid chips then intel said "no" and shut that method down?
-
TerryLaze
hotaru251 said: "didnt ppl get workarounds to run avx-512 on their hybrid chips then intel said "no" and shut that method down?"
Well not really, you had to turn the hybrid into a classic CPU by turning off the e-cores and for intel it was more important for people to get used to the hybrid approach than for them to get avx-512.
cyrusfox said: "I am just glad they have a strategy to unify ISA with the new hybrid chips. Seems really myopic this wasn't solved ages ago. So horrible to have to fuse off capable silicon features due to not working out the details how small and large cores would work together."
It won't be unified, the e-cores will still only be able to do avx-256 and they will have the thread director or whatever make it work.
They could have done this on older CPUs, they could do this now on all hybrid CPUs.
Diving deeper, the AVX10 (Advanced Vector Extensions 10) ISA is a superset of AVX-512 and comes with all of the features of the AVX-512 ISA for processors with both 256-bit and 512-bit vector register sizes.
-
Findecanor
Kamen Rider Blade said: "AVX 3 (512b Extensions)
This update should've been called AVX 4?"
This is not really an extension of AVX-512, but rather a step back and a start-over.
Many in the programmer community have been asking for CPUs with a short-vector version of AVX-512 for a while, often using the provisional name "AVX-256".
The set isn't just about extending the width and number of registers. For instance, the use of boolean vectors for conditional load/store per lane is a big thing.
-
bit_user
Kamen Rider Blade said: "Is it me, or should this have been called AVX 4 instead of AVX 10?
There was AVX or
AVX 1
AVX 2 (Mostly 256b)
AVX 3 (512b Extensions)
This update should've been called AVX 4?"
The problem is that you think too logically. These things are decided by marketing people, and they probably feel like "AVX-512" sounds a lot like AVX5. So, they want to call it AVX10 to make it sound way better (even though it's not).
I mean, why did Nvidia go from 700, 900, 1000, 2000, 3000, 4000? It's because "one better" doesn't sound like much, once you get above 10. You want each generation to sound a lot better, even if it's not (as in this case).
Oh, and by the way, I wouldn't even call it AVX4. If we're being logical, then 10.1 is basically just a way to tell software whether the AVX registers are 256 bits or 512 bits, apart from whether or not the AVX-512 instruction set is itself supported. 10.1 really doesn't add any real functionality that doesn't already exist in AVX-512.
-
bit_user
Metal Messiah. said: "AVX10 actually has 2 subsets, AVX10/256, similar to AVX2, and AVX10/512 which is similar to AVX-512."
No, you had it right the first time. As it stands today, AVX-512 instructions can operate on 128 bit, 256 bit, or 512 bit operands. AVX10.1 is just rebranding AVX-512, while adding an additional variable to indicate whether the implementation supports all 3 operand sizes, or whether it supports only the first two.
-
bit_user
thestryker said: "RPL was never originally supposed to exist,"
Really??? Source?
thestryker said: "I'm curious if MTL has any implementation (like ADL did with AVX 512) since the cores in it and Granite Rapids are the same."
Yeah, good question... except that Meteor Lake's CPU tile is slated for the Intel 4 process node, while Granite Rapids is slated for Intel 3. I know the nodes are similar, but I don't know if they're layout-compatible.