AMD and Intel celebrate first anniversary of x86 alliance — new security features coming to x86 CPUs
AMD and Intel are celebrating one year since the formation of the x86 Ecosystem Advisory Group, an alliance designed to coordinate the evolution of the x86 instruction set architecture (ISA) and ensure that new features are supported by both leading CPU designers. In the first year, AMD and Intel have managed to ratify four new features set to be supported by upcoming processors from both companies, including long-awaited memory tagging.
The new cross-vendor capabilities agreed upon by AMD and Intel are AMX (Advanced Matrix Extensions) and AVX10, which enhance the performance of matrix multiplication and vector operations, as well as FRED (Flexible Return and Event Delivery) and ChkTag (x86 Memory Tagging), which reduce latency between software and hardware and detect errors like buffer overflows or use-after-free bugs.
Intel's Granite Rapids processors already support AVX10.1 and AMX, whereas Sapphire Rapids was first to support AMX instructions. With the ratification by the x86 EAG, AVX10 and AMX will be supported by AMD's next-generation processors, though we can only wonder whether this will happen with Zen 6 or only with Zen 7. Other capabilities are less well-known.
Intel introduced FRED publicly in 2023, and by now the capability is well documented in developer documentation. The technology is described as a replacement for traditional x86 interrupt and exception mechanisms; ultimately, it is designed to simplify context switches, reduce latency, and improve performance and security on operating systems that support it.
FRED speeds up how the CPU switches between user mode (ring 3) and kernel mode (ring 0) with a hardware-defined entry and exit path. While this does not sound too impressive, replacing the old x86 mechanism (which uses the Interrupt Descriptor Table and IRET) is a big deal. At present, every time an application interacts with the OS (which happens millions of times per second), the CPU must switch between user mode and kernel mode, which introduces fairly high latencies on today's machines. Since the traditional IDT and IRET mechanisms are software-managed, while FRED provides a hardware-defined and verified entry and return path, replacing the former with the latter also improves reliability and security, in addition to performance.
Up until today, AMD's stance on FRED was unclear, but now that the feature is recognized by the x86 EAG as a cross-vendor capability, it will be added to AMD's platforms over time.
Perhaps the most interesting addition to the list of cross-vendor x86 EAG features is ChkTag (x86 Memory Tagging), a capability that has not been widely discussed before. The feature is designed to catch memory safety errors — problems like buffer overflows, use-after-free, and out-of-bounds memory access — directly in hardware. Memory tagging is rapidly becoming a standard feature in modern CPUs, as it is valuable (it can catch a variety of bugs in hardware) and easy to implement, which is why modern processors from Apple and Ampere now support Arm's MTE technology.
It is hard to say when AMD and Intel plan to implement ChkTag (x86 Memory Tagging) in their processors. The announcement by the x86 Ecosystem Advisory Group signals that both are committed to supporting the feature, but there is no obligation to implement it within a certain timeframe. Meanwhile, hardware changes of this depth must be built into the CPU microarchitecture itself, so expect support for FRED and ChkTag to come several years down the road.

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.
bit_user
The article said: The new cross-vendor capabilities agreed upon by AMD and Intel are ...
APX is conspicuous in its absence. It seems already baked into upcoming Intel cores (Nova Lake, Diamond Rapids?). I hope it doesn't get fused off, for lack of an agreed standard!
The article said: With the ratification by the x86 EDA, AVX10 and AMX will be supported by AMD's next-generation processors, though we can only wonder whether this will happen with Zen 6 or already with Zen 7. Other capabilities are less well-known.
Well, AVX10.1 should be trivial to implement, if you've already got a fairly complete AVX-512 implementation (as AMD does). So, I'd be surprised if it's not included in Zen 6.
AMX is another can of worms, entirely. It adds a lot of bloat to each core, since it adds 8 kB of ISA register state and requires a "sea of MACs" for a worthwhile implementation. I don't foresee either Intel or AMD putting it in all of their client cores any time soon. That still leaves open the question of whether to put it in CCD chiplets, so long as AMD continues to share them between client & server, but I'd bet it's not happening for Zen 6, at the very least.
There's a lot else they seem not to have unified on, such as the memory encryption extensions they each developed and AMD's INVLPGB, which does remote shoot-downs of pages in other cores' TLBs. It's good to see standardization on FRED, at least.
Stomx
bit_user said: if you've already got a fairly complete AVX-512 implementation (as AMD does). So, I'd be surprised if it's not included in Zen 6.
Have you really seen any improvement with it of more than a few percent? Ian Cutress's 5x speedups with AVX512 look like a hoax: "Some engineer at Intel who then left Intel improved my code with AVX512, I do not know how and what he has done there..."
bit_user
Stomx said: Have you really seen any improvement with it more than few percents? Ian Cutress 5x speedups with AVX512 look like a hoax "Some engineer at Intel who then left Intel improved my code with AVX512, I do not know how and what he has done there..."
It has some improvements beyond simply 2x the data width.
https://en.wikipedia.org/wiki/AVX-512#Encoding_and_features
It did bother me that he quoted performance results on a closed-source benchmark with such significant code differences between the AVX2 and AVX-512 cases. I really wish he'd have published the source to it, and then people could inspect it and possibly even backport some of the AVX-512 optimizations to the AVX2 code path.
In general, no, it's not a 5x difference vs AVX2. Beyond all of the standard cases that benefit from AVX2, it tends to help for code with lots of regular loops (which turns out to include things as diverse as string processing and even sorting), or when you use specialized instructions, like deep learning or cryptography.
TerryLaze
bit_user said: APX is conspicuous in its absence. It seems already baked into upcoming Intel cores (Nova Lake, Diamond Rapids?). I hope it doesn't get fused off, for lack of an agreed standard!
This is about making the pool of instructions the same so that any software that is optimized runs as well as it can on whatever CPU follows these guidelines.
APX doesn't add any instructions it just changes the way they are being used, but for the developers it should make zero difference since the compiler takes care of everything.
APX is very much like 3d cache in that the devs don't have to change anything for it being there or not, it just improves performance if and when it can.
thestryker
From this announcement I'm mostly happy about FRED being standardized. Who knows when exactly the implementations will come, but when they do, everyone's will be the same.
johndyson
Stomx said: Have you really seen any improvement with it more than few percents? Ian Cutress 5x speedups with AVX512 look like a hoax "Some engineer at Intel who then left Intel improved my code with AVX512, I do not know how and what he has done there..."
On my own SIMD-heavy app, I get about 20% better performance running the full, intact app, using optimized but mixed-width AVX512 code with CLANG++ vs AVX2 on my AMD 9950X. The data type is DP FP using massive amounts of SIMD. The 20% increase appears to come from moving from ymm to zmm data widths, not from the instruction set. The app does lots of FMA operations with mixed zmm, ymm, and xmm registers (mostly long FIR filters). Some of the other algorithms (e.g., massive numbers of IIR) don't lend themselves to converting from ymm-type data widths to zmm (or xmm-type widths to zmm). My inability to further parallelize some xmm operations to zmm is probably the big reason why there is only a 20% increase. (Yes, the app does real work; these results are not from a synthetic benchmark.) I do appreciate the 20% greater speed, and the AVX512 instruction set on my machine allows testing my code for other AVX512 machines. My old Intel i9-10900X machine, on my app, was slower when using the AVX512 instructions/SIMD widths.
Of course, 'your mileage might vary -- A LOT'.
bit_user
TerryLaze said: This is about making the pool of instructions the same so that any software that is optimized runs as well as it can on whatever CPU follows these guidelines.
First, I think that didn't factor into their decision. Its complexity might've been why they didn't ratify it in time for this announcement. Its apparent de-prioritization suggests Zen 6 won't be implementing it, but it might've come almost too late for that to happen, anyhow.
TerryLaze said: APX doesn't add any instructions it just changes the way they are being used,
As to your claim about it not adding new instructions, that's not how Intel characterized it.
Intel said: Conditional ISA improvements: New conditional load, store and compare instructions, ...
Optimized register state save/restore operations (PUSH2 and POP2 are two new instructions)
A new 64-bit absolute direct jump instruction
I'd agree that the bulk of the changes are in modifying the behavior of existing instructions, not unlike the 64-bit extensions to x86. However, even this happens via modifications to the instruction stream encoding and is absolutely visible to software.
TerryLaze said: but for the developers it should make zero difference since the compiler takes care of everything.
That's irrelevant to whether the x86 EAG needs to consider it. For instance, a mechanism like FRED is also not used by regular developers, but instead handled by the kernel.
The reason for EAG to standardize on a feature is so that the industry doesn't need to special-case code for differing vendor-specific extensions. Even for compiler-generated code, adding more code paths has costs. Furthermore, it'd make life for toolchain and kernel developers easier to have only one version they have to support.
TerryLaze said: APX is very much like 3d cache in that the devs don't have to change anything for it being there or not, it just improves performance if and when it can.
Not true. Like any ISA extension, it needs feature-testing to see if it's there. If so, you dispatch to a code path that utilizes it. If not, you need to fall back on a legacy path. This part is very much like how vector extensions are handled.
FWIW, here's the entirety of how Intel summarized it:
Intel said: The main features of Intel® APX include:
16 additional general-purpose registers (GPRs) R16–R31, also referred to as Extended GPRs (EGPRs) in this document;
Three-operand instruction formats with a new data destination (NDD) register for many integer instructions;
Conditional ISA improvements: New conditional load, store and compare instructions, combined with an option for the compiler to suppress the status flags writes of common instructions;
Optimized register state save/restore operations;
A new 64-bit absolute direct jump instruction
Source: https://www.intel.com/content/www/us/en/content-details/784266/intel-advanced-performance-extensions-intel-apx-architecture-specification.html
bit_user
johndyson said: On my own, SIMD heavy app, I get about 20% better performance running the full, intact app, using optimized, but mixed width AVX512 code using CLANG++ vs AVX2 on my AMD 9950X. The data type is DP FP using massive amounts of SIMD.
That's a wonderful data point! Thanks for sharing!
Yeah, Clang/LLVM has really made big strides on auto-vectorization. In the little bit I've tinkered with it, I've found that a lot of the usual tricks aren't even necessary to encourage/enable it.
johndyson said: Of course, 'your mileage might vary -- A LOT'.
Yup, definitely.
TerryLaze
bit_user said: Not true. Like any ISA extension, it needs feature-testing, to see if it's there. If so, you dispatch to a code path that utilizes it. If not, you need to fallback on a legacy path. This part is very much like how vector extensions are handled.
Nope.
For APX you just tick the box in the compiler, and if APX exists it changes the legacy instructions to use it; the Intel compiler takes care of everything.
You don't need two code paths, the compiler makes any changes itself.
https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
Compiler enabling is straightforward: A new REX2 prefix provides uniform access to the new registers across the legacy integer instruction set. Intel® Advanced Vector Extensions (Intel® AVX) instructions gain access via new bits defined in the existing EVEX prefix. In addition, legacy integer instructions now can also use EVEX to encode a dedicated destination register operand, turning them into three-operand instructions and reducing the need for extra register move instructions. While the new prefixes increase average instruction length, there are 10% fewer instructions in code compiled with Intel APX, resulting in similar code density as before.
Xajel
Before this alliance, Intel started a project to "clean up" x86 by removing compatibility with older x86 features that are no longer used. But they scrapped that with this alliance, which is understandable, as a move that major must be made together with AMD now that x86 is not Intel-only.
But there's no news about this. I guess they could still remove these features to simplify the ISA and make it more efficient, while emulating them to retain compatibility without any issues.