AMD and Intel celebrate first anniversary of x86 alliance — new security features coming to x86 CPUs

Core Ultra 200S CPU
(Image credit: Intel)

AMD and Intel are celebrating one year since the formation of the x86 Ecosystem Advisory Group, an alliance designed to coordinate the evolution of the x86 instruction set architecture (ISA) and ensure that new features are supported by both leading CPU designers. In the first year, AMD and Intel have ratified four new features set to be supported by upcoming processors from both companies, including long-awaited memory tagging.

The new cross-vendor capabilities agreed upon by AMD and Intel are AMX (Advanced Matrix Extensions) and AVX10, which enhance the performance of matrix multiplication and vector operations, along with FRED (Flexible Return and Event Delivery), which reduces event-handling latency between software and hardware, and ChkTag (x86 Memory Tagging), which detects memory errors like buffer overflows and use-after-free bugs.


Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

  • bit_user
    The article said:
    The new cross-vendor capabilities agreed upon by AMD and Intel are ...
    APX is conspicuous in its absence. It seems already baked into upcoming Intel cores (Nova Lake, Diamond Rapids?). I hope it doesn't get fused off, for lack of an agreed standard!

    The article said:
    With the ratification by the x86 EDA, AVX10 and AMX will be supported by AMD's next-generation processors, though we can only wonder whether this will happen with Zen 6 or already with Zen 7. Other capabilities are less well-known.
    Well, AVX10.1 should be trivial to implement, if you've already got a fairly complete AVX-512 implementation (as AMD does). So, I'd be surprised if it's not included in Zen 6.

    AMX is another can of worms entirely. It adds a lot of bloat to each core, since it adds 8 kB of ISA register state and requires a "sea of MACs" for a worthwhile implementation. I don't foresee either Intel or AMD putting it in all of their client cores any time soon. That still leaves open the question of whether to put it in CCD chiplets, as long as AMD continues to share them between client and server, but I'd bet it's not happening for Zen 6, at the very least.

    There's a lot else they seem not to have unified on, such as the memory encryption extensions they each developed and AMD's INVLPGB, which does remote shoot-downs of pages in other cores' TLBs. It's good to see standardization on FRED, at least.
    Reply
  • Stomx
    bit_user said:
    if you've already got a fairly complete AVX-512 implementation (as AMD does). So, I'd be surprised if it's not included in Zen 6.
    Have you really seen any improvement with it of more than a few percent? Ian Cutress's 5x speedups with AVX512 look like a hoax: "Some engineer at Intel, who then left Intel, improved my code with AVX512; I do not know how and what he has done there..."
    Reply
  • bit_user
    Stomx said:
    Have you really seen any improvement with it of more than a few percent? Ian Cutress's 5x speedups with AVX512 look like a hoax: "Some engineer at Intel, who then left Intel, improved my code with AVX512; I do not know how and what he has done there..."
    It has some improvements beyond simply 2x the data width.
    https://en.wikipedia.org/wiki/AVX-512#Encoding_and_features
    It did bother me that he quoted performance results on a closed-source benchmark with such significant code differences between the AVX2 and AVX-512 cases. I really wish he'd have published the source to it, and then people could inspect it and possibly even backport some of the AVX-512 optimizations to the AVX2 code path.

    In general, no, it's not a 5x difference vs. AVX2. Beyond all of the standard cases that benefit from AVX2, it tends to help for code with lots of regular loops (which turn out to include things as diverse as string processing and even sorting), or when you use specialized instructions, like those for deep learning or cryptography.
    Reply
  • TerryLaze
    bit_user said:
    APX is conspicuous in its absence. It seems already baked into upcoming Intel cores (Nova Lake, Diamond Rapids?). I hope it doesn't get fused off, for lack of an agreed standard!
    This is about making the pool of instructions the same, so that any software that is optimized runs as well as it can on whatever CPU follows these guidelines.
    APX doesn't add any instructions; it just changes the way they are used, but for developers it should make zero difference, since the compiler takes care of everything.

    APX is very much like 3D cache in that the devs don't have to change anything for it being there or not; it just improves performance if and when it can.
    Reply
  • thestryker
    From this announcement, I'm mostly happy about FRED being standardized. Who knows when exactly the implementations will come, but when they do, everyone's will be the same.
    Reply
  • johndyson
    Stomx said:
    Have you really seen any improvement with it of more than a few percent? Ian Cutress's 5x speedups with AVX512 look like a hoax: "Some engineer at Intel, who then left Intel, improved my code with AVX512; I do not know how and what he has done there..."
    On my own SIMD-heavy app, I get about 20% better performance running the full, intact app, using optimized but mixed-width AVX512 code built with Clang++, vs. AVX2 on my AMD 9950X. The data type is DP FP using massive amounts of SIMD. The 20% increase appears to come from moving from ymm to zmm data widths, not from the instruction set itself. The app does lots of FMA operations with mixed zmm, ymm, and xmm registers (mostly long FIR filters).

    Some of the other algorithms (e.g. massive numbers of IIR) don't lend themselves to converting from ymm-type data widths to zmm (or from xmm-type widths to zmm). My inability to further parallelize some xmm operations to zmm is probably the big reason why there is only a 20% increase. (Yes, the app does real work; these results are not from a synthetic benchmark.)

    I do appreciate the 20% greater speed, and the AVX512 instruction set on my machine lets me test my code for other AVX512 machines. My old Intel i9-10900X machine, on my app, was slower when using the AVX512 instructions/SIMD widths.
    Of course, 'your mileage might vary -- A LOT'.
    Reply
  • bit_user
    TerryLaze said:
    This is about making the pool of instructions the same, so that any software that is optimized runs as well as it can on whatever CPU follows these guidelines.
    APX doesn't add any instructions; it just changes the way they are used,
    First, I think that didn't factor into their decision. Its complexity might've been why they didn't ratify it in time for this announcement. Its apparent de-prioritization suggests Zen 6 won't be implementing it, but it might've come almost too late for that to happen, anyhow.

    As to your claim about it not adding new instructions, that's not how Intel characterized it.
    Intel said:
    Conditional ISA improvements: New conditional load, store and compare instructions, ...
    Optimized register state save/restore operations (PUSH2 and POP2 are two new instructions)
    A new 64-bit absolute direct jump instruction

    I'd agree that the bulk of the changes are in modifying the behavior of existing instructions, not unlike the 64-bit extensions to x86. However, even this happens via modifications to the instruction stream encoding and is absolutely visible to software.

    TerryLaze said:
    but for the developers it should make zero difference since the compiler takes care of everything.
    That's irrelevant to whether the x86 EAG needs to consider it. For instance, a mechanism like FRED is also not used by regular developers, but instead handled by the kernel.

    The reason for EAG to standardize on a feature is so that the industry doesn't need to special-case code for differing vendor-specific extensions. Even for compiler-generated code, adding more code paths has costs. Furthermore, it'd make life for toolchain and kernel developers easier to have only one version they have to support.

    TerryLaze said:
    APX is very much like 3D cache in that the devs don't have to change anything for it being there or not; it just improves performance if and when it can.
    Not true. Like any ISA extension, it needs feature-testing, to see if it's there. If so, you dispatch to a code path that utilizes it. If not, you need to fallback on a legacy path. This part is very much like how vector extensions are handled.
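    A minimal sketch of that feature-test-and-dispatch pattern, assuming GCC/Clang's `__builtin_cpu_supports` builtin; the "wide" kernel is stubbed with scalar code here purely for illustration (real code would use intrinsics or a separately compiled translation unit):

    ```c
    #include <stdio.h>

    /* Runtime dispatch: pick a code path based on CPU features detected
       at startup. __builtin_cpu_supports() is a GCC/Clang builtin that
       queries CPUID under the hood. */

    static int sum_scalar(const int *v, int n) {
        int s = 0;
        for (int i = 0; i < n; i++) s += v[i];
        return s;
    }

    /* Stand-in for a 512-bit-wide kernel; same result either way. */
    static int sum_avx512_path(const int *v, int n) {
        return sum_scalar(v, n);
    }

    int sum(const int *v, int n) {
        if (__builtin_cpu_supports("avx512f"))
            return sum_avx512_path(v, n);   /* wide path, if the CPU has it */
        return sum_scalar(v, n);            /* legacy fallback path */
    }

    int main(void) {
        int v[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        printf("sum = %d\n", sum(v, 8));
        return 0;
    }
    ```

    Both paths have to be built, shipped, and maintained, which is exactly the cost that standardizing on one cross-vendor extension avoids.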

    FWIW, here's the entirety of how Intel summarized it:
    Intel said:
    The main features of Intel® APX include:
    16 additional general-purpose registers (GPRs) R16–R31, also referred to as Extended GPRs (EGPRs)
    in this document;
    Three-operand instruction formats with a new data destination (NDD) register for many integer
    instructions;
    Conditional ISA improvements: New conditional load, store and compare instructions, combined
    with an option for the compiler to suppress the status flags writes of common instructions;
    Optimized register state save/restore operations;
    A new 64-bit absolute direct jump instruction

    Source: https://www.intel.com/content/www/us/en/content-details/784266/intel-advanced-performance-extensions-intel-apx-architecture-specification.html
    Reply
  • bit_user
    johndyson said:
    On my own SIMD-heavy app, I get about 20% better performance running the full, intact app, using optimized but mixed-width AVX512 code built with Clang++, vs. AVX2 on my AMD 9950X. The data type is DP FP using massive amounts of SIMD.
    That's a wonderful data point! Thanks for sharing!

    Yeah, Clang/LLVM has really made big strides on auto-vectorization. In the little bit I've tinkered with it, I've found that a lot of the usual tricks aren't even necessary to encourage/enable it.
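    For illustration, here's the kind of FIR-style inner loop that auto-vectorizes well, written as plain scalar C: with `-O3` and a suitable `-march`, Clang will typically turn the loop into AVX2 or AVX-512 vector FMAs with no intrinsics or pragmas. Names are illustrative:

    ```c
    #include <stdio.h>

    /* One FIR output sample: dot product of a signal window with the
       filter taps. The scalar loop below is what the auto-vectorizer
       widens into vector multiply-accumulates. */
    double fir_sample(const double *window, const double *taps, int ntaps) {
        double acc = 0.0;
        for (int i = 0; i < ntaps; i++)
            acc += window[i] * taps[i];
        return acc;
    }

    int main(void) {
        double window[4] = {1.0, 2.0, 3.0, 4.0};
        double taps[4]   = {0.5, 0.5, 0.5, 0.5};
        printf("%.1f\n", fir_sample(window, taps, 4));
        return 0;
    }
    ```

    (Note that with floating point, vectorizing the reduction reorders the additions, so strict-IEEE builds may need `-ffast-math` or a reassociation flag before the compiler will widen a loop like this.)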

    johndyson said:
    Of course, 'your mileage might vary -- A LOT'.
    Yup, definitely.
    Reply
  • TerryLaze
    bit_user said:
    Not true. Like any ISA extension, it needs feature-testing, to see if it's there. If so, you dispatch to a code path that utilizes it. If not, you need to fallback on a legacy path. This part is very much like how vector extensions are handled.
    Nope.
    For APX you just tick the box in the compiler, and if APX exists, it changes the legacy instructions to use it; the Intel compiler takes care of everything.
    You don't need two code paths; the compiler makes the changes itself.
    https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
    Compiler enabling is straightforward: A new REX2 prefix provides uniform access to the new registers across the legacy integer instruction set. Intel® Advanced Vector Extensions (Intel® AVX) instructions gain access via new bits defined in the existing EVEX prefix. In addition, legacy integer instructions now can also use EVEX to encode a dedicated destination register operand, turning them into three-operand instructions and reducing the need for extra register move instructions. While the new prefixes increase average instruction length, there are 10% fewer instructions in code compiled with Intel APX,2 resulting in similar code density as before.
    Reply
  • Xajel
    Before this alliance, Intel started a project to "clean up" x86 by removing compatibility with older x86 features that are not used anymore. But they scrapped that with this alliance, which is understandable, as such a major move must now be done together with AMD, since x86 is no longer Intel-only.

    But there's no news about this. I guess they could remove these legacy features to simplify the ISA and make it more efficient, while emulating them to retain compatibility without any issues.
    Reply