Intel's Arrow Lake for Desktops and Laptops Will Have Different Instruction Sets

Intel Meteor Lake
(Image credit: Intel)

It is not uncommon for server processors to support instructions that are not supported by client CPUs. But it looks like Arrow Lake S processors for desktops will support instructions that will not be supported by Arrow Lake CPUs for laptops, as noticed by @InstLatX64.

As it turns out, Arrow Lake processors in LGA1851 packaging will support such instructions as AVX-VNNI-INT16, SHA512, SM3, and SM4. In addition, the CPU will support an LBR Event Logging feature. The exact reason why Intel decided not to implement these features in mobile parts is unclear, but it is possible that the company could not add support because the ultra-low-power x86 cores in the SoC die do not support these instructions, and therefore they will not be enabled on the compute die either.
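The reasoning above can be illustrated with a small Python sketch: on a chip that mixes core types, software can only rely on instructions that every core supports, so the usable feature set is effectively the intersection of the per-core sets. The core names and feature lists below are illustrative assumptions, not Intel's published specifications.

```python
# Hypothetical per-core feature sets; names and contents are illustrative only.
CORE_FEATURES = {
    "p_core": {"AVX2", "AVX-VNNI", "AVX-VNNI-INT16", "SHA512", "SM3", "SM4"},
    "e_core": {"AVX2", "AVX-VNNI", "AVX-VNNI-INT16", "SHA512", "SM3", "SM4"},
    # Assumed older low-power core that lacks the new extensions.
    "lp_e_core": {"AVX2", "AVX-VNNI"},
}


def common_features(core_types):
    """Return the features supported by every listed core type."""
    sets = [CORE_FEATURES[name] for name in core_types]
    return set.intersection(*sets)


# Assumed layouts: desktop without low-power cores, mobile with them.
desktop = common_features(["p_core", "e_core"])
mobile = common_features(["p_core", "e_core", "lp_e_core"])

# Features the desktop layout exposes that the mobile layout cannot.
print(sorted(desktop - mobile))
```

In this toy model, adding a single older core type is enough to remove the new extensions from the whole chip's common set.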

Meanwhile, the new instructions may actually be missed on the mobile parts. Intel's AVX-VNNI-INT16 are Vector Neural Network Instructions with 16-bit integer data types designed specifically to accelerate convolutional neural network (CNN) and deep learning workloads, which should be quite handy for generative AI applications.
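To give a rough idea of the kind of work such instructions do, here is a scalar Python model of a VNNI-style dot-product step: adjacent pairs of 16-bit integers are multiplied and the products are added to a 32-bit accumulator. This is a simplified sketch, not the exact semantics of any one instruction. The choice of signed values for one operand and unsigned for the other is an assumption, saturating variants are left out, and all function names are hypothetical.

```python
def wrap_int32(value):
    """Wrap a Python int to signed 32-bit two's complement (non-saturating)."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


def dot_accumulate_lane(acc, a_pair, b_pair):
    """One 32-bit lane: acc + a0*b0 + a1*b1 for a pair of 16-bit inputs."""
    for a in a_pair:
        assert -0x8000 <= a <= 0x7FFF, "a must fit in a signed 16-bit word"
    for b in b_pair:
        assert 0 <= b <= 0xFFFF, "b must fit in an unsigned 16-bit word"
    return wrap_int32(acc + a_pair[0] * b_pair[0] + a_pair[1] * b_pair[1])


def dot_accumulate(acc_lanes, a_words, b_words):
    """Apply the lane operation across a vector; each lane consumes two words."""
    return [
        dot_accumulate_lane(acc, a_words[2 * i:2 * i + 2], b_words[2 * i:2 * i + 2])
        for i, acc in enumerate(acc_lanes)
    ]


# Two 32-bit lanes, so four 16-bit words per input vector.
result = dot_accumulate([0, 10], [1, -2, 3, 4], [5, 6, 7, 8])
print(result)
```

A hardware implementation performs this for every lane of a vector register in a single instruction, which is where the speedup for inference-style arithmetic comes from.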

As for the other instructions, SHA512, SM3, and SM4 accelerate the corresponding cryptographic algorithms: the SHA-512 hash function, the SM3 hash function, and the SM4 block cipher. Given that there are always security concerns, these additions will also be welcome. It should be noted, though, that SM3 and SM4 are primarily used in China.
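For readers unfamiliar with these algorithms, the short Python sketch below computes a SHA-512 digest with the standard hashlib module and attempts SM3 if the local build offers it. It only shows the algorithms themselves. Whether a given build actually uses the new CPU instructions depends on the underlying crypto library, which this snippet cannot show. The message is an arbitrary example.

```python
import hashlib

message = b"Arrow Lake"  # arbitrary example input

# SHA-512 is always available in hashlib and yields a 512-bit digest.
digest = hashlib.sha512(message).hexdigest()
print(len(digest) * 4, "bit digest:", digest[:16], "...")

# SM3 is only present when the underlying OpenSSL build provides it.
if "sm3" in hashlib.algorithms_available:
    try:
        print("SM3:", hashlib.new("sm3", message).hexdigest())
    except ValueError:
        print("SM3 is listed but disabled in this build")
else:
    print("SM3 not available in this Python/OpenSSL build")
```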

As far as Intel's Last Branch Record (LBR) feature is concerned, this is a debugging and performance tuning feature supported by some of its processors. LBR keeps a record of the processor's recently executed branches, including addresses of the branch and target instructions. This information helps developers understand program execution flow, identify performance bottlenecks, and analyze speculative execution side-channel attacks like Spectre and Meltdown.
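Conceptually, a last-branch record behaves like a small ring buffer of (source, target) address pairs in which new entries push out the oldest ones. The Python sketch below is a toy model of that idea only. The class name, the depth of four entries, and the addresses are hypothetical, and real hardware stores the records in processor registers with a model-specific depth.

```python
from collections import deque


class BranchRecorder:
    """Toy model of a last-branch record: keeps only the newest `depth` branches."""

    def __init__(self, depth=4):
        # A bounded deque silently drops the oldest entry when full.
        self.records = deque(maxlen=depth)

    def record(self, from_addr, to_addr):
        """Log one taken branch as a (source, target) address pair."""
        self.records.append((from_addr, to_addr))

    def snapshot(self):
        """Return the retained branches, oldest first."""
        return list(self.records)


lbr = BranchRecorder(depth=4)
for step in range(6):
    lbr.record(0x1000 + step * 0x10, 0x2000 + step * 0x10)

# Only the four most recent of the six branches remain.
for from_addr, to_addr in lbr.snapshot():
    print(f"{from_addr:#x} -> {to_addr:#x}")
```

A profiler or debugger reading such a buffer after an event of interest can reconstruct the last few steps of control flow that led to it.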

Although mobile Arrow Lake CPUs may not support the instructions that their desktop counterparts do, it is likely that Intel's mobile processors will eventually gain support for things like AVX-VNNI-INT16, SHA512, SM3, and SM4 once more software makers start using them.

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.

  • bit_user
    If you look at the table in the Tweet that the article references, the distinction seems to follow the generation of E-core. For instance Sierra Forest matches Arrow Lake (mobile), while Clearwater Forest matches Arrow Lake-S. The difference isn't that big, however. Just a couple crypto instructions and AVX-VNNI-INT16 - which doesn't seem like much of a game-changer for desktops. You'll still want a dGPU for any serious inferencing.

    This E-core revelation is very interesting. It could indicate that Arrow Lake (mobile) will use Intel 3, rather than Intel 20A!

    "Intel's Arrow Lake S to support new AMX, AVX, SHA512, and SM instructions."
    I don't know why the article's subtitle said anything about AMX, as there's no indication that it does. Furthermore, the mention of AVX got my hopes up that they were doing AVX10, but it seems not.
  • DavidC1
    bit_user said:
    This E-core revelation is very interesting. It could indicate that Arrow Lake (mobile) will use Intel 3, rather than Intel 20A!
    It does not.

    By your logic, the chart would also indicate that Lunar Lake is on Intel 3 and, at the same time, on Intel 18A, because it's on the same chart as Clearwater Forest.

    Arrow Lake S doesn't have the LP E-cores while the mobile versions do, which is why they need to disable the instructions on mobile: the LP E-cores are still based on Crestmont and don't support the latest instructions.

    Arrow Lake - N3 and 20A
    Lunar Lake - N3
  • bit_user
    DavidC1 said:
    By your logic, the chart would also indicate that Lunar Lake is on Intel 3 and, at the same time, on Intel 18A, because it's on the same chart as Clearwater Forest.
    Maybe I didn't state that very clearly, because there's no inconsistency in my interpretation.

    If you take another look, you'll see that Lunar Lake gets all the features of Arrow Lake (mobile), Arrow Lake S, and some additional features too. That's because it's new enough to inherit everything, by virtue of having yet newer-generation cores.

    DavidC1 said:
    Arrow Lake S doesn't have the LP E-cores while the mobile versions do, which is why they need to disable the instructions on mobile: the LP E-cores are still based on Crestmont and don't support the latest instructions.
    Okay, fair point. So, the culprit is likely the SoC tile of Arrow Lake (mobile) being a hand-me-down from Meteor Lake. Makes sense.
  • Lucky_SLS
    Are those instruction sets Intel-specific, like AVX-VNNI-INT16? Which ones are used by AMD and Nvidia in their accelerators?
  • bit_user
    Lucky_SLS said:
    Are those instruction sets Intel-specific, like AVX-VNNI-INT16? Which ones are used by AMD and Nvidia in their accelerators?
    They're x86-64 instructions, so they have nothing to do with any GPUs.

    I don't know what sort of arrangement Intel and AMD might have (or not), regarding new x86-64 ISA extensions. AMD has done a pretty good job of keeping just a couple generations behind Intel. I don't consider it a given that AMD will necessarily implement everything Intel does, however. Especially big things, like AMX.
  • thestryker
    The most interesting thing in that chart to me is Panther Lake with FRED.
  • NinoPino
    The x86 ISA is already fragmented as hell, and this makes the situation with Intel CPUs worse. Programming for Intel is becoming an increasing nightmare.
    A really good choice in AMD CPUs is not fragmenting their instruction set.
    It seems that Intel is doing all it can to kill the x86 ISA.
  • TerryLaze
    NinoPino said:

    A really good choice in AMD CPUs is not fragmenting their instruction set.
    It seems that Intel is doing all it can to kill the x86 ISA.
    So you are telling us that AMD CPUs are still using the x86 ISA from the 1970s without any modification?!
    CPUs are constantly evolving and every gen has, or at least can have, different extra instructions on it. AMD is no different here: as they get access to newer instructions, they implement them in their CPUs as well.
    AMD was the one that came up with x86_64, which was the biggest segmentation of x86 EVER.
  • NinoPino
    TerryLaze said:
    So you are telling us that AMD CPUs are still using the x86 ISA from the 1970s without any modification?!
    CPUs are constantly evolving and every gen has, or at least can have, different extra instructions on it. AMD is no different here: as they get access to newer instructions, they implement them in their CPUs as well.
    AMD was the one that came up with x86_64, which was the biggest segmentation of x86 EVER.
    I'm obviously saying that in AMD CPUs, for every generation, all market segments have the same instructions, and every generation has a superset of the previous one.
    In Intel CPUs, server and desktop parts have had different instructions for years, and recently servers have also gained extensions for matrix operations, crypto, and so on. On recent desktop parts, the same CPU has cores with different sets of instructions (performance/efficiency).
    Look at the mess made over the years with MMX, SSE, and AVX.
    AVX is so problematic that Intel itself now needs a new specification (AVX10), not to expand but to group AVX instructions.
    x86_64 (or AMD64, as it was originally called) is not a fragmentation but an expansion of the x86 ISA, but I bet you know the difference. From my POV, it was a very well done expansion considering AMD's resources and other factors.
  • WebDevDeadDrop
    NinoPino said:
    I'm obviously saying that in AMD CPUs, for every generation, all market segments have the same instructions, and every generation has a superset of the previous one.
    In Intel CPUs, server and desktop parts have had different instructions for years, and recently servers have also gained extensions for matrix operations, crypto, and so on. On recent desktop parts, the same CPU has cores with different sets of instructions (performance/efficiency).
    Look at the mess made over the years with MMX, SSE, and AVX.
    AVX is so problematic that Intel itself now needs a new specification (AVX10), not to expand but to group AVX instructions.
    x86_64 (or AMD64, as it was originally called) is not a fragmentation but an expansion of the x86 ISA, but I bet you know the difference. From my POV, it was a very well done expansion considering AMD's resources and other factors.
    AMD does not in fact have the same instruction set in every generation across market segments; this is trivially provable if you actually look at AMD specs. Different segments can and do have different feature sets. You can't use iGPU and decoding instructions on CPUs that don't have integrated graphics, you can't use certain AI and security processor instructions on CPUs that don't have them, and AMD Pro CPUs have different instruction sets than non-Pro CPUs for various management tasks and, more importantly for the people who want to buy them, have certain instructions disabled from being run. And yes, AMD has deprecated instructions, so no, not every AMD instruction set is a superset of some other instruction set.

    AVX is not an Intel-specific instruction set, and AMD has the same fragmentation property for AVX as Intel across their supported products. What Intel has that AMD doesn't is instruction sets for AVX that are disaggregated inside a single CPU, because some cores support some things and others don't. That's the purpose of AVX10: to create one instruction set with different features that can be present or not, instead of AVX, AVX2, and AVX-512 and all the various additions to each being separate instruction sets. And that's something that will benefit AMD as well if they want to implement it, because it would simplify their own instruction set situation, allowing more rapid feature integration, as well as simplify their own instruction aggregation that comes from doing AVX-512 by ganging AVX2 units.