Next-gen Intel Arrow Lake-S CPU spotted with 24 threads and no AVX-512 functionality

Intel Core CPU
(Image credit: Intel)

InstLatX64 on X (formerly Twitter) spotted a new Intel test system featuring an Arrow Lake-S engineering sample. The CPU inside the test system features 24 threads, a 3 GHz frequency, and lacks AVX-512 (though it could be disabled in firmware). The new test machine confirms that Intel is in the middle of testing Arrow Lake and preparing it for launch later this year.

There are a few interesting tidbits about this new Arrow Lake chip. One is the thread count: at 24 threads, the Arrow Lake chip has two more threads than the most potent Core Ultra 9 processor you can buy today, seemingly confirming that the flagship Arrow Lake model will sport at least two more threads than its Meteor Lake counterpart. What's even more interesting is that those 24 threads could be 24 actual cores, if rumors about Arrow Lake ditching hyperthreading altogether are true.

Another interesting tidbit is the lack of AVX-512 on this particular Arrow Lake sample. InstLatX64 specifically reports that the chip comes "without" AVX-512, but it is reasonable to assume the instruction set might simply be disabled in the chip's firmware or the motherboard UEFI. Nonetheless, there's a chance Arrow Lake won't come with AVX-512 at all. If that is true, mainstream users will once again lack an easily accessible AVX-512 platform and will have to move to alternatives such as AMD's Ryzen 7000-series CPUs or Intel's latest workstation Xeons to use the instruction set.
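
For readers who want to check what a given chip actually exposes, here is a minimal sketch in C (our own illustration, using GCC/Clang's __builtin_cpu_supports builtin). Note that such a check only reports what the silicon, firmware, and OS together expose, so an instruction set disabled in UEFI looks the same as one that is physically absent:

    /* Minimal sketch: query a few AVX feature flags at runtime using
     * GCC/Clang's __builtin_cpu_supports. This only reflects what the
     * firmware and OS expose, so a feature disabled in UEFI reports
     * the same as one that is missing from the silicon. */
    #include <stdio.h>

    int main(void)
    {
        printf("AVX2      : %s\n", __builtin_cpu_supports("avx2")     ? "yes" : "no");
        printf("AVX-512F  : %s\n", __builtin_cpu_supports("avx512f")  ? "yes" : "no");
        printf("AVX-512VL : %s\n", __builtin_cpu_supports("avx512vl") ? "yes" : "no");
        return 0;
    }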

However, this feature segmentation is not likely to last forever. Intel has plans to bring AVX10 and AVX-512 to its future E-core CPU architectures, which will enable Intel to reactivate AVX-512 functionality in future products.

Arrow Lake-S is Intel's next-generation desktop CPU architecture that will reportedly replace the outgoing Raptor Lake Refresh lineup later this year. Intel's future desktop CPUs will allegedly bring Meteor Lake's tile-based design philosophy to the desktop market with additional modifications to improve performance. Arrow Lake will be built on the newer Intel 20A process node, which introduces RibbonFET gate-all-around transistors and a backside power delivery technology called PowerVia. These two features are expected to help enhance Arrow Lake's IPC over Meteor Lake, since Meteor Lake itself is merely a more power-efficient equivalent of Raptor Lake on the CPU side.

Arrow Lake is on track to launch sometime in the second half of 2024 along with Lunar Lake. Arrow Lake will focus on the desktop market while Lunar Lake will target mobile devices, just like Meteor Lake. This marks the first architectural split between Intel's desktop and mobile segments since the introduction of hybrid x86 CPUs back in 2021. It won't be until Panther Lake arrives in 2025 that both segments are served by a single architecture once again.

Aaron Klotz
Contributing Writer

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.

  • bit_user
    The CPU inside the test system features 24 threads, a 3 GHz frequency, and lacks AVX-512
    This is no surprise, because Intel discloses ISA extensions of upcoming CPUs a couple years in advance. They have yet to disclose AVX-512 or AVX10/512 support on an E-core.
    https://www.tomshardware.com/news/intel-arrow-lake-desktop-mobile-to-have-different-isas
    Speaking of which, if/when they do, it'll be AVX10/512 - not AVX-512. However, the whole thing about introducing AVX10 at a baseline width of 256-bit was telegraphing their intention to hold client CPUs at 256-bit, for the foreseeable future.
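
    For what it's worth, here's a rough sketch of how the new enumeration is supposed to work. This is my own illustration based on my reading of the AVX10.1 technical paper; the leaf and bit numbers are assumptions on my part, so verify them against the current spec before relying on them:

        #include <cpuid.h>   /* GCC/Clang helper for raw CPUID queries */
        #include <stdio.h>

        int main(void)
        {
            unsigned int eax, ebx, ecx, edx;

            /* AVX10 presence flag: CPUID.(EAX=07H, ECX=01H):EDX[19], as I read
             * the AVX10.1 paper -- treat the bit position as an assumption. */
            if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx) || !(edx & (1u << 19))) {
                puts("AVX10 not reported");
                return 0;
            }

            /* Converged Vector ISA leaf 24H: EBX[7:0] = AVX10 version number,
             * EBX[18] = 512-bit vector support (again, assumed bit positions). */
            if (__get_cpuid_count(0x24, 0, &eax, &ebx, &ecx, &edx))
                printf("AVX10 version %u, %s\n", ebx & 0xffu,
                       (ebx & (1u << 18)) ? "AVX10/512" : "AVX10/256");
            return 0;
        }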

    this feature segmentation is not likely to last forever. Intel has plans to bring AVX10 and AVX-512 to its future E-core CPU architectures
    No, you misread their messaging. I've been through their documents (the whitepaper and datasheet) multiple times, and they decidedly characterize 512-bit (including AVX10/512) as legacy and even go so far as to say that hybrid CPUs will use AVX10/256. I'll probably have to go back and pull up the precise quotes...
    Reply
  • bit_user
    Regarding core count, this leak (if true) showed their intention to stick with the 8P + 16E configuration, in the top-spec die:
    https://www.tomshardware.com/news/intel-raptor-lake-refresh-arrow-lake-cpu-performance-projections-leaked
    FWIW, my take on the rumor of hyper-threading going away is one of cautious skepticism. I can see why they might do it, given all the threads contributed by the E-cores, but I can also see them not wanting to cede any ground on multithreaded performance to AMD. Perhaps what they'll do is take a page from an older playbook and disable HT in some of the lower-end and mid-range models.
    Reply
  • TerryLaze
    bit_user said:
    No, you misread their messaging. I've been through their documents (the whitepaper and datasheet) multiple times, and they decidedly characterize 512-bit (including AVX10/512) as legacy and even go so far as to say that hybrid CPUs will use AVX10/256. I'll probably have to go back and pull up the precise quotes...
    No, YOU misread.
    The Intel AVX-512 ISA will be frozen as of the introduction of Intel AVX10 and all CPUID feature flags will continue to be enabled on future P-core processors for legacy support.
    This means they will keep the coder calls for AVX-512 so that no old software has to be rewritten; that's what legacy support is.
    All new subsequent vector instructions will be enumerated only as part of Intel AVX10.
    From here on out all new CPUs will have AVX10 (even if they do have legacy support).
    Apart from a few special cases, those instructions will be supported at all vector lengths, with 128-bit and 256-bit vector lengths being supported across all processors, and 512-bit vector lengths additionally supported on P-core processors.
    AVX10 will work the same way as DirectX or Vulkan: the coder will not have to know exactly what the hardware can do, they will only need to code what they want the hardware to do. Coders will be coding for AVX10, and the small cores will handle up to 256 bits while the big cores will handle up to 512.
    with 128-bit and 256-bit vector lengths being supported across all processors, and 512-bit vector lengths additionally supported on P-core processors.
    Maybe the few special cases could mean all of the mainstream CPUs but that would be an incredibly bad way of phrasing that.
    Apart from a few special cases,
    https://cdrdv2-public.intel.com/784343/356368-intel-avx10-tech-paper.pdf
    Reply
  • TerryLaze
    bit_user said:
    Perhaps what they'll do is take a page from an older playbook and disable HT in some of the lower-end and mid-range models.
    That would be the stupidest thing ever, take performance away from the tiers that need it the most to stay desirable.
    They did that back then because there was nothing to compete against them at those tiers; that's not so now.
    The only reason Intel would remove hyperthreading would be to reintroduce it in the next gen for a huge boost in numbers, which would boost sales.
    But that is a risk way too big to take.
    Reply
  • usertests
    TerryLaze said:
    That would be the stupidest thing ever, take performance away from the tiers that need it the most to stay desirable.
    They did that back then because there was nothing to compete against them at those tiers; that's not so now.
    The only reason Intel would remove hyperthreading would be to reintroduce it in the next gen for a huge boost in numbers, which would boost sales.
    But that is a risk way too big to take.
    At least this is not a segmentation issue this time. Supposedly, Arrow Lake loses hyperthreading because Intel was going to introduce its replacement, "Rentable Units". But it's not ready in time, so Arrow Lake P-cores end up with neither hyperthreading nor Rentable Units.

    It's difficult for me to believe that they couldn't put hyperthreading back into the design, since it's been around for over 20 years. So I'm not sure about the rumor. No worries, we'll find out soon enough.

    If Arrow Lake has a decent +15% or better single-threaded performance increase, it could mostly cancel out the removal of hyperthreading. Some games and applications aren't helped by it at all.
    Reply
  • TerryLaze
    usertests said:
    At least this is not a segmentation issue this time. Supposedly, Arrow Lake loses hyperthreading because Intel was going to introduce its replacement, "Rentable Units". But it's not ready in time, so Arrow Lake P-cores end up with neither hyperthreading nor Rentable Units.
    What is this cookie baking?!
    I was going to use stevia so I forgot to put the sugar in and now it has neither?!

    That's not how that works, they can't just rip out pages of the design because they ran out of time and manufacture half a CPU.
    If they aren't ready they will delay; they've done that often enough.
    usertests said:
    If Arrow Lake has a decent +15% or better single-threaded performance increase, it could mostly cancel out the removal of hyperthreading. Some games and applications aren't helped by it at all.
    So even if it works as expected, some games and apps aren't helped at all, so overall it would be a loss.
    Why would anybody think that they will remove HTT?

    Rentable units will break up threads into demanding and less demanding parts, which is perfect for making hyperthreading work even better, since it works on resources left empty by the main thread.
    If a core can run one demanding part and one light part at the same time with hyperthreading it will be faster than doing it on two separate cores, especially if the second one is an E-core with much lower clocks; it's also less complicated and needs fewer resources overall.
    Until now hyperthreading had no clue if the thread it's going to run would be demanding or not, so it could get two demanding threads issued to the same physical core at once; rentable units would help that tremendously.
    If the hyperthreads are all filled up then they would go to the E-cores.
    Or at the very least the Thread Director can evaluate where it will run faster and act accordingly.
    Reply
  • usertests
    TerryLaze said:
    What is this cookie baking?!
    I was going to use stevia so I forgot to put the sugar in and now it has neither?!

    That's not how that works, they can't just rip out pages of the design because they ran out of time and manufacture half a CPU.
    If they aren't ready they will delay; they've done that often enough.
    I'm just telling you what I've heard, which Tom's Hardware itself has mentioned a couple times now. Don't be shocked if Arrow Lake launches with 24 cores (8+16), 24 threads.
    Reply
  • bit_user
    TerryLaze said:
    No, YOU misread.
    *sigh*
    We've been through this before.

    TerryLaze said:
    This means they will keep the coder calls for AVX-512 so that no old software has to be rewritten; that's what legacy support is.
    The key phrase is "P-core processors", by which they mean non-hybrid CPUs containing only P-cores. So, they're just saying that product lines which currently support AVX-512 will continue to receive AVX-512 support. That's not relevant to the discussion.

    TerryLaze said:
    From here on out all new CPUs will have AVX10 (even if they do have legacy support).
    Not "all new CPUs", since Arrow Lake won't have it.

    TerryLaze said:
    AVX10 will work the same way as DirectX or Vulkan: the coder will not have to know exactly what the hardware can do, they will only need to code what they want the hardware to do. Coders will be coding for AVX10, and the small cores will handle up to 256 bits while the big cores will handle up to 512.
    Not only is this wrong, but so is your DirectX/Vulkan analogy. Sure, in each case you can query the hardware to see what it supports, but then your program needs to have a codepath to exploit or deal with the hardware's capabilities or limitations.

    For instance, if you detect the CPU only implements AVX10/256 v2, then you can't execute instructions using 512-bit operands or any instructions added in v3 or later.

    Likewise, if the hardware is capable of 512-bit support, your program needs a dedicated codepath written using 512-bit operands if you want to gain any benefit from it. In this regard, it's really no different from the current AVX2 vs. AVX-512 situation, except that AVX10/256 and AVX10/512 differ only in operand size - not semantics or instruction set.

    Fun fact: AVX-512 already supports different operand sizes: 128-bit, 256-bit, and 512-bit. AVX10 continues this theme, except that it makes 512-bit support optional.
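
    To make the "dedicated codepath" point concrete, here's a rough sketch of my own (nothing official, just the usual runtime-dispatch pattern) using today's AVX2 vs. AVX-512 split; an AVX10/256 vs. AVX10/512 split would dispatch the same way:

        #include <immintrin.h>
        #include <stddef.h>

        /* 512-bit path: only safe to call when the CPU reports avx512f. */
        __attribute__((target("avx512f")))
        static void add_arrays_512(float *dst, const float *a, const float *b, size_t n)
        {
            size_t i = 0;
            for (; i + 16 <= n; i += 16)
                _mm512_storeu_ps(dst + i, _mm512_add_ps(_mm512_loadu_ps(a + i),
                                                        _mm512_loadu_ps(b + i)));
            for (; i < n; i++)
                dst[i] = a[i] + b[i];
        }

        /* 256-bit path: the fallback for AVX2-only (or, by analogy, AVX10/256-only) parts. */
        __attribute__((target("avx2")))
        static void add_arrays_256(float *dst, const float *a, const float *b, size_t n)
        {
            size_t i = 0;
            for (; i + 8 <= n; i += 8)
                _mm256_storeu_ps(dst + i, _mm256_add_ps(_mm256_loadu_ps(a + i),
                                                        _mm256_loadu_ps(b + i)));
            for (; i < n; i++)
                dst[i] = a[i] + b[i];
        }

        void add_arrays(float *dst, const float *a, const float *b, size_t n)
        {
            /* Runtime dispatch: the wider path only runs when the CPU exposes it. */
            if (__builtin_cpu_supports("avx512f"))
                add_arrays_512(dst, a, b, n);
            else
                add_arrays_256(dst, a, b, n);
        }

    The wider path doesn't come for free: you still have to write it, compile it, and select it at runtime, which is exactly the situation AVX10/512 preserves.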
    TerryLaze said:
    https://cdrdv2-public.intel.com/784343/356368-intel-avx10-tech-paper.pdf
    Yes, I've been through that, multiple times. There's another document, as well.
    Reply
  • bit_user
    usertests said:
    If Arrow Lake has a decent +15% or better single-threaded performance increase, it could mostly cancel out the removal of hyperthreading.
    According to this leak, it doesn't. Geekbench single-core only improves 9-13%, and the SPECint and SPECfp (1-copy) projections only improve 4-8% and 3-6%, respectively.
    Source: https://www.tomshardware.com/news/intel-raptor-lake-refresh-arrow-lake-cpu-performance-projections-leaked
    That's relative to the i9-13900K, as well. Compared to the i9-14900K, the improvement would be even less.

    usertests said:
    Some games and applications aren't helped by it at all.
    True. Some gamers either disable it in BIOS or use tools like Process Lasso to restrict games' use of it.

    IMO, the problem isn't hyperthreading, but rather bad APIs which limit the kernel's understanding of how the game is trying to use threads and make the game jump through hoops to try and deal with the hardware's capabilities and limitations.
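
    As a rough illustration of that affinity trick (a Linux sketch of my own, not what Process Lasso actually does internally), you can keep a process on one logical CPU per physical core by reading the sysfs topology and skipping SMT siblings:

        /* Sketch: restrict the calling process to one logical CPU per physical
         * core, i.e. skip hyperthread siblings. Assumes Linux sysfs and that
         * thread_siblings_list starts with the lowest-numbered sibling, which
         * it typically does. */
        #define _GNU_SOURCE
        #include <sched.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
            long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
            cpu_set_t set;
            CPU_ZERO(&set);

            for (long cpu = 0; cpu < ncpu; cpu++) {
                char path[128];
                int first = -1;
                snprintf(path, sizeof(path),
                         "/sys/devices/system/cpu/cpu%ld/topology/thread_siblings_list", cpu);
                FILE *f = fopen(path, "r");
                if (!f)
                    continue;
                /* Keep this CPU only if it is the first sibling on its core. */
                if (fscanf(f, "%d", &first) == 1 && first == cpu)
                    CPU_SET(cpu, &set);
                fclose(f);
            }

            if (sched_setaffinity(0, sizeof(set), &set) != 0)
                perror("sched_setaffinity");
            return 0;
        }

    Of course, the kernel scheduler is supposed to make this unnecessary; the fact that people still bother is exactly the API problem I'm complaining about.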
    Reply
  • bit_user
    TerryLaze said:
    That's not how that works, they can't just rip out pages of the design because they ran out of time and manufacture half a CPU.
    If they aren't ready they will delay; they've done that often enough.
    Although we're going pretty far back in history, the Pentium 4 "Prescott" cores were designed with x86-64 support but didn't enable it at launch. I think that's because it was buggy or incomplete and they decided to push the thing out the door, anyhow.

    TerryLaze said:
    Rentable units will break up threads into demanding and less demanding parts, which is perfect for making hyperthreading work even better, since it works on resources left empty by the main thread.
    If a core can run one demanding part and one light part at the same time with hyperthreading it will be faster than doing it on two separate cores, especially if the second one is an E-core with much lower clocks; it's also less complicated and needs fewer resources overall.
    What you're describing is incredibly complex and hard to implement (well). We need to see real data on how well it works. It's not something you can reliably simulate in your brain.

    TerryLaze said:
    Until now hyperthreading had no clue if the thread it's going to run would be demanding or not,
    The Thread Director (introduced with 12th-gen Alder Lake) now gives the OS kernel insight into such things. Of course, only hybrid CPUs have a Thread Director, but the basic approach should apply to SMT as well as P-core vs. E-core scheduling.
    Reply