Intel's next-gen CPU boosts to 2.8 GHz without Hyper-Threading — Lunar Lake chip with eight cores, eight threads has a bigger L2 cache than the L3 cache

Critical specifications for an early sample of Intel's upcoming Lunar Lake CPU have leaked out, detailing its core and thread count, cache configuration, and frequency (via HXL). While the specifications imply that Lunar Lake is similar to its predecessor, Meteor Lake, in many ways, they also point to potentially significant changes to the cache configuration and to Hyper-Threading.

The specifications come from a screenshot of Windows Task Manager running on what is purported to be a Lunar Lake-equipped PC, courtesy of Zhihu user Xziar. The alleged Lunar Lake chip is an A1 sample, which means it's the first working model to come out of the fabs. Historically, A1 silicon tends to be in prototype territory, especially for Intel; Meteor Lake was on stepping C0 (with the C meaning the third major revision), Raptor Lake was B0, and Alder Lake was C0. So, this Lunar Lake chip is probably not the final product.

Some of these specs are more or less expected. Lunar Lake is physically small and seemingly focused on mobile efficiency, so a total of only eight cores isn't surprising. The L1 cache amount also implies that Lunar Lake either lacks low-power E-Cores like those in Meteor Lake, or that all of its E-Cores are the low-power type but with a larger L1 cache. The 2.8 GHz boost clock is also unsurprising since the sample is likely engineering sample (ES) silicon.

Specification	Lunar Lake*
P-Core Architecture	Lion Cove
E-Core Architecture	Skymont
P-Core Count	4
E-Core Count	4
Thread Count	8
L1 Cache	836KB (112KB per P-Core, 96KB per E-Core)
L2 Cache	14MB (2.5MB per P-Core, 4MB per four E-Cores)
L3 Cache	12MB
Boost Clock	~2.8GHz
Process	Intel 18A/TSMC N3B

* Specifications are unconfirmed.
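As a rough sanity check, the per-core figures in the table can be summed and compared with the listed totals. The sketch below is only illustrative: the function and parameter names are made up for this example, and the inputs are the unconfirmed leaked numbers from the table, with the E-Core L2 treated as one 4MB block shared by a cluster of four cores.

```python
def total_l1_kb(p_cores, e_cores, l1_per_p_kb, l1_per_e_kb):
    """Sum the private L1 caches across all cores (in KB)."""
    return p_cores * l1_per_p_kb + e_cores * l1_per_e_kb


def total_l2_mb(p_cores, e_cores, l2_per_p_mb, l2_per_e_cluster_mb, e_cluster_size=4):
    """Sum the per-P-Core L2 plus the shared L2 of each E-Core cluster (in MB)."""
    e_clusters = e_cores // e_cluster_size
    return p_cores * l2_per_p_mb + e_clusters * l2_per_e_cluster_mb


# Inputs taken from the leaked (unconfirmed) table: 4 P-Cores + 4 E-Cores.
l1_kb = total_l1_kb(4, 4, 112, 96)
l2_mb = total_l2_mb(4, 4, 2.5, 4)
l3_mb = 12  # L3 figure from the same table

print(l1_kb)          # 832
print(l2_mb)          # 14.0
print(l2_mb > l3_mb)  # True: total L2 exceeds the listed L3
```

The L2 sum matches the 14MB in the table. The L1 sum comes to 832KB, slightly below the 836KB total listed, so one of the L1 figures may have been misread or rounded somewhere along the way.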

Things get weird when you start looking at the cache, though. It would appear Lunar Lake is identical to Meteor Lake in the L1 and L2 cache, or at least that's what the screenshot implies. However, the Lunar Lake sample has only 12MB of L3 cache, lower than the 14MB of L2 cache. Usually, a higher level of cache means more capacity, and often significantly more, so it's very unintuitive that Lunar Lake should have less L3 than L2 cache. This directly contradicts an earlier leak that showed 16MB of L3 cache for Lunar Lake but has identical specifications otherwise.

Intel Lunar Lake CPU — (Image credit: Xziar/Zhihu)

It's possible that Task Manager didn't read the Lunar Lake chip correctly and that the sample has 16MB of L3, but that might not be true. In response to a comment comparing Lunar Lake to Intel's low-end N300, Xziar said "this cache is obviously not up to par." That would be a weird response if Task Manager were wrong, and it would seem that the leaker thinks 12MB of L3 cache is the correct figure.

The other weird specification is the thread count, which is just eight. Intel's previous hybrid architecture CPUs have included Hyper-Threading on the P-Cores, which would give a chip with four P-Cores and four E-Cores 12 threads. Since A1 silicon is unlikely to be the final product, it's possible Hyper-Threading is simply disabled because of technical issues or for testing purposes. On the other hand, early Arrow Lake samples don't have Hyper-Threading either. Although that could be a coincidence, it raises the possibility that Intel may be moving on from Hyper-Threading in 2024.
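The thread arithmetic is simple enough to sketch. The helper below is a hypothetical illustration, not anything Intel publishes. It assumes two threads per P-Core when Hyper-Threading is on and one thread per E-Core or LP E-Core.

```python
def thread_count(p_cores, e_cores, lp_e_cores=0, hyper_threading=True):
    """Logical threads for a hybrid CPU: only P-Cores gain a second thread."""
    threads_per_p_core = 2 if hyper_threading else 1
    return p_cores * threads_per_p_core + e_cores + lp_e_cores


# Leaked Lunar Lake layout (4 P-Cores + 4 E-Cores).
print(thread_count(4, 4))                         # 12 with Hyper-Threading
print(thread_count(4, 4, hyper_threading=False))  # 8, matching the screenshot
```

The same function reproduces the figures readers cite in the comments for older parts: `thread_count(2, 8)` gives 12 for a 2+8 Alder Lake-U or Raptor Lake-U chip, and `thread_count(2, 8, lp_e_cores=2)` gives 14 for a 2+8+2 Meteor Lake-U chip.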

Matthew Connatser is a freelance writer for Tom's Hardware US. He writes articles about CPUs, GPUs, SSDs, and computers in general.

9 Comments from the forums

usertests

The design looks like a good low-power successor to Alder Lake-U and Meteor Lake-U, and far superior to gimped chips like Alder Lake-N (0+8 cores, but typically quad-core, 16/24/32 EUs, limited to single-channel memory).

Compare 4+4, 8 threads with no hyperthreading to:

Alder/Raptor Lake-U, with 2+8 cores, 12 threads.
Meteor Lake-U, with 2+8+2 cores, 14 threads. The last two being the new LP-E cores.

Increasing the P-cores, decreasing the E-cores seems like a good development which should increase its gaming performance. You probably won't miss the extra threads in a low-power device, and there should be at least some minor IPC gains from Lion Cove and Skymont. It's a little unusual to see LP E-cores thrown out, but who cares?

The YuuKi-AnS leak from November revealed some other details: A choice of only 16 GB or 32 GB memory-on-package, 16 GB being a good minimum these days. The memory speed is apparently fixed at LPDDR5x-8533. iGPU is 7/8 Xe2 cores, which is 112/128 execution units instead of the 64 EUs maximum in Meteor Lake-U. The NPU is faster for what that's worth. There's H.266 video decode and Wi-Fi 7 support.

It just looks like a good x86 low-power laptop chip when you look at it in full. The only thing I'll be complaining about is the pricing most likely.
thestryker

This seems squarely aimed at the same market Qualcomm seems to be trying to break into so at least Intel will have a very real threat to deal with which might keep pricing under control. Of course Qualcomm isn't exactly known for their rational pricing so maybe it won't be such an issue.

It's going to be very interesting to see how LNL and ARL turn out to be designed in the end.
rluker5

It does seem focused on low power responsiveness. Probably isn't intended for serious gaming. Just good responsiveness for the non gamer/workstation crowd which is most people and businesses.

I wonder how it will compare to Apple's offerings on battery life.

And that max boost clock is questionable, even at 33% load. Under a power savings type power plan, which is default for mobile, clocks fluctuate a lot. Also hopefully Intel has found a way to downclock better than what the chips have been doing lately. They could save more power.
usertests

rluker5 said:
It does seem focused on low power responsiveness. Probably isn't intended for serious gaming. Just good responsiveness for the non gamer/workstation crowd which is most people and businesses.
I'm not sure about the graphics, but it looks promising.

The leak says "8 Xe2 cores, 64 Vector Engines (16 wide)".

Wikipedia says Meteor Lake-H has 8 Xe cores, 128 Vector Engines. Meteor Lake-U has 4 Xe cores, 64 Vector Engines. So a newer generation of Arc/Xe graphics but some aspects may be halved since it's efficiency-focused. Or maybe that's just Vector Engines being wider and more powerful in Xe2-LPG, I don't know.

The apparent default speed of LPDDR5x-8533 is 14% higher than LPDDR5x-7467 top speed in Meteor Lake, and that's only in the top models.
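The 14% figure can be checked from the two transfer rates quoted above. This is a small sketch with a made-up helper name, and it compares only the rated speeds, not real-world bandwidth.

```python
def percent_faster(new_mtps, old_mtps):
    """Relative increase of one transfer rate over another, in percent."""
    return (new_mtps - old_mtps) / old_mtps * 100.0


# LPDDR5x-8533 (leaked Lunar Lake) vs. LPDDR5x-7467 (top Meteor Lake speed).
gain = percent_faster(8533, 7467)
print(f"{gain:.1f}%")  # 14.3%
```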

So it would either be faster than only Meteor Lake-U, or faster than Meteor Lake-H/Phoenix which would actually be kinda serious to me at least. Older games, emulators, and some newer games would be fine with that level of iGPU for 1080p and the 4+4 cores, 8 threads. Microsoft may try to harness the NPU for its unannounced upscaling technique. The leak says it will run at 8W in fanless designs, or 17-30W with a fan, which would be the better choice for clocks and gaming performance.
purposelycryptic

rluker5 said:
It does seem focused on low power responsiveness. Probably isn't intended for serious gaming. Just good responsiveness for the non gamer/workstation crowd which is most people and businesses.

I wonder how it will compare to Apple's offerings on battery life.

And that max boost clock is questionable, even at 33% load. Under a power savings type power plan, which is default for mobile, clocks fluctuate a lot. Also hopefully Intel has found a way to downclock better than what the chips have been doing lately. They could save more power.
I think you mean productivity crowd. Workstation processors would be the Xeon-W series and AMD's Threadripper PRO series processors.

Workstations generally have the meanest processors around, with the highest balance of clock speed and core/thread count of any production CPU line. Gaming processors may have higher clock speeds, server processors may have more cores, but workstation processors are the true all-rounder bruisers. You can definitely game on them - if you have the money. They are far on the other end of the price spectrum from this processor - you can easily spend $5k-$10k+ on one.
bit_user

Critical specifications for an early sample of Intel's upcoming Lunar Lake CPU have leaked out
Saying "leaked out" is redundant, as the "out" is implicit. The first place I noticed using such a construction is WCCFTech, which doesn't surprise me as I think that site is based in Pakistan, hence their English is sometimes a bit odd.

Meteor Lake was on stepping C0 (with the C meaning the third major revision), Raptor Lake was B0, and Alder Lake was C0. So, this Lunar Lake chip is probably not the final product.
I also thought the letters were simple iterations, but then I learned that the Alder Lake-S H0 stepping is a fundamentally different die than the C0 stepping. The H0 die is the one with only 6 P-cores and no E-cores, in its fully-enabled form. It's actually different silicon than the C0 stepping, which is the one with 8P + 8E. I don't know if these dies have other names, but the only distinction I've seen is the C0 vs. H0 stepping.

the Lunar Lake sample has only 12MB of L3 cache, lower than the 14MB of L2 cache. Usually, a higher level of cache means more capacity, and often significantly more, so it's very unintuitive that Lunar Lake should have less L3 than L2 cache. This directly contradicts an earlier leak that showed 16MB of L3 cache for Lunar Lake but has identical specifications otherwise.
Not weird. First, Intel CPUs tie L3 cache to the core tile. As cores are disabled, so are their L3 cache slices. So, if this isn't a fully-enabled sample, then that could explain certain L3 differences. For instance, Raptor Cove had a 3 MB slice of L3 per core, in which case 4 P-cores would yield 12 MB of L3. 6 P-cores would give you 18 MB.
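To make the slice arithmetic concrete, here is a tiny sketch. The function name is invented for illustration, and the 3 MB-per-core slice is the Raptor Cove figure mentioned above, not a confirmed Lunar Lake value.

```python
def l3_from_slices(active_p_cores, slice_mb=3):
    """Total L3 if each active P-core contributes one fixed-size slice."""
    return active_p_cores * slice_mb


print(l3_from_slices(4))  # 12 MB with four P-cores enabled
print(l3_from_slices(6))  # 18 MB with six P-cores enabled
```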

The other thing to know about L3 cache is that Intel has been implementing it as exclusive of L2 contents since Skylake-SP, making it somewhat complementary. That makes a lot of sense if you've got big L2 caches, and it's how they avoid needing so much L3.

Just look at the L3 cache in Alder Lake & Raptor Lake. Here's a plot I made, but note that the Y-axis is logarithmic.

jp7189

Initially I thought the same re:L3, but that explanation doesn't make sense considering the L2 size. If cores were disabled, then L2 would be lost too.

I have a hard time imagining a use case for a cache that's both smaller and slower even if it's exclusive. I guess maybe if L2 is core specific and L3 is shared pool, there could be some single thread cases...
bit_user

jp7189 said:
Initially I thought the same re:L3, but that explanation doesn't make sense considering the L2 size. If cores were disabled, then L2 would be lost too.
You see the graph in my post, no? The per-core L2 amount has been creeping up, while they've been holding per-core L3 constant, for the past 3 generations. I suppose I should update it with Meteor Lake.

Edit: it looks like the per-core specs on Redwood Cove's caches are the same as Raptor Cove's. The biggest change seems to be that the L3 cache is no longer shared with the iGPU. Interestingly, the CPU die's quad-Crestmont tiles appear to have dropped back to having a 2 MB slice of shared L2, each (Raptor Lake increased this to 4 MB). In spite of this, the E-cores are the one area of Meteor Lake featuring higher IPC than Raptor Lake, from what I've seen.

jp7189 said:
I have a hard time imagining a use case for a cache that's both smaller and slower even if it's exclusive. I guess maybe if L2 is core specific and L3 is shared pool, there could be some single thread cases...
That's exactly how it works. L2 unifies code + data, for a single core (L1 separates them). Except in the case of E-cores, where the L2 is shared across the quad-core cluster. On Intel CPUs, L3 is global. On AMD CPUs, L3 only unifies the specific compute die, which makes L3 comparisons between 12+ core AMD CPUs and Intel CPUs artificially lopsided in AMD's favor.

BTW, note that current lithography techniques have reached a point where SRAM size has virtually stopped improving with new nodes. So, that's an argument for why Intel might be preferring to spend more of its area-budget on L2 cache than L3. In other words, they might get a better improvement in perf/mm^2 by enlarging L2 cache by some amount, rather than enlarging L3 by the same amount. Transistor-wise, the cost is about the same.

The traditional reason to keep L2 smaller is that lookups take longer, the larger it gets, and L2 is typically more latency-sensitive than L3. However, that cost increases logarithmically vs. size, so it's not an obvious win not to enlarge L2 a little bit more.
jp7189

I was responding to your point of a small L3 being the result of disabled cores. If disabled cores were the cause, then we'd see a reduction of L2 also.

Otherwise, I agree with your thoughts.