Intel unwraps Lunar Lake architecture: Up to 68% IPC gain for E-cores, 14% IPC gain for P-Cores

Lion Cove delivers an impressive number of improvements to the core microarchitecture, but Skymont sees an even bigger advance over the prior generation: a 38% IPC improvement in integer workloads and a 68% IPC gain in floating-point work. Those gains fuel an up to 2x increase in single-threaded performance and up to 4x more peak performance in multi-threaded workloads than the Meteor Lake LP E-cores. Intel also doubled throughput in vectorized AVX and VNNI workloads.

The Skymont architecture marks Intel’s third E-core design for x86 hybrid processors, following Gracemont in Alder Lake and Crestmont in Meteor Lake. (That's not counting the Tremont cores in Lakefield, which seem to be more of a proof of concept in hindsight.) The Meteor Lake design employed two E-cores placed in the SoC tile for extreme low-power workloads, with eight additional E-cores on the compute tile along with the P-cores (two quad-core clusters). With Lunar Lake, Intel employs a single quad-core cluster on the compute tile to address both the low-power E-core and high-power E-core roles with an expanded dynamic range.

Intel optimized the branch prediction engine by incorporating a 96-byte parallel instruction fetch to feed the decode engine. The decode clusters are also expanded from 6-wide (2x3) with Crestmont to 9-wide (3x3) with Skymont, so any core in the new design can sustain nine instruction decodes per clock. Skymont also now employs nanocode, which enables parallel microcode generation so that the three decode clusters can execute in parallel more frequently. Micro-op queue capacity was also increased from 64 to 96 entries to add more buffering between the front end and back end.

Skymont has an 8-wide allocation in the out-of-order engine, an increase from Crestmont’s 6-wide allocation. Skymont also expands to a 16-wide retire, a doubling over Crestmont’s 8-wide retire, to free up resources as quickly as possible after stalls, which improves power and area efficiency. The out-of-order window is 60% larger than the prior generation's, and the architecture has bigger register files, deeper reservation stations, and deeper load and store buffering. Parallelism is boosted by employing 26 dispatch ports, including eight ALUs, three jump ports, and support for three loads per cycle.

Intel targeted a 2X improvement in vector performance, achieved by going from two 128-bit FP and SIMD vector pipes to four with Skymont. Other improvements to the vector engine targeted latency reductions and added support for floating-point rounding. Intel also made a few enhancements to the load/store engine, which are listed in its slide.

Previous E-core clusters had a shared 2MB L2 cache, but that has now been expanded to 4MB with double the L2 bandwidth. L1 to L1 transfer bandwidth was also improved.

The final results are impressive, with the aforementioned 38% and 68% improvements in single-threaded integer and floating-point performance, though this is notably compared to the low-power (LP) E-cores in the Meteor Lake SoC, not the standard quad-core cluster on the compute die. For perspective, the LP E-cores only have 2MB of cache compared to the standard E-core cluster with 4MB of cache. Again, Intel gives itself a rather large +/- 10% margin of error.
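To put that margin of error in perspective, here is a rough Python sketch of the range each claimed IPC gain could span. Intel has not said how the +/- 10% applies, so treating it as a relative error on the measured performance ratio is purely an assumption, and the function name `ipc_gain_range` is hypothetical.

```python
def ipc_gain_range(claimed_gain_pct, margin_pct=10.0):
    """Return (low, high) IPC gain in percent for a claimed gain.

    Assumption: the margin of error is a relative error on the
    performance ratio (1 + gain), not on the gain itself.
    """
    ratio = 1.0 + claimed_gain_pct / 100.0
    low = ratio * (1.0 - margin_pct / 100.0)
    high = ratio * (1.0 + margin_pct / 100.0)
    return (low - 1.0) * 100.0, (high - 1.0) * 100.0


# Claimed Skymont gains over the Meteor Lake LP E-cores, as quoted in the article.
for label, gain in (("integer", 38.0), ("floating point", 68.0)):
    low, high = ipc_gain_range(gain)
    print(f"{label}: claimed +{gain:.0f}%, range +{low:.1f}% to +{high:.1f}%")
```

Under that reading, the integer claim could land anywhere from roughly +24% to +52%, and the floating-point claim from roughly +51% to +85%.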

Skymont’s power and single-threaded performance curve is vastly enhanced over Crestmont, but the comparisons are once again being made to the low-power Meteor Lake E-core instead of the full E-core. Compared to Crestmont’s peak performance, Skymont consumes one-third the power to deliver the same level of performance. However, it has more gas in the tank with 1.7X more performance at the same power level. Overall, Skymont’s peak single-threaded performance is twice that of the Crestmont LP E-cores.

The multi-threaded power/performance metrics are skewed, as Intel compares Skymont’s quad-core cluster to Meteor Lake’s dual-core low-power E-core cluster instead of comparing it to the quad-core cluster. As such, we would expect to see half the stated advantages in these areas over the standard Meteor Lake quad-core cluster.
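The halving estimate is simple core-count normalization, which can be sketched in a few lines of Python. This assumes multi-threaded throughput scales linearly with core count, an idealization that ignores cache and clock differences between the two Meteor Lake clusters; the function name is hypothetical.

```python
def per_core_normalized(claimed_ratio, new_cores, old_cores):
    """Scale a cluster-vs-cluster performance ratio to equal core counts.

    Assumption: throughput scales linearly with the number of cores.
    """
    return claimed_ratio * old_cores / new_cores


# Intel's "up to 4x" peak multi-threaded claim pits four Skymont cores
# against the two low-power E-cores in Meteor Lake.
claimed_peak = 4.0
print(per_core_normalized(claimed_peak, new_cores=4, old_cores=2))  # 2.0
```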

Intel also provided comparisons for Skymont vs Raptor Lake’s P-core, which uses the Raptor Cove architecture. Intel claims a 2% IPC advantage for Skymont in both integer and floating-point workloads.

Intel’s power-to-performance slides for the Skymont/Raptor Cove comparisons are easily misconstrued. In the last two slides, we can see that Intel zoomed in on an area of the performance curve that it says is the proper envelope for multi-threaded acceleration on a low power island. That yields the final slide, where Intel says that Skymont consumes 0.6X the power at the same performance as Raptor Cove, or 1.2X the performance at the same power. Again, we see the same high margin of error, so take these extrapolated comparisons with a healthy serving of salt.
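One way to compare these iso-performance and iso-power claims on a common footing is to convert each into a performance-per-watt ratio. The Python sketch below does that for the figures quoted in this article, both for the Crestmont LP E-core comparison and the Raptor Cove comparison. The helper `efficiency_gain` is hypothetical, and the results inherit Intel's stated margin of error.

```python
def efficiency_gain(perf_ratio, power_ratio):
    """Performance-per-watt ratio of the new core relative to the old one."""
    return perf_ratio / power_ratio


# (performance ratio, power ratio) pairs taken from the article's figures.
comparisons = {
    "Skymont vs Crestmont LP, same performance": (1.0, 1.0 / 3.0),
    "Skymont vs Crestmont LP, same power": (1.7, 1.0),
    "Skymont vs Raptor Cove, same performance": (1.0, 0.6),
    "Skymont vs Raptor Cove, same power": (1.2, 1.0),
}

for label, (perf, power) in comparisons.items():
    print(f"{label}: {efficiency_gain(perf, power):.2f}x performance per watt")
```

The same pair of cores yields different efficiency ratios depending on where on the curve the comparison is made, which is part of why these slides are easy to misread.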

Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • dimar
    I'd like to see extensive power consumption/performance benchmarks for the next gen. Intel, AMD, ARM CPUs and how slower and faster SSDs affect the battery life.
  • cyrusfox
    Fascinating the disparity in improvement between Floating point vs integer single thread uplift. Huge FP uplift! Eye popping improvements!!! That is lame though, comparing to the gimped e-cores...
    but the comparisons are once again being made to the low-power Meteor Lake E-core instead of the full E-core
    iGPU uplift is looking really solid, this seems to check all the boxes of what I need in a device. For toting around I really don't need 24+ cores, 8 is plenty and with this much GPU as well as hopefully maturing platform. Excited to see the products and where the pricing lands. Hope Framework gets a flavor of this.
  • jenci8888
    68% ipc gain e-core? That doesn't seem right... I think they meant 68% gain e-core on specfp. It should be 30% ipc around.
  • usertests
    cyrusfox said:
    That is lame though, comparing to the gimped e-cores...
    They also compare it to Raptor Cove, and if I'm reading it right, claim +2% IPC over that. With the caveat repeatedly mentioned in the article that Intel is giving itself a 10% margin of error.
  • thestryker
    Well it seems we have an answer regarding HT and that is it depends. It'll be interesting to see which version of the Lion Cove desktop ARL uses. I wouldn't be surprised if mobile ARL went without HT, but it seems like desktop could probably keep it even though they emphasize hybrid when referring to dropping it.

    Will be looking forward to seeing real world performance on LNL and ARL.
    cyrusfox said:
    That is lame though, comparing to the gimped e-cores...
    Just a guess, based upon the testing Chips and Cheese did on the MTL LP E-cores: removal from the ring bus, and thus the L3 cache, can have an outsized impact on performance. This would in theory be the closest comparison to an existing product.
  • Giroro
    I think cheap mini PCs using Intel's N100 all E-core processor are a perfectly usable office machine for many people, and I would love to see those upgraded with the new E cores.

    That said, I would never buy one, because they would definitely spec those machines with a worthless amount of non-upgradeable memory.
    One of the major features adding value of the N100 machines, is that you can usually upgrade the RAM.

    Now for the high end Lunar Lake products... There is no high end. If Intel doesn't convince manufacturers to keep ultrabooks with the highest available configuration under $1200, then they're going to have a problem
  • Evildead_666
    Can people please stop using the word Architect as a verb please ?
    "Redesigned" would have been perfect for this article.
    Architected or Rearchitected do not exist.
    Cheers.
  • Dragos Manea
    Evildead_666 said:
    Can people please stop using the word Architect as a verb please ?
    "Redesigned" would have been perfect for this article.
    Architected or Rearchitected do not exist.
    Cheers.
    That would be too similar to the article from which they copied and pasted; they had to change some words, even if it is with words that do not exist.
  • bit_user
    The article said:
    38% and 68% IPC gains in the new Skymont architecture.
    This is based on a somewhat biased performance comparison (see below).

    cyrusfox said:
    Fascinating the disparity in improvement between Floating point vs integer single thread uplift. Huge FP uplift! Eye popping improvements!!!
    That is lame though, comparing to the gimped e-cores...
    Initially, I missed what you probably meant by "gimped". As I see @thestryker has pointed out, comparing to the LP E-cores is indeed quite lame of them, since its lack of L3 cache has been shown to disadvantage it relative to the Crestmont cores on Meteor Lake's CPU tile.

    Dang. I was really excited for a minute, there.
    :(
  • bit_user
    Giroro said:
    I think cheap mini PCs using Intel's N100 all E-core processor are a perfectly usable office machine for many people, and I would love to see those upgraded with the new E cores.
    Yeah, they seem to be somewhere around the performance of a Sandybridge or Haswell i5, which is still pretty usable. Of course, their iGPU is much better than those CPUs'.

    Giroro said:
    That said, I would never buy one, because they would definitely spec those machines with a worthless amount of non-upgradeable memory.
    One of the major features adding value of the N100 machines, is that you can usually upgrade the RAM.
    You can find some that take a DDR5 SO-DIMM. I have 32 GB in my N97 machine. It doesn't need that much, but I did it just to get dual-rank, for the small performance boost it provides.