Intel unwraps Lunar Lake architecture: Up to 68% IPC gain for E-cores, 14% IPC gain for P-Cores

Lunar Lake NPU 4.0

Intel shared deep-dive architectural details of its fourth-gen NPU at the event, but I was unfortunately unable to attend this specific briefing. I do have a recording that I will watch and use to update this section, but for now I’ll have to let the slides do most of the talking.

The NPU is the central component in Intel’s AI strategy, and with 48 TOPS of performance it easily meets Microsoft’s requirements for next-gen PCs. However, the NPU is primarily designed to offload low-intensity AI work, which saves a tremendous amount of battery power. The GPU steps in for more demanding workloads with 67 TOPS of performance, while the CPU contributes another 5 TOPS. Overall, that gives Lunar Lake 120 total TOPS of AI performance.
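
For reference, the arithmetic behind that 120 TOPS figure is easy to sketch in a few lines of Python. The dictionary and function names here are my own and purely illustrative; the numbers are the peak figures quoted above. Keep in mind that a sum of peak ratings is a platform-level tally and doesn't necessarily reflect what any single workload will see.

```python
# Peak TOPS figures quoted in the article for each Lunar Lake engine.
# The names below are illustrative, not taken from any Intel tool or API.
PEAK_TOPS = {"NPU": 48, "GPU": 67, "CPU": 5}


def platform_tops(engines):
    """Return the combined peak TOPS across all engines."""
    return sum(engines.values())


def share_by_engine(engines):
    """Return each engine's fraction of the combined peak TOPS."""
    total = platform_tops(engines)
    return {name: tops / total for name, tops in engines.items()}


if __name__ == "__main__":
    print(f"Platform total: {platform_tops(PEAK_TOPS)} TOPS")
    for name, share in share_by_engine(PEAK_TOPS).items():
        print(f"{name}: {share:.1%} of the total")
```

By this tally, the GPU accounts for more than half of the platform total, with the NPU contributing 40% and the CPU the remainder.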

The key architectural components include 12 enhanced SHAVE DSPs, six neural compute engines, and a MAC array and DMA engine. This is fed with twice the memory bandwidth of the prior-gen NPU on Meteor Lake, and the NPU also has access to the 8MB shared side cache on the compute tile. This further enhances efficiency.

Overall, Intel claims a 4X improvement in peak performance and a 2X improvement in performance at the same power over the previous-gen NPU 3.0 used in Meteor Lake.

Lunar Lake Platform Controller Tile and Connectivity

The Platform Controller Tile houses all of the external I/O functions for the chip, including Wi-Fi and Bluetooth, USB 3.0 and 2.0, Thunderbolt, and the PCIe 4.0 and 5.0 interfaces. It also houses the memory controllers.

Intel guarantees that all Lunar Lake laptops will have at least two ports of Thunderbolt 4 connectivity, while some models will offer up to three ports. Intel used Thunderbolt 4 instead of the newer Thunderbolt 5 due to the target market for this class of laptop. The interface also supports the new Thunderbolt Share feature, which allows the interface to provide drag-and-drop file sharing functionality between PCs, along with screen and peripheral sharing.

The platform also supports Bluetooth 5.4 and Wi-Fi 7 that’s partially embedded into the Platform Controller Tile. Wi-Fi 7 functionality still requires another CNVi module that’s connected externally via the CNVi 3.0 interface. The new BE201 CRF module is 28% smaller than prior-gen Wi-Fi modules.

Lunar Lake Thread Director Improvements

This is another area of the architecture for which I wasn't able to attend the briefing, but we'll update this section once we have more time. The above slides provide most of the high-level overview.

Thoughts

Intel’s rethinking of its first-order priorities is important as it looks to fend off Apple’s M3, Qualcomm’s new Snapdragon X Elite, and AMD’s Ryzen AI 300 series processors. Intel will release Lunar Lake as two models, at least initially, but it hasn’t shared the final specifications for those models yet. Intel plans to ship 40 million AI-enabled processors by the end of the year, and Lunar Lake wafers are already in the company's fabs. The chips will arrive in shipping systems in Q3 2024.

Intel’s Lunar Lake architecture, and all of the associated core IPs, represents a dramatic rethinking of the company’s design goals to a power-first design to maximize battery life and performance. The improved design methodology and CPU and GPU microarchitectures will soon filter down to Intel’s other mobile products, like the upcoming Panther Lake, its Arrow Lake chips for desktop PCs, and its data center Xeon 6 processors.

The current Meteor Lake processors were the first step on this road to placing multiple tiles on a single package, and Intel looks to make improvements in every aspect of the design with Lunar Lake. With more competition in the mobile sector, Intel needs its upcoming processors to be more revolutionary than evolutionary, and the architectural deep dive seems to indicate everything is in place. We'll find out this fall how it all comes together, and how Lunar Lake competes with other options.