Intel 13th-Gen Raptor Lake Architecture
Intel significantly improved its 'Intel 7' process node by employing faster third-gen SuperFin transistors and increasing the channel mobility (Intel hasn't provided more specifics on the latter). Intel claims this yielded what would typically be considered a full node's worth of performance improvement. As you can see above, Intel shifted and extended the voltage/frequency curve to deliver better performance at both high and low voltages (>50mV reduction at the same frequency, or a 200 MHz increase at the same voltage).
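To put the 50 mV figure in perspective, here is a rough sketch using the standard CMOS dynamic-power relation (P is proportional to C·V²·f). The 1.30 V baseline is a hypothetical operating point chosen for illustration, not an Intel figure, and the estimate ignores leakage power entirely.

```python
def dynamic_power_ratio(v_old: float, v_new: float,
                        f_old: float = 1.0, f_new: float = 1.0) -> float:
    """Ratio of new to old dynamic power, assuming P ~ C * V^2 * f with C unchanged."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


BASELINE_V = 1.30    # hypothetical baseline voltage (assumption, not from Intel)
REDUCTION_V = 0.050  # the ">50mV reduction at the same frequency" claim

# Same frequency, lower voltage: only the V^2 term changes.
ratio = dynamic_power_ratio(BASELINE_V, BASELINE_V - REDUCTION_V)
savings_pct = (1 - ratio) * 100

print(f"Dynamic power at iso-frequency: {ratio:.3f}x of baseline "
      f"({savings_pct:.1f}% lower)")
```

Because voltage enters as a square, even a small voltage reduction yields a larger proportional drop in dynamic power; the exact saving depends on the real operating voltage.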
We'll soon see a 6 GHz Raptor Lake model because the improved node performance paired with optimized speed paths within the design (faster, leakier transistors in key paths) allowed Intel to dial up the peak clock rates by 600 MHz. Intel says that Raptor Cove is the fastest core it's ever built, and the node's improved dynamic range can also be tuned more for the lower-power end of the V/F curve to improve efficiency for the Raptor Lake mobile chips.
Intel's larger 4MB L2 cache on the e-core clusters also employs a new dynamic prefetching algorithm that uses real-time telemetry data and machine learning to tune prefetching to the type of workload being executed. Intel says this improves performance by ~2% in some applications, like Adobe Photoshop, After Effects, and Lightroom, and by up to 16% in an unspecified application.
Intel also employs a dynamic 'Inclusive/non-inclusive' (INI) approach for its L3 cache that switches between inclusive and non-inclusive caching schemes on the fly. Caches are typically hard-coded to run with one of these policies and can't be changed. As a reminder, inclusive caching keeps a copy of L2 data in the L3 cache and typically improves performance in single-threaded work by providing faster hits for serial workloads, while non-inclusive caching only keeps part of the L2 cache in L3 to improve performance in threaded workloads. Intel's INI uses machine learning and real-time telemetry to change this policy on the fly, thus providing the best of both worlds.
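The capacity trade-off between the two policies can be sketched with simple arithmetic. The L2 sizes and core counts below come from this article (2MB per p-core, 4MB per quad-core e-core cluster, 8+16 die); the 36MB L3 figure is the Core i9-13900K's published capacity and is an assumption here because it isn't stated in the text. The `duplicated_fraction` parameter is a hypothetical knob for illustration and does not describe how Intel's INI actually decides.

```python
P_CORES = 8
L2_PER_P_CORE_MB = 2.0
E_CLUSTERS = 4               # 16 e-cores arranged in quad-core clusters
L2_PER_E_CLUSTER_MB = 4.0
L3_MB = 36.0                 # assumption: Core i9-13900K L3 size, not from the article

total_l2_mb = P_CORES * L2_PER_P_CORE_MB + E_CLUSTERS * L2_PER_E_CLUSTER_MB


def unique_capacity_mb(l2_mb: float, l3_mb: float, duplicated_fraction: float) -> float:
    """Distinct data held across L2 + L3 when a fraction of L2 is also copied in L3."""
    if not 0.0 <= duplicated_fraction <= 1.0:
        raise ValueError("duplicated_fraction must be between 0 and 1")
    return l2_mb + l3_mb - l2_mb * duplicated_fraction


inclusive_mb = unique_capacity_mb(total_l2_mb, L3_MB, 1.0)  # every L2 line duplicated
partial_mb = unique_capacity_mb(total_l2_mb, L3_MB, 0.5)    # hypothetical midpoint
no_overlap_mb = unique_capacity_mb(total_l2_mb, L3_MB, 0.0) # idealized upper bound

print(f"Total L2: {total_l2_mb:.0f} MB")
print(f"Unique capacity, fully inclusive: {inclusive_mb:.0f} MB")
print(f"Unique capacity, half duplicated: {partial_mb:.0f} MB")
print(f"Unique capacity, no duplication:  {no_overlap_mb:.0f} MB")
```

With L2 now nearly as large as L3, full inclusion would spend most of the L3 on duplicates, which helps explain why relaxing inclusion benefits heavily threaded workloads.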
This new feature comes courtesy of a microcontroller unit (MCU) that Intel originally designed into each Alder Lake core, but hadn't yet fully used. Intel was the first to employ a microcontroller built directly into each core, but the controller was only used for SoC-level power management in Alder Lake. With Raptor Lake it now manages power on a per-core basis and provides real-time control of certain characteristics while providing a foundation for new functionalities in the future.
New firmware updates to this microcontroller employ machine learning algorithms that better analyze telemetry data to characterize workloads, thus allowing the MCU to make adjustments to power management, the caching scheme, and prefetch characteristics at a 200-microsecond granularity. The telemetry data is sent from another unit in the core.
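For a sense of scale, the short sketch below converts the 200-microsecond granularity into core clock cycles and decisions per second. The 5,000 MHz core clock is a hypothetical round number used only for illustration.

```python
INTERVAL_US = 200        # adjustment granularity quoted above, in microseconds
CORE_CLOCK_MHZ = 5_000   # hypothetical core clock (assumption for illustration)

# MHz is cycles per microsecond, so microseconds * MHz gives cycles.
cycles_per_interval = INTERVAL_US * CORE_CLOCK_MHZ

# One second is 1,000,000 microseconds.
decisions_per_second = 1_000_000 // INTERVAL_US

print(f"Cycles between adjustments: {cycles_per_interval:,}")
print(f"Adjustments per second:     {decisions_per_second:,}")
```

In other words, the MCU reacts thousands of times per second, yet each interval still spans a very large number of core cycles, so the policy tracks workload phases rather than individual instructions.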
Interestingly, the MCU can be further tuned via uploadable firmware, and such updates could even conceivably be added to Alder and Raptor Lake chips in the field via microcode updates, thus providing a performance upgrade. Intel says there are no firm plans for this option as of yet.
Intel also increased the L2 cache from 1.25MB to 2MB for each Raptor Cove p-core but didn't need to make any significant architectural changes to enable the increased capacity. This cache, and some other features like increased vector compute, were already designed into the edge of the die in the previous-gen Golden Cove architecture, but they were only used for the Sapphire Rapids server processors. Intel's architects can exercise a 'chop option' to remove those features for processors that don't need them, like Alder Lake.
This approach speeds design and validation while leaving room for future revisions of the same architecture. With Raptor Lake, Intel chose to leave the L2 cache active. This adds roughly 1 cycle of extra latency, a byproduct of having to travel further across the die to access the extra capacity.
Neither the p-cores nor e-cores come with any meaningful IPC increase; we're told the 'one to two percent' increase comes largely from improved memory throughput and higher clock rates. Roughly half of Raptor Lake's frequency improvement stems from the Intel 7 process node's improved capacitance and speed, while the remainder comes from improved instruction timing and fine-grained power optimizations. Intel originally designed the Golden Cove architecture to achieve such speeds, but some headroom was left on the table in order to speed time to market for Alder Lake.
As a result of frequency, memory, cache optimizations, and more threads, Intel claims that Raptor Lake provides 15% more single-threaded performance than Alder Lake and 41% more multi-threaded performance, which was largely borne out in our own testing.
Intel also says that Raptor Lake can provide the same performance as Alder Lake in multi-threaded work at roughly a quarter of the power. Yes, you read that right: Intel says Raptor Lake can match a 241W Alder Lake chip with only 65W of power. We're putting that to the test; stay tuned. Intel also says it is working on a new dynamic core parking feature for Raptor Lake mobile chips, but the company will share more details later.
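The arithmetic behind that claim is easy to check. The sketch below takes Intel's two wattage figures at face value and assumes performance is exactly equal at both power levels, which is Intel's claim rather than a measured result.

```python
ALDER_LAKE_POWER_W = 241   # power of the Alder Lake chip in Intel's comparison
RAPTOR_LAKE_POWER_W = 65   # power at which Raptor Lake is claimed to match it

# Fraction of the original power needed for the same multi-threaded performance.
power_fraction = RAPTOR_LAKE_POWER_W / ALDER_LAKE_POWER_W

# Equal performance at lower power implies this performance-per-watt multiple.
perf_per_watt_gain = ALDER_LAKE_POWER_W / RAPTOR_LAKE_POWER_W

print(f"Power needed: {power_fraction:.0%} of the Alder Lake figure")
print(f"Implied performance-per-watt gain: {perf_per_watt_gain:.2f}x")
```

The figures work out to about 27% of the power, so "a quarter" is a slight rounding in Intel's favor.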
Intel’s Core i9, i7, and some i5 models come with the new larger Raptor Lake 8+16 die (8 P-core + 16 E-core). Above, you can see an image of the Raptor Lake die, with the e-cores highlighted in blue. Intel has added two additional quad-core e-core clusters to the die, elongating the die and adding two more ring stops and two more L3 cache slices that are shared among the cores.
As you can see from the unofficial die measurements we have in the table below, the die is also slightly wider. We theorize that the wider dimensions stem from the extra L2 cache placed at the edge of the die. The larger die allows Intel to stay competitive with AMD, but it will also inevitably lead to higher costs. We see that cost passed on to the customer with the Core i5 model, but Intel is obviously absorbing the cost increase of the Core i7 and i9 models.
|Processor|Die Area|Die Dimensions|Cores|Process|
|---|---|---|---|---|
|Raptor Lake Core i9-13900K|257 mm^2|23.8 x 10.8 mm|8 P-Cores + 16 E-Cores|Intel 7|
|Alder Lake Core i9-12900K|208 mm^2|20.4 x 10.2 mm|8 P-Cores + 8 E-Cores|Intel 7|
|Rocket Lake Core i9-11900K|281 mm^2|24 x 11.7 mm|8 P-Cores|14nm|
|Comet Lake Core i9-10900K|206 mm^2|22.4 x 9.2 mm|10 P-Cores|14nm|
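As a quick sanity check on the unofficial measurements in the table above, the sketch below multiplies out each die's dimensions and compares Raptor Lake with Alder Lake. It treats each die as a simple rectangle, which is an approximation.

```python
# (name, listed area in mm^2, length in mm, width in mm), taken from the table
DIES = [
    ("Raptor Lake Core i9-13900K", 257, 23.8, 10.8),
    ("Alder Lake Core i9-12900K", 208, 20.4, 10.2),
    ("Rocket Lake Core i9-11900K", 281, 24.0, 11.7),
    ("Comet Lake Core i9-10900K", 206, 22.4, 9.2),
]


def computed_area(length_mm: float, width_mm: float) -> float:
    """Area of a rectangular die in mm^2."""
    return length_mm * width_mm


for name, listed, length, width in DIES:
    area = computed_area(length, width)
    print(f"{name}: {area:.1f} mm^2 computed vs {listed} mm^2 listed")

raptor, alder = DIES[0], DIES[1]
area_growth_pct = (raptor[1] / alder[1] - 1) * 100
length_growth_pct = (raptor[2] / alder[2] - 1) * 100
width_growth_pct = (raptor[3] / alder[3] - 1) * 100

print(f"Raptor Lake vs Alder Lake area:   +{area_growth_pct:.1f}%")
print(f"Raptor Lake vs Alder Lake length: +{length_growth_pct:.1f}%")
print(f"Raptor Lake vs Alder Lake width:  +{width_growth_pct:.1f}%")
```

Most of the growth comes from the longer dimension, consistent with the two added e-core clusters, while the smaller increase in width fits the theory about the extra L2 cache at the die edge.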
Raptor Lake sees significant improvements to DDR5 throughput, but leveraging faster DDR5 memory speeds requires a faster fabric. Intel has increased the ring bus frequency to 5 GHz, and it runs 900 MHz faster during all-core turbo, addressing a glaring issue we saw with Alder Lake. Intel says the ring bus enhancements alone can result in 5% higher framerates (we've long seen big performance increases from overclocking this setting on Alder Lake, but doing so will now yield negligible results with Raptor Lake due to the higher native speed).