Arm whipped the covers off its next-gen series of CPU and GPU cores based on the Armv9 architecture today, along with other system IP, all of which it packages under its Total Compute Solutions collection of technologies.
These new Arm chips span from the flagship Cortex-X2 cores that will power next-gen laptops and mobile devices, with what Arm says is up to 40% more single-threaded performance than modern laptop silicon, down to high-efficiency 'Little' Cortex-A510 cores that slot in for mixed uses in higher-end laptops and mobile applications. Arm also has its new Cortex-A710 on tap for the traditional 'big' CPU role, along with a new range of Mali GPU cores.
Laptops and mobile devices with the new Arm chips will debut in 2022, though the timeline will vary by the respective manufacturers. Let's dive right in and see what the near future of the Arm ecosystem looks like.
Arm Cortex-X2, Cortex-A510, Cortex-A710 Cores and Mali GPUs
The Cortex-X2 slots in as the highest-end core of the stack, with Arm claiming a 16% improvement in single-core performance at the same process node and clocks (ISO process/frequency) over the Cortex-X1, along with a doubling of machine learning performance (matrix multiplication).
Arm designed the X2 cores for the highest IPC of the range and tuned the voltage/frequency curve for performance. Clusters of these cores will drop into higher-end laptops in groups as large as eight cores, enabled by the enhanced DynamIQ Shared Unit (DSU-110) fabric that ties the elements together. Eight-core X2 clusters also support up to 16MB of L3 cache and 32MB of unified System Level Cache (SLC). The CoreLink CI/NI-700 interfaces also provide connections to the other IP blocks, like GPU cores and DRAM. More on that in a bit.
X2 cores will drop into the growing fleet of Arm-based Windows laptops and Chromebooks. Arm says these devices will offer all-day battery life paired with the highest-end performance.
As usual, Arm also has both 'big' and 'Little' cores, which it combines in big.LITTLE-esque designs that pair higher-performance big cores with lower-power efficiency cores.
The 'big' Cortex-A710 slots in for tasks requiring a better blend of performance and efficiency for sustained multi-core workloads, with up to 10% more performance in single-core tasks and twice the machine learning performance of the prior-gen Cortex-A78. This is Arm's first 'big' core to support the Armv9 architecture.
The 'Little' Cortex-A510 slots in as the company's first new little core in four years and supports the Armv9 architecture. Arm claims these small efficiency cores, designed primarily for background tasks and light workloads, offer nearly the same performance as older big cores. Arm says these cores offer up to 35% more performance in single-core work and a 3x improvement in machine learning performance over the prior-gen little cores.
Notably, the A510 sports an in-order microarchitecture, which Arm says helps tune it for a wide range of efficiency-focused tasks that span from smartphones to smart home and wearable applications. The company used the latest prefetching and branch prediction tech, along with fine-grained pipeline tuning, to wring out the best efficiency and performance possible from the in-order design. Compared to the older big-core Cortex-A73, Arm says the A510 is within 10% on IPC and 15% on frequency, all while consuming 35% less power.
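As a rough back-of-the-envelope reading of those A510 figures, the sketch below assumes single-thread performance scales as IPC times frequency and treats "within 10%" and "within 15%" as worst-case deficits. Both are assumptions made for illustration, not Arm's methodology, and the function name is hypothetical.

```python
def relative_perf_per_watt(ipc_ratio: float, freq_ratio: float, power_ratio: float):
    """Return (relative performance, relative performance per watt).

    Assumes performance ~ IPC x frequency, a simplification that ignores
    memory effects and workload differences.
    """
    perf = ipc_ratio * freq_ratio
    return perf, perf / power_ratio


# Worst-case reading of Arm's A510-vs-A73 claim:
# 10% lower IPC, 15% lower frequency, 35% less power.
perf, perf_per_watt = relative_perf_per_watt(0.90, 0.85, 0.65)
print(f"relative performance: {perf:.3f}")
print(f"relative perf/W:      {perf_per_watt:.3f}")
```

Under those assumptions the little core would land at roughly three-quarters of the older big core's single-thread performance while still coming out ahead on performance per watt.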
Arm also introduced its refreshed Mali GPU lineup, which the company says represents the broadest range of performance it's ever released for its graphics cores.
The Mali-G710 slots in as the flagship with a claimed 20% performance improvement and a 35% improvement in machine learning performance over the previous-gen Mali-G78. Meanwhile, the G510 slots in for applications like TV and augmented reality with 22% higher efficiency and a doubling of machine learning performance, while the lowest-end Mali-G310 slots in for low-cost devices with a claimed 6x performance increase in texturing.
Here we can see how the new cores could be arranged in next-gen devices. Laptops can come with several types of configurations, like the eight-core X2 cluster shown in the prior section, or in clusters of four fast X2 cores and four 'big' A710 cores.
You can also expect the X2 to make an appearance in some of the highest-end mobile devices, but as a single dominant core that's paired with clusters of both 'big' A710 and 'little' A510 cores. As shown in the third slide, this type of arrangement is significantly more performant than the previous generation, part of which comes on the back of the doubled L3 cache capacity. Compared to today's devices, Arm says the X2 core can provide up to 30% more performance than the latest flagship Android smartphones, the A710 cores are 30% more efficient, and it has wrung out 35% higher performance from the little cores.
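To make these configurations concrete, here is a minimal sketch that models a cluster as counts of each core type and checks it against the eight-core and 16MB L3 ceilings quoted in this article. The class and field names are hypothetical, the 1+3+4 phone split and the L3 sizes on the mixed configurations are assumed examples, and real DSU-110 configuration rules are more detailed than this.

```python
from dataclasses import dataclass

MAX_CORES_PER_CLUSTER = 8   # "groups as large as eight cores" per the article
MAX_L3_MB = 16              # "up to 16MB of L3 cache" per the article


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical description of one DynamIQ cluster."""
    x2: int = 0
    a710: int = 0
    a510: int = 0
    l3_mb: int = 0

    @property
    def total_cores(self) -> int:
        return self.x2 + self.a710 + self.a510

    def validate(self) -> None:
        """Raise ValueError if the cluster exceeds the limits quoted above."""
        if min(self.x2, self.a710, self.a510, self.l3_mb) < 0:
            raise ValueError("counts and cache size must be non-negative")
        if not 1 <= self.total_cores <= MAX_CORES_PER_CLUSTER:
            raise ValueError(
                f"expected 1-{MAX_CORES_PER_CLUSTER} cores, got {self.total_cores}"
            )
        if self.l3_mb > MAX_L3_MB:
            raise ValueError(f"L3 limited to {MAX_L3_MB}MB, got {self.l3_mb}MB")


laptop_all_x2 = ClusterConfig(x2=8, l3_mb=16)          # eight-core X2 cluster
laptop_mixed = ClusterConfig(x2=4, a710=4, l3_mb=16)   # four X2 + four A710
phone = ClusterConfig(x2=1, a710=3, a510=4, l3_mb=8)   # assumed 1+3+4 split

for cfg in (laptop_all_x2, laptop_mixed, phone):
    cfg.validate()
    print(cfg, "->", cfg.total_cores, "cores")
```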
As you can see, different pairings allow a mix-and-match design philosophy that spans down to low-power devices like wearables and AR devices. Naturally, that requires an efficient and performant fabric to tie the elements together.
The DynamIQ Shared Unit-110 (DSU-110) steps into that role nicely. The design leverages a bi-directional dual-ring structure to connect the cores and cache slices and offers five times the L3 bandwidth and support for up to 16MB of L3 cache. Additionally, the CoreLink CI/NI-700 ties the DSU-110 to other system IP and devices, such as the GPU cores, DRAM, 5G modems, and third-party IP.
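One reason a bi-directional ring helps is that a request can travel whichever way around the ring is shorter. The sketch below is a generic illustration of that property, not a model of the DSU-110 itself; the number of ring stops and the function names are assumptions.

```python
def hops_unidirectional(src: int, dst: int, stops: int) -> int:
    """Hops from src to dst when traffic can only travel one way around the ring."""
    return (dst - src) % stops


def hops_bidirectional(src: int, dst: int, stops: int) -> int:
    """Hops from src to dst when traffic may take the shorter direction."""
    forward = (dst - src) % stops
    return min(forward, stops - forward)


STOPS = 8  # hypothetical ring with eight stops (cores or cache slices)

worst_uni = max(hops_unidirectional(0, d, STOPS) for d in range(STOPS))
worst_bi = max(hops_bidirectional(0, d, STOPS) for d in range(STOPS))
print(f"worst-case hops, one-way ring: {worst_uni}")
print(f"worst-case hops, two-way ring: {worst_bi}")
```

On this hypothetical eight-stop ring, the worst-case trip drops from seven hops to four once traffic can flow in both directions.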
Arm offers the Cortex and Mali cores under its Total Compute umbrella. This includes the standard feature set of the Armv9 architecture that the company recently announced, such as enhanced machine learning capabilities, bolstered security, and full backward compatibility with Armv8.
Arm plans to have all of its cores on 64-bit architectures in 2023. The improvements on the machine learning side include support for the Scalable Vector Extension 2 (SVE2), support for the Bfloat16 format, and support for matrix multiply instructions for both Int8 and Bfloat16 numerical formats.
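For readers unfamiliar with Bfloat16, it keeps the sign bit and full 8-bit exponent of a 32-bit float but only 7 bits of mantissa, so it is effectively the top half of a float32. The sketch below shows that relationship using simple truncation. Real conversions usually round rather than truncate, and the helper names here are hypothetical.

```python
import struct


def float32_to_bfloat16_bits(value: float) -> int:
    """Return the 16-bit bfloat16 pattern for value by truncating a float32."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    return bits32 >> 16  # keep sign (1), exponent (8), top mantissa bits (7)


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value


for x in (1.0, 3.14159, 1e30):
    bits = float32_to_bfloat16_bits(x)
    print(f"{x!r:>10} -> 0x{bits:04x} -> {bfloat16_bits_to_float(bits)!r}")
```

The round trip shows the trade-off: very large values such as 1e30 stay representable because the exponent range matches float32, while a value like 3.14159 loses precision past the first few significant digits.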
Here are the CPU microarchitecture slide decks for each of the three new cores, along with a rollup of the GPU architecture slides.
With the first devices due in 2022, we'll be watching Cortex-X2 designs closely, as Apple's recent successes with the M1 could spur a willingness among other vendors, and customers, to adopt Arm architectures for mobility and desktop PC applications.