ARM's Cortex-A72 CPU is a natural progression of the A57. At a high level, the two processors look similar, but ARM has made a number of power and performance optimizations to every stage of the pipeline. Most integer workloads see no appreciable performance gain, but there are a few specific cases, encryption in particular, that benefit from zero-cycle forwarding and the new Radix-16 integer divider. The A72's lower-latency floating-point units have a larger impact, increasing single-core Geekbench performance by about 15 percent overall relative to the A57 at the same clock frequency, with most of the individual floating-point workloads showing 30 to 60 percent gains.
Despite the A72's improvements, it's still a narrower architecture than Apple's Twister CPU or Qualcomm's new Kryo core, which limits its IPC. After adjusting for differences in clock frequency, Kryo holds a 20 percent advantage in Geekbench integer performance and a 41 percent advantage in Geekbench floating-point performance. The A72 can, however, reach higher frequencies (we should see A72 cores running at 2.5GHz), which helps mitigate, and in some cases overcome, Kryo's higher IPC.
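To put the clock-versus-IPC trade-off in concrete terms, here's a back-of-the-envelope sketch. It assumes, as a simplification, that a benchmark score scales linearly with clock frequency, which overstates the benefit of higher clocks for memory-bound workloads. Under that assumption, an A72 needs (1 + advantage) times Kryo's clock to match it. The function names below are hypothetical, and the only inputs taken from our testing are the 20 and 41 percent per-clock gaps and the 2.5GHz A72 target.

```python
# Hypothetical helpers for a rough clock-vs-IPC comparison.
# Assumption: score scales linearly with clock frequency, which is
# optimistic for memory-bound workloads.


def break_even_clock_ratio(per_clock_advantage: float) -> float:
    """Return f_a72 / f_rival needed for the A72 to match a rival whose
    clock-normalized score is (1 + per_clock_advantage) times higher."""
    return 1.0 + per_clock_advantage


def max_rival_clock_for_parity(f_a72_ghz: float, per_clock_advantage: float) -> float:
    """Highest rival clock (GHz) at which an A72 at f_a72_ghz still matches it."""
    return f_a72_ghz / break_even_clock_ratio(per_clock_advantage)


if __name__ == "__main__":
    # Per-clock gaps from the Geekbench comparison with Kryo.
    for label, adv in (("integer", 0.20), ("floating-point", 0.41)):
        ratio = break_even_clock_ratio(adv)
        limit = max_rival_clock_for_parity(2.5, adv)
        print(f"{label}: A72 needs {ratio:.2f}x the clock; "
              f"at 2.5GHz it matches Kryo below {limit:.2f}GHz")
```

Run as-is, this reports that a 2.5GHz A72 would match Kryo in integer workloads only if Kryo ran below about 2.08GHz, and in floating-point workloads only below about 1.77GHz, which is why higher clocks help more on the integer side.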
HiSilicon is the first to deliver the A72 on TSMC's 16nm FinFET+ process, and the first to use ARM's latest high-end GPU, the Mali-T880, in its Kirin 950 SoC. This combination gives it better system performance and power efficiency than A57-based SoCs, including Qualcomm's Snapdragon 810 and Samsung's Exynos 7420. The Kirin 950 also looks like it will be competitive with the Snapdragon 820, at least on non-GPU-related tasks, with each SoC holding an edge in certain workloads.
At first glance, HiSilicon's decision to use the Mali-T880 in a quad-core configuration looks puzzling. While the T880's additional ALU and higher maximum frequency help keep it within 15 to 30 percent of the Exynos 7420's octa-core T760 GPU in shader-heavy games, having only half as many ROPs, texture units, and triangle units hurts peak performance across a wide range of gaming workloads, allowing the Exynos 7420 to extend its lead to around 60 percent. In all of our gaming benchmarks, the Kirin 950 performed more like a mid-range SoC.
A closer look, however, reveals HiSilicon's logic. Sure, the Kirin 950 is not going to wow anyone with its peak performance, but our tests show that its gaming stability is excellent: it sustains near-maximum performance over long periods of time. When paired with the Mate 8's 1080p display, the Kirin 950 actually outperforms the Exynos 7420 in the Galaxy S6 after a short period of time, because high temperatures force the S6 to throttle its GPU frequency. This means the Kirin 950 and Mate 8 should not have any issues playing real-world games. We do not think the Kirin 950's Mali-T880MP4 GPU is powerful enough to drive QHD displays, however, which limits its use in some high-end flagships. The smaller GPU and hybrid LPDDR3/LPDDR4 memory controller do, though, make the Kirin 950 a cost-effective, high-performing option for mid-range devices.
The Kirin 950 seems to be a good fit for Huawei's Mate 8, which set new performance and battery life records in PCMark, our best benchmark for predicting real-world behavior. These tests corroborate our own first-hand experience: the Mate 8's UI was very fluid, Web pages loaded and scrolled quickly, and it just felt very fast overall. There are still several things we have not looked at yet, such as the display and cameras, but the Mate 8 seems like the real deal when it comes to performance and battery life.
Update, 1/19/16, 6:50am PT: Clarified Huawei's growth statistics in first paragraph.