Huawei Mate 8, Kirin 950, Cortex-A72 Benchmarks

CPU And System Performance

Now that we have a better understanding of the hardware, it's time to see how it performs. In this section, we'll evaluate system-level performance by running a series of synthetic and real-world workloads, along with some browser-based Web tests. If you're interested in learning more about how these benchmarks work, what versions we use, or our testing methodology, please read about how we test mobile device system performance.

The Huawei Mate 8 scores well overall in Basemark OS II, held back only by its GPU performance. In the OpenGL ES 2.0-based Graphics test, the Galaxy S6's Mali-T760MP8 easily outperforms the Mate 8's Mali-T880MP4 GPU by 54 percent, which is a larger margin than we would expect. This test does perform alpha blending and various texture operations, so it's possible having only half as many ROPs limits throughput compared to the Galaxy S6.

The Mate 8 performs much better in the CPU-centric System and Web tests. Only Apple's iPhone 6s Plus, buoyed by its Twister CPU's higher instructions per cycle (IPC), performs better. The octa-core Kirin 950 does finish ahead of Qualcomm's Snapdragon 820 by a small 7 percent margin, but keep in mind the 820's quad-core CPU is at a disadvantage in the multi-threaded portions of the test, and the 820's Kryo CPUs also run at a lower peak frequency. The fact that Snapdragon 820 finishes so close to the Kirin 950 despite these limitations seems to suggest that the Kryo CPU core has higher IPC than the A72, although we cannot pinpoint by how much from this test.

Compared to the Exynos 7420 in the Galaxy S6, the Mate 8's Kirin 950 scores about 11 percent higher in both the System and Web tests, a lower-than-expected result that basically mirrors the average difference in CPU clock frequency. Perhaps the A72's architectural improvements will have a bigger impact in our other tests.

While the Memory test purports to measure the speed of the internal NAND storage, it turns into more of a memory test on high-end devices due to how the OS uses a RAM cache to buffer storage access. Because this test does not work as intended, we cannot draw any definitive conclusions here.

The Mate 8 achieves the best overall score we've seen in AndEBench, narrowly defeating the Galaxy S6 and outpacing the Moto X Pure Edition by 30 percent. Its advantage comes primarily from the CoreMark-HPC CPU performance test, where its Kirin 950 outperforms the Exynos 7420 by 29 percent and Snapdragon 820 by 34 percent in a mixture of single- and multi-threaded integer and floating-point workloads. The newer revision of the Snapdragon 810 falls to the bottom of the chart, below the Snapdragon 808 that has two fewer A57 cores, because of thermal throttling. This exemplifies the A57's problems at 20nm: poor sustained performance. Unable to keep its four A57s, let alone both the A53 and A57 islands, running concurrently, the OnePlus 2 shuts down the A57s and relies only on the lower-power, lower-performing A53 cores. The combination of the relatively power-hungry A57 core and the 20nm planar process node results in a chip with a high power density and poor thermal performance. Being able to use FinFET and the power-optimized A72 CPU helps the Kirin 950 avoid this problem.

In both the streaming memory bandwidth and the memory latency tests, the Mate 8 performs similarly to the Galaxy S6, which is not unexpected since both use LPDDR4 RAM. The latest Snapdragon SoCs use a memory controller optimized for serial access patterns, which gives the 820 a big advantage in the memory bandwidth test (the 808 uses lower-bandwidth LPDDR3 RAM and the 810 cannot fully utilize the bandwidth of LPDDR4). This works against them, however, in the memory latency test that measures the time to complete a series of random memory operations. The memory controllers in the Kirin 950 and Exynos 7420 are more equal opportunity, favoring neither serial nor random patterns.
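To make the difference between serial and random access concrete, here is a minimal Python sketch of the two access patterns. It is not AndEBench's actual implementation, which is not described here; it assumes a common approach in which bandwidth tests walk a buffer in address order and latency tests chase a randomly shuffled chain of indices, so that each load depends on the previous one. The function names are hypothetical, and Python timings of these loops would not reflect memory-controller behavior; the point is only the shape of the access pattern.

```python
import random

def single_cycle_permutation(n, seed=0):
    """Sattolo's algorithm: a random permutation forming one cycle of length n."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def chase(nxt, steps, start=0):
    """Random pattern: every load depends on the result of the previous load."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx

def stream_sum(data):
    """Serial pattern: touch every element once, in address order."""
    total = 0
    for value in data:
        total += value
    return total

n = 1 << 12
chain = single_cycle_permutation(n)
# After exactly n dependent hops, a single-cycle chain returns to its start.
print(chase(chain, n) == 0)
print(stream_sum(range(n)))
```

Because the chain forms a single cycle, the chase visits every slot before repeating, which is what prevents a prefetcher tuned for serial patterns from hiding the latency of each hop.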

The Platform test mimics real-world workloads, testing CPU, memory, and storage performance. This test seems to use more random memory access patterns, which is why we see the BLU Pure XL and Asus ZenFone 2 perform quite well. The Mate 8 performs better than both of these devices and 17 percent better than the Galaxy S6. Storage performance for the Mate 8 is typical for an eMMC solution, trailing the S6's UFS 2.0 NAND, so its advantage in this test comes primarily from the A72 CPU.

Turning to the synthetic Geekbench suite, we get a clearer picture of the IPC differences between the A72, A57, and Qualcomm's Kryo CPU cores. Looking at the single-core overall scores first, the A72 in the Mate 8 performs 16 percent better in the integer tests than the A57 in the Galaxy S6, which turns into a mild 6 percent increase after accounting for the A72's 9.5 percent clock speed advantage. The A72 increases its lead to 25 percent (15 percent normalized) in floating-point workloads.

The small improvements over the A57 in these tests are not enough to match Kryo, which outperforms the A72 by 12 percent (20 percent) in the integer tests and 32 percent (41 percent) in the floating-point tests. The numbers in parentheses are the normalized differences after accounting for the A72's 7 percent clock speed advantage.
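The clock normalization behind these figures can be sketched as below. This assumes the simplest model, in which performance scales linearly with clock frequency, so the per-clock gain is the raw score ratio divided by the clock ratio. The function name is hypothetical, and the inputs are the rounded percentages quoted in the text, so the results can differ by about a point from the figures above (which were presumably computed from unrounded scores).

```python
def normalized_gain(raw_gain_pct, clock_ratio):
    """Per-clock gain of chip X over chip Y, in percent.

    raw_gain_pct: X's measured lead over Y, in percent.
    clock_ratio:  X's clock frequency divided by Y's clock frequency.
    """
    return ((1 + raw_gain_pct / 100) / clock_ratio - 1) * 100

# A72 vs. A57: the A72 is clocked 9.5 percent higher, so clock_ratio = 1.095.
a72_int = normalized_gain(16, 1.095)
a72_fp = normalized_gain(25, 1.095)

# Kryo vs. A72: the A72 is clocked 7 percent higher, so Kryo's ratio is 1 / 1.07.
kryo_int = normalized_gain(12, 1 / 1.07)
kryo_fp = normalized_gain(32, 1 / 1.07)

print(f"A72 over A57:  int {a72_int:.1f}%, fp {a72_fp:.1f}%")
print(f"Kryo over A72: int {kryo_int:.1f}%, fp {kryo_fp:.1f}%")
```

Note that normalization shrinks the A72's lead over the A57 (the A72 has the higher clock) but widens Kryo's lead over the A72 (Kryo has the lower clock).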

Geekbench 3 Pro Integer Results *

| Test | Kirin 950 | Exynos 7420 | Snapdragon 820 |
|---|---|---|---|
| AES (single-core) | 848 | 694 (22.2%) | 796 (6.5%) |
| AES (multi-core) | 3328 | 3568 (-6.7%) | 2281 (45.9%) |
| Twofish (single-core) | 1944 | 1741 (11.7%) | 2128 (-8.6%) |
| Twofish (multi-core) | 8559 | 8026 (6.6%) | 6144 (39.3%) |
| SHA1 (single-core) | 7957 | 6433 (23.7%) | 9063 (-12.2%) |
| SHA1 (multi-core) | 28666 | 26286 (9.1%) | 27406 (4.6%) |
| SHA2 (single-core) | 2350 | 2118 (11.0%) | 3111 (-24.5%) |
| SHA2 (multi-core) | 12190 | 10113 (20.5%) | 8845 (37.8%) |
| BZip2 Compress (single-core) | 1671 | 1397 (19.6%) | 1808 (-7.6%) |
| BZip2 Compress (multi-core) | 6424 | 5693 (12.8%) | 5099 (26.0%) |
| BZip2 Decompress (single-core) | 1671 | 1579 (5.8%) | 1805 (-7.4%) |
| BZip2 Decompress (multi-core) | 8092 | 6697 (20.8%) | 4474 (80.9%) |
| JPEG Compress (single-core) | 1584 | 1441 (9.9%) | 1813 (-12.6%) |
| JPEG Compress (multi-core) | 7557 | 7314 (3.3%) | 5332 (41.7%) |
| JPEG Decompress (single-core) | 2077 | 1932 (7.5%) | 2504 (-17.1%) |
| JPEG Decompress (multi-core) | 8668 | 7552 (14.8%) | 6917 (25.3%) |
| Sobel (single-core) | 1699 | 1539 (10.4%) | 2404 (-29.3%) |
| Sobel (multi-core) | 7438 | 7313 (1.7%) | 6680 (11.3%) |
| Lua (single-core) | 1978 | 1408 (40.5%) | 1789 (10.6%) |
| Lua (multi-core) | 7947 | 6672 (19.1%) | 5139 (54.6%) |
| Dijkstra (single-core) | 1288 | 1073 (20.0%) | 1565 (-17.7%) |
| Dijkstra (multi-core) | 4799 | 4768 (0.7%) | 3923 (22.3%) |

* Values in parentheses are the percent advantage for Kirin 950
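For readers who want to reproduce the parenthesized values, the short sketch below shows the calculation using the AES rows of the table. The function name is hypothetical; the formula is simply the ratio of the Kirin 950 score to the other SoC's score, expressed as a percent difference.

```python
def kirin_advantage(kirin_score, other_score):
    """Percent by which the Kirin 950 score exceeds (or trails) another SoC's score."""
    return (kirin_score / other_score - 1) * 100

# AES (single-core): Kirin 950 = 848, Exynos 7420 = 694, Snapdragon 820 = 796.
vs_exynos_single = kirin_advantage(848, 694)
vs_snapdragon_single = kirin_advantage(848, 796)

# AES (multi-core): Kirin 950 = 3328, Exynos 7420 = 3568.
vs_exynos_multi = kirin_advantage(3328, 3568)

print(f"{vs_exynos_single:.1f}%")      # 22.2%
print(f"{vs_snapdragon_single:.1f}%")  # 6.5%
print(f"{vs_exynos_multi:.1f}%")       # -6.7%
```

A negative value means the Kirin 950 trails the other chip in that test.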

Taking a closer look at the individual Geekbench integer workloads reveals several instances where the A72 sees no performance gain relative to the A57 (the values in the table are not adjusted for the 9.5 percent clock speed difference), including BZip2 Decompress, Sobel, and the various JPEG operations. This is not a complete surprise, because the integer execution units are similar between the A72 and A57; the primary improvements are a new Radix-16 integer divider (doubling bandwidth over the A57) and a 1-cycle CRC unit. Both the Sobel and JPEG workloads rely heavily on multiplication and addition, so they see no improvement when running on the A72. The impact of the A72's improved branch predictor is also minimized when running benchmarks like Geekbench that primarily run math operations in a tight loop.

There are a few workloads, however, where the A72's architectural improvements make a noticeable difference: Lua, Dijkstra, AES, and SHA1. These tests rely heavily on lookup tables and data structures and seem to benefit from the A72's higher cache bandwidth and expanded zero-cycle forwarding.

Qualcomm's Kryo CPU clearly holds an IPC advantage over the A72 in integer operations. In our Snapdragon 820 Performance Preview, we determined that Kryo has a single integer multiply/divide unit just like the A57 and A72; however, we estimate it has only a 3-cycle latency compared to the longer 4-cycle latency of the ARM cores.

While single-core IPC is still the best indicator of smartphone application performance, multi-core throughput is becoming increasingly important, especially on Android. Comparing the multi-core integer performance of Kirin 950 to Exynos 7420 reveals an interesting pattern: the multi-core results are the exact opposite of the single-core results. Namely, tests that see a small gain when running on a single core see a larger gain when running on multiple cores and vice versa. Since the tests that see the biggest single-core gains take advantage of the A72's improved cache bandwidth, it's possible that the Kirin 950's reliance on the CCI-400 interconnect, instead of the newer CCI-500 that ARM recommends for the A72, is limiting the A72's performance. We'll need to see an example of a big.LITTLE processor using CCI-500 to know for sure, though.

Geekbench 3 Pro Floating-Point Results *

| Test | Kirin 950 | Exynos 7420 | Snapdragon 820 |
|---|---|---|---|
| BlackScholes (single-core) | 1893 | 1240 (52.7%) | 2345 (-19.3%) |
| BlackScholes (multi-core) | 8354 | 5663 (47.5%) | 6944 (20.3%) |
| Mandelbrot (single-core) | 2026 | 1178 (72.0%) | 1947 (4.1%) |
| Mandelbrot (multi-core) | 8973 | 6027 (48.9%) | 6051 (48.3%) |
| Sharpen Filter (single-core) | 1894 | 1599 (18.4%) | 2828 (-33.0%) |
| Sharpen Filter (multi-core) | 9061 | 6755 (34.1%) | 8537 (6.1%) |
| Blur Filter (single-core) | 1681 | 1440 (16.7%) | 3297 (-49.0%) |
| Blur Filter (multi-core) | 8025 | 6430 (24.8%) | 9207 (-12.8%) |
| SGEMM (single-core) | 1087 | 953 (14.1%) | 1440 (-24.5%) |
| SGEMM (multi-core) | 3688 | 2847 (29.5%) | 3136 (17.6%) |
| DGEMM (single-core) | 818 | 875 (-6.5%) | 1350 (-39.4%) |
| DGEMM (multi-core) | 3274 | 2383 (37.4%) | 3101 (5.6%) |
| SFFT (single-core) | 1402 | 1365 (2.7%) | 1901 (-26.2%) |
| SFFT (multi-core) | 6526 | 4815 (35.5%) | 5306 (23.0%) |
| DFFT (single-core) | 1416 | 1236 (14.6%) | 1870 (-24.3%) |
| DFFT (multi-core) | 5019 | 3827 (31.1%) | 5212 (-3.7%) |
| N-Body (single-core) | 2080 | 1406 (47.9%) | 2255 (-7.8%) |
| N-Body (multi-core) | 7791 | 4956 (57.2%) | 6128 (27.1%) |
| Ray Trace (single-core) | 2302 | 1660 (38.7%) | 2429 (-5.2%) |
| Ray Trace (multi-core) | 9366 | 6056 (54.7%) | 7059 (32.7%) |

* Values in parentheses are the percent advantage for Kirin 950

In our detailed discussion of the A72 architecture we highlighted the A72's new Floating-Point/Advanced SIMD units, whose shorter pipeline lengths reduce execution latencies by up to 40 percent. There's also a new Radix-16 FP divider, which does more work per cycle, doubling bandwidth. These changes give the A72 a significant boost in floating-point performance over the A57 as shown in the table above. Other than two outliers (DGEMM and SFFT), the A72 performs anywhere from 4 percent (SGEMM) to 57 percent (Mandelbrot) better than the A57 after accounting for the difference in clock speed. 

For its Kryo CPU, Qualcomm made floating-point performance a priority, increasing IPC with a wider architecture that looks similar to Apple's Typhoon core in last year's A8 SoC. Even with latency minimized, the A72's narrower architecture cannot keep pace. It manages to almost pull even in Mandelbrot, but gets less than half of Kryo's throughput executing Blur Filter, and Kryo finishes 41 percent ahead overall after removing the A72's frequency advantage.

ARM's big.LITTLE philosophy of using multiple, narrower cores allows the Kirin 950 to pull ahead of the Snapdragon 820 in most of the multi-core tests. 
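One way to see this effect in the floating-point table is to divide each chip's multi-core score by its single-core score, giving a rough multi-core scaling factor. The sketch below does this for two tests; the `scaling` helper is a hypothetical name, and the ratio is only an approximation because it ignores clock changes under multi-core load and the different core types in a big.LITTLE cluster.

```python
# (single-core, multi-core) Geekbench 3 scores copied from the floating-point table.
scores = {
    "Mandelbrot": {"Kirin 950": (2026, 8973), "Snapdragon 820": (1947, 6051)},
    "Blur Filter": {"Kirin 950": (1681, 8025), "Snapdragon 820": (3297, 9207)},
}

def scaling(single, multi):
    """Multi-core score as a multiple of the single-core score."""
    return multi / single

for test, chips in scores.items():
    for chip, (single, multi) in chips.items():
        print(f"{test:12s} {chip:15s} {scaling(single, multi):.2f}x")
```

The octa-core Kirin 950 scales by roughly 4.4x to 4.8x in these two tests, while the quad-core Snapdragon 820 scales by about 2.8x to 3.1x. In Blur Filter, Kryo's single-core lead is large enough that the 820 still wins the multi-core test despite the lower scaling factor.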

Unlike Geekbench, PCMark tests overall device performance by stressing the CPU, memory, and storage systems. Its lighter and more varied workloads are also affected by the device's CPU governor, just like other common apps, so it's indicative of real-world performance. 

The Mate 8 and its Kirin 950 SoC post the highest PCMark score of any device we've tested, outperforming the Moto X Pure Edition by 12 percent and the Galaxy S6 by 24 percent. Looking at the individual test scores, there are no obvious weak points for the Mate 8, which scores the highest in the Writing, Web Browsing, and Video Playback tests. It's only in the Photo Editing test, where image processing is supposed to occur on the GPU using the android.media.effect API, that it falls behind the Asus ZenFone 2 and Snapdragon 820 MDP.

The Mate 8 also does well in our JavaScript benchmarks, averaging 27 percent better than the Galaxy S6 in these three tests. Its performance is also very similar to the Snapdragon 820.


  • ak47jar3d
    Kirin 950 unfortunately disappoints again on the gpu end. The six core snapdragon 808 does better.
  • megamanxtreme
    The SD 820 does horrible on the battery life tests, as it doesn't show anywhere. :(
  • Nintendork
    4gpu cores are simply too low for their flagship SOC. 6gpu cores would've been better.
  • MobileEditor
    "The SD 820 does horrible on the battery life tests, as it doesn't show anywhere. :("

    The only SD 820 device we've tested so far is the Qualcomm MDP, which is the company's own development hardware. Because we had less than two hours to complete our testing, we were not able to collect any battery life data.

    - Matt Humrick, Mobile Editor, Tom's Hardware
  • Onus
    The non-removable battery is a dealbreaker. Only a fool pays $700 for a device that may only have a two year service life. As fast as this device-space changes, will it even be possible to get a battery replacement in two years?
  • bit_user
    17348494 said:
    ...
    Great analysis -- I've been waiting for this. Thanks!
    :)

    BTW, how did HiSilicon & Huawei get way out front of everyone else on the A72? That's a story I'd like to read.
  • bit_user
    17351759 said:
    As fast as this device-space changes, will it even be possible to get a battery replacement in two years?
    Sure, why not? A flagship phone will probably be sold on to another owner. It will still be fast enough in 2 years, and there would probably be enough of them to justify a small battery market.

    Plus, I've had no trouble getting replacement batteries for lots of discontinued things - laptops, cameras, MP3 players, to name a few.
  • bit_user
    17348857 said:
    Kirin 950 unfortunately disappoints again on the gpu end. The six core snapdragon 808 does better.
    But I see it mainly as a test of the A72. True, Kirin made a bad call on the GPU, but there will be plenty of A72-based SoC's that'll have similar CPU performance to this and possibly different GPUs, so I don't see the GPU performance as such a problem.

    Now, the only piece missing from this picture is Samsung's Exynos 8890, with their custom Mongoose core.
  • kenjitamura
    Does this company release the source code for their android products? Googling this company and open source shows that at least they seem to put some effort into contributing to open source software but couldn't find if that policy extends past their networking operations. If they do comply with the licenses and release source code I'll gladly consider their products but if they're a POS company like Mediatek then I want no part of it and hope they don't manage to penetrate the US market.

    Seriously, the single most important factor to buying an Android product is whether or not the company behind them actually complies with the license for using the software and it feels like most people don't even consider that. If the company stops maintaining the device a few months down the road, as is the case with 99% of android devices from china, then you'll find yourself SoL and left with no more security patches or platform upgrades.
  • zodiacfml
    Impressive as it beats Mediatek. But a few months from now, Samsung and Qualcomm will release theirs which have better GPUs and image processing.