GPU And Gaming Performance

This section explores GPU performance with several synthetic and real-world game engine tests. To learn more about how these benchmarks work, what versions we use, or our testing methodology, please read our article about how we test mobile device GPU performance.

HiSilicon's Kirin 950 uses an ARM Mali-T880MP4 GPU. ARM has not fully disclosed the architectural differences between the T880 and the model it replaces, the T760, but we do know that the T880 has three ALUs per core, versus two per core for the T760.

SoC                       Kirin 950          Exynos 7420        Apple A8          Apple A9
GPU                       ARM Mali-T880MP4   ARM Mali-T760MP8   PowerVR GX6450    PowerVR GT7600
Number of "cores"         4                  8                  4                 6
FP32 ALUs per "core"      3                  2                  32                32
FP16 ALUs per "core"      ✗                  ✗                  64                64
Total FP32 FLOPS/cycle    120                160                256               384
Total FP16 FLOPS/cycle    216                288                512               768
Pixels/cycle              4                  8                  8                 12
Texels/cycle              4                  8                  8                 12

The table above summarizes the relevant current and previous generation GPUs from ARM and Imagination Technologies. Differences in architecture and nomenclature make it difficult to compare GPU hardware directly, so our analysis focuses more on measured performance. Qualcomm's Adreno GPUs are not included, because the company does not disclose details about their architecture.

Mali's Midgard differs from other GPU architectures in several ways. For starters, Midgard's vector SIMD (single instruction, multiple data) units rely exclusively on instruction level parallelism (ILP) to keep their ALUs full. Competing GPU architectures are not vector based and instead use a combination of ILP and thread level parallelism (TLP). Both approaches have pros and cons, and which one performs better ultimately depends on the workload.
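To make the ILP versus TLP distinction concrete, here is a toy model, not ARM's or anyone else's actual scheduler. It assumes a hypothetical vec4 ALU that issues one thread's instruction at a time, so idle lanes are wasted unless independent operations can be packed together (standing in for ILP extracted by a compiler), and a hypothetical 16-lane scalar SIMT ALU where each lane runs a different thread (TLP). Both widths are illustrative assumptions, and the packing step simply assumes all operations are independent.

```python
# Toy model only: lane utilization on a vector ALU (ILP-dependent) versus a
# scalar SIMT ALU (TLP-dependent). Widths are illustrative assumptions.

VEC_WIDTH = 4     # assumption: vec4 FP32 ALU, one thread's instruction per issue
SIMT_WIDTH = 16   # assumption: hypothetical 16-lane scalar SIMT ALU, one thread per lane


def vector_utilization_unpacked(op_widths):
    """Each operation (1-4 components) issues alone, so unused lanes sit idle."""
    return sum(op_widths) / (VEC_WIDTH * len(op_widths))


def vector_utilization_packed(op_widths):
    """Pack independent operations into shared vec4 issue slots
    (first-fit decreasing), standing in for compiler-extracted ILP."""
    slots = []  # free lanes remaining in each issued instruction
    for width in sorted(op_widths, reverse=True):
        for i, free in enumerate(slots):
            if free >= width:
                slots[i] -= width
                break
        else:
            slots.append(VEC_WIDTH - width)  # no room anywhere: issue a new instruction
    return sum(op_widths) / (VEC_WIDTH * len(slots))


def simt_utilization(active_threads):
    """Scalar SIMT: each lane is a separate thread, so utilization depends on
    how many threads are available, not on the operations' vector width."""
    return min(active_threads, SIMT_WIDTH) / SIMT_WIDTH


if __name__ == "__main__":
    ops = [3, 1, 3, 1]  # e.g. two vec3 operations and two scalar operations
    print(vector_utilization_unpacked(ops))              # 0.5
    print(vector_utilization_packed(ops))                # 1.0
    print(simt_utilization(16), simt_utilization(8))     # 1.0 0.5
```

The vector design reaches full utilization only when enough independent work exists within a single thread; the scalar design does so only when enough threads are resident. That trade-off is why neither approach wins across every workload.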

The table highlights another difference between the Mali Midgard and PowerVR Rogue architectures: IPC. Midgard performs fewer operations per cycle than its peers but compensates with a higher maximum clock speed. The Kirin 950 in the Mate 8, for example, runs its GPU at up to 900MHz, yielding 108 GFLOPS of FP32 throughput. This trails the A8's 115 GFLOPS by 6 percent, the Exynos 7420's 124 GFLOPS by 15 percent, and the A9's more than 173 GFLOPS (its exact GPU frequency is unknown).
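These GFLOPS figures follow from multiplying each GPU's FP32 FLOPS per cycle (from the table) by its clock speed. The sketch below reproduces the Kirin 950's 108 GFLOPS from its 900MHz clock and checks the percentage gaps. Only the Kirin 950's clock is stated here; for the other chips, the helper `implied_clock_mhz` (a name chosen for this example) works backwards from the quoted GFLOPS, so those clocks are derived estimates, not vendor-published values.

```python
# Peak FP32 throughput = FP32 FLOPS per clock x clock speed.
FP32_FLOPS_PER_CYCLE = {  # from the comparison table
    "Kirin 950 (Mali-T880MP4)": 120,
    "Exynos 7420 (Mali-T760MP8)": 160,
    "Apple A8 (PowerVR GX6450)": 256,
    "Apple A9 (PowerVR GT7600)": 384,
}


def peak_gflops(flops_per_cycle, clock_mhz):
    """Peak GFLOPS from per-clock throughput and a clock speed in MHz."""
    return flops_per_cycle * clock_mhz / 1000


def implied_clock_mhz(gflops, flops_per_cycle):
    """Clock speed (MHz) that a quoted GFLOPS figure implies (a derived estimate)."""
    return gflops * 1000 / flops_per_cycle


def percent_ahead(value, baseline):
    """How far `value` leads `baseline`, in percent."""
    return (value / baseline - 1) * 100


kirin_950 = peak_gflops(FP32_FLOPS_PER_CYCLE["Kirin 950 (Mali-T880MP4)"], 900)
print(kirin_950)                             # 108.0
print(round(percent_ahead(115, kirin_950)))  # 6   (Apple A8)
print(round(percent_ahead(124, kirin_950)))  # 15  (Exynos 7420)
print(implied_clock_mhz(124, FP32_FLOPS_PER_CYCLE["Exynos 7420 (Mali-T760MP8)"]))  # 775.0
```

By the same arithmetic, `implied_clock_mhz(173, 384)` comes out to roughly 450MHz, so the A9's "more than 173 GFLOPS" corresponds to a GPU clock somewhat above that.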

Here we see the Mate 8's Kirin 950 performing very much like a mid-range device rather than a flagship, its quad-core GPU besting only the PowerVR G6200 in the MediaTek Helio X10. The Mate 8 actually does pretty well in the first graphics test that focuses on vertex operations (with minimal pixel processing), essentially tying with the Galaxy S6 and performing 12 percent better than the Moto X Pure Edition's Adreno 418 GPU. It's in the second graphics test, which focuses heavily on pixel operations, where it falls behind.

The Mate 8's Physics score is surprisingly low. Since this tests CPU and memory performance, the Mate 8 should score the same as or a little better than the Galaxy S6. Instead, the S6, with its A57 CPU cores, scores 31 percent better than the Mate 8's higher clocked A72 cores. Unfortunately, we had to return our review unit before we could investigate this further. Without a closer look (and a device running final software), we do not want to make too much of this result, but it warrants further investigation.

In Basemark X, which runs on the Unity 4.2.2 game engine and uses OpenGL ES 2.0, the Galaxy S6 is 39 percent faster overall than the Mate 8. Most of its advantage comes from Dunes, a test that uses a lot of triangles (more than any of our other benchmarks), where it's 73 percent faster. HiSilicon's decision to use a quad-core GPU is definitely holding it back in this test; however, like most benchmarks, Basemark X puts a more severe load on the GPU than actual games do. When testing the LG G4 and its Snapdragon 808 SoC, for example, we found it played a number of modern games just fine. Considering that the Mate 8 outperforms the LG G4 in both the offscreen and onscreen tests, and that typical games use nowhere near as many triangles as Basemark X, the Mate 8's lower triangle throughput should not be a severe limitation.

The Mate 8's 1080p display also works in its favor. By avoiding the greater rendering overhead of QHD, its onscreen Hangar results are actually better than the Galaxy S6's.
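Onscreen tests render at each device's native resolution, so the per-frame workload scales with the panel's pixel count. The sketch below compares the Mate 8's 1920x1080 panel with the Galaxy S6's 2560x1440 QHD panel (the S6's resolution is not stated in this section). It assumes a purely pixel-bound workload, which real game engines only approximate.

```python
# Per-frame rendering load scales (roughly) with the number of pixels drawn.
MATE_8 = (1920, 1080)     # 1080p panel
GALAXY_S6 = (2560, 1440)  # QHD panel


def pixel_count(resolution):
    width, height = resolution
    return width * height


def pixel_ratio(a, b):
    """Pixels rendered per frame at resolution `a`, relative to resolution `b`."""
    return pixel_count(a) / pixel_count(b)


print(pixel_count(MATE_8), pixel_count(GALAXY_S6))  # 2073600 3686400
print(pixel_ratio(MATE_8, GALAXY_S6))               # 0.5625
print(round(pixel_ratio(GALAXY_S6, MATE_8), 2))     # 1.78
```

Each onscreen frame on the S6 therefore involves about 1.78 times as many pixels as on the Mate 8, which is enough to erase the S6's offscreen lead in pixel-heavy tests.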

The Mali-T880MP4 GPU in the Kirin 950 really starts to struggle in the high quality test. With twice as many cores (giving it twice as much triangle and texturing throughput), the Mali-T760MP8 in the Exynos 7420 extends its lead to 70 percent overall (65 percent Dunes and 75 percent Hangar) when running offscreen.

GFXBench Manhattan runs on an OpenGL ES 3.0 based game engine that uses deferred rendering for its lighting effects. The PowerVR GT7600 in Apple's A9 and Adreno 530 in Qualcomm's Snapdragon 820 flex their ALU muscles, powering through the test's pixel operations. The Exynos 7420 in the Galaxy S6 also outperforms the Mate 8 by 35 percent. The Kirin 950 continues to outshine the Snapdragon 808 devices, though.

In the onscreen test, the Mate 8 pulls ahead of the Galaxy S6, since its 1080p display has only about 56 percent as many pixels as the S6's 2560x1440 QHD panel.

Despite using an older OpenGL ES 2.0 based engine, we see essentially the same results in T-Rex. The Galaxy S6 is only 27 percent faster in the offscreen test than the Mate 8, and the Kirin 950 still holds a slim lead over the Snapdragon 808.

The limited number of ROPs in its quad-core GPU really holds the Kirin 950 and Mate 8 back in the Alpha Blending test, allowing the Snapdragon 808 to jump ahead of it. There's something amiss with the Galaxy S6's Alpha Blending results: The offscreen values are lower than the onscreen values, which is the opposite of what we should see. This appears to be a driver issue specific to the Galaxy S6, so we'll just ignore its results in this test.

The table at the top of this page documents the comparatively low FP32 ALU throughput of the Mali-T880MP4 GPU in the Kirin 950, so it's no surprise to see it land near the bottom of the ALU performance chart. The Galaxy S6 scores 30 percent better than the Mate 8, and even the more budget-friendly ZenFone 2 performs better here. Qualcomm has worked steadily over the past several generations to boost the ALU performance of its Adreno GPUs, which is evident in this test. In the ALU onscreen test, nearly all of the devices are capped at the 60fps vsync limit.

While the T880 gains one ALU per core over the T760, it gains no texture units; like the T760, it still has only one per core. With twice as many texture units, the Galaxy S6 is 59 percent faster than the Mate 8 in the Fill test.
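A rough way to see why core count matters so much here is to compute each GPU's peak texel rate: texels per clock (from the table) times clock speed. The Kirin 950's 900MHz clock comes from the text; the Exynos 7420 clock used below is an assumption, derived from its quoted 124 GFLOPS divided by 160 FLOPS per cycle rather than taken from a published specification. This is a peak-rate estimate, not a prediction of the Fill test result.

```python
# Peak fill rate = units per clock x clock speed.
KIRIN_950_MHZ = 900                 # stated in the text
EXYNOS_7420_MHZ = 124 * 1000 / 160  # 775.0 MHz, derived from 124 GFLOPS (assumption)


def peak_rate_giga(units_per_cycle, clock_mhz):
    """Peak rate in billions of texels (or pixels) per second."""
    return units_per_cycle * clock_mhz / 1000


kirin_texels = peak_rate_giga(4, KIRIN_950_MHZ)     # 4 cores x 1 texture unit each
exynos_texels = peak_rate_giga(8, EXYNOS_7420_MHZ)  # 8 cores x 1 texture unit each
theoretical_lead = (exynos_texels / kirin_texels - 1) * 100

print(round(kirin_texels, 2), round(exynos_texels, 2))  # 3.6 6.2
print(round(theoretical_lead))                          # 72
```

Because the table lists the same per-clock values for pixels as for texels, the same roughly 72 percent peak advantage applies to pixel output, which is consistent with the Kirin 950's weak showing in the ROP-bound Alpha Blending test. The measured 59 percent Fill gap sits below this theoretical ceiling.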

The Driver Overhead test measures CPU performance and driver efficiency by making a large number of draw calls. It's curious, then, to see the Mate 8 fall to the bottom of the chart, considering its strong CPU performance. The Mate 8 we're testing is still running prerelease software, so perhaps this is a driver issue that has yet to be fixed.

