CPU And System Performance
Since this is our first look at Kryo, Qualcomm’s first custom-designed 64-bit CPU core, there are a number of outstanding questions. How does the dual-cluster, quad-core arrangement compare to Apple’s dual-core approach or the octa-core big.LITTLE processors common in smartphones today? Does Kryo use a narrower architecture similar to ARM’s A57/A72, or has Qualcomm gone wider and more complex like Apple’s A9? We’re also curious to see if Snapdragon 820 gets better memory performance than the 810.
Taking a look at the single- and multi-threaded CPU System test first, we see the four Kryo cores easily outpace the older Krait CPU cores and the all-Cortex-A53 Helio X10 SoC. Snapdragon 820 also surpasses the hexa-core Snapdragon 808 and octa-core 810 by at least 19%. Its margin of victory over the Exynos 7420, which makes better use of its four Cortex-A57 cores than the Snapdragon 810 does, is very small, however. Considering that the Exynos 7420 has an additional four A53 cores at its disposal, though, it appears that Kryo's IPC (instructions per cycle) is a bit better than the A57's. Apple's A9 is 42% faster than the 820 here, suggesting that Kryo is a narrower architecture with more in common with ARM's A57 than with Apple's Twister CPU.
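The "X% faster" figures on this page are ratios of benchmark scores. A minimal sketch of that arithmetic, assuming higher scores are better; the function names and the example scores are hypothetical and are not measured results. Note that the relationship is asymmetric: if chip A is 42% faster than chip B, chip B is only about 29.6% slower than chip A.

```python
def percent_faster(score_a: float, score_b: float) -> float:
    """How much faster A is than B, in percent (higher score = faster)."""
    if score_b <= 0:
        raise ValueError("score_b must be positive")
    return (score_a / score_b - 1.0) * 100.0


def percent_slower(score_a: float, score_b: float) -> float:
    """How much slower A is than B, in percent (higher score = faster)."""
    if score_b <= 0:
        raise ValueError("score_b must be positive")
    return (1.0 - score_a / score_b) * 100.0


# Hypothetical scores, chosen only to illustrate the arithmetic.
chip_a, chip_b = 1420.0, 1000.0
print(f"A is {percent_faster(chip_a, chip_b):.1f}% faster than B")  # 42.0
print(f"B is {percent_slower(chip_b, chip_a):.1f}% slower than A")  # 29.6
```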
In the OpenGL ES 2.0 based Graphics test, the 820’s Adreno 530 GPU is well ahead of most of its competitors. The 59% advantage over the 810’s Adreno 430, using similar clock frequencies, is impressive. Only the PowerVR GT7600 in Apple’s recently released A9 gets close, pulling within 9%.
Although the Memory test is meant to measure the speed of the internal NAND storage, on high-end devices it effectively becomes a RAM test, because the OS buffers storage access in a RAM cache. It's no surprise, then, to see the LPDDR4-based devices deliver higher throughput.
Snapdragon 820 manages to pull ahead overall in AndEBench, but that’s not as important as how it performs in the individual tests. In CoreMark-HPC, which measures both single- and multi-core CPU performance using a mixture of integer and floating-point workloads, Snapdragon 820’s Kryo cores perform well, but the Exynos 7420’s octa-core design gives it the edge when dealing with many threads.
The Snapdragon 810 struggles mightily in this synthetic test. Because of thermal throttling, it leaves the A57 cores idle and relies almost exclusively on the four A53 cores. With twice as many A53 cores, even the Helio X10 outperforms it. The 810 also falls behind the Snapdragon 808, which actually uses its two A57 cores, and attains only half the performance of the 820. While this result is encouraging for the 820, we cannot draw any conclusions here about its heat generation for the reasons stated on the previous page.
The memory bandwidth results for Snapdragon 820 are nothing short of amazing, improving on the performance of the 810's beleaguered memory controller by a factor of 2.6. The 820 is even 62% faster than the Exynos 7420, which also uses LPDDR4 memory. We've noticed in previous testing that the memory controllers in Qualcomm's latest SoCs, the Snapdragon 808 and 810, seem to be heavily optimized for serial memory access patterns like those used in the STREAM memory bandwidth tests included in AndEBench. Based on these results, it's safe to say that the 820 extends this trend. This is also why the 808, 810, and 820 do not perform as well in the memory latency test, which uses a randomly ordered data structure. In theory, this type of optimization should improve performance when working with large, contiguous chunks of data, like high-resolution graphics textures and frame buffers, or when processing pictures and video. Qualcomm is not the only company that has gone in this direction; Apple also uses serial-optimized memory controllers in its SoCs.
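To make the serial versus random distinction concrete, here is a minimal sketch of the two access patterns. It is an illustration under assumptions, not AndEBench's actual code (we don't know how its latency test is implemented), and all names are hypothetical. The serial chain steps to the next slot, which hardware prefetchers handle well; the random chain links a shuffled visiting order into one cycle, so every load depends on the previous one and lands at an unpredictable address. In CPython, interpreter overhead hides most of the cache and DRAM effect, so a real latency test would be written in C with a working set much larger than the caches.

```python
import random


def sequential_chain(n: int) -> list[int]:
    """next[i] = i + 1 (wrapping): a serial, prefetch-friendly order."""
    return [(i + 1) % n for i in range(n)]


def random_chain(n: int, seed: int = 0) -> list[int]:
    """Link a shuffled visiting order into one cycle covering all n slots,
    so successive accesses land at unpredictable positions."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    nxt = [0] * n
    for k in range(n):
        nxt[order[k]] = order[(k + 1) % n]
    return nxt


def chase(nxt: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain; each index comes from the previous load, so the
    accesses cannot overlap (the basis of a pointer-chasing latency test)."""
    i = start
    for _ in range(steps):
        i = nxt[i]
    return i


n = 1 << 16
for name, chain in (("serial", sequential_chain(n)), ("random", random_chain(n))):
    # One full lap of n steps returns to the starting slot in both cases.
    print(name, chase(chain, n) == 0)
```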
Once again we see the 820’s Adreno 530 GPU lead the pack in the 3D graphics test. This test is a bit more demanding than the older Basemark OS II test, but the Adreno 530 still performs 39% faster than the Adreno 430.
While we're including the results from the Storage and Platform tests, they are not really relevant here, since we're not concerned with NAND performance. Note that the Snapdragon 820 MDP's storage is encrypted, unlike that of all the other tested devices. This could explain the MDP's low storage performance (we were unable to determine whether encryption was being accelerated in hardware), or the MDP may simply use slow NAND. The 820's score in the Platform test is also affected, since that workload includes reading and writing files to internal storage.
Looking at single-core performance in Geekbench confirms what we've already seen in our other tests: Qualcomm's Kryo CPU core is slower than Apple's Twister CPU but faster than ARM's A57. After normalizing for clock frequency, Kryo's integer performance is 27% higher than the A57's, while Apple's Twister ends up 38% faster than Kryo. Based on what we know about the Twister and A57 architectures, and after some back-of-the-envelope calculations, it looks like Kryo has a single integer multiply/divide unit with a 3-cycle latency. This is very similar to the A57 and A72, which also have a single integer multiply/divide unit, but with a longer 4-cycle latency.
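The clock normalization and the latency estimate above both reduce to simple arithmetic. A minimal sketch, assuming scores scale linearly with clock frequency (which ignores memory-bound portions of a workload); the function names and inputs are hypothetical, and the 3-cycle Kryo latency is our estimate rather than a published specification. For a chain of dependent multiplies on a single unit, each operation waits for the previous result, so total cycles are simply the operation count times the latency.

```python
def per_ghz(score: float, ghz: float) -> float:
    """Score per GHz, a rough proxy for per-clock throughput (IPC)."""
    return score / ghz


def normalized_advantage(score_a: float, ghz_a: float,
                         score_b: float, ghz_b: float) -> float:
    """Percent advantage of A over B after dividing each score by its clock."""
    return (per_ghz(score_a, ghz_a) / per_ghz(score_b, ghz_b) - 1.0) * 100.0


def dependent_chain_cycles(num_ops: int, latency: int) -> int:
    """Cycles for back-to-back dependent multiplies on a single unit:
    each op waits for the previous result, so latencies add up."""
    return num_ops * latency


# Hypothetical inputs: equal scores, but A runs at a lower clock.
print(normalized_advantage(2000.0, 2.0, 2000.0, 2.5))  # 25.0
# 100 dependent multiplies at an estimated 3-cycle vs 4-cycle latency.
print(dependent_chain_cycles(100, 3), dependent_chain_cycles(100, 4))  # 300 400
```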
In Geekbench’s single-core floating-point workloads, Kryo is 61% faster than the A57, which has two floating-point/NEON units, the same as A72. The A72 does see a 25-40% latency reduction for floating-point operations, among other tweaks, which should help close the gap, but we expect Kryo to maintain a slim lead over the A72 in floating-point performance. Apple’s Twister core is only 39% faster than Kryo in this test.
Geekbench 3 Pro Memory Bandwidth (Single-Core): STREAM Copy, STREAM Scale, STREAM Add, and STREAM Triad (GB/s)
Once again, Snapdragon 820 shows impressive memory bandwidth in the STREAM test. Whatever memory controller issues existed in Snapdragon 810 seem to have been remedied in the 820.
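For reference, the four STREAM kernels are simple array loops. The sketch below follows the kernel definitions and byte counting of McCalpin's reference STREAM benchmark; Geekbench 3's own implementation may differ, and the function names are hypothetical. Bandwidth is the counted bytes divided by elapsed time, assuming 8-byte doubles. Because Python lists hold boxed float objects, the figure this prints reflects interpreter speed rather than memory bandwidth; it only shows how the number is derived.

```python
import time

# Bytes counted per element, assuming 8-byte doubles. STREAM counts each
# array read and written once and ignores write-allocate traffic.
BYTES_PER_ELEMENT = {"copy": 16, "scale": 16, "add": 24, "triad": 24}


def stream_copy(a, c):        # c[i] = a[i]
    for i in range(len(a)):
        c[i] = a[i]


def stream_scale(b, c, q):    # b[i] = q * c[i]
    for i in range(len(c)):
        b[i] = q * c[i]


def stream_add(a, b, c):      # c[i] = a[i] + b[i]
    for i in range(len(a)):
        c[i] = a[i] + b[i]


def stream_triad(a, b, c, q): # a[i] = b[i] + q * c[i]
    for i in range(len(b)):
        a[i] = b[i] + q * c[i]


def gb_per_s(kernel: str, n: int, seconds: float) -> float:
    """Bandwidth as STREAM reports it: counted bytes / elapsed time."""
    return BYTES_PER_ELEMENT[kernel] * n / seconds / 1e9


n, q = 1_000_000, 3.0
a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
start = time.perf_counter()
stream_triad(a, b, c, q)
print(f"triad: {gb_per_s('triad', n, time.perf_counter() - start):.2f} GB/s")
```

As a sanity check on the formula, a Triad pass over 10 million elements completing in 10 ms corresponds to about 24 GB/s.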
Moving on to the multi-core test, we see the 820 jump ahead of the A9 in floating-point performance, thanks to its two extra CPU cores. It does fall behind the octa-core SoCs in integer performance, but there really are not many real-world apps that use eight cores at once, so this is unlikely to adversely affect the user experience.
PCMark runs several realistic workloads and is very sensitive to CPU governor behavior. For this reason, its results tend to be more device dependent. While it’s encouraging to see Snapdragon 820 top the chart in overall PCMark performance, just keep in mind that these results will likely change a bit in shipping hardware.
Looking at the individual test scores, the 820 does well with video playback but does not really stand out in the Web Browsing or Writing tests. Where it really shines, though, is in the Photo Editing test. Most of the image processing during this test is supposed to occur on the GPU using the android.media.effect API. In our review of the Asus ZenFone 2, which also performs well in this test, we were able to confirm that it does use the GPU as intended. All of the other devices we've tested, however, run the Photo Editing test on the CPU. While we were unable to confirm this in the short time we had for testing, the Snapdragon 820 seems to utilize the compute power of the GPU like the ZenFone 2, serving as an example of the benefits of heterogeneous computing.
While it's unfair to compare the scores of the A9 to the other SoCs, we're including the results to show the role software plays in browser benchmarks. Since Apple controls the design of both its hardware and its operating system, it can heavily optimize its Safari web browser, resulting in substantial performance gains.
The older version of the Opera browser we use for testing Android devices is not nearly as optimized, resulting in lower scores across the board. Still, it's useful for making hardware comparisons among Android devices. In all three of the browser benchmarks, Snapdragon 820 ekes out a slim lead.
Based on these initial tests, Qualcomm's new Kryo CPU performs better than ARM's Cortex-A57 but worse than Apple's Twister. Integer performance is a little better than the A57/A72, which implies that Kryo uses the same number of integer units as ARM's best cores, but with better latency. Kryo pulls further ahead of the A57 in floating-point performance, and while it's more difficult to ascertain architectural differences in floating-point units from system-level tests, our data suggests Kryo looks more like the Typhoon core in Apple's A8 than the A57.