Snapdragon S4 Pro: Krait And Adreno 320, Benchmarked

Nitty Gritty: CPU Core Performance, Per Clock

Performance Per Clock Cycle

Up until now, we've compared the performance of different SoCs in different devices. But we know from Qualcomm's APQ8064 spec sheet alone that the Krait CPU cores can be made to run at anywhere from 1.5 to 1.7 GHz. And we've seen Tegra 3 running at anywhere from 1.2 to 1.6 GHz.

So, the conclusions we draw about the devices in our lab can't automatically be applied to other tablets or smartphones, particularly if their SoCs operate at higher or lower frequencies. That's precisely why Sandra's Core Performance Per Clock index is valuable: it lets us drill down one level more from performance-per-core to performance-per-core at a constant clock rate.

Core Performance At A Given Clock Rate

| | OMAP 4430 | Tegra 3 (T30L) | S3 (APQ8060) | S4 Plus (MSM8960) | S4 Pro (APQ8064) |
|---|---|---|---|---|---|
| CPU | Two Cortex-A9 Cores @ 1 GHz | Four Cortex-A9 Cores @ 1.3 GHz | Two Scorpion Cores @ 1.2 GHz | Two Krait Cores @ 1.5 GHz | Four Krait Cores @ 1.5 GHz |
| Native Arithmetic (MOPS/MHz) | 0.23 | 0.21 | 0.15 | 0.20 | 0.20 |
| Native Multi-media (kPix/s/MHz) | 1.15 | 1.14 | 1.37 | 1.69 | 1.60 |
| Java Arithmetic (MOPS/MHz) | 0.045 | 0.043 | 0.035 | 0.057 | 0.051 |
| Memory (MB/s/MHz) | 0.301 | 0.19 | 0.53 | 1.10 | 0.75 |

Qualcomm's Krait processor architecture certainly does well, but it relies largely on its 1.5 GHz clock rate (at least in our mobile development platform) to exert its advantage over the OMAP 4430. In native arithmetic, TI's SoC actually does more work per cycle: 0.23 MOPS/MHz versus 0.20.

Of course, that's not to detract from what Qualcomm is achieving with its APQ8064. The company designed this SoC to run at 1.5 GHz or faster, while TI's chip operates between 1 and 1.2 GHz. So, even if the OMAP does achieve slightly better arithmetic performance per cycle, its specific implementation simply cannot catch the more modern Krait-based design.
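To see why clock rate decides the outcome, the per-clock Native Arithmetic figures from the table above can be multiplied back out by each chip's operating frequency. This is only a rough sketch: it assumes throughput scales linearly with clock rate, which real silicon only approximates, and the helper name `projected_mops` is ours, not part of Sandra.

```python
# Per-core Native Arithmetic index (MOPS/MHz), taken from the table above.
PER_CLOCK_MOPS = {
    "OMAP 4430": 0.23,
    "S4 Pro (APQ8064)": 0.20,
}


def projected_mops(per_clock: float, clock_mhz: float) -> float:
    """Estimate per-core throughput, ASSUMING linear scaling with clock rate."""
    return per_clock * clock_mhz


# Clock rates quoted in the article: OMAP 4430 at 1 to 1.2 GHz, APQ8064 at 1.5 GHz.
omap_low = projected_mops(PER_CLOCK_MOPS["OMAP 4430"], 1000)
omap_high = projected_mops(PER_CLOCK_MOPS["OMAP 4430"], 1200)
krait = projected_mops(PER_CLOCK_MOPS["S4 Pro (APQ8064)"], 1500)

for label, value in [
    ("OMAP 4430 @ 1.0 GHz", omap_low),
    ("OMAP 4430 @ 1.2 GHz", omap_high),
    ("S4 Pro (APQ8064) @ 1.5 GHz", krait),
]:
    print(f"{label}: {value:.0f} MOPS per core")
```

Under that assumption, the OMAP 4430 lands at roughly 230 to 276 MOPS per core across its clock range, short of the roughly 300 MOPS per core the APQ8064 reaches at 1.5 GHz.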

Core Performance At A Given Clock Rate: Arithmetic

| | OMAP 4430 | Tegra 3 (T30L) | S3 (APQ8060) | S4 Plus (MSM8960) | S4 Pro (APQ8064) |
|---|---|---|---|---|---|
| CPU | Two Cortex-A9 Cores @ 1 GHz | Four Cortex-A9 Cores @ 1.3 GHz | Two Scorpion Cores @ 1.2 GHz | Two Krait Cores @ 1.5 GHz | Four Krait Cores @ 1.5 GHz |
| Dhrystone (MIPS/MHz) | 2.34 | 2.21 | 1.92 | 2.55 | 2.64 |
| Whetstone Double (FLOPS/MHz) | 0.023 | 0.021 | 0.012 | 0.15 | 0.015 |
| Whetstone Float (FLOPS/MHz) | 0.031 | 0.029 | 0.016 | 0.16 | 0.022 |
| Whetstone Float/Double (FLOPS/MHz) | 0.026 | 0.025 | 0.011 | 0.15 | 0.018 |

Breaking out the Arithmetic sub-test, we can get inside the OMAP 4430's advantage, which was reflected in the first table. Although Qualcomm's APQ8064 achieves superior integer performance per cycle, its showing in the floating-point-based Whetstone metric is consistently worse than TI's. 

Again, though, these results are completely synthetic. The OMAP 4430 and APQ8064 will never be made to compete at the same clock rate. We're simply interested in where each architecture derives its strengths.
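The integer-versus-floating-point split is easier to see as a ratio. The short sketch below divides the APQ8064's per-clock Arithmetic results by the OMAP 4430's, using only the figures from the table above; the function name `relative_per_clock` is our own illustration.

```python
# Per-clock Arithmetic sub-test results, taken from the table above.
OMAP_4430 = {"Dhrystone": 2.34, "Whetstone Double": 0.023, "Whetstone Float": 0.031}
APQ8064 = {"Dhrystone": 2.64, "Whetstone Double": 0.015, "Whetstone Float": 0.022}


def relative_per_clock(chip: dict, baseline: dict) -> dict:
    """Return chip/baseline for every test; above 1.0 means the chip leads per clock."""
    return {test: chip[test] / baseline[test] for test in baseline}


ratios = relative_per_clock(APQ8064, OMAP_4430)
for test, ratio in ratios.items():
    print(f"{test}: {ratio:.2f}x the OMAP 4430 per clock")
```

The Dhrystone ratio comes out at about 1.13, while both Whetstone ratios fall well below 1.0 (about 0.65 and 0.71), which matches the integer lead and floating-point deficit described above.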

Core Performance At A Given Clock Rate: Multi-media

| | OMAP 4430 | Tegra 3 (T30L) | S3 (APQ8060) | S4 Plus (MSM8960) | S4 Pro (APQ8064) |
|---|---|---|---|---|---|
| CPU | Two Cortex-A9 Cores @ 1 GHz | Four Cortex-A9 Cores @ 1.3 GHz | Two Scorpion Cores @ 1.2 GHz | Two Krait Cores @ 1.5 GHz | Four Krait Cores @ 1.5 GHz |
| Multi-media Integer [NEON] (kPix/s/MHz) | 1.15 | 1.14 | 1.23 | 1.34 | 1.38 |
| Multi-media Float [NEON] (kPix/s/MHz) | 1.16 | 1.09 | 1.53 | 2.13 | 1.81 |
| Multi-media Double [FPU] (kPix/s/MHz) | 0.56 | 0.54 | 0.40 | 0.33 | 0.42 |
| Multi-media Float/Double (kPix/s/MHz) | 0.80 | 0.77 | 0.77 | 0.83 | 0.87 |

When we perform the same exercise with Sandra's Multi-media module, we see where the Krait architecture earns its advantage over Scorpion, first, and the OMAP's Cortex-A9 cores, second.

Particularly when it's able to exploit ARM's NEON SIMD instructions, which operate on 64- and 128-bit registers, Krait dominates handily. Only when Sandra drops back to measuring double-precision performance on the scalar Vector Floating Point unit (labeled FPU in the table) does Qualcomm's latest cede its lead. Not that you should be worried; NEON is far more powerful, making it a more likely instruction set to see in real-world apps.
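As a quick check on that reading, the sketch below picks the per-clock leader in each Multi-media sub-test from the table above. The data are the table's own figures; the helper name `leader` is just an illustration.

```python
# Column order matches the Multi-media table above.
CHIPS = [
    "OMAP 4430",
    "Tegra 3 (T30L)",
    "S3 (APQ8060)",
    "S4 Plus (MSM8960)",
    "S4 Pro (APQ8064)",
]

# Per-clock results in kPix/s/MHz, one list per sub-test.
MULTIMEDIA = {
    "Integer [NEON]": [1.15, 1.14, 1.23, 1.34, 1.38],
    "Float [NEON]": [1.16, 1.09, 1.53, 2.13, 1.81],
    "Double [FPU]": [0.56, 0.54, 0.40, 0.33, 0.42],
}


def leader(scores: list) -> str:
    """Return the name of the chip with the highest per-clock score."""
    name, _ = max(zip(CHIPS, scores), key=lambda pair: pair[1])
    return name


leaders = {test: leader(scores) for test, scores in MULTIMEDIA.items()}
for test, chip in leaders.items():
    print(f"{test}: {chip}")
```

Both NEON sub-tests go to a Krait-based chip (the S4 Pro for integer, the S4 Plus for float), while the double-precision FPU sub-test goes to the OMAP 4430.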

Multi-Core Efficiency


Many years ago, Intel and AMD stopped emphasizing fast single-core desktop processors and started designing CPUs with multiple cores per package. Software developers had to learn how to exploit those duplicated resources in order to extract some benefit from them.

The same thing is happening in the mobile space as multi-core SoCs facilitate parallelism in power-optimized architectures. As on the desktop, though, the performance of a dual- or quad-core chip doesn't scale linearly with core count. Synthetic measurements make it possible for us to get a best-case scaling number, but the real world is far less exact.

This is partly a result of how cores work together. Threaded apps involve data sharing between cores. If this isn't done efficiently, performance drops. Lots of bandwidth and low latency are important. TI's OMAP 4430 is able to move the most data per second between its cores, while Nvidia's Tegra 3 follows closely behind, instead standing out for its minimal latency.
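One common way to reason about why extra cores don't scale linearly is Amdahl's law, which caps speedup by the share of work that can run in parallel. The sketch below is our own illustration, not something Sandra reports: the 90 percent parallel fraction is a hypothetical value, and the model ignores the inter-core bandwidth and latency costs discussed above, so real results would typically be lower still.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: best-case speedup when only part of the work is parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# HYPOTHETICAL workload: 90% of the work parallelizes perfectly.
PARALLEL_FRACTION = 0.9

for cores in (1, 2, 4):
    speedup = amdahl_speedup(PARALLEL_FRACTION, cores)
    efficiency = speedup / cores  # 1.0 would be perfect linear scaling
    print(f"{cores} cores: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

Even with that generous parallel fraction, four cores deliver only about a 3.1x speedup in this model, and the gap to linear scaling widens as cores are added.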

  • blackmagnum
    Apple: Use this for the next iPad and all will be forgiven.
  • shotgunz
    Naw, more like give foxconn workers better salary, do something good with mountain of money they have, stop patent trolling, stop silly war with google/samsung and stop lieing. Then maybe Apple will be forgiven.
  • luciferano
    shotgunz wrote: Naw, more like give foxconn workers better salary, do something good with mountain of money they have, stop patent trolling, stop silly war with google/samsung and stop lieing. Then maybe Apple will be forgiven.
    Well, it'd be a start. I wouldn't go nearly as far as all is forgiven.
  • mayankleoboy1
    No comparison with Samsung exynos4 ?
  • mayankleoboy1
    AFAIUI, the PowerVR GPU in ipad3 is more of a brute force architecture. "Just throw more transistors" is its mantra. So its good in current workloads.
    The Adreno320 is more refined and optimised arch. Trying to get the most performance from least silicon area. It is still being refined. Hence, it will do well in future applications.
  • esrever
    This just makes the tegra 3 look bad.
  • Why the ST-ericsson 8500 isn't in this list either? too few phone out? Xperia S/P... looks it would rock some of concurrents.. omap..S3...
    Please be as exaustive as possible ;)
  • Error in the chart on second page.
    The cortex A15 DMIPS/MHz should read above the A9. Around 3.5 DMIPS/MHz from the rumblings.
  • These krait numbers aren't very impressive if you normalize for clockspeeds.
    In fact, they seem to suggest only a very small improvement over A9, if any at all.
  • mayne92
    Very nice review Andrew!