Nitty Gritty: CPU Core Performance, Per Clock

Snapdragon S4 Pro: Krait And Adreno 320, Benchmarked

Performance Per Clock Cycle

Up until now, we've compared the performance of different SoCs in different devices. But Qualcomm's APQ8064 spec sheet alone tells us that the Krait CPU cores can run anywhere from 1.5 to 1.7 GHz, and we've seen Tegra 3 running from 1.2 to 1.6 GHz.

So, the conclusions we draw about the devices in our lab can't automatically be applied to other tablets or smartphones, particularly if their SoCs operate at higher or lower frequencies. That's precisely why Sandra's Core Performance Per Clock index is valuable: it lets us drill down one level more from performance-per-core to performance-per-core at a constant clock rate.
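To make the idea concrete, here is a minimal sketch of that kind of normalization: an aggregate score divided by core count and clock rate. How Sandra actually derives its index is not described here, so the formula, the function name `per_clock_per_core`, and the input score are all assumptions for illustration.

```python
def per_clock_per_core(score, cores, clock_mhz):
    """Normalize an aggregate score to units of score per core per MHz.

    Hypothetical helper: assumes the score scales linearly with both
    core count and clock rate, which real hardware only approximates.
    """
    if cores <= 0 or clock_mhz <= 0:
        raise ValueError("cores and clock_mhz must be positive")
    return score / (cores * clock_mhz)


# Made-up aggregate score for a four-core part at 1.5 GHz (not a measurement).
example = per_clock_per_core(score=1200.0, cores=4, clock_mhz=1500.0)
print(f"{example:.2f} per core per MHz")
```

Normalizing this way strips out the two variables that differ most between devices, which is what lets the tables below compare architectures rather than products.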

Core Performance At A Given Clock Rate

| | OMAP 4430 | Tegra 3 (T30L) | S3 (APQ8060) | S4 Plus (MSM8960) | S4 Pro (APQ8064) |
|---|---|---|---|---|---|
| CPU | Two Cortex-A9 Cores @ 1 GHz | Four Cortex-A9 Cores @ 1.3 GHz | Two Scorpion Cores @ 1.2 GHz | Two Krait Cores @ 1.5 GHz | Four Krait Cores @ 1.5 GHz |
| Native Arithmetic (MOPS/MHz) | 0.23 | 0.21 | 0.15 | 0.20 | 0.20 |
| Native Multi-media (kPix/s/MHz) | 1.15 | 1.14 | 1.37 | 1.69 | 1.60 |
| Java Arithmetic (MOPS/MHz) | 0.045 | 0.043 | 0.035 | 0.057 | 0.051 |
| Memory (MB/s/MHz) | 0.301 | 0.19 | 0.53 | 1.10 | 0.75 |


Qualcomm's Krait processor architecture certainly does well, but in native arithmetic it relies largely on its 1.5 GHz clock rate (at least in our mobile development platform) to exert its advantage over the OMAP 4430. Per cycle, TI's SoC actually has the edge there: 0.23 versus 0.20 MOPS/MHz.

Of course, that's not to detract from what Qualcomm is achieving with its APQ8064. The company designed this SoC to run at 1.5 GHz or higher, while TI's chip operates between 1 and 1.2 GHz. So, even if the OMAP does achieve slightly better arithmetic performance per cycle, its specific implementation simply cannot catch the more modern Krait-based design.
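A quick back-of-the-envelope check shows why the clock gap outweighs the per-cycle gap. The sketch below multiplies the Native Arithmetic per-clock figures from the table by each chip's clock rate. It assumes throughput scales linearly with frequency, which is a simplification, and `projected_mops` is a hypothetical helper name.

```python
# Native Arithmetic per-clock results from the table above (MOPS/MHz).
OMAP_4430_PER_CLOCK = 0.23
S4_PRO_PER_CLOCK = 0.20


def projected_mops(per_clock, clock_mhz):
    """Project throughput at a given clock, assuming linear scaling (an assumption)."""
    return per_clock * clock_mhz


# OMAP 4430 at the top of its 1 to 1.2 GHz range versus Krait at 1.5 GHz.
omap_best = projected_mops(OMAP_4430_PER_CLOCK, 1200)
krait = projected_mops(S4_PRO_PER_CLOCK, 1500)

print(f"OMAP 4430 @ 1.2 GHz: {omap_best:.0f} MOPS")
print(f"S4 Pro    @ 1.5 GHz: {krait:.0f} MOPS")
```

Under that assumption the OMAP lands at roughly 276 MOPS against roughly 300 for Krait, so a 15% per-cycle advantage is not enough to offset a 25% clock deficit.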

Core Performance At A Given Clock Rate: Arithmetic

| | OMAP 4430 | Tegra 3 (T30L) | S3 (APQ8060) | S4 Plus (MSM8960) | S4 Pro (APQ8064) |
|---|---|---|---|---|---|
| CPU | Two Cortex-A9 Cores @ 1 GHz | Four Cortex-A9 Cores @ 1.3 GHz | Two Scorpion Cores @ 1.2 GHz | Two Krait Cores @ 1.5 GHz | Four Krait Cores @ 1.5 GHz |
| Dhrystone (MIPS/MHz) | 2.34 | 2.21 | 1.92 | 2.55 | 2.64 |
| Whetstone Double (FLOPS/MHz) | 0.023 | 0.021 | 0.012 | 0.15 | 0.015 |
| Whetstone Float (FLOPS/MHz) | 0.031 | 0.029 | 0.016 | 0.16 | 0.022 |
| Whetstone Float/Double (FLOPS/MHz) | 0.026 | 0.025 | 0.011 | 0.15 | 0.018 |


Breaking out the Arithmetic sub-test, we can see where the OMAP 4430's advantage in the first table comes from. Although Qualcomm's APQ8064 achieves superior integer performance per cycle, its showing in the floating-point-based Whetstone metric is consistently worse than TI's.

Again, though, these results are completely synthetic. The OMAP 4430 and APQ8064 will never be made to compete at the same clock rate. We're simply interested in where each architecture derives its strengths.

Core Performance At A Given Clock Rate: Multi-media

| | OMAP 4430 | Tegra 3 (T30L) | S3 (APQ8060) | S4 Plus (MSM8960) | S4 Pro (APQ8064) |
|---|---|---|---|---|---|
| CPU | Two Cortex-A9 Cores @ 1 GHz | Four Cortex-A9 Cores @ 1.3 GHz | Two Scorpion Cores @ 1.2 GHz | Two Krait Cores @ 1.5 GHz | Four Krait Cores @ 1.5 GHz |
| Multi-media Integer [NEON] (kPix/s/MHz) | 1.15 | 1.14 | 1.23 | 1.34 | 1.38 |
| Multi-media Float [NEON] (kPix/s/MHz) | 1.16 | 1.09 | 1.53 | 2.13 | 1.81 |
| Multi-media Double [FPU] (kPix/s/MHz) | 0.56 | 0.54 | 0.40 | 0.33 | 0.42 |
| Multi-media Float/Double (kPix/s/MHz) | 0.80 | 0.77 | 0.77 | 0.83 | 0.87 |


When we perform the same exercise with Sandra's Multi-media module, we see where the Krait architecture earns its advantage over Scorpion, first, and the OMAP's Cortex-A9 cores, second.

Particularly when it's able to exploit ARM's 64- and 128-bit NEON instructions, Krait dominates handily. Only when Sandra drops back to measuring double-precision performance on the Vector Floating Point unit does Qualcomm's latest cede its lead. Not that you should be worried; NEON is far more powerful, making it a more likely instruction set to see in real-world apps.
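One way to put a number on that gap is to divide each SoC's Float [NEON] result by its Double [FPU] result. The sketch below does this with the figures from the Multi-media table. Note that the ratio mixes two effects (NEON versus the scalar FPU, and single versus double precision), so treat it as a rough indicator; the dictionary layout and function name are illustrative only.

```python
# Per-clock Multi-media results copied from the table above (kPix/s/MHz).
MULTIMEDIA = {
    "OMAP 4430": {"float_neon": 1.16, "double_fpu": 0.56},
    "Tegra 3 (T30L)": {"float_neon": 1.09, "double_fpu": 0.54},
    "S3 (APQ8060)": {"float_neon": 1.53, "double_fpu": 0.40},
    "S4 Plus (MSM8960)": {"float_neon": 2.13, "double_fpu": 0.33},
    "S4 Pro (APQ8064)": {"float_neon": 1.81, "double_fpu": 0.42},
}


def neon_to_fpu_ratio(entry):
    """Return how many times faster the NEON float path is than the FPU double path."""
    return entry["float_neon"] / entry["double_fpu"]


ratios = {soc: neon_to_fpu_ratio(values) for soc, values in MULTIMEDIA.items()}

# List the SoCs from the largest NEON-over-FPU gap to the smallest.
for soc, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{soc:<20} {ratio:.2f}x")
```

The two Cortex-A9 parts come out at roughly 2x, while both Krait parts exceed 4x, which matches the observation that Krait's lead depends on code actually using NEON.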

Multi-Core Efficiency


Many years ago, Intel and AMD stopped emphasizing fast single-core desktop processors and started designing CPUs with multiple cores per package. Software developers had to learn how to exploit those duplicated resources in order to extract some benefit from them.

The same thing is happening in the mobile space as multi-core SoCs facilitate parallelism in power-optimized architectures. As on the desktop, though, the performance of a dual- or quad-core chip doesn't scale linearly. Synthetic measurements make it possible for us to get a best-case scaling number, but the real world is far less exact.
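A standard way to reason about sub-linear scaling is Amdahl's law, which is not part of Sandra's methodology and is brought in here only as an illustration. It says that if a fraction p of a workload can run in parallel, the best possible speedup on n cores is 1 / ((1 - p) + p / n). The 90% parallel fraction used below is an arbitrary assumption, not a measured value.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup from Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def scaling_efficiency(parallel_fraction, cores):
    """Speedup divided by core count; 1.0 would mean perfectly linear scaling."""
    return amdahl_speedup(parallel_fraction, cores) / cores


# Assumed workload: 90% of the work parallelizes (illustrative only).
for cores in (1, 2, 4):
    speedup = amdahl_speedup(0.9, cores)
    efficiency = scaling_efficiency(0.9, cores)
    print(f"{cores} core(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Even this idealized bound ignores the inter-core bandwidth and latency costs discussed next, so measured scaling will generally sit below it.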

This is partly a result of how cores work together. Threaded apps involve data sharing between cores. If this isn't done efficiently, performance drops. Lots of bandwidth and low latency are important. TI's OMAP 4430 is able to move the most data per second between its cores, while Nvidia's Tegra 3 follows closely behind, instead standing out for its minimal latency.

Top Comments
  • 20 Hide
    shotgunz , October 11, 2012 5:03 AM
    Naw, more like give foxconn workers better salary, do something good with mountain of money they have, stop patent trolling, stop silly war with google/samsung and stop lieing. Then maybe Apple will be forgiven.
  • 17 Hide
    mayankleoboy1 , October 11, 2012 6:01 AM
    No comparison with Samsung exynos4 ?
  • 12 Hide
    darkchazz , October 11, 2012 9:59 AM
    Tegra 3 is complete overhyped trash. abysmal memory bandwidth and weak GPU performance with useless extra cores.
    Why isn't the exynos quad in the comparison?
Other Comments
  • 12 Hide
    luciferano , October 11, 2012 5:33 AM
    shotgunz wrote: Naw, more like give foxconn workers better salary, do something good with mountain of money they have, stop patent trolling, stop silly war with google/samsung and stop lieing. Then maybe Apple will be forgiven.


    Well, it'd be a start. I wouldn't go nearly as far as all is forgiven.
  • 8 Hide
    mayankleoboy1 , October 11, 2012 6:01 AM
    AFAIUI, the PowerVR GPU in ipad3 is more of a brute force architecture. "Just throw more transistors" is its mantra. So its good in current workloads.
    The Adreno320 is more refined and optimised arch. Trying to get the most performance from least silicon area. It is still being refined. Hence, it will do well in future applications.
  • 3 Hide
    esrever , October 11, 2012 6:30 AM
    This just makes the tegra 3 look bad.
  • -4 Hide
    Anonymous , October 11, 2012 7:47 AM
    Why the ST-ericsson 8500 isn't in this list either? too few phone out? Xperia S/P... looks it would rock some of concurrents.. omap..S3...
    Please be as exaustive as possible ;) 
  • 0 Hide
    Anonymous , October 11, 2012 7:58 AM
    Error in the chart on second page.
    The cortex A15 DMIPS/MHz should read above the A9. Around 3.5 DMIPS/MHz from the rumblings.
  • 9 Hide
    Anonymous , October 11, 2012 8:13 AM
    These krait numbers aren't very impressive if you normalize for clockspeeds.
    In fact, they seem to suggest only a very small improvement over A9, if any at all.
  • 0 Hide
    mayne92 , October 11, 2012 8:15 AM
    Very nice review Andrew!
  • 9 Hide
    Memnarchon , October 11, 2012 9:39 AM
    Why placing only the weakest version of Tegra 3 and not Tegra 3 T33 (ASUS Transformer Pad Infinity 700)??? Or both of them.
  • 3 Hide
    dgingeri , October 11, 2012 12:08 PM
    This is a pretty good article, but I have one problem with it: the Tegra 3 is probably not the best example for the Cortex-A9. It's hobbled with very low memory bandwidth. It could probably perform better (including the GPU) with a better memory interface. Nvidia made a major design mistake with that.
  • 0 Hide
    jaquith , October 11, 2012 2:05 PM
    Nice article.

    You might want to fix (bold) the 'Aggregate Performance Per Core' chart; note the Plus & Pro numbers.
  • 4 Hide
    fulle , October 11, 2012 2:54 PM
    mayankleoboy1 wrote: AFAIUI, the PowerVR GPU in ipad3 is more of a brute force architecture. "Just throw more transistors" is its mantra. So its good in current workloads. The Adreno320 is more refined and optimised arch. Trying to get the most performance from least silicon area. It is still being refined. Hence, it will do well in future applications.


    No. PowerVR holds some very specific advantages over there competition... Such as shader-driven tile-based deferred rendering (TBDR) architecture.... and their GPUs are actually MORE efficient than their competition. Not less.

    It's still very impressive what Qualcomm's done... I'm just not thinking they'll dethrone PowerVR anytime soon as the mobile graphics performance leader.
  • -4 Hide
    ddpruitt , October 11, 2012 3:21 PM
    It's interesting that the S4 needs 1.5 Ghz to beat the Tegra. And that's a high end Snapdragon vs bottom of the line Tegra. I'd bet that at the same clock rate with a level playing field these two would trade blows. This tells me that the Snapdragon is still way behind Tegra. S4 isn't in any production hardware and Nvidia is going to release the next Tegra chips in a few months.

    When Tegra and Snapdragon devices are available at the same time I'd bet that Tegra will handily beat Snapdragon. I wouldn't say Qualcomm is back in the fight, I would say this is Qualcomm's last desperate attempt to stay relevent
  • 1 Hide
    Bricktop , October 11, 2012 5:27 PM
    Good Review. I'm curious as to how accurate your "aggregate" single core performance numbers are to a real single threaded benchmark. I'm not an expert, but I thought the memory controller and GPU were separate from the CPU Cores. Can you simply divide those benchmark numbers by 4 to get single core memory and multimedia performance?
  • 0 Hide
    edlivian , October 11, 2012 5:27 PM
    this soc shall be at the heart of my next phone.

    Google, please allow the next nexus to be powerful, and have expandable memory.

    Enough imitating apples cloud dreams, we want local storage!!!!
    64GB microsd card is $48 on pace to be $36 by next year.
  • 3 Hide
    hameem_1 , October 11, 2012 6:16 PM
    this is so much great tosee Toms's hardaware doing mobile gpu benchmarks
  • 2 Hide
    marcel17 , October 11, 2012 8:01 PM
    Now , let's wait and see until some phones/tablets come on the market with the S4Pro .
    BTW , why are the Exynos4 and the A6 missing from the review ? I think they are the most relevant competition to the S4 .