Snapdragon S4 Pro: Krait And Adreno 320, Benchmarked

Qualcomm's Snapdragon S4 Line-Up: Krait CPUs And Adreno Graphics

Qualcomm's product portfolio is both deep and wide. Its mobile SoCs in the Snapdragon family stretch back to 2008, when the S1 platform was first made available. Now, in 2012, we're looking at the S4 series, indicating the company's fourth generation. 

You'll find four product families under Qualcomm's S4 umbrella, each made up of individual chips aimed at specific workloads.

S4 Prime, for example, is being positioned as a solution for smart TVs and set-top boxes. The MPQ8064 SoC is the only component under the S4 Prime moniker, boasting a quad-core Krait architecture with Adreno 320 graphics.

The focus of today's story, S4 Pro, includes a couple of different components: MSM8960T and APQ8064, the former featuring a dual-core Krait-based processor and the latter equipped with four cores. Both are 28 nm components with the same high-end Adreno 320 graphics engine. Whereas the MSM8960T part features an integrated cellular radio, the APQ8064 does not.

S4 Plus and S4 Play, intended for smartphones and tablets, comprise an additional 14 SoCs, some with built-in modems and some without.

In Qualcomm's hierarchy, S4 Pro is the highest-end platform you'll see used in mobile devices. It makes sense, then, that the company built its mobile development platforms around the APQ8064, and that's what we have in the lab today.

Although it takes the second spot in the S4 line-up, the Pro segment is certainly still a performance-oriented part. As mentioned, the APQ8064 features a quad-core Krait-based processor operating between 1.5 and 1.7 GHz. Qualcomm couldn't get us access to a block diagram of the APQ8064, so imagine the shot of the MSM8960 above with a much smaller modem subsystem (no cellular radio, just Wi-Fi and Bluetooth), and an additional pair of cores.

Each core has 16 KB of L1 data and 16 KB of L1 instruction cache, and each pair of cores shares a 1 MB L2 cache. Qualcomm's Krait-based cores succeed the Scorpion-based design that we first covered in Third-Generation Snapdragon: The Dual-Core Scorpion. In the table below, we drill down into more granular specifics of the Krait and Scorpion architectures, comparing them to ARM's Cortex-A9 and Cortex-A15 core designs.

| Architecture Comparison | Cortex-A9 | Cortex-A15 | Scorpion | Krait |
|---|---|---|---|---|
| Pipeline Depth | 8-Stage | 15/17-24-Stage (Integer/FPU) | 10-Stage | 11-Stage |
| Out-of-Order Execution | Yes | Yes | Partial | Yes |
| Fab Node | 45/30/32 nm | 32/28 nm | 65/45 nm | 28 nm |
| Core Configurations | Single, Dual, Quad | Dual, Quad | Single, Dual | Dual, Quad |
| Cache | L1: 32 KB + 32 KB; L2: 1 MB | L1: 32 KB + 32 KB; L2: 4 MB max | L1: 32 KB + 32 KB; L2: 256 KB (per core) | L1: 16 KB + 16 KB; L2: 1 MB (per dual-core) |
| DMIPS/MHz | 2.5 | 3.5 | 2.5 | 3.3 |
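
To put the DMIPS/MHz row in perspective, the short Python sketch below multiplies each architecture's figure by a clock rate and core count. It uses only the numbers from the table and the 1.5 GHz low end of the APQ8064's quoted range. The function name is illustrative, and the perfect multi-core scaling it assumes is a simplification that real workloads rarely reach, so treat the output as a theoretical ceiling rather than a benchmark result.

```python
# DMIPS/MHz figures copied from the architecture comparison table.
DMIPS_PER_MHZ = {
    "Cortex-A9": 2.5,
    "Cortex-A15": 3.5,
    "Scorpion": 2.5,
    "Krait": 3.3,
}


def peak_dmips(architecture: str, clock_mhz: float, cores: int = 1) -> float:
    """Theoretical Dhrystone throughput: DMIPS/MHz x clock (MHz) x cores.

    Assumption: throughput scales perfectly with core count, which is an
    upper bound, not a measurement.
    """
    return DMIPS_PER_MHZ[architecture] * clock_mhz * cores


if __name__ == "__main__":
    # APQ8064: four Krait cores at 1.5 GHz (low end of the quoted range).
    print("Krait x4 @ 1500 MHz:", peak_dmips("Krait", 1500, cores=4))
    # Same clock and core count for the other designs, for comparison only.
    for name in ("Cortex-A9", "Cortex-A15", "Scorpion"):
        print(f"{name} x4 @ 1500 MHz:", peak_dmips(name, 1500, cores=4))
```

Normalizing to the same clock and core count this way isolates the per-clock difference between designs, which is the comparison the DMIPS/MHz row is meant to support.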

Unlike many of its competitors, Qualcomm employs custom processor designs based on ARM IP, investing considerable time and money in developing its own cores. For example, its Scorpion design employs the same ARMv7-A architecture used by the Cortex-A8 and Cortex-A9 cores. However, Qualcomm's specific implementation breaks the instruction pipeline down into a different number of stages, utilizes non-speculative out-of-order execution, and offers 128-bit SIMD functionality. With so much in-house work behind it, Scorpion is easily differentiated from the standard Cortex-A9, which helps explain certain benchmark victories.

Krait improves performance tangibly through increased complexity (made possible in no small part, we imagine, by the smaller 28 nm process node). Each core can now decode up to three instructions per clock cycle (up from two), similar to the Cortex-A15 design. Its integer pipeline is 11 stages long, one stage longer than Scorpion's but not as long as the Cortex-A15's 15-stage implementation. In practice, the longer pipeline should translate into a clock rate advantage.

Qualcomm also gives Krait the ability to run each core at its own clock rate, asymmetrically. This helps facilitate power savings in applications that don't need all of the SoC's compute resources. Useful though it may be, this isn't a new feature. The Scorpion core featured it as well, and Nvidia's Tegra 3 leans on the same principle for its fifth companion core.
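
One way to observe per-core clocking on a Linux-based device is to read each core's current frequency from the kernel's cpufreq sysfs files. The Python sketch below is not taken from Qualcomm's tooling. It assumes the device exposes the standard `scaling_cur_freq` files (reported in kHz) under `/sys/devices/system/cpu`, and the function names are hypothetical.

```python
from pathlib import Path


def read_core_clocks_khz(cpu_root="/sys/devices/system/cpu"):
    """Return {core_index: current_frequency_kHz} for cores exposing cpufreq.

    Assumption: the kernel provides cpuN/cpufreq/scaling_cur_freq files.
    Cores that are offline or unreadable are skipped rather than guessed.
    """
    clocks = {}
    for freq_file in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        try:
            # Directory name looks like "cpu0", "cpu1", ...
            core = int(freq_file.parent.parent.name[3:])
            clocks[core] = int(freq_file.read_text().strip())
        except (OSError, ValueError):
            continue
    return dict(sorted(clocks.items()))


def is_asymmetric(clocks):
    """True when at least two readable cores report different clock rates."""
    return len(set(clocks.values())) > 1


if __name__ == "__main__":
    clocks = read_core_clocks_khz()
    for core, khz in clocks.items():
        print(f"cpu{core}: {khz / 1000:.0f} MHz")
    print("asymmetric clocks:", is_asymmetric(clocks))
```

On an SoC with independent per-core scaling, sampling this under a light, single-threaded load would be expected to show one core clocked up while the others idle at a lower rate. A single snapshot is only a rough indicator, since frequencies change continuously.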

  • blackmagnum
    Apple: Use this for the next iPad and all will be forgiven.
  • shotgunz
    Naw, more like give foxconn workers better salary, do something good with mountain of money they have, stop patent trolling, stop silly war with google/samsung and stop lieing. Then maybe Apple will be forgiven.
  • luciferano
    shotgunz: Naw, more like give foxconn workers better salary, do something good with mountain of money they have, stop patent trolling, stop silly war with google/samsung and stop lieing. Then maybe Apple will be forgiven.
    Well, it'd be a start. I wouldn't go nearly as far as all is forgiven.
  • mayankleoboy1
    No comparison with Samsung exynos4 ?
  • mayankleoboy1
    AFAIUI, the PowerVR GPU in ipad3 is more of a brute force architecture. "Just throw more transistors" is its mantra. So its good in current workloads.
    The Adreno320 is more refined and optimised arch. Trying to get the most performance from least silicon area. It is still being refined. Hence, it will do well in future applications.
  • esrever
    This just makes the tegra 3 look bad.
  • Why the ST-ericsson 8500 isn't in this list either? too few phone out? Xperia S/P... looks it would rock some of concurrents.. omap..S3...
    Please be as exaustive as possible ;)
  • Error in the chart on second page.
    The cortex A15 DMIPS/MHz should read above the A9. Around 3.5 DMIPS/MHz from the rumblings.
  • These krait numbers aren't very impressive if you normalize for clockspeeds.
    In fact, they seem to suggest only a very small improvement over A9, if any at all.
  • mayne92
    Very nice review Andrew!