
Qualcomm's Snapdragon S4 Line-Up: Krait CPUs And Adreno Graphics

Snapdragon S4 Pro: Krait And Adreno 320, Benchmarked
Qualcomm's product portfolio is both deep and wide. Its mobile SoCs in the Snapdragon family stretch back to 2008, when the S1 platform was first made available. Now, in 2012, we're looking at the S4 series, indicating the company's fourth generation. 

You'll find four product families under Qualcomm's S4 umbrella, each grouping individual chips by the workloads they are meant to address.

S4 Prime, for example, is being positioned as a solution for smart TVs and set-top boxes. The MPQ8064 SoC is the only component under the S4 Prime moniker, boasting a quad-core Krait architecture with Adreno 320 graphics.

The focus of today's story, S4 Pro, includes a couple of different components: MSM8960T and APQ8064, the former featuring a dual-core Krait-based processor and the latter equipped with four cores. Both are 28 nm components with the same high-end Adreno 320 graphics engine. Whereas the MSM8960T part features an integrated cellular radio, the APQ8064 does not.

S4 Plus and Play, intended for smartphones and tablets, are composed of an additional 14 SoCs with and without built-in modems. 

In Qualcomm's hierarchy, S4 Pro is the highest-end platform you'll see used in mobile devices, so it makes sense that the company built its mobile development platforms around the APQ8064. That's what we have in the lab today.

Although it takes the second spot in the S4 line-up, the Pro segment is certainly still a performance-oriented part. As mentioned, the APQ8064 features a quad-core Krait-based processor operating between 1.5 and 1.7 GHz. Qualcomm couldn't get us access to a block diagram of the APQ8064, so imagine the shot of the MSM8960 above with a much smaller modem subsystem (no cellular radio, just Wi-Fi and Bluetooth), and an additional pair of cores.

Each core has 16 KB of L1 data and 16 KB of L1 instruction cache, and each pair of cores shares a 1 MB L2 cache. Qualcomm's Krait-based cores succeed the Scorpion-based design that we first covered in Third-Generation Snapdragon: The Dual-Core Scorpion. In the table below, we drill down into more granular specifics of the Krait and Scorpion architectures, comparing them to ARM's Cortex-A9 and Cortex-A15 core designs.
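To put those cache figures in perspective, the short Python sketch below adds them up for a quad-core part such as the APQ8064. It assumes only what the paragraph above states (16 KB + 16 KB of L1 per core, 1 MB of L2 per pair of cores); the function name and parameters are illustrative and do not come from Qualcomm.

```python
KB_PER_MB = 1024


def krait_cache_totals(cores=4, l1_data_kb=16, l1_instr_kb=16, l2_mb_per_pair=1):
    """Return (total L1 in KB, total L2 in KB) for a Krait-based SoC.

    Defaults reflect the per-core and per-pair figures quoted in the article.
    """
    if cores <= 0 or cores % 2:
        # The article describes L2 as shared per pair of cores.
        raise ValueError("expected a positive, even number of cores")
    total_l1_kb = cores * (l1_data_kb + l1_instr_kb)
    total_l2_kb = (cores // 2) * l2_mb_per_pair * KB_PER_MB
    return total_l1_kb, total_l2_kb


l1_kb, l2_kb = krait_cache_totals()
print(f"Quad-core Krait: {l1_kb} KB L1 in total, {l2_kb // KB_PER_MB} MB L2 in total")
```

Under these assumptions, a quad-core configuration carries 128 KB of L1 and 2 MB of L2 in aggregate, while a dual-core part would carry half of each.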

Architecture Comparison

                        Cortex-A9           Cortex-A15                    Scorpion           Krait
Pipeline Depth          Eight-Stage         15/17-24-Stage (Integer/FPU)  10-Stage           11-Stage
Out-of-Order Execution  Yes                 Yes                           Partial            Yes
Fab Node                45/30/32 nm         32/28 nm                      65/45 nm           28 nm
Core Configurations     Single, Dual, Quad  Dual, Quad                    Single, Dual       Dual, Quad
L1 Cache                32 KB + 32 KB       32 KB + 32 KB                 32 KB + 32 KB      16 KB + 16 KB
L2 Cache                1 MB                4 MB max                      256 KB (per core)  1 MB (per dual-core)
DMIPS/MHz               2.5                 3.5                           2.5                3.3
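
The DMIPS/MHz row can be turned into a rough theoretical throughput figure by multiplying it by the clock rate. The Python sketch below does that using the values from the table and the 1.5 to 1.7 GHz range quoted for the APQ8064. Treat the result as a paper estimate only: Dhrystone is a synthetic integer test, linear scaling across cores is an assumption, and the helper name is made up for this illustration.

```python
# DMIPS/MHz values as listed in the architecture comparison table.
DMIPS_PER_MHZ = {
    "Cortex-A9": 2.5,
    "Cortex-A15": 3.5,
    "Scorpion": 2.5,
    "Krait": 3.3,
}


def theoretical_dmips(core, clock_mhz, cores=1):
    """Estimate Dhrystone MIPS as DMIPS/MHz x clock (MHz) x core count.

    Assumes perfect scaling across cores, which real workloads rarely achieve.
    """
    if clock_mhz <= 0 or cores <= 0:
        raise ValueError("clock_mhz and cores must be positive")
    return DMIPS_PER_MHZ[core] * clock_mhz * cores


for clock in (1500, 1700):
    per_core = theoretical_dmips("Krait", clock)
    quad = theoretical_dmips("Krait", clock, cores=4)
    print(f"Krait @ {clock} MHz: {per_core:.0f} DMIPS per core, "
          f"{quad:.0f} DMIPS across four cores")
```

Calling the same helper with another core name from the table at an identical clock gives a clock-normalized comparison, which is the fairest way to read the DMIPS/MHz column.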


Unlike many of its competitors, Qualcomm designs its own processor cores based on ARM IP, investing considerable time and money in developing them. For example, its Scorpion design employs the same ARMv7-A architecture used by the Cortex-A8 and -A9 cores. However, Qualcomm's specific implementation breaks the instruction pipeline down into a different number of stages, utilizes non-speculative out-of-order execution, and offers 128-bit SIMD functionality. With so much in-house work behind it, Scorpion is easily differentiated from the standard Cortex-A9, which helps explain certain benchmark victories.

Krait improves performance tangibly through increased complexity (due in no small part, we imagine, to a smaller 28 nm process node). Each core can now decode up to three instructions per clock cycle (up from two), similar to the Cortex-A15 design. Its integer pipeline is now 11 stages long, one stage longer than Scorpion's but shorter than the -A15's 15-stage implementation. In practice, the longer pipeline should translate into a clock rate advantage.

Qualcomm also lets each Krait core run at its own clock rate, independently of the others. This helps facilitate power savings in applications where not all of the SoC's compute resources are needed. Useful though it may be, this isn't a new feature. The Scorpion core featured it as well, and Nvidia's Tegra 3 leans on the same principle for its fifth companion core.
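
The difference between per-core and shared clocking can be illustrated with a toy model in Python. Everything in it is hypothetical: the frequency steps, the per-core demand figures, and the function names are invented for the example, and it is not a description of Qualcomm's actual governor or power management logic.

```python
from bisect import bisect_left

# Hypothetical frequency steps in MHz (not Qualcomm's real operating points).
FREQ_STEPS_MHZ = [400, 800, 1200, 1500]


def pick_step(demand_mhz):
    """Return the lowest available step that covers the demand, capped at the top step."""
    index = bisect_left(FREQ_STEPS_MHZ, demand_mhz)
    return FREQ_STEPS_MHZ[min(index, len(FREQ_STEPS_MHZ) - 1)]


def independent_clocks(demands_mhz):
    """Each core picks its own step, as with per-core clocking."""
    return [pick_step(demand) for demand in demands_mhz]


def shared_clock(demands_mhz):
    """All cores follow the busiest core, as with a single shared clock."""
    step = pick_step(max(demands_mhz))
    return [step] * len(demands_mhz)


# One busy core and three lightly loaded ones (made-up workload).
demands = [1400, 300, 100, 700]
print("independent:", independent_clocks(demands))
print("shared:     ", shared_clock(demands))
```

In this model the lightly loaded cores settle at lower steps when clocked independently, whereas a shared clock drags all four up to the step needed by the busiest core. That gap is where the power savings described above would come from.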

Other Comments
  • 20
    shotgunz, October 11, 2012 5:03 AM
    Naw, more like give Foxconn workers better salary, do something good with the mountain of money they have, stop patent trolling, stop the silly war with Google/Samsung and stop lying. Then maybe Apple will be forgiven.
  • 12
    luciferano, October 11, 2012 5:33 AM
    shotgunz wrote: "Naw, more like give Foxconn workers better salary, do something good with the mountain of money they have, stop patent trolling, stop the silly war with Google/Samsung and stop lying. Then maybe Apple will be forgiven."


    Well, it'd be a start. I wouldn't go nearly as far as all is forgiven.
  • 17
    mayankleoboy1, October 11, 2012 6:01 AM
    No comparison with Samsung Exynos 4?
  • 8
    mayankleoboy1, October 11, 2012 6:01 AM
    AFAIUI, the PowerVR GPU in the iPad 3 is more of a brute-force architecture. "Just throw more transistors" is its mantra. So it's good in current workloads.
    The Adreno 320 is a more refined and optimised arch, trying to get the most performance from the least silicon area. It is still being refined. Hence, it will do well in future applications.
  • 3
    esrever, October 11, 2012 6:30 AM
    This just makes the Tegra 3 look bad.
  • -4
    Anonymous, October 11, 2012 7:47 AM
    Why isn't the ST-Ericsson 8500 in this list either? Too few phones out? Xperia S/P... looks like it would rock some of its competitors... OMAP... S3...
    Please be as exhaustive as possible ;)
  • 0
    Anonymous, October 11, 2012 7:58 AM
    Error in the chart on the second page.
    The Cortex-A15 DMIPS/MHz should read above the A9's. Around 3.5 DMIPS/MHz from the rumblings.
  • 9
    Anonymous, October 11, 2012 8:13 AM
    These Krait numbers aren't very impressive if you normalize for clock speeds.
    In fact, they seem to suggest only a very small improvement over the A9, if any at all.
  • 0
    mayne92, October 11, 2012 8:15 AM
    Very nice review Andrew!
  • 9
    Memnarchon, October 11, 2012 9:39 AM
    Why include only the weakest version of Tegra 3 and not the Tegra 3 T33 (ASUS Transformer Pad Infinity 700)??? Or both of them.
  • 12
    darkchazz, October 11, 2012 9:59 AM
    Tegra 3 is complete overhyped trash. Abysmal memory bandwidth and weak GPU performance with useless extra cores.
    Why isn't the Exynos quad in the comparison?
  • 3
    dgingeri, October 11, 2012 12:08 PM
    This is a pretty good article, but I have one problem with it: the Tegra 3 is probably not the best example for the Cortex-A9. It's hobbled by very low memory bandwidth. It could probably perform better (including the GPU) with a better memory interface. Nvidia made a major design mistake with that.
  • 0
    jaquith, October 11, 2012 2:05 PM
    Nice article.

    You might want to fix (bold) the 'Aggregate Performance Per Core' chart; note the Plus & Pro numbers.
  • 4
    fulle, October 11, 2012 2:54 PM
    mayankleoboy1 wrote: "AFAIUI, the PowerVR GPU in the iPad 3 is more of a brute-force architecture. 'Just throw more transistors' is its mantra. So it's good in current workloads. The Adreno 320 is a more refined and optimised arch, trying to get the most performance from the least silicon area. It is still being refined. Hence, it will do well in future applications."


    No. PowerVR holds some very specific advantages over their competition... Such as a shader-driven tile-based deferred rendering (TBDR) architecture... and their GPUs are actually MORE efficient than their competition. Not less.

    It's still very impressive what Qualcomm's done... I'm just not thinking they'll dethrone PowerVR anytime soon as the mobile graphics performance leader.
  • -4
    ddpruitt, October 11, 2012 3:21 PM
    It's interesting that the S4 needs 1.5 GHz to beat the Tegra. And that's a high-end Snapdragon vs. a bottom-of-the-line Tegra. I'd bet that at the same clock rate with a level playing field these two would trade blows. This tells me that the Snapdragon is still way behind Tegra. S4 isn't in any production hardware and Nvidia is going to release the next Tegra chips in a few months.

    When Tegra and Snapdragon devices are available at the same time I'd bet that Tegra will handily beat Snapdragon. I wouldn't say Qualcomm is back in the fight, I would say this is Qualcomm's last desperate attempt to stay relevant.
  • 1
    Bricktop, October 11, 2012 5:27 PM
    Good review. I'm curious as to how accurate your "aggregate" single-core performance numbers are compared to a real single-threaded benchmark. I'm not an expert, but I thought the memory controller and GPU were separate from the CPU cores. Can you simply divide those benchmark numbers by 4 to get single-core memory and multimedia performance?
  • 0
    edlivian, October 11, 2012 5:27 PM
    This SoC shall be at the heart of my next phone.

    Google, please allow the next Nexus to be powerful, and have expandable memory.

    Enough imitating Apple's cloud dreams, we want local storage!!!!
    A 64 GB microSD card is $48, on pace to be $36 by next year.
  • 3
    hameem_1, October 11, 2012 6:16 PM
    It is so great to see Tom's Hardware doing mobile GPU benchmarks.
  • 2
    marcel17, October 11, 2012 8:01 PM
    Now, let's wait and see until some phones/tablets come on the market with the S4 Pro.
    BTW, why are the Exynos 4 and the A6 missing from the review? I think they are the most relevant competition to the S4.