
Qualcomm's Snapdragon S4 Line-Up: Krait CPUs And Adreno Graphics

Snapdragon S4 Pro: Krait And Adreno 320, Benchmarked

Qualcomm's product portfolio is both deep and wide. Its mobile SoCs in the Snapdragon family stretch back to 2008, when the S1 platform was first made available. Now, in 2012, we're looking at the S4 series, the company's fourth generation.

You'll find four product families under Qualcomm's S4 umbrella, each consisting of individual chips organized in such a way as to address specific workloads.

S4 Prime, for example, is being positioned as a solution for smart TVs and set-top boxes. The MPQ8064 SoC is the only component under the S4 Prime moniker, boasting a quad-core Krait architecture with Adreno 320 graphics.

The focus of today's story, S4 Pro, includes a couple of different components: MSM8960T and APQ8064, the former featuring a dual-core Krait-based processor and the latter equipped with four cores. Both are 28 nm components with the same high-end Adreno 320 graphics engine. Whereas the MSM8960T part features an integrated cellular radio, the APQ8064 does not.

S4 Plus and Play, intended for smartphones and tablets, are composed of an additional 14 SoCs with and without built-in modems. 

In Qualcomm's hierarchy, S4 Pro is the highest-end platform you'll see used in mobile devices, so it makes sense that the company built its mobile development platforms around the APQ8064. That's what we have in the lab today.

Although it takes the second spot in the S4 line-up, the Pro segment is certainly still a performance-oriented part. As mentioned, the APQ8064 features a quad-core Krait-based processor operating between 1.5 and 1.7 GHz. Qualcomm couldn't get us access to a block diagram of the APQ8064, so imagine the shot of the MSM8960 above with a much smaller modem subsystem (no cellular radio, just Wi-Fi and Bluetooth), and an additional pair of cores.

Each core has 16 KB of L1 data and 16 KB of L1 instruction cache, and each pair of cores shares a 1 MB L2 cache. Qualcomm's Krait-based cores succeed the Scorpion-based design that we first covered in Third-Generation Snapdragon: The Dual-Core Scorpion. In the table below, we drill down into more granular specifics of the Krait and Scorpion architectures, comparing them to ARM's Cortex-A9 and Cortex-A15 core designs.
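To put those cache figures in perspective, here is a minimal Python sketch that adds up the on-chip cache of a quad-core part like the APQ8064. The per-core and per-pair sizes come from the text above; the function name and the assumption that cores always come in pairs are ours, purely for illustration.

```python
# Cache figures quoted in the article for Krait-based SoCs.
L1_DATA_KB = 16          # L1 data cache per core
L1_INSTR_KB = 16         # L1 instruction cache per core
L2_KB_PER_PAIR = 1024    # 1 MB of L2 shared by each pair of cores


def cache_totals(cores: int) -> dict:
    """Return aggregate L1 and L2 capacity in KB for a given core count.

    Assumption (ours): cores are always grouped in pairs, each pair
    sharing one L2 block, as described for the dual- and quad-core parts.
    """
    if cores <= 0 or cores % 2 != 0:
        raise ValueError("this sketch assumes a positive, even core count")
    l1_total = cores * (L1_DATA_KB + L1_INSTR_KB)
    l2_total = (cores // 2) * L2_KB_PER_PAIR
    return {"l1_kb": l1_total, "l2_kb": l2_total}


if __name__ == "__main__":
    print(cache_totals(4))  # quad-core configuration, as in the APQ8064
```

For four cores, that works out to 128 KB of L1 and 2 MB of L2 across the chip.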

Architecture Comparison

| | Cortex-A9 | Cortex-A15 | Scorpion | Krait |
| --- | --- | --- | --- | --- |
| Pipeline Depth | 8-stage | 15/17-24-stage (integer/FPU) | 10-stage | 11-stage |
| Out-of-Order Execution | Yes | Yes | Partial | Yes |
| Fab Node | 45/30/32 nm | 32/28 nm | 65/45 nm | 28 nm |
| Core Configurations | Single, Dual, Quad | Dual, Quad | Single, Dual | Dual, Quad |
| Cache | L1: 32 KB + 32 KB; L2: 1 MB | L1: 32 KB + 32 KB; L2: 4 MB max | L1: 32 KB + 32 KB; L2: 256 KB (per core) | L1: 16 KB + 16 KB; L2: 1 MB (per dual-core) |
| DMIPS/MHz | 2.5 | 3.5 | 2.5 | 3.3 |
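The DMIPS/MHz row is easier to read once it's multiplied out against a clock rate. The short Python sketch below does that arithmetic using the table's figures and the 1.5 to 1.7 GHz range quoted for the APQ8064. The function name is ours, and the result is a theoretical Dhrystone ceiling that assumes perfect scaling across cores, not a measured score.

```python
# DMIPS/MHz values as listed in the architecture comparison table.
DMIPS_PER_MHZ = {
    "Cortex-A9": 2.5,
    "Cortex-A15": 3.5,
    "Scorpion": 2.5,
    "Krait": 3.3,
}


def peak_dmips(core: str, clock_mhz: float, cores: int = 1) -> float:
    """Theoretical DMIPS = (DMIPS per MHz) x (clock in MHz) x (core count).

    Assumes ideal scaling across cores, which real workloads rarely achieve.
    """
    return DMIPS_PER_MHZ[core] * clock_mhz * cores


if __name__ == "__main__":
    for mhz in (1500, 1700):
        per_core = peak_dmips("Krait", mhz)
        quad = peak_dmips("Krait", mhz, cores=4)
        print(f"Krait @ {mhz} MHz: {per_core:.0f} per core, {quad:.0f} quad-core")
```

At 1.5 GHz, a single Krait core lands at roughly 4,950 DMIPS on paper, compared with 3,750 for a core rated at 2.5 DMIPS/MHz running at the same clock.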


Unlike many of its competitors, Qualcomm employs custom processor designs based on ARM IP, investing considerable time and money in developing its own cores. For example, its Scorpion design implements the same ARMv7-A architecture used by the Cortex-A8 and -A9 cores. However, Qualcomm's specific implementation breaks the instruction pipeline down into a different number of stages, utilizes non-speculative out-of-order execution, and offers 128-bit SIMD functionality. With so much in-house work behind it, Scorpion is easily differentiated from the standard Cortex-A9, which helps explain certain benchmark victories.

Krait improves performance tangibly through increased complexity (due in no small part, we imagine, to a smaller 28 nm process node). Each core can now decode up to three instructions per clock cycle (up from two), similar to the Cortex-A15 design. Its integer pipeline is now 11 stages long, which is one stage longer than Scorpion's, but not as long as the -A15's 15-stage implementation. In practice, the longer pipeline should translate into a clock rate advantage.

Qualcomm also gives Krait the ability to clock each core independently of the others. This facilitates power savings in applications where not all of the SoC's compute resources are needed. Useful though it may be, this isn't a new feature. The Scorpion core featured it as well, and Nvidia's Tegra 3 leans on the same principle for its fifth companion core.
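A toy Python model helps show why independent core clocks can save power. It compares a hypothetical synchronous design, where every core must run at the speed the busiest core needs, with an asynchronous one, where each core runs only as fast as its own workload requires. The per-core demand figures and function names are invented for illustration, and summed clock rate is only a crude stand-in for dynamic power (it ignores voltage scaling entirely). This is not a description of Qualcomm's actual power management.

```python
def synchronous_clocks(demand_mhz):
    """All cores share one clock, so each runs at the highest demand."""
    peak = max(demand_mhz)
    return [peak for _ in demand_mhz]


def asynchronous_clocks(demand_mhz):
    """Each core is clocked independently at its own demand."""
    return list(demand_mhz)


if __name__ == "__main__":
    # Hypothetical workload: one busy core, one lightly loaded, two idle.
    demand = [1500, 400, 0, 0]

    sync_total = sum(synchronous_clocks(demand))
    async_total = sum(asynchronous_clocks(demand))

    print("aggregate MHz, shared clock:     ", sync_total)
    print("aggregate MHz, independent clocks:", async_total)
```

In this made-up case, the shared-clock design spends 6,000 MHz of aggregate clock to do work that independent clocks cover with 1,900 MHz.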
