Qualcomm's Snapdragon S4 Line-Up: Krait CPUs And Adreno Graphics
Qualcomm's product portfolio is both deep and wide. Its mobile SoCs in the Snapdragon family stretch back to 2008, when the S1 platform was first made available. Now, in 2012, we're looking at the S4 series, indicating the company's fourth generation.
You'll find four product families under Qualcomm's S4 umbrella, each consisting of individual chips organized in such a way as to address specific workloads.
S4 Prime, for example, is being positioned as a solution for smart TVs and set-top boxes. The MPQ8064 SoC is the only component under the S4 Prime moniker, boasting a quad-core Krait architecture with Adreno 320 graphics.
The focus of today's story, S4 Pro, includes a couple of different components: MSM8960T and APQ8064, the former featuring a dual-core Krait-based processor and the latter equipped with four cores. Both are 28 nm components with the same high-end Adreno 320 graphics engine. Whereas the MSM8960T part features an integrated cellular radio, the APQ8064 does not.
S4 Plus and Play, intended for smartphones and tablets, are composed of an additional 14 SoCs with and without built-in modems.
In Qualcomm's hierarchy, S4 Pro is the highest-end platform you'll see used in mobile devices. It makes sense, then, that the company built its mobile development platforms around the APQ8064, and that's what we have in the lab today.
Although it takes the second spot in the S4 line-up, the Pro family is still performance-oriented. As mentioned, the APQ8064 features a quad-core Krait-based processor operating between 1.5 and 1.7 GHz. Qualcomm couldn't get us access to a block diagram of the APQ8064, so imagine the shot of the MSM8960 above with a much smaller modem subsystem (no cellular radio, just Wi-Fi and Bluetooth) and an additional pair of cores.
Each core has 16 KB of L1 data and 16 KB of L1 instruction cache, and each pair of cores shares a 1 MB L2 cache. Qualcomm's Krait-based cores succeed the Scorpion-based design that we first covered in Third-Generation Snapdragon: The Dual-Core Scorpion. In the table below, we drill down into more granular specifics of the Krait and Scorpion architectures, comparing them to ARM's Cortex-A9 and Cortex-A15 core designs.
|Specification|Cortex-A9|Cortex-A15|Scorpion|Krait|
|---|---|---|---|---|
|Fab Node|45/30/32 nm|32/28 nm|65/45 nm|28 nm|
|Core Configurations|Single, Dual, Quad|Dual, Quad|Single, Dual|Dual, Quad|
|Cache|L1: 32 KB + 32 KB; L2: 1 MB|L1: 32 KB + 32 KB; L2: 4 MB max|L1: 32 KB + 32 KB; L2: 256 KB (per core)|L1: 16 KB + 16 KB; L2: 1 MB (per dual-core)|
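To put the Krait cache figures in per-chip terms, here is a minimal sketch that totals them for a given core count. It assumes only what the text states (16 KB + 16 KB of L1 per core, 1 MB of L2 per pair of cores). The function name and its parameters are our own, chosen for illustration.

```python
KB = 1024
MB = 1024 * KB


def total_cache_bytes(cores, l1d_kb=16, l1i_kb=16, l2_mb_per_pair=1):
    """Return (total L1, total L2) in bytes for a Krait-style cache layout.

    Assumes L1 is private to each core and L2 is shared by each pair of cores,
    as described in the article. Defaults are the Krait figures quoted there.
    """
    if cores <= 0 or cores % 2 != 0:
        raise ValueError("this sketch only models even, positive core counts")
    l1_total = cores * (l1d_kb + l1i_kb) * KB
    l2_total = (cores // 2) * l2_mb_per_pair * MB
    return l1_total, l2_total


# Quad-core case, matching the APQ8064's four Krait cores.
l1, l2 = total_cache_bytes(4)
print(f"L1 total: {l1 // KB} KB, L2 total: {l2 // MB} MB")
```

By this arithmetic, a quad-core part carries 128 KB of L1 and 2 MB of L2 in total, while a dual-core part carries half of each.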
Unlike many of its competitors, Qualcomm employs custom processor designs based on ARM IP, investing considerable time and money in developing its own cores. For example, its Scorpion design implements the same ARMv7-A architecture used by the Cortex-A8 and -A9 cores. However, Qualcomm's specific implementation breaks the instruction pipeline down into a different number of stages, utilizes non-speculative out-of-order execution, and offers 128-bit SIMD functionality. Featuring a lot of in-house work, Scorpion is easily differentiated from the standard Cortex-A9, which helps explain certain benchmark victories.
Krait improves performance tangibly through increased complexity (due in no small part, we imagine, to the smaller 28 nm process node). Each core can now decode up to three instructions per clock cycle (up from two), similar to the Cortex-A15 design. Its integer pipeline is now 11 stages long, which is one stage longer than Scorpion's, but not as long as the -A15's 15-stage implementation. In practice, the longer pipeline should translate into a clock rate advantage.
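As a rough illustration of what the wider decoder means, the sketch below multiplies decode width by clock rate to get a theoretical per-core ceiling. It uses the decode widths quoted above (two for Scorpion, three for Krait) and the APQ8064's 1.5 to 1.7 GHz range. The function name is ours, and the result is an upper bound on decode only, not a prediction of sustained or benchmark performance.

```python
def peak_decode_mips(decode_width, clock_mhz):
    """Theoretical ceiling: millions of instructions decoded per second, per core.

    decode_width: instructions decoded per clock cycle.
    clock_mhz:    core clock in MHz (integers avoid floating-point rounding).
    """
    return decode_width * clock_mhz


# Krait at both ends of the APQ8064's quoted clock range.
for clock in (1500, 1700):
    print(f"Krait @ {clock} MHz: {peak_decode_mips(3, clock)} M instr/s peak decode")

# A two-wide decoder at the same clock, for comparison only; Scorpion did not
# necessarily ship at this frequency.
print(f"Two-wide @ 1500 MHz: {peak_decode_mips(2, 1500)} M instr/s peak decode")
```

At equal clocks, going from two to three instructions per cycle raises this ceiling by 50%. Real gains depend on how often the rest of the pipeline can keep the decoder fed.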
Qualcomm also gives Krait the ability to run each core at its own clock rate, asymmetrically. This helps facilitate power savings in applications where all of the SoC's compute resources aren't needed. Useful though it may be, this isn't a new feature. The Scorpion core featured it as well, and Nvidia's Tegra 3 leans on the same principle for its fifth companion core.
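A toy model makes the idea concrete. The sketch below compares a design where every core must share one clock against one where each core is clocked independently, using the sum of core clocks as a crude stand-in for switching activity. The per-core demand figures are hypothetical, the function name is ours, and the model deliberately ignores voltage scaling, leakage, and power gating, so treat it as an illustration rather than a power estimate.

```python
def total_clock_mhz(demand_mhz, asymmetric):
    """Sum of core clocks (MHz) needed to satisfy each core's demand.

    demand_mhz: list of the clock rate each core's workload needs.
    asymmetric: if True, each core runs at its own demanded rate;
                if False, all cores run at the highest demanded rate.
    """
    if asymmetric:
        return sum(demand_mhz)
    return max(demand_mhz) * len(demand_mhz)


# Hypothetical load: one busy core, three nearly idle ones.
demand = [1500, 400, 400, 400]

shared = total_clock_mhz(demand, asymmetric=False)
independent = total_clock_mhz(demand, asymmetric=True)
print(f"Shared clock:      {shared} MHz in aggregate")
print(f"Independent clock: {independent} MHz in aggregate")
```

With this made-up load, independent clocking needs less than half the aggregate clock rate of the shared-clock case. When every core is equally busy, the two cases converge.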