The Pledge: CPU Architecture
A lot of people assume that all MSoCs are the same. A company licenses a core from ARM (say, a Cortex-A9 or a Cortex-A15), pairs it with a graphics processor such as ARM’s Mali or a PowerVR design, adds memory and I/O, and then ships it off to manufacturing.
The implication, then, is that all licensed cores perform exactly the same way. More savvy techies know that ARM is an instruction set architecture, and while companies can buy a fully capable CPU core from ARM, physical layout work (such as what Intrinsity did for Apple and Samsung) can improve its performance. Companies can also design a brand-new core on their own that implements the ARM instruction set without using any of ARM’s own design. That’s what Qualcomm has been doing with its Scorpion core and, soon, its Krait design. Nvidia is doing the same thing with “Project Denver,” relying on patents and technology it has acquired, along with engineering expertise from PortalPlayer, Transmeta, and ULi.
But the computational core plays only a small part in the overall system’s performance. You also have to deal with variables like memory bandwidth, bus architecture, and cache policies. Nor is it just bandwidth; memory latency matters too. Lower memory latency, thanks in part to its integrated memory controller, was one of the many reasons why AMD’s Athlon 64 was superior to Intel’s Pentium 4, and it is the biggest reason why Apple’s iPad 2 does so much better than its competition in terms of responsiveness and performance. The iPad 2’s A5 processor is not simply an off-the-shelf design.
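To see why latency matters separately from bandwidth, here is a back-of-the-envelope sketch in Python. The figures (100 ns latency, 4 GB/s and 8 GB/s bandwidth, 64-byte loads) are hypothetical round numbers, not measurements of any chip discussed here, and the model deliberately ignores caches, prefetching, and overlapping requests.

```python
# Back-of-the-envelope model of memory latency vs. bandwidth.
# All figures are hypothetical round numbers, not measurements of any real MSoC.

def dependent_loads_ns(n_loads: int, latency_ns: float) -> float:
    """Each load needs the previous load's result (e.g. walking a linked list),
    so the latencies add up serially and extra bandwidth does not help."""
    return n_loads * latency_ns

def streaming_ns(n_bytes: int, latency_ns: float, bandwidth_gb_s: float) -> float:
    """Independent sequential reads: pay the latency once, then transfer
    at full bandwidth. 1 GB/s is 1 byte per nanosecond."""
    return latency_ns + n_bytes / bandwidth_gb_s

if __name__ == "__main__":
    # 1,000 dependent 64-byte loads vs. streaming the same 64,000 bytes.
    print(dependent_loads_ns(1000, 100))   # 100000
    print(streaming_ns(64_000, 100, 4.0))  # 16100.0
    # Doubling bandwidth speeds up only the streaming case...
    print(streaming_ns(64_000, 100, 8.0))  # 8100.0
    # ...while halving latency is what speeds up the dependent chain.
    print(dependent_loads_ns(1000, 50))    # 50000
```

In this toy model, code that chases pointers is bound almost entirely by latency, which is why a faster bus alone cannot make up for a slow trip to memory.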
To predict which company will have the dominant MSoC in three years, we have to answer two questions: which team is best positioned to achieve the highest raw performance, and which team is most likely to go furthest in reducing power consumption?
Raw CPU Performance
Let’s talk about raw performance before we discuss power consumption. There is no question that Intel has the best resources for building the fastest processors. ARM and Qualcomm are going to face the same growing pains that the x86 world has already worked through.
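In the next processor generation, Qualcomm is transitioning from its partially out-of-order Scorpion architecture to Krait, a fully out-of-order design. Because an out-of-order core can execute instructions as soon as their operands are ready, rather than strictly in program order, Krait should keep its execution units busy more of the time and extract more work from each clock cycle.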
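The benefit of out-of-order issue can be illustrated with a toy scheduler. This is a minimal sketch under simplifying assumptions: one instruction issues per cycle, the out-of-order version can see the whole program at once (an unlimited window), and the five-instruction program and its latencies are made up for illustration. It does not model Scorpion’s partial out-of-order capability or Krait’s actual pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    name: str
    latency: int                                   # cycles until the result is ready
    deps: list[int] = field(default_factory=list)  # indices of earlier instructions

def in_order_cycles(prog):
    """Issue strictly in program order, one per cycle; stall until operands are ready."""
    done = []       # completion cycle of each instruction
    next_slot = 0   # earliest cycle the next instruction may issue
    for ins in prog:
        ready = max((done[d] for d in ins.deps), default=0)
        issue = max(next_slot, ready)
        done.append(issue + ins.latency)
        next_slot = issue + 1
    return max(done, default=0)

def out_of_order_cycles(prog):
    """Each cycle, issue the oldest instruction whose operands are ready."""
    done = [None] * len(prog)
    waiting = set(range(len(prog)))
    cycle = 0
    while waiting:
        for i in sorted(waiting):
            if all(done[d] is not None and done[d] <= cycle for d in prog[i].deps):
                done[i] = cycle + prog[i].latency
                waiting.remove(i)
                break
        cycle += 1
    return max(done, default=0)

if __name__ == "__main__":
    # Hypothetical program: a slow load followed by work that may or may not need it.
    prog = [
        Instr("load r1", 4),
        Instr("add r2, r1", 1, [0]),  # needs the load
        Instr("mul r3", 3),           # independent
        Instr("add r4", 1),           # independent
        Instr("add r5, r3", 1, [2]),  # needs the mul
    ]
    print(in_order_cycles(prog))      # 9
    print(out_of_order_cycles(prog))  # 6
```

The in-order model sits idle while the load completes; the out-of-order model fills that wait with the independent multiply and add, finishing in 6 cycles instead of 9.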
At the same time, Qualcomm is now navigating uncharted territory, where its engineers have less expertise. ARM already has some experience with its out-of-order-capable Cortex-A9. But even with the upcoming Cortex-A15, the company will rely on dedicated reservation stations (the queues where instructions wait to be issued) for each of the execution units. Intel and AMD used dedicated reservation stations in the past, but both now employ unified reservation stations to improve performance and utilization: in a shared pool, any free entry can hold an instruction bound for any unit, so a burst of one instruction type doesn’t stall dispatch while other units’ queues sit empty. Unlike ARM, Qualcomm is attempting to jump directly to a unified reservation station design. The original Pentium Pro used a unified reservation station back in 1995, so it’s not inconceivable that a company could pull this off successfully.
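The utilization argument for unified reservation stations can be sketched with a small simulation. Everything here is a hypothetical toy: two execution units (`"alu"` and `"mem"`) that each complete one independent instruction per cycle, in-order dispatch of two instructions per cycle, and the same total of four entries, either split 2+2 (dedicated) or shared (unified). None of these parameters describe the Cortex-A15 or Krait.

```python
from collections import deque

UNITS = ("alu", "mem")  # hypothetical execution-unit types

def simulate(stream, unified, entries_per_unit=2, width=2):
    """Return the cycles needed to execute `stream`, a list of unit names in
    program order. Instructions are independent and take one cycle each."""
    if unified:
        # One shared pool with the same total number of entries.
        queues = {"shared": deque()}
        caps = {"shared": entries_per_unit * len(UNITS)}
    else:
        # A private queue in front of each execution unit.
        queues = {u: deque() for u in UNITS}
        caps = {u: entries_per_unit for u in UNITS}

    pending = deque(stream)
    executed = 0
    cycle = 0
    while executed < len(stream):
        # Execute: each unit takes the oldest waiting instruction of its type.
        for u in UNITS:
            q = queues["shared"] if unified else queues[u]
            if u in q:
                q.remove(u)  # deque.remove drops the first (oldest) match
                executed += 1
        # Dispatch: up to `width` instructions in program order; stop at the
        # first one whose reservation station is full.
        for _ in range(width):
            if not pending:
                break
            key = "shared" if unified else pending[0]
            if len(queues[key]) >= caps[key]:
                break
            queues[key].append(pending.popleft())
        cycle += 1
    return cycle

if __name__ == "__main__":
    burst = ["alu"] * 6 + ["mem"] * 4
    print("dedicated:", simulate(burst, unified=False))  # dedicated: 9
    print("unified:", simulate(burst, unified=True))     # unified: 8
```

With a burst of six ALU instructions followed by four memory instructions, the dedicated layout takes 9 cycles and the unified one 8. Once the ALU’s two private entries fill, in-order dispatch stalls and the memory instructions behind the burst cannot reach their empty queue; the shared pool absorbs more of the burst, so the memory unit starts working a cycle sooner.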
The Atom architecture doesn’t incorporate any of Intel’s advanced technology. It’s a single-core, in-order design that is more reminiscent of the original Pentium than of anything modern. But here’s the thing: it’s already faster than the ARM-based competition. As performance demands increase, Intel has decades of expertise it can bring to Atom. We’ve heard that Atom would move to an out-of-order core within five years of its 2008 launch, which puts it around 2013. So, ignoring power consumption, there is little doubt that Intel can put out faster processor designs.