Intel Silvermont Architecture: Does This Atom Change It All?
1. Can Silvermont Take Atom From Zero To Hero?

Intel’s Atom was once the Rodney Dangerfield of the processor world. It just didn't get no respect. The first Silverthorne-based Atoms were little single-core affairs that dipped into sub-1 W territory, but required a System Controller Hub that took platform power closer to 5 W. More capable versions from the Diamondville family bumped consumption higher still—all the way to the strange pairing of Atom and the 945GC chipset, which used more than 22 W on its own.

Not surprisingly, then, we haven’t published a lot of flattering coverage on Atom (I think the last time I even bothered with an Atom-based desktop was for Intel’s Atom D510 And NM10 Express: Down The Pine Trail With D510MO in 2009). Even today, five years after expressing its intentions to compete against ARM-based SoCs, the industry continues questioning Intel’s ability to deliver ample performance at power targets low enough to facilitate compelling tablets and smartphones.

Methodical progress compelled us to reconsider Intel’s efforts last year, though. Sixteen months ago, one of our writers went out on a limb and made the bold prediction that Intel would overtake Qualcomm in three years. And that was when Intel didn’t have a single phone design win. The analysis was predicated on Intel’s ability to deliver a performance-competitive CPU based on 32 nm manufacturing and in-order execution, knowledge of the company’s manufacturing roadmap, and anticipation of a forthcoming out-of-order architecture.

Meet Silvermont, the predecessor to Airmont

Well, the details of that design, already known as Silvermont, become public today. And if the Atom processors based on Silvermont can do everything Intel says they can, then we won’t even need granular measurements like the ones we collected for ARM Vs. x86: The Secret Behind Intel Atom's Efficiency to quantify the company’s efficiency story compared to its ARM-based competition.

If you’ve followed the Atom family’s evolution, then you know that Intel hasn’t modified its fundamental microarchitecture in five years. Yes, it made a shift from 45 to 32 nm manufacturing. But the cores themselves—code-named Saltwell at 32 nm, but based on the original Bonnell design—continue to employ in-order execution, clearly favoring low power use at the expense of performance.  

With Silvermont, that changes. We’re now looking at a more complex out-of-order execution engine largely enabled by a transition to 22 nm manufacturing. This isn’t a “see you again in five years” introduction, either. Intel is committing significant resources to dramatically accelerating development of its “light” architecture, promising yearly refreshes (the first of which will be Airmont at 14 nm, extending Intel's manufacturing advantage beyond the lead it enjoys at 22 nm).

In fact, Intel sorts all of the changes made to Atom into three categories: those that improve performance, those that improve power efficiency, and those that optimize for the company’s process technology.

2. The Silvermont Architecture

So again, we know that Silvermont is based on an out-of-order execution engine, which has huge ramifications for performance compared to Saltwell (remember, that design is already competitive with other SoCs available today). Intel continues to lean on macro-op execution for more efficient handling of certain x86 instruction combinations, though.

The Saltwell Execution Pipeline

The 32 nm Saltwell execution pipeline is 16 stages long, and because it’s in-order, macro ops have to go through the whole thing, even if they don’t need the cache access stages. As a result, branch mispredicts waste 13 cycles. In Silvermont, the op can bypass the access stages and execute if cache isn’t needed. Mispredicts consequently only burn 10 cycles.
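To put those penalties in perspective, the average per-instruction cost of mispredicts can be estimated from the branch mix. The branch frequency and mispredict rate below are illustrative assumptions, not Intel’s figures; only the 13- and 10-cycle penalties come from the architectures discussed here.

```python
def mispredict_cpi_overhead(branch_fraction, mispredict_rate, penalty_cycles):
    """Average extra cycles added to each instruction by branch mispredicts."""
    return branch_fraction * mispredict_rate * penalty_cycles

# Assumed workload: 20% of instructions are branches, 5% of them mispredict.
saltwell = mispredict_cpi_overhead(0.20, 0.05, 13)    # 13-cycle penalty
silvermont = mispredict_cpi_overhead(0.20, 0.05, 10)  # 10-cycle penalty
print(f"Saltwell overhead:   {saltwell:.3f} cycles/instruction")
print(f"Silvermont overhead: {silvermont:.3f} cycles/instruction")
```

Under those assumed numbers, the shorter recovery path trims the mispredict tax by roughly a quarter before any of the branch predictor improvements are even counted.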

The Silvermont Execution Pipeline

Each Silvermont core receives a number of tweaks and improvements, from larger branch predictors to the reworked execution units and bigger caches. A lot of effort went into identifying instructions that were on the slower side in Intel’s Bonnell design. Silvermont improves much of that, reducing latency and increasing throughput. Floating-point add operations are down several cycles each, packed SIMD double results are achieved in four clocks (instead of nine), and signed multiplies are sped up significantly. All told, Intel claims that its per-core IPC is about 50% higher across a wide swath of workloads. Consider the jump from Sandy Bridge to Ivy Bridge, where we saw single-digit IPC gains comparing two CPUs running at the same frequency. A 50% boost is outright massive.

Silvermont Block Diagram

But of course, Atom typically shows up in multi-core configurations. When the processor family first launched, it was a single-core chip. Not long after, Intel introduced a dual-core model, also manufactured at 45 nm. When it came time to adopt 32 nm, only dual-core versions surfaced. And as the company advances its process technology, more parallelized configurations become viable. In fact, Silvermont can scale as high as eight physical cores.

Now, the L2 cache is tightly coupled to the cores, yielding low latency and high bandwidth. Intel’s architects didn’t want to share that cache across more than two cores, though. So they went with a module-based approach. Each little building block includes a pair of cores and 1 MB of L2 cache shared between them (previous Atom processors had 512 KB of L2 per core). Individual cores, the L2 cache, and the interface between the cores and cache can all be power-gated. The cores in a module can even run at different frequencies, though they’ll operate symmetrically by default.

Silvermont Module Architecture

Modules communicate over a point-to-point in-die interface with independent read and write channels, replacing the front-side bus topology altogether. Incidentally, Intel identifies its IDI as one of the keys to the modularity of the Nehalem/Westmere generation, and it’d seem that a lot of work from the “big” core space is affecting Atom here today.

Intel took a look at its core architecture, optimized for single-threaded performance, along with its modular approach to scalability, and chose to drop Hyper-Threading. Including the technology would have increased power use in single-threaded workloads. So the company bypassed SMT altogether, favoring more cores to boost performance in parallelized tasks.

At the same time, Intel’s engineers incremented its instruction set architecture to the 2010 Westmere class—up four years from the original Atom design’s Merom-compatible ISA. SSE4.1, SSE4.2, and POPCNT (which operates on integer registers) are part of this ISA package update, augmenting the Atom’s performance picture. AES-NI acceleration and Secure Key (including the RDRAND instruction and Digital Random Number Generator) also make it in.
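POPCNT is a good illustration of what these ISA additions buy: it counts the set bits in an integer register in a single instruction. The sketch below mirrors that semantics in plain software; on Silvermont-class hardware, compilers reach the instruction through intrinsics such as GCC’s __builtin_popcount.

```python
def popcount(x: int) -> int:
    """Software equivalent of the POPCNT instruction: count set bits."""
    count = 0
    while x:
        x &= x - 1   # clear the lowest set bit (Kernighan's trick)
        count += 1
    return count

print(popcount(0b10110100))  # four bits set
```

Population counts show up constantly in compression, cryptography, and sparse data structures, which is why having it as a one-cycle-class instruction rather than a loop like this matters.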

Virtualization acceleration evolves from VT-x support to the technology’s second generation, introduced with Nehalem, supporting Extended Page Tables. Virtual Processor IDs in the TLBs and Unrestricted Guest (allowing KVM guests to run real and unpaged mode code natively when EPT is turned on) are part of that same evolution.
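On Linux, these capabilities surface as CPU feature flags in /proc/cpuinfo. A minimal sketch of checking for them (the sample flags line below is hypothetical, not captured from a real Silvermont part):

```python
def has_features(cpuinfo_flags: str, wanted: set) -> bool:
    """Check whether every wanted feature flag appears in a cpuinfo flags line."""
    present = set(cpuinfo_flags.split())
    return wanted <= present

# Hypothetical flags line resembling /proc/cpuinfo output.
sample = "fpu vme sse4_1 sse4_2 popcnt aes rdrand vmx ept vpid"
print(has_features(sample, {"vmx", "ept", "vpid"}))  # True
```

In practice you would read the real flags line from /proc/cpuinfo rather than a literal string; hypervisors like KVM do an equivalent capability check before enabling EPT-backed guests.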

3. Power Management: The Key To Any Successful Mobile Architecture

To maximize the clock rate of its 32 nm Saltwell-based core, Intel employed a feature that opportunistically exposed additional P-states based on available thermal headroom. Silvermont’s implementation of this is apparently more similar to Turbo Boost in that the burst frequency is managed in hardware according to thermal, electrical, and power measurements. More important than the extra speed you get from this burst mode, though, is how it handles the ride back down.

Presently, there are mobile devices that will run at full speed until they’re thermally overwhelmed, at which point they throttle back dramatically to recover. It’s jarring enough to affect the user experience. Intel is saying that Silvermont will handle those situations more elegantly, stepping back clock rate naturally before a thermal event is triggered.
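The contrast Intel is drawing can be sketched as two governor policies. Everything below (temperatures, guard band, step behavior) is an illustrative assumption of ours, not Silvermont’s actual algorithm, which is managed in hardware.

```python
def hard_throttle(freq_mhz, temp_c, limit_c):
    """Run flat-out until the limit, then drop hard to recover."""
    return freq_mhz if temp_c < limit_c else freq_mhz // 2

def gradual_governor(freq_mhz, temp_c, limit_c, guard_band_c=10):
    """Start shaving frequency inside a guard band so the drop is never jarring."""
    if temp_c < limit_c - guard_band_c:
        return freq_mhz                  # plenty of headroom: full speed
    if temp_c >= limit_c:
        return freq_mhz // 2             # at the limit: same floor as above
    headroom = (limit_c - temp_c) / guard_band_c
    return int(freq_mhz * (0.5 + 0.5 * headroom))  # scale linearly in between

for t in (70, 85, 92, 95):
    print(t, hard_throttle(2000, t, 95), gradual_governor(2000, t, 95))
```

The hard-throttle policy holds 2000 MHz right up to 95 °C and then falls off a cliff; the gradual policy eases down through the guard band, which is the “more elegant” behavior Intel is describing.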

The SoC’s power budget can be shared between the cores and other IP on the die, including third-party IP. Graphics is probably the most notable. The illustration below describes this behavior pretty clearly: cores can share power, cores can borrow budget from the graphics (which spins down), and cores can burst up dynamically, even with graphics active, if the thermal situation is favorable enough. Intel says the concepts come from Turbo Boost, but the algorithms and implementation mechanisms are different.
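The borrowing behavior reduces to simple bookkeeping over a shared budget. A toy sketch, with a 4 W SoC budget that is purely our assumption:

```python
def share_budget(total_w: float, gfx_w: float) -> float:
    """Cores may burst into whatever the graphics block leaves unused."""
    return max(total_w - gfx_w, 0.0)

# Illustrative 4 W SoC budget; numbers are assumptions, not Intel's.
print(share_budget(4.0, 1.5))  # graphics active: cores get 2.5 W
print(share_budget(4.0, 0.0))  # graphics gated off: cores get the full 4 W
```

The real allocator also folds in thermal and electrical measurements, per the bursting behavior described above, but the core idea is that the envelope belongs to the SoC, not to any one block.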

Coming back the other direction, Intel enables a lot of familiar core power state functionality, with the addition that cores can drop into C6 independently, whereas they couldn’t before. And because Silvermont is module-based, Intel introduced sub-states allowing software policy-based control of the L2 cache’s contents, too. Building on the S0ix connected standby system states introduced back in 2010 with the Moorestown platform, Silvermont can now retain the state of the core through SoC standby mode transitions. This means you can resume from those modes faster, though Intel wasn’t clear on how much faster.

4. Putting It All Together

Based on the per-core and modular design changes that Intel made to Silvermont, next-generation Atom processors could very well change how we think about the family’s performance attributes. And that’s great. But up against the ARM-based SoC competition, speed isn’t Atom’s biggest issue. Power is. Although we’ve seen the 32 nm Z2760 hold off Qualcomm and Nvidia, truly overcoming an incumbent ARM-based architecture requires that Intel lean hard on its process experience to optimize for efficiency, too.

That’s exactly what it says it’s doing with 22 nm. In fact, because Atom is a SoC, Intel can leverage multiple versions of the 22 nm process to maximize performance or density. Add up Intel’s process advantage, the architecture it enables, and the optimizations baked in to curb power consumption, and you get what the company calls a wide dynamic range of operation. How does this range manifest itself? Check out the slide below:

Don’t read the lines as final relative performance—Intel and the unnamed-vendor-with-asymmetric-cores probably won’t wind up in the same places on a chart with actual data labels. Intel’s point is clear, though. The Silvermont architecture is expected to enable very low power consumption and very high performance using the same symmetric approach employed by Saltwell. It’s that much better, though, due to the interplay between 22 nm manufacturing, per-core IPC improvements, scalability across multiple cores, and tweaks to bring minimum core power down.

Meanwhile, asymmetric approaches incur performance penalties for switching from power- to performance-optimized logic, and then lose efficiency to the higher power requirements of those faster cores. If there’s one key visual that reflects the potential impact of Silvermont, this is it. Achieving lower power at higher minimum performance and better performance (also at lower core power) than the competition is what will make Silvermont shine, should Intel’s projections come to pass.

So Does Silvermont Change The Game?

During its deep-dive briefing, Intel showed off a number of slides with projections of performance and power. Some of them compared Saltwell to Silvermont, reflecting big jumps and cuts in each of those categories, respectively. Others showed dual-core Silvermont outperforming dual- and quad-core solutions from the competition while staying well under the power target for smartphones. A third set illustrated Silvermont’s performance advantage at fixed power, and power savings at peak performance. In every chart, the same message was hammered home. You can compare the field at a set power figure and Silvermont is faster—and not by a small amount. Or, go all-out on performance and the Silvermont architecture uses less power—also not by a small amount.

Of course, products based on Silvermont aren’t yet available, to say nothing of the tablets or smartphones built using those devices. So, Intel’s upcoming solution is competing against hardware already on the market in these slides. Nevertheless, today’s architecture announcement is the next logical step toward the predictions we made in Mobile: Intel Will Overtake Qualcomm In Three Years, keeping our outlook on-track.

We eagerly await more detail on actual SoCs based on Silvermont and Intel’s choice of graphics (rumored to be its own Ivy Bridge-based technology) to test. Intel proved with its Atom Z2760 and Windows 8 that tablets can be every bit as flexible as PCs. That combination ultimately lacked performance and build quality. Silvermont will almost assuredly address the former. Now Intel needs its partners to step up and deliver handheld devices that don’t suck. Then, it’ll truly bury the ecosystem-limited ARM-based competition.