The steps that Intel takes to update its processors are well documented, and old hat to anyone who follows the CPU industry. The company calls it a "tick-tock" strategy: the tick represents a node shrink that squeezes more transistors into a smaller die, and the tock that follows indicates a significant architecture update. The pattern repeats at a cadence of roughly a year and a half. Last year's 22nm Haswell processor was a tock, so we're fast approaching the next tick, essentially a Haswell die shrink to 14nm known as Broadwell.
If you're already familiar with this, then you already know what we expect from Intel's ticks: smaller processors, lower power usage, higher performance per watt, and similar overall performance compared to the previous-generation product. That expectation shouldn't belittle the accomplishment as much as highlight the company's consistency over the last few product generations. What may surprise you is that this progression has resulted in a Broadwell-Y processor with a TDP low enough to enable fanless enclosures less than 9 millimeters thick. That's an arena that Intel's Core brand has never ventured into before. But more on that later. Let's start our analysis with the star of the show: Intel's new 14nm process node.
The 14nm Node: 2nd Generation FinFET
It might seem reasonable to assume that the numerical designation of a process node (22nm or 14nm, for example) refers to a specific dimension. While this was the case in early generations, where the measurement corresponded to the smallest part of the transistor (usually the gate), this relationship no longer exists in modern nomenclature.
Today's nodes are named after a theoretical representation designed to indicate their average physical scale relative to previous-generation nodes. For example, if we compare Intel's 22nm and 14nm nodes, we find that transistor fin pitch (the space between fins) has been reduced from 60nm to 42nm, transistor gate pitch (the space between the edges of adjacent gates) has gone from 90nm to 70nm, and the interconnect pitch (the minimum space between interconnects) has changed from 80nm to 52nm. An SRAM memory cell that takes up 0.108 square microns of area on the 22nm node scales down to about 0.059 square microns on the 14nm node.
Those dimensions range from a scaling factor of 0.78x (transistor gate pitch) to 0.54x (SRAM memory cell area), with fin pitch at 0.70x and interconnect pitch at 0.65x. If you take the number 22 and multiply it by 0.64 you end up with about 14, so it's probably fair to say that Intel assigned an appropriate numerical designation to its 14nm process node. In fact, the Broadwell-Y die is about 63% the size of the Haswell-Y die.
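For anyone who wants to check the math, here is a minimal Python sketch that derives the scaling factors from the pitches quoted above. The dictionary and function names are our own, and the SRAM entry uses the rounded cell-area figures, so its ratio lands near 0.55x rather than the 0.54x Intel quotes.

```python
# Feature sizes quoted above for Intel's 22nm and 14nm nodes, as (old, new).
# Pitches are in nanometers. The SRAM cell area is in consistent (rounded)
# units, so only its ratio is meaningful here.
FEATURES = {
    "fin pitch": (60, 42),
    "gate pitch": (90, 70),
    "interconnect pitch": (80, 52),
    "SRAM cell area": (108, 59),
}


def scaling_factors(features):
    """Return the new/old ratio for each feature, rounded to two decimals."""
    return {name: round(new / old, 2) for name, (old, new) in features.items()}


factors = scaling_factors(FEATURES)
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x")

# The rule of thumb from the text: 22 scaled by 0.64 lands near 14.
node_estimate = round(22 * 0.64, 2)
print(f"22 x 0.64 = {node_estimate}")
```

The least aggressive shrink in this list is the gate pitch and the most aggressive is the SRAM cell area, which is why a single "14" can only ever be an approximate label.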
Intel's 22nm node is the company's first-generation FinFET (also known as Tri-Gate) transistor design. The new 14nm process represents Intel's second-generation FinFET, with a tighter fin pitch for improved density. Combining this with taller and thinner fins results in higher drive current and better transistor performance. The number of fins per transistor has been reduced from three to two, which also improves density while lowering capacitance.
Intel's competitors are currently transitioning from planar transistors to FinFET designs, but the company claims that it has a competitive edge when it comes to logic area scaling. Based on published information from TSMC and the IBM alliance, and using the scaling formula (gate pitch x metal pitch), Intel claims that TSMC's upcoming 16nm node yields no logic area scaling improvement over 20nm and that the competition will trail significantly for the next two generations. Of course this formula is only one metric, but it does make us curious to see how TSMC's 16nm node will perform once it is implemented next year. We also have to wonder whether the laws of physics will become an insurmountable barrier under 10nm, which may give the competition some time to catch up to Intel. Having said that, Moore's Law appears to continue unabated for the moment.
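To see what that formula produces for Intel's own nodes, here is a short sketch using the pitches quoted earlier. It assumes the interconnect pitch stands in for the "metal pitch" term, which is our reading rather than something Intel spelled out, and the function name is ours. We have no competitor pitches to plug in, so only Intel's figures appear.

```python
def logic_area_metric(gate_pitch_nm, metal_pitch_nm):
    """Gate pitch x metal pitch, the logic area scaling metric cited above (nm^2)."""
    return gate_pitch_nm * metal_pitch_nm


# Assumption: the interconnect pitch quoted earlier is used as the metal pitch.
intel_22nm = logic_area_metric(90, 80)
intel_14nm = logic_area_metric(70, 52)

ratio = intel_14nm / intel_22nm
print(f"22nm metric: {intel_22nm} nm^2")
print(f"14nm metric: {intel_14nm} nm^2")
print(f"14nm relative to 22nm: {ratio:.2f}x")
```

By this one metric, logic area roughly halves from 22nm to 14nm, which is the kind of generational step Intel argues its competitors will not match at 16nm.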
Let's quickly touch on yields. No semiconductor company is completely transparent when it comes to this topic, but Intel did share a few tidbits of information. In general terms, Intel told us that its 22nm process produces the highest yield of the past few node generations, and that the 14nm Broadwell SoC yield is in a healthy range and trending in an optimistic direction. The first products are qualified and currently in volume production, with expected availability at the end of 2014.
The point of all this is that leakage, power usage, and the cost per transistor are reduced, while both performance and performance per watt are increased compared to the previous-generation node. As we said, none of this is a surprise, but it's always a welcome change, especially if it enables new usage models. That comes into play when we consider the actual products that Intel will ship on the 14nm node. One of those products is Broadwell-Y, the next-generation mobile chip that Intel shared the most details on. We'll talk more about that on the next page, but let's first consider the general architectural improvements that will be leveraged across all Broadwell-based processors.
The Broadwell Converged Core
Intel claims that Broadwell boasts at least a 5% IPC increase over Haswell. That's a minor difference, but not much of a surprise considering that this is a process improvement tick and not a new architecture tock.
As such, the improvements are mostly the result of beefing up existing resources, not re-engineering them. The 14nm node density improvement was successful enough to allow Intel more room to add transistors, so it did: there is a larger out-of-order scheduler (Intel didn't specify the size difference) and faster store-to-load forwarding. The L2 Translation Lookaside Buffer (TLB) has been increased from 1k to 1.5k entries, and a new 16-entry L2 TLB dedicated to 1GB pages was added. A second TLB page miss handler was added so that page walks can now be performed in parallel.
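One way to put the TLB changes in perspective is to estimate "reach", meaning how much memory the buffer can map without a page walk. The sketch below assumes that 1k and 1.5k mean 1,024 and 1,536 entries, that every main L2 TLB entry maps a 4KB page, and that the 1GB figure refers to 16 entries for 1GB pages. Those are our simplifications, not Intel's statements.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach_bytes(entries, page_size_bytes):
    """Memory covered when every TLB entry maps one page of the given size."""
    return entries * page_size_bytes


# Assumption: all entries hold 4KB pages, the smallest x86-64 page size.
haswell_reach = tlb_reach_bytes(1024, 4 * KIB)
broadwell_reach = tlb_reach_bytes(1536, 4 * KIB)

# Assumption: the new structure holds 16 entries, each mapping a 1GB page.
large_page_reach = tlb_reach_bytes(16, GIB)

print(f"Haswell L2 TLB reach (4KB pages):   {haswell_reach // MIB} MiB")
print(f"Broadwell L2 TLB reach (4KB pages): {broadwell_reach // MIB} MiB")
print(f"1GB-page TLB reach:                 {large_page_reach // GIB} GiB")
```

Under those assumptions the main L2 TLB covers half again as much memory as before, while the 1GB-page entries matter mostly to workloads with very large memory footprints.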
The floating point multiplier is much more efficient, now able to accomplish in three clock cycles what takes Haswell five cycles to complete. Broadwell also has a radix-1,024 divider and is purportedly faster at performing vector gather operations. Intel also asserts that branch predictions and returns are improved.
Aside from these general areas, some specific functionality was targeted. Cryptography acceleration instructions are improved, and virtualization round-trips are faster. Of course, power usage reduction is high on Intel's priority list, and the company claims that it only spent transistors on the features that add performance with a minimal power cost. On the next page, we'll learn more about some of the power gating and efficiency optimizations that Intel implemented in Broadwell.
Broadwell-Y: Introducing the Intel Core M Processor
The new 14nm node will be leveraged across multiple product segments, from data centers to tablets, via three separate Broadwell dies. For now, Intel limited the information it shared with us to the mobile-oriented Broadwell-Y chip. We are told to expect more details on the other Broadwell products over the next quarter, with more to come during next month's Intel Developer Forum (IDF), but for now we'll have to remain content with details about the low-power mobile spin of this chip. Of course, improvements to the Broadwell core architecture will be reflected in the other dies, too. With that in mind, let's take a closer look at Broadwell-Y, marketed under the Intel Core M moniker.
The new Core M brand will cover all of the mobile space, and Intel told us that other brands like the Celeron and Pentium M will not be applied to the Broadwell-Y SoC. While the company did not disclose any actual model numbers or clock speeds, it stressed the new chip's capability to drive a hypothetical 7 to 10mm-thick fanless form factor with a 10.1" display, enabled by a 3 to 5W version of this processor. We were allowed to handle a working prototype in the form of a sexy 7mm-thin tablet, but were not allowed to run any software or examine specifications via the control panel. Until we have a working model to test, we'll have to take Intel's word that Broadwell-Y provides a "greater than 2x reduction in TDP with better performance than Haswell-Y".
We do have some specifics when it comes to the die and package sizes, though. The Broadwell-Y die measures 82mm², about 63% the size of Haswell-Y's 130mm² die. As for the board package, Broadwell-Y has a 50% smaller surface area and a 30% thinner package compared to Haswell-Y. Part of this reduction is made possible by the relocation of the 3DL modules to a separate, tiny PCB that is attached to the bottom of the Broadwell-Y package. Of course, motherboards will need to have an appropriately sized hole cut to accommodate it.
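The die-size figures are easy to misread, so here is a quick sketch of the arithmetic using the two areas quoted above. The variable names are ours.

```python
# Die areas quoted above, in square millimeters.
haswell_y_mm2 = 130
broadwell_y_mm2 = 82

# Broadwell-Y's area as a fraction of Haswell-Y's, and the corresponding shrink.
relative_area = broadwell_y_mm2 / haswell_y_mm2
area_reduction = 1 - relative_area

print(f"Broadwell-Y is {relative_area:.0%} the size of Haswell-Y")
print(f"That is a {area_reduction:.0%} reduction in die area")
```

In other words, 63% describes the size of the new die relative to the old one, and the reduction in area is closer to 37%.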
Because area scaling was better than expected on the 14nm node, Intel added 20% more transistors for increased features and performance. For instance, Haswell-Y's integrated graphics has a maximum of 20 execution units (EUs), while Broadwell-Y can use up to 24. That's 20% more compute resources, and Intel also claims a 50% increase in graphics sampler throughput. The company also mentions geometry, Z, and pixel fill performance improvements due to micro-architecture changes, though it hasn't yet been specific about what these are. 4K display compatibility was also trumpeted, and two such displays are theoretically supported, but practical power draw limits for a mobile device probably make this an unlikely scenario.
Intel's Core M: Focused on Low Power
Intel claims that the Broadwell-Y 14nm design and process optimizations account for the 2x lower power than Haswell-Y, which enables fanless functionality. The key SoC statistics include 25% lower power thanks to capacitance improvements, 20% less power thanks to lower minimum voltages combined with design optimizations, up to 15% improved transistor performance at low voltages, about a 10% reduction in power thanks to lower transistor leakage, and a smaller, denser processor. Of course, Intel hasn't disclosed the specific TDPs of the products that it has based these claims on, so we'll have to wait a bit longer for more information. We do know that the chips Intel is talking about can burst to 10 to 15W for quick response, then settle to a sustained three to four watts a few milliseconds later under load.
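Intel didn't say how those individual percentages combine into the overall 2x figure. As a rough sanity check only, the sketch below assumes the three power reductions are independent and multiply together. That is our assumption, not Intel's methodology, and it ignores the transistor performance gain entirely.

```python
# Per-factor power reductions quoted above, expressed as fractions.
REDUCTIONS = {
    "capacitance": 0.25,
    "minimum voltage and design": 0.20,
    "leakage": 0.10,
}


def combined_power_fraction(reductions):
    """Fraction of power left if each reduction applies independently.

    Assumption: the reductions compound multiplicatively.
    """
    remaining = 1.0
    for reduction in reductions.values():
        remaining *= 1.0 - reduction
    return remaining


remaining = combined_power_fraction(REDUCTIONS)
improvement = 1.0 / remaining

print(f"Remaining power: {remaining:.2f}x of the baseline")
print(f"Equivalent reduction factor: {improvement:.2f}x")
```

Under that simple model the three factors alone land a little short of 2x, which suggests the remaining gain comes from the other optimizations Intel lists, though we can't confirm that without real TDP numbers.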
Broadwell-Y features the second-generation implementation of Intel's Fully Integrated Voltage Regulator (FIVR), which can speed the transition between idle and load clock states. FIVR now features non-linear droop control, and a new dual FIVR LVR mode has been added. It turns out that FIVR is not particularly efficient at very low voltages, so it can now be bypassed when necessary to save power.
The SoC has an extensive list of optimizations that target active power reduction: design and process optimizations to reduce minimum operating voltage and dynamic capacitance (Cdyn), a re-architecture of DDR/IO/PLL/Graphics, optimizations for Cdyn in IA/Graphics/PH, and lower operating frequency ranges for IA/GT and Cache. There are other enhancements, too, such as dynamic display voltage resolution. Graphics can be managed by Duty Cycling Control (DCC), which reduces power consumption by switching the GPU on and off as necessary. The latency needed to switch the GPU on and off is negligible, and its effective operating frequency can be reduced to as low as 12.5% of the rated frequency.
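The 12.5% figure makes sense if you think of duty cycling as running the GPU for a fraction of fixed time slots, since 12.5% is one slot in eight. The sketch below models that idea. The slot-based model, the function name, and the 800MHz rated clock are all hypothetical, because Intel hasn't disclosed how DCC is implemented or what clocks Broadwell-Y graphics will run at.

```python
def effective_frequency_mhz(rated_mhz, on_slots, total_slots):
    """Average frequency when the GPU runs at its rated clock for only
    on_slots out of every total_slots time slots (a simplified model)."""
    if total_slots <= 0 or not 0 <= on_slots <= total_slots:
        raise ValueError("need 0 <= on_slots <= total_slots and total_slots > 0")
    return rated_mhz * on_slots / total_slots


RATED_MHZ = 800  # hypothetical rated GPU clock, for illustration only

# One active slot out of eight corresponds to the 12.5% floor quoted above.
lowest = effective_frequency_mhz(RATED_MHZ, on_slots=1, total_slots=8)
half = effective_frequency_mhz(RATED_MHZ, on_slots=4, total_slots=8)

print(f"1/8 duty cycle: {lowest:.0f} MHz effective ({lowest / RATED_MHZ:.1%})")
print(f"4/8 duty cycle: {half:.0f} MHz effective ({half / RATED_MHZ:.1%})")
```

The appeal of this approach is that the GPU can spend the off slots power gated rather than idling at a low clock, provided the switching latency really is as small as Intel says.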
Clock rates are, of course, tied to both power usage and thermal output. Three boost states are leveraged to deliver the highest possible clocks while maintaining system stability. The PL3 boost state allows the highest power draw that won't damage the system's battery, and it is used only for very short spikes, the kind of time spans that are measured in milliseconds. PL2 is the standard burst limit, and PL1 represents the long-term system limit of sustainable power delivery. If necessary, duty cycle throttling can turn blocks of the processor on or off to minimize power usage and heat generation.
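To illustrate how three tiers like these might interact, here is a deliberately simplified sketch that picks a power cap based on how long a burst has been running. Every number in it is hypothetical, as are the class and function names. Intel hasn't disclosed Broadwell-Y's actual limits or time windows, and a real implementation would track measured power over time rather than a simple timer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerLimits:
    pl1_watts: float      # long-term sustainable limit
    pl2_watts: float      # standard burst limit
    pl3_watts: float      # short spike limit that protects the battery
    pl3_window_ms: float  # hypothetical: how long PL3 may be held
    pl2_window_ms: float  # hypothetical: how long PL2 may be held


def power_cap_watts(elapsed_ms, limits):
    """Return the power cap in effect elapsed_ms after a burst begins."""
    if elapsed_ms < limits.pl3_window_ms:
        return limits.pl3_watts
    if elapsed_ms < limits.pl2_window_ms:
        return limits.pl2_watts
    return limits.pl1_watts


# Hypothetical values chosen only to make the tiers visible.
LIMITS = PowerLimits(pl1_watts=4.0, pl2_watts=10.0, pl3_watts=15.0,
                     pl3_window_ms=10.0, pl2_window_ms=10_000.0)

for t_ms in (1, 500, 60_000):
    print(f"{t_ms:>6} ms into the burst: cap = {power_cap_watts(t_ms, LIMITS)} W")
```

The takeaway is the shape of the curve: a brief spike, a longer burst, then a sustained level the chassis can cool without a fan.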
Power and thermal management is handled at the system level by monitoring multiple components and adjusting them as necessary. This is controlled via Intel's Dynamic Platform and Thermal Framework (DPTF) software driver.
Intel's Platform Controller Hub (PCH) has also been re-engineered for Broadwell with an eye toward power efficiency. Idle power has been reduced by 25% compared to the 2013 part, and active power has been dropped by 20% versus Haswell's PCH-LP. There are new power domains, along with firmware, hardware, and software updates aimed at reducing power, including fine-grained monitoring and reporting.
In addition to power enhancements, the PCH has been improved with an Audio DSP upgrade that includes more SRAM and higher MIPS. Advanced post-processing is available, with new wake-on-voice functionality. There are also fresh management and security features. Note that the PCH is manufactured using the 22nm node, so it hasn't shrunk.
Conclusion: Promising Statistics For Intel's Core M
We probably won't be able to test Intel's Broadwell CPUs for a few months at least, but based on what we've seen, the company has lived up to the lofty expectations it has set. The result is what we described in our introduction: smaller processors, lower power usage, higher performance per watt, and similar overall performance compared to the previous generation at comparable clocks. All of those attributes are very desirable from a consumer standpoint.
But a lot of questions remain this early in the game. What kind of clock rates can we expect from Intel's 14nm node? How will the GPU scale compared to Haswell? What kind of Core M pricing can we expect, and will it be low enough to inject some life into low-cost Intel tablets? All of these questions will be answered over time, and we hope that at least some of them will be answered at IDF next month. Until then, we can certainly see a lot of potential in the upcoming Broadwell line.







