Intel finally provides solid information on Haswell's successor, the next-generation Broadwell core. We also learn some details about the new 14nm process node. It's a must-read for CPU enthusiasts who are interested in the future of Intel's Core!
The steps that Intel takes to update its processors are well documented, and old hat to anyone who follows the CPU industry. The company calls it the "tick-tock" strategy: the tick represents a node shrink that can squeeze more transistors into a smaller die, and the tock that follows indicates a significant architecture update. This cycle repeats at a cadence of roughly a year and a half. Last year's 22nm Haswell processor was a tock, so we're fast approaching the next tick. Known as Broadwell, it is essentially a Haswell die shrink to 14nm.
If you're already familiar with this, then you already know what we expect from Intel's ticks: smaller processors, lower power usage, higher performance per watt, and similar overall performance compared to the previous generation product. That expectation shouldn't belittle the accomplishment as much as highlight the company's consistency over the last few product generations. What may surprise you is that this progression has resulted in a Broadwell-Y processor with a TDP low enough to enable fanless enclosures less than 9 millimeters thick. That's an arena that Intel's Core brand has never ventured into before. But more on that later. Let's start our analysis with the star of the show: Intel's new 14nm process node.
The 14nm Node: 2nd Generation FinFET
It might seem reasonable to assume that the numerical designation of a process node (e.g. 22nm or 14nm) refers to a specific dimension. While this was the case in early generations where the measurement corresponded to the smallest part of the transistor (usually the gate), this relationship no longer exists in modern nomenclature.
Today's nodes are named after a theoretical representation designed to indicate their average physical scale relative to previous generation nodes. For example, if we compare Intel's 22nm to 14nm nodes, we find that transistor fin pitch (the distance between the centers of adjacent fins) has been reduced from 60nm to 42nm, transistor gate pitch (the distance from one gate to the next) has gone from 90nm to 70nm, and the interconnect pitch (the minimum distance between adjacent interconnect lines) has changed from 80nm to 52nm. An SRAM memory cell that takes up 0.108 square micrometers of area on the 22nm node scales down to 0.059 µm² on the 14nm node.
Those dimensions range from a scaling factor of 0.78x (the transistor gate pitch) to 0.54x (SRAM memory cell area scaling). If you take the number 22 and multiply it by 0.64x you end up with about 14, so it's probably fair to say that Intel assigned an appropriate numerical designation to its 14nm process node. In fact, the Broadwell-Y die has about 63% of the area of the Haswell-Y die.
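The scaling arithmetic is easy to check by hand. The short Python sketch below divides each 14nm figure by its 22nm counterpart, using only the numbers quoted above. The dictionary layout and the function name are our own illustrative choices, and the SRAM entry is given in relative units because only the ratio matters.

```python
# Feature sizes quoted in the article for Intel's 22nm and 14nm nodes,
# stored as (22nm value, 14nm value). Pitches are in nanometers; the SRAM
# cell area is in relative units, since only the ratio is used here.
FEATURES = {
    "fin pitch": (60, 42),
    "gate pitch": (90, 70),
    "interconnect pitch": (80, 52),
    "SRAM cell area": (108, 59),
}


def scaling_factors(features):
    """Return the new/old ratio for every feature."""
    return {name: new / old for name, (old, new) in features.items()}


factors = scaling_factors(FEATURES)
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x")

# The rule-of-thumb check from the text: 22 multiplied by 0.64.
node_estimate = 22 * 0.64
print(f"22 x 0.64 = {node_estimate:.2f}")
```

Note that the three pitches are linear dimensions while the SRAM figure is an area, so the SRAM ratio is naturally the smallest of the four.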
Intel's 22nm node is the company's first-generation FinFET (also known as Tri-Gate) transistor design. The new 14nm process represents Intel's second-generation FinFET, with a tighter fin pitch for improved density. Combining this with taller and thinner fins results in higher drive current and better transistor performance. The number of fins per transistor has been reduced from three to two, which also improves density while lowering capacitance.
Intel's competitors are currently transitioning from planar to FinFET transistor designs, but the company claims that it has a competitive edge when it comes to logic area scaling. Based on published information from TSMC and the IBM alliance, and using the scaling formula (gate pitch x metal pitch), Intel claims that TSMC's upcoming 16nm node yields no logic area scaling improvement over 20nm and that the competition will trail significantly for the next two generations. Of course, this formula is only one metric, but it does make us curious to see how TSMC's 16nm node will perform once it is implemented next year. We also have to wonder if the laws of physics won't become an insurmountable barrier under 10nm, which may give the competition some time to catch up to Intel. Having said that, Moore's Law appears to continue unabated for the moment.
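To see what the (gate pitch x metal pitch) formula produces, here is a minimal Python sketch applied to Intel's own two nodes. It assumes that the interconnect pitch quoted earlier (80nm and 52nm) stands in for the metal pitch in the formula. The function name is ours, and no competitor figures are included because none appear in this article.

```python
def logic_area_metric(gate_pitch_nm, metal_pitch_nm):
    """Logic area metric as described in the text: gate pitch x metal pitch."""
    return gate_pitch_nm * metal_pitch_nm


# Intel figures quoted earlier in the article. Using the interconnect pitch
# as the metal pitch is an assumption made for this illustration.
intel_22nm = logic_area_metric(90, 80)
intel_14nm = logic_area_metric(70, 52)

logic_scaling = intel_14nm / intel_22nm
print(f"22nm metric: {intel_22nm} nm^2")
print(f"14nm metric: {intel_14nm} nm^2")
print(f"Logic area scaling: {logic_scaling:.2f}x")
```

By this one metric, the 14nm node needs roughly half the logic area of the 22nm node, which is a more aggressive figure than any single pitch reduction would suggest.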
Let's quickly touch on yields. No semiconductor company is completely transparent when it comes to this topic, but Intel did share a few tidbits of information. In general terms, Intel told us that its 22nm process produces the highest yield of the past few node generations, and that the 14nm Broadwell SoC yield is in a healthy range and trending in an optimistic direction. The first products are qualified and currently in volume production, with expected availability at the end of 2014.
The point of all this is that leakage, power usage, and the cost per transistor are reduced, while both performance and performance per watt are increased compared to the previous-generation node. As we said, none of this is a surprise but it's always a welcome change, especially if it enables new usage models. That comes into play when we consider the actual products that Intel will ship on the 14nm node. One of those products is Broadwell-Y, the next-generation mobile chip that Intel shared the most details on. We'll talk more about that on the next page, but let's consider the general architectural improvements that will be leveraged across all Broadwell-based processors first.
The Broadwell Converged Core
Intel claims that Broadwell boasts at least a 5% IPC increase over Haswell. That's a minor difference, but not much of a surprise considering that this is a process improvement tick and not a new architecture tock.
As such, the improvements are mostly the result of beefing up existing resources, not re-engineering them. The 14nm node density improvement was successful enough to allow Intel more room to add transistors, so it did: a larger out-of-order scheduler (Intel didn't specify the size difference) results in faster store-to-load forwarding. The L2 Translation Lookaside Buffer (TLB) has been increased from 1k to 1.5k entries, and a new 16-entry L2 TLB for 1GB pages was added. A second TLB page miss handler was added so that page walks can now be performed in parallel.
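One way to put the TLB numbers in perspective is to estimate "reach," the amount of memory the buffer can map before a page walk is needed. The Python sketch below is a rough illustration under our own assumptions: that 1k and 1.5k mean 1,024 and 1,536 entries, that each of those entries maps a standard 4KB page, and that each of the 16 large-page entries maps a full 1GB page. Intel did not provide these reach figures.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries, page_size_bytes):
    """Memory covered by a TLB: number of entries times the page size."""
    return entries * page_size_bytes


# Assumption: every L2 TLB entry maps a 4KB page.
haswell_reach = tlb_reach(1024, 4 * KIB)
broadwell_reach = tlb_reach(1536, 4 * KIB)

# Assumption: each of the 16 large-page entries maps one 1GB page.
large_page_reach = tlb_reach(16, GIB)

print(f"Haswell L2 TLB reach (4KB pages):   {haswell_reach // MIB} MiB")
print(f"Broadwell L2 TLB reach (4KB pages): {broadwell_reach // MIB} MiB")
print(f"Broadwell 1GB-page reach:           {large_page_reach // GIB} GiB")
```

Under these assumptions the extra entries raise reach by 50% for small pages, while the large-page entries mainly help workloads with very large memory footprints.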
The floating point multiplier is much more efficient, now able to accomplish in three clock cycles what takes Haswell five cycles to complete. Broadwell also has a radix-1,024 divider and is purportedly faster at performing vector gather operations. Intel also asserts that branch predictions and returns are improved.
Aside from these general areas, some specific functionality was targeted. Cryptography acceleration instructions are improved, and virtualization round-trips are faster. Of course, power usage reduction is high on Intel's priority list, and the company claims that it only spent transistors on the features that add performance with a minimal power cost. On the next page, we'll learn more about some of the power gating and efficiency optimizations that Intel implemented in Broadwell.




What Gaurav Rai said:
"Meanwhile Amd innovates with 220W processsor XD"
Was humor. You know, like Ha-ha and stuff?
At 4.6GHz my 2700K is more than a capable CPU.
Bring on Skylake, then we'll talk.
I would expect this to also translate into even more unpredictable and voltage/temperature-sensitive overclock outcomes.
thus beat amd. but, to me, amd chips like the fx-series and the phenoms before have a simplicity to them that i admire. although i can't specifically say how or what it is.
Can you really call it innovation when AMD needs a 200W chip to compete with Intel's sub-100W chips? Unless you meant innovation in the high-tech space-heater market.
Intel went down the crank-clocks-power-be-damned path with Prescott about a decade ago and that did not work too well. AMD just tried the same thing and "shockingly," that did not work particularly well for them either.
Even if you compare Sandy Bridge (32nm) Intel CPUs with AMD's FX83xx (28nm) which theoretically gives the advantage to AMD, Intel's older chips still win most benchmarks. Intel being one process node ahead has very little to do with their performance lead; their architecture itself is just that far ahead.
Hardly. Performance of the current Intel 4-core isn't that much better than the
equivalent model from 18 months ago. I know they've improved power consumption,
etc., but without significant speedups, most potential users really won't care.
Mike Stewart, you should be able to run your 2700K at 5.0. Every 2700K I've
obtained runs at 5 no problem, with good temps, etc.
OTOH, the chipset improvements with Z97 do at least offer a vaguely passable
rationale for upgrading, re the greater number of Intel SATA3 ports, newer
storage tech, etc. If budget was not an issue, I'd build with a 4790K without
hesitation.
Can't help feeling though, with various comments I've seen this past few weeks,
that what may be holding many people back from their ideal build is RAM pricing
which is now completely ridiculous. RAM is just too expensive. Huge step backwards
in system cost. And please I don't want to hear about chip shortages, etc., we all
know why RAM is more expensive now, because it's happened so many times before:
the suppliers don't like the pricing levels, so they restrict supply to raise prices. Well
IMO it's counter productive, because I can't be the only one who thinks no thanks,
I'm not paying that much for an 8GB 1600 kit when for about a 3rd less one could
get an 8GB 2133 kit a year+ ago, so heck with it I'll look for used kits instead, save
a bundle. I've bought four used GSkill 2x4GB 2133 kits this year, saved over 100 UKP
so far.
Price drops & efficiency improvements on CPUs are all fine & lovely, but what's the
point if potential future power savings are being wiped out by an artificial upfront
cost increase via the RAM?
Ian.
Which makes it all the more funny considering the Athlon XPs at the same time were more focused on efficient computing with better IPC instead of insane clock rates. You'd think AMD would have learned enough from that time not to fall into the Netburst trap.
Intel's slides say >5%, which means over 5%.
[Edit: Thx, fixed!]
Hardly. Performance of the current Intel 4-core isn't that much better than the
equivalent model from 18 months ago. I know they've improved power consumption,
etc., but without significant speedups, most potential users really won't care.
Moore's Law states that the number of transistors on the chip will double every 24 months: http://en.wikipedia.org/wiki/Moore's_law
From the Wiki article: Moore's law is the observation that, over the history of computing hardware, the number of transistors in a dense integrated circuit doubles approximately every two years.
Double Transistors <> Double Performance (although early on it seemed that way)
Actually pitch means the space from the center of one fin to the center of the adjacent fin... it is not just the space between the 2 fins...
Back in those days, newer chips with more transistors were also on a smaller process, significantly higher clocks and usually accompanied with some fundamental performance enhancements/breakthroughs so the performance doubling every ~18 months was a combination of multiple compounding factors.
Today, practically all the fundamental discoveries have been made and all they are doing is refining them, so that side of performance scaling is effectively shut down. The clock scaling also appears to have hit a brick wall, since the latency hit from making pipelines longer to enable higher clocks causes the execution pipelines to stall on dependencies more often and negates gains from higher clocks. Process wise, they are at a point where they are starting to fight with fundamental laws of physics, which does not help with smooth progress either.
There is little reason to believe things are going to improve much any time soon when all aspects are well into their diminishing return curve.
Even if you compare Sandy Bridge (32nm) Intel CPUs with AMD's FX83xx (28nm) which theoretically gives the advantage to AMD, Intel's older chips still win most benchmarks. Intel being one process node ahead has very little to do with their performance lead; their architecture itself is just that far ahead.
FX-83xx series are 32nm, btw/fyi. fx-8350 vs. i7-2600k is probably a fair fight. i bet they'd trade blows. or, an fx-8350 is not far behind if it is behind. and amd has a software/platform/optimization disadvantage, meaning that the programs are not optimized for amd chips since most pc's have intel chips inside them.