The History Of Intel CPUs: Updated!

2018-09-08

Intel Begins with The 4004

The first microprocessor sold by Intel was the four-bit 4004 in 1971. It was designed to work in conjunction with three other microchips, the 4001 ROM, 4002 RAM, and the 4003 Shift Register. Whereas the 4004 itself performed calculations, those other components were critical to making the processor function. The 4004 was mostly used inside of calculators and similar devices, and it was not meant for use inside of computers. Its max clock speed was 740 kHz.

The 4004 was followed by a similar processor known as the 4040, which was essentially an improved variation of the 4004 with an extended instruction set and higher performance.

8008 And 8080

The 4004 made a name for Intel in the microprocessor business, and to capitalize on the situation, Intel introduced a new line of eight-bit processors. The 8008 came first in 1972, followed by the 8080 in 1974 and the 8085 in 1976. Although the 8008 was the first eight-bit processor produced by Intel, it is not as notable as its predecessor, the 4004, or its successor, the 8080. It was faster than the 4004 thanks to its ability to process data in eight-bit chunks, but it was clocked rather conservatively between 200 and 800 kHz, and its performance simply didn't attract many system developers. The 8008 used 10-micrometer transistor technology.

Intel's 8080 was far more successful. It expanded on the design of the 8008 by adding new instructions and transitioning to six-micrometer transistors. This allowed Intel to more than double the clock rates, and the highest-performance 8080 chips in 1974 ran at 2 MHz. The 8080 was used in countless devices, which led several software developers, such as the recently formed Microsoft, to focus on software for Intel's processors.

Eventually, when the 8086 was released, it was made source-compatible with the 8080 to maintain backwards compatibility with this software. As a result, the 8080's key hardware elements have been present inside every x86-based processor ever produced, and 8080 software can technically still run on any x86 processor.

The 8085 was essentially a less expensive and higher-clocked variant of the 8080, which was highly successful as well though less influential.

8086: The Beginning Of x86

Intel's first 16-bit processor was the 8086, which helped to boost performance considerably compared to earlier designs. Not only was it clocked higher than the budget-oriented 8088, but it also employed a 16-bit external data bus and a longer six-byte prefetch queue. It was also able to run 16-bit tasks (though most software at this time was designed for eight-bit processors). The address bus was extended to 20-bit, which enabled the 8086 to access up to 1MB of memory and therefore increase performance.
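The link between address bus width and addressable memory is simple arithmetic: each extra address line doubles the number of bytes the processor can select. The short Python sketch below illustrates the calculation; the function names are made up for this example, and it assumes byte-addressable memory with binary units (1KB = 1024 bytes).

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address bus of this width can select."""
    return 2 ** address_bits


def format_size(num_bytes: int) -> str:
    """Render a byte count using binary units (1 KB = 1024 bytes)."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    size = num_bytes
    for unit in units:
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit we know about.
        if size < 1024 or unit == units[-1]:
            return f"{size} {unit}"
        size //= 1024


# 20 bits is the 8086 address bus width described above.
print(format_size(addressable_bytes(20)))  # prints "1 MB"
```

The same calculation applies to the wider address buses that appear later in this history.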

The 8086 also became the first x86 processor, and it used the first revision of the x86 ISA, which nearly all of the processors created by AMD or Intel since the introduction of the 8086 have been based on.

Intel also produced the 8088 around the same time. This processor was based on the 8086, but with half as many data lines and a four-byte prefetch queue. This caused a loss of balance, as the narrower bus cut into instruction fetch rate, forcing Intel's execution unit to idle much of the time. It still had access to up to 1MB of RAM and ran at higher frequencies than previous processors; however, it was quite a bit slower than the 8086.

80186 And 80188

Intel followed up the 8086 with several other processors, all of which used a similar 16-bit architecture. The first was the 80186, aimed at embedded applications. To facilitate this, Intel integrated several pieces of hardware typically found on the motherboard into the CPU, including the clock generator, interrupt controller, and timer. As a side effect, certain instructions ran notably faster on the 80186 than on the 8086, even at the same clock rate. Intel also pushed the CPU's frequency up over time to further improve performance.

The budget-oriented 80188 similarly contained several pieces of hardware integrated into the processor. But like the 8088, its data bus was cut in half.

80286: More Memory, More Performance

The 80286 was released the same year as the 80186 and had nearly identical features, but it extended the address bus to 24-bit, which enabled the processor to access up to 16MB of memory.

iAPX 432

The iAPX 432 was an early attempt by Intel to diverge from its x86 portfolio in favor of an entirely different design. Intel expected the iAPX 432 to be several times faster than its other offerings. The processor ultimately failed, however, due to some major design flaws. Although x86 processors are relatively complex, the iAPX 432 took CISC to a whole new level of complexity. The hardware design was rather large, which forced Intel to craft it out of two separate dies. The processor was also quite data hungry and failed to perform well without extremely high amounts of bandwidth. The iAPX 432 managed to outperform the 8080 and 8086, but it was quickly outpaced by newer x86 products, and eventually it was abandoned.

i960: Intel's First RISC

Intel created its first RISC processor in 1984. It was not designed as a direct competitor to the company's x86 processors because it was intended as a secure embedded solution. Internally, it was a 32-bit superscalar architecture that used Berkeley RISC design concepts. The first i960 processors were clocked relatively low, with the slowest model running at 10 MHz, but over the years the design was improved and moved to smaller process nodes that enabled it to hit up to 100 MHz. It also supported 4GB of protected memory.

The i960 was widely used inside of military systems as well as in business systems.

80386: x86 Turns 32-bit

Intel's first 32-bit x86 processor was the 80386, released in 1985. One key advantage that this processor had was its 32-bit address bus that allowed it to support up to 4GB of system memory. Although this was far more than anyone was using at the time, RAM limitations often hurt the performance of prior x86 and competing processors. Unlike today, when the 80386 was released, more RAM almost always translated into a performance increase. Intel also implemented several architectural enhancements that helped push performance up above the 80286, even when both systems used the same amount of RAM. It also supported virtual 8086 mode, which improved multitasking support.

To segment its product line-up with a more budget-friendly offering, Intel also introduced the 80386SX. This processor was almost identical to the 80386; it still employed a 32-bit architecture, but its data bus was cut in half to 16 bits for cost-saving purposes.

i860

In 1989, Intel made another attempt to move away from its x86 processors. It created a new RISC CPU known as the i860. Unlike the earlier i960, this CPU was designed to be a high-performance model to compete in the desktop market, but the design proved problematic. Its most significant flaw was that the processor's performance relied entirely on the compiler to place instructions in the order they would need to be executed when the software was first created. This helped Intel keep the die size and overall complexity of the i860 down, but it was nearly impossible to correctly list every instruction from beginning to end when compiling the program. This caused the CPU to constantly stall while it attempted to work around the problem.
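To see why instruction order matters so much on a processor that executes strictly in program order, consider the toy model below. It is only a sketch: the `Instr` type, the register names, and the latencies are hypothetical and do not describe the i860's real pipeline. It simply counts how long a four-instruction program takes when each load is immediately followed by its consumer versus when the loads are hoisted so their latencies overlap.

```python
from typing import NamedTuple, Sequence, Tuple


class Instr(NamedTuple):
    dest: str              # register written by the instruction
    srcs: Tuple[str, ...]  # registers it reads
    latency: int           # cycles until the result is available (hypothetical)


def cycles_in_order(program: Sequence[Instr]) -> int:
    """Cycle at which the last result is ready when instructions issue strictly in order."""
    ready_at = {}   # register -> cycle at which its value becomes available
    next_issue = 0  # earliest cycle at which the next instruction may issue
    finish = 0
    for ins in program:
        # An in-order core cannot skip ahead: it waits until every operand is ready.
        start = max([next_issue] + [ready_at.get(s, 0) for s in ins.srcs])
        ready_at[ins.dest] = start + ins.latency
        finish = max(finish, ready_at[ins.dest])
        next_issue = start + 1
    return finish


load_a = Instr("a", (), 3)
use_a = Instr("b", ("a",), 1)
load_c = Instr("c", (), 3)
use_c = Instr("d", ("c",), 1)

naive = [load_a, use_a, load_c, use_c]      # each load is followed by its consumer
scheduled = [load_a, load_c, use_a, use_c]  # loads hoisted so their latencies overlap

print(cycles_in_order(naive), cycles_in_order(scheduled))  # prints "8 5"
```

In this model the hardware does nothing to hide the stalls, so the only way to get the faster result is for the compiler to find the better ordering ahead of time.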

80486: Integrating The FPU

Intel's 80486 was another significant step up in terms of performance. The key to its success was tighter integration of components into the CPU. The 80486 was the first x86 CPU to contain L1 cache. Early 80486 models came with 8KB on-die, and were etched on a 1000nm process. But as the design transitioned to 600nm, the L1 cache size doubled to 16KB.

Intel also incorporated the FPU into the CPU, which up to that point had been a separate functional processing unit. By moving these pieces of hardware into the host processor, latency between them dropped sharply. The 80486 also used a faster FSB interface to increase bandwidth, and the core had various other tweaks to push up IPC. These changes increased the 80486's performance significantly, and high-end models were multiple times faster than the older 80386.

The first 80486 processors reached 50 MHz, and later models that used the improved 600nm process went as high as 100 MHz. To target budget-oriented users, Intel also released a version of the 80486 known as the 80486SX, which had the FPU disabled.

P5: The First Pentium

The Pentium emerged in 1993 as the first Intel x86 processor that didn't follow the 80x86 number system. Internally, the Pentium used the P5 architecture, which was Intel's first x86 superscalar design. Although the Pentium was generally faster than the 80486 in every way, its most prominent feature was a substantially improved FPU. The original Pentium's FPU was more than ten times faster than the 80486's aging unit. This became an even more significant feature in later years when Intel released the Pentium MMX. This processor was architecturally the same as the original Pentium, but featured support for Intel's new MMX SIMD instruction set that could drastically boost performance.

Intel also increased the L1 cache size on its Pentium processors relative to the 80486. Initial Pentiums contained 16KB, while the Pentium MMX moved up to 32KB. Naturally, these processors also ran at higher clock rates. The first Pentium processors used 800nm transistors and could hit just 60 MHz, but subsequent revisions transitioned to Intel's 250nm process and pushed frequencies up to 300 MHz.

P6: The Pentium Pro

Intel planned to quickly follow the Pentium up with the Pentium Pro based on its P6 architecture, but ran into technical difficulties. The Pentium Pro was considerably faster than the Pentium in 32-bit operations thanks to its Out-of-Order (OoO) design. It featured a significantly redesigned internal architecture that decoded instructions into micro-ops, which were then executed on general-purpose execution units. It also used a significantly extended 14-stage pipeline owing to the additional decoding hardware.

As the first Pentium Pro processors were targeted at the server market, Intel extended the address bus again to 36-bit and added its PAE technology that allowed it to support up to 64GB of RAM. This was far more than average users needed, but being able to support greater amounts of RAM was key to Intel's server customers.

The processor's cache system was reworked as well. The L1 cache was limited to two segmented 8KB caches, one for instructions and one for data. To make up for the 16KB deficit compared to the Pentium MMX, Intel placed between 256KB and 1MB of L2 cache on a separate chip attached to the CPU package. It connected to the CPU using a back-side-bus (BSB).

Intel initially planned to push the Pentium Pro out to consumers as well, but ultimately limited it to the server market. The Pentium Pro featured several revolutionary features, but it struggled against the Pentium and Pentium MMX in terms of performance. Both of the older Pentium parts were significantly faster at 16-bit operations, and 16-bit software was still heavily used back then. The processor also lacked support for the MMX instruction set, which resulted in the Pentium MMX outperforming the Pentium Pro in MMX-optimized software.

The Pentium Pro may have stood a chance in the consumer market, but it was also fairly expensive to produce due to the separate chip containing L2 cache. The fastest Pentium Pro processor ran at 200 MHz, and it was crafted with transistors ranging between 500 and 350nm.

P6: Pentium II

Intel didn't give up on the P6 architecture, but instead waited until 1997 when it released the Pentium II. The Pentium II managed to overcome nearly all of the negative aspects of the Pentium Pro. Its underlying architecture was similar to the Pentium Pro, and it continued to use a 14-stage pipeline with several enhancements to the core to improve IPC. The L1 grew to 16KB data + 16KB instruction caches.

Intel also moved to more affordable cache chips attached to a larger silicon package to reduce production costs. This was an effective way of making the Pentium II less expensive, but these memory modules were unable to operate at the CPU's full speed. Instead, the L2 cache ran at half-frequency, and on these early processors that was sufficient to increase performance.

Intel also added support for the MMX instruction set. The CPU cores used inside of the Pentium II, code-named "Klamath" and "Deschutes," were also sold as Xeon and Pentium II Overdrive products for servers. The highest-performance models contained 512KB of L2 cache and ran at 450 MHz.

P6: Pentium III And The Race To 1 GHz

Intel planned to follow up the Pentium II with a processor based on its Netburst architecture, but it wasn't quite ready. Instead, Intel pushed the P6 architecture out again as the Pentium III.

The first of these processors, code-named "Katmai," was rather similar to the Pentium II in that it used a slotted cartridge containing lower-quality L2 cache at half of the CPU's speed. The underlying architecture incorporated other significant changes, as several parts of the 14-stage pipeline were fused together, shortening it to 10 stages. Thanks to the updated pipeline and an increase in clock speed, the first of the Pentium III processors typically outperformed their Pentium II counterparts by a small margin.

Katmai was produced using 250nm transistors. However, following the move to a 180nm fabrication process, Intel was able to boost the Pentium III's performance significantly. This updated implementation, code-named "Coppermine," moved the L2 cache into the CPU and reduced its capacity by half (down to 256KB). But because it was able to run at the processor's frequency, performance still shot up.

Coppermine was Intel's competitor to AMD's Athlon in the race to break 1 GHz, which it succeeded in doing. Intel attempted to produce a 1.13 GHz model, but it was ultimately recalled after an investigation by Dr. Tom Pabst of Tom's Hardware discovered that it was unstable. This left the 1 GHz model as the fastest Coppermine-based Pentium III.

The last of the Pentium III cores was named "Tualatin." It moved to a 130nm process that facilitated clock rates as high as 1.4 GHz. It also increased the L2 cache back to 512KB, which helped improve performance somewhat.

P5 And P6: Celeron And Xeon

Around the release of the Pentium II, Intel also introduced its Celeron and Xeon product lines. These products used the same core as the Pentium II or Pentium III, but with varying amounts of cache. The first Celeron-branded processors based on the Pentium II had no L2 cache at all, which resulted in horrible performance. Later models based on the Pentium III had half of the L2 cache disabled compared to their Pentium III counterparts. This resulted in Celeron processors that used the Coppermine core containing just 128KB of L2 cache; later models based on Tualatin increased this to 256KB.

These half-cache derivatives became known as the Coppermine-128 and the Tualatin-256. Intel sold them at clock speeds comparable to the Pentium III, which allowed them to perform well and made them highly competitive against AMD's Duron processors. Microsoft used one of the Coppermine-128 Celeron processors clocked at 733 MHz as the CPU inside of its Xbox gaming console.

The first Xeon processors were similar, but they contained more L2 cache. The Pentium II-based Xeon processors contained at least 512KB, the same as Pentium II CPUs, whereas higher-end models could have up to 2MB.

Netburst: Introduction

Before discussing Intel's Netburst architecture and the Pentium 4, it is important to examine the idea behind its deep pipeline. The pipeline is the sequence of stages that instructions move through inside a core. Pipeline stages often perform multiple tasks, but sometimes they're devoted to single functions. By either adding new hardware or splitting one stage into multiple stages, the execution pipeline can be extended. The processor pipeline can also be shortened by removing hardware or by combining the components in multiple stages into a single stage.

The length or depth of the pipeline has a direct impact on latency, IPC, clock speed and the architecture's throughput requirements. Longer pipelines typically require higher amounts of bandwidth, but if the pipeline is kept adequately fed with data, then each stage in the pipeline stays busy. Processors that have longer pipelines typically are able to run at higher clock rates as well.

The trade-off is significantly higher latency inside of the processor, as data flowing through it must stop at each stage for a certain number of clock cycles. Processors using a long pipeline tend to have lower IPC as well, which is why they rely on significantly higher frequencies to increase performance. Over the years, processors implementing both philosophies have proven successful. Neither approach is necessarily flawed.
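The trade-off can be made concrete with a deliberately simple model. The Python sketch below assumes that every pipeline flush (for example, after a mispredicted branch) costs roughly one full trip through the pipeline, and that throughput is clock rate divided by cycles per instruction. The base CPI and flush rate are invented for illustration, not measurements of any real processor, and the model ignores memory stalls and everything else that affects real chips.

```python
def instructions_per_second(clock_hz: float, base_cpi: float,
                            flushes_per_instruction: float,
                            pipeline_depth: int) -> float:
    """Throughput when each flush costs one trip through the whole pipeline."""
    cpi = base_cpi + flushes_per_instruction * pipeline_depth
    return clock_hz / cpi


def break_even_clock(short_clock_hz: float, short_depth: int, long_depth: int,
                     base_cpi: float, flushes_per_instruction: float) -> float:
    """Clock rate the deeper pipeline needs in order to match the shallower one."""
    short_cpi = base_cpi + flushes_per_instruction * short_depth
    long_cpi = base_cpi + flushes_per_instruction * long_depth
    return short_clock_hz * long_cpi / short_cpi


# Hypothetical workload: ideal CPI of 1.0 and one flush every 50 instructions.
BASE_CPI = 1.0
FLUSH_RATE = 0.02

short = instructions_per_second(1.4e9, BASE_CPI, FLUSH_RATE, pipeline_depth=10)
deep = instructions_per_second(1.5e9, BASE_CPI, FLUSH_RATE, pipeline_depth=20)
needed = break_even_clock(1.4e9, 10, 20, BASE_CPI, FLUSH_RATE)

print(f"10-stage design at 1.4 GHz: {short / 1e9:.2f} billion instructions/s")
print(f"20-stage design at 1.5 GHz: {deep / 1e9:.2f} billion instructions/s")
print(f"20-stage design breaks even at {needed / 1e9:.2f} GHz")
```

Under these made-up numbers, the 20-stage design needs about 1.63 GHz just to match the 10-stage design at 1.4 GHz, which is why a deep pipeline only pays off when it delivers a large clock speed advantage.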

Netburst: Pentium 4 Willamette And Northwood

In 2000, Intel's Netburst architecture was finally ready, and it was pushed into production as the Pentium 4. The combination would carry Intel's top-end CPUs for the next six years. The first implementation was named "Willamette," which carried Netburst and the Pentium 4 through the first two years of its life. This was a troubled time for Intel, however, and the chip struggled to outperform the Pentium III. Netburst enabled significantly higher frequencies, and Willamette managed to hit 2 GHz, but the Pentium III at 1.4 GHz was still faster in some tasks. AMD's Athlon processors enjoyed a healthy performance lead during this period.

The problem with Willamette was that Intel stretched the pipeline out to 20 stages and planned to hit even higher clock rates beyond 2 GHz, but due to power consumption and heat issues, it was unable to reach those goals. The situation improved with Intel's 130nm design known as "Northwood," which scaled up to 3.4 GHz and doubled the L2 cache from 256KB to 512KB. Netburst's power consumption and heat issues persisted, but Northwood nevertheless performed significantly better and was highly competitive against AMD.

On high-end models, Intel also introduced its Hyper-Threading technology to improve resource utilization in environments that emphasized multitasking. Hyper-Threading wasn't as beneficial on Northwood as it is on present-day Core i7 processors, but it did push performance up by a few percentage points.

Willamette and Northwood were released inside of Celeron- and Xeon-branded CPUs as well. As with the previous generation of Celeron- and Xeon-based products, Intel raised or lowered the L2 cache size in order to distinguish their performance.

P6: Pentium-M

As Netburst was designed as a high-performance architecture that was fairly power hungry, it didn't translate well to mobile systems. Instead, in 2003 Intel created its first architecture designed exclusively for notebooks. The Pentium-M was based on the P6 architecture, but with a longer 12-14 stage pipeline. This was also Intel's first variable-length pipeline, which meant that instructions could be executed after moving through just 12 stages if the information required for the instruction was already loaded into cache. If not, it had to go through two additional stages to load the data.

The first of these processors was crafted with 130nm transistors and contained a 1MB L2 cache. It managed to hit 1.8 GHz while consuming just 24.5W of power. A later revision known as "Dothan" was released in 2004 and transitioned to 90nm transistors. This enabled Intel to increase the L2 cache to 2MB and, combined with a number of core enhancements, provide a decent IPC throughput improvement. The CPU also scaled up to 2.27 GHz with a slight increase of power to 27W.

The Pentium-M architecture would eventually be used inside of the Stealey A100 mobile CPUs before being replaced by Intel's line of Atom processors.

Netburst: Prescott

Northwood carried the Netburst architecture from 2002 until 2004, after which Intel launched Prescott with numerous enhancements. It used a 90nm fabrication process that enabled Intel to increase the L2 cache to 1MB. Intel also introduced the new LGA 775 interface that featured support for DDR2 memory and a faster quad-pumped FSB than the first Northwood-based CPUs. These changes resulted in Prescott having significantly more bandwidth than Northwood, which was vital to increasing Netburst's performance. Prescott was also Intel's first 64-bit x86 processor, allowing it to access more RAM and operate on 64 bits at a time.

Prescott was supposed to be the crown jewel in Intel's family of Netburst-based processors, but instead it was a fiasco. Intel again extended its execution pipeline, this time to 31 stages. The company hoped to increase clock rates enough to offset the longer pipe, but it was only able to hit 3.8 GHz. Prescott simply ran too hot and consumed too much power. Intel expected the move to 90nm to alleviate this issue, but the increased transistor density made cooling more difficult. As it was not able to hit higher frequencies, Prescott's evolutionary changes hurt overall performance.

Even with all of the enhancements and extra cache, Prescott was, at best, on par with Northwood at any given clock rate. Around the same time, AMD's K8 processors were also moving to smaller transistors that enabled them to hit higher frequencies. For this brief time period, AMD dominated the desktop CPU market.

Netburst: Pentium D

In 2005, the race was on to produce the first consumer-oriented dual-core processor. AMD had already announced its dual-core Athlon 64, but it wasn't available yet. Intel rushed to beat AMD by using a multi-chip module (MCM) that contained two Prescott dies. The company christened its dual-core processor the Pentium D, and the first model was code-named "Smithfield."

The Pentium D launched to criticism, however, as it faced the same issues that plagued Prescott. The heat and power of two Netburst-based dies limited clock rates to 3.2 GHz at most. And because the architecture was bandwidth-limited, Smithfield's IPC suffered as throughput was split between two cores. The implementation wasn't particularly elegant either; AMD's dual-core CPU constructed from one die was considered superior.

Smithfield was followed by Presler, which moved to 65nm transistor technology. It contained two Cedar Mill dies on an MCM. This helped reduce the processor's heat and power consumption, and let Intel raise its clock rate to 3.8 GHz.

There are two key steppings of Presler. The first one had a higher 125W TDP, whereas the later model dropped down to 95W. Thanks to the smaller die size, Intel was able to double the L2 cache as well, so each die had 2MB. A few enthusiast models also featured Hyper-Threading, allowing the CPU to address four threads simultaneously.

All Pentium D processors supported 64-bit software and could take advantage of more than 4GB of RAM.

Core: Core 2 Duo

Intel eventually gave up on its Netburst architecture and instead put its support behind the P6 and Pentium-M design. The company realized that P6 was still viable, and capable of being both efficient and providing excellent performance. It reworked the architecture into its Core design. Like the Pentium-M, it used a 12 to 14 stage pipeline that was significantly shorter than Prescott's 31-stage implementation.

Core proved to be highly scalable, and Intel was able to push it into service on mobile systems with TDPs as low as 5W and high-end servers with 130W ceilings. Intel mostly sold it as "Core 2 Duo" or "Core 2 Quad" products, but Core was also used inside of Core Solo-, Celeron-, Pentium- and Xeon-branded CPUs. Each die contained two CPU cores, and quad-core designs used two dual-core dies on an MCM. Single-core versions, meanwhile, had one core disabled. L2 cache size ranged from 512KB up to 12MB.

With the improvements made to the Core architecture, Intel could again compete against AMD. The PC market entered a golden age filled with extremely competitive high-performance processors that are still viable to this day.

Bonnell: Silverthorne And Diamondville

The Core 2 architecture hit a wide range of devices, but Intel needed to produce something less expensive for the ultra-low-budget and portable markets. This led to the creation of Intel's Atom, which used a 26 mm² die, less than one-fourth the size of the first Core 2 dies.

Intel didn't design Atom's Bonnell architecture completely from scratch, but instead went back to the Pentium's P5 foundation. That was largely because P5 was Intel's last in-order execution design. OoO execution, though highly beneficial to performance, also consumes quite a bit of power and takes up a large amount of die space. For Intel to meet its goals, OoO simply wasn't practical at the time.

The first Atom die, code-named "Silverthorne," had a TDP of 3W. This enabled it to go places that Core 2 could not. Silverthorne's IPC was lackluster, but it was able to run at up to 2.13 GHz. It also contained 512KB of L2 cache. The decent frequency and L2 cache did little to make up for the low IPC, but Silverthorne still enabled an entry-level experience at a relatively low price.

Silverthorne was succeeded by Diamondville, which reduced the frequency to 1.67 GHz but enabled 64-bit support, which improved performance in 64-bit apps.

Nehalem: The First Core i7

With the processor market in a highly competitive state, Intel couldn't afford to sit still for long. So, it reworked the Core architecture to create Nehalem, which added numerous enhancements. The cache controller was redesigned, and the L2 cache dropped to 256KB per core. This did not hurt performance though, as Intel instead added between 4MB and 12MB of L3 cache shared between all of the cores. CPUs based on Nehalem included between one and four cores, and the family was built using 45nm technology.

Intel significantly reworked connections between the CPU and rest of the system as well. The ancient FSB that had been in use since the 1980s was finally put to rest, and it was replaced by Intel's QuickPath Interconnect (QPI) on high-end systems and by DMI everywhere else. This allowed Intel to move its memory controller (which was updated to support DDR3) and PCIe controller into the CPU. These changes significantly increased bandwidth while latency plummeted.

Once again, Intel extended the processor pipeline, this time to 20-24 stages. Clock rates did not increase, however, and Nehalem ran at comparable frequencies to Core. Nehalem also was Intel's first processor to implement Turbo Boost. Although the fastest Nehalem processor's base clock topped out at 3.33 GHz, it could operate at 3.6 GHz for short periods thanks to this new technology.

The last major advantage that Nehalem had over the Core architecture was that it marked the return of Hyper-Threading technology. Thanks to this and numerous other enhancements, Nehalem was able to perform up to twice as fast as Core 2 processors in heavily-threaded workloads. Intel sold Nehalem CPUs under the Celeron, Pentium, Core i3, Core i5, Core i7, and Xeon brands.

Bonnell: Pineview And Cedarview

In 2009, Intel released two new Atom-branded dies based on the Bonnell architecture. The first was known as "Pineview," which continued to use a 45nm fabrication process. It featured better performance than Diamondville by integrating a number of components traditionally found inside of the motherboard chipset, including graphics and the memory controller. This had the effect of reducing power consumption and lowering heat dissipation. Dual-core models were also available using two Pineview cores on an MCM.

Westmere: Graphics In The CPU

Intel created a 32nm die shrink of Nehalem that was code-named "Westmere." Its underlying architecture changed little, but Intel took advantage of the reduced die size to place additional components inside of the CPU. Instead of just four execution cores, Westmere contained up to 10. It could also have as much as 30MB of shared L3 cache.

The HD Graphics implementation in mainstream Westmere-based Core i3, i5, and i7 processors was similar to Intel's GMA 4500, except it had two additional EUs. Clock rates stayed about the same, ranging between 166 MHz in low-power mobile systems and 900 MHz on higher-end desktop SKUs. Although the 32nm CPU die and 45nm GMCH weren't fully integrated into a single piece of silicon, both components were placed onto the CPU package. This had the effect of reducing latency between the memory controller inside of the GMCH and the CPU. API support didn't significantly change between the GMA and HD Graphics implementations, though overall performance increased by over 50 percent.

Sandy Bridge

With Sandy Bridge, Intel made its most significant leap in performance in seven years. The execution pipeline was shortened to 14-19 stages. Sandy Bridge implemented a micro-op cache capable of holding up to 1500 decoded micro-ops, which enabled instructions to bypass five stages if the required micro-op was already cached. If not, the instruction would have to run the full 19 stages.
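The effect of a variable-length pipeline like this can be sketched with a small simulation. The 14- and 19-stage figures come from the text above; everything else is an assumption made for illustration, including the least-recently-used replacement policy, the one-entry-per-instruction-address bookkeeping, and the tiny capacities. The real micro-op cache is organized differently.

```python
from collections import OrderedDict

SHORT_PATH = 14  # stages when the decoded micro-op is already cached
LONG_PATH = 19   # stages when the instruction must be decoded first


def total_stages(instruction_stream, capacity):
    """Sum the pipeline stages traversed by a stream of instruction addresses."""
    cache = OrderedDict()  # hypothetical LRU cache keyed by instruction address
    stages = 0
    for addr in instruction_stream:
        if addr in cache:
            cache.move_to_end(addr)  # mark as most recently used
            stages += SHORT_PATH
        else:
            stages += LONG_PATH
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return stages


# A four-instruction loop body executed three times.
loop = [0, 1, 2, 3] * 3

print(total_stages(loop, capacity=8))  # loop fits: prints 188
print(total_stages(loop, capacity=2))  # loop thrashes the cache: prints 228
```

When the loop fits in the cache, only the first pass pays the full 19 stages. When it does not fit, every instruction takes the long path, which is why code with small, hot loops benefits most from this kind of design.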

The processor also featured several other improvements, including support for higher-performance DDR3. More components were integrated into the CPU as well. Instead of two separate dies on the CPU package (as on Westmere), everything moved into one die. The various subsystems were connected internally by a ring bus that enabled extremely high-bandwidth transactions.

Intel again updated its integrated graphics engine. Instead of a single HD Graphics implementation pushed into all CPU models, the company created three different versions. The top-end variant was the HD Graphics 3000 with 12 EUs that could be clocked up to 1.35 GHz. It also contained extras like Intel's Quick Sync transcoding engine. The mid-range HD Graphics 2000 variant possessed the same features, except it dropped down to six EUs. The lowest-end HD Graphics model also had six EUs, but without the value-added features.

Bonnell: Cedarview

In 2011, Intel created another new Atom die based on the same Bonnell architecture used inside of Pineview. Again, there were minor core enhancements to improve IPC, but in reality little changed between the two. Cedarview's key advantage was a move to 32nm transistors that enabled frequencies up to 2.13 GHz at lower power. It was also able to support higher-clocked RAM thanks to an improved DDR3 memory controller.

Ivy Bridge

Intel followed Sandy Bridge with its Ivy Bridge processors, a "Tick+" in the company's "Tick-Tock" product design cadence. Ivy Bridge's IPC was only slightly better than Sandy Bridge's, but it brought with it other key advantages that outshined its predecessor.

Ivy Bridge's greatest advantage was its energy efficiency. The architecture was crafted with 22nm three-dimensional FinFET transistors that sharply reduced the CPU's power consumption. Whereas mainstream Sandy Bridge-based Core i7 processors typically came with a 95W TDP, the equivalent Ivy Bridge-based chips were rated at 77W. This was particularly important in mobile systems, and it allowed Intel to release a quad-core mobile Ivy Bridge CPU with a low 35W TDP. Prior to this, all of Intel's quad-core mobile CPUs came with at least a 45W TDP.

Intel took advantage of the reduced die size to also enlarge the iGPU. Ivy Bridge's highest-end graphics engine, HD Graphics 4000, packed in 16 EUs. The graphics architecture was also significantly reworked to improve the performance of each EU. With these changes, HD Graphics 4000 typically performed 200 percent better than its predecessor.

Haswell

Like a metronome, Intel pushed out its Haswell architecture just one year after Ivy Bridge. Haswell was once again more of an evolutionary step than a revolutionary one. The AMD processors competing against Sandy and Ivy Bridge weren't fast enough to do battle at the high end, so Intel wasn't pressured to increase performance too much. Overall, Haswell was only about 10 percent faster than Ivy Bridge.

Similar to Ivy Bridge, Haswell's most attractive features were its energy efficiency and iGPU. Haswell integrated the voltage regulation hardware into the processor, which enabled the CPU to keep a better handle on power consumption. The voltage regulator caused the CPU to produce more heat, but the Haswell platform as a whole became more efficient.

To combat AMD's APUs, Intel placed as many as 40 EUs inside of its top-end Haswell iGPU. The company also sought to increase the available bandwidth its fastest graphics engine had access to by equipping it with a 128MB L4 eDRAM cache, which drastically improved performance.

Bonnell: Silvermont

In 2014, Intel significantly reworked the Bonnell architecture to create Silvermont. One of the most significant changes was a switch to an OoO design. Another was the elimination of Hyper-Threading.

When the Bonnell architecture debuted, many felt that OoO occupied too much die space and was too power-hungry for an Atom CPU. By 2014, however, transistors had shrunk enough, and their power consumption had dropped enough, that Intel could enable an OoO design on Atom. Intel also reworked the pipeline in Silvermont to minimize the impact of a cache miss. These changes, combined with a number of other improvements, resulted in a 50 percent increase in IPC compared to Cedarview.

To further boost Silvermont's performance, Intel created SKUs containing up to four CPU cores. It also switched to an iGPU based on the same graphics architecture used in its Ivy Bridge processors. There were only four EUs in Silvermont's iGPU, but it nonetheless was capable of providing 1080p video playback, and it could run older games that weren't especially taxing. All aspects of the chipset were integrated into the Silvermont CPU as well, but this was more to reduce the system power consumption than anything else.

The Silvermont die was used in Bay Trail-based products. The platform's TDP ranges between 2 and 6.5W, and the clock rate ranges between 1.04 and 2.64 GHz.

Broadwell

Intel's next processor architecture was known as Broadwell. Designed for mobile systems, it was released in late 2014 and used 14nm transistors. The first Broadwell-based product was called the Core M, and it was a dual-core Hyper-Threaded processor that operated with a 3-6W TDP.

Other mobile Broadwell processors dribbled out over time, but on the desktop side of the market, Broadwell never really showed up. A few desktop-oriented models were released in mid-2015, but their reception was tepid. The highest-end SKU, however, contains the fastest integrated GPU Intel has ever added to a socketed CPU. It contains six subslices with eight EUs each, adding up to a total of 48. The GPU also has access to a 128MB L4 eDRAM cache, which helps to resolve the bandwidth challenges on-die graphics engines typically face. In gaming tests, it outperformed AMD's fastest APU and proved to be more than capable of providing playable frame rates in modern games.

Bonnell: Airmont

With its 14nm fab up and running, Intel did not hesitate to push out a new Atom chip built from these transistors. This CPU die was essentially a shrink of Silvermont, and Intel named it "Airmont." It did not improve IPC, but thanks to the die shrink it still managed to somewhat outperform its predecessor. After all, the move to 14nm transistors reduced heat dissipation, allowing the CPU to maintain its Turbo Boost frequency for longer periods of time.

Airmont's iGPU was significantly improved over Silvermont's. The die itself contains 24 EUs, but products based on Airmont use between 12 and 16. None of the models based on Airmont currently enable all 24 EUs, and we are unlikely to see one in the future. These extra eight EUs exist to improve yields of Airmont, as a larger portion of the chip can be defective and still be salvageable. The graphics architecture was also updated to Intel's eighth-gen Broadwell design, improving the EUs' performance.

Airmont products were sold under the "Cherry Trail" and "Braswell" code names. The fastest Airmont-based Atom CPU is the N3700, which contains four CPU cores clocked at 1.6 GHz with a Turbo Boost frequency of 2.4 GHz. It also has a dual-channel DDR3L memory controller and 16 EUs clocked at up to 700 MHz.

Skylake

In 2015, not long after Broadwell first showed up on desktop systems, Intel replaced Broadwell with its Skylake architecture. Although Skylake-based CPUs were Intel's fastest to date, the platform changes accompanying Skylake were arguably more important.

Skylake was the first consumer-oriented CPU to use DDR4 memory, which is more energy-efficient than DDR3 and capable of enabling greater throughput. The Skylake platform also contained a number of improvements, such as a new DMI interface, an upgraded PCIe controller, and support for a much wider array of connectivity devices.

Naturally, Skylake included a better on-die GPU as well. The highest-end model was known as Iris Pro Graphics 580, and it was deployed to certain Skylake-R CPUs. The Iris Pro Graphics 580 engine featured 72 EUs and came paired with 128MB of L4 eDRAM. Most other Skylake-based chips included HD Graphics with 24 EUs, based on a design similar to Broadwell's.

Kaby Lake

Starting with Skylake and Kaby Lake, Intel ended its tick-tock development cadence in favor of a tick-tock-tock schedule. It was also referred to as the process-architecture-optimize cadence. This extended the amount of time Intel spent on a single fabrication process before it developed a new one. It also extended the amount of time between major architectural changes.

Kaby Lake, therefore, was essentially an optimized variation of Intel’s Skylake architecture. Although still built on 14nm transistors, it used a process Intel called 14nm+ that had various tweaks to improve energy efficiency and performance. The architecture itself hardly changed at all, but it did facilitate DDR4-2400 memory support.

Kaby Lake also employed an HD Graphics 630 engine featuring improved codecs for encoding and decoding, extending support for 4K video playback.

Coffee Lake

With Coffee Lake, Intel increased the number of cores in its Core i3, i5, and i7 processors by two. This marked the largest increase in core count for Intel since the introduction of the Core 2 Quad in 2006.

Core i5s now have six cores without Hyper-Threading. Coffee Lake-based Core i7s also have six cores, but with Hyper-Threading. The underlying architecture does not change from Kaby Lake. However, with more cores to share the work, performance increases markedly in threaded applications.

Coffee Lake-based Core i3 processors lack Hyper-Threading, but thanks to the increase from two to four CPU cores, the Core i3 processor family has never wielded more power. In essence, Coffee Lake Core i3 CPUs are every bit as powerful as Kaby Lake Core i5s, and potentially faster than Skylake Core i5s.

Whiskey Lake and Amber Lake

Intel's delayed 10nm process has slowed progress on the smaller Cannon Lake processors, so the company developed the 14nm++ Whiskey Lake and 14nm+ Amber Lake processors for laptops, to fill the gap between generations.

The new 15-watt U-Series Whiskey Lake models slot into the same Eighth Generation Core “Kaby Lake-R” product stack as previous-generation mobile chips, and have the same numbers of cores and threads as the chips they’ll be replacing. And the 5-watt Amber Lake models replace the seventh-gen Y-series chips found primarily in fanless laptops and convertibles. One of the primary new features for Whiskey Lake is the addition of the first hardware-based fixes for Meltdown and L1TF to appear on consumer-focused CPUs.

The Whiskey Lake and Amber Lake processors all feature the same underlying Kaby Lake microarchitecture as previous-generation CPUs, with a few optimizations. Primarily, single-core boost frequencies get a big bump over previous parts (up to 4.6GHz with the Core i7-8565U). But of course, exactly how long your CPU will stay at that top speed depends largely on the device’s cooling abilities.

About the author
Michael Justin Allen Sexton

Michael Justin Allen Sexton is a Contributing Writer for Tom's Hardware US. He covers hardware component news, specializing in CPUs and motherboards.

  • abryant
    Archived comments are found here: http://www.tomshardware.com/forum/id-3322311/history-intel-cpus.html
  • mitch074
    Strange that Itanium is missing, the Celeron 300A/333 is gone, the original Pentium bug disappeared, no mention is made that the 487 was actually a fully active 486DX, and that AMD led the desktop for "a short time" while it led from the moment Netburst came out (2000) to the moment Core replaced it (2006). On another note, a 64-bit CPU doesn't run 64-bit software faster : it is required to have one to run some. But since AMD came up with x86-64, I guess some approximation is allowed...
  • lunyone
    And where is the AMD CPU history?
  • Tom Griffin
    Remember the ABIT BP6a motherboard; I was running dual Celery (Celeron) 300mhz processors overclocked to 533mhz. What a flashback.
  • TheDane
    @Tom Griffin: Yeah - those were the days :)
  • AndrewJacksonZA
    1) Where can we see what was updated please?
    2) Where are the Phi CPUs please?
    3) Where can we view this as a one-pager please?
  • ta152h
    There are quite a few mistakes here.

    For one, the 8086 was not available in higher clock speeds than the 8088.

    The 8086 could NOT run 8080 code. Source code compatible does not mean that. It means it was very easy to recompile the code so it would work with the newer processor, not that the compiled code would work.

    The 286 section is oddly very limited. It was an enormous improvement over the 8086, as it added more memory, much more performance, and also virtual memory and hardware assisted multi-tasking.

    The 80386 was not significantly faster than the 286 running 16-bit code, despite what the author says. Clock for clock, they were very close, although 386 based systems tended to get SRAM caches, whereas the 16 and 20 Mhz 286s rarely did.

    The 386SX not only cut down the data bus to 16 bits, it also cut down the address bus to 24-bits.

    The remarks on the i860 are bizarre. " ... it was nearly impossible to correctly list every instruction from beginning to end when compiling the program. " This is wrong, it was just very difficult to order the instructions very efficiently. Of course it had no problem listing them correctly, or the program wouldn't run. Intel tried it again with Itanium, and depended on the compilers to order instructions very efficiently, and also had difficulties.

    The author oddly left out the most significant part of the 486; it was the first pipelined x86 CPU, and that was a large part of the performance improvement.

    The Pentium's FPU was not 10x faster than the 486, but it was the biggest improvement. Unless you're comparing a very low clocked 486 to a high clocked Pentium. Clock per clock, it was not nearly 10x faster.

    Also MMX instructions were not related to the FPU, but were actually integer based.

    The first Pentiums actually ran at 66 MHz, but ran really hot, and they had yield problems, so sold 60 MHz Pentiums along with them, at a significant discount. Most people bought the 60 MHz because they were so much cheaper initially, but 66 MHz was out there.

    The Pentium MMX only reached 233 MHz as sold by Intel, not 300 MHz.

    The Pentium III (Katmai) was a Pentium II with SSE instructions added, nothing more. The nonsense about fewer pipelines and IPC improvements (outside of SSE code) is fabricated with regards to Katmai.

    The Celeron originally had no cache, but there was another version (not a Coppermine based) that had 128K cache on the processor. In some ways, it was faster than the Pentium III it was based on, because the cache was faster.

    Coppermine's cache was wider, and generally superior, but Celeron's were very competitive with Katmai based products, and were the favorites of overclockers.
  • mitch074
    Anonymous said:
    The Celeron originally had no cache, but there was another version (not a Coppermine based) that had 128K cache on the processor. In some ways, it was faster than the Pentium III it was based on, because the cache was faster..


    The Celeron 300A/333 you're referring to is the Mendocino core, wasn't based on Pentium III - it didn't have support for SSE. It can be seen as a precursor for Pentium III inasmuch as it had 128 Kb of level 2 non-inclusive low-latency cache - and it was the first Intel P6-family CPU to have that.
    It was soon replaced with the Coppermine core, which was indeed based on the Pentium III core (SSE) as it was a smaller die (smaller engraving) thus much cheaper to produce.
  • milkod2001
    Still on Haswell 4770k. Don't see any reasons to upgrade apart from m.2 SSDs maybe but have regular SATA SSD and there might be no difference in actual performance. Right?
  • Dymension
    OK, I believe the article says a history of INTEL chips, not AMD.
  • mitch074
    Anonymous said:
    OK, I believe the article says a history of INTEL chips, not AMD.


    Indeed - but it's strange how the jumps from 8 to 16-bit and then 16-bit to 32-bit got a whole lot of 'splaining, and then the jump to 64-bit is actually glossed over. That, and "AMD led the market for a short while" which lasted for several years and Intel got its collective butt handed to it:
    • faster than GHz frequency (K7)
    • DDR RAM support (K7)
    • fast double precision FPU (K7)
    • exclusive L2 cache (K7)
    • high-speed serial bus instead of FSB (K7: northbridge to southbridge, K8)
    • integrated memory controller (K8)
    • 64-bit (K8)
    • native dual core design (K8)


    Of course, if this appears on a "History of AMD CPUs", which does need an update since Ryzen came out, us AMD users won't complain as much :)
