Interconnects may be a dry subject to some, but they're the fundamental underpinnings of today's processor architectures. Interconnects are becoming even more relevant as the industry grapples with the waning Moore's Law, and Intel has yet another new interconnect in the works that purportedly offers improvements over existing packaging technology.
It's already been a busy year on the interconnect scene; Intel unveiled its new mesh architecture, and AMD also pushed its Infinity Fabric-powered designs out to market. Those designs will power the companies' respective products for years to come.
At the annual Hot Chips semiconductor conference, Intel presented the company's new EMIB (Embedded Multi-Die Interconnect Bridge), which is a complicated name for a technique that provides high-speed communication between several chips. The packaging technology allows Intel to assemble Lego-like components ("chiplets") into single heterogeneous packages.
Intel envisions eventually using EMIB to bring FPGAs, ASICs, CPUs, and HBM memory all onto the same package, possibly ushering in a wave of new customized solutions that wield tremendous compute capabilities.
Intel originally painted the new technology in broad strokes during its Manufacturing Day earlier this year, but it dove into more details at the conference. Intel purchased Altera for $16.7 billion in 2015 and folded the FPGA company into its Programmable Solutions Group. Now, Intel is using EMIB packaging for its new Stratix 10 FPGAs, which we'll cover shortly. First, the basics.
Monolithic Versus Heterogeneous
A traditional monolithic design places several distinct components, such as CPU cores, I/O controllers, and graphics cores, on a single large die. As chips get physically larger (die area increases), the chance of manufacturing defects increases, thus reducing yields. Also, some components don't scale well to smaller nodes. So, while the 10nm CPU cores may be fully functional, shrinking the I/O controller could be more challenging. Other components are also optimized on certain nodes for specific tasks.
Transitioning to a heterogeneous design, which combines several separate components onto a single package, solves several problems. First, the smaller dies sidestep yield problems by reducing the chance of catastrophic defects. It also allows Intel to combine several different components with different processes onto the same package. That lets the company use larger nodes for the harder-to-shrink or purpose-built components. It also decouples development cycles for analog and digital devices, thus decreasing time to market.
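To put rough numbers on the yield argument, here is a minimal sketch using the textbook Poisson yield model, in which the fraction of defect-free dies falls off exponentially with die area. Intel did not share yield data, so the defect density, the die area, and the four-way split below are all hypothetical values picked purely for illustration.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Expected fraction of dies with zero defects under a Poisson model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_per_good_unit(area_mm2: float, defects_per_mm2: float) -> float:
    """Average silicon area (mm^2) fabricated for each defect-free die."""
    return area_mm2 / poisson_yield(area_mm2, defects_per_mm2)


# Hypothetical inputs, not Intel figures.
DEFECT_DENSITY = 0.001    # defects per mm^2 (assumption)
MONOLITHIC_AREA = 600.0   # mm^2 for one large die (assumption)
CHIPLET_COUNT = 4         # split the same logic into four equal dies (assumption)

chiplet_area = MONOLITHIC_AREA / CHIPLET_COUNT

monolithic_yield = poisson_yield(MONOLITHIC_AREA, DEFECT_DENSITY)
chiplet_yield = poisson_yield(chiplet_area, DEFECT_DENSITY)

# Silicon fabricated per working product, assuming each chiplet is tested
# before assembly so that only good dies are packaged.
monolithic_cost = silicon_per_good_unit(MONOLITHIC_AREA, DEFECT_DENSITY)
chiplet_cost = CHIPLET_COUNT * silicon_per_good_unit(chiplet_area, DEFECT_DENSITY)

print(f"monolithic die yield: {monolithic_yield:.1%}")
print(f"per-chiplet yield:    {chiplet_yield:.1%}")
print(f"silicon per good monolithic part: {monolithic_cost:.0f} mm^2")
print(f"silicon per good 4-chiplet part:  {chiplet_cost:.0f} mm^2")
```

Under this model, the saving comes from testing dies before assembly: a defect scraps one small die rather than the entire large one. The sketch ignores packaging and assembly losses, which would eat into that advantage.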
The idea is simple in concept, but providing a low latency, low power, and high bandwidth interconnect between the components is a major challenge. Doing it at a reasonable cost is even harder.
Building the Heterogeneous Chip
We already see heterogeneous designs with existing processors such as Intel's Knights Landing. Similar techniques are also used for connecting HBM packages to GPUs, such as Nvidia's Volta and AMD's Vega. AMD's EPYC and Threadripper processors also combine several chips into a single logical processor, but the approach is somewhat different because each chip is a fully functioning unit. There are different paths, with varying levels of efficiency and performance, to constructing heterogeneous chips, but Intel claims they aren't as sophisticated as EMIB.
There are two popular high-performance options. The first is a multi-chip package, which connects the die through the package substrate. The design suffers from poor connection density in the substrate and where the interconnects meet the die (notice the spacing of the red lines).
Silicon interposers (2.5D packaging) are another technique. They consist of a thin layer of silicon that slots in between the die and the package substrate. Die-to-die communication occurs through the silicon interposer, which increases connection density. It also improves the die-to-interposer connection density. Unfortunately, silicon interposers are more expensive due to the size of the interposer and TSV (through-silicon via) technology. Also, all connections on the die have to travel through the silicon interposer, which is important to remember for later. As an aside, AMD's Threadripper and EPYC processors use an organic interposer to connect their Zeppelin die.
Intel claims EMIB, at the bottom of the slide, solves several issues in one fell swoop. Intel embeds small silicon bridges into the package substrate that allow for tighter interconnect density on the periphery of the die. They also reduce cost because they don't use TSVs and are smaller than silicon interposers.
Zooming in on the key areas also unveils another advantage. The silicon bridge features four layers of metal interconnects for die-to-die communication (bottom zoom image). These interconnects have a much tighter pitch than you can achieve with a package substrate.
The microbumps on the periphery of the die connect to the silicon bridge and feature a 55-micrometer pitch, which Intel claims will scale down to 35 micrometers with next-gen products. 10-micrometer microbumps are also on the seven-year roadmap. So, there is room to grow. Or shrink, rather.
The remainder of the connections feature a standard >100-micrometer Flip-Chip pitch.
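To get a feel for what those pitch figures mean for connection density, here is a small sketch that converts pitch into bumps per square millimeter. It assumes a simple square grid, which is our simplification and not something Intel specified, and it treats the 10-micrometer roadmap figure as a pitch. Because the flip-chip pitch is quoted as greater than 100 micrometers, the first row is an upper bound on that density.

```python
def bumps_per_mm2(pitch_um: float) -> float:
    """Bumps per square millimeter for a square grid with the given pitch."""
    bumps_per_mm = 1000.0 / pitch_um  # bumps along one millimeter
    return bumps_per_mm ** 2


# Pitches quoted in the article; the square-grid layout is an assumption.
pitches_um = [
    ("standard flip-chip", 100),
    ("EMIB microbumps today", 55),
    ("EMIB next-gen", 35),
    ("seven-year roadmap", 10),
]

baseline = bumps_per_mm2(100)
for label, pitch in pitches_um:
    density = bumps_per_mm2(pitch)
    print(f"{label:<22} {pitch:>4} um  {density:>8.0f} bumps/mm^2  "
          f"({density / baseline:.1f}x flip-chip)")
```

Density scales with the inverse square of the pitch, so even the modest step from 55 to 35 micrometers more than doubles the number of connections in the same area.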
Silicon interposers are large, so all of the traffic, including RF signals, I/O, and power, has to travel through them. EMIB transmits only digital signals across the smaller microbumps into the silicon bridge, while other signals still travel through the package substrate. That allows for increased silicon bridge density, and it reduces noise. As pictured above, each package can have multiple silicon bridges. Silicon interposers have size constraints, whereas EMIB-based packages can be much larger.
Using EMIB, Intel can package die within 100 microns of one another. That reduces space between the components, which also reduces the power required for data movement. Of course, it still isn't as efficient as an on-die monolithic implementation. At the show, Intel didn't provide power comparisons to standard silicon interposers. Instead, it compared EMIB to standard interconnects, like PCIe and DDR, found on motherboards.
We do know that a single chip can support up to 20,000 EMIB connections with up to 2Gbps of throughput each.
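Multiplying those two figures out gives a theoretical ceiling on die-to-die bandwidth. The sketch below assumes every connection carries payload data at the full 2Gbps simultaneously, which Intel did not claim; in practice some lines are likely reserved for clocking and control, so treat the result as an upper bound.

```python
# Figures quoted by Intel at Hot Chips.
CONNECTIONS = 20_000          # maximum EMIB connections per chip
GBPS_PER_CONNECTION = 2       # peak throughput per connection, in Gbps

# Assumption: all connections carry data at peak rate at the same time.
aggregate_gbps = CONNECTIONS * GBPS_PER_CONNECTION
aggregate_tbps = aggregate_gbps / 1000
aggregate_tbytes_per_s = aggregate_tbps / 8  # 8 bits per byte

print(f"aggregate: {aggregate_gbps} Gbps "
      f"= {aggregate_tbps} Tbps "
      f"= {aggregate_tbytes_per_s} TB/s")
```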
Enter The Chiplet
So, how does this high-speed interconnect impact the Stratix 10 FPGA?
Intel is working with partners to develop chiplets, which are small re-usable IP building blocks that can be processors, transceivers, memory, or other types of components. Intel can mix and match the chiplets, much like Lego blocks, and connect them to the FPGA to create custom designs for different application use-cases.
Chiplets are a great idea, but a standardized SiP interface between components is important, especially if other companies are involved in the effort. The AIB and UIB interface IP blocks we see on the Stratix 10 die are the key. UIB connections are a general-purpose SiP interface for HBM and ASICs, while AIB supports transceiver connections and other general-purpose components.
Intel subdivides the AIB PHY into 25 logical channels, with 24 exposed to the user and one dedicated to handshaking operations. We can also see the standard EMIB interface, with the smaller microbumps dedicated to the digital interface while the larger connections handle analog.
Chiplet designers don't have to worry about specialized programming to leverage the interface--Intel claimed it abstracts away the complexity so the designer can focus on IP design. Intel didn't share details on the ostensibly more complex UIB interface, but we do know that both interfaces support up to 2Gbps per physical line (programmable).
These likely won't be open standards, so it is unlikely that we will see other fabs producing similar heterogeneous packages. However, Intel invites others to develop chiplets for its devices. Chiplet verification is also of the utmost importance to ensure reliability, so Intel has developed a framework of governing rules.
Here we see an example of several different configurations with the Stratix 10 FPGA. One design employs six transceivers, while the other uses four transceivers and two HBM die. The configuration can be armed with many different types of chiplets.
The goal is to eventually create incredible new combinations. As shown on the third slide, you could connect CPUs, HBM, and a machine learning ASIC (think Nervana) to an FPGA to create a heterogeneous design. That would be an unheard-of combination with massive performance potential. In fact, the company demoed an experimental FPGA and Skylake combination on the same package last year, so the next-generation EMIB interconnect is likely well underway.
Intel noted that three different foundries are producing six different technology nodes, and because no single company has a monopoly on innovation, each can create certain customized solutions that are better than its competitors'. Building a single monolithic die requires a company to be the best at everything, whereas picking and choosing between chiplets from other vendors allows companies to focus on their core competencies.
Intel plans to expand its EMIB interconnect to its other products--many speculate that EMIB will make its way into the next-generation Xeons. Intel hasn't confirmed a specific timeline, but it's only a matter of time. Intel's 3D XPoint will likely come on-package soon, and EMIB appears to be a great fit.
Intel's EMIB has great promise, but it will be costly. Opening the interconnect up as a standard would facilitate faster adoption.
Of course, most of us want to know if it will come to desktop processors as well, but that doesn't seem as likely in the near future. It's logical to assume it might come as Intel pushes the boundaries of Moore's Law.
But Intel, long the keeper of the Moore's Law flame, certainly doesn't want us to think that its new approach is an admission that Moore's Law is waning. The company cited a quote from one of Moore's famed papers (PDF) that states that this type of approach might be an economical solution. Talk about foresight.