There's a lot of technological ground to cover following today's announcements from AMD. Genoa and Bergamo, the 3D V-Cache-powered Milan-X, and the Instinct MI200 MCM (Multi-Chip Module) GPUs aside, one element stands at the crossroads of all of these technologies: AMD's Infinity Fabric. With version 3.0 introduced today, the interconnect has continued to evolve dramatically since AMD first detailed the scheme in March 2020.
In many ways, AMD's Infinity Fabric is an extension of the company's long-held Heterogeneous System Architecture (HSA) ambitions; it now powers both intra- and inter-chip communication across AMD's CPU and GPU solutions. There's no single technology that can be described as the Infinity Architecture; the name aggregates several interconnect technologies employed on AMD's latest products, culminating in a coherent CPU + GPU fabric that aims to improve system performance (and especially HPC performance) by leaps and bounds.
Infinity Fabric 3.0 ushers in the future AMD envisioned with its "The Future is Fusion" marketing campaign way back in 2008. It brings a coherent communications bus for interconnectivity and resource sharing between the company's CPU and GPU solutions, allowing for increased performance, lower latencies, and lower power consumption.
The idea is simple: moving data is expensive in both time and energy. Hence, the Infinity Architecture is designed to reduce data movement between storage banks (whether VRAM, system RAM, or CPU cache) as much as possible. If every piece of the hardware puzzle is aware of what information lies where and can access it on an "as-needed" basis, big performance gains can be realized.
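To make that contrast concrete, here's a minimal HIP sketch comparing the conventional copy-in/copy-out pattern with a single coherent allocation that both processors dereference directly. The kernel and buffer sizes are purely illustrative, and whether the managed path actually avoids physical copies depends on the platform's coherence support:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Illustrative kernel: doubles every element of a buffer.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    const dim3 grid((n + 255) / 256), block(256);

    // Conventional path: the host buffer must be bulk-copied across the bus
    // into VRAM before the GPU can touch it, then copied back afterwards.
    float* host = new float[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;
    float* dev = nullptr;
    hipMalloc(&dev, bytes);
    hipMemcpy(dev, host, bytes, hipMemcpyHostToDevice);   // explicit data movement
    hipLaunchKernelGGL(scale, grid, block, 0, 0, dev, n, 2.0f);
    hipMemcpy(host, dev, bytes, hipMemcpyDeviceToHost);   // move the result back

    // Coherent path: one allocation that both processors can dereference.
    // On hardware with a coherent CPU-GPU link, accesses can be serviced
    // in place rather than through wholesale buffer copies.
    float* shared = nullptr;
    hipMallocManaged(&shared, bytes);
    for (int i = 0; i < n; ++i) shared[i] = 1.0f;         // CPU writes directly
    hipLaunchKernelGGL(scale, grid, block, 0, 0, shared, n, 2.0f);
    hipDeviceSynchronize();
    printf("host[0] = %.1f, shared[0] = %.1f\n", host[0], shared[0]);  // both 2.0

    hipFree(dev);
    hipFree(shared);
    delete[] host;
    return 0;
}
```

On a PCIe-attached GPU the runtime still has to migrate pages behind the scenes; a coherent CPU-GPU link is what lets the hardware service those accesses in place.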
AMD's Infinity Architecture 3.0 builds upon its Infinity Fabric technology in almost every conceivable way. The previous-generation Infinity Fabric architecture forced communication between CPU and GPU to happen (non-coherently) over the PCIe bus - which means peak theoretical bandwidth would scale up to that link's limit (16 GT/s per lane for PCIe 4.0, or roughly 32 GB/s in each direction across an x16 slot), but no more. It also limited the maximum number of PCIe-interconnected GPUs in a dual-socket system to four graphics cards. The new Infinity Architecture, however, enables all of this communication to happen over the Infinity Fabric 3.0 link, eliminating non-coherent PCIe traffic altogether, though the link does fall back to PCIe if needed.
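For a back-of-the-envelope sense of that ceiling, this small, purely illustrative program derives the usable PCIe 4.0 x16 bandwidth from the 16 GT/s per-lane signaling rate and the bus's 128b/130b line encoding:

```cpp
#include <cstdio>

int main() {
    // PCIe 4.0 signals at 16 GT/s per lane; 128b/130b encoding means only
    // 128 of every 130 bits carry payload. One transfer moves one bit per lane.
    const double transfers_per_s = 16e9;            // 16 GT/s per lane
    const double payload_fraction = 128.0 / 130.0;  // line-encoding overhead
    const int lanes = 16;                           // x16 slot
    const double bytes_per_s = transfers_per_s * payload_fraction * lanes / 8.0;
    printf("PCIe 4.0 x16 peak: ~%.1f GB/s per direction\n", bytes_per_s / 1e9);
    // Prints: PCIe 4.0 x16 peak: ~31.5 GB/s per direction
    return 0;
}
```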
Additionally, the Infinity Fabric now provides a 400 GB/s bi-directional link between the two GPU dies found in the MI250X, enabling the first productized multi-chip GPU.
The new, improved Infinity Architecture not only provides a coherent communication channel across a dual-socket EPYC CPU system, but also increases the maximum number of simultaneous GPU connections from four to eight. The speed at which graphics cards talk to one another has also been greatly improved - the Infinity Architecture now allows for 100 GB/s of bandwidth through each of its Infinity Fabric links, providing enough throughput to feed entire systems with up to two EPYC CPUs and eight GPU accelerators.
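As a sketch of how software might exploit such a topology, the following generic HIP snippet (a common pattern, not AMD sample code) probes and enables direct peer-to-peer access among however many accelerators a node exposes; whether the resulting peer traffic rides coherent Infinity Fabric links or falls back to PCIe depends on the platform:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    hipGetDeviceCount(&count);  // up to eight GPUs on the topology above

    for (int i = 0; i < count; ++i) {
        hipSetDevice(i);
        for (int j = 0; j < count; ++j) {
            if (i == j) continue;
            int can = 0;
            hipDeviceCanAccessPeer(&can, i, j);
            if (can) {
                // Lets kernels running on device i dereference device j's
                // memory directly instead of staging through host RAM.
                hipDeviceEnablePeerAccess(j, 0);
                printf("GPU %d -> GPU %d: peer access enabled\n", i, j);
            }
        }
    }
    return 0;
}
```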