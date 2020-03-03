AMD is currently the only vendor with both x86 processors and discrete graphics cards under one roof, at least until Intel's Xe graphics roll out, which gives Team Red some flexibility. That combination has proven particularly useful in the world of high-performance computing (HPC), as evidenced by an AMD presentation at the Rice Oil and Gas HPC conference yesterday.

AMD initially announced at its Next Horizon event in 2018 that it would extend the Infinity Fabric of its Radeon Instinct MI60 data center GPUs to enable a 100 Gbps link between GPUs, much like Nvidia's NVLink. But with its Frontier supercomputer announcement in May 2019, AMD divulged that it would expand the approach to enable memory coherency between CPUs and GPUs.

(Image credit: Twitter)

The annual Rice Oil and Gas HPC event hasn't concluded yet, but according to a tweet posted yesterday by Intersect 360 Research analyst Addison Snell, AMD announced that future Epyc+Radeon generations will include shared memory/cache coherency between the GPU and CPU over the Infinity Fabric, similar to what AMD enabled in its Raven Ridge Ryzen products.

We also got a glimpse of some slides presented at Rice Oil and Gas, courtesy of a tweet from Extreme Computing Research Center senior research scientist Hatem Ltaief.

(Images 1 through 5: slides from AMD's presentation. Image credit: Twitter @HatemLtaief @addisonsnell)

AMD's charts highlight the gap in power efficiency between various compute solutions, such as semi-custom SoCs and FPGAs, GPGPUs, and general-purpose x86 compute cores, and show FLOPS performance relative to both the power consumed and the silicon area required to deliver that performance. As we can see, general-purpose CPUs lag behind, though vectorized code that uses dedicated SIMD pathways can improve both metrics. However, GPUs still hold a commanding lead in both power efficiency and area efficiency.

Leveraging cache coherency, as the company does with its Ryzen APUs that unite the Zen x86 architecture and Radeon Vega graphics cores, enables the best of both worlds and, according to the slides, unifies the data and provides a "simple on-ramp to CPU+GPU for all codes."

AMD also provided some examples of the code required to use a GPU without unified memory. By contrast, coding for a unified memory architecture alleviates much of that burden.
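To make that difference concrete, here is a minimal host-only sketch in C++. It is not AMD's code from the slides and it calls no real GPU API: the "device" is simply a second heap buffer, and `TransferStats`, `scale_kernel`, `scale_with_explicit_copies` and `scale_with_shared_memory` are hypothetical names invented for illustration. The assumption is that a discrete-memory workflow has to stage data into a separate buffer and copy results back, while a coherent shared-memory workflow lets the kernel operate on the caller's buffer directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only simulation (assumption): no real GPU API is used. "Device memory"
// is just a second heap buffer, and bytes_copied stands in for bus traffic.
struct TransferStats {
    std::size_t bytes_copied = 0;  // bytes moved between "host" and "device"
};

// Stand-in for a GPU kernel: scales every element in place.
void scale_kernel(float* data, std::size_t n, float factor) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Discrete-memory style: allocate a separate buffer, copy in, compute, copy out.
std::vector<float> scale_with_explicit_copies(const std::vector<float>& host,
                                              float factor,
                                              TransferStats& stats) {
    const std::size_t bytes = host.size() * sizeof(float);

    std::vector<float> device(host.size());               // "device" allocation
    std::copy(host.begin(), host.end(), device.begin());  // host -> device
    stats.bytes_copied += bytes;

    scale_kernel(device.data(), device.size(), factor);

    std::vector<float> result(host.size());
    std::copy(device.begin(), device.end(), result.begin());  // device -> host
    stats.bytes_copied += bytes;
    return result;
}

// Coherent shared-memory style: the kernel works on the caller's own buffer.
void scale_with_shared_memory(std::vector<float>& shared, float factor,
                              TransferStats& stats) {
    (void)stats;  // nothing is staged, so the counter is left untouched
    scale_kernel(shared.data(), shared.size(), factor);
}
```

Calling both functions on the same input produces the same scaled values, but the explicit-copy path records `2 * n * sizeof(float)` bytes of staging traffic plus two extra allocations, while the shared-memory path records none. In this toy model, that bookkeeping is the coding burden the unified approach removes.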

AMD famously embraced the Heterogeneous System Architecture (HSA) to tie together Carrizo's fixed-function blocks, touting that feature in its marketing materials. Much like the approach of extending an Infinity Fabric link between the CPU and GPU, HSA provides a pool of cache-coherent shared virtual memory that eliminates data transfers between components to reduce latency and boost performance. But while the company still appears to be a member of the HSA Foundation, it no longer actively promotes that functionality in communications with the press.

Data transfers often consume more power than the actual computation itself, so eliminating those transfers boosts both performance and efficiency. Extending those benefits to the system level by sharing memory between discrete GPUs and CPUs gives AMD a tangible advantage over its competitors in the HPC space.

AMD has blazed a path in this regard and secured big wins for exascale-class systems, but Intel is also working on its Ponte Vecchio architecture, which will power the Aurora supercomputer at the U.S. Department of Energy's (DOE's) Argonne National Laboratory. Intel's approach leans heavily on its oneAPI programming model and also aims to tie together shared pools of memory between the CPU and GPU (lovingly named Rambo Cache). It will be interesting to learn more about the two different approaches.

Meanwhile, Nvidia might suffer in the supercomputer realm because it doesn't produce both CPUs and GPUs and, therefore, cannot enable similar functionality. While both AMD and Intel have won exceedingly important contracts for the U.S. DOE's exascale-class supercomputers, Nvidia hasn't made any announcements about such wins, despite its dominant position in GPU-accelerated artificial intelligence workloads in the HPC and data center space.