Microsoft Patch Enables Hotswapping AMD GPUs In Linux Systems


As one of the leading cloud computing companies, Microsoft uses both AMD's data center GPUs and Linux in its servers. Sometimes those GPUs need to be added to a running machine or replaced, which currently requires shutting down the server before swapping the card. To enable hot-plug replacement, meaning that a GPU can be pulled out of its PCIe slot and replaced while the computer is still running, Microsoft has developed a patch that enables hot-plugging AMD GPUs on Linux servers.

Microsoft's AMD GPU PCIe hot-plug patch for Linux has been posted on the mailing list and GitHub for review and testing, reports Phoronix. The patch is aimed at Microsoft's Azure machines, which may benefit from the ability to hot-plug GPU-based accelerators when needed.

"We are from Microsoft Research and are working on GPU disaggregation technology," a code review request reads. "We have created a patch […], which will enable PCIe hot-plug support for AMD GPU. […] We believe the support of hot-plug of GPU devices can open doors for many advanced applications in data center in the next few years, and we would like to have some reviewers on this PR so we can continue further technical discussions around this feature." 

While Microsoft did not disclose any details about its GPU disaggregation tech, it looks to be a proprietary capability that allows Azure machines to dynamically add GPU acceleration to servers that do not have the cards physically installed. Since the machines that house the actual GPU hardware operate in tough conditions (compute and gaming GPUs tend to run hot, overheat, and fail), GPU hot-plug support will be particularly useful for them.

Hot-plugging a graphics board or accelerator isn't entirely new, but doing it via the PCIe interface is. Years ago, AMD developed a driver that enables hot-plugging a graphics card into a Thunderbolt 3 port using an eGFX box. However, it looks like AMD doesn't support this functionality for its data center GPUs just yet. This is apparently where Microsoft's engineers come in, assisting their partner while benefiting the company's Azure division.

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.