Latest Windows Server 2022 Update Causes BSODs on AMD EPYC With VBS Enabled

(Image credit: Inspur)

Neowin reports that Microsoft's latest Windows Server 2022 update, KB5031364, is triggering blue screens of death (BSODs) on guest VMs hosted by VMware ESXi hypervisors running on AMD's EPYC server CPUs. Microsoft confirms that the issue is partially related to Virtualization Based Security (VBS) being enabled in Windows Server 2022, a feature with a history of stability issues. If you've installed this update, Microsoft recommends disabling "Expose IOMMU to guest OS" to work around the issue.

Specifically, this new bug affects the start-up procedure on guest VMs running on VMware's ESXi hypervisor. When a startup failure occurs, users can expect a stop error featuring the words "PNP Detected Fatal Error." According to Microsoft's notes, a VM with the update can hit the problem if all of the following are true: the host uses an AMD EPYC server CPU, "Expose IOMMU to guest OS" is enabled in the VMware settings for the VM, and both "Enable Virtualization Based Security" and "System Guard Secure Launch" are enabled on the Windows Server 2022 guest.
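To make the combination of conditions concrete, here is a minimal Python sketch of the check an admin might run over an inventory of VMs. It only encodes the conditions listed above; the `GuestVM` class and all of its field names are hypothetical, made up for this illustration, and do not correspond to any VMware or Microsoft API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GuestVM:
    # Hypothetical inventory record; field names are illustrative only.
    guest_os: str
    has_kb5031364: bool
    host_cpu_is_amd_epyc: bool
    iommu_exposed_to_guest: bool   # "Expose IOMMU to guest OS" in VMware
    vbs_enabled: bool              # "Enable Virtualization Based Security"
    secure_launch_enabled: bool    # "System Guard Secure Launch"


def is_affected(vm: GuestVM) -> bool:
    """Return True only when every condition from Microsoft's notes holds."""
    return (
        vm.guest_os == "Windows Server 2022"
        and vm.has_kb5031364
        and vm.host_cpu_is_amd_epyc
        and vm.iommu_exposed_to_guest
        and vm.vbs_enabled
        and vm.secure_launch_enabled
    )


# Example: a VM matching every condition, then the same VM with the
# recommended workaround applied (IOMMU no longer exposed to the guest).
vm = GuestVM("Windows Server 2022", True, True, True, True, True)
patched = replace(vm, iommu_exposed_to_guest=False)
print(is_affected(vm), is_affected(patched))
```

Because the conditions are combined with a logical AND, clearing any single one (as the workaround does with the IOMMU setting) takes the VM out of the affected set.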

From this information, it's likely that Microsoft's October 2023 update conflicts with the virtualization and Windows Defender firmware protection mechanisms on systems running EPYC processors. Sadly, this isn't surprising. Windows' virtualization features have had stability issues ever since Microsoft first released Virtualization Based Security with Windows 10 (and its Windows Server counterpart).

If you're unfamiliar with VBS, it is a security measure that runs certain parts of the operating system in a virtualized environment. One feature that uses VBS is Memory Integrity (found in the Windows Security app). Memory Integrity is a kernel code integrity process that uses VBS as an additional layer of security, protecting it against vulnerabilities such as buffer overflows that allow malware to modify memory.

But Memory Integrity (and by extension VBS) has been known to cause stability issues with specific updates and drivers. You can find countless reports of users all over the web disabling Memory Integrity to fix BSOD problems on Windows 10 and 11 systems. Microsoft even has a feature built into Windows 10/11 that automatically disables Memory Integrity if your system fails to boot up. On top of this, VBS can also slow down applications and reduce gaming performance.

Thankfully, this issue is pretty specific, so users running Windows Server 2022 outside of a VMware ESXi guest or on machines with Intel hardware shouldn't be affected. However, for users with the affected configuration, the issue can be catastrophic, especially on virtual machines running mission-critical software. Those affected shouldn't have to wait too long for a fix, though. According to Microsoft's notes, the company is working on one and "estimate[s] a solution will be available in mid-November 2023."

Aaron Klotz
Contributing Writer

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.

  • tamalero
    is VBS really that required "feature"? I mean, if its this unstable...
  • TechieTwo
    Microsoft is in the wrong business IMNHO.
  • sjkpublic
    Here come the lawyers. This is a good one. MS changes taking out a major competitor and seizing control of a major virtualization market. All in the name of security. Wouldnt it be nice if the OS was written for security instead of dealing with 3rd party contracts?
  • rluker5
    A specific series of CPUs somehow working in an incompatible fashion with what was likely assumed by the software engineers as a safe improvement to make might be indicative of an exploitable security vulnerability of those specific CPUs. They are handling virtualization based security differently than the rest.
    It is a normal feature of Windows that showed up in the latest Windows Server. And even if Windows can get the basic security feature working normally on Epyc, those chips known malfunction with specific code might be exploitable, maybe. Might be an I/O die thing.