AMD Ryzen 7000 Burning Out: EXPO and SoC Voltages to Blame (AMD Responds)
Impacts all motherboard makers and all Ryzen 7000 chips.
Update 4/27/2023, 7:40 am PT: AMD has now issued a second statement clarifying that it has identified the root cause, which is in fact the SoC voltage we identified below in our coverage. AMD has issued new firmware to reduce SoC voltages to 1.3V. You can read the second statement here.
Update 4/25/2023, 10:41 am PT: AMD has responded to the numerous reports of chip failures with a short statement acknowledging that claims do exist and that the company is investigating. The statement confirms that AMD is working with its ODM partners (motherboard makers) to ensure safe voltage settings are applied to its Ryzen 7000X3D CPUs, but doesn't name the specific actions that it is taking:
"We are aware of a limited number of reports online claiming that excess voltage while overclocking may have damaged the motherboard socket and pin pads. We are actively investigating the situation and are working with our ODM partners to ensure voltages applied to Ryzen 7000X3D CPUs via motherboard BIOS settings are within product specifications. Anyone whose CPU may have been impacted by this issue should contact AMD customer support." -- AMD Spokesperson to Tom's Hardware.
Notably, the statement does not acknowledge the multiple reports of failures with standard Ryzen 7000 processors. ASUS has also issued a statement, clarifying that it will issue firmwares that limit SoC voltage to 1.3V. We're following up for more detail with AMD and will update as needed. Our original coverage with deeper details about the issues follows:
Update #2 4/26/2023 4:00pm PT: Multiple motherboard vendors have now issued press releases pointing to new firmwares they will release in the coming days, with many citing SoC voltage as the adjusted parameter. You can read the vendor statements here.
Original Article 4/24/2023, 9:49 pm PT: Multiple reports of Ryzen processors burning out have burst onto the internet over the last few days. The damaged chips have not only bulged out and overheated to the point they have become desoldered, but they have also done significant damage to the motherboards they are installed in. We reached out to our industry contacts and learned some new information about the nature of the problem and the scope of AMD's planned fix. Our information comes from multiple sources that wish to remain anonymous, but the info from our sources aligns on all key technical details. As with all unofficial information, we should take the finer details with a grain of salt until AMD issues an official statement.
First, we're told this condition can occur with both standard Ryzen 7000 models and the new Ryzen 7000X3D chips, though the latter are far more sensitive to the condition, and the root cause could be different between the two types of chips. AMD will issue a fix soon, but the timeline is unknown. We're told that failures have occurred with all motherboard brands, including Biostar, ASUS, MSI, Gigabyte, and ASRock.
According to our sources and seconded by an ASUS statement to Der8auer, the problem stems from SoC voltages being raised to unsafe levels. This can happen either through the pre-programmed voltages used to support EXPO memory overclocking profiles or when a user manually adjusts the SoC voltage (a common practice to eke out a bit more memory overclocking headroom).
Our sources also added further details about the nature of the chip failures: in some cases, excessive SoC voltages destroy the chip's thermal sensors and thermal protection mechanisms, completely disabling its only means of detecting and protecting itself from overheating. As a result, the chip continues to operate without knowing its temperature or tripping the thermal protections.
AMD's modern chips often run at their thermal limits to squeeze out every last drop of performance within their safe thermal range (it isn't uncommon for them to run at 95C during normal operation), so they automatically continue to draw more power until they must dial back to remain within a safe temperature. In this case, the loss of the temperature sensors and protection mechanisms allows the chip to draw power beyond the recommended safe limits. This excessive power draw leads to overheating that eventually causes physical damage to the chip, like the bowing we've seen on the outside of several chip packages, or the desoldering reported by Der8auer.
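To make the failure mode described above more concrete, here is a toy Python sketch of a boost loop that raises power until the reported temperature hits a limit. Everything in it is an assumption made for illustration: the function name, the wattages, the thermal constant, and especially the idea that a destroyed sensor behaves like a reading frozen at its last value. It is not AMD's actual boost or protection algorithm.

```python
def simulate(sensor_works: bool, steps: int = 200) -> tuple[float, float]:
    """Toy boost loop. Returns (final_power_w, final_true_temp_c).

    All constants are hypothetical and chosen only to illustrate the idea.
    """
    ambient_c = 25.0
    limit_c = 95.0        # thermal limit the controller tries to respect
    c_per_watt = 0.5      # hypothetical: degrees C gained per watt drawn
    step_w = 2.0          # power adjustment per control step
    max_power_w = 400.0   # hypothetical ceiling so the toy model stays bounded

    power = 50.0
    true_temp = ambient_c + c_per_watt * power
    reported = true_temp  # what the controller believes the temperature is

    for _ in range(steps):
        if sensor_works:
            reported = true_temp  # a healthy sensor tracks the real temperature
        # Assumption: a destroyed sensor leaves 'reported' frozen at its last value.

        if reported < limit_c:
            power = min(power + step_w, max_power_w)  # headroom left: boost
        else:
            power = max(power - step_w, 0.0)          # at the limit: back off

        # Simplified thermal model with no thermal mass.
        true_temp = ambient_c + c_per_watt * power

    return power, true_temp


if __name__ == "__main__":
    print("healthy sensor:", simulate(sensor_works=True))
    print("dead sensor:   ", simulate(sensor_works=False))
```

With a working sensor, the loop hovers at the limit. With the frozen reading, the controller always sees headroom, so power and real temperature climb until something else stops them, which is the runaway behavior the sources describe.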
The chip continues to receive excessive current through the motherboard socket during this death spiral of sorts, thus leading to the visible damage we can see in the socket to the vCore pins and the bulging on the chip's LGA pads. However, less visible damage also extends to the CPU SoC, CPU_VDDCR_SOC, and CPU VDD MISC rails/pins — they just don't pull enough current to leave visible scorching like we see with the vCore pins.
We do know that 1.25V is the recommended safe SoC voltage limit, and we're told that 1.4V and beyond definitely increases the likelihood of the condition occurring. To be clear, running beyond 1.4V doesn't ensure that your chip will burn out, but your odds will increase. Conversely, 1.35V appears to be "safe." Proceed at your own risk, though. [EDIT: AMD has issued a statement, clarifying that it will issue firmwares that limit SoC voltage to 1.3V. As such, this appears to be the maximum safe limit.]
Our sources say that AMD is working on a fix that includes a voltage cap or lock in the firmware/SMU, which should prevent EXPO memory profiles and simple BIOS manipulations from exceeding an as-yet-undefined limit. We're also told that AMD can't completely prevent SoC voltage manipulations because the amount fed to the chip is dictated by the VRMs, leaving a means for crafty motherboard vendors to allow voltage changes despite AMD's lock (this would not be the first time motherboard vendors have circumvented limits to offer rare functionalities).
A few motherboard vendors, like ASUS and MSI, have already issued new BIOSes to correct some of the issues. However, we have confirmed that failures have also occurred on Biostar, ASRock, and Gigabyte boards, so all vendors are impacted to some degree.
As with all forms of overclocking, any damage from using an EXPO overclocking profile is not covered by your warranty, but given the situation, we don't think that AMD or the motherboard vendors would use the lack of warrantied EXPO support to invalidate warranties.
The advertised performance you get from an EXPO profile is also not guaranteed by the chipmaker. It's also noteworthy that AMD's purportedly planned SoC voltage cap could lead to lower stable memory overclocking frequencies. We don't think that will matter too much to most Ryzen 7000 owners, as the sweet spot DDR5-6000 should work just fine within the proposed limits. However, extreme overclockers and those pushing the very bleeding edge of performance could end up with lower overclocking limits. Time will tell.
For now, you could take a few common sense approaches to potentially protect your chip while we await an official statement from AMD — but proceed at your own risk.
This condition means that, even though the odds are small, an EXPO profile could lead to physical damage to your chip and motherboard. If you use an EXPO profile, you should check your SoC voltage in your BIOS or with a utility like HWiNFO. If it is at or exceeds 1.4V, you should disable the profile and run the memory at standard stock settings. If you have manually dialed in a 1.4V or higher SoC voltage, dial that back to a safer setting for now. [EDIT: AMD later confirmed that 1.3V is the maximum safe voltage.]
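As a rough illustration of the guidance above, the following Python sketch sorts an SoC voltage reading into the bands mentioned in this article: the 1.25V recommended limit, the 1.3V cap from AMD's later firmware, and the 1.4V level where failure odds reportedly rise. The function name and the wording of the messages are our own, and you would still need to read the voltage yourself from the BIOS or a utility like HWiNFO. Treat it as a sketch, not an official AMD tool.

```python
RECOMMENDED_V = 1.25   # "recommended safe SoC voltage limit" cited in the article
FIRMWARE_CAP_V = 1.30  # cap applied by AMD's updated firmware, per the later edit
HIGH_RISK_V = 1.40     # at or above this, failure odds reportedly increase


def assess_soc_voltage(volts: float) -> str:
    """Map an SoC voltage reading (in volts) to a coarse, hypothetical advice string."""
    if volts <= 0:
        raise ValueError("SoC voltage must be a positive number of volts")
    if volts >= HIGH_RISK_V:
        return "high risk: disable EXPO or lower the SoC voltage"
    if volts > FIRMWARE_CAP_V:
        return "above the 1.3V cap: consider lowering the SoC voltage"
    if volts > RECOMMENDED_V:
        return "within the 1.3V cap but above the 1.25V recommendation"
    return "within the recommended limit"


if __name__ == "__main__":
    # Example readings typed in by hand, not measured values.
    for reading in (1.20, 1.28, 1.35, 1.40):
        print(f"{reading:.2f}V -> {assess_soc_voltage(reading)}")
```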
Now all that is missing is the official word from AMD on the matter. We're told the company is moving quickly to resolve the issue, so we expect a statement to arrive soon. We'll update as necessary.
Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.
-
Viking2121 A little typo, DDR4-6000 is a bit crazy for DDR4 lol.
The problems you have to face when adopting a brand new platform, hope them people get replacements, I know Asus will probably try everything they can to get out of replacing broken stuff. -
jkflipflop98 Seems like "hey I have no idea what my temperature is here" would be cause to throw an error.
Nah. -
TerryLaze
You should always add the corresponding quote since most people never read the article itself.
jkflipflop98 said: Seems like "hey I have no idea what my temperature is here" would be cause to throw an error.
Nah.
Yeah, the thermal sensor burns out and the CPU just keeps running willy nilly.
Our sources also added further details about the nature of the chip failures — in some cases, excessive SoC voltages destroy the chips' thermal sensors and thermal protection mechanisms, completely disabling its only means of detecting and protecting itself from overheating. As a result, the chip continues to operate without knowing its temperature.
-
As with all forms of overclocking, any damage from using an EXPO overclocking profile is not covered by your warranty, but given the situation, we don't think that AMD or the motherboard vendors would use the lack of warrantied EXPO support to invalidate warranties.
I like that part. A great way to not lose this customer. -
Kamen Rider Blade So new general rules for Ryzen 7000 series SoC Voltage.
1.25V is the "recommended safe SoC voltage limit".
1.35V "appears to be safe."
1.40V and beyond "definitely increases the likelihood" of the Burn-Out condition occurring
So if you're OC-ing past 1.25V, make sure your voltage is < 1.40 V
1.35V "appears to be 'Safe'" for the CPU's, anything between:
"> 1.35 V" & "< 1.40 V" has some danger factor to "Burning-Out" your CPU. -
Sergei Tachenov Interesting.
I have issues with EXPO and sleep, so I only turn EXPO on if I'm going to play a game, which happens quite seldom.
But I do use Eco mode, EXPO or not, so I'm never running at the thermal limits. In fact, I don't think I ever exceed 75C even at the highest load.
And if the problem is overheating, then Eco mode should be able to help, right? Somehow only safe and unsafe voltages are discussed, but not thermals and Eco mode. -
heickel.ramadhan it's a lesson for AMD and partners to conduct deeper testing when launching a brand new platform, including overclocking possibilities and limits (both memory and CPU), to ensure safety for their customers.
ppl who experience such an issue might never come back, if it were me I'll never touch their product again for a long time because it's really painful and discouraging to have something burn, even if they replace it for free. -
TerryLaze
The problem is that the thermal sensor blows up which means that if it happens it will keep showing you the same temp until the whole CPU blows up because it won't get any new update on the real temp.
Sergei Tachenov said: But I do use Eco mode, EXPO or not, so I'm never running at the thermal limits. In fact, I don't think I ever exceed 75C even at the highest load. -
-Fran- Sounds like a very plausible root cause, so I hope AMD and motherboard vendors actually DO TALK now.
Talk about growing pains, oof.
And I agree: I hope they realize that the "on paper" restriction of EXPO/XMP invalidating warranties is stupid. If you won't warranty it, then don't advertise it as part of the platform, you stupid people from marketing.
That also begs the question: can we start talking about not using EXPO/XMP going forward? Not even advertising using higher clocked kits, unless it's for OC investigations and always remind people it will void their warranty. Until both AMD and Intel stop being stupid about it.
Regards. -
So this is a widespread problem with the entire 7000 series which I suspected. Sure am glad I didn’t buy one of this series
Inadequate testing is one hypothesis
I’ve used AMD processors all my life and career and still that’s all I buy, but this seems really super sloppy, and I may have to reconsider my purchase decisions from now on. I agree that if a sensor stops reporting data, the CPU should shut itself down for safety reasons and report an error. This is unacceptable performance from AMD. I will never recommend this series of processor to anybody for any reason.
For the affected people they should replace the CPU with one that doesn’t have these problems and also reimburse them for their motherboard and anything else that got damaged. It’s the least they could do.