Tom's Explains: Why Power Supplies Fail

The component of your PC that's under the most stress is the power supply unit (PSU), because it's the power-conversion bridge between the system's components and the mains grid. What that means: It has to deal with every abnormality of the mains and make sure those abnormalities don't affect other components. That's a tough job, and it gets even harder if there's no power conditioner or uninterruptible power supply (UPS) installed.

In low-quality PSUs, the first parts to go are usually the electrolytic caps and the cooling fan. (You can read more about electrolytic cap life calculation in our "PSUs 101" article, where we also discuss the various fan bearing types.) But what causes failures in PSUs that use higher-quality components? We'll get to that, but first let's take a look at an especially important component in today's PSUs: the multi-layer ceramic capacitor (MLCC).
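For readers who want a feel for the electrolytic cap life calculation mentioned above, here is a minimal sketch of the common rule of thumb that expected life roughly doubles for every 10°C the cap runs below its rated temperature. The function name and the example figures (a 2,000-hour cap rated at 105°C or 85°C, running at 65°C) are illustrative assumptions, and real datasheet formulas also account for ripple current and applied voltage.

```python
def estimated_cap_life_hours(rated_life_h: float,
                             rated_temp_c: float,
                             operating_temp_c: float) -> float:
    """Rule-of-thumb estimate: life doubles per 10 degrees C below the rated temperature."""
    return rated_life_h * 2 ** ((rated_temp_c - operating_temp_c) / 10)


if __name__ == "__main__":
    # Hypothetical caps: both rated for 2,000 hours, both running at 65 degrees C.
    for rated_temp in (105, 85):
        life = estimated_cap_life_hours(2000, rated_temp, 65)
        print(f"{rated_temp} C rated cap at 65 C: about {life:,.0f} hours")
```

Under these assumptions, the 105°C part is estimated to last four times as long as the 85°C part at the same operating temperature, which is one reason the temperature rating of the caps matters so much.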

A Few Words About Multi-Layer Ceramic Capacitors

MLCCs are widely used in power-supply circuits, mostly for filtering purposes. They offer numerous advantages, including low cost, small size, low ESR, high reliability, and increased tolerance to high ripple currents. An interesting feature of high-dielectric-constant MLCCs is that their capacitance changes according to the applied DC voltage; the higher the voltage, the lower the capacitance. Something that many people don't know is that MLCCs (and all other ceramic caps, for that matter) can be the source of coil whine. (Yes, coil whine is also generated by caps.)

Ceramic capacitors are piezoelectric, a word that derives from the Greek words πιέζω (squeeze) and ήλεκτρον (amber). So when the applied voltage on a ceramic cap changes, its physical size slightly changes as well, and this can result in an audible noise that users perceive as coil whine.

The Reasons Behind Problems In Quality PSUs

Multi-Layer Ceramic Capacitor Problems

According to our sources, the majority of failures in quality PSUs are caused by cracked MLCCs. Even a single broken MLCC can result in issues, and MLCCs can crack due to any of the following:

Bad handling (e.g., improper PCB stacking during the manufacturing process)
PCB bending (which can happen during the wave-soldering process, if extreme heat is applied)
Careless soldering repairs on the PCB
Bent pins of In-Circuit Test (ICT) fixtures, which are used in the manufacturing line to quickly evaluate the PCBs

Long PCB Mounting Screws

This may sound silly, but in some cases, mounting screws that are too long can actually cause shorts to the PCB.

IC And MOSFET Damage During Assembly

If the manufacturing line runs at higher-than-normal speeds and the applied heat is high, ICs and MOSFETs can suffer either fatal or minor damage. Either kind of damage will eventually cause the PSU to fail, whether in the long run or under stressful conditions.

High Voltage And Current Surges

Besides high voltage spikes, which can be caused by weather conditions (e.g., lightning) or other problems in the mains grid, high currents can be drawn from the mains because of sudden (transient) energy demands during the system's startup. Those high currents are also called "inrush currents," and in power supplies, the main reason for them is the charging of the bulk cap(s). High voltage and current surges can cause multiple components to fail, including fuses, bridge rectifiers, diodes, and FETs. Even if the PSU is equipped with an MOV (surge protection) and an NTC thermistor (inrush current protection), it can still malfunction, especially if the voltage or current surge is too high.
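As a rough illustration of why charging the bulk cap produces such high inrush currents, the sketch below estimates the worst-case peak current when a fully discharged bulk cap is connected at the crest of the mains sine wave, with only a series resistance limiting it. The function names and all component values (230V mains, 5 ohms of cold NTC resistance, 0.5 ohms of residual resistance without an NTC, a 470uF bulk cap) are illustrative assumptions, not figures from our sources, and the model ignores line impedance and inductance.

```python
import math


def peak_inrush_current(v_rms: float, series_resistance_ohm: float) -> float:
    """Worst-case peak current into a discharged bulk cap switched on at the mains peak."""
    v_peak = v_rms * math.sqrt(2)  # crest voltage of a sine wave
    return v_peak / series_resistance_ohm


def bulk_cap_energy(capacitance_f: float, v_dc: float) -> float:
    """Energy in joules stored in the bulk cap once it is charged to v_dc."""
    return 0.5 * capacitance_f * v_dc ** 2


if __name__ == "__main__":
    mains_rms = 230.0  # assumed mains voltage
    # Assumed series resistances: with a cold NTC thermistor, and without one.
    for label, resistance in (("with NTC", 5.0), ("without NTC", 0.5)):
        amps = peak_inrush_current(mains_rms, resistance)
        print(f"Peak inrush {label}: about {amps:.0f} A")
    joules = bulk_cap_energy(470e-6, mains_rms * math.sqrt(2))
    print(f"Energy to charge a 470 uF bulk cap: about {joules:.1f} J")
```

In this simplified model the peak current scales inversely with the series resistance, which is why a cold NTC thermistor helps, and why it helps much less once it has warmed up and its resistance has dropped.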

Cracked PCBs

If the shipping conditions are not ideal, and the PSU packages are handled too roughly, you might encounter cases of cracked PCBs that result in PSU failures. This is why protection of the unit itself inside the box is crucial. Thick layers of packing foam are the best way to protect PSUs (and other products, as well) from rough shipping.

One useful piece of information from our sources is this: Shipping PSUs via air cargo increases the Dead On Arrival (DOA) rate significantly, because the products are usually shipped in "master boxes" in the belly of passenger aircraft. This transportation method is actually cheaper than shipping on pallets with cargo aircraft. All the loading, unloading, vibration, and possible falls of the master boxes can kill a notable number of PSUs, especially if they're not adequately protected in their boxes.

Bugs: Yes, Those Kinds of Bugs

We are not referring to software bugs here, but actual insects. In the past, we've encountered some PSUs from Chinese brands that feature a piece of foam between the soldering side of the PCB and the chassis, and we wondered about its purpose. It turns out the foam is supposed to keep insects away, because in some environments, ants and roaches can cause fatal short circuits by entering the PSU's internals. But that foam is expensive, and it leaves the component side of the PCB unprotected. Thankfully, relatively few PSUs die because of bugs--the ratio is around 10% of total failures for a high quality and quite popular PSU line (the name of which we cannot reveal), so most companies don't use the foam.

Overview

To summarize, high-quality PSUs can fail for the following reasons:

Broken MLCC components
Long PCB mounting screws
Damaged ICs and FETs because of wave-soldering issues
Careless soldering jobs/repairs
Cracked PCBs
High inrush currents
Creepy-crawlies
High surge voltages

You cannot do much about the first six, but you can keep bugs away from your system, and a power conditioner or UPS will protect the PSU from surge voltages, brownouts, and voltage sags, which also apply huge stress to the PSU's circuits. If you live in an area with an unstable mains grid, then the use of a quality UPS is essential.

Aris Mpitziopoulos is a contributing editor at Tom's Hardware, covering PSUs.

30 Comments from the forums

chaz_music

Great article. I used to design power supplies for industry, and your comments are right on. I always thought of failure modes caused by different factors: design errors, manufacturing errors, transport (mechanical usually), and end user handling/installation. For instance, cracked ceramic caps can result from both mechanical stress during the soldering process and high vibration during transport.

The bugs found are also very real! I have found insects and animals before inside of products. When I was in college I would make extra money repairing arcade games. One game had a CRT monitor that was blank, and inside I found a very well cooked mouse with one front paw on the HV connection going to the monitor, and his back foot was on kine ground (HV ground). He had destroyed the monitor because his feet had put HV everywhere. But as Edison had said of his mousing days, they had "left this worldly sphere".

Another point is on UPS systems. Most of the cheaper ones are "standby" or line interactive types. The only UPS systems that continuously clean the power are double conversion types, which are more expensive and have lower efficiency. For this reason, the standby types are often marketed as green power efficient types, which they are, but only because they act only when the power glitches. If you truly have bad power in your area, you need a double conversion type to protect your electronics.

Appreciate the good articles - Charles.

turkey3_scratch

Wonderful article; to sum it up, the manufacturing process and good soldering seem to be the most important thing for high end PSUs. Hopefully this will help people to realize electrolytic capacitors are not quite as relevant for high end PSUs.

Glock24

I had the bad luck of having 3 Seasonic PSUs malfunctioning after around 2 months of use. The model was S12II-430.

Those were my choice for systems I built myself and for other people, and some home servers. They had a good track record, with no failures and some of them in service for more than 4 years. In total I must have bought over 10 units of that particular model.

The ones that failed were purchased between 2016 and 2017. Maybe a bad batch? Defective components? Shipping damage? Don't know, but Seasonic replaced them under warranty.

The failure was random shutdowns or reboots.

Glock24

I've never found bugs on PSUs, but once found ants in (poorly maintained) laptops.

What I've found a couple of times inside dead PSUs are (fried) geckos.

aquielisunari

It may be difficult for some, but keeping your PC apart from high-draw appliances can help. Even if you have a UPS and a surge protector, other surges on the same line can degrade the surge protector over time and eventually cause a protection failure. A light will go out. That may not be the unit's power light. It may be a protection indicator.

Please correct me.

studmoose

The worst thing you can do to a computer or laptop:

1) Have it plugged into street power, without any power conditioning or suppression.
2) Have it plugged in without a top quality battery backup that remediates #1.
3) Routinely turning the units off.

I've been an IT officer for 30 years, and it is rare that systems are turned off, whether they are rack-mounted servers, midrange or mainframe systems. The field engineers and customer engineers stay on-site, as the power on process is the most prone to failure. Their vehicles are stocked with the most failure prone parts to minimize downtime in the event a part needs to be swapped out.

My one home computer is a Dell Dimension 8100, which is on its second power supply since it was built in 2001. My work laptops are never turned off, unless I am transporting them or going on a vacation. My current one has been powered up for almost a year now. I leave my systems up through all but the heaviest of thunderstorms, and even then, some hit when I am away from the house, while they are powered on and idle. Sure, it burns more power, but component failures are minimal. The PSU failure and an HDD grinding after 10 years were the only two issues with the Dell in 16.75 years. One Thinkpad, 8 years ago, had an HDD failure after 2.5 years.

Repeated power-offs and ons cause slight surges and create rapid heat expansion on previously cooled CPU, mainboard, storage and PSU. The momentary power spike and the thermal expansions and contractions between power cycles are what really kill a system. My next system will be a higher efficiency SFF, which also will remain powered on for long durations.

If you want to destroy your battery in a laptop, keep unplugging and plugging it back in. Each time you do that, it counts as a charge. To prevent laptop batteries from catching fire, there is counter circuitry that will slowly inop your battery when it gets close to the count limit. This is their best way of determining when a battery might fail and disable it beforehand. Our laptop batteries routinely last over 3-4 years, whereas those who plug and unplug frequently might see a 1.5-2 year lifespan.

bmwman91

Good wrap up of the issues. I have seen, in both PSUs and GPUs, FET failures in synchronous buck converters very often. In almost all cases, the FETs bear markings of some mystery company overseas which is probably knocking off parts from ST/ON/IR/etc. The combination of poor timings leading to too much shoot-through, plus higher RDS-On from cheap fabbing, seems to lead to a lot of high-side FET failures which allow the full source rail voltage to be applied downstream (the failures are frequently internal shorts).

At this point, I usually only buy PSUs that I have seen the guts of beforehand so I can see who manufactured the components.

bit_user

studmoose said:
I've been an IT officer for 30 years, and it is rare that systems are turned off, whether they are rack-mounted servers, midrange or mainframe systems. The field engineers and customer engineers stay on-site, as the power on process is the most prone to failure.
In that case, it's not that turning them on causes disproportionate wear, but it's:
■ When self-test occurs.
■ When the system might be under higher-than-normal stress, hence marginal components (that would eventually fail anyway) are more likely to fail.
■ Starting mechanical components from a dead stop places all sorts of stresses, especially when you don't do it regularly. (see: https://en.wikipedia.org/wiki/Stiction#Hard_disk_drives )

studmoose said:
I leave my systems up through all but the heaviest of thunderstorms, and even then, some hit when I am away from the house, while they are powered on and idle. Sure, it burns more power, but component failures are minimal.
IMO, this verges into the realm of superstition.

I typically use sleep, rather than power-off, but that's mostly because I dislike waiting for shutdown & bootup. Systems I don't use daily get turned off (except I often leave them plugged in & power supplies' hard switches turned on). And guess what? I don't have component failures, either. I attribute that mostly to buying quality components and a good UPS.

Karadjgne

It would be nice if psus included actual knowledge in their booklets. For me, the biggest killer of the psu was the ignorance of the end user, caused by lack of info from the manufacturer. Take the active pfc (since UPS were mentioned) in my Evga G2. Works like a champ, great psu. Doesn't work on my Minuteman Pro 700 UPS. Active pfc does not like square sinewaves. UPS works great on my old Seasonic M12-II, plenty of time for shutdowns, but on the Evga if power goes out, boom, so does the pc. The UPS needs to be the expensive 'pure sinewave' kind. Might have been nice if the psu manufacturer included that tidbit of info. Brown outs aren't good for a psu.

bit_user

Karadjgne said:
Active pfc does not like square sinewaves.
This.

...except for the part about "square sinewaves".
;)
I think you mean "square waveforms".