
Supermicro Workstation with PSU alarm problem

December 31, 2011 8:48:30 PM

Hi,

I own a Supermicro system, built around a 7046A-T "barebone" SuperWorkstation. It's based on the SM X8DAi motherboard, the 743TQ-865B-SQ chassis, and the PWS-865-PQ power supply. I've been using it for about the last 15 months without any major problems. It is primarily used for 3D rendering and video processing. It has:

2x Xeon X5650 with the Supermicro Heatsinks included in the barebone system
6x 4GB Kingston 240-Pin DDR3 1333 ECC Registered Server Memory (KVR1333D3D4R9S/4G)
3x WD SATA HDD (7200 RPM)
ATI FirePro V5800
ASUS DVD-RW

The system is also running off of a Cyberpower PP1500SWT2 1000W UPS.

I recently replaced the PSU after I got an alarm and an indicator LED (Power Fail) telling me that one of the redundant fans in the PSU had failed and that it had overheated. This happened while I was rendering in a 3D application. I shut down and replaced the old PSU with an identical model that I had purchased through Newegg, but within an hour or two of resuming my work I got the same alarm.

In both instances the remaining fan throttles up (as in the manual). I've felt around the exterior/interior of the case and the PSU and while it feels a little warm in some spots, it doesn't seem at all hot. I restarted the system after receiving the alarm with the new PSU and it starts normally, without any alarm.

I also checked the BIOS when I restarted, and the system temperature is still low, at 31-33 degrees. The cooling/fan configuration has always been set to "balanced". As I'm typing this the system is sitting idle at my Windows 7 desktop. I'd like to resume my work but I'm afraid of getting the same alarm condition.

I had a long back-and-forth with Supermicro's tech support regarding the problem. After I detailed my BIOS version and gave them the serials of both PSUs and the system, they told me that the two PSUs I had were "Revision 2.0", and that I needed to fill out an RMA and specifically request "Revision 2.1". I filled out the RMA and sent it in.

The next day, the RMA dept. got back to me, telling me that they cannot process an RMA that changes the revision from 2.0 to 2.1. I then had to re-explain the situation to the RMA representative after they wanted to know why I specifically requested revision 2.1. They basically told me that I needed to take it up with Newegg, since I ordered it through them and I needed to get a "new" one.

I'm in the middle of requesting a return/replacement through Newegg, but I'm beginning to wonder if something else might be going on that could be tripping the alarm. I'm wondering about more mundane things like airflow problems. Maybe I'm overloading the PSU? Or maybe it's something more pernicious with the motherboard? Any advice would be greatly appreciated.

Thanks.
December 31, 2011 9:07:21 PM

Output power is 865 W for the PSUs that have the failing fans.

1. Never open a PSU; the voltages inside are lethal. Bummer the fan is failing.

2. Newegg will swap it for you since it failed in the first hour, but that seems like strange warranty support.

3. I'm confused by 'one of the redundant fans in the PSU'. Is this your PSU? Its specs say 1 fan: http://www.provantage.com/supermicro-pws-865-pq~7SUPM1V... Clearly, if "In both instances the remaining fan throttles up", then there are multiple fans.
December 31, 2011 9:18:58 PM

tsnor said:
3. I'm confused by 'one of the redundant fans in the PSU'. Is this your PSU? Its specs say 1 fan: http://www.provantage.com/supermicro-pws-865-pq~7SUPM1V... Clearly, if "In both instances the remaining fan throttles up", then there are multiple fans.


Apparently Provantage has it listed incorrectly then, because under the alarm description in the SM 7046A-T manual it says:

"Power Fail
Indicates a power supply fan has failed. The power supply module has a redundant
backup fan that will increase its rpm to compensate, but the power module should
be replaced as soon as it's convenient."

There is a "backend" exhaust fan, and "frontend" suck fan. I'm not sure if both are always running. In this failure/alarm scenario, one will significantly increase its RPM if the other fails. I'm just not sure why this error is occurring, because I can restart the system and the PSU behaves normally until the system has been under load for about an hour or so.
December 31, 2011 11:35:54 PM

Max Power: <75 W for your video card.
The two cpus each have TDP of 95W.

Total load on your PSU should be less than 350 watts under max load. No way that's pushing a high end enterprise class 800+ watt PSU.
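To show where a figure like that comes from, here is a rough budget as a minimal Python sketch. Only the GPU maximum, the CPU TDPs, and the 865 W rating come from this thread; the per-DIMM, per-drive, and motherboard/fan numbers are ballpark assumptions for illustration, not measured or datasheet values.

```python
# Rough peak power budget for the build described in this thread.
# From the thread: GPU max under 75 W, two CPUs at 95 W TDP each, 865 W PSU.
# Every entry marked "assumed" is a ballpark guess, not a measurement.

PSU_RATED_W = 865

components_w = {
    "FirePro V5800 (max, from thread)": 75,
    "2x Xeon X5650 (95 W TDP each, from thread)": 2 * 95,
    "6x DDR3 registered DIMM (assumed ~5 W each)": 6 * 5,
    "3x 7200 RPM HDD (assumed ~10 W each)": 3 * 10,
    "Motherboard, fans, DVD-RW (assumed)": 20,
}


def estimate_load(parts, rated_w):
    """Return (total watts, fraction of the PSU rating) for a parts dict."""
    total = sum(parts.values())
    return total, total / rated_w


total_w, fraction = estimate_load(components_w, PSU_RATED_W)
print(f"Estimated peak draw: {total_w} W ({fraction:.0%} of {PSU_RATED_W} W)")
```

Even if the assumed entries are off by a fair margin, the estimate stays well under half of the PSU's rating, so overload looks unlikely as the cause.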

"Power Supply 865W AC power supply (cooling-redundant) w/ PFC" just like you said. http://www.supermicro.com/products/system/4U/7046/SYS-7...

One thought that was ruled out: Some PSUs with active power factor correction hate some UPSs with stepped approximations to sine wave output. I can't see why the FAN would complain, but it's not an issue. Your UPS is a high end unit with Pure Sinewave.

Net, I can't think of any reason why you are getting bad parts, or anything that you are doing that would cause the alarm. I assume that the people you talked to would have known if there was a procedure to clear a fault after installing a new part so that the system wouldn't continue to report it.

Good luck, and I hope the replacement works. (Does make you wonder what the difference is between Revision 2.0 and 2.1.)
January 1, 2012 4:13:17 AM

tsnor said:
Max Power: <75 W for your video card.
The two cpus each have TDP of 95W.

Total load on your PSU should be less than 350 watts under max load. No way that's pushing a high end enterprise class 800+ watt PSU.


Yep, the Cyberpower monitoring software installed on the machine says it pulls on average 300 watts under full load.

tsnor said:
Net, I can't think of any reason why you are getting bad parts, or anything that you are doing that would cause the alarm. I assume that the people you talked to would have known if there was a procedure to clear a fault after installing a new part so that the system wouldn't continue to report it.

Good luck, and I hope the replacement works. (Does make you wonder what the difference is between Revision 2.0 and 2.1.)


There are a couple of other things that I might have luck trying. The BIOS has several fan/power usage regimes in its "system health" settings. By default it's set to a "balanced" cooling/power usage regime. There is a "low power/low noise" option, the midrange "balanced" option, a "Performance" option, and a "Full" cooling option. Also, the midplane of the chassis has provisions for 4 80x25mm hotswap fans, but the barebone system only comes with 2 installed. I could order another 2 and see if that helps.

The only reason that I can think of as to why I'm getting older parts is that Newegg keeps older stock lying around (which isn't surprising, since there isn't going to be high demand for a proprietary non-hotswap PSU like this one, designed to fit SM's E-E-ATX cases). It just makes it all the more frustrating when there is a problem with the old stock and you RMA it only to get one from the same batch, knowing that there is some elusive newer version out there. Maybe I could specifically ask for them to look for a newer one to send me, or is that just wishful holiday thinking?
April 2, 2012 6:57:45 PM

I think it has to do with the fact that Supermicro does not deal with end customers directly and does not have a purchase record from you. That's why they might want to direct you to Newegg. You can tell Supermicro RMA that you want to talk to their tech support and ask for the newer revision. Also, since you bought the server as well, you can tell them that your server is having this issue. For the server, I think the warranty is different. Their tech support can probably help you with the RMA. I have had pretty good experiences with them.