Workstation stopped booting

galk

Aug 30, 2016
I have a workstation with the following config:

Supermicro X9DR7-LN4F motherboard
2 Intel E5-2680 CPUs
8 x 8GB Kingston memory modules
Nvidia 970 video (EVGA)
Asus XONAR sound card
Supermicro chassis

Recently (today, actually) it stopped booting. When I turn on the workstation it makes one beep and then the CPU fans just spin at full speed; the display shows no signal, etc.
I tried removing all the RAM modules; after that the motherboard started beeping with the code that corresponds to missing memory. But I had no luck getting it to boot again.
I replaced the CMOS battery with a newly bought Energizer battery and reset the CMOS by shorting the contacts as described in the manual. Still no luck, although now it tries to boot as soon as I plug the power cord into the chassis.
I also tried to boot without the video card, without the sound card, and without both. Still no luck. I tried connecting the VGA output of the motherboard to the display. Still no luck.

I wanted to remove the 2nd CPU and try to boot with just one, but I don't know how to do that (the fan is quite big and I can't reach the screw), so I decided to postpone it until tomorrow or another day.

Could you please advise what else I can do?


I bought the workstation aftermarket more than a year ago, so exchanging it or sending it back is not an option. If nothing works, I'll use it as a donor and build a new machine with its parts (once I'm able to reliably identify the broken component).

Thanks
 
Fairace

Aug 30, 2016
Test individual components on another system if you have a comparable one, or test each memory module individually in each slot. I'm suspicious of your PSU though, but I'm not sure how you can test that short of getting a new one.
 
galk,

If you have a Supermicro SuperWorkstation, I would be surprised if it were a motherboard or power supply problem. Those are server-like in their build quality and reliability. The problem seems more likely to be one of these: memory failure, thermal, CPU, or BIOS.

1. Thermal: Check that all case, CPU, and memory fans are running.

2. Memory: The system will not start without memory. Try starting with only one RAM module and ensure it is installed in Slot 1 / CPU1 of the Supermicro X9DR7-LN4F. See:
ftp://ftp.supermicro.com/CDR-X9_1.30_for_Intel_X9_platform/MANUALS/X9DR7_E-LN4F.pdf > Page 2-11

The tests will be complicated by the presence of dual CPUs. As there are two CPUs installed, you may be required to have at least one RAM module for each CPU, so if the first test gives no result, add another module in the first slot belonging to CPU2.

3. CPU: The E5-2680 has an MTBF of 170,000 hours of continuous running, or 19+ years, but one of them may be getting too hot. After cycling through a reasonable number of RAM combinations, remove both CPUs (typically you will need a long-shank screwdriver) and remount each CPU in turn in the CPU 1 position. If you haven't done this previously, see YouTube videos. With LGA2011, as it is such a large die, I clean both the CPU and heatsink surfaces with denatured alcohol and polish the heatsink surface with a block and scouring powder, making certain when done that all traces of powder and alcohol are gone. Make a very thin diagonal bead in an X-shape on both surfaces and then spread the thermal paste (Arctic Silver) with a business card over both surfaces. The coating should be as thin as possible so that the surfaces are just covered. Screw the fan/heatsink down in small steps in a diagonal pattern, a bit like doing up a car wheel, and don't really crank down on it.

This is a bit of a fuss and needs care, but it is an important skill to know and, importantly, will eliminate a possible marginal installation of the E5-2680s.

4. BIOS: The problem could have BIOS implications, but that seems unlikely given the steps already taken, and because it's the BIOS that is running the fans full bore.

Let us know what you find out.

Cheers,

BambiBoom

 

galk

Aug 30, 2016
Friends, thanks for the help.
I removed the fan from CPU 2 and cleaned some dust out of the heat sink.

After that I removed all the RAM modules and started adding them back one by one. It seems one RAM module was dead, because the system wasn't able to boot when I put it in 2 different slots. I think I can live with 56GB of RAM for the moment.

Mistakes made: yesterday I didn't pay attention to which slots I was putting the single RAM module into when checking whether the system boots with 1 module. It seems this matters, and the system won't boot if the module is put into the wrong slot in the population order (or I was just unlucky yesterday and picked exactly the module that was dead).
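
For anyone repeating this kind of one-module-at-a-time test, it may help to write down every attempt and work out the culprit from the notes instead of from memory. Below is a rough Python sketch of that bookkeeping, not anything from the Supermicro manual. The function name `diagnose`, the module labels ("M1", "M2", ...) and the slot labels ("A1", "B1", ...) are all made up for illustration; use whatever names are printed on your board. It assumes each test is done with a single module installed. A module is flagged when it never booted but failed in a slot where some other module did boot, and a slot is flagged the same way in reverse.

```python
from collections import defaultdict


def diagnose(results):
    """Infer suspect modules and slots from single-module boot tests.

    results: iterable of (module, slot, booted) tuples, one per attempt.
    The labels are hypothetical; any hashable, sortable names work.
    Returns (suspect_modules, suspect_slots) as sorted lists.
    """
    by_module = defaultdict(list)
    by_slot = defaultdict(list)
    for module, slot, booted in results:
        by_module[module].append((slot, booted))
        by_slot[slot].append((module, booted))

    # Anything that took part in at least one successful boot is known good.
    good_modules = {m for m, tests in by_module.items()
                    if any(ok for _, ok in tests)}
    good_slots = {s for s, tests in by_slot.items()
                  if any(ok for _, ok in tests)}

    # A module that never booted, yet was tried in a known-good slot.
    suspect_modules = sorted(
        m for m, tests in by_module.items()
        if m not in good_modules and any(s in good_slots for s, _ in tests)
    )
    # A slot that never booted, yet was tried with a known-good module.
    suspect_slots = sorted(
        s for s, tests in by_slot.items()
        if s not in good_slots and any(m in good_modules for m, _ in tests)
    )
    return suspect_modules, suspect_slots


if __name__ == "__main__":
    # Example log: M2 fails in two slots that work with other modules.
    log = [
        ("M1", "A1", True),
        ("M2", "A1", False),
        ("M2", "B1", False),
        ("M3", "B1", True),
    ]
    print(diagnose(log))
```

With the example log, only M2 ends up flagged, which matches the "failed in 2 different slots" reasoning above. If a module was only ever tried in a slot that never booted with anything, the sketch flags neither, since the notes alone cannot tell a bad module from a wrong slot.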