RAM or motherboard causing reboots?

aiobh

Honorable
Dec 2, 2013
13
0
10,510
I have a server that reboots during tasks like file-unzipping and copying. I have removed and tested the hard drive, which is fine. The server passed Memtest a month ago: 45 hours, 3 passes through all 256GB of RAM without any errors. I am leaving it to run Memtest again over the weekend, but is it possible that this is a motherboard issue rather than a RAM issue? I know I will need to do the stick-swapping and testing when I get back to work next week, but I'm looking for any insight out there. This is a KGPE-D16 server board, the same one in fact that I posted about a few weeks ago that wouldn't boot Ubuntu without ACPI disabled (it would just reboot like it's doing now), despite being a relatively recent board with the latest BIOS. Maybe it's wishful thinking that the problem could lie with the motherboard; I'm just not looking forward to testing 16 sticks of RAM that I already thought were OK.
 

aiobh

I've had Memtest running for 66 hours, and it has completed 3 passes through the 256 GB of RAM without any errors. Is it possible for there still to be problems with the RAM that are undiagnosed by this test? I'm still going to continue testing the RAM in smaller groups, but wondering if anyone can weigh in with experience here.
 

Supahos

Expert
Ambassador
If it only restarts during file zipping (a fairly CPU-intensive task, as well as a memory-intensive one), then I would lean toward a possible CPU issue, since all the memory has tested good for long stretches of time. Or is your power supply possibly insufficient? I have no idea what the rest of your rig looks like, but if the power supply can't keep up when everything fires up to unzip something, the machine would restart.
 

aiobh

Thanks Supahos. The server has 2 x AMD Opteron 6376 (115W) chips and I have a Seasonic 850W power supply. I specified that the server reboots on unzipping and file transfers because it has not rebooted on other intensive tasks: I have run CPU-torture test Mprime on it without error or hiccup, as well as 10 simultaneous FreeSurfer (http://surfer.nmr.mgh.harvard.edu/) processing jobs without any trouble. Then my coworker started copying files, and the server couldn't handle it...?!

Is it possible I damaged one of the CPUs with those tasks? They have gigantic Noctua coolers with a respectable amount of Arctic Silver....

One other thing I have found: if I disable the "Quick Boot" feature of the Boot process, the motherboard is forced to check all of the RAM. It will reboot after tallying ~6.5 GB of RAM.
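Since the reboots show up on file copying rather than on CPU load, one way to trigger them on demand is a small copy-and-verify loop. The following is only a rough sketch in Python, not anything that was run on this server: the function names, file names, sizes, and round count are all made up for illustration. It writes a random file, copies it several times, and compares SHA-256 checksums, so a silent corruption would show up as a mismatch and a hard fault would show up as the usual reboot. Note that with small files the copies are likely served from the page cache, so the size would need to be raised well beyond the defaults to push real disk I/O.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_stress(work_dir, size_mb=64, rounds=3):
    """Write a random file, copy it `rounds` times, count checksum mismatches.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    src = os.path.join(work_dir, "stress_src.bin")
    with open(src, "wb") as f:
        for _ in range(size_mb):
            f.write(os.urandom(1 << 20))  # 1 MiB of random data per iteration
    expected = sha256_of(src)

    mismatches = 0
    for i in range(rounds):
        dst = os.path.join(work_dir, f"stress_copy_{i}.bin")
        shutil.copyfile(src, dst)
        if sha256_of(dst) != expected:
            mismatches += 1
        os.remove(dst)  # clean up each copy before the next round
    os.remove(src)
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("mismatched copies:", copy_stress(d, size_mb=64, rounds=3))
```

Running it with different subsets of RAM installed might help narrow down whether the fault follows particular sticks or stays with the board.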
 

Supahos

That is flat-out odd, then. I doubt you damaged the CPUs with those tasks: if they had gotten too hot, the server would have restarted, which it sounds like it didn't back then. I guess you could keep a temperature monitor open and do a little more stress testing, or watch it while unzipping large files, but I can't imagine that being too stressful. The power supply is more than sufficient; it just wasn't listed, so I mentioned it.
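For the temperature monitoring, here is a minimal sketch of what I mean, assuming the server runs Linux and exposes its sensors through the kernel's hwmon sysfs interface (values in millidegrees Celsius). The function name `read_temps` and the polling interval are just illustrative, and whether this particular board's sensors appear there depends on the loaded drivers, so treat it as a starting point rather than a known-good tool.

```python
import glob
import os
import time


def read_temps(hwmon_root="/sys/class/hwmon"):
    """Return {sensor_path: degrees_C} for every readable temp*_input file."""
    temps = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                # hwmon reports temperatures in millidegrees Celsius.
                temps[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor unreadable or not numeric; skip it
    return temps


if __name__ == "__main__":
    # Poll every 2 seconds (arbitrary interval); stop with Ctrl+C.
    while True:
        for path, deg_c in read_temps().items():
            print(f"{path}: {deg_c:.1f} C")
        print("-" * 40)
        time.sleep(2)
```

Leaving that running in one terminal while the unzip or copy runs in another would show whether anything spikes right before a reboot.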
 

aiobh

I spoke with ASUS customer service about the northbridge temperature; they don't think it is outside the range of normal operational fluctuations. I am really leaning toward "bad motherboard." Overheated RAM might be a possibility, but I have a ridiculous number of fans blowing through my Raven v03 case, and it doesn't explain the trip-reboot that occurs during boot-time RAM initialization. Memtest v4.20 has been running 95 hours strong without errors. I'm going to load up Memtest 5.0 and see what we get.