GA-990FXA UD3 rev. 4 shuts down randomly

liste

Honorable
Feb 3, 2014
8
0
10,510
Hello,

I have just finished setting up a new computer for the first time in years. The components are as follows:

1. 600 W TR2 Thermaltake PSU
2. H100i Corsair cooling
3. GA-990FXA UD3 rev. 4 motherboard
4. AMD FX-9590 CPU
5. 1 TB Western Digital HDD
6. 32 GB RAM (pv332g186c0qkbl)
7. LG DVD writer
8. Cougar ATX case.
9. Two monitors (one VGA, and one HDMI to VGA adapter).

I'm running Ubuntu 12.04 64-bit, and the issue is stated above: the computer randomly shuts down every once in a while. Here are some of the things I have observed:

1. It has not yet happened more than once per day, though I haven't used it for more than 4-6 hours at a time.
2. The last time it happened (a few minutes ago) was after a cold boot. Ubuntu started, and as soon as I made the first click on the "Dash Home" button, the computer instantly shut off.
2.1 The keyboard "numlock" light remained on.
2.2 I was unable to use the computer's power button to turn the machine back on. I had to start by flipping the power switch on the PSU off and on again.
3. Once it happened while I was installing Windows in VirtualBox, but after restarting immediately and entering BIOS, it said my CPUs were around 35C.

I've been doing some research, and what I have found so far is one person suggesting that the PSU might be insufficient. I added a "kill-a-watt" metre to the computer, and it was generally reading between 200 and 300 watts. Another suggested that the northbridge was overheating. I felt it right after the computer shut down, and the heat sink was only slightly warm. What is the most likely cause of the issue? Have I misconfigured the hardware? Is it more likely a software issue?

Thanks.
 

dmitche3

Distinguished
May 25, 2008
253
2
18,815
When you say 'shut off instantly', I take that to mean that in the blink of an eye the unit just powered down, with no attempt at a Windows/Linux shutdown, yes?

I don't think it is software, and you can try some tests. The first would be to boot it up into the BIOS and let it sit for an hour or so. That might help tell you whether it is something in the OS or software, or not. If it fails there, it is hardware at this point.

The NUM lock key light is interesting. I'm not sure what to think of that. Sounds more like a 'frozen' computer than one that shut down.
 

liste

Thanks for the fast reply!

That's correct. Other than the fact that the numlock light is still on, I would have said someone yanked the plug out from my computer. Of course, that's not a possibility in my current setup.

The power LED light on the front of the box turns off, the monitors go black, and all the fans turn off; only the numlock light remains on on the keyboard (plugged in through PS/2). Pressing the power button on the computer does nothing at all. To restart the computer, I have to manually cycle the power on the PSU.

I agree that it seems like a hardware issue, but if it is, I have to figure out which piece. I have run the computer for several hours before, so I very much doubt anything will happen if I leave it in BIOS for an hour, but I'll give it a try. Any other ideas in the meantime?
 

liste

Hello again,

Running in BIOS for 1 hour caused no problem at all, though I didn't expect it to. Here's another issue I just got, perhaps unrelated, but perhaps not.

I was running Ubuntu and Chrome, nothing more. I opened a new tab and suddenly everything froze. My keyboard's numlock light turned off, and the Caps Lock and Scroll Lock lights started blinking at approximately 2.5 times per second. I couldn't move the mouse. Unfortunately I had nothing actively moving on the screen (e.g., a video or GIF), but I think that the screen was equally "frozen".

I have actually seen that happen before on this machine, but I forgot until now. The resolution was to press and hold the power button until it shut off. Once the power was off, the Scroll Lock and Caps Lock lights were stuck on. Simply starting the machine again by pressing the power button worked.

One final thing: I have 1866 MHz RAM, but according to the BIOS, it is only running at 1600. Trying to increase it to 1866 prevented the computer from booting altogether. Fortunately, the BIOS gave me an error message telling me that it couldn't boot with the new settings, so I was able to choose to go back into BIOS and revert my settings.
 

dmitche3

I hate these types of problems. grrr.

Do you ever run Windows?

Under Linux, have you installed any third party video drivers? Or any third party drivers?

I see that you have two monitors; try detaching one of them, either the HDMI or the VGA. I would prefer the HDMI detached, as I believe (not sure if it is true, but I've noticed it on my machine) that the video card's HDMI output draws more power than the VGA output.

It's a good thing that the problem didn't show up while sitting in the BIOS. At this point I would be leaning towards power, heat, or software.

Heat: perhaps a safety feature of the CPU is shutting down the box. Try downloading something that monitors the CPU temp and writes it to disk, so you can view it after the crash. I don't know what to download under Linux as I've lost touch with it in the past few years.
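
For the "monitor the CPU temp and write it to disk" idea, here is a minimal Python sketch. It assumes the Linux kernel exposes sensors under /sys/class/hwmon/hwmon*/temp*_input (the usual hwmon sysfs layout, with values in millidegrees Celsius). Which of those sensors is actually the CPU on this board is not something the thread establishes, so the sketch simply logs every sensor it can read. The function names, the log file name, and the 2-second interval are made up for illustration. Each line is flushed and fsynced so the last readings have a chance of surviving an instant power-off.

```python
import glob
import os
import time

# Assumed location of temperature sensors (standard Linux hwmon sysfs layout).
HWMON_PATTERN = "/sys/class/hwmon/hwmon*/temp*_input"


def read_temps_c(pattern=HWMON_PATTERN):
    """Return {sensor_path: degrees Celsius} for every readable sensor."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                # sysfs reports millidegrees Celsius as an integer
                temps[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # unreadable or malformed sensor: skip it
    return temps


def format_line(timestamp, temps):
    """Format one log line: local time followed by sensor=value pairs."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    parts = []
    for path, temp in temps.items():
        chip = os.path.basename(os.path.dirname(path))
        parts.append(f"{chip}/{os.path.basename(path)}={temp:.1f}C")
    return stamp + " " + " ".join(parts)


def log_temps(logfile, interval_s=2.0, samples=None, pattern=HWMON_PATTERN):
    """Append readings to logfile; samples=None means run until interrupted."""
    count = 0
    with open(logfile, "a") as out:
        while samples is None or count < samples:
            out.write(format_line(time.time(), read_temps_c(pattern)) + "\n")
            out.flush()
            os.fsync(out.fileno())  # push the line to disk before the next sample
            count += 1
            time.sleep(interval_s)
```

Calling log_temps("cpu_temps.log") in a terminal before normal use, and reading the tail of the file after the next shut-off, would show whether the temperature was climbing just before the machine died.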

Power: Somewhat related to heat but I don't tend to believe this with the PSU that you have at 80% bronze eff.

I really doubt it is memory issues, as the problem is consistent and it doesn't sound like it. If you want to waste some time you might boot Ubuntu and run the memory tests from the CD/DVD.

My bet, out of pure ignorance, would be heat and a safety shutoff, as the motherboard is still functioning, as evidenced by the num lock light still being on. OR, as a second guess, a video/mobo driver that is misbehaving.

Good luck. Hopefully someone else will see this post and throw in some ideas.
 

liste

I totally agree about hating these problems, and I can't thank you enough for your input!

// WINDOWS
Yes, I do run Windows, but virtualized. If you think it could be a help, I could format and install Windows directly. This is a brand new machine made from brand new pieces, so anything has the potential to be the issue. I've got discs for Windows XP Home x86, 7 Home (x64), and 8 (Developer Release) x86 and x64.

// DRIVERS && MONITORS
Under Ubuntu, I installed the additional drivers for the ATI/AMD proprietary FGLRX graphics driver (post-release updates). It seemed to be necessary to get both my monitors working. I'll unplug the HDMI, as eventually I want to plug it in through DVI, but I don't yet have my DVI-D wire. I'm currently connected through a borrowed HDMI to VGA adapter.

// OVERHEATING
The reason I'm NOT leaning towards heat is what I stated in my original post: it once happened right after a cold boot (I forgot to mention that the computer had been off for hours beforehand). On the other hand, I can be running it for several hours without any issue. It seems like it would be difficult for it to overheat right after a 10-second boot when it doesn't overheat while running virtualized Windows, several installations, etc., at the same time. Correct me if I'm wrong here. I should note that I lost a little of the cooling paste from the CPU cooler before placing it in its final location. However, the paste I lost ended up directly on the CPU. In other words, the "stock" cooling paste may not be placed exactly as it should be on the CPU, but it is all on there. I do have a tube of Silver-Based Thermal Grease, if it comes to that. When I reboot (after cycling the power on the PSU), the BIOS says my CPU is around 30C, so I'm thinking it is probably not an overheating issue.

// POWER
I'm not quite sure what you mean by 80% bronze eff, but I did attach a watt metre, and it was reading between 200 and 300 watts. Unless the computer can spike power usage quite suddenly (even before I start running programs), I'm expecting this is not the issue.
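
As a rough check of that reasoning, the wall reading can be converted into an estimated load on the PSU. This is only a sketch: the 0.80 efficiency figure is an assumed round number, not a measured value for this unit, and the function names are made up for illustration. A wall meter reports power drawn from the outlet, which includes conversion losses, so the DC load on the supply is the wall reading multiplied by the efficiency.

```python
def dc_load_watts(wall_watts, efficiency):
    """Estimate DC power delivered to the components from a wall reading."""
    return wall_watts * efficiency


def load_fraction(wall_watts, efficiency, psu_rated_watts):
    """Estimated DC load as a fraction of the PSU's rated output."""
    return dc_load_watts(wall_watts, efficiency) / psu_rated_watts


# Readings from the post: 200 to 300 W at the wall, 600 W rated PSU.
# ASSUMPTION: 0.80 efficiency, chosen only for illustration.
for wall in (200, 300):
    share = load_fraction(wall, 0.80, 600)
    print(f"{wall} W at the wall is roughly {share:.0%} of rated output")
```

Under that assumption the sustained load sits well under half of the rated output. The estimate says nothing about very short spikes, which a plug-in meter may be too slow to show, or about how the load is spread across the individual rails.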

// RAM
I'm not quite sure what you mean when you say the problem is "consistent". Perhaps I was unclear with my description - I apologise. I ran the Ubuntu memtest86+ two nights ago and it came up with several errors. Here's a screenshot: https://plus.google.com/115114710203188146582/posts/e7GrAM13P9y Last night, I ran it again with just one of the sticks of RAM installed. There were no errors detected on that one stick. I have been running the computer this evening with that one stick without any restarts, but unfortunately I can't say that is abnormal. Sometimes the computer will run for hours, sometimes it will crash as soon as it starts. Looking at the screenshot, do you think that RAM could be the issue? It seems odd that brand-new RAM should have errors. I don't know how to read them to know if they are serious.

I have seen heat issues before (ran a computer without a cooling supply back in the day!), and the instant shut-offs definitely look like a heat problem, but I don't honestly see how it can be a heat problem while being so inconsistent. I've never had a video/mobo driver issue, so it could be.

One more bit of information: I have gotten a couple Linux Kernel panics as well: computer freezes with screens still on, but keyboard and mouse do nothing. All animations on the screen freeze in place. CapsLock and ScrollLock blink together 1.5 times per second. Again, I have had no kernel panics or random shut-offs this evening using only the one stick of RAM that showed no errors in memtest+. What are your thoughts? Do these sound like RAM error symptoms?
 

liste

Thanks Calvin7. Can you tell me where you found that? My system still only runs at 1600 MHz when I have only one or two sticks in. Do you think that running 1866 MHz RAM, which is not supported by my CPU, could cause the random shutdowns I'm experiencing?

I ran another memtest86+ test last night with 2 of my sticks of RAM in, and it found errors. I'm now running the computer on those two sticks to see if the machine will randomly shut down. So far no problems, though it has only been a few minutes.
 

liste

I have been testing my RAM 1 or 2 sticks at a time. Two of them, I'll call them D3 and D4, went through memtest86+ with no errors at all. When all 4 sticks were installed there were errors. When D1 and D2 were installed there were errors, always in round 5. I have yet to test D1 and D2 separately so as to discover which is at fault for the memory errors.
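
The elimination described above can be written down as a small Python sketch. It rests on two assumptions that the tests so far do not prove: that exactly one stick is faulty, and that a clean memtest86+ pass really clears every stick that was installed for that run (an intermittent fault could slip through). The function name and the result format are made up for illustration.

```python
def suspect_sticks(results):
    """Narrow down a faulty stick from (installed_sticks, passed) test runs.

    ASSUMPTIONS: a single faulty stick, and a passing run clears
    every stick that was installed during it.
    """
    cleared = set()
    for sticks, passed in results:
        if passed:
            cleared |= set(sticks)

    suspects = None
    for sticks, passed in results:
        if not passed:
            remaining = set(sticks) - cleared
            # the one faulty stick must be present in every failing run
            suspects = remaining if suspects is None else suspects & remaining
    return suspects if suspects is not None else set()


# The runs reported in this post, using the D1..D4 labels from above.
runs = [
    ({"D3", "D4"}, True),               # no errors
    ({"D1", "D2", "D3", "D4"}, False),  # errors
    ({"D1", "D2"}, False),              # errors
]
print(sorted(suspect_sticks(runs)))
```

With these three runs the suspects are D1 and D2, which matches the conclusion above. Adding one more run with D1 or D2 installed alone would narrow it to a single stick under the same assumptions. An empty result would mean the assumptions do not hold, for example more than one bad stick or a bad slot.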

As it stands, I have been running my machine again this evening for about an hour, with relatively heavy usage: I've been running VirtualBox with Windows 7 and Windows 8, and of course my Ubuntu host. I've been doing a few lighter things as well: a couple installs, some FTP transfers, and some browsing. It has not (yet) crashed this evening despite having all four sticks installed. Perhaps I didn't install one of the sticks correctly the first time, and taking them out and putting them back in fixed the issue.

Hopefully this is resolved, but I'm going to leave it open for another week or so while I run some tests. If it crashes, I'll post again with more details. Thank you both for your help, and if you have any more ideas, please let me know.
 

dmitche3

I only have a few minutes online today. :( I think that you found the problem. As far as I know (which is limited), as long as the chips are all the same speed, they can run at a slower speed. With memory there is no 100% sure answer, though: at various times in the past there have been incompatibilities between memory and mobos, speeds, etc. that come and go, and then return with the advent of new tech.

Seeing that the tests are coming up with a positive result, I think that you have it. It could be bad seating of the chips, but I would lean more towards them being different-brand chips (I didn't go back and read whether they were, as I'm in a hurry AND lazy) that are not compatible when running at different speed ratings.

I'm glad that I mentioned running the memory tests. :) I usually am dead out of the box when I mess up my memory installations.
Memory is not a fun thing to mix and match, IMHO.
 

liste

Thanks dmitche3! But my memory is all identical, and all purchased at the same time. Here's the link on Amazon for exact details: http://www.amazon.com/Patriot-1866MHz-Desktop-Heatsink-PV332G186C0QKBL/dp/B00DS0D7OK

Do you think I should try returning the bad stick(s)? It seems to be running fine (despite errors in the memtest86) since I took everything out and put them back in, but obviously I'd rather not have occasional crashes. Actually, I did have one kernel panic last night, but that could be software-related.
 

liste

Hello again, I have finally narrowed the issue down to 1 of the 4 sticks of RAM, which were all purchased together less than a month ago. I guess I'll try seeing whether I can return one stick to the supplier or manufacturer. Running on the other 3 sticks, I have had no Kernel Panics or random shut-offs as of yet. One last test tonight will be to verify that the stick that gave an error in memtest still gives it when plugged into a different slot in the MB.

Thanks again for your help, dmitche3!