Need Opinion/Advice/Input - Possible Dying System

JSteve

Honorable
May 24, 2012
7
0
10,510
Hello all, first-time poster and troubleshooting lurker. I was not sure where to put this, but I bought the PC from CyberPower in Sept. 2010, so I figured I would post here.

System Specs:
MOBO: Gigabyte X58A-UD3R
CPU: i7 950 @ 3.07 GHz
RAM: Corsair 2GB 1600MHz (3 sticks)
GPU: Dual EVGA NVIDIA GTX 480s 1.5GB - Standard operating temp: 65-68C (72-75C in Diablo 3)
OS: Windows 7 Professional 64-bit
PSU: Corsair TX950W
HDD: 1TB SATA III 6Gb/s 7200RPM


My PC is showing signs of "dying"? I originally thought it was a PSU issue, then perhaps a MOBO issue...now I am just at a loss.

The details of the issue - as complete as possible:
Day 1: Diablo 3 is about to launch, so I update my video drivers - all seems fine. (Driver version 296.1, installed using the auto-detect tool on the website.)

~2 days after: I first notice that when I boot, after the Windows loading icon, the screen goes black with a white cursor and just hangs there. This issue has persisted throughout (it happened this morning too). After powering down, the PC sometimes boots fine and goes to the Windows login; sometimes it hangs again but then goes to the login screen.

Few more days: Diablo 3 launches, and everything is going fine aside from the login issue above. Nothing out of the ordinary until a video crash during the game. Given the login issues, I decide to roll back the drivers and update via Windows.

2 days ago: Get up and start playing D3; in the middle of a game the system turns off. It will not turn back on, so I go into panic mode. The red light on the front case card reader is on, and the light near the CMOS flush button on the back is on. When I hit the power button, the button's blue LED lights at first but the system does not boot; after that, nothing happens when the button is pressed. Took it to a PC tech to diagnose (I don't have the tools). He calls me 15 minutes after I leave and says the PSU voltages check out OK and that when he reconnected the cords it booted up fine. I go back and pick it up, relieved.

Later that night: Playing Diablo 3, the screen goes purple, the sound cuts to a whine/distorted sound (from the speakers) and then stops...system freezes. Power down, go through the normal stuff, everything is fine. Decide to stop for the night.

Next Morning: Have a similar video crash, except the screen goes a maroon/dark red color. Shut down and let it sit for a bit. Turn it back on and go to play the game again, and it powers down again. I let it sit a while and check the internal cables. Turn it on about 30 minutes later; as it boots, a black screen appears BEFORE the motherboard logo screen and DMI Pool checks. The screen is black but has a long purple box and said something about CMOS error recovery? Maybe? It moved on just as I noticed it, since I was checking other cables below the desk. Worked for about 5 minutes and powered off again. Left it alone.

About an hour later: Dwelled on the issue and decided to call CyberpowerPC tech support. They said my PC is past the parts warranty but still under the labor warranty. Explained the same stuff to him as here and was issued an RMA#.

This Morning: Woke up and decided to test the system one last time. On the first boot I got that CMOS error screen again; this time I selected the "Last Known Good" option. I was not sure what the screen was, as I had never seen it before. It then hung at the black screen before login, but DID load after a couple of seconds. After login it powered down almost immediately. Let the system sit a few minutes, turned it on and got the purple screen again. Chose the first option - I forget what it says (should have written it down, I know). System booted and logged in fine - had to reset the date/time. Everything ran fine for the first half hour, so I decided to test things. Set my video settings to a single card with the "Recommended" PhysX setting and ran the "Heaven DX11 Benchmark"; it ran fine at about 48 FPS AVG. Rebooted the system, booted and logged in fine, then switched back to SLI mode and ran the benchmark again; everything went fine at about 86 FPS AVG. Waited a while longer, then decided to play D3 to see if the issue would occur again. Over an hour in with all settings on High as normal, I am still fine and posting here from the "issue" system.

So as you can see I am at a loss. Any info or thoughts would be seriously appreciated. Gonna test the system today and see if anything happens; if so I will send it out tomorrow...if not, Monday if I have issues over the weekend. If I send it I have to pay for parts and shipping (shipping will be at least $75 one way, I am told). Budget is tight so I am worried there too.

PS: D3 is running in the background as I post this and the system has been on for over 3 hours now.

Edit: Updated specs.
 
Have you done a BIOS update on the motherboard? If you start having those issues again, you can try taking one of the cards out and running each card on its own, swapping them to make sure that both will run by themselves without the other card in the slot next to it. I know you tried a similar thing in the Nvidia control panel, but physically having one card at a time in the motherboard would be a little more definitive.
You didn't list the PSU, and while it was supposedly tested it could still be a concern. A new PSU is considerably cheaper than any of the other components, and since it supplies power to all of them it is an important part. If the power supply were starting to go, the problem would be intermittent and hard to diagnose; it could fool someone by passing the PSU test and then acting up later on. The usual procedure with RMAs is that the number is good for two weeks, and even if you didn't send it in and the number expired, you can always reapply for another RMA.
Given the situation you're in now, with having to pay for parts and shipping, it could be cheaper if you were able to determine it was the PSU. A new one would end up costing less than $100, you could put it in yourself, and you would only need a screwdriver (Phillips).
What is the PSU that you have now?
 

JSteve

@inzone - sorry about that, I did update the specs - I was trying to hurry. As it now says, my PSU is a Corsair TX950W. Oddly, when I first got this system, it started up and then died - somewhat similar - but never came back on, so I had to RMA the system on Day 2.

As for the BIOS, I have never updated it since I got the system. The reason is that I do not know the REV code on the board; I know where it is located, but I can never see it and just didn't wanna toy with components to be able to see it.

System is still on with no issues. I really wish I knew what the purple screen was and what I actually hit. /facepalm
 
I have a Gigabyte board and it's a high-end model, but I started to get blue screens while playing games and I could never figure out the cause even though I tried everything. I finally sent the board in by RMA; they ended up having to replace the BIOS chip itself and now it's working fine. So sometimes the issue is clear and you can fix it, and other times it's not and you have to send it in.
At this point, judging by your descriptions of the problem, it sounds like a video card or monitor problem, but a monitor won't make the computer crash, so that leaves the video cards. The thing with it being a video card is that it could also be the PSU: if the card isn't getting the right power, that could cause the card to act up, and a defective video card will also cause things similar to what you described.
When you did the RMA when you first got the PC, what was the cause? Did they tell you, or did they just ship a new PC back to you?
My feeling is that it's the power supply, but I'm afraid at this point it's just a guess, and ultimately you will have to decide whether to RMA or try fixing it yourself. Before I do an RMA I like to try all the options available to me, so that if an RMA turns out not to be necessary I will have saved the time wasted without a computer. I'm not sure that a BIOS update would fix the issue, but it only takes 10 minutes, and a power supply replacement would only take about an hour. As I said, though, these are guesses, and I do wish there was a clear-cut indication of what the cause is.
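
As a rough sanity check on the PSU theory, here is a minimal Python sketch of the rated power budget for this build. The TDP figures are the manufacturers' published ratings for a GTX 480 and an i7 950, not measurements from this system; the allowance for the rest of the machine and the extra margin for the overclocked CPU are assumptions chosen only for illustration.

```python
# Rated figures, not measurements. OTHER_W and the overclock margin are guesses.
PSU_RATED_W = 950   # Corsair TX950W, from the spec list in the first post
GPU_TDP_W = 250     # published TDP of one GTX 480
CPU_TDP_W = 130     # published TDP of the i7 950 at stock clocks
OTHER_W = 75        # assumed: motherboard, RAM, hard drive, fans


def estimated_load(gpus=2, overclock_margin=0.0):
    """Return a rough worst-case draw in watts for the given number of cards."""
    cpu = CPU_TDP_W * (1 + overclock_margin)
    return gpus * GPU_TDP_W + cpu + OTHER_W


if __name__ == "__main__":
    for gpus in (1, 2):
        load = estimated_load(gpus=gpus, overclock_margin=0.25)
        print(f"{gpus} card(s): ~{load:.0f} W, headroom ~{PSU_RATED_W - load:.0f} W")
```

On paper the unit has headroom even with both cards installed, so if the PSU is at fault it is more likely a failing unit than an undersized one. That fits the intermittent behavior, but it does not prove anything.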
 

JSteve

Still up and running - no issues. Will test with Diablo 3 more later.


@inzone - The tech said the original RMA was to replace the PSU, though at first they thought it was a RAM issue when troubleshooting on the phone. The only issue I have heard about with my PSU was with Asus motherboards.

If it is one of the GPUs, that's fine, as I have two and can get another later on; I just want to be sure. But at the moment I have no real way of confirming what the problem is, as you say.

I really don't think it is the MOBO...they seem to be either dead or not, and the intermittent issues don't seem to be along those lines - unless, as you say, it's a compatibility issue due to the BIOS?

I really am just dwelling here and reaching. Funny, I am a grad student in Technology and I am just at a loss. I think for now I will keep pushing it over the weekend and see where it stands come Monday morning.
 
There are some stress testing programs that you can download and run the video cards through to see if something happens. Diablo 3 isn't going to push one GTX 480, never mind two. So you can try some of the software from Futuremark that will stress the video cards. If you had BF3 or Crysis 2, you could try running one of those games on Ultra for a period of time and that would push the cards. But your best bet is the testing software.
 

JSteve

Still on today and just did a 3 hour Diablo 3 run. Maybe it was a BIOS error that corrected itself? An intermittent issue that just hasn't shown again? I am not sure, but if anybody has anything else to weigh in with, I would appreciate it. Will check back tomorrow/later.
 

JSteve

Update: Worked fine all day yesterday and today so far. No random shutdowns and no video crashes.

However, last night when I went to shut down the PC, it turned itself back on after completely shutting down. The only remedy so far has been to wait until it shuts down and starts to reboot, then hold the power button in for about 4 secs; after that it won't turn back on.

I have set all Wake settings in the CMOS to disabled and have disabled the auto-wake features on the network adapters in Device Manager. Am I missing something? Could it be something else?

Another note: when I was in the CMOS and checked the CPU clocking section, a red error popped up at first saying that the system had failed due to a voltage issue. So I went through and set my RAM and CPU back to "Optimized Settings" (F7) and took off the old overclocking that was originally set by CyberPower. I believe this may be tied to the purple error screen I was curious about in the original post.
 

JSteve

Approximately 20% or so; I had it done when I got the system from CyberPower.

Found the power settings you were talking about. I disabled wake timers; is there anything else tucked in there that may be a problem?

Settings I have (not including obvious such as slideshow):
Turn off HDD: 30 mins
Wireless Adapter: Max Performance
Sleep: 30mins
Hibernate: Never
Wake Timers: Disabled
USB Selective Suspend: Disabled
Power Button: Shutdown
Sleep Button: Sleep
PCI Express Link State: Moderate Power Savings
Processor:
------ Min: 5%
------ Cooling: Active
------ Max: 98%
Display Off: 25mins





 
I'm not one to ever put the computer to sleep, so I set all the timers and settings to never: never turn off the HDD or monitor, never sleep, and disable the sleep button. If I'm not using the computer I shut it off. But that's just me and the way I do things; a lot of people like the idea of coming home and pressing a button to wake the computer so it's on right away. The boot-up process on my computer takes less than a minute, so that's good for me.
A 20% overclock is getting aggressive, and it adds to the likelihood of the overclock failing at some point. If the overclock fails a lot, you may have to add some voltage. How much is a question, because you don't want to add too much, but I would leave that as a last resort and try other things first. BIOS and driver updates are always good to do, and you said you have done those. I would try some stress testing software such as FurMark and the Futuremark benchmarks. FurMark will only test the video card, while the Futuremark tests cover the CPU and video card.

http://www.futuremark.com/
http://www.ozone3d.net/benchmarks/fur/
 

JSteve

I have not had a single problem relating to the original problem since I clicked the first option in the CMOS error screen that appeared...I think it just reset it to a default state. So basically it's day 3 now and all is OK, error-wise.

The power issue is still here though, when I shut down it still automatically restarts. Here are ALL the different attempts I have made to correct it:

- Disabled Auto-Restart on Errors (Advanced System Settings > Startup and Recovery > Settings)
- Disabled ALL Auto-Wake features under the Advanced Power Management section of the CMOS
- Disabled Auto-Wake features manually in Device Manager for Network Devices.
- Disabled (via Device Manager and CMOS) the ability for Mice and Keyboards to wake system.
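
One more check that could sit alongside the attempts above is to ask Windows what it thinks is allowed to wake the machine. The following is a minimal Python sketch that shells out to `powercfg`, which ships with Windows 7. The three switches are real `powercfg` options, but whether they reveal anything for a restart-after-shutdown problem (as opposed to a wake-from-sleep problem) is an assumption, and `-waketimers` needs an elevated prompt.

```python
import platform
import subprocess

# Real powercfg switches; whether they explain this particular restart is unknown.
WAKE_QUERIES = [
    ["powercfg", "-lastwake"],                   # what last woke the system
    ["powercfg", "-devicequery", "wake_armed"],  # devices currently allowed to wake it
    ["powercfg", "-waketimers"],                 # active wake timers (run as administrator)
]


def run_wake_queries(queries=WAKE_QUERIES):
    """Run each query and return {command string: output}. Does nothing off Windows."""
    if platform.system() != "Windows":
        return {}
    results = {}
    for cmd in queries:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[" ".join(cmd)] = proc.stdout.strip() or proc.stderr.strip()
    return results


if __name__ == "__main__":
    for command, output in run_wake_queries().items():
        print(f"> {command}\n{output}\n")
```

If a device still shows up as wake-armed after the Device Manager and CMOS changes, that would be the next setting to look at; if nothing shows up, the restart is probably not coming from a wake source at all.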

I also took your advice, inzone, and set the HDD to never sleep and disabled Sleep/Hibernate (I normally do not use them either); however, I still left my monitor set to turn off after 30 mins. I also disabled the Sleep button.

As mentioned earlier I clicked Optimized Defaults in the CMOS for RAM and CPU, so the OC'ing is no longer active. Though I am not sure if it changed anything because as I said above I think the CMOS reset after the error.

Finally, I have been checking the Event Viewer logs and the only things I have seen were critical errors about the system not shutting down properly. However, there have been none over the last 3 days (since the original post).
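
The same Event Viewer check can be done from a script. This is a minimal Python sketch that builds a `wevtutil` query for the System log. It assumes the critical errors mentioned above are Kernel-Power Event ID 41 (rebooted without cleanly shutting down) or EventLog Event ID 6008 (previous shutdown was unexpected); the post does not give the IDs, so treat both as guesses. The helper name `build_event_query` is made up for this example.

```python
import platform
import subprocess


def build_event_query(event_id=41, count=5, log="System"):
    """Build a wevtutil command listing the newest `count` events with this ID."""
    xpath = f"*[System[(EventID={event_id})]]"
    return ["wevtutil", "qe", log, f"/q:{xpath}", f"/c:{count}", "/rd:true", "/f:text"]


def recent_events(event_id=41, count=5):
    """Return the matching events as text, or an empty string when not on Windows."""
    if platform.system() != "Windows":
        return ""
    proc = subprocess.run(build_event_query(event_id, count),
                          capture_output=True, text=True)
    return proc.stdout


if __name__ == "__main__":
    for event_id in (41, 6008):  # assumed IDs, see the note above
        print(f"--- Event ID {event_id} ---")
        print(recent_events(event_id) or "(nothing returned)")
```

Matching the timestamps of these events against the times of the random shutdowns would at least show whether Windows logged each one as an unexpected power loss.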

All in all I think the original issues may be voltage related. The power button, I have no friggin idea, but it stays off if I shut down and then hold the power button in for ~4secs when it tries to turn back on. This also doesn't cause any Disk check errors or anything, so I am just gonna leave it be. Hopefully, things will ride out OK until August when I will have more cash to throw at fixing things...if not I can always use that RMA.
 
When shutting down, you can always switch off the PSU once the system shuts down and before it restarts. It might be a pain, but as you say, for now....
If you want to try one more thing, you could unplug the power supply, remove the CMOS battery, hold the power button for 10 sec, then put everything back and see if it still does it.