
Pseudo-random freezes & resets. No BSOD.

December 31, 2010 4:17:59 AM

Hello,

A few months ago, I posted to this forum about some random crashes I was having. At first, they seemed to occur only when I had an eSATA hard drive plugged in. However, they've become increasingly frequent, and now occur even when the eSATA drive is unplugged. It's a custom-built computer that I ordered from a local store last Christmas. The warranty is still in effect, but will expire soon. Here are the specs:

Cooler Master Stacker 830
Antec Quattro 1000w
ASUS P6X58D Premium
Intel Core i7 920
Corsair Dominator DDR3-1600 / 8-8-8-24 (2GB x 3)
XFX Radeon HD 5970 Black Edition
Auzentech X-Fi Prelude
Western Digital Caviar Black 2TB
Windows 7 Professional 64-bit


Ever since I brought it home, I've had random freezes and resets, as well as a few other problems. As more time goes on, the crashes get more frequent, occurring once a day on average. They can occur when the computer is idle, or even just after pressing the power button to turn it on. However, some conditions seem to make the crashes more likely:

1. Loading or scrolling certain webpages.
2. Uploading attachments to Gmail.
3. Playing a Flash-based game, either online or offline.
4. Transcoding certain videos.
5. Running nightly backups to the external hard drive.
6. Scrolling certain PDF files.
7. Streaming video using PS3 Media Server.
8. Playing Assassin's Creed 2. Other games seldom crash.


Here's a list of the various types of resets and freezes that happen:

1. The computer resets without warning, as if I had pushed the reset button on the case.
2. The picture freezes, sound stops, and all USB devices lose power. I'm forced to push the reset button.
3. The picture freezes, all USB devices lose power, and a continuous, high-pitched tone comes out of the speakers. I'm forced to push the reset button.
4. The monitor loses input, sound stops, all USB devices lose power, a fan inside the case speeds up dramatically (very loud), and a red LED lights up on the motherboard. I'm forced to push the reset button.


Also, on one occasion, the boot-up process halted with an error message stating that overclocking had failed. However, I've never overclocked anything. The error message gave me two options: try to boot Windows anyway, or reset the BIOS settings. If I tried to boot Windows, the computer would reset before getting to the log-in screen. Since I was afraid to reset the BIOS settings, I tried this a few more times. It eventually stopped trying to boot at all, instead going straight into ASUS ExpressGate. I took the computer to the store where I bought it, and they somehow "fixed" it by reformatting the hard drive. The crashes still occurred, but at least I could boot into Windows.

I've taken the computer to the store several times, hoping that the technicians could diagnose the problem. However, they were unable to reproduce the crashes. Luckily, last month, I downloaded a program called CPU Stability Test. It's very uninformative, but it was able to cause a crash in less than a minute. So, I took it to the store again, and they could now reproduce the crash, though it took them over two minutes. They eventually determined that the BIOS needed to be updated. After that, CPU Stability Test no longer caused crashes. Unfortunately, after taking the computer home, I realized that the crashes hadn't been eliminated entirely. They merely became less frequent. Now I have no way of reliably reproducing a crash.

Last weekend, I got the "overclocking failed" error again. However, this time, I reset the BIOS settings. To my surprise, they were exactly the same as I remembered them! It's as if the error message was caused by a spontaneous change in the BIOS settings. Regardless, when I booted into Windows, my graphics drivers were no longer installed. I tried rebooting, but after several freezes during the boot-up process (before Windows), I decided to take the computer to the store again. Now the technicians have each individual component running tests in different machines, but I'm still concerned that they won't be able to pinpoint the problem. How can they RMA a component if they aren't certain it's defective? I'm hoping that someone here will have some insight that I can relay to the technicians.

Here's a list of the things that the technicians and I have done to narrow down the source of the problems:

1. I keep Windows and drivers up-to-date, and run daily virus scans with Windows Defender in full scan mode.
2. I turned off Windows' automatic restart function, but I still never see any BSODs or errors in the event log.
3. Stress tests (like Prime95) have run overnight without causing crashes or errors.
4. The BIOS was updated, as mentioned above.
5. The crashes have occurred in safe mode and diagnostic startup.
6. The crashes have occurred even with a different internal hard drive installed.
7. The crashes have occurred even with a different video card installed.
8. The technicians tested the power supply, but they didn't detect any problems.
9. The crashes have occurred regardless of which peripherals are connected. However, they are less frequent when fewer devices are connected.
10. The crashes have occurred at different physical locations, using different power cords.
11. The crashes have occurred with "looser" memory settings, such as 1066MHz and 9-9-9-24.


Thank you,
Steohawk
December 31, 2010 4:51:29 AM

A message saying that overclocking (OC) failed is not unusual; on its own, it only indicates that there is a hardware issue of some kind.

Your system ate your GPU driver? That sounds like file corruption due to crashes.

Your techies may not know everything, but I'm glad they remain helpful. This is one of those horrible lose-money-and-time problems for sure.

I'm concerned about the variable frequency of issues between your home and the shop.

Poor power from the wall, exacerbated by a faulty PSU, could account for that. Examining the voltages the PSU is producing while it is both under load and at idle might uncover some issue. In that scenario, problems would magically disappear at the shop. I have seen this more than once. Just because it's a high-quality PSU does not make it exempt from issues.

Or you could have one of those more subtle issues with the CPU or motherboard. Probably not the CPU; possibly the motherboard, something I/O-related.
December 31, 2010 4:05:42 PM

Thank you both for your replies.

The problem never completely disappeared at the store. It only became less likely to crash, but it still did. That is, until they updated the BIOS. However, they did test the power supply, and it seemed OK, though I have no idea how certain they can be. Of course, I can't imagine why the power would have trouble when looking at a PDF, but work flawlessly in Black Ops. For that matter, I can't imagine why ANY component would exhibit such strange patterns. If it weren't for the fact that it sometimes crashes before Windows even begins to load, not to mention the "overclocking failed" error, I would suspect software/driver issues. So far, my best guess is a defective motherboard, simply because it's the one thing that can't be easily tested. If only the technicians could find proof, so that they could convince ASUS to send a replacement.
December 31, 2010 10:14:35 PM

Weird, almost unexplainable issues are often traced to power. If a DMM shows the voltages within acceptable range then it's probably time to move on.
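For reference, the "acceptable range" for DMM readings usually means the ATX power supply tolerances: ±5% on the +12 V, +5 V, +3.3 V and +5 VSB rails, and ±10% on the -12 V rail. The sketch below just does that arithmetic; it assumes those ATX tolerances apply to your PSU (check the spec revision it targets), and the function names and example readings are hypothetical, not measurements from this system.

```python
# Nominal voltage and fractional tolerance per rail, assuming the
# usual ATX limits: +/-5% on most rails, +/-10% on -12 V.
ATX_RAILS = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
    "+5VSB": (5.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def rail_limits(rail):
    """Return (low, high) voltage limits for a rail."""
    nominal, tol = ATX_RAILS[rail]
    a = nominal * (1 - tol)
    b = nominal * (1 + tol)
    # min/max so the negative rail's limits come out in the right order
    return (min(a, b), max(a, b))


def check_readings(readings):
    """Map each measured rail to True if it is within tolerance."""
    result = {}
    for rail, volts in readings.items():
        low, high = rail_limits(rail)
        result[rail] = low <= volts <= high
    return result


# Hypothetical DMM readings taken under load
print(check_readings({"+12V": 11.9, "+5V": 4.6, "-12V": -11.5}))
```

With these made-up numbers, the +5 V reading (4.6 V) falls below the 4.75 V floor and is flagged, while the other two pass. Readings are worth taking both at idle and under load, since a rail that sags only under load is exactly the kind of fault that would not show up in a quick test.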

There are other variables besides load. Air temperature can affect the PSU output and all the electronics in your computer. Time of day can affect the power at the wall.

I would guess the MB as well, though.
January 2, 2011 9:12:40 PM

Epiphany! Maybe...

The store that currently has my computer will open tomorrow. They've been closed since Thursday, so I've had plenty of time to think. I keep wondering why Flash and nightly backups would make the computer unstable, but most games and stress tests run flawlessly. I can think of nothing that games and stress tests don't max out. Then it hit me like a ton of bricks. There may be a pattern. The computer is relatively stable when idle and at full load, but gets freaky somewhere in between.

Is there a fan on the processor that runs at different speeds depending on how much load it's under? If so, perhaps the "threshold" is too high. In other words, maybe the fan is running too slowly at times when it should speed up, causing the processor to overheat. If so, it could even explain the crashes that occur just after turning on the computer, as I may not have given the processor enough time to cool off.

If that's not the case, then what else could cause instability at "medium" load?


Thank you.
January 2, 2011 9:39:27 PM

The power from your PSU can have different characteristics at different loads. The power your motherboard supplies to the CPU and memory will also vary depending on load.

If your board is supplying power to a part based on an inaccurate profile, this might happen. Or, if the voltage regulation in the board was faulty.

You should be monitoring your temps with Hardware Monitor although I doubt this is CPU temp related.
January 2, 2011 10:24:08 PM

The other people posting on the forums here probably have far more experience than I do.

That said, the first and only time I had a problem like the one you describe with one of my home builds (my most recent system has been running overclocked for 3 years without a hiccup), it was a bad RAM chip. No BSOD, just random crashes for no particular reason. If you have entirely ruled out the power supply, try running one RAM stick at a time and see if the problem stops.

You might wait for someone else to comment on my post here before you go through the trouble, but that is the solitary experience I can share with you that resolved a similar problem.

If it's not the power supply, it sounds like a bad RAM stick, if not the mobo; either seems possible.
January 2, 2011 10:43:21 PM

P.S. I would get SpeedFan running, plus Core Temp or whatever other reliable program you want, to check all your temps.

If you have run Prime95 all night I sure hope you had some temperature monitoring software running ;) 
January 8, 2011 12:12:29 AM

Just an update.

A few days ago, I got the computer back. After all of the tests they ran, the technicians determined that the memory was at fault. They RMA'd the sticks and loaned me some in the meantime. It hasn't crashed since then, which is an unprecedented amount of time, so it seems that they were right. However, I still don't understand why the faulty memory passed Memtest86+ without any errors, or why most high-end games didn't trigger crashes. I guess it doesn't matter at this point. At least the problem has been solved. :) 

Thank you again for all of your replies.
January 8, 2011 1:11:57 AM

5970!!!

Nice rig, now you can enjoy it!