Hardware Lock-Up, No Matter What Hardware

jivix

Distinguished
May 19, 2009
14
0
18,510
For the life of me, I cannot figure this one out. I work in an IT department and participate in a helpdesk line. None of the people in my department have been able to figure my issue out.

I have a custom gaming computer that I built myself. This computer started out as a fledgling, with a meager Pentium E2200, 2GB of RAM, a 160GB hard drive, an ATI x1650 Pro, and a 450W PSU. It began with Windows XP SP3. Later on, I moved up to an 8800GT video card, followed by a new case with an accompanying 700W PSU, an XFX 680i LT motherboard, a larger hard drive, and 4GB of RAM. I moved up to a Pentium E5200 and finally a Q9550. I now have a Sparkle GTX 260 C216 video card, 8GB of DDR3 RAM, and a Gigabyte EP45T-USB3P motherboard. It resides in an Antec Nine Hundred with a newer 700W modular power supply. I moved from Windows XP to Windows Vista to Windows 7.

I have always taken good care of this computer. I have always kept the drivers updated, installed all of the newest updates through Windows Update, and religiously cleaned out the case every two months. I have spent hours on wire management and temperature monitoring. Also, this machine has been run from several locations, on different power circuits.

Now that the background is out of the way, here is my problem.

Every so often, this machine locks up. All USB devices power off. However, my watt meter reports that the system is still drawing 140W, which is 20W above idle. Any sound that is playing loops the last 1/8th of a second, with static introduced as well. Much more rarely, I get a bluescreen attributed to video hardware, which I have never resolved. Anything on the screen freezes completely, and the system never goes into standby or hibernate (which I have it set to do). If the hard drive was in use, all activity stops and the hard drive light turns off. There is no way to clear the state except for a restart.

I have boggled the IT team with the fact that every part in the case (including the case) has been replaced several times. I have reinstalled Windows at least 15 times, even tried Ubuntu for a while, and tried every available driver for every piece of hardware in every permutation.

However, there do seem to be several conditions, or at least recurring patterns, that could possibly trigger a lock-up.

Doing any 3 of the following things at the same time seems to trigger the crash:
-Performing file transfers
-Playing music
-Watching any format of video
-Playing something off a DVD/CD/Blu-Ray
-Typing something
-Browsing with any browser
-Playing any video game (Warcraft III, Diablo II, Left 4 Dead 2, EVE: Online, etc, etc.)
-Converting videos using Super
-Using any software suite (Autocad, Photoshop, Visual Studio, Azureus, OpenOffice)

Yet even more bogglingly, when I run every program I have (except games... and remember, 8GB of ram), nothing seems to happen, even if I leave it on overnight.

Also, sometimes nothing at all happens, while on other days I get a lock-up as often as every 10 minutes or as rarely as once every 12 hours.

I'd like to reiterate from the computer's history: all drives, motherboard, CPU, memory, PSU, case, monitor, keyboard, mouse, speakers, card reader, and wireless card have been replaced over long periods of time. I do not know if this could somehow be caused by my house, but in this time span I have even moved several times. I cannot get the lock-ups to stop, and I would really appreciate a fresh angle or idea that I could try.

Oh, and I've never had a component besides the 8800GT get above 80C.

Full parts history (in order of date acquired)

CPU: E2200 @ 2.8GHz, E5200 @ 3.15GHz, Q9550 @ stock
Memory: OCZ 2GB 800MHz DDR2 (2x1GB), OCZ 4GB 800MHz DDR2 (2x2GB), OCZ 1333MHz DDR3 (4x2GB)
PSU: 450W OEM PSU, 700W Ultra XVS modular PSU, 700W Raidmax modular PSU (80+ spec)
GPU: ATI x1650, Nvidia 8600GT, EVGA 8800GT Superclocked, Sparkle GTX 260 C216
Case: (Some shitty case), Ultra Grid ATX, Antec Nine Hundred
Keyboards: (have tried all of them) Razer Tarantula, OCZ Elixir, Logitech 967740-0403.
Mice: Logitech standard optical, Dell ball mouse, Logitech Wireless mouse.
Hard Drives: (all Western Digital) 160GB, 250GB, 500GB Caviar Green, 750GB Caviar Green, 750GB Caviar Black
(Currently using both the 750GB drives with the Caviar Black as boot drive. OS has a 200GB partition.)
Optical Drives: Lite-On 16x DVD-ROM drive, Pioneer 16x DVD-RW drive, Lite-On 16x DVD+/-RW drive, Lite-On Blu-ray drive
Monitors: 2 different Dell 1280x1024 monitors, currently an Acer 23" 1080p monitor (the H233H)


EDIT: After reading about other people's lock-up problems (which sound remarkably like my own), I have confirmed that none of their solutions work on my machine. The RAM I have is rated for 1.65V. My motherboard tried to run it at 1.6V with lower timings, so I set the timings manually to stock settings, and the voltage is set to 1.66V because the Gigabyte board adjusts voltage in even increments.

The RAM runs at about 33C, which is the highest case temp except for the GPU and CPU. GPU is generally idling at about 35-38C. The Q9550 is 30C-35C. This is in a 21-22C room. Hard drives are about 24-26C.

I have checked the rail voltages with a multimeter, and the SATA and molex rails are all right on target, whether under load or idle.
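For anyone repeating the multimeter check, "on target" can be made concrete: the ATX specification allows the positive rails (+3.3V, +5V, +12V) to deviate by 5% from nominal. Below is a minimal Python sketch of that comparison. The readings in it are hypothetical placeholders, not measurements from this machine, and the `check_rail` helper is invented for illustration.

```python
# Nominal ATX rail voltages; the positive rails are specified at +/-5 %.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05

def check_rail(name, measured):
    """Return (within_spec, deviation_percent) for one multimeter reading."""
    nominal = NOMINAL_RAILS[name]
    low = nominal * (1 - TOLERANCE)
    high = nominal * (1 + TOLERANCE)
    deviation_pct = (measured - nominal) / nominal * 100
    return low <= measured <= high, deviation_pct

# Hypothetical readings, for illustration only.
readings = {"+3.3V": 3.31, "+5V": 5.04, "+12V": 11.52}

for name, volts in readings.items():
    ok, dev = check_rail(name, volts)
    status = "OK" if ok else "OUT OF SPEC"
    print(f"{name}: {volts:.2f} V ({dev:+.1f} %) {status}")
```

Taking one set of readings at idle and another under load, and comparing the deviations, would show whether a rail sags toward the edge of its tolerance even if it never leaves it.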


Thanks everyone, I'm looking forward to some ideas to try out.
 

JimRaynor56

Distinguished
May 15, 2010
12
0
18,520
Has this problem been occurring ever since you first built the computer, or just since the most recent rebuild?

I have a few things that I would like you to try. First is to open the Windows Reliability and Performance Monitor. Start -> type reliability -> 'View reliability history'. Find a day when the lockups occurred and look for any critical events or Windows failures. Post them here.

After you've done that, run memtest86+ for at least 2 passes and report any errors. If you get any, try setting your RAM voltage and timings to AUTO in the BIOS and run it again. If this eliminates the errors, then your memory is probably fine.
You can download memtest here

If everything comes back clean, then download OCCT and Prime95 (32-bit or 64-bit).
In OCCT, run the GPU:OCCT test and then the GPU:Memtest test. After that, run Prime95. See if any of them throw errors.

If you have no errors try running a HDD diagnostic. Download here.


All things considered, this sounds very similar to the issue I was having, which, as far as my testing could prove, was due to a faulty VGA card. You could try putting another card in to see if it solves your issues.
 

jivix

This problem has occurred since the very first time I built the computer. It has increased in frequency in the last build with the Gigabyte motherboard.

The lock-ups do not show up in the reliability monitor in Windows 7. I assume that when a lock-up happens, all critical system functions lock up as well, so nothing gets logged. When I reboot, Windows asks me if I want to boot into the repair console, but when I continue to boot normally, it works fine.
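Because a hard lock-up leaves nothing behind in the reliability monitor, one way to at least pin down when each freeze happened is a small heartbeat logger that writes a timestamp to disk every few seconds. The following is a minimal Python sketch, not a tested diagnostic tool; the file name `heartbeat.log`, the 5-second interval, and the function names are all assumptions made for illustration.

```python
import os
import time
from datetime import datetime

def write_heartbeat(path, now=None):
    """Append one timestamp line and force it onto the disk."""
    now = now or datetime.now()
    line = now.isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the line survives a sudden freeze
    return line

def last_heartbeat(path):
    """Return the last timestamp written before the freeze, or None."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = [ln.strip() for ln in f if ln.strip()]
    except FileNotFoundError:
        return None
    return datetime.fromisoformat(lines[-1]) if lines else None

def run(path="heartbeat.log", interval_s=5, beats=None):
    """Write a heartbeat every `interval_s` seconds (forever if beats is None)."""
    written = 0
    while beats is None or written < beats:
        write_heartbeat(path)
        written += 1
        time.sleep(interval_s)
```

Leaving `run()` going in a console window and calling `last_heartbeat("heartbeat.log")` after the next forced restart gives the last moment the machine was known to be alive, to within the interval. That time can then be lined up against watt meter readings or whatever programs were open.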

I have run Memtest 3.5 and Memtest 86+ 4.0 overnight, and each one has come back clean for every stick of RAM that has ever entered this computer. There were at least 10 passes for my older DDR2 RAM, and at least 3 passes for my newer DDR3 RAM. I guess it takes longer to run if there are more sticks installed.

While my older 680i LT board recognized my DDR2 memory timings just dandy, the new Gigabyte board could not set the timings correctly on its own, so it would boot Windows and then crash immediately. When I set the timings manually to the memory's rated values, it worked fine.

As for other stability testing programs, here is what I have run (for every iteration of this machine)
-Coretemp 32/64
-HDtach
-Speedfan
-Crystal Disk Info
-Prime95
-Linpack 32/64
-Rthdribl
-CPUstabtest
-EVGA Precision (only when I have Nvidia cards)

I also use CPU-Z and GPU-Z for getting information.


As for faulty video cards, I do find it difficult to believe that my system has been crashing from 4 different video cards over the years. To reiterate: GTX 260, 8800GT, 8600GT, and x1650 Pro. For a very brief time before the x1650 was delivered I may have also used an Nvidia FX 5500 that produced the same issue.

The 8800GT is now in another computer and the 8600GT is in my girlfriend's computer, and neither ever crashes like my main computer. The x1650 is in a friend's computer, but he hardly uses it, so it's not a good reference for stability.

Thanks for the ideas though, I am trying to go over and over and see if there may be a small crack in my planning that I haven't seen, or some error that could slip through. I am also looking at power usage constantly to see if perhaps power usage spikes right before the crash or something.
 

Motherboard. It looks as if everything else has been changed.
 
Solution

jivix

Unfortunately, I have replaced the motherboard several times. I started out with an Nvidia 630i, moved up to a 680i LT, and now have a Gigabyte EP45T-USB3P. The problem occurred with all three. I wonder if it could be contagious between hardware? Perhaps a faulty piece in the original build caused a power supply rail to burn out, which caused something on the motherboards to fail. Then, when I replaced the PSU, a short or something could have caused one of the lines on the 24-pin power connector to fail. I am afraid to try to meter the 24-pin connector while the system is turned on, so as of now I am stuck.
 

jivix

I will consider getting different RAM, as my board can supposedly handle up to a 2200MHz memory clock anyway.

I did try running the board with fewer ram modules. In fact, I tried 2GB, 4GB, and 6GB.

Actually, I have once again scrutinized my OS, because it is one of the few things I haven't changed much, seeing as it seemed unlikely to be the culprit (especially because I've used every version of Windows after 2000 on this machine).

I have had my startup entries cleaned out on every OS install I've had, but there are always several remaining components that are critical. Here are my current startup entries:
1. Windows Sidebar (sidebar.exe)
2. EVGA Precision (EVGAPrecisionWrapper.exe)
3. Realtek Audio Utility (RavCpl64.exe -s)
4. Kernel & Hardware Abstraction Layer (KHALMNPR.EXE)
5. *disabled* Logitech Wingman Profiler (LWEMon.exe /noui)
6. Stardock ObjectDock (ObjectDock.exe)
7. AVG Free Edition

I have tested the system with #1, 2, 3, 4, 5, and 6 disabled.

However, I do not recall ever testing the system with my primary antivirus disabled.
Now, you have to completely uninstall AVG to get it to stfu, because it has several services that make sure it definitely starts up. Also, you can not exit it manually like you could with older versions. THIS IS COMPLETELY STUPID AVG! Put your goddamn exit button back on the tray icon!!!


I have uninstalled AVG, and installed Avast! It has been about 4 hours of uptime with several trigger programs running, but I have not had a crash since. I find it very difficult to believe that my antivirus program was causing all of this, so I will not be satisfied until I have not received a crash for a week... I do believe that a crappy application could mess up any Windows OS though.

However, this does not explain why the system still crashed with Ubuntu, under the exact same circumstances... Perhaps it was just a series of bad coincidences?
 

jivix

Over 12 hours of uptime now with no issues. As of right now, AVG Free 9.0 was the cause of my random crash issues for the last 6 years. Goddammit.

Note: I switched to Avast!, as it has some of the best virus discovery rates in the free antivirus business. It needs more configuration than AVG, but at least it has the options.

Note 2: This is just as bad as the firewall I was using, Zonealarm. It would cause my network connection to be severed, requiring a hard reboot. Zonealarm was a total pain in the ass for videogames too, because you had to allow each game several times if you were trying to do an online match. Unfortunately, the pop-up function caused many of my games to hang, requiring me to press the standby button and log back in so I could approve each message. Very time consuming!
 

jivix

Well, AVG is definitely not causing it. After hour 16 of uptime, I got another lockup. I had 3 .rtf files open, Firefox 3.6.3, Steam, and Foobar2000 playing music.

I wish there was a god of computing I could talk to.
 

jivix

OK - I'm still stuck. I can't tell if it's Windows 7, an Nvidia driver problem, or something hardware or software related.

I have rolled back my graphics card driver to see if that influences the frequency of the crashes.

People from the EVGA forums with 200 series cards have been having similar problems with the newer drivers.
 

jivix

Alright - the problem has been gone for 12 hours and counting.

Here's what I did, in case it might help anyone with a similar problem:


1. I set my memory speed below what it is rated for. My RAM is rated for 1333MHz at 7-7-7-20 and 1.65V. However, the motherboard reads the default as 1.5V. I have had the voltage set to 1.66V since I put the RAM in, but I have now also lowered the speed to 1066MHz. The timings are still 7-7-7-20.

2. Reinstalled Windows 7. I did this after changing the memory timings, which may have made a difference.
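A note on why step 1 relaxes the memory even though the 7-7-7-20 numbers did not change: timings are counted in clock cycles, so at a lower speed each cycle lasts longer. The Python sketch below converts the timings to nanoseconds. It assumes the quoted 1333MHz and 1066MHz figures are the DDR3 data rates (two transfers per clock), as they usually are on retail packaging, and the `latency_ns` helper is made up for illustration.

```python
def latency_ns(cycles, data_rate_mts):
    """Convert a timing in clock cycles to nanoseconds.

    DDR memory transfers twice per clock, so the clock in MHz is half the
    data rate, and one clock cycle lasts 2000 / data_rate nanoseconds.
    """
    return cycles * 2000.0 / data_rate_mts

TIMINGS = (7, 7, 7, 20)  # the 7-7-7-20 timings from step 1

for rate in (1333, 1066):
    in_ns = [round(latency_ns(t, rate), 2) for t in TIMINGS]
    print(f"DDR3-{rate}: {TIMINGS} cycles = {in_ns} ns")
```

Under that assumption, the first timing goes from roughly 10.5 ns at 1333 to roughly 13.1 ns at 1066, about 25% more slack for the modules and the memory controller, which is consistent with marginal memory settings being involved but does not prove it.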


As I add back all of the software that was on my previous Win7 install, I will keep a close watch for any more freezes. I sure hope the solution was this easy.