Strange PC Crash - Random

raybob95

Distinguished
Mar 7, 2009
586
0
18,990
I have a very strange PC crash that happens seemingly at random, roughly once every 2 or 3 days. It's not a BSOD; it's literally as if time just stops for the computer. I'll be browsing a website or something and everything will just stop: the screen keeps displaying the same frozen image, the computer is no longer reachable over the network, etc. It's like the freeze right before a BSOD, without the blue screen that follows, if that makes sense. If the HDD access light is on at the moment of the crash it stays on, and if it's off it stays off, forever. When I hit the reset button it does absolutely nothing for about 5 seconds, then powers off completely, and after another 5 seconds powers back on.

I also get occasional blue screens, with the code 0x00000124 about 3/4 of the time. I've had 8 BSODs so far this year and I have the minidumps for all of them. Yes, I am heavily overclocked, and I know 0x00000124 on an overclocked system usually means more voltage is needed, but I've tested for weeks without being overclocked or overvolted and the same problem occurs (see specs below). The crashes have been happening for more than a year now. It's fair to note that the CPU has been at 3.2GHz+ for about 4 years and is rarely below 50% load.

The problem seems to be mostly random except that there is some predictability, but it could be coincidental - occasionally it'll happen right at the start of a graphics-intensive application, e.g. just now I started up Orbiter 2010 and right when it finished loading it froze. That's not always the case however, it's frozen before while just browsing the web, and I haven't found a correlation with CPU usage or temperature. My system idles at about 45-50C and gets up to 70-75C under load. I've run MemTest, Prime95, and FurMark numerous times and they always come up clean and don't cause a crash.

It's very frustrating to have this problem because I usually have 2-3 virtual machines running that take a while to get back up, and downtime for my website, which I host within one of the VMs, is not good. My other VM is for software development, and the third (which runs in VMware, as opposed to the other two, which run in VirtualBox) is a mock-physical PC with its own monitor, mouse, and keyboard. The freezes seem to happen much more frequently when VMware is running. For reference on how frequently this happens, my Seagate 3TB drive, which has 10,181 power-on hours as of now, reports 191 power cycles, 173 of which were unsafe shutdowns (crashes).
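As a rough sanity check, the drive's counters can be turned into an average time between crashes. This is only a sketch: it assumes every unsafe shutdown was one of the freezes or BSODs, and that the 3TB drive was powered whenever the PC was on. The variable names are made up for illustration.

```python
# Figures quoted in the post for the Seagate 3TB drive.
power_on_hours = 10_181
power_cycles = 191
unsafe_shutdowns = 173

# Assumption: every unsafe shutdown was a freeze or BSOD,
# and the drive was powered whenever the PC was on.
unsafe_fraction = unsafe_shutdowns / power_cycles
hours_between_crashes = power_on_hours / unsafe_shutdowns
days_between_crashes = hours_between_crashes / 24

print(f"Unsafe shutdowns: {unsafe_fraction:.1%} of all power cycles")
print(f"Average uptime between crashes: {hours_between_crashes:.1f} h "
      f"({days_between_crashes:.1f} days)")
```

Under those assumptions the average works out to between two and three days of uptime per crash, which lines up with the "once every 2 or 3 days" estimate.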

Here are my specs: (Built June 2009, RAM is newer)
Core i7 920 C0 @ 172x21 (3.6GHz) (1.125V in BIOS, CPU-Z reports 1.104V)
Gigabyte EX58-UD3R motherboard
16GB (4x4GB) DDR3 RAM, running underclocked at ~1032 MHz and 1.5V
EVGA GTX-275, not overclocked
Corsair 750W PSU, voltages are fine
OCZ Vertex 2 60GB System Drive
4 HDDs at 1TB, 1.5TB, 2TB, 3TB
3 Optical Drives

Hope you can help! I'll test anything anyone recommends.
 

raybob95

I do all 4 at a time, should I do one at a time? Also how long should you let each one test for?

Also now that I think about it, the problem did start around the same time that I upgraded from 12GB to 16GB...
 

crewton

Distinguished
Apr 3, 2011
1,334
0
19,460


Running Memtest with all four sticks installed won't tell you which stick is bad, so test one stick at a time. You'll know right away if you have a bad one, as the errors will go crazy. 5-8 cycles and then move on to the next stick is generally what I do. You might not even be able to POST with the one bad stick, which would also narrow it down.
 

raybob95

Ssddx - I've pretty much always had 3 VMs. The development machine isn't always running, though. Like I said, it could be my imagination, but it seems to happen more frequently when VMware is running. Last night, a few hours after this post, I opened up Minecraft in the VMware machine, and despite starting fine, the whole computer froze when I hit fullscreen. It doesn't always happen like that, but I know it has in the past with that specific action.

Goblues39 - You might be right about the SSD... I originally had a Vertex I and I had it replaced under warranty five or six times before they finally gave me a Vertex II which seems to have worked right for a year... but who knows. The freezes have been happening for a long time, at least 8 months I'd say maybe a year... that would be weird if the drive was causing problems for that long but still working.

What tests can I run to confirm it? I'll try memtest one stick at a time also. I also tried upping my OC voltage from 1.125V to 1.175V and it still crashed. I'm running at standard clocks now to verify that isn't the problem.

Could it be possible, then, that the BSOD and the freeze are separate issues? Maybe the occasional BSOD is because of my slightly unstable overclock and the freeze is because of the SSD randomly refusing to give up data.

OH! Also, I don't know if it helps to note, but a few weeks ago, after I cleaned the dust out of my PC, it would not POST when I had my USB devices plugged in to their usual ports. I had to move them to different ports for it to POST, but once it booted I moved them back and it worked fine, and it POSTs fine now too. Strange, huh?

----- Here's the SMART data for my SSD if it means anything:

HD Tune: OCZ-VERTEX2 Health

ID    Attribute                      Current  Worst  Threshold  Data      Status
(01)  Raw Read Error Rate            106      100    50         12741146  ok
(05)  Reallocated Sector Count       100      100    3          0         ok
(09)  Power On Hours Count           100      100    0          11799     ok
(0C)  Power Cycle Count              100      100    0          242       ok
(AB)  Program Fail Count             0        0      0          0         ok
(AC)  Erase Fail Block Count         0        0      0          0         ok
(AE)  Unexpected Power Loss Count    0        0      0          100       ok
(B1)  Wear Range Delta               0        0      0          2         ok
(B5)  Program Fail Count             0        0      0          0         ok
(B6)  Erase Fail Count               0        0      0          0         ok
(BB)  Reported Uncorrectable Errors  100      100    0          0         ok
(C2)  Temperature                    30       30     0          1966110   ok
(C3)  Hardware ECC Recovered         106      100    0          12741146  ok
(C4)  Reallocated Event Count        100      100    0          0         ok
(E7)  SSD Life Left                  88       88     10         0         ok
(E9)  Media Wearout Indicator        0        0      0          54784     ok
(EA)  (unknown attribute)            0        0      0          27904     ok
(F1)  LifeTime Writes from Host      0        0      0          27904     ok
(F2)  LifeTime Reads from Host       0        0      0          14016     ok

Health Status : ok
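
For anyone who wants to read the table above programmatically, here is a minimal sketch of the usual SMART convention: an attribute is considered failing when its normalized current value is at or below a nonzero threshold, and a threshold of 0 is treated as informational only. The rows are copied from the HD Tune output; the function names `failing` and `margins` are made up for illustration, and only the rows that matter for the check are included.

```python
# Rows copied from the HD Tune output above:
# (id, name, current, worst, threshold, raw data)
SMART_ROWS = [
    ("01", "Raw Read Error Rate", 106, 100, 50, 12741146),
    ("05", "Reallocated Sector Count", 100, 100, 3, 0),
    ("AE", "Unexpected Power Loss Count", 0, 0, 0, 100),
    ("BB", "Reported Uncorrectable Errors", 100, 100, 0, 0),
    ("E7", "SSD Life Left", 88, 88, 10, 0),
]


def failing(rows):
    """Names of attributes whose current value is at or below a nonzero threshold."""
    return [name for _id, name, cur, _worst, thr, _raw in rows
            if thr > 0 and cur <= thr]


def margins(rows):
    """Distance between current value and threshold, for attributes that have one."""
    return {name: cur - thr for _id, name, cur, _worst, thr, _raw in rows
            if thr > 0}


print("Failing attributes:", failing(SMART_ROWS))
for name, margin in margins(SMART_ROWS).items():
    print(f"{name}: {margin} above threshold")
```

By this check nothing in the table is failing, which matches the "ok" status column. It says nothing about controller problems that SMART does not report.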
 
i personally have two vertex 2 80gb ssd drives that are at least 2 years old and were written to max capacity and they still work fine. personally my two have been completely rock solid.

under typical usage patterns you shouldn't wear out the nand in an ssd, unless of course you write gb upon gb of data constantly. what is more likely to fail is the controller. i've never had one fail on me, but from what i've read it can happen quite suddenly and without warning, unlike a hdd which normally goes through death throes.

you could try hdtune or crystaldisk to see what results you get. you could also remove your ssd and put in another spare hard drive (surely you must have one around somewhere) and boot up windows on it (without activating online of course) to see if the problem goes away. if not then just swap back to the original ssd drive when the system is off of course.

before you go do something like this hdd swap check the ram first as he listed above

you never did get back to me on your usage patterns. if you only started running 3 VMs around the time the issues started, then perhaps that is the cause. if you've always done this, then it is more likely the memory or ssd causing the issue.

during your maximum normal usage routine how much ram is normally left available and not used?
what are the other stats in system monitor at this maximum normal usage? (cpu, ssd usage...)
note: not a benchmark but your own typical maximum normal usage pattern.


 

raybob95

I did get back to you, I said I've pretty much always had 3 VMs. Not that that should cause the PC to crash under any circumstances anyway unless something else is wrong.

My CPU is typically always around 20% unless someone is playing a game or something in VMWare (which is often) then sometimes it sustains around 50% usage. I hit 100% quite a lot though since I do DVD encoding frequently and other similar tasks. Yes let's say I use this computer for quite a lot. Right now I'm using 6.18GB of RAM with only my server running but with the other 2 VMs running it's usually around 11-12GB usage. I pretty much never hit 16 but I do get it up to 14 on occasion. My Server and VMWare are given 3GB RAM and the Development PC is 2GB since it runs XP.
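
The RAM figures above can be cross-checked with a bit of arithmetic. This is a sketch under one loud assumption: each running VM consumes its full allocation on the host. The dictionary keys are labels invented for illustration.

```python
TOTAL_GB = 16
used_with_server_only_gb = 6.18  # host usage with only the server VM running

# Allocations quoted in the post.
vm_alloc_gb = {"server": 3, "vmware": 3, "dev_xp": 2}

# Assumption: each VM uses its whole allocation once it is started.
extra_gb = vm_alloc_gb["vmware"] + vm_alloc_gb["dev_xp"]
estimated_all_vms_gb = used_with_server_only_gb + extra_gb
headroom_gb = TOTAL_GB - estimated_all_vms_gb

print(f"Estimated usage with all 3 VMs: {estimated_all_vms_gb:.2f} GB")
print(f"Headroom left out of {TOTAL_GB} GB: {headroom_gb:.2f} GB")
```

The estimate lands inside the 11-12GB range mentioned above, so with all three VMs running most of the installed memory is in use.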

My Hard Drives are idle most of the time... for some reason my resource monitor doesn't show me stats for individual drives and it used to show me no data at all. I use the SSD for Windows and some programs only and am using ~42GB, the 1TB Drive (WD) holds my VirtualBox HDDs and all the programs for my computer, the 1.5TB drive (Samsung) is all personal documents, the 2TB drive (Samsung) is backups of various nature, and the 3TB drive (Seagate) is pretty much empty except for some more documents and backups. I have page files on the 1.5TB and 2TB drive only.

---

I'm running underclocked now since I feel that still could be the issue, maybe I didn't give it enough time before or something.... in a week I'll let you know if it crashed or not, which it normally would.
 
i suppose i must have missed that first paragraph for whatever reason. perhaps it's because i didn't have my morning caffeine fix at that time!

-------------------

if you are using 12-14gb of 16gb of ram then one stick failing can definitely cause some issues. i think the most important thing to test right now is each and every stick with memtest.

you said you are using the VMs with the other drives. as far as i know a VM is an add-on to windows that loads off of the os drive, but you can assign a hard drive for it to use. is this correct, or does it somehow run off of the alternate hard drive itself?

what i'm getting at is that perhaps one of the other hdds is failing instead which maybe causes VM to lock up for some reason and then freeze the whole system. do a hdd test on the other drives to be sure.

i can tell you from personal experience that the large capacity drives fail quite often. i've had them fail in as little as 3 days and after as long as 8 months when i used them daily. now i just keep mine for backup only and they don't get much use, because i cannot trust them and i don't feel like shelling out $20 to ship them back to WD for a replacement as often as i have been. i only paid $100 EACH for them and have already spent like $40 on shipping (one time they actually paid for my shipping to them...oh how gracious). i've had them rma'd 3 times so far.
 

raybob95

My drives have all been fine... my 1TB WD Drive is original to the computer and has over 33,000 hours!

The VMs have virtual hard drives that reside in files. I know that neither my 1TB nor my 1.5TB drive is failing. SMART reports everything as fine, and I use them for many things other than just storing the VM files, so I know they're OK.

But yeah I'm gonna try the stock clocks, try memtest both overclocked and not, and then let ya know.

Thanks!

 

Dhamilton

Honorable
Nov 27, 2012
158
0
10,710
If it is related to the HDDs, it is more likely that the problem is power draw or heat.
Individually they would all work perfectly.
Did the crashes begin around the time one of the HDDs was added?
Is there a fan drawing in cool air that moves across the HDDs?
 

raybob95

I have an Antec 900 so lots of cooling, and a Corsair 750W PSU with a single GPU so plenty of power. I'm 99.99999% sure that my storage drives aren't the problem.

I'm still gonna do MemTest but can anybody think of any tests that would verify whether the SSD is/isn't the problem?
 

Dhamilton

Ok good you have the cooling.
But the question still remains: did you add one of your 4 HDDs (not the SSD) shortly before the crashes started?
That 3TB drive you mention in the OP has hardly any safe shutdowns. Was that the last drive added? Did the crashes start before or after its install?
You seem pretty knowledgeable, so can we assume that your SATA drivers are up to date?
 

raybob95

Quite honestly I don't remember. I think they started roughly around the time that I installed the last 4GB of RAM, but that was about a year ago. You can tell I've just been ignoring it forever since it's happened 173 times lol. It might even date back to when I installed Windows 7 which was January 2012. Oh and yes the drives were added in order of capacity. I got the 1TB in June 2009, the 1.5TB in March 2010, the 2TB in Nov. 2010 and the 3TB in April 2012.

I'm just using the standard Windows 7 SATA drivers since I don't run RAID or anything. My drives run in IDE mode not AHCI anyway.

EDIT: I looked at my order history on Newegg: I started out with 3x2GB of RAM in 2009... in March 2012 I bought 2x4GB and mixed the old with new so I had 12GB. Then in April 2012 I got another 2x4GB matching set (but different heatsinks?) so I had matching new RAM and 16GB total. I don't remember what happened but I had to replace the second set of RAM shortly after because the PC wasn't recognizing it or something.

So yeah the start of the crashes coincides pretty closely with when I got the RAM and the 3TB drive at nearly the same time.

Would it help if I uploaded my minidumps or SMART data for each drive? My oldest minidump is 4/28/12 which is 11 days after the 3TB HDD and about 24 days after the RAM, but it was a different problem than my more recent minidumps indicate.
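
To pin the timeline down, the offsets above can be converted into calendar dates. A small sketch, assuming "4/28/12" is month/day/year and taking "about 24 days" as exactly 24; the variable names are illustrative only.

```python
from datetime import date, timedelta

# Assumption: "4/28/12" is written month/day/year.
oldest_minidump = date(2012, 4, 28)

# Offsets quoted in the post ("about 24 days" is taken as exactly 24 here).
hdd_installed = oldest_minidump - timedelta(days=11)
ram_installed = oldest_minidump - timedelta(days=24)

print("3TB HDD installed:", hdd_installed.isoformat())
print("RAM installed (approx.):", ram_installed.isoformat())
print("Days between RAM and HDD:", (hdd_installed - ram_installed).days)
```

Both dates fall in April 2012, less than two weeks apart, so the timing alone can't separate the RAM from the 3TB drive as a suspect.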

I still think the SSD could be the cause since I had such immense trouble with the Vertex Ones. I had to replace them every couple months for three years until the warranty ran out and they gave me a Vertex II.
 

raybob95

OK, some new intel.

The PC just crashed again about 3 minutes ago (at standard clocks and voltage), and once again it corrupted my Firefox session store which is extremely annoying because I usually have at least 10 tabs open. I was also in the middle of burning a DVD! My brother was playing some fullscreen flash game on the VM. I now once again have to restart my server and everything and hope it doesn't have to run chkdsk again on start. (Remember, this has happened 174 times. Not good.)

Also I looked in my email and my current Vertex II dates back to the end of January 2012, right about the same time I changed over to Windows 7. I don't remember which came first. Honestly I might have done both upgrades at the same time, installing a fresh OS on a fresh drive.

EDIT: I'm very sure the crashes started in Late April 2012 (about the time of the RAM and the 3TB Drive) because I have a TXT file in my docs reminding me to fix the new problem, and the creation date on the file is May 9th '12.

I also had a problem right around the same time where running my backup software would always cause a BSOD. I think that's what my old BSOD minidumps are from. Not sure if that's related but it doesn't happen anymore.

So once again I'm gonna try memtest which I haven't yet. I'm also still unsure as to how to test the SSD.

Thanks for all the help so far by the way and reading all my long posts.
 

raybob95

OK so MemTest didn't indicate anything.

I couldn't test every slot individually since the PC wouldn't POST unless the RAM was in slot #1. I tried every stick and let each run reach at least 33% (about 5 minutes), and except for a couple of minutes where it randomly wouldn't POST, it was fine. I took the stick that was in when it randomly failed to POST 3 times and let it run until 102%, which it did with no errors. Then I put all 4 sticks back in and did another test; it ran until 50%, which took 20 minutes, and indicated no errors.

So. How do I test the SSD?
 

raybob95

OK. I'm running firmware 1.35, which makes sense since I got the drive in January '12. If CrystalDiskMark doesn't cause a crash (it's running now and hasn't so far) then maybe I'll try flashing the firmware.

On OCZ's Website in the 1.37 release notes it says:

"Issues resolved since version 1.35: Fixed blue screen on sleep mode from S3 / S4"

Maybe that could be the issue!

 

Dhamilton

If you do flash it up, be absolutely sure you follow the instructions to the letter.
Flashing an SSD can destroy all data if not done properly and thus is only recommended before any major data is on there or if the drive is producing errors.
So again, please follow the instructions you get to the letter.
 

raybob95

OK so the update from 1.35 to 1.37 went well, but didn't solve the problem.

Since switching Minecraft to fullscreen in VMware is the only thing so far that I know has a tendency to make it crash, we tried that immediately after flashing. It worked the first 5 or 6 times, but then the PC crashed again. I know this has to be a definite trigger because I've never had it crash within a minute of booting the PC before. It's definitely not the only trigger though, just the only one I've managed to identify.

Not sure what to do now.
 

goblues39

Distinguished
Jul 1, 2008
44
0
18,540
When my first Vertex II died, it was one year almost to the day. My only idea would be to install Windows on one of your HDDs (or a clean spare) and see if you can duplicate the problems without the SSD.