Why Does My System Keep Crashing Randomly? Do I Have a Problem With My CPU?

Zelda__64

Reputable
Jul 31, 2014
4
0
4,510
I have a hardware problem but I'm uncertain of the cause. I believe that my CPU is faulty but I'm not sure. Please help me determine the cause of my hardware problem.

My computer has crashed many times in the last 3 weeks and has been getting progressively worse; it now crashes up to 5 times a day (I had college finals and couldn't really deal with it until now). About 2/3 of the crashes happened while I was watching full-screen video (with VLC or streaming a video on the internet), but it has also crashed while I was doing other tasks. I know that this problem is not a software issue: I have tried both Windows 7 and Ubuntu 14.04 LTS (which are on separate hard drives) and experienced random system crashes on both OSes (Windows reports "Blue Screen (BSOD)" and Linux reports "kernel panic: Unable to sync"), although Windows has crashed more easily than Ubuntu.

If it's not my CPU, then my next best guesses in order of likelihood are: motherboard, power supply, GPU, or some combination thereof.

Tests & Inspections Performed:
Memtest: 43 hours, 0 errors
mPrime "Short FFT's": 16 hours, 0 errors
mPrime "In Place Large FFT's": ~15 tests, 10-30 mins before a "Fatal Error"- Rounding Error
mPrime "Blend": ~5 tests, 10-30 mins before a "Fatal Error"- Rounding Error
CPU Core Temp Idle: 42 degrees Celsius
CPU Surface Temp Idle: 45 degrees Celsius
CPU Core Temp Load (mPrime @ 30 mins): 46 degrees Celsius
CPU Surface Temp Load (mPrime @ 30 mins): 51 degrees Celsius
GPU Temp: 68 degrees Celsius
Airflow: Case and components are clean and unobstructed
Visual inspection of MOBO: No ruptured capacitors, rust, or other signs of damage.

My System (Built in 2009, All parts were brand new except the GPU):
CPU: AMD Phenom II x2 550 3.1GHz Dual core
TIM: Gelid GC-2 (TC-GC-02-A), 3.8 W/mK
Heat Sink: CoolerMaster Evo 212
DRAM: Corsair CM2X2048-8500C5C__2x2GB DDR2 1066MHz PC2-8500 5-6-6-16 2T 2.1 Volts
MOBO: Gigabyte GA-MA790GP-UD4H
Chipset: AMD 790GX
SouthBridge: AMD SB700
LPCIO: ITE IT8718
Gigabyte AHCI Drivers
GPU: Radeon Sapphire HD 4870___750MHz GPU, 900MHz 1GB GDDR5, 256-bit bus, 800 SP's
PSU: Corsair TX950 __950 Watt 78A on +12V rail (Overkill for my machine, I know)
HDD (Win Boot Drive): Seagate ST31000528AS_ 1TB, SATA, 7200 rpm, 32MB Cache
HDD (Linux, Storage): 5x Seagate/Samsung HDD's__14TB, SATA, 5400-5900 RPM
Case: Antec 300 ATX Mid Case__Steel, removable air filters

* Additional Info:
1) About 3 years ago I bent a pin on my CPU when I was applying a better TIM (Gelid GC-2). I bent the pin back into place with a needle, making sure it was symmetrical and in line with all the other pins. Afterwards my computer booted fine and ran mPrime (Prime95 for Linux) for 48 hours without any errors. My machine had not given me any problems until now.

2) I had 5 case fans (4x 120mm & 1x 140mm), but about 6 months ago I removed 2 of the 120mm fans from the front because the fan cages had extremely small amounts of rust (I was going to sand & paint the fan cages for protection and put them back in my computer, but school kept me real busy). Right now I only have a fan on the top, back, and side of the case. When I grabbed my case to start diagnostics, I noticed that the top-back-left corner of my case was very hot to the touch. I opened up the case and the fins & tubes of the heat sink were cool to the touch; I didn't touch the top of the base of the heat sink though. I turned up the fan speed from low to high on my top & back case fans and the top-back-left corner of the case was cool to the touch within 15 seconds.

Before opening the case to cool it down, mPrime's "In Place Large FFT's" test would produce a Fatal Error immediately, outputting a test time of 0 mins and 0 sec to the Terminal. All of the errors that I received were rounding errors. Immediately after opening the case to cool it, I was able to run mPrime's "In Place Large FFT's" test for over 3 hours, but I have not been able to do so again. Now that I have cooled it off, the "In Place Large FFT's" test can last 10-30 mins before a "Fatal Error".

3) I throttle my CPU fan to 50% if the temp is lower than 60 degrees Celsius. I use "SpeedFan" (Windows) and "FanControl" (Linux) to throttle my fan.

4) About 2 years ago I overclocked my DRAM from 800MHz to 1066MHz, which required that I increase the voltage to my DRAM. Both my DRAM and MOBO support DDR2 1066 speed, but I had to manually overclock them to actually achieve the speeds the products were marketed as.

5) My case came with a nice nylon air filter for the front case fans and I bought a filter for the side fan as well, but I let them get dirtier than ever before. I usually clean the inside with compressed air about 2-3 times a year and I clean the air filters of my case every 6-8 weeks. The inside of my computer is still relatively clean but a bit dirtier than usual.

6) My processor, the "AMD Phenom II x2 550 Black Edition", is actually a quad-core processor with 2 cores locked. AMD tries to make as many perfect processors as possible, but if 1 or 2 cores are not functioning properly, AMD will just lock the non-functioning cores and sell the processor as is. For example, a quad-core processor might be sold as a dual- or triple-core processor if some of the cores are faulty. I believe Intel does the same thing. This shows that there is variability in the manufacturing process and thus each processor is relatively unique, meaning buyers don't know the true quality of their processor. Maybe my processor was worse than the mean (average) level of quality.

7) I have repaired about 20 computers for my friends and family since 2008, which was when I started to learn about computers. I'm familiar with diagnostics and I'm fairly certain my CPU is the problem, but I want to be sure before I buy any new parts for my old PC. I'm going to build a new PC soon, but my current machine is still very capable and I want to keep it running. I have "fixed" 2 of my friends' computers by just cleaning them very well inside and out and nothing else, so I know I messed up by letting my computer get dirty. After I discovered the problem, cleaning my computer didn't stop it from crashing randomly.
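
For context on item 4 above, the size of that DRAM overclock can be worked out with a little arithmetic. This is only a rough sketch: it assumes the "MHz" figures are the effective DDR2 data rates (MT/s) and a standard 64-bit (8-byte) memory channel, and the function name is made up for illustration.

```python
def ddr2_peak_bandwidth_mb_s(data_rate_mt_s: float, bus_width_bytes: int = 8) -> float:
    """Theoretical peak bandwidth of one memory channel in MB/s.

    Assumes a standard 64-bit (8-byte) channel; real-world throughput is lower.
    """
    return data_rate_mt_s * bus_width_bytes


STOCK_MT_S = 800         # default speed mentioned in item 4
OVERCLOCKED_MT_S = 1066  # overclocked speed mentioned in item 4

stock_bw = ddr2_peak_bandwidth_mb_s(STOCK_MT_S)
oc_bw = ddr2_peak_bandwidth_mb_s(OVERCLOCKED_MT_S)
increase_pct = (OVERCLOCKED_MT_S - STOCK_MT_S) / STOCK_MT_S * 100

print(f"Stock:       {stock_bw:.0f} MB/s per channel")
print(f"Overclocked: {oc_bw:.0f} MB/s per channel")
print(f"Data rate increase: {increase_pct:.1f}%")
```

In other words, the overclock asks the memory to run roughly a third faster than the default setting.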
 


gijoe50000

Distinguished
May 27, 2013
170
3
18,715
Not sure if this will help, but I recently solved a freezing/crashing problem by raising the CPU voltage. In my case I raised it to +0.25V and the crashes were less frequent, and then to +0.5V and all was good. Might be worth a shot.
 

Zelda__64

Reputable
Jul 31, 2014
4
0
4,510
I solved my problem yesterday. I returned my DRAM settings back to default (see item 4 under "Additional Info" in my original post) and afterwards I was able to run Prime95 for over 16 hours without any errors. I then overclocked my DRAM again and Prime95 crashed within 10 mins with the same rounding error, so it was definitely the overclocked DRAM that was causing the errors. A couple of weeks ago I removed the side panel of my computer case, stopped going full screen as much, and performed frequent restarts of my web browser, and the crashes were reduced to once every few days. My best guess is that my DRAM was overheating and/or getting old, because I stability tested my DRAM overclock when I originally performed the overclock.
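
For anyone who wants to repeat the same A/B check, here is a minimal sketch of the reasoning: change one setting, rerun the same stress test, and see whether the failure follows the setting. The class and function names are hypothetical, and the two runs are just the figures from this post ("over 16 hours" is entered as 16 hours and "within 10 mins" as 10 mins, so both are rounded bounds rather than exact logs).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StressRun:
    dram_overclocked: bool
    minutes_run: float
    minutes_until_error: Optional[float]  # None means no error was seen

    @property
    def failed(self) -> bool:
        return self.minutes_until_error is not None


def failure_follows_setting(runs: List[StressRun]) -> bool:
    """True if every run with the setting failed and every run without it passed."""
    with_setting = [r for r in runs if r.dram_overclocked]
    without_setting = [r for r in runs if not r.dram_overclocked]
    # Need at least one run on each side to compare anything.
    if not with_setting or not without_setting:
        return False
    return all(r.failed for r in with_setting) and not any(
        r.failed for r in without_setting
    )


runs = [
    # DRAM at default settings: Prime95 ran 16+ hours with no errors.
    StressRun(dram_overclocked=False, minutes_run=16 * 60, minutes_until_error=None),
    # DRAM overclocked again: rounding error within 10 minutes.
    StressRun(dram_overclocked=True, minutes_run=10, minutes_until_error=10),
]

print(failure_follows_setting(runs))
```

Two runs is a small sample, so repeating each configuration a few times would make the conclusion more convincing.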