Before I RMA my CPU, I wonder...

Infernalbird

Honorable
Dec 26, 2013
22
0
10,510
I'm hoping for some help or suggestions that might shed some light on a problem I've been having for the past few weeks.

Back in mid-July I loaded a graphically demanding game and my entire rig lost power and shut down, then immediately restarted and went right back to the desktop. It did this quite a few times before I accepted that I had a problem. I'm no guru, but I know how to troubleshoot a PC thoroughly enough, so I started with benchmarks and temperatures.

It's a custom rig I put together back in Dec/Jan: a Corsair 600T mid-tower case housing a Corsair H100 cooling an AMD FX-8350 on an ASUS Sabertooth 990FX R2.0 motherboard, with 16GB of Corsair Vengeance 1600, an EVGA NVIDIA GTX 760 SC GPU, and a Corsair RM850 80+ Gold PSU. I'm running Windows 8.1 off my Samsung 830 SSD, with a 1TB WD as my main drive and an old 380GB HDD I've had forever.

To record temperatures I used everything I could think of (OCCT, CPUID, Speccy, even the ASUS tool and the UEFI), and the CPU never got over 55°C, so I ruled that out. However, when I loaded the Heaven benchmark the system would crash, so I thought it might be the GPU. I stress tested it with the EVGA OC Scanner, and the GPU did not fail under full load, reaching 80°C without issue.

So I moved on to the RAM. I have an extra set lying around the house, swapped it in, and nothing changed; the issue remained. I also have a PSU voltage tester, and the 24-pin, 8-pin, 5V and 12V all checked out. I even double-checked the UEFI, and all the voltages were right where they should be. Then I ran memtest86 for over 8 hours, with multiple passes and no errors at all.

I found some short articles online that pointed to software as a possible part of the problem, so I eventually did a recovery, but that did not solve the issue.

When I checked Event Viewer, it showed a critical error: Event ID 41, Task Category 63, Kernel-Power. So, with my head in my hands, on a whim I went into the UEFI and disabled CPU cores 5/6 and 7/8, and BOOM, the issue was gone. When the PC crashed and rebooted before that, there was NO BSOD, so the bluescreen viewer had no log to read. And yes, I did uncheck the automatic restart box under System > Advanced.
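A Kernel-Power Event ID 41 record carries a BugcheckCode field; a value of 0 means Windows recorded no stop error, which fits an abrupt power loss or hard hang rather than a BSOD. The minimal Python sketch below reads that field from an XML export. It assumes the events were exported from cmd.exe with something like `wevtutil qe System /q:"*[System[(EventID=41)]]" /f:xml /c:5 /rd:true > events.xml` into a UTF-8 or plain-ASCII file; the function names are hypothetical.

```python
"""Summarize Kernel-Power Event ID 41 records from a wevtutil XML export (sketch)."""
import sys
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def bugcheck_codes(xml_text: str) -> list[int]:
    """Return the BugcheckCode of every Event ID 41 record in xml_text."""
    # Assumption: wevtutil prints one <Event> element per record with no
    # enclosing root, so wrap everything in a dummy root before parsing.
    root = ET.fromstring(f"<Events>{xml_text}</Events>")
    codes = []
    for event in root.findall("e:Event", NS):
        event_id = event.find("e:System/e:EventID", NS)
        if event_id is None or (event_id.text or "").strip() != "41":
            continue  # skip anything that is not Event ID 41
        data = event.find("e:EventData/e:Data[@Name='BugcheckCode']", NS)
        text = (data.text or "").strip() if data is not None else ""
        codes.append(int(text) if text else 0)
    return codes


def describe(code: int) -> str:
    """Turn a BugcheckCode into a short verdict."""
    if code == 0:
        # No stop error recorded: consistent with power loss or a hard hang.
        return "no bugcheck recorded (power loss or hard hang)"
    return f"bugcheck 0x{code:X}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python event41.py events.xml
    with open(sys.argv[1], encoding="utf-8") as fh:
        for code in bugcheck_codes(fh.read()):
            print(describe(code))
```

A string of "no bugcheck recorded" lines would match what was described above: the machine simply lost power, with no BSOD and nothing for a dump viewer to read.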

Of course that sounds like it could, or should, be the CPU, and I should RMA it with AMD right away. But I thought I'd ask here first: wouldn't the CPU keep degrading if it really were failing, physically or logically? And a question for anyone who reads this: is there a fix, or something I'm missing, that could solve this without sending the CPU back and being without my PC for three weeks?
 
Next time, please organize this better. Paragraphs, bullets, etc. would really help.

From what I can tell, there's no solid evidence that your CPU is the problem. You can easily run Prime95 to see if errors show up. If it passes, your CPU isn't the problem, though even if it fails, it still might not be the CPU.

Unfortunately, troubleshooting this will probably mean swapping components. The easiest to swap are:
a) Graphics card (have a spare one?)
b) Power Supply

Your memory appears fine, so if the above doesn't reveal the problem, perhaps it's your motherboard. But unless you find solid evidence that the CPU is defective, I doubt it's the problem.
 

Infernalbird

Apologies for the format; it was close to midnight when I wrote this out. Yes, I did in fact swap out the GPU for a spare GTX 550 Ti and the issue continued, and we also have a spare CX750M PSU that I tried as well.

I'll try Prime95 and post the results. Do you think I should re-enable the 4 cores before running the test?
 


No need to post results, as it's basically pass or fail.

I would try the test with a SINGLE CORE first. If it passes, move on to all four cores.

My understanding is that running Prime95 for about 5 minutes with no errors on any core would indicate it's not a CPU issue.
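The reason the result is basically pass or fail: as I understand it, Prime95's torture test repeats calculations whose correct answers are already known, so a single mismatch means the hardware computed something wrong. The small Python sketch below only illustrates that known-answer idea, using the Lucas-Lehmer test on small Mersenne exponents. It is not Prime95's code, and it is far too light to actually stress a CPU; `self_test` and the `KNOWN` table are hypothetical names.

```python
"""Known-answer self-test in the spirit of a torture test (illustration only)."""


def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime; p must be 2 or an odd prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below assumes p > 2
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# Known answers: prime exponents below 32 and whether 2**p - 1 is prime.
KNOWN = {2: True, 3: True, 5: True, 7: True, 11: False, 13: True,
         17: True, 19: True, 23: False, 29: False, 31: True}


def self_test(rounds: int = 1000) -> int:
    """Repeat every known-answer check; return the number of mismatches."""
    errors = 0
    for _ in range(rounds):
        for p, expected in KNOWN.items():
            if lucas_lehmer(p) != expected:
                errors += 1
    return errors


if __name__ == "__main__":
    errs = self_test()
    print("PASS" if errs == 0 else f"FAIL: {errs} mismatches")
```

Healthy hardware should never produce a mismatch, which is why any error at all counts as a failure rather than a score.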

My guess at this point is:
a) Motherboard, or
b) Software glitch.

Another test:
1) Burn an Ubuntu disc
2) Shut down the PC and disable all SSDs/HDDs
3) Boot to Ubuntu (run from the DVD, don't install)
4) Try running various programs

If it doesn't crash, that would likely indicate either:
1) A Windows software issue, or
2) An HDD/SSD issue (glitchy drives can sometimes cause problems)

If it DOES crash (and Prime95 passed), I think it's likely a MOTHERBOARD failure.
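The elimination logic above can be sketched as a small Python function. This is only a restatement of the reasoning in this thread under its own assumptions (GPU, PSU and RAM already ruled out by swapping), not a diagnostic tool; `likely_suspect` and `RULED_OUT` are hypothetical names.

```python
"""Elimination logic from this thread, written out as code (sketch)."""

# Parts already ruled out earlier in the thread by swapping or testing.
RULED_OUT = (
    "GPU (swapped with a GTX 550 Ti)",
    "PSU (swapped with a CX750M)",
    "RAM (spare set swapped in; memtest86 clean)",
)


def likely_suspect(prime95_passed: bool, live_boot_crashed: bool) -> str:
    """Return the most likely culprit given the two test outcomes."""
    if not prime95_passed:
        # A failure points toward the CPU but does not prove it.
        return "inconclusive: CPU possible, not proven"
    if live_boot_crashed:
        # Still crashing with Windows and every drive out of the picture.
        return "motherboard"
    # Stable when booted from the live disc: look at Windows or the drives.
    return "Windows install or HDD/SSD"


if __name__ == "__main__":
    for p95 in (True, False):
        for crashed in (True, False):
            print(f"Prime95 passed={p95}, live boot crashed={crashed}: "
                  f"{likely_suspect(p95, crashed)}")
```

With Prime95 passing, only the live-boot outcome is left to decide between the motherboard and the Windows/drive branch.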
 
Update:
The Ultimate Boot CD has some useful tests: http://www.ultimatebootcd.com/

You already ran memtest, but you can also test your CPU from this disc, which bypasses any Windows or HDD problems that could affect the results. It has some diagnostics for hard drives as well (be careful not to run a destructive test).
 

Infernalbird

Alright, so I ran Prime95 for a while as I took care of my son, with OCCT and CPUID monitoring temperature and voltage. Nothing was out of the ordinary, other than the temperature being higher at around 61°C. I did this with the whole CPU and all its cores enabled, and the system did not shut down. Prime95 reported 0 errors and 0 warnings.

I'm curious: why would you say it might not be the CPU, when the problem stopped after I made specific changes to the CPU?