Crashing under stress

chuckthegreat16

Commendable
Jul 18, 2016
9
0
1,510
Hello everyone.

My computer has been crashing under stress/load for almost a month now.
It started about 1 1/2 weeks after I began playing JC3. There were no issues with the computer before or right after its installation; the crashes seemingly came out of nowhere. I hadn't downloaded any updates or programs either.

I would be playing the game, and the computer would randomly crash, always to a solid color screen (sometimes black, blue, green, light blue, or white), and sometimes I could still hear audio from the game playing. Sometimes I would have to hard reset the computer; sometimes it would restart on its own.

Then it started happening with YouTube videos, first 4k videos, then even 720p mp3 videos.

I tried system restore points from before the issue's first occurrence; that didn't work. I tried uninstalling and reinstalling GeForce and the Nvidia drivers; no good. I moved my second GPU to the first PCI slot and disabled SLI, running only the one GPU that was originally in the 2nd PCI slot, and that worked, for a time.

I took the first GPU into the computer store I got it from, as all my parts are still under warranty. They took it behind the counter and ran stress tests, and found it to be running normally. So I put the GPU into the 2nd PCI slot, uninstalled and reinstalled GeForce and the GPU drivers, and reactivated SLI.

This worked for about a week, and then the problem started happening again after I downloaded AC3.
I took the whole PC in this time and had them check it out again. They saw the same issue of stress causing a shutdown, but when they tested the GPUs with a different Windows 10 installation on a different system, the cards operated normally.

So I just picked it up today and tried using a system restore point from December, when I had freshly installed Windows 10 on my boot drive, and it gave me the blue screen of death.
After this, it gave me the option of reinstalling Windows 10, which was my plan anyway if the system restore didn't work.

I reinstalled Windows from scratch, and the problem STILL persists.
Is there a phantom in my cpu? A gremlin in my motherboard? What's my next step in troubleshooting, if you folks don't already know what the issue might be?

Thank you for your time
 
Solution
Personally I would prefer knowing the RAM isn't faulty by running Memtest, but that's for you to decide.

The following site suggests that the LiveKernelEvent 141 code refers to a driver issue: https://windowsreport.com/fix-live-kernel-event-141-error/
At least have a look to see if there's something worth trying. I suggest the clean boot option: boot clean, then play something you know typically crashes the system and see whether it still happens. If the crashes stop, to me that would indicate a third-party driver is causing the issue.

My gut feeling was a PSU issue, and with no further information on the PSU I personally wouldn't rule it out. The fact that you still experience the same issue even with a fresh installation of Windows would seem to rule out a software issue in most cases.

kiemoneiro

Great
Feb 15, 2018
38
0
60


I thought the same, but we still don't know what PSU he has.
 


Agreed. We'd have to wait until that info is known to troubleshoot effectively.

 

chuckthegreat16

Intel i7-4790K
16 GB RAM
MSI Z97 Gaming 5 motherboard
2x Asus GTX 970 GPUs
2x G.Skill Ripjaws 8 GB RAM
Corsair AX1200 Gold PSU
1x 120 GB Samsung SSD (boot drive)
1x 500 GB Samsung SSD
 
Hm... that should be a solid PSU (Jonnyguru rated it 9.8/10). Unless it's faulty, it should be able to feed the two GTX 970s. On the surface it should be fine, but something is amiss. I would tentatively suggest using a hardware monitor that can read voltages, to see if there are any wild fluctuations from the expected values on the main voltage rails (typically 3.3 V, 5 V and 12 V). Proper test equipment is better, obviously, but I would hope even a software-based hardware monitor can spot serious issues with the PSU.
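To make "wild fluctuations" a bit more concrete, here is a minimal sketch in Python that flags readings outside a tolerance band around each rail. It assumes you copy the readings by hand from whatever hardware monitor you use; the function names are made up for illustration, and the 5% band is the figure commonly quoted from the ATX spec for the positive rails, so treat it as an assumption rather than a verdict on your PSU.

```python
# Assumption: +/-5% is the tolerance usually quoted for the 3.3 V, 5 V and 12 V rails.
ATX_TOLERANCE = 0.05


def rail_limits(nominal, tolerance=ATX_TOLERANCE):
    """Return the (low, high) acceptable voltage for a rail."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def out_of_spec(samples, tolerance=ATX_TOLERANCE):
    """samples maps a nominal rail voltage to a list of readings (hypothetical format).

    Returns a dict of nominal voltage -> readings that fall outside the band.
    Rails with no offending readings are left out of the result.
    """
    problems = {}
    for nominal, readings in samples.items():
        low, high = rail_limits(nominal, tolerance)
        bad = [v for v in readings if not (low <= v <= high)]
        if bad:
            problems[nominal] = bad
    return problems


if __name__ == "__main__":
    # Made-up readings, only to show the call.
    readings = {3.3: [3.31, 3.29], 5.0: [5.02, 4.98], 12.0: [12.1, 11.2, 12.0]}
    print(out_of_spec(readings))
```

Software sensors can be inaccurate, so a flagged reading is a reason to test with a multimeter or a spare PSU, not proof by itself.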

From what we know, the individual graphics cards are fine, as they've been independently verified under stress conditions, so they seem fine with respect to hardware. Software should be fine if the driver installations were thorough. I assume you used DDU (Display Driver Uninstaller)? If not, that may be worth a try just to eliminate problematic remnants of old drivers.

As far as I can tell we're left with RAM, motherboard, PSU and CPU.

With RAM use Memtest86 and let it run overnight for at least 8 passes. This is to eliminate a RAM issue. If errors are identified then you may have to swap the RAM modules about and between RAM slots to identify whether it is the RAM module or the RAM slot.

Use MSI Afterburner to monitor what occurs with the PC during gaming (hopefully it'll record the data before it crashes). Check the graphs specifically for CPU usage/frequency/temps, GPU usage/frequency/temps, RAM usage/frequency, fps/frame times, page file. The hope is to see a pattern of behaviour before a crash or freeze.
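If you can get the monitoring data into a plain CSV, a short script can pull out the last few samples before the log stops, which is where a pattern would show up. This is only a sketch: the column names (`time`, `cpu_temp`, `gpu_temp`, `gpu_usage`) and the CSV layout are hypothetical and are not Afterburner's native log format, so you would need to export or convert the log first.

```python
import csv
import io


def summarize_tail(csv_text, last_n=5, time_column="time"):
    """Summarize the last `last_n` rows of a monitoring log.

    csv_text is CSV with a header row (hypothetical layout). Returns a dict of
    column -> {"min", "max", "last"} for every column except the time column.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    tail = rows[-last_n:]
    if not tail:
        return {}
    summary = {}
    for column in tail[0]:
        if column == time_column:
            continue
        values = [float(row[column]) for row in tail]
        summary[column] = {"min": min(values), "max": max(values), "last": values[-1]}
    return summary


if __name__ == "__main__":
    # Made-up samples, only to show the call.
    log = """time,cpu_temp,gpu_temp,gpu_usage
12:00:01,55,60,40
12:00:02,58,71,97
12:00:03,60,74,99
"""
    for name, stats in summarize_tail(log, last_n=2).items():
        print(name, stats)
```

A sudden drop in GPU usage or clock right before the log ends would point in a different direction than temperatures climbing steadily until the crash.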

Check your CPU temperatures when under stress.

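Use Event Viewer and see if you can spot any log entries corresponding to the times of the crashes. They could provide an indication of what occurred when those crashes happened. An unexpected power loss or hard reset usually shows up there as a Kernel-Power event (ID 41), although that only tells you the shutdown was unclean, not why.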
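If clicking through Event Viewer is tedious, the same check can be done from a command line with `wevtutil`, which ships with Windows. The sketch below only builds the command for the most recent Kernel-Power ID 41 entries in the System log; the helper names are made up, and `run_query` will only work on Windows.

```python
import subprocess


def kernel_power_query(max_events=10):
    """Build a wevtutil command listing recent Kernel-Power ID 41 events."""
    xpath = "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=41)]]"
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{xpath}",               # filter by provider and event ID
        f"/c:{max_events}",          # limit the number of events returned
        "/rd:true",                  # newest first
        "/f:text",                   # human-readable output
    ]


def run_query(max_events=10):
    """Run the query and return its text output (Windows only)."""
    completed = subprocess.run(
        kernel_power_query(max_events), capture_output=True, text=True, check=False
    )
    return completed.stdout
```

Compare the timestamps in the output with the times you remember the crashes happening; entries that line up are the ones worth posting here.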

Configure Windows to create crash dump files. You don't mention what sort of blue screens (when they occur) you get. This could also help identify potential issues. You'd need to upload the files here; there are members who can read those and be able to help more than I can.

That's pretty much all that comes to mind.
 

chuckthegreat16



My apologies for taking so long to respond.

As it stands, I don't believe the RAM is faulty, as it was replaced only recently, in Nov/Dec 2017. I didn't mention it earlier, but all my parts are under a 4-year warranty, up until Aug 2018, and the computer store does free priority testing/diagnosing. That's how they found that my old RAM and old boot drive were faulty, and they tested and replaced both.

So that leaves us with either the CPU, motherboard, or PSU.
Now, I checked Afterburner and its logs for the crash period, and they show CPU, GPU, and RAM usage rising significantly when I launch a game, but the temperatures aren't abnormal.
Under Reliability Monitor and Event Viewer, I get a 'hardware failure' entry, and it says this:

Description
A problem with your hardware caused Windows to stop working correctly.

Problem signature
Problem Event Name: LiveKernelEvent
Code: 141
Parameter 1: ffffe003c48004a0
Parameter 2: fffff808604cbd90
Parameter 3: 0
Parameter 4: 272c
OS version: 10_0_16299
Service Pack: 0_0
Product: 768_1
OS Version: 10.0.16299.2.0.0.768.101
Locale ID: 4105

Now, some people online say that event code 141 might be psu?
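For anyone wanting to compare several of these reports, here is a minimal Python sketch that turns the pasted "Problem signature" text into a dictionary and reads the parameters as hexadecimal. The function names are made up. One assumption to flag loudly: the Code field appears to be hexadecimal, and 0x141 matches the Windows bug check VIDEO_ENGINE_TIMEOUT_DETECTED, which would point toward the graphics card or its driver rather than the PSU, but that mapping is my reading and not something the report itself states.

```python
def parse_signature(text):
    """Parse 'Key: Value' lines from a pasted problem signature into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def parameters(fields):
    """Return Parameter 1..4 as integers, assuming they are hexadecimal."""
    return [
        int(fields[f"Parameter {i}"], 16)
        for i in range(1, 5)
        if f"Parameter {i}" in fields
    ]


if __name__ == "__main__":
    report = """Problem Event Name: LiveKernelEvent
Code: 141
Parameter 1: ffffe003c48004a0
Parameter 2: fffff808604cbd90
Parameter 3: 0
Parameter 4: 272c
"""
    fields = parse_signature(report)
    # Assumption: the code is hexadecimal, so "141" is read as 0x141.
    print(fields["Problem Event Name"], hex(int(fields["Code"], 16)))
    print([hex(p) for p in parameters(fields)])
```

If the same code and a similar Parameter 2 show up in every report, that consistency is worth mentioning when you take the machine back in.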
 

chuckthegreat16

So I took the computer back to the computer shop and had them diagnose it again. This was after I did a clean reinstall of Windows twice and the problem was still there.
Turns out it was one of the GPUs after all.
I'm not sure how they missed that the first two times, but that was the problem.

Thank you everyone for your help!