Instability (reboot cycle) only while gaming - HWinfo, CCleaner provide a clue?

bostonkris34

Honorable
Mar 9, 2017
61
3
10,535
Hi there, this forum has been a HUGE help to me in the past and my computer has been running great - until yesterday, when it forced itself into a reboot cycle while I was playing Overwatch. I figured it was a fan issue, but I tried playing again today, and it rebooted again. And when I say rebooted, the thing went instantly dead. Lost all power, straight to black. USB backlighting on the keyboard disappeared, all noise stopped instantly.

When it came back yesterday, the fans were going insane, and like I said it got into a cycle where it would load my Windows desktop and then crash again. Eventually I was able to log in and just shut down until tonight. Tonight, after that one crash, the fans stayed silent and it's nice and cool here this time - so that's not the issue.

I also tested playing a game in Steam, to see if it was an issue specific to Blizzard or Overwatch, and I couldn't even get into the start menu before it completely died again.

I ran CCleaner and installed HWinfo, which is telling me two things:
1. UEFI boot is not present (though it is enabled in the Secure Boot section of my BIOS)
2. VMX is in red, which I imagine means it's missing or disabled

I'm hoping to avoid having to re-install Windows 10 (or worse yet, test/replace my mobo and/or power supply) - do you think either of those two "errors" in HWinfo could indicate the culprit?

Also, not sure if this is relevant, but before all these crashes started yesterday, I did a deep auto-clean with Advanced SystemCare. Hopefully it didn't crunch some necessary file or setting.

Here are my system specs:

MOBO - ASUS Strix z270e Gaming
GPU - ASUS Strix 1080 ti
COOLING - Corsair h100i v2
CPU - Intel i7-7700K
PSU - EVGA SuperNova 650 v2
RAM - 16GB Corsair Dominator DDR4 RAM 3000
OS - Windows 10 Home (x64) Build 15063.674 (RS2)

Any thoughts? I'm playing a movie right now and so far, no crash. Seems only to be while gaming, and this has happened all of a sudden. System boots in 30 seconds and games run beautifully fast and smooth, all with cool temps, until it abruptly shuts down and restarts.

As usual, any help from the experts on this fine forum would be greatly appreciated!
 
I'm thinking a PSU issue as it seems to be crashing quickly even outside of games. Do you have a spare PSU to try? Or one you could borrow?

Another possibility is memory. You could download memtest and run it. Let it run like 8 hours. If the issue is PSU then it will likely crash before this.
 

bostonkris34

Thanks, I'll definitely give Memtest a try - however, I updated to clarify, it is happening ONLY during games so far - which makes no sense, especially as in my latest test, the game didn't even get past the title screen. I am worried about the PSU. I'll test the memory, but is there a PSU test that you know of as well?
 
Sorry, when I saw "and I couldn't even get into the start menu..." I assumed you meant the Windows Start Menu.

You could stress your CPU fully. Something like Cinebench on the CPU stress, or Prime95. This will isolate the GPU from the equation. If it's PSU related, it should reboot when you put a heavy load on the CPU and thus the PSU. If it's stable, then it could be GPU related. There isn't any software that specifically tests the PSU. PSU problems manifest most often during high load conditions, and much less commonly during very low power states. It's very difficult to isolate the PSU from the rest of the system. The best way to eliminate the PSU is to use another known good PSU: if the problem is gone, then the PSU is at fault; if the problem remains, then you keep troubleshooting.
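To make the isolation logic above a bit more concrete, here is a rough sketch in Python. The function name, its arguments, and the suspect labels are all made up for illustration; it only encodes the rule of thumb from this post and is not a real diagnostic tool. A known good PSU swap is still the only real confirmation.

```python
def likely_suspects(memtest_ok: bool, cpu_stress_ok: bool, gpu_load_ok: bool) -> list:
    """Rough triage based on which load tests survive without a reboot.

    All names here are hypothetical; the ordering follows the rule of
    thumb in the post above, not any vendor procedure.
    """
    if not memtest_ok:
        # Memory errors are a problem on their own, regardless of load.
        return ["RAM"]
    if not cpu_stress_ok:
        # Heavy CPU load alone already reboots the machine, so the GPU
        # is out of the picture and power delivery moves to the front.
        return ["PSU", "CPU / motherboard"]
    if not gpu_load_ok:
        # CPU-only load is stable but 3D load is not: the card itself,
        # or the power feeding it, is the next thing to check.
        return ["GPU", "PSU", "PCI-E power cables"]
    return []


# Example: memtest and Prime95 pass, but a 3D benchmark powers the PC off.
print(likely_suspects(memtest_ok=True, cpu_stress_ok=True, gpu_load_ok=False))
```

Note that the PSU stays on the list even when only GPU load fails, since the graphics card is usually the biggest power draw in the system.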
 

bostonkris34

Terrific and helpful info - I'll try all this. Thank you. In the meantime, do you think a virus might be possible as well? Now that we're talking hardware, re-installing Windows 10 doesn't seem so daunting :)
 

bostonkris34

UPDATE: I think it has to be my GPU. As you suggested, I ran some stress tests (including memtest86 for RAM and prime95 and a few others for CPU). Everything looked great. So then I downloaded 3DMark and tried to run a benchmark test. It INSTANTLY powered down.

I'm on a fresh, clean install of Windows 10 with the correct NVidia drivers as of this weekend, so it's not a software issue.

I think I'll swap in an old graphics card and see what happens. If it doesn't shut down, I think I've got the answer. Would you agree it's most likely GPU at this point? I guess it still could be the PSU or the MOBO malfunctioning under load, but Occam's razor seems to suggest faulty GPU. Hopefully still under warranty.

Thanks again.
 
If it's only unstable during GPU related tasks, then it's most likely GPU related. That's not to say it couldn't be software / driver related. If you find that an old GPU works, then you might want to try the "faulty" GPU in a known stable system. It could still be power related as your 1080Ti is by far the most demanding component with respect to power usage.

I'm leaning towards the GPU, but you want to be sure, especially when it comes to warranty.
 

bostonkris34

OK you are super helpful. I'm going to do exactly that. Just two other small pieces of info that may help:
1. Playing videos does not crash the system. I doubt that's an intense process for the GPU but just a little bullet point. It's only games (and 3DMark) which really tax it. Maybe I should try a simple old-school game to see if the same thing happens.
2. Because the system doesn't "Crash" but rather completely loses all power in one second - I'm thinking your best guess of PSU is correct. Maybe I'll just buy a new one at MicroCenter and then return it if it doesn't fix the problem. The thing is, this PSU is a nice one and only a year old. So why would it fail? 650 I believe is sufficient? You know, I'll also just re-check all the connections, maybe one came loose somehow!

Will update you tonight hopefully if I have time to run these tests:
1. old school game
2. remove graphics card and try 3DMark
3. check all PSU connections

Thanks again!!

Kris
 

bostonkris34

You're not going to believe this (or maybe you will). But I opened the case, and I didn't notice any loose connections, but I unplugged all of the cables from my PSU, and from the GPU, and reattached everything. I don't want to jinx it, so let me put in a huge "SO FAR" disclaimer, but SO FAR (three 3DMark tests and 30 minutes of Overwatch later) there have been zero issues! I'll let you know if that happy status is maintained, or if it goes back to shutting down unexpectedly. Thanks again for all your help.
 
I've got my fingers crossed for you. This is a modular supply I think. If so and if the problem returns, I think you should have some spare PCI-E power cables, it couldn't hurt to try them. Of course that is if the problem returns.

It could be a bit of oxidization on some of the contacts was causing an issue and by reseating them you are getting better contact.
 
Solution

bostonkris34

Well, apparently that was too good to be true. Everything ran fine for about a week, and then I did two things - I put my case back together (only putting one side of the case on, leaving one open to vent), and put the PC inside its cabinet (I wish my desk didn't have the PC cabinet but I leave the door open so it doesn't get too hot at all). The next thing I did was update my GeForce driver with the new release that came out.

I don't know if wires got jostled again, if it's a software update issue, or if it was all just a big coincidence, but sure enough I tried to play a game and it powered down, same as before, about 2 minutes into a game. I futzed with the cables attached to the GPU again just to check. Then I played Overwatch for about an hour until it powered down (some improvement but again, probably coincidence?). And this time when it rebooted, the fans went crazy. It did that once before as well, though I really don't think it was a heat issue as temps were at like 72 degrees while gaming.

NEW SYMPTOM INFO - I noticed two new things.
1. After it powers back up, while it's rebooting, there's a lot of screen flicker (flashing to black) as if it's trying to stabilize the image from a loose connection. Then, on the black screen when the Windows logo comes up, there's some horizontal white noise lines that scatter across the bottom 1/4 of the screen.
2. Once it finished the automatic restart, I can connect to networks but have no internet access. If I then instruct it to restart, the internet works fine.

Do either of those symptoms tell you anything? (Other than my PC is cursed?)

I asked a buddy to swap computer parts this weekend. I'll run my 1080ti in his rig, and run his 1070 in mine. Hopefully that will tell me something as you were suggesting early. This is just so frustrating. I'll update you when/if I find out more. Thanks again for your help!
 
Do not discount heat as an issue. That's not to say the component isn't faulty, but increased heat could be the symptom. You don't mention which component is at 72 degrees. Is this the CPU or GPU? Remember it doesn't need to be either the GPU or CPU; hot PSUs can shut down too.

I definitely don't like tower cabinets in a desk. Typically it traps air around it, so while your case is blowing the hot air out, it doesn't get a fresh supply of cool air in and just ends up sucking that hot air back in through the intake. Ambient temperatures are important to cooling efficiency. It's hard to keep your parts cool when the "cool" air is 45 - 50 degrees already. I've seen all sorts of issues both in my personal and professional life caused by restricted airflow.

I would also look at one of my previous suggestions. If you think it could be a flaky power connection and you have unused PCI-E power cables for your PSU, replace the ones you are using now with the spares. You could have a contact or contacts that are faulty. The faulty connection could be at either end of the cable, either the PSU side or the graphics card side.

That symptom 1 really makes me think that it could be temperature related, though it doesn't necessarily narrow it down to the GPU. If the PSU is overheating or sensitive to heat, power (most likely 12V) to your GPU could be fluctuating.

The first thing I would do to test the overheating theory is to remove it from the tower alcove in your desk. Take the side off of the case as well. Try to not disturb the cabling and system as best as you can. If the system is stable after a good run, put the side back on the case. If it continues to run, then you know it's the limited airflow where you put your case that is the issue.

It should also go without saying that you need to make sure that the radiator on your AiO is clean, and so is the heatsink on your 1080Ti. If they aren't, you need to blow the dust out of them. Also, if your case has filters, make sure they are blown out. This is the most commonly overlooked cause of issues like these. Most people mistakenly think that because the instability came on all of a sudden, it can't be heat related, as they think it should have slowly become worse over time.

I would do all this before introducing your friend's GPU to your system. In fact, if your system remains unstable after the above, I would test your 1080Ti extensively in your friend's computer before putting his in your system. We still haven't determined your issue and you don't want to potentially damage his GPU if, say, your PSU is faulty. If your GPU works perfectly in his system, then I would look to your PSU. In fact you could see if your friend would loan you his PSU (huge pain in the ass for both of you) to test your card in your system.

Everyone's approach to troubleshooting is different. One rule of thumb I stress is, if you are using someone else's components to test, do as much as you can without endangering their components. You could turn a once helpful friend into someone that doesn't talk to you.
 

bostonkris34

Hmm, that's very interesting, and an easy test to perform. The thing is, I ran this same rig beautifully inside that cabinet with both sides on during the summer months... and now i do typically leave the side of the case off... but why don't i start with this test tonight instead of potentially ruining a friendship this weekend as you suggest :)

And yes, switching out the PSU sounds like such a pain. I was only monitoring the GPU heat. I'll look into how to monitor and log the PSU heat as well! Thanks - will keep you updated - you're a wealth of info on this stuff.
 
You won't be able to monitor the temps inside your PSU. There are a couple of very high end PSUs that support this, but most do not.

Yeah, changing out PSUs is a pain, especially if you've done a meticulous job of cable management.

While I know you hope that it's not your graphics card, it's the easiest one to test. In this case if it works in your friend's system, then you can be certain it's not at fault. Though to limit the time it takes to troubleshoot, it might be better if it fails in his system. This way you know right away and can start working towards a solution.
 

bostonkris34

Good point. But you know, I've been thinking it probably must be the GPU. For starters, I tried your suggestion to remove it from the cabinet, and so far there have been no stability issues - but I've had such runs of good stability before and I feel like it's luck. Then I thought about these three things and think it (probably) can't be temp related:

1. Once or twice, the thing has lost all power and automatically rebooted simply by starting a game and seeing the menu. There's no way the CPU or PSU was overheating at that low load - right? It's like, as soon as it needed to call the GPU (or shortly thereafter), it died.

2. I removed the GPU and booted up the computer using on-board graphics, and that white flicker of noise on the screen below the blue Windows 10 logo during startup disappeared. I tested several times. With GPU, it's there. Without GPU, no flicker. That has to mean something bad about the GPU, right?

3. Maybe it's a driver issue? The GPU was working fine until I updated to the latest drivers (again this could be coincidence as well).

Anyway, I'll keep testing it outside of the cabinet, but so far so good (I still think it's coincidence though, especially as I gamed a lot over the past few months with it inside the cabinet and everything was fine). And tomorrow my buddy is coming by to give me his old 1070, I'll test that in my rig, and hopefully he will test my 1080ti in his machine as you suggest.

Thanks again - I'm going to search around to see if that horizontal white noise for a split-second below the windows logo on the black boot screen means anything definitive.
 
You may see some flashing while Windows is booting up. Early in the boot the graphics card behaves like a basic VGA adapter, using features and functions that all graphics cards support (kept in the BIOS of the card). When the OS loads the driver for your actual card, flickering is common as this driver takes over. It could even become more pronounced after a driver update if something was changed that affects how the driver initializes the hardware.
 

bostonkris34

OK so my buddy stopped by today with his 1070, but we really think it's a heating issue at this point. We stress tested it with a number of programs, and it passed (GPU and CPU) with flying colors - and has been stable ever since I pulled it out of the cabinet and removed the side panel. Still weird because it would shut down sometimes as soon as I started gaming (before it even had a chance to get hot) but it SEEMS like heat is the issue.

He noticed it gets really warm inside my case even with the fans cranking at max. I have a Zalman case that has two intake fans in the front (though I usually keep the front panel closed which I guess blocks the inward air flow...?) and one fan venting out in the back. It also came with two fans at the top of the case, but I removed those to put in a liquid cooler. Do you think that could be the issue?

Lastly, I had the back fan incorrectly plugged into the AIO Pump header on my motherboard. I switched it to Chassis 1 Fan - maybe that will help the system self-regulate, not sure...
 
Air circulation is important. It's odd that your case is designed so that your air intake is impeded by having the front panel closed. For instance my case has a door over the front, but my intake fan is on the bottom of the case.

As for fan arrangements, I tend to like positive pressure inside my case. This means that you have more fans drawing air in than fans exhausting. It's really six of one, half a dozen of the other; I'm not sure it makes much of a difference to cooling. It can have an effect on the amount of dust that gets in your case though. Most good cases have filters on the intake fans so air drawn in by them is, generally speaking, dust free. However if you have negative air pressure inside the case, it will pull air in from anywhere it can, which means it can come in from any crack or vent that isn't filtered.

As for where your exhaust fan was plugged in, academically (meaning maybe not necessary) it should be in a Chassis fan header. That said, the speed of the pump header is typically controlled by the CPU temperature, so the fan should have been speeding up and slowing down in response to temp changes in your CPU. That could in effect react quicker to load changes than a Chassis fan header, which could be using a sensor located on the motherboard itself.

As for putting your liquid cooler in the top fan positions, this arrangement is pretty typical. Hopefully you have the fans on this cooler so that they are blowing the hot air out the top of the case. If you have them pulling air into your case, then you might want to turn them around. This way you are pushing that heat out of your case.

If you have filters on your intake fans, make sure they are clean. Make sure you've blown the dust out of your AiO, as well as the heatsink on your graphics card. Also blow the dust out of your PSU, there are heatsinks in there for the active power components (maybe some passive components too) that can get dusty and it's not as easy to see as the rest of your components. Dust is the enemy here.

 

bostonkris34

Hi techgeek, I'm at my wit's end with this thing. Dust situation and heat doesn't seem to be an issue - I've pulled the sides off my case, removed it from the cabinet, and cleaned it out. I'm still shutting down and restarting while gaming or running 3DMark. So I installed a few programs, and recently have been using MSI Afterburner to monitor temps etc. while stress testing using 3DMark. Here's what I noticed:

CPU temps spiked to 70-75 when the program first started, then quickly dropped and averaged in the 40s and 50s throughout the test. GPU temp never got past 69 degrees. Power % (not limit) frequently hit 99%, topping out at 104%. I tried running it with ECO Mode on my PSU off and on, still crashes.

It crashes sometimes 30 seconds into the stress test and other times 30 minutes in (and, a few nights ago, I let it run for 8 hours with no crashes, but every other time including gaming it does). When it does crash, it's like I hear a click or a pop and all the power disappears. A second or two passes, and it restarts. At this point, the fans (I think the intake ones) spin faster than usual but go back to silent by the time I log into Windows. It's totally stable when not under load. No BSOD which makes me think it's NOT a GPU issue after all.

Personally, I think it's the PSU, but I'm no expert. Is the Power % hitting that high normal, or a red flag?

Second and last question, assuming it is the PSU (or a loose pin or something in a cable), do you think it's a function of it being faulty, or that 650W is insufficient to power my rig? (i7-7700k, 1080 ti, 16 GB DDR4). This one is still under warranty, but if 650W is pushing it, I could eat the money on this one and go buy an 850W instead.

Obviously I'm really quite frustrated and ready to smash very expensive components. I've got an RMA lined up and ready to go with ASUS for my 1080 ti, but I'm starting to think it really is the PSU. In which case, this weekend I'll swap it out and see if it fixes it.

Thanks once more for all your help. This thing is driving me crazy.




 
Wattage wise 650W should be enough from a good quality PSU. Do you know what model of SuperNova you have? Is it a G2, P2, etc?

In general EVGA makes pretty decent supplies, however that doesn't mean you can't get a dud.

As for going with a higher wattage supply that is up to you. According to my quick calculations (making some assumptions about the rest of your system) I come up with 480W peak, so something in the range of 550W would suffice if it's good quality. Now if you are overclocking anything, you need to go up from there. The 12V amperage is really what matters. The SuperNova G2 can supply 54A on the 12V rail which is plenty providing the total system power usage doesn't exceed the rating of the PSU.
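The arithmetic behind that paragraph can be sketched in a few lines of Python. The 54A rail rating, the 650W label, and the 480W peak estimate are the figures quoted in this post; the helper names are made up for illustration, and the result is only as good as the peak estimate.

```python
RAIL_VOLTS = 12.0          # nominal voltage of the 12V rail
rail_amps = 54.0           # 12V rail rating quoted above for the SuperNova G2
psu_rating_w = 650.0       # total rated output of the PSU
estimated_peak_w = 480.0   # rough peak system draw estimated above


def rail_capacity_watts(amps: float, volts: float = RAIL_VOLTS) -> float:
    """Power available on a rail: watts = amps * volts."""
    return amps * volts


def headroom_fraction(capacity_w: float, load_w: float) -> float:
    """Share of the capacity left unused at the given load."""
    return (capacity_w - load_w) / capacity_w


rail_w = rail_capacity_watts(rail_amps)   # 54 A * 12 V = 648 W
usable_w = min(rail_w, psu_rating_w)      # cannot exceed the total rating
headroom = headroom_fraction(usable_w, estimated_peak_w)

print(f"{rail_w:.0f} W on 12V, {headroom:.0%} headroom")  # 648 W on 12V, 26% headroom
```

So on paper a healthy 650W unit has roughly a quarter of its 12V capacity to spare at the estimated peak, which is why the wattage itself is not the first suspect here.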

So I'd try to RMA the PSU first. I don't know if you have to go as far as 850W, maybe 750W if you want a little more wiggle room. That said I'm riding a 4770K with a single GTX980 with a SuperNova 1000W G2, so who am I to talk. Actually I was running it originally on an old PC&C 750W Quad Silencer, but the EVGA supply was on sale and I couldn't pass it up.

Power limit on your GPU really doesn't concern me. Essentially it's the ceiling of power the card can use before it starts to downclock to stay under that limit. It's part of GPU Boost, and it works alongside the thermal limit. So let's say you have a game that exceeds your set power limit (old games that pump out huge framerates are a good example): GPU Boost will step in and start lowering the GPU clock until it gets within the limit. It's really an attempt to protect the VRMs on the graphics card from heat. This is one of the sliders that get adjusted when overclocking. For instance NVIDIA lets my GPU run a power limit of 125%.
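If you want to log that power figure yourself instead of watching an overlay, a minimal sketch along these lines should work, assuming the NVIDIA driver's nvidia-smi tool is on your PATH. The function names are made up, and treating the percentage as board power relative to the card's default limit is an assumption about what monitoring tools display, not something confirmed in this thread.

```python
import subprocess

# Fields understood by nvidia-smi's --query-gpu option.
QUERY = "power.draw,power.default_limit,temperature.gpu"


def parse_sample(csv_line: str) -> dict:
    """Turn one 'draw, limit, temp' CSV line into numbers plus a percentage."""
    draw_w, limit_w, temp_c = (float(field) for field in csv_line.split(","))
    return {
        "draw_w": draw_w,
        "limit_w": limit_w,
        "temp_c": temp_c,
        # Assumption: percentage = board power relative to the default limit.
        "power_pct": 100.0 * draw_w / limit_w,
    }


def read_gpu_sample() -> dict:
    """Query the first GPU once via nvidia-smi (needs the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])


# Hypothetical reading, not taken from the thread: 260 W against a 250 W limit.
sample = parse_sample("260.00, 250.00, 69")
print(f"{sample['power_pct']:.0f}% of default power limit")
```

Brief readings a little over 100% are what you would expect from a card bumping against its limit before the clocks get pulled back.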
 

bostonkris34

Techgeek you are awesome. I'm learning so much here - hopefully eventually my computer will stop shutting down soon too. To answer your question, I have a "P2" - does that mean anything one way or the other to you?

And maybe it's the same thing, but actually I'm not monitoring the Power "Limit" %, I'm just monitoring the Power % (which I believe to mean, the capacity of wattage it is currently outputting to the GPU at any given time, but that's an assumption). I did see Limit as an option to monitor, but this one was just labeled "Power, %". So in my uneducated estimation, that means the PSU was pumping out the full allotment of its wattage/voltage while 3D gaming. I am slightly overclocked using the ASUS optimizer in BIOS (nothing manual) and I think it has me at + 9%.

Also, I tested some less intensive games (specifically, One Finger Death Punch, which if you haven't played yet you really should try!) and the computer was perfectly stable with "Power, %" at a steady 30%.

Anyway, the really concerning thing here is, I have all the latest drivers and my hardware config has worked great together for 4-6 months with no hardware or software changes (other than a driver update...) before all this began happening. And since then, I wiped Windows 10 and reinstalled everything so I'd be really surprised if it's a software issue. So I don't know if something just failed on its own or what... but from your help and other threads I've read, I'm suspecting PSU for two reasons. One is the complete power shutdown (without CPU or GPU temp spikes) instead of a BSOD. So either maybe the PSU is overheating as you said, or is faulty in some other way. The second is, a couple of times I pulled the computer out, detached all the cables from the PSU (none were loose) and reattached them - and then the computer worked great! Until, that is, I jostled it or moved it, and the problems resumed. So I'm thinking, what if it's not the PSU itself, but one of the CABLES has a loose connection inside or something like that?

That's why on the way home from work tonight I'm stopping at MicroCenter and I'll buy a nice PSU, I'll probably overspend on an 850 just because I'm traumatized now, and let you know if the problems do or don't continue. If it solves it, I can always RMA the 650 G2 and return the 850, or, I'll try the 650 with the new cables that come with the 850 and see if that really was the problem.

So, stay tuned :) Very appreciative of you investing so much time and thought into my computer problem. Really hope I'll have good news to share this weekend. I'm a few frustrations away from smashing some very expensive components and starting over haha.




 

bostonkris34

SUCCESS! (So far... I don't want to jinx it.)

I picked up an EVGA 850W P2 because it was on a Black Friday sale, and hooked it up with the new cables. Everything worked great! I ran 3DMark for 2-4 hours, something like that, and played Overwatch for an hour or two with zero issues (except Overwatch froze after an update - I'm confident that was software related as the problem disappeared after a re-install).

Now I've had this kind of good luck in the past so possibly it's a coincidence, but, I took the extra step of using the NEW cables with the OLD PSU, and it crashed immediately. Then I re-hooked it up to the 850, and it worked beautifully again.

Upon closer inspection of the cables going into the 650, the 24-pin motherboard connector wouldn't fully click into place. It looks like it is fully seated, but the tab at the top that clicks in will not drop to flush; it's raised at an angle and doesn't lock into place. I'd noticed that top tab on the connector was raised in the past, but I figured it was harmlessly bent because the pins themselves seem to be fully inserted and flush with the unit. But this was the NEW cable and it still wouldn't seat correctly in the 650. So my bet is there's a physical problem with the 650. I'm not sure how that would happen all of a sudden, but there we are. My cable management isn't great so perhaps it was pulling all this time and eventually something went bad. Or, maybe it's just a coincidence and there's something else wrong with the 650. Either way, my plan is to RMA it and sell the replacement (or, perhaps they'll just refund me since I bought another EVGA product).

Anyway, thank you again for all your help. Hopefully, this solution keeps working and everyone is happy. If it fails again I'll probably just throw the whole rig out the window :) Until then, it's a relief because more so than not being able to game (I don't even play that much) it's just knowing there's something undiagnosed wrong with the system that is so incredibly frustrating. Fingers crossed it stays fixed. Thanks again and I hope you have a great Thanksgiving!

P.S. My next challenge is going to be getting my GPU temps down (75 while gaming, despite the side being off and computer out of cabinet) but that can wait for another day - I'd love not to think about fixing this computer for a while!
 
Hey, sorry for the delayed reply, I haven't logged in in a while.

Good to hear, and hoping since you haven't updated it since Nov 18th that it's still working well for you.

A couple of things regarding your GPU temps:

First, you mentioned cable management. If you have cables obstructing good airflow, you might want to have a look at this first.

Next, make sure you have the heatsink thoroughly clean. Focused blasts of canned air are usually good at removing most of the dust and cruft that builds up. Be careful with the air around your fans though. I usually try to keep the fans from spinning during sustained blasts of air, as it can cause the fan to spin very fast, which could potentially damage the bearing in the fan(s).

Lastly look at your case fan placements, the direction of airflow, and the possibility that you may need to add fans if you have the mount spots available.

Though that all said, 75 degrees isn't that hot.