GTX 960, is it dying? Can it be saved?

gamd

Reputable
Feb 19, 2018
20
0
4,510
I've been having a lot of display issues recently. My display often freezes and becomes covered in random green squares and black outlines before it resets with a message about my display driver recovering. I updated the drivers and saw no change. I have recently gotten bluescreens and seen these green squares even on the BIOS screen, which to my understanding rules out a software problem. These problems went away for a time when I turned on debug mode in the Nvidia control panel, as Google suggested might help, but they have returned now.

I've run memtestcl and it returns 0 errors, but the tests take much longer to complete than they used to. https://i.imgur.com/tc5UvB0.png. As you can see, there are several tests which take upwards of 500ms to complete when previously none took more than 50ms. However, if I run memtestcl while I have a game running in the background, the tests are very fast. I have never experienced these display problems while a game was running or while running Furmark (though they will often occur immediately after I close the game).

My temps are fine, from what I can tell: MSI Afterburner has never reported a temp over 65 C under load and it idles in the mid 20s. I dusted my computer and checked all the connections and saw no change.

There is another problem I have had essentially since I built this PC. My computer only worked properly 1 out of every 2 times I started it, alternating. If I started it, my display would have terrible stuttering and it would refuse to recognize USB ports at random. If I restarted it, these problems disappeared. If I shut it down and started it again, they would return. This was fully predictable. Due to my own laziness, I never bothered finding out why this happened. But now I notice that, on the bad boots, I get the display problems immediately on the BIOS screen and they persist (without the display driver resetting), Afterburner does not recognize that I have a video card installed, and the system will not recognize any USB devices other than the mouse and keyboard (and the keyboard randomly fails to actually type). Is it possible that this is actually a problem with some other part of my computer that is also causing my video card issues? Or is that just wishful thinking in a time of overpriced GPUs that I don't want to have to pay for? Or do I just have more than one problem?

I copied the bluescreen messages and dumps here: https://pastebin.com/fhYYYTS8

Specs: Windows 7, CPU: AMD FX-6300, Motherboard: ASRock 980DE3/U3S3 (CPUSocket), GPU: 4095MB NVIDIA GeForce GTX 960 (MSI)
 
What is the EXACT model number of your power supply, and how long has it been in service?

This could be a PSU, graphics card OR motherboard issue, so I wouldn't rule any of them out yet.

The fact that these issues happen OUTSIDE of Windows pretty much completely rules out it being an OS or driver issue.
 

gamd



Thermaltake Smart 650W SP650AH2NCB-A

Approximately 2 years old

Also: https://i.imgur.com/ZKXnX20.png. The first test is with a game open in the background; the second test is literally 1 minute later with the game closed. I don't know why the time difference is so large.
 
Smart series are not very good power supplies, and certainly ANY power supply can have issues.

Please install HWinfo and take screenshots of the system voltage sensor readings at idle and if possible, under load. Probably just idle will at least tell us something since it does this with very little graphics load in pre-OS conditions.


HWmonitor, Open hardware monitor, Realtemp, CPU-Z and most of the bundled motherboard utilities are not terribly accurate. Some are actually grossly inaccurate, especially with some chipsets or specific sensors that for whatever reason they tend to not like or work well with. I've found HWinfo or CoreTemp to be the MOST accurate with the broadest range of chipsets and sensors. They are also almost religiously kept up to date.

CoreTemp is great for just CPU thermals including core temps or distance to TJmax on AMD platforms.

HWinfo is great for pretty much EVERYTHING, including CPU thermals, core loads, core temps, package temps, GPU sensors, HDD and SSD sensors, motherboard chipset and VRM sensor, all of it. Always select the "Sensors only" option when running HWinfo.

In cases where it is relevant and you are seeking help, then in order to help you, it's often necessary to SEE what's going on, in the event one of us can pick something out that seems out of place, or other indicators that just can't be communicated via a text only post. In these cases, posting an image of the HWinfo sensors or something else can be extremely helpful. That may not be the case in YOUR thread, but if it is then the information at the following link will show you how to do that:

*How to post images in Tom's hardware forums



Run HWinfo and look at system voltages and other sensor readings.

Monitoring temperatures, core speeds, voltages, clock ratios and other reported sensor data can often help to pick out an issue right off the bat. HWinfo is a good way to get that data and in my experience tends to be more accurate than some of the other utilities available. CPU-Z, GPU-Z and Core Temp all have their uses but HWinfo tends to have it all laid out in a more convenient fashion so you can usually see what one sensor is reporting while looking at another instead of having to flip through various tabs that have specific groupings.

After installation, run the utility and when asked, choose "sensors only". The other window options have some use but in most cases everything you need will be located in the sensors window. If you're taking screenshots to post for troubleshooting, it will most likely require taking three screenshots and scrolling down the sensors window between screenshots in order to capture them all.

It is most helpful if you can take a series of HWinfo screenshots at idle, after a cold boot to the desktop. Open HWinfo and wait for all of the Windows startup processes to complete. Usually about four or five minutes should be plenty. Take screenshots of all the HWinfo sensors.

Next, run something demanding like Prime95 version 26.6 or Heaven benchmark. Take another set of screenshots while either of those is running so we can see what the hardware is doing while under a load.

*Download HWinfo


For temperature monitoring only, I feel Core Temp is the most accurate and also offers a quick visual reference for core speed, load and CPU voltage:

*Download Core Temp

When it comes to temperature issues, especially if this is a build that has been running for a year or more, taking care of the basics first might save everybody involved a lot of time and frustration.

Check the CPU fan heatsink AND graphics card for dust accumulation and blow or clean out as necessary. Avoid using a vacuum if possible as vacuums are known to create static electricity that can, in some cases, zap small components.

Other areas that may benefit from a cleaning include fans, power supply internals, storage and optical drives, the motherboard surfaces and RAM. Keeping the inside of your rig clean is a high priority and should be done on a regular basis using 90 psi or lower compressed air from a compressor or compressed canned air.

Use common sense based on what PSI your compressor is set to. Don't "blast" your motherboard or hardware to pieces. Start from an adequate distance until you can judge what is enough to just get the job done. When using canned air, use only short blasts, moving from place to place frequently to avoid "frosting" components.
 

gamd

https://i.imgur.com/lJiAyhu.png
https://i.imgur.com/E7K1tlL.png
First is idle. Second is with the Furmark 1080 preset running. Unfortunately I wasn't able to do this off of a cold boot, because after a cold boot my computer is totally unusable: it will not recognize keyboard inputs and the display is garbled. So this is off the first successful boot after a power cycle.

It's worth noting that I've seen visual glitching during BIOS but saw none for the entirety of the Furmark test, which seems to suggest that it's not related to being under load.
 
Your PSU is shot. Look at the Furmark screenshots.

Your +12V rail is at 11.88V at idle, which is already on the low side. 11.4V is the lowest you would ever want to see it, and a unit reading that low is already at "this needs to be replaced" status. Under Furmark load, yours drops to 11.2V.

Your 3.3V rail is at 2.9V, and roughly 3.1V is the discard tolerance. Your 5V rail is at 4.8V, which is not quite at the discard level of 4.75V, but it's almost there. Not good. That PSU is very, very weak.

For reference:

http://www.jonnyguru.com/modules.php?name=NDFAQs&op=FAQ_Question&ndfaq_id=28


Of course, none of those readings were taken with a multimeter, which is usually more accurate than a sensor value, but those sensor values are uniformly low across the board and are probably accurate enough to assume a faulty or failing unit.
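If it helps to see the comparison spelled out, here is a rough sketch in Python that checks the sensor readings quoted in this thread against a tolerance band. It assumes a flat +/-5% window around each nominal rail voltage, which gives the 11.4V and 4.75V floors mentioned above and about 3.135V for the 3.3V rail. The function and variable names are made up for illustration, and the inputs are software sensor values, not multimeter measurements.

```python
# Nominal rail voltages. The +/-5% band is an assumption for this sketch.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05  # assumed +/-5% on every rail


def rail_limits(nominal, tolerance=TOLERANCE):
    """Return the (low, high) acceptable range for a rail."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def check_rail(name, reading):
    """Classify a reading as 'ok', 'low', or 'high' for the named rail."""
    low, high = rail_limits(NOMINAL_RAILS[name])
    if reading < low:
        return "low"
    if reading > high:
        return "high"
    return "ok"


# Sensor readings quoted in this thread (hypothetical labels, real values).
readings = {
    "+12V idle": ("+12V", 11.88),
    "+12V Furmark": ("+12V", 11.2),
    "+5V": ("+5V", 4.8),
    "+3.3V": ("+3.3V", 2.9),
}

for label, (rail, volts) in readings.items():
    low, high = rail_limits(NOMINAL_RAILS[rail])
    status = check_rail(rail, volts)
    print(f"{label}: {volts:.2f} V (allowed {low:.3f} to {high:.3f} V) -> {status}")
```

A reading flagged "ok" here can still be a concern if it sits close to the floor, as with the 4.8V reading on the 5V rail.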

Does this mean it is your ONLY problem? No. In fact, often when a PSU has failed there is collateral damage to other devices, especially if that unit has poor or failed ripple control. That will take a motherboard or graphics card out in short order from overheating conditions in the capacitors. Other things can be affected by a bad PSU as well, like storage drives and memory.

I'd replace the PSU with a quality unit, first, because even if something else is wrong you need a new PSU anyhow. And it MIGHT only be the PSU.

I don't know what country you reside in, and I know that sometimes it's hard to come by good units in some regions, but when possible, when it comes time to get that PSU, I'd stick to the following if you can.

Seasonic. Just about anything made by Seasonic is good quality for the most part. There are really no bad Seasonic units and only a very few that are even somewhat mediocre.

Corsair. The CX and CXm units are ok as a budget option, but I do not recommend pairing them with gaming cards. The newer 2017 models of CX and CXm are better than the older ones, so if it specifically says 2017 model, then it's likely at least better than those older ones. Aside from that, any of the TX, RMx, RMi, HX, HXi, AX or AXi units are good. Those are listed from worst to best, with the best being the AX and AXi units.

Antec. True Power Classic, High Current Gamer, Edge, High Current Pro, Earthwatts Pro Gold and Earthwatts Platinum models.

Super Flower. They are like Seasonic and they make power supplies for a variety of other companies, like EVGA. Super Flower units are usually pretty good. I'd stick to the Leadex, Leadex II and Golden Green models.

EVGA. They have good and bad. Bad are the W1, N1, B1, B3 and G1 NEX models. Good models are the B2, G2, G2L, G3, GQ, P2 and T2 models.

FSP. They used to be very mediocre, and are a PSU manufacturer like Seasonic and Super Flower, although not as well trusted based on historical performance. Currently the FSP Hydro G and Hydro X units are pretty good.

I would avoid Thermaltake and Cooler Master. They do have a few good units, but most of the models they sell are either poor or mediocre, and the ones they have that ARE good are usually way overpriced.

Beyond that, there is a pretty good basic guideline available at the following link, although it has not been updated with newer models in about a year.

http://www.tomshardware.com/forum/id-2547993/psu-tier-list.html


And most of the models I have linked to the reviews of at the following link are exemplary.

http://www.tomshardware.com/forum/id-3612443/power-supply-discussion-thread.html
 