Graphics card isn't running correctly

stormking2010

Prominent
Jan 17, 2018
11
0
510
About 2 months ago my GTX 765M suddenly started having issues running games properly. Games I knew it was capable of running at 60 FPS would instead run at 30, and could dip even further in games like Company of Heroes 2.

I've tried numerous fixes, all of which worked only temporarily. I've uninstalled and reinstalled the drivers, wiped the drives, told the NVIDIA Control Panel to use only my NVIDIA card, and told it to prefer maximum performance. Oddly enough, even telling the Windows event logger to stop also worked temporarily. It's also worth noting that it has fixed itself several times with no apparent input on my part, only to stop working properly again after a few days.

I'm currently at a loss. Just as a heads up, I'm in no way an expert with computers. Any help would be greatly appreciated.
Thank you in advance!
 
Uninstalling and reinstalling is NOT usually enough. You need to remove ALL of the program files, folders, dynamic link libraries, drivers, registry settings and resource allocations, most of which are NOT removed when you uninstall. For this reason it is recommended that you restart the system into Safe Mode (hold the Shift key while clicking "Restart" on the power menu, which opens the advanced startup options where Safe Mode can be selected), then run Display Driver Uninstaller (DDU), followed by a clean install of the latest drivers.

You will want to download both the latest driver package AND the DDU program before you get started.


How to do a complete clean install of your graphics card drivers using the DDU


After doing that, you might want to tell Windows to STOP updating the drivers for it automatically, as this has had a tendency to create problems for users with mobile discrete graphics ever since the release of Windows 10, or possibly even before that.
This may be why it works fine temporarily after making changes but then goes back to not working correctly. Windows MAY be overriding your chosen driver installation package.

http://www.tomshardware.com/faq/id-2763685/stop-windows-automatically-updating-device-drivers.html


Or it could just as easily be a thermal issue, as sizzling has suggested. Problems due to thermal issues on laptop GPUs are very common, and laptop GPUs simply do NOT have long lifespans when used for gaming. CPUs and GPUs tend to die early deaths when pushed hard in mobile devices because there is a fundamental lack of cooling capability. The fans and cooling systems these are equipped with are not even remotely capable of keeping gaming hardware cool long term. If you have tended to game on that machine for long periods of time in the past, you can almost guarantee it has experienced at least some level of thermal fatigue.
 
Could be, but after hearing "I wiped this or that", "I reinstalled drivers" or "yeah, I ran the DDU" hundreds of times, when they really DIDN'T, or didn't do it in SAFE MODE so that registry entries could be removed, simply DOING the DDU process becomes the first line of attack. Make sure this isn't a simple issue FIRST before moving on to what is usually a more complicated resolution. Address the basics first, then worry about more complex troubleshooting. It catches a lot of "oops" crap.
 

stormking2010


Actually, this is what I used to uninstall my NVIDIA drivers in the first place, when I first went searching for solutions, so I've done this already. Also, by "wipe the drives" I meant I wiped my hard drive and SSD. My apologies for not clarifying that.
 
OK, by "wipe" do you mean that you reinstalled Windows and during the installation you chose the "Custom" option, followed by deleting ALL the existing partitions on the target drive and then installing to the unallocated, unpartitioned, unformatted drive allowing Windows to create, partition and format the drives automatically,

OR

Do you mean you just deleted/formatted the C: partition? There are other, hidden partitions on the OS drive in every case (such as the system/boot and recovery partitions), and since these hold the boot configuration that the MBR or GPT partition table points to, not recreating them during the installation can have undesirable outcomes.

If you did NOT do it that way, I would suggest that you do, as follows. Be sure to disconnect ALL secondary drives before proceeding so that you do not accidentally delete a partition on the wrong drive.

http://www.tomshardware.com/faq/id-3567655/clean-installation-windows.html


If you DID do it exactly that way, then we can move on. If Windows has been clean installed according to the instructions and using the DDU brought no improvement, then you either have faulty hardware or a thermal problem. I'd recommend posting screenshots of ALL the HWinfo sensor readings at both idle and under a high-demand load, so we can see what's going on.

In order to help you, it's often necessary to SEE what's going on, in case one of us can pick out something that seems out of place, or other indicators that just can't be communicated in a text-only post. In these cases, posting an image of the HWinfo sensors or something else can be extremely helpful. Here's how:

*How to post images in Tom's hardware forums



Run HWinfo and look at system voltages and other sensor readings.

Monitoring temperatures, core speeds, voltages, clock ratios and other reported sensor data can often help to pick out an issue right off the bat. HWinfo is a good way to get that data and in my experience tends to be more accurate than some of the other utilities available. CPU-Z, GPU-Z and Core Temp all have their uses but HWinfo tends to have it all laid out in a more convenient fashion so you can usually see what one sensor is reporting while looking at another instead of having to flip through various tabs that have specific groupings.

After installation, run the utility and when asked, choose "sensors only". The other window options have some use but in most cases everything you need will be located in the sensors window. If you're taking screenshots to post for troubleshooting, it will most likely require taking three screenshots and scrolling down the sensors window between screenshots in order to capture them all.

*Download HWinfo


For temperature monitoring only, I feel Core Temp is the most accurate and also offers a quick visual reference for core speed, load and CPU voltage:

*Download Core Temp

When it comes to temperature issues, taking care of the basics first might save everybody involved a lot of time and frustration. Check the CPU fan heatsink for dust accumulation and blow or vacuum out as necessary. Other areas that may benefit from a cleaning include fans, power supply internals, storage and optical drives, the motherboard surfaces and RAM. Keeping the inside of your rig clean is a high priority and should be done on a regular basis.
 

stormking2010



I only used the option within Windows itself to clean the drives. If you believe that wasn't good enough, I'll move on and perform a clean installation of Windows like you suggested.
 

stormking2010

So I did the wipe and nothing of any real note changed; however, I did download HWinfo and ran it. I'm not certain which screen you would've wanted for the screenshots, but I pulled it from the sensors menu.
Gpu_idle_Important.png

This is the GPU while it's idle (I have screenshots of the rest of the system idling too, but since it's mostly my GPU, I figured this was fine).
https://s13.postimg.org/fleevujjr/Apprently_this_is_under_stress.png
This is the GPU while it's under stress.
 
Ok, that's good for a start, however, what we need takes three screenshots at idle and three screenshots under a game or stress utility load, in order to capture ALL of the sensors.

Open HWinfo and then resize its window to be as tall as you can from top to bottom. Take a screenshot of all the sensors that are visible. Then scroll down and take another one of the middle sensors, and again for the rest of them. Usually it takes three screenshots to capture all of the sensors. Then do the same thing again with the system under a load. You can post all of the screenshots here.

I can see why you'd THINK this would primarily be a GPU problem, but it might not be. If the CPU is throttling due to thermal compliance issues it can, and will, easily drop half its FPS or more trying to keep the CPU from overheating. The same can happen with the GPU as well, so we like to look at everything, including memory, drives, etc., which is why you want to post all the sensor values at both idle and under load, for comparison.
 
Ok, that's good, however, a few points to check into.

CPU temp is high. At idle, with very little CPU load, a CPU temp of 47-55°C is awfully high. (By the way, the CPU load did not change under "load", so I think we need to retest the load conditions using a different utility or process, which I'll outline a little further down.)

Idle temp for the CPU should be more like 30-40°C, with max temps under 100% load no higher than about 70-75°C when running a steady-state full-load utility like Prime95 version 26.6.
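To make those rule-of-thumb numbers concrete, here is a minimal sketch, assuming you copy the core temperatures from HWinfo by hand into a list. The function name `assess_cpu_temps` and the exact cut-offs (40°C idle, 75°C full load) are illustrative assumptions taken from the ranges in this post, not Intel or NVIDIA specifications.

```python
from typing import Sequence, Tuple

# Rule-of-thumb limits from this thread, not manufacturer specifications.
IDLE_MAX_C = 40.0  # idle should sit around 30-40 °C
LOAD_MAX_C = 75.0  # sustained full load (e.g. Prime95) should peak around 70-75 °C


def assess_cpu_temps(readings_c: Sequence[float], under_load: bool) -> Tuple[float, str]:
    """Return (peak temperature, verdict) for a set of core temperatures in °C."""
    if not readings_c:
        raise ValueError("need at least one reading")
    peak = max(readings_c)
    # Compare the hottest core against the idle or full-load limit.
    limit = LOAD_MAX_C if under_load else IDLE_MAX_C
    return peak, ("ok" if peak <= limit else "too hot")


if __name__ == "__main__":
    # The 47-55 °C idle readings discussed above.
    print(assess_cpu_temps([47, 55], under_load=False))  # (55, 'too hot')
```

Using the peak rather than the average is deliberate: throttling is triggered by the hottest core, not the mean.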

Also, your GPU load is mostly zero, so that's not under a load either. It's great that you're not having problems now, but I think it likely that you WILL again at some point.

I'd recommend posting two more sets of screenshots to look at.

One set running Prime95 version 26.6, to put the CPU at 100% load and see what thermal compliance looks like then. Run it for five minutes and then take the sensor screenshots.

Prime95 version 26.6: http://windows-downloads-center.blogspot.com/2011/04/prime95-266.html



And another set running Furmark to set the GPU at 100% load.

http://www.ozone3d.net/benchmarks/fur/


This will tell us if things are actually working as they should, or if something is overheating and causing the thermal protections to kick in, which throttles performance back in order to avoid being outside thermal spec. As I mentioned earlier, it's incredibly common for laptops to not be thermally capable if they've been used a lot for extended gaming, as thermal fatigue often affects the hardware AND the cooling system, with fan motors getting weak and controllers failing to function correctly.
 

stormking2010

Cpu_Prime_First.png

Cpu_Prime_Second.png

Cpu_prime_final.png

So this is the stress test on my CPU. However, while my computer wasn't performing correctly, the CPU only sat at about 50°C; it was when it spiked to high temps that it suddenly fixed itself for some reason (which confuses me a bit). I'll run some additional tests and grab that GPU test in the meantime, but while it wasn't working properly its temp was about 50-ish. (I'm not sure if the change in VID has anything to do with this.)
 
There is your answer. Where it says "Core #(1, 2, etc.) thermal power limited" = Yes, means it is throttling the CPU to try and keep it cool. It's not working at normal operating clock speed or voltage.
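If you'd rather not scroll through screenshots, HWinfo can also log sensor readings to a CSV file, and the throttling flags can be scanned from that log. The sketch below is a hypothetical helper, assuming that the flag columns have "thermal" somewhere in their header and that set flags are logged as Yes, 1 or true; the exact column names and value format vary by HWinfo version and hardware, so check your own log's header row and adjust `keywords` accordingly.

```python
import csv
import io
from typing import Iterable, List, TextIO

# Values treated as "flag is set"; assumption: the log may use Yes/No or 1/0.
TRUE_VALUES = {"yes", "1", "true"}


def throttled_rows(log: TextIO, keywords: Iterable[str] = ("thermal",)) -> List[int]:
    """Return 0-based data-row indices where any matching flag column is set."""
    reader = csv.DictReader(log)
    kws = [k.lower() for k in keywords]
    # Pick every column whose header mentions one of the keywords.
    flag_cols = [c for c in (reader.fieldnames or []) if any(k in c.lower() for k in kws)]
    hits = []
    for i, row in enumerate(reader):
        if any((row.get(c) or "").strip().lower() in TRUE_VALUES for c in flag_cols):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hypothetical excerpt of a sensor log; real column names will differ.
    sample = (
        "Time,Core #0 [C],Core #0 Thermal Throttling,Core #1 Thermal Throttling\n"
        "10:00:00,55,No,No\n"
        "10:00:01,98,Yes,No\n"
        "10:00:02,97,No,Yes\n"
        "10:00:03,60,No,No\n"
    )
    print(throttled_rows(io.StringIO(sample)))  # [1, 2]
```

Matching on a header keyword instead of a fixed column name keeps the sketch usable across cores and packages, at the cost of possibly picking up unrelated columns that also contain the keyword.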

Where you see temps in RED, the CPU is WAY over thermal limits.

This is one of three things, maybe four.

One, the CPU cooler heatsink is full of dust and junk and can't dissipate enough heat. You can try to blow it out with compressed air, but this usually isn't terribly effective once it's been overheating.

Two, the fan motor for the CPU heatsink is failing and not ramping up to full speed. The fan can be replaced, but the laptop usually has to be completely disassembled in order to do it. Not for the faint of heart. I hate doing it, and I've disassembled maybe 100-110 laptops over the years. The newer it is, the more of a PITA it is.

Three, the CPU or motherboard could be thermally damaged. If that's the case, nothing you do will fix it; it will need either replacement of the damaged hardware or replacement of the whole unit, since the hardware plus labor usually outweighs the cost of a new one.

Four, it's also possible that you, or somebody else, has fiddled with the CPU clock or voltage settings. Generally on laptops, except for a very few custom units, you can't overclock the CPU in the BIOS, so some kind of Windows-based utility is necessary to do that, if it's even possible.

First thing I'd do is go into Control Panel, open Power Options, and see what plan it is set to. If it is set to "High performance", click "Change plan settings" next to that plan.

Next, click on "Change advanced power settings".

Double-click "Processor power management" to expand the minimum and maximum processor state settings. Change the minimum to 5% and leave the maximum at 100%. Make sure the values are the same for both plugged in and on battery. Also make sure "System cooling policy" is set to Active. Save the settings, exit, and then open HWinfo. Scroll to the CPU thermal sensors.

Open a game and play for a while, keeping an eye on the thermal sensors. I probably would not run Prime95 again, since you already know that you have a cooling issue. Changing those settings I just described is not going to fix that problem; it's more like a band-aid. Allowing cores to drop as low as 5% when there is no load on them gives them time to cool down between load demands. Otherwise they never really do. It's a standard setting. You can also just change the whole plan to "Balanced" and it will do pretty much the same thing.
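The same power-plan changes can also be made from a command prompt with Windows' built-in `powercfg` tool. Below is a minimal sketch, assuming the standard `powercfg` aliases `SUB_PROCESSOR`, `PROCTHROTTLEMIN`, `PROCTHROTTLEMAX` and `SYSCOOLPOL` are available on your Windows build (run `powercfg /aliases` to confirm; some OEM images hide processor settings). By default it only prints the commands; `apply(dry_run=False)` runs them and should be done from an elevated prompt.

```python
import subprocess
from typing import List

# powercfg aliases (list them with `powercfg /aliases`); values follow the steps above.
SETTINGS = [
    ("SUB_PROCESSOR", "PROCTHROTTLEMIN", 5),    # Minimum processor state, %
    ("SUB_PROCESSOR", "PROCTHROTTLEMAX", 100),  # Maximum processor state, %
    ("SUB_PROCESSOR", "SYSCOOLPOL", 1),         # System cooling policy: 1 = Active
]


def build_commands(scheme: str = "SCHEME_CURRENT") -> List[List[str]]:
    """Build powercfg calls for both power sources, then re-apply the plan."""
    cmds = []
    for sub, setting, value in SETTINGS:
        # /setacvalueindex = plugged in, /setdcvalueindex = on battery
        for flag in ("/setacvalueindex", "/setdcvalueindex"):
            cmds.append(["powercfg", flag, scheme, sub, setting, str(value)])
    cmds.append(["powercfg", "/setactive", scheme])  # make the edited plan take effect
    return cmds


def apply(dry_run: bool = True) -> None:
    """Print the commands, or run them (Windows only) when dry_run is False."""
    for cmd in build_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply(dry_run=True)
```

Setting both the AC and DC values mirrors the "same for plugged in and on battery" step, and re-activating the scheme at the end is what makes Windows pick up the edited values.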

Bottom line is, even at 100% load your CPU should never go into the red like that. Either there is thermal damage from age/faulty cooling or the cooling system is plugged up. Fans get weak, especially on laptops, since they come with fairly cheap ones to start with. Using one of those "cooling pads" won't help. Those don't do a damn thing for cooling the cores themselves. All they really do is cool off the case, maybe the drives, occasionally they might help the memory stay a little cooler, but they will NEVER cool the CPU cores at all.


I'll guarantee that when the system throttles the CPU, that is 99% likely to be the cause of your drops in FPS and performance.
 

stormking2010

OK, so it is the CPU then, but that wouldn't explain why the majority of the time I get throttling is right after I restart the computer. That's normally when it randomly decides it's going to run slow. So is it possible that it's simply throttling the CPU despite it not actually overheating? I just ran a performance test from the main multiplayer game that I play, and my CPU wasn't actually overheating. I guess it should also be noted that my fans were not running during this time, so this was without any cooling to help it.
Thermal_example.png

I mostly have the issue randomly when I boot the PC; that's usually when it decides it's going to have issues. (I'm not sure if this affects your judgement in any way, but I'll see about having a friend of mine help me out and see what we can do about the heat issue from earlier.)
 
When you first start the system, ALL of the startup routines/processes are loading, drivers are loading, the CPU is USUALLY in full load mode, and tends to run hotter than at other times. It may take a WHILE for all of those processes to finish loading and for the core temps to drop low enough that the system releases it from being throttled. Plus, since it IS being throttled, it takes a lot longer for all of that stuff to complete being loaded, so the process is further extended.
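One way to check this explanation against your own machine is to log temperatures from boot and see how long it takes for them to drop and stay down. This is a hypothetical sketch, assuming you have (seconds-since-boot, temperature) pairs from a sensor log and pick a threshold yourself; the real release point for throttling is decided by the CPU and firmware, not by this function.

```python
from typing import Optional, Sequence, Tuple


def settle_time(
    samples: Sequence[Tuple[float, float]], threshold_c: float, hold: int = 5
) -> Optional[float]:
    """Return the time at which temperature drops below threshold_c and stays
    there for `hold` consecutive samples, or None if it never settles."""
    run = 0
    run_start: Optional[float] = None
    for t, temp in samples:
        if temp < threshold_c:
            if run == 0:
                run_start = t  # first cool sample of a new streak
            run += 1
            if run >= hold:
                return run_start
        else:
            run = 0  # a hot sample breaks the streak
    return None


if __name__ == "__main__":
    # Hypothetical boot log: (seconds since boot, hottest core in °C).
    boot_log = [(0, 92), (10, 90), (20, 80), (30, 85), (40, 70), (50, 68), (60, 66)]
    print(settle_time(boot_log, threshold_c=75, hold=3))  # 40
```

Requiring several consecutive cool samples avoids reporting a single momentary dip as "settled", which matters because, as described above, throttling itself slows down the startup load and stretches out the hot period.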

If you have access to an air compressor, or can get a can of compressed air from the store, it might be worth at least TRYING to blow out the junk through the fan intake grilles on the bottom and exhaust ports on the sides of the unit. Sometimes, rarely, it helps. Usually it does not, but it is certainly worth trying. Better would likely be disassembling the unit, cleaning out the heatsink/fan areas, removing the heatsink and applying fresh thermal paste between the heatsink and CPU lid. Probably the same procedure for the GPU cooler heatsink as well, while you're in there.

As mentioned, this is not an EASY or SIMPLE procedure though, so unless you are willing to risk not being capable of getting it all back together correctly, or have experience doing this, it's usually best to take it to a professional to be done. If the unit is more than three years old there is always a good possibility that the thermal paste has dried up and is no longer adequately doing the job it was intended for.

Usually I go ahead and replace the cooling fan at the same time, since they are relatively inexpensive, and even if they are still "working", there are no absolutes as far as whether they are still running at full RPM or have weakened over time. If the system has been pushed hard, and for long periods of time, the fan is likely to have become weak and not be providing its original full cooling potential.
 