Odd driver BSOD issue, nvlddmkm.sys

wtjwillis

Honorable
Aug 22, 2012
17
0
10,510
Guys, I'm at my wits' end. I have an issue that's been ongoing for a couple of weeks now.

Anyone looking for a challenge? Well here it is. Read on.

So, I have a mining system that's running the exact same setup, hardware included, as another machine. Yet this new setup crashes with a Blue Screen of Death (nvlddmkm.sys) after I start the mining software. It reports a Video TDR failure.

Here are the specs:
Biostar TB85 mobo
128GB SSD
4GB RAM
Intel G1840 processor
5x GTX 1070
Corsair HX1200i

Like I said, I have this same setup working perfectly on another machine, this one not so much.
At first I thought it was the AX1500i PSU I had failing (which I think had failed anyway, as it showed no power on the 5V and 3.3V lines), so I replaced it with the HX1200i. Next I thought maybe it was the motherboard, as I was originally using an ASRock H81 Pro BTC rev 2.0 board. Nope, still getting the crash.

I don't know what else to try. I'm running minimal programs to cut down on third-party conflicts. I am only using my VPN, Corsair Link to manage the PSU, and Gigabyte's graphics card tool. That's it. Same exact programs as on the working machine, I might add.

What I've tried
I've tried this with my cards both overclocked and not overclocked. They're not overheating either. I did, however, notice something odd with the Corsair Link program. Every time I clicked on Configure to see the current power settings, it would trip the nvlddmkm.sys BSOD.

I've reinstalled Windows 10, and even 8.1, countless times. I even used a system image from the working machine, with no luck. I have reinstalled the graphics drivers countless times, both from the Nvidia website and by letting Windows install them. Still no luck. The BIOS has the latest firmware from what I can tell.

I've contacted Nvidia support and they're about as helpful as a 3-legged dog eating air. I don't know what else to do folks, I could sure use some help on this.

Thanks
 
Did you try removing all those GPUs and running just one, maybe two? Do that. Once you can see it's stable, add another, then another, until you get the crash again. Run some stress tests like AIDA64 and Unigine Heaven. Maybe avoid the Corsair Link software if it causes crashes.
 

wtjwillis



It runs 4 fine. It's when number 5 gets installed and I run the mining software that I see the error. This issue has been really weird, too. I can normally narrow issues down well, but not this one, especially with how broad that error is.

I mean, I sort of have an idea that it's the graphics cards, but I'm leaning heavily toward this being a driver issue somewhere along the line. I have the Corsair Link software uninstalled for now, and a clean install of Windows is running currently.

 
Is it just one graphics card that causes the problem? Let's say we have GPU 1-5. You plug in GPU 1-4, and no issues. You plug in GPU 5, and you get crashes. Now, what happens if you remove one of GPU 1-4 and let GPU 5 stay in? Do you still get the crashes? Basically, what I wanna know is whether it's one specific card causing problems, or whether it's just the fact that 5 GPUs are plugged into the PC that causes it to crash. If it's just one card, that card could be defective. If it's the other reason, then maybe there's a hardware limitation, or even a driver limitation, on the number of GPUs you can throw into the system.
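To make the test plan concrete, here is a rough Python sketch of the leave-one-out rounds described above. The function names and the GPU labels are made up for illustration, and the interpretation rules are only a rule of thumb, not a definitive diagnosis.

```python
def leave_one_out_rounds(cards):
    """Return one (removed_card, installed_cards) pair per test round."""
    rounds = []
    for removed in cards:
        installed = [card for card in cards if card != removed]
        rounds.append((removed, installed))
    return rounds


def interpret(crashed):
    """Suggest a suspect from the results.

    `crashed` maps the label of the removed card to True if that
    round still crashed, False if it ran stable.
    """
    stable = [card for card, did_crash in crashed.items() if not did_crash]
    if len(stable) == len(crashed):
        return "every reduced set is stable: suspect the total card count (slot, hardware, or driver limit)"
    if len(stable) == 1:
        return f"only stable without {stable[0]}: suspect that card"
    return "inconclusive: repeat the rounds or test other components"


if __name__ == "__main__":
    gpus = ["GPU1", "GPU2", "GPU3", "GPU4", "GPU5"]  # hypothetical labels
    for removed, installed in leave_one_out_rounds(gpus):
        print(f"Remove {removed}, test with {installed}")
```

Write down crash or no crash for each round, then feed the results to `interpret` to see which of the two explanations fits better.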
 

wtjwillis

I'm not sure, honestly. Thing is, they all work by themselves. I feel like one might be failing, but then I stop thinking that when they work individually.

I will test this right now and let you know shortly.

Again though, the hardware is capable of this setup, as I have the same setup working perfectly on another system.

UPDATE
So, I just got done testing 4 at a time, unplugging a different card on each restart. So 5 cycles in total. All worked.

One thing I've not done is check the logs for those crashes. Would any sort of log be available to tell me what is potentially failing on that error?
 


You can usually find them in the Event Viewer in Windows 10, but it's pretty hard to isolate the issue there. The best way to find it is to filter the events and show only 'Critical' events. That should narrow it down to crashes and a few other things.
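If you'd rather not click through the Event Viewer, the same filter can be sketched from a script. This assumes the built-in Windows tool wevtutil is on the PATH and that the events of interest are in the System log. The function names are made up for illustration. Level 1 is the 'Critical' level in the event schema.

```python
import subprocess


def critical_events_command(log="System", max_events=20):
    """Build a wevtutil query for the newest Critical (Level=1) events."""
    query = "*[System[(Level=1)]]"  # XPath filter: Critical events only
    return [
        "wevtutil", "qe", log,
        f"/q:{query}",
        f"/c:{max_events}",  # cap the number of events returned
        "/rd:true",          # newest events first
        "/f:text",           # readable text instead of XML
    ]


def read_critical_events(log="System", max_events=20):
    """Run the query and return its text output (Windows only)."""
    result = subprocess.run(
        critical_events_command(log, max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the affected machine, `print(read_critical_events(max_events=5))` should list the latest critical entries around the time of each crash, which you can then compare against when the mining software was started.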
 

wtjwillis

Did some more troubleshooting this morning and moved the 5th card from the last PCIe x1 slot on the motherboard to the 2nd slot, which is a PCIe x16 slot. Minus one hiccup at startup, reinstalling the driver for that card (because Windows said I had to reinstall it for that card), it has been running fine. I reinstalled Corsair Link and the Gigabyte graphics card program, I'm overclocking the crap out of all the cards, and they are running just fine. Nothing I did this time, except moving the card to the x16 slot, is different from anything I've tried on this system in the past week to get it to work. That puzzles me.

What makes me mad is the system seems to be running stable now with no crashes and I still don't know what was causing that nvlddmkm.sys error.
 
Feb 16, 2018
1
0
10


I'm having the same issue too :(

Do you know who the manufacturer of your GPUs' memory is?

My rig info:
OS: Windows 10 64-bit Basic (not activated)
GPU: GTX 1060 3GB (Hynix) - x6
RAM: 8GB
CPU: Intel Celeron