M.2 NVMe SSD stops working/disappears

Jun 24, 2018
Hello everyone, I've got a bit of a problem that I could use some help with. I think it is a problem with my Samsung 950 PRO NVMe SSD, although it could be a motherboard problem (MSI X99-A SLI Krait).

From a cold boot my PC boots into Windows 10 with no problems. However, after an indeterminate length of time, anywhere from seconds to a few hours, the computer stops responding. It doesn't crash immediately but sits there not accepting any input. I've got some activity monitors (HWiNFO64) running, and I can see that processor activity is changing, there is some network activity going on, and the mouse still moves the cursor, although clicking on anything does nothing. It is probably worth mentioning that Ctrl-Alt-Del does nothing and there is no hard disk activity at all.

On rebooting, the computer goes straight to the BIOS and can't see the SSD. If I turn it off and leave it for at least 10 minutes, it will find the SSD and boot into Windows 10 again. Has anyone come across something like this before? Do I have a broken SSD or a dodgy M.2 slot?

That is the outline of the problem and I'll put some more detail below.


CPU: Intel 5820K running at stock clocks
RAM: 16 GB (4x4) DDR4 Corsair LPX 2666MHz
Mobo: MSI X99-A SLI Krait (latest N93 BIOS)
SSD: Samsung 950 PRO M.2 NVMe 512 GB (latest drivers and firmware)
Video Card: MSI Armor 1070
PSU: 650W (can't remember the brand exactly but a good one)
OS: Windows 10 64 Bit (1709)

The system ran with no problems at all for ~2 years. Then the Meltdown and Spectre "fixes" appeared. My computer began to restart at random, although occasionally it would be stable for a long time and I could turn it off when I wanted to stop using it. Occasionally on reboot it would give the message:

"EFI Shell version 2.40 [5.9]
Current running mode 1.1.2
map: Cannot find required map name.

Press ESC in 1 seconds to skip startup.nsh, any other key to continue.
Shell>"

I think it meant it couldn't find the SSD, but I'm not sure.

What I was also seeing around this time was a lot of WHEA errors, all of type 17, which I eventually traced back to the device ID 2f03 - Name: Haswell-E PCI Express Root Port 1. Sometimes I would see none, and the PC was fine and didn't crash. However, when the count went up quickly the computer would crash and restart. The most I saw before crashing was ~ 8,000,00 in 30 mins, but if I saw the errors go up by several hundred or thousand a second, a crash was pretty much inevitable. This could happen while playing games or just doing some light browsing.
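
For anyone wanting to check for the same pattern (a slowly rising error count is survivable, a jump of several hundred per second comes before a crash), here is a minimal Python sketch. It assumes you write down the cumulative WHEA error count shown by your monitoring tool at known times. The function names and the 300-per-second threshold are illustrative assumptions, not values taken from any tool.

```python
from typing import List, Tuple

# Each sample is (time_in_seconds, cumulative_error_count), noted by hand
# or exported from whatever monitoring tool is in use (assumption).
Sample = Tuple[float, int]


def error_rates(samples: List[Sample]) -> List[float]:
    """Return errors per second between each pair of consecutive samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append((c1 - c0) / dt)
    return rates


def crash_likely(samples: List[Sample], threshold_per_sec: float = 300.0) -> bool:
    """True if any interval reaches the (assumed) danger threshold."""
    return any(rate >= threshold_per_sec for rate in error_rates(samples))
```

Calling `crash_likely([(0, 0), (10, 50), (20, 5050)])` flags the second interval, where the count rises by 500 errors per second.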

Updating the BIOS fixed this: I no longer see any WHEA errors, and I thought all was good. However, I'm still crashing randomly, the SSD looks like the culprit, and the crashes are getting more frequent. The drive's health looks good in HWiNFO64 and Samsung's Magician software (9 TB written, drive health 100%). I've checked my RAM for errors (100% fine) and all of my drivers, firmware etc. are the latest stable ones (no beta versions).

My current thinking is that the issue is either the SSD or the motherboard, perhaps the M.2 slot or something related to it. However, the rest of the motherboard seems to be working fine, so I'm less suspicious of that. Sadly, I don't have another M.2 drive or a spare motherboard with an M.2 slot, so I can't swap things around and test parts that way.

I'm going to talk to Samsung next week, but their helpline is currently closed.

I've tried to add as much detail as possible but if you need any more information I will gladly provide it. Any insights you have would be greatly appreciated.
 
Jul 25, 2018
I have the exact same issue.
My Samsung 960 EVO M.2 disk disappears either during boot or within the first 5 minutes of the machine's uptime.

I tried booting Ubuntu from a USB drive, and the same thing happened there. The disk (or controller) stops and throws a lot of errors in the log files.
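
To pull the relevant lines out of a long log, a small filter helps. This is a rough Python sketch, assuming the kernel log was saved to a text file first and that the lines of interest mention an `nvme` device name. The helper name `nvme_lines` and the pattern are illustrative assumptions, not part of any tool.

```python
import re
from typing import Iterable, List

# Matches "nvme", "nvme0", "nvme0n1" and similar, case-insensitively.
NVME_PATTERN = re.compile(r"\bnvme\d*", re.IGNORECASE)


def nvme_lines(lines: Iterable[str]) -> List[str]:
    """Return only the log lines that mention an NVMe device."""
    return [line.rstrip("\n") for line in lines if NVME_PATTERN.search(line)]
```

With a saved log (the file name `kernel.log` is hypothetical), `nvme_lines(open("kernel.log"))` returns the matching lines, ready to paste into a post.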

Then I found an old NUC and have now tested the disk there. No errors.

I guess my next step will be to test another M.2 disk in the first machine.

Did you find any solution?