Yearlong SSD stall, hasn't gone away with new drive!

robertzhowie

Prominent
Sep 30, 2017
12
0
510
Hi everyone,

So I have held off posting about this for well over a year, as I have been trying to fix it myself via the various help threads and forums. However, despite my best efforts (and buying a new drive!), I have been totally unable to fix it and am reaching out for a slightly more educated solution.

The Issue
Sometimes my computer totally freezes and becomes unusable - see the image below.


Note:
- Disk usage goes to 100% with 0 KB/s disk transfer
- Ethernet goes to 0% in same period
- Some CPU usage, but drops, with a spike as "stall period" ends
>I see these happening every time.
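The pattern above (disk pinned at 100% while nothing is actually transferred) can be picked out of logged readings automatically. Here is a minimal sketch, assuming you have some per-second samples of disk active time and throughput; the `DiskSample` type, the thresholds, and the sample values are all made up for illustration and are not taken from any real log.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    active_pct: float     # disk "active time" percentage, 0-100
    bytes_per_sec: float  # combined read + write throughput


def find_stalls(samples, busy_pct=95.0, idle_bytes=1024.0, min_len=3):
    """Return (start, end) index pairs of runs where the disk looks
    fully busy but is moving (almost) no data.

    The thresholds are guesses, not measured values.
    """
    stalls, start = [], None
    for i, s in enumerate(samples):
        stalled = s.active_pct >= busy_pct and s.bytes_per_sec <= idle_bytes
        if stalled and start is None:
            start = i  # a possible stall begins here
        elif not stalled and start is not None:
            if i - start >= min_len:  # ignore very short blips
                stalls.append((start, i - 1))
            start = None
    # Handle a stall that is still ongoing at the end of the log
    if start is not None and len(samples) - start >= min_len:
        stalls.append((start, len(samples) - 1))
    return stalls


# Hypothetical readings: normal, then four stalled samples, then recovery
samples = (
    [DiskSample(20, 5e6)] * 2
    + [DiskSample(100, 0)] * 4
    + [DiskSample(30, 1e6)]
)
stalls = find_stalls(samples)
```

A busy disk with healthy throughput is not flagged, which is the point: heavy load alone is normal, whereas 100% active time with zero transfer is the odd part.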

Some important notes:

- Sometimes I can still use cached data, e.g. I can move a window around fine, and button animations still work, but actual input provides no response (i.e. I can still access stuff that is in RAM)

- I can sometimes reproduce the issue by hammering the disk with a program. Usually, rapidly zooming in and out of satellite mode on Google Maps triggers it after a couple of minutes. However, it can also happen randomly (with no active usage on my end), and the Google Maps trigger occasionally fails (~5% of the time the zooming does not cause a freeze).

- I think the issue started after getting Windows 10. However, I upgraded soon after building my new PC, so I can't be certain of this.

- Might be placebo, but the old ctrl+alt+delete menu seems to help it end earlier than if I don't mash those keys.

What I have tried so far

I have tried everything I could find on the internet; however, I am happy to retry any of it. I was so convinced that I had tried everything that I got a new SSD about 2 weeks ago. It had no effect on how often the issue was happening. If people leave suggestions, I will implement and test them to see if they fix it.

List of things tried so far
- New SSD [no effect]
- RAM tests [zero errors found]
- Changing RAM slots [not a fix]
- Adding another 8 GB of RAM [no effect]
- Disabling services - Superfetch, Windows Search [no effect]
- Removing all my user-installed programs [no effect]
- Reinstalling Windows 10 via system refresh [no effect]
- Replacing the motherboard SATA ports with a PCIe SATA card [no effect]
- replacing all case + CPU fans [no effect]

I will add to this as I try things. For the sake of this list, I have discounted everything I tried before getting the new SSD. Don't think I am being lazy here, begging people for help without trying: I have devoted literally dozens of hours to this bug without success!


Things I have ruled out trying for now
- getting another new SSD.
I have already bought a new SSD and it has the same issues as mentioned, therefore I am reluctant to buy SSD #3 just to prove that it is in fact not the SSD

- Major component changes
Can't afford a new CPU/RAM kit/Mobo, unless I have exhausted all possible options (being a poor AF student)


System specs



SSD has 90 GB free (so not running out of space)


Thanks for your help!
 

Ralston18

Titan
Moderator
Your system specs show Disk 1 (TCSUNBOW) running much hotter.

Any background software running that may be continually trying to read/write to the drive? Some buggy app perhaps.

Check fans and airflows. Is your system free/clean of dirt, dust, debris, etc. on the inside?

Have you tried known working data and power cables?

Have you run any disk diagnostic software on the drive? Should be some such tool available via the manufacturer?

What PSU is installed on your computer - what wattage? May be overloaded.
 

robertzhowie



Thanks for taking some time to look!

Answers in order:

-> It probably runs hot because it is a cheap Chinese drive (my Kingston one with the same problem ran cooler). The heat may also be because I was hammering it for about 1-2 hours before posting: I had a scheduled antivirus scan, installed VS2017, and was doing a few stress tests for the sake of this post.

-> Recently got 2 new Coolermaster case fans (I have a full ATX) plus cleaned it out. I might be getting a CPU fan soon, but that shouldn't make a difference

-> Data cables actually came to mind. The only thing I would say is that I have previously used the same data cables without issue, although it might be worth investing in some new ones since they are so cheap. I will try this one and report back!

-> Diagnostic came back A-okay previously. Remember I have also replaced the drive since then, so I'm somewhat confident that it is not an issue with the drive itself. However, I am happy to try it! Is there a specific software that you would suggest? I will run it and report back.

-> Interesting new suggestion. Wouldn't I be experiencing bluescreen restarts over stalls with a PSU issue though?
But in answer to your question: 430W Corsair 80+ bronze. I worked out my system max load power draw to be ~400W, so I should be fine. Is there a way to test it that you know of?
 

Ralston18

Power supply problems manifest in many ways.

Your PSU's rating is above the load in terms of wattage. However, there is not much of a margin left as I see it. For the most part, PSU manufacturers tend to overstate capabilities (by testing in ideal circumstances), and the rated power draw of components (to look green, power-saving, etc.) likewise tends to be lower than in the real world.
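To put a number on that margin, here is a rough sketch of the arithmetic using the figures from the post above (430 W rating, ~400 W estimated maximum draw). The `derate` parameter is purely an assumption to illustrate an aged or optimistically rated supply; 0.9 is an invented example value, not a measurement.

```python
def psu_headroom(rated_watts, load_watts, derate=1.0):
    """Return (spare watts, spare as a fraction of usable capacity).

    derate is a hypothetical factor for how much of the label rating
    the PSU can really deliver (1.0 = trust the label completely).
    """
    usable = rated_watts * derate
    spare = usable - load_watts
    return spare, spare / usable


spare, fraction = psu_headroom(430, 400)
print(f"{spare:.0f} W spare ({fraction:.0%} of rating)")
# 30 W spare (7% of rating)

# If the PSU only managed 90% of its label (assumed), the margin is gone
derated_spare, _ = psu_headroom(430, 400, derate=0.9)
```

A negative `derated_spare` means the estimated peak load would exceed what the supply can deliver under that assumption, which is why a thin paper margin is worth testing rather than trusting.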

So if the PSU is deteriorating it may no longer be up to the imposed real world load.

Easy enough to test by simplifying things some. Disconnect the HDD, the DVD, graphics card, etc. and see if doing so makes a performance difference.

Then add them back one at a time. Even in different orders just to be sure. Watch what happens as the power load is increased.
 

robertzhowie



I bought a socket power meter to make this test a little more scientific!
I will do all of the messing with the power once I get my new SATA cables and open up the PC for that.

Might be a day or two until they arrive
 

Ralston18

Thanks for the update.

By "socket power meter" you mean a Kill-a-Watt-like product?

Interesting approach to the problem....

Curious to see how the meter reports the load changes as you add/remove components.

Be sure to maintain as much consistency as you can while adding the loads. I.e., do not let some AV scan or backup start running - may skew the numbers.

 

robertzhowie

Okay so new updates:

Max power usage over 24 hours is 180W - WELL below the rating of the PSU.

I have also discovered hundreds of entries in my event log for a process named:
"{9AA46009-3CE0-458A-A354-715610A075E6}"

with the error message:
Unable to start a DCOM Server:
{9AA46009-3CE0-458A-A354-715610A075E6} as Unavailable/Unavailable.
The error:
"740"

These roughly link to the times at which my stalls are happening.
There are already 414 google results for this, all without solutions...

I went into regedit via the path: HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{9AA46009-3CE0-458A-A354-715610A075E6}
and found that this error is generated by:
%SystemRoot%\System32\rundll32.exe %SystemRoot%\System32\shell32.dll,SHCreateLocalServerRunDll {9aa46009-3ce0-458a-a354-715610a075e6}

I have no experience reading this string, but to me it looks like rundll32.exe (the process which runs dlls?) is calling shell32.dll (a dll, no surprises here) and then is calling SHCreateLocalServerRunDll with some ID code.
Can anyone clarify this?
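For what it's worth, that reading matches the general rundll32 command-line shape, `rundll32.exe <dll>,<entry point> <arguments>`: rundll32.exe loads the named DLL and calls the named exported function, passing the rest of the line as its argument. The trailing ID is the same CLSID as the registry key above. Below is a small sketch that splits the string into those parts; the `parse_rundll32` helper is made up for illustration, and it assumes none of the paths contain spaces. As a side note, Win32 error code 740 is ERROR_ELEVATION_REQUIRED ("the requested operation requires elevation").

```python
import re


def parse_rundll32(command):
    """Split 'rundll32.exe <dll>,<entry point> <arguments>' into its parts.

    Illustrative only: assumes the paths contain no spaces or quotes.
    """
    match = re.match(r"^(\S+)\s+([^,\s]+),(\S+)\s*(.*)$", command.strip())
    if match is None:
        raise ValueError("not a rundll32-style command line")
    host, dll, entry_point, arguments = match.groups()
    return {
        "host": host,                # the program that loads the DLL
        "dll": dll,                  # the DLL being loaded
        "entry_point": entry_point,  # exported function called inside the DLL
        "arguments": arguments,      # passed through to that function
    }


command = (
    r"%SystemRoot%\System32\rundll32.exe "
    r"%SystemRoot%\System32\shell32.dll,SHCreateLocalServerRunDll "
    r"{9aa46009-3ce0-458a-a354-715610a075e6}"
)
parts = parse_rundll32(command)
```

Here `parts["entry_point"]` is `SHCreateLocalServerRunDll` and `parts["arguments"]` is the CLSID in braces.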

Either way this might signal some corruption in my shell32.dll, so I think next steps are listed by this article on shell32.dll errors:
https://www.lifewire.com/how-to-fix-shell32-dll-not-found-or-missing-errors-2624008
 

Ralston18

Good work.

File corruption is a very good possibility.

First clear the Event Viewer logs to get a cleaner picture of events and verify the correlation between the stalls and the DCOM Server error message.

Just to add some certainty that there is not some other factor involved.
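One way to make that correlation check less of an eyeball exercise is to note the time of each stall and compare it against the timestamps of the logged errors. A minimal sketch follows, assuming you have both lists as Python datetimes; the 30-second window and the sample timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def match_events_to_stalls(stall_times, event_times, window=timedelta(seconds=30)):
    """For each stall, list the logged events within +/- window of it."""
    return {
        stall: [e for e in event_times if abs(e - stall) <= window]
        for stall in stall_times
    }


def coverage(matches):
    """Fraction of stalls that have at least one nearby event."""
    if not matches:
        return 0.0
    return sum(1 for events in matches.values() if events) / len(matches)


# Hypothetical timestamps, not taken from the real event log
stalls = [datetime(2018, 1, 5, 14, 0, 0), datetime(2018, 1, 5, 16, 30, 0)]
events = [
    datetime(2018, 1, 5, 14, 0, 10),  # 10 s after the first stall
    datetime(2018, 1, 5, 15, 0, 0),   # nowhere near either stall
]
matches = match_events_to_stalls(stalls, events)
share = coverage(matches)  # 0.5 here: one of two stalls has a nearby event
```

It is also worth checking the reverse direction: if the error is logged hundreds of times with no stall nearby, the link is much weaker than it first looks.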

I took a quick look at the Lifewire link. Looks to be a very good start towards finding and applying a fix.

And the process certainly avoids all sorts of recommended "fixit" software downloads so commonly suggested by many other sites.

Do be very cautious if you venture into cleaning or editing the registry. Last resort and full backups are strongly recommended.

Overall start with the simplest solutions/fixes and work towards more complicated.

Simple and complicated are subjective with respect to the time, effort, and risks involved.

I would probably go with Item 1 first, then Item 4.

Be sure to have everything backed up, do not download third party products. Stay with Microsoft's tools and downloads.

Go forward and hopefully the corrupted file will be fixed or replaced with working version.
 
Solution

robertzhowie

Okay so back to it after graduating and travelling.
Unfortunately, the DLL error was a bit of a red herring.

Following things have been tried recently:

- Disabling the following services: Superfetch & windows search (as per an old thread) [NO EFFECT]
- Changing RAM slots, as a hunch that it might be an issue with the RAM communicating with the drive [NO EFFECT]
- Checked power draw again as per Ralston18's comment - stable before and through fluctuations [PSU fault much less likely]

Interesting observation:

- If multiple drives are in use at the same time, then when one has the 100% usage fault, they all have the fault. Since there are a couple of SATA headers on my board, this might indicate that a MOBO or connection error is unlikely.

Upcoming fix:

- Fresh reset from scratch. I have been unwilling to do this before since my Coursework was on this PC, however now I have graduated, it is less paramount. Will report back on the effectiveness of this approach.
 

robertzhowie

New update:

The windows log:
"The beta feature EseDiskFlushConsistency is enabled in ESENT due to the beta site mode settings 0x800000"

This seems to appear alongside almost all of these crashes. In addition, a simple Google search reveals that this is a common problem, with many people reporting the same issues. Will report back once I know for sure.

I wish that I could roll back to a previous version of windows... but alas this is not allowed.
 

robertzhowie



This is for ESENT errors, rather than my message (which occurs at exactly the time of the hard drive stall). I have scoured the MS forums for my "beta feature" warning, and while I have found lots of posts complaining of this issue, MS has yet to comment with a solution.

However, I have kept trawling logs since then, and I have found another error in the NTFS storage section. This occurs at the same time as the ESENT warning.

See image:
29olwfd.png
 

robertzhowie



Thanks for the ideas, but it's definitely not Chrome (that was my first thought!). While the problem is usually triggered by heavy usage in Chrome, I have tested it with other programs and it occurs at any high load.

Disk drive diagnostic software comes back green, and the problem has persisted over a change of hard drives.
 

robertzhowie



Now that's an interesting idea. I am buying one now to have a go. The last component I will buy before I give up and deem it to be the mobo!
 
OK, open Resource Monitor and watch the CPU, the SSD and the Network, sorting by max bytes written/read or similar. When the stall happens, see what is actually using all the CPU, SSD, etc.
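If Resource Monitor itself becomes unresponsive during a stall, the same ranking can be done after the fact from two snapshots of per-process I/O counters. This is only a sketch: it assumes you already have cumulative (bytes read, bytes written) totals per process from whatever tool you prefer, and the process names and numbers below are invented.

```python
def top_io(before, after, n=5):
    """Rank processes by bytes read + written between two snapshots.

    before/after map process name -> (read_bytes, write_bytes),
    both cumulative totals. Processes missing from `before` are
    treated as having started at zero.
    """
    deltas = []
    for name, (read_after, write_after) in after.items():
        read_before, write_before = before.get(name, (0, 0))
        moved = (read_after - read_before) + (write_after - write_before)
        deltas.append((name, moved))
    # Largest I/O first, like sorting the Disk tab by total bytes
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:n]


# Hypothetical snapshots taken just before and just after a stall
before = {"chrome.exe": (1_000, 2_000), "System": (500, 500)}
after = {
    "chrome.exe": (1_500, 2_100),
    "System": (9_500, 20_500),
    "svchost.exe": (100, 0),
}
ranking = top_io(before, after, n=2)
```

With these made-up numbers, `System` comes out on top (29,000 bytes) ahead of `chrome.exe` (600 bytes).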

At this point I would suggest backing up all data (to the cloud as well, if you can) and doing a complete fresh install of Windows 10.