Need Someone far beyond good for this problem..

Reactiontime

Prominent
Jul 1, 2017
6
0
510
I have had a problem with this rig for over 3 years and I can't figure it out. I need the help of someone with even more knowledge than myself. Many have tried, but nobody seems to be able to lead me in the right direction.

My Rig;

MSI 990FXA-GD65 Mobo
16GB Dual Channel (2x8GB) RAM (CMY32GX3M4A2133C11) (this is where I still think the problem lies, so I will omit more info on this for now)
AMD FX-8350, stock clock 4.0GHz
EVGA GTX 1080 SC
1xSSD for OS + startup programs
W10 Pro (previously W7 Pro, and I had the problem on that OS too)
Corsair CX750M PSU
3 other drives for things like home network, media, and business of different variations, although I've ruled them out entirely.

The problem;
I get into games (I'm big on WoW and Counter-Strike: Global Offensive). At some point, sometimes almost immediately, sometimes after 5 minutes, and sometimes after 4 hours, the games start stuttering. If I do not restart the computer, the stuttering becomes SIGNIFICANTLY worse over a VERY SHORT period of time, to the point where I can't move freely for 10 seconds without freezing for an additional 20-60 seconds at a time. (By stutter I mean the game freezing completely, with the audio going into an infinite loop until the system picks back up again.) I have had this problem for YEARS!
The problem is not only video game related; it merely occurs faster when I am in them. It also occurs when I'm simply in Windows, but more frequently when browsers are open. Oftentimes, if Google Chrome is open, it will say "waiting for cache" at the bottom left. Whenever it says that, the computer is in the middle of freezing. The Reliability report shows no problems other than seeming to point the finger at whatever program was active at the time of the lockup. For instance, it shows that Wow.exe "stopped working" as the summary when the freeze occurs, but if the freeze occurs when I'm in Microsoft Edge, it points the problem at Edge. The programs have no alterations or changes. It occurs even on a brand new fresh install, with no programs other than the appropriate drivers.


What I have tried;
I have done a clean install of both w7 and w10. Problem remains the same
I have replaced the PSU. Problem persists.
I HAD a 32GB 4-stick setup, but dropped it to 16GB with only 2 sticks to remove some load from the IMC. No change.
I have upped the RAM voltage to the maximum safe zone (1.65V) to try to increase stability. Problem persists.
I have tried using only 1 stick of RAM at a time, with the same result for 6 different sticks, including sticks that are known to be running properly in my wife's computer with her A10 CPU setup.
I used to have 2x GTX 460s in SLI. The problem existed with those video cards whether I had 1 or 2 cards running.
I have tried changing the PCIe slot the GPUs reside in. Problem persists.
I have tried using only 1 stick at a time, including a known-good stick of RAM from my wife's computer that is NOT the same brand or setup that she runs with her A10 processor, and still had the problem.
I have turned off all the power saving options within the bios
I have turned off AMD Turbo Core technology to prevent the multiplier from throttling.
I have tried putting a little more voltage to the Vcore to increase stability.
I have tried all current, beta, and rolled back versions of drivers.
I have updated my BIOS to the latest version per MSI's website. The problem existed even with older versions of the BIOS.
I have tried changing power profiles to support maximum performance across the board, and even at one point thought I had the problem pinpointed to cores parking at inappropriate times and causing load to be redistributed improperly. To combat this, I used a program called ParkControl. It didn't seem to make any difference to stability, though it gave a slight performance increase when the machine wasn't actually freezing.
I have tried lowering the multiplier to get a clock speed of 3.6GHz instead of 4.0GHz. This seemed to make a very slight change, but the end result was the same.
I have ACTIVELY and AGGRESSIVELY monitored the temps of my CPU, GPU, and MOBO, all of which stay in a completely acceptable range both at idle and under load (under 50C)

What is seeming to help me right now;
I have recently been playing with the RAM timings, since this RAM has an XMP profile of 11-11-11-27, but I barely know enough about RAM timings to feel comfortable changing the numbers much beyond what the system automatically configures. What I have noticed is that when I am set to 1333, 1600, 1866, or 2133 and I run Prime95, the hardware never fails, but I still have the stuttering issue in Windows and in games.

On a hunch, I backed the RAM off to an amazingly slow 1066, and the system automatically clocked it one time to 6-6-6-15. I played games with 6-6-6-15 for almost a whole day, and amazingly, the stuttering problem was almost completely gone. It occurred one time and didn't recur afterwards. However, if I run Prime95 on these settings (6-6-6-15 at 1066MHz), I almost immediately get hardware failures, workers shutting down, or a BSOD. This is what leads me to believe my motherboard is clocking this RAM in a way that isn't proper.

As another note, the RAM also runs completely fine in my wife's computer, and has passed memtest86+ dozens of times.

Is there any RAM or just super gurus in general out there who might be able to explain to me what might be going on, and what I can safely try in order to finally lay this problem to rest? I'm no stranger to tech work, but this one has me ENTIRELY stumped, and has for years. If you need ANY additional information from me, I will post it up.
 
Solution

Reactiontime



While I appreciate the insight, the problem existed when I was using a GTX460, which is 2 years OLDER than the FX8350. At the time of using the 460, I also came to testing the theory that the graphics card at that time was bad, and switched to a 250 as well, but it did not solve it. Since then, I have clean installed windows almost a dozen times and moved to the new GPU since I was fairly certain it had nothing to do with it.

Any other approaches?
 

Darroch192

Reputable
Jun 22, 2014
17
0
4,510
It looks more like a CPU, Motherboard or Hard drive problem. If you changed the stick to a totally different brand and model and the problem persisted then it's not the ram. It even looks more suspicious as when the ram functions slower, the error rate is reduced.
Try to use your ram on another PC and check if there are errors. If not, try to change the hard drive. If that doesn't work, then it's Motherboard or CPU.
 
You know I was once told not to start on the answer before you have read all the question. My bad, that is just what I did. I can see you have an extensive analysis so far.

You have done the usual software settings/fixes with no effect. System temperatures are normal. You have done component swapping with the PSU, RAM and GPU with no effect, leaving the CPU, mobo and storage in question. Slowing RAM timings affects the issue but does not eliminate it. Running Prime95 at standard RAM settings does not result in system failures but when you downclock the RAM Prime95 fails.
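
To make the elimination so far explicit, here is a minimal Python sketch. The swap results are the ones reported in this thread; the component list and the `remaining_suspects` helper are illustrative names of my own, not part of any diagnostic tool.

```python
# Swap/replace results as reported in the thread.
# True means the stutter persisted after that item was swapped or redone.
swap_results = {
    "PSU": True,
    "RAM": True,
    "GPU": True,
    "OS install": True,
}

# Parts of the build under consideration (illustrative grouping).
all_components = [
    "PSU", "RAM", "GPU", "OS install",
    "CPU", "motherboard", "storage",
]


def remaining_suspects(components, results):
    """Return the components that no swap test has cleared yet."""
    cleared = {name for name, persisted in results.items() if persisted}
    return [c for c in components if c not in cleared]


print(remaining_suspects(all_components, swap_results))
```

This only records which parts have been exercised by a swap. It assumes each swap was a clean test, which is not guaranteed if the fault is an interaction between two parts.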

First, if Prime95 is not revealing any hardware failures then I doubt it is a hardware issue related to the computational and memory management processes. The floating point calculations used in Prime95 are pretty good at picking off errors in these processes, so if Prime95 passes, chances are that the hardware involved is good. I have had RAM pass Memtest and then had Prime95 pick off an issue when I stress tested the system. In fact I am not surprised that Prime95 failed when the timings were tightened; this leads to system instability when you go too far. So the CPU, RAM and those portions of the mobo involved are likely good.

All this leads me to one question......does the stuttering only occur with programs that are accessing the internet? Maybe there is a hardware/software problem with the connectivity portion of the computer which leads to the system freezing. If your port is feeding garbled info (jabber packets, I believe they are called) to the CPU, that might be the issue. This part of the system would not be tested by Prime95. But this is just a hunch at this point.

The one other thing that I can suggest is trying to lay your hands on another FX CPU to see if that fixes the issue. I wish I had a stronger answer for you, I know what it is like to face these kinds of problems, they are tough.





 
WOW and CSGO are both online games, did you check your internet connection? Are you using a WiFi connection? With your CPU you should not really see stuttering as bad as you are seeing, although the AMD FX CPUs do cause stuttering and lower FPS in games. My son had a 6 core AMD CPU, went to a two core i3 and CSGO not only smoothed out, he got another 40 fps from the CPU.
 

Reactiontime



As far as only happening to programs on the internet, that's a tough one. I believe I ruled that out when I was still using Windows 7 as my primary OS by completely removing the Ethernet cable, but I have not revisited it on Windows 10. I don't think I'm being led there, because the whole system is locking up, and even if I boot the computer and don't launch any programs MANUALLY (aside from auto-launching programs, obviously), I still see the issue. I've experienced the issue even minutes after a perfectly clean install, before installing any programs, which for me pretty much rules out it being program-related. While the problem has persisted, I have also been through 2 different modems and numerous different CAT cables, and have even been to a LAN event where we used THEIR networking equipment, and the problem still persisted, so that kind of rules that possibility out. The same exact RAM does NOT cause issues in my wife's computer, nor in my daughter's computer. They both have A-Series processors and slightly different gear, but no issues.

The only thing that still leads me to point at RAM is that when I run Prime95 with my motherboard on ALL auto/default settings, Prime95 passes with zero errors for days (literally days), but the system stutters horribly. When I downclock the RAM, the stuttering subsides significantly (not entirely, but very noticeably), but Prime95 fails MISERABLY and almost instantly BSODs me. So either I get no recognizable stability whatsoever with what is seemingly the proper configuration, or I have a machine that is horribly improperly configured but is actually somewhat stable....(still not even stable enough to not drive me nuts though).....

Do you see why this problem drives me absolutely insane? When properly configured, it's at its worst! When IMPROPERLY configured, the thing is better than before, but the problem is still there....it makes 0 sense! I'm hoping for some sort of computer GOD to help with this one.
 

Reactiontime



I feel like the fact that reducing the speed of the RAM has an effect should point more specifically to a RAM settings problem, shouldn't it? Or a motherboard failure.....
But my question is more of a, ***WHY*** would reducing the speed of the RAM noticeably change the rate at which the errors occur? If I can figure that out I may be able to more definitively point at something as the "for sure problem", but I don't know enough about RAM or the way it communicates....I'm a 31 year old self-taught high school grad....not an engineer... and unfortunately if I tweak settings too horribly and blow something up, I don't have the means to replace anything at this moment. This makes me scared to push things to super-highs or super-lows to experiment further for a solution without good reason to suspect that it would be the solution to my problem or knowing that my test can't cause damage.

By raising ram timings, I'm effectively slowing the ram down? I tried to read several articles on RAM and I still don't understand what I'm actually causing to CHANGE when I change the 4 golden numbers of the RAM timings....The explanations are extremely technical beyond even my scope of understanding....can someone dumb it down for me? Maybe I'd feel more comfortable playing with the timings if I knew for sure what the changes affected.
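
As a rough illustration of what the four numbers mean: they are normally read as CL, tRCD, tRP and tRAS, and each one is a delay counted in memory clock cycles, not in time. Because DDR memory transfers data twice per clock, the clock is half the rated speed, so the same number means a longer real delay at a lower speed. The sketch below is only arithmetic on the two settings mentioned in this thread; the function names are made up for the example.

```python
def cycle_time_ns(data_rate_mts):
    """Length of one memory clock cycle in nanoseconds.

    DDR transfers twice per clock, so the clock in MHz is half the
    rated data rate (e.g. "2133" runs at about 1066 MHz).
    """
    clock_mhz = data_rate_mts / 2
    return 1000.0 / clock_mhz


def timing_ns(cycles, data_rate_mts):
    """Convert a timing given in clock cycles to nanoseconds."""
    return cycles * cycle_time_ns(data_rate_mts)


# The two configurations discussed in the thread.
settings = [
    ("2133 at 11-11-11-27", 2133, (11, 11, 11, 27)),
    ("1066 at 6-6-6-15", 1066, (6, 6, 6, 15)),
]

for label, rate, timings in settings:
    in_ns = [round(timing_ns(t, rate), 2) for t in timings]
    print(label, "->", in_ns, "ns")
```

By this arithmetic, CL11 at 2133 is about 10.3 ns and CL6 at 1066 is about 11.3 ns, so the two settings ask for a similar real-world delay even though 6-6-6-15 looks much tighter on paper. Raising a timing number at a fixed speed does make that particular operation wait longer.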
 
I know how you feel, I have 25+ years of professional engineering experience in the area of root cause and failure analysis and have faced problems like this before. They are frustrating and in truth some never get solved. But let's try, maybe someone will provide the divine insight.

First, just because slowing the RAM speed affects the issue doesn't mean the problem is with the RAM. I am starting to think that it might be related to an error correction process, where slowing the I/O of the RAM allows the error correction to catch up instead of bottlenecking the other processes. And this is likely a hardware issue, as a fresh install does not fix the problem. Based on what you just stated, the errors would have to be generated from boot, but they are likely not related to the CPU or RAM. That is not certain, but the Prime95 stress testing points away from them.

Not sure what diagnostic tools you have used to test the various systems but I will suggest that now. Here is a link to some, maybe someone with more knowledge than me can chime in on this.
http://www.makeuseof.com/tag/13-windows-diagnostics-tools-check-pcs-health/
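
Alongside those tools, one cheap thing to try is logging exactly when each freeze happens, so the times can be lined up against Event Viewer or the Reliability report. The following is a minimal sketch, not an established tool: it sleeps in a short loop and reports whenever it wakes up much later than it asked to. The names `watch` and `find_stalls` and the default thresholds are assumptions chosen for illustration.

```python
import time
from datetime import datetime


def find_stalls(timestamps, interval, threshold):
    """Return (start, gap) pairs where two consecutive samples are
    further apart than interval + threshold."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval + threshold:
            stalls.append((prev, gap))
    return stalls


def watch(interval=0.1, threshold=0.5, duration=None):
    """Sleep in a loop and print a line whenever the loop wakes up late.

    interval:  how long each sleep should take, in seconds
    threshold: how much lateness counts as a stall, in seconds
    duration:  stop after this many seconds (None means run until Ctrl+C)
    """
    start = last = time.monotonic()
    while duration is None or time.monotonic() - start < duration:
        time.sleep(interval)
        now = time.monotonic()
        late = now - last - interval
        if late > threshold:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} stall of about {late:.2f}s")
        last = now
```

Calling `watch()` in a console before starting a game would leave a timestamped line for each freeze longer than half a second. It cannot say what caused a stall, only when it ended and roughly how long it lasted.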

Also, again I suggest trying to do a component swap on the CPU. This is one of the main techniques in something called "Shainin RED-X Problem Solving" which relates inputs to impact on the problem.
https://www.innovationservices.philips.com/app/uploads/2017/01/shanin-techniques-jo-mooren.pdf

 
Solution

Karadjgne

Titan
Ambassador
If the motherboard drivers are out of date or nonexistent (common especially for the Win10 Creators Update) then that will act like a hardware issue. Using Intel drivers (native) on the Intel SATA connectors is fine, but the other SATA ports are commonly Renesas, Marvell or ASMedia, and not having those drivers means relying on the Windows generic ones, which leads to SATA issues as the performance is not there. This is especially true of the LAN drivers, so much so that just for the Creators Update MSI has released updated LAN drivers for the Z77 boards for sure, which had not been touched since October of 2013. It's a commonly missed step, especially in new or reset builds, as the drivers are part of the windows/System32 or syswow64 folders.