Is one of my memory channels fried?

bubbleking

Prominent
Sep 9, 2017
3
0
510
I have a Gigabyte GA-X79-UP4 rev 1.0. You can find its manual on this page:

Click

To make a long story short, I was recently having some problems with overheating, so I opened my box and discovered the radiator of my cooler was caked in drywall dust from when I had a home repair done some months before. I cleaned the radiator and everything else, and reassembled, only to find my machine didn't POST (I thought) or show video. I had assumed my case had an internal speaker, which wasn't true after all (more on that later).

I tested my CPU (i7-3930K) in another machine, to no avail, and tried the other machine's CPU (a Xeon with the same socket) in my machine, also to no avail. At this point, I assumed my motherboard and CPU were fried and was about to order some old stock of the same models when I decided to try a few more tests.

I discovered my case had no speaker, so my POST assessment might not have been true. I picked up a speaker and a mobo diagnostic card. Using those tools, I soon discovered the problem was memory related. Removing the RAM and adding it back in various configurations, I found that the machine could boot and launch Windows! However, when I tried to use all 4 of my sticks, I had the same problem come back.

My motherboard has 4 channels of 2 slots each. I have 4 sticks of 8 GB each. I tried various configurations, guided by the mobo's manual. I found that I could use 2 or 3 sticks (any of the sticks... all of them work), as long as I did not use channel B. As soon as I put a stick into channel B, the problem returns. The readout on the diagnostic card, for what it's worth, is 6766. To be honest, I'm not sure what to make of the 4-digit readout for 2-digit codes. I suppose this means codes 67 and 66... maybe the last one that succeeded and the first one that failed?
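For what it's worth, the guess above about the readout can be written down as a tiny sketch. It only splits a 4-digit readout into two 2-digit hex codes. The function name `split_readout` is made up for illustration, and which of the two codes is the current one and which is the previous one is an assumption that would need checking against the diagnostic card's own documentation.

```python
def split_readout(readout: str) -> list[str]:
    """Split a diagnostic-card readout into 2-digit hex POST codes.

    Hypothetical helper: it assumes the card simply shows two
    2-digit codes side by side, as guessed in the post.
    """
    if len(readout) % 2 != 0:
        raise ValueError("expected an even number of hex digits")
    int(readout, 16)  # raises ValueError if the readout is not valid hex
    return [readout[i:i + 2].upper() for i in range(0, len(readout), 2)]


codes = split_readout("6766")
print(codes)  # ['67', '66']
```

Each resulting code can then be looked up in the POST code table of the motherboard manual.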

In any case, does this sound like channel B is fried? Is there something I can do? If the channel is fried, should I replace the motherboard, or is it safe to use the board as long as I avoid using the damaged channel? It would certainly not be fun to reduce my RAM from 32 GB to 24 or 16, but it possibly beats having to track down my same board, paying double for another one since it's out of production and seems to still be quite popular. I should mention that I have good reasons, other than money, for wanting to use the same old board and CPU rather than upgrade to the latest stuff.

Also note: I can only assume that when I tested my CPU, which I now know works, in the other machine... that machine's motherboard must not have supported my CPU, despite having the correct socket.
 
All it takes is a single pin in the CPU socket or DIMM connector with a poor connection to make the channel fail. Take it apart again and blow it clean with a can of air, then look for bent pins before giving up.

BTW everything has a spec: that processor is only rated for 20 insertions, and the LGA1155 socket is only rated for 15 insertions!
 

bubbleking



Yikes. I never even considered there might be such low insertion limits. Where did you find that? By the way, the socket is LGA 2011 on my board. In any case, if you don't mind, I'd like to ask a couple of follow up questions.


    ■ Now I'm a little bit freaked out about insertions and finding those pins. Right now, the machine is working with 3 channels for 24 GB of RAM. It really needs to be backed up. Would there be any danger in running it like it is now, until my cloud backup service can complete, before I start messing around with pins and another round of thermal paste?
    ■ If the problem is in the DIMM connector, is there anything I can even do about that?



 
Ah, the LGA 2011 socket is rated for 30 insertions, but LGA 2011 i7 processors are only rated for 15. Each time you install the chip it bends each pin and puts a dent on each pad, and while things can last longer than the rated life they are only guaranteed to work that long. The PCIe x16 connector is only typically rated for 50 insertions for example (depending on manufacturer) but nearly always lasts longer than that before developing problems. But it's certainly nothing like microUSB's 10,000 insertions.

BTW the DIMM sockets are only rated for 25 insertions.
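As a rough way to keep track of this, the ratings quoted above can be put in a small table and compared against a running count of insertions. This is only a sketch: the numbers are the ones given in this post (not checked against datasheets), and the names `RATED_CYCLES`, `remaining_cycles`, and `is_past_rating` are made up for illustration.

```python
# Rated insertion cycles as quoted in this thread (assumed, not verified).
RATED_CYCLES = {
    "LGA 2011 socket": 30,
    "LGA 2011 i7 processor": 15,
    "PCIe x16 connector": 50,
    "DIMM socket": 25,
}


def remaining_cycles(part: str, insertions_so_far: int) -> int:
    """Return how many rated insertions are left, never below zero."""
    return max(RATED_CYCLES[part] - insertions_so_far, 0)


def is_past_rating(part: str, insertions_so_far: int) -> bool:
    """True once the part has been inserted more times than it is rated for."""
    return insertions_so_far > RATED_CYCLES[part]


# Hypothetical counts, just to show usage.
print(remaining_cycles("DIMM socket", 10))             # 15
print(is_past_rating("LGA 2011 i7 processor", 20))     # True
```

Going past the rating does not mean the part has failed, only that it is outside what is guaranteed.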

Any time there is a change to settings or hardware, the system should be retested for stability. I prefer 10 standard passes of IntelBurnTest and 24h of Prime95 to be absolutely sure, but it depends on your preference and how critical it is to have no errors. If it tests stable, then there is certainly no problem using it. If the system is heavily overclocked, then retesting yearly to detect any deterioration is likely a good idea too.

It could simply be a small piece of dirt or drywall dust preventing a good connection, which is why I suggested blowing it out. A bent pin can be carefully bent back (be very careful, as they break easily). There are good contact cleaners like Deoxit, but it's rather oily, so it could attract more dust in the future. If it works, though, just try to avoid disassembling things again.

And I prefer to configure fans to suck rather than blow through heatsinks and radiators just so they can be vacuumed and cleaned from the other side without having to disassemble anything.
 
Solution

bubbleking



I'm curious about this part. Wouldn't that be a bit worse for ventilation as you'd be passing the air from inside the case, which is presumably warmer than the air outside the case, across the radiator?

Regardless, this has been extremely helpful information, and I'll be trying this all as soon as I have the time to dig back in. Thanks!

 
Yes, with radiators, if you wanted the fans inside the case, the other components would run hotter in the prewarmed air, but at least this would allow you to vacuum the radiator without even opening the case. However, the fans could be mounted outside the case instead if you wanted better cooling; I use wire fan grilles to keep fingers from inadvertently touching the moving fan blades. An alternative would be to use a filtered case so any air reaching the exhaust fans wouldn't leave dust in the radiators. The filters plug quickly and need frequent maintenance (the holes are much smaller), but they usually just slide out for washing.

With air cooling, things are much simpler, as you simply reverse the fan. Then you can see any dust buildup on the other side of the heatsink and easily decide when a cleaning is necessary. When it is, it can be done with a toothbrush plus air or a vacuum, without even taking the fan off.