Computer hangs at any random time

Slavon-93

Honorable
Jan 22, 2014
25
0
10,530
Hello, everyone.
I've had this problem for some months and don't really understand why it's happening.
So, for example, I'm surfing the internet, watching youtube and suddenly everything stops, fans start working faster and my computer stops responding until reset. Moreover, when I press the RESET button, it takes some time to actually reset and the reset looks like a power off and then a power on.

Some info about my computer:
1. Asus P8Z77-V LX
2. 16GB RAM Hynix
3. Intel Core i5 2500K overclocked to 4200MHz with the multiplier. Voltage is auto.
4. Gigabyte GV-N670OC-2GD
5. PSU Chieftec cft-750-14cs 750W

What did I try to solve this?
1. I've run prime95 for a couple hours 2 or 3 times and the results are usually good. Only once it showed some instability with rounding. But it was only once.
2. Overvoltage and undervoltage. Currently both lead to a kernel panic in Linux. Working only with voltage set to auto. I don't know why. It behaves this way since I changed the thermal paste (and when I was changing it, I took the processor out of socket). BTW the computer has already been hanging before I changed the paste.
3. Different OS - I have Windows 8.1, Gentoo Linux and OSX 10.9.1. All of them can hang - it's not a software problem. Usually I'm on Linux or OSX.
4. Changing the thermal paste.
5. memtest86 - passed well.
6. Updated BIOS (latest version now)

What else can I say?
1. I've noticed that a small area of the processor contacts (not the socket) is a bit darker than the others.
2. Computer usually hangs once or twice a day. Rarely more.
3. It never happened when the processor was compiling something or while tests were running.
4. It never happened on LiveCDs, in UEFI BIOS and while testing memory.
5. It usually happens after some hours of work. (3-4)
6. There's nothing in logs about this and I can't predict the time when it will hang.
7. Some days it doesn't hang at all.
8. It shuts down loudly (I don't know how to describe it. It looks like all the fans just stop working at once and the sound is like "peeew")
9. Sometimes it resets on start before POST. Not always, but sometimes.
10. It doesn't like RAM frequencies above 1333MHz. 1372MHz leads to a continuous reset-on-start and I need to press the MemOK button to make it work again.

So, what's happening? Is it CPU, PSU, RAM, MB? Can I solve this problem without buying new everything? Thanks in advance.
 

npsgaming

Distinguished
Mar 31, 2009
146
0
18,710
Out of curiosity, what kind of power supply are you running your system with? How about some additional specs (the rest of the computer too :p ) ?
 

Slavon-93

If you look more attentively, you'll see that I've posted everything about my computer config. I don't mind repeating, but...

Some info about my computer:
1. MoBo: Asus P8Z77-V LX
2. RAM: 16GB RAM Hynix 4x4GB
3. CPU: Intel Core i5 2500K overclocked to 4200MHz with the multiplier. Voltage is auto.
4. Video: Gigabyte GV-N670OC-2GD
5. PSU: Chieftec cft-750-14cs 750W

If you need something exact, let me know. The list above is just what I remember.
 

npsgaming

Aha I missed it in the original post! (it is 6:45 am but oh well :) ) What about your cooling setup?

I had a horrible time getting my 1st Generation i7 960 past 4.0Ghz and found out it was not CPU cooling but chipset cooling on my mainboard that was holding me back. Do you have a method to check temps on your chipsets while the machine is in operation?
 

Slavon-93

vmN
I thought so, but then not every OS would hang, because each one is on its own HDD. The fan speed usually increases, but not always. E.g. tonight I only realised it was hanging because I didn't hear the HDD sound.

npsgaming
The CPU cooler is a Thermalright True Spirit 140 as far as I remember. I don't get very high temps. When I just do some usual stuff, the temperature is about 38-40 °C, while compiling it's about 55-60 °C. When I was running prime95, it rose to 78-80 °C after some hours of work.
About chipset. I don't know if there's a method. In linux there're some temp monitors but I don't know which one is chipset. I touched it by hand and can't say it's very hot. Hot, but not very :) At least it didn't hurt my hand. I think it's about 50 °C plus-minus 5 °C. I can reboot to windows a little later and see if AIDA64 can show me the actual temp, but not now.
 

npsgaming

Those are good temps for an overclock. I think in the order of possible solutions, I would remove any overclocking (maybe save the settings to a BIOS profile if possible for quick retrieval?) and see if the problem persists without an OC.

Everything you've described sounds like the problem is thermal in nature. I don't imagine it's the CPU though, as you said it doesn't hang during tests or renders.
 

DeathAndPain

Honorable
Jul 12, 2013
358
0
10,860
Could well be the northbridge. You may want to take its cooler off, carefully remove the thermal pad (if any) or clean away the thermal paste and replace it with fresh paste.

Other than that, the obvious start is removing the overclocking and then running Prime95 for several hours, looking whether the system is still unstable. Make sure to run it twice: One time torture test at small FFTs (to take the CPU to its limit), the other time custom mode with all your RAM minus 1.5 GB being specified for use to test northbridge and RAM.
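For the custom run, the memory figure is simple arithmetic. A minimal sketch, assuming the torture test dialog takes the amount in MB and using the 1.5 GB headroom suggested above (the function name is just for illustration):

```python
def torture_memory_mb(installed_gb: float, headroom_gb: float = 1.5) -> int:
    """Memory to hand to the custom torture test, in MB.

    Leaves headroom_gb free for the OS so the test itself does not
    push the machine into swap.
    """
    return int((installed_gb - headroom_gb) * 1024)


# 16 GB installed, as in the original post
print(torture_memory_mb(16))
```

If other programs stay open during the test, a larger headroom is probably safer.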
 

Slavon-93

Ok, I'll try, but something tells me it's not overclocking. Anyway, thank you. I'll write here if I get some results.
 

npsgaming



Indeed. I had thermal issues with my northbridge chipset when I overclocked my i7 960. The cpu never went above 45c on load but the northbridge was frying eggs at 101c . A quick trip for thermal paste and a small fan and the problem vanished!

 

DeathAndPain

I perfectly agree it may not be overclocking in your case, but when running into problems and wondering what the reason could be, going back to the specifications is the obvious place to start. If the problem persists, you are 100% sure it has nothing to do with the overclock. If the problem vanishes, you can continue your investigations from there.

Likewise, you should revert everything to BIOS defaults (except absolutely necessary settings like AHCI mode). I have witnessed it often enough that all BIOS settings appeared to be perfectly innocent, yet when loading BIOS defaults, the problem vanished. The mainboard manufacturers sometimes define defaults for some settings that appear weird and unreasonable, but they do so for a reason. All it takes is a simple attempt. If loading BIOS defaults does not change a thing, go looking elsewhere. If it fixes things, put one setting after another back in place until the problem reappears. It may require some time and tenaciousness, but this is the professional approach.

(BTW, I learned the latter years back from my former boss. I kept saying: "It cannot possibly be the BIOS settings!" He said: "Ok, but reset them anyway and see whether the problem persists." After embarrassedly noticing more than once that the problem went away, it is now part of my personal best practice. :) )
 

Slavon-93

Oh, just got a hang, lol :). So, what was running?
1. Linux: Chrome, Vmware (Win7), Virtualbox (Win7), Skype
2. Win7 (Vmware): Office 2013
3. Win7 (Virtualbox): Chinese cloud tool was uploading files.

As I was touching the chipset, my computer has an open side now, so it gets more air. While everything was hanging the network interface was blinking. HDDs had a sound like an attempt to read which stops at once. (repeatable). Chipset had the same temperature I've already written. Time since the last hang - about 10 hours (I don't know when it hung at night - I was sleeping).

I pressed a "Load optimised defaults" button in bios and only turned the virtualization on cause I need Windows now. If it doesn't hang until evening, I'll run prime95. If it does, I'll write back.
 

Slavon-93

And once again it hung. Now I have no overclock; max CPU speed is 3.7GHz in turbo mode. Starting MPrime to see if the CPU or memory is misbehaving.
 

Slavon-93

It's me again. I don't know if anyone is watching this thread, so I'm not editing messages, but writing a new one. I ran MPrime first with small FFTs, then in blend mode with 14.5GB RAM (actually my monitor showed 15.5GB used and 1GB in swap). This test resulted in the death of 3 of the 4 workers.
Full log of both tests: http://pastebin.com/WTfGwJmM
There you'll see some log opening errors; that's just because I was running MPrime as a user, not root.

I'm launching memtest86+ for the whole night to see if my errors are RAM related. Will write the result in the morning. If it passes, I'll be surprised.
 

npsgaming

Yeah, with the CPU out of the way, overclocking not an issue, and power cleared, the list of usual suspects is running thin. At this point I would be willing to bet it's RAM related (maybe check timings in BIOS when you can?). Otherwise it's time to start getting into that dark zone of "what on the motherboard is dead...." Good luck with memtest! Post back when you get results :)
 

Slavon-93

2 errors
1. Test 7, Pass 3, 000630a4230, 1584.6MB, good - e628b6b7, bad - e620b6b7, err bits - 00080000

2. Test 7, Pass 3, 0004e6c4230, 1254.7MB, good - 468f69c8, bad - 468769c8, err bits - 00080000.

I'll be testing until I need the computer. If errors appear, I'll write. At this moment I'm curious, why these errors are so alike? Does it mean bad RAM itself, bad memory slot or bad settings in bios? And what should I do with timings - set them to an unknown value and wait for 7 hours until memtest finds errors again? Is there a quicker way?
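On why the two errors look so alike: the "err bits" column is just the XOR of the good and bad values, so it can be checked by hand. A minimal Python sketch using the two results above (the first "bad" value is taken as e620b6b7, which is what the reported err bits 00080000 imply; the function name is made up for illustration):

```python
def flipped_bits(good: int, bad: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = good ^ bad
    return [bit for bit in range(32) if (diff >> bit) & 1]


# (address, good value, bad value) as reported by memtest86+
errors = [
    (0x630A4230, 0xE628B6B7, 0xE620B6B7),
    (0x4E6C4230, 0x468F69C8, 0x468769C8),
]

for address, good, bad in errors:
    bits = flipped_bits(good, bad)
    print(f"address {address:#x}: err bits {good ^ bad:08x}, flipped bit(s) {bits}")
```

In both cases a single bit (bit 19) reads as 0 where a 1 was written, and the two addresses share the same low 16 bits (0x4230). That is a pattern rather than random corruption, but on its own it does not say whether a stick, a slot or the settings are to blame.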
 

Slavon-93

So, I finally stopped the memory test and here are the results + BIOS timing settings. Sorry for the quality - the Nexus 7 doesn't have a very good cam.
 

DeathAndPain

Ok, so you've got a RAM-related problem. Either some settings in your BIOS are suboptimal (you lowered the RAM timings, or the BIOS failed to set the right timings on automatic, something that unfortunately does happen with some BIOSes), or your RAM does not work properly. In the latter case the question is again whether your RAM is faulty or simply incompatible with your mainboard. If you have used it successfully on this mainboard before, incompatibility is ruled out, and you have a case of faulty memory. Test one stick at a time to determine which of your sticks is faulty.
 

Slavon-93

I'm currently testing a pair in slots A1, B1 (the left pair). If I get some errors, I'll test each stick in this pair separately, until I find out which stick(s) give(s) errors. The thing I don't really know is what to do if they DON'T give errors when tested separately.
 

npsgaming

It is my understanding that as you add more memory, having the correct timings predefined in the BIOS becomes more important. If the memory passes when tested separately, double check for correct timings first (in case you didn't know, the timings are usually on a sticker and look like 11-11-11-28 ;) ). Next, verify voltages and make sure your DRAM voltage or dimmV is set appropriately. If these are set and the problem still occurs when the memory is grouped, it could be a bad slot on the motherboard :( .
 

Slavon-93

Emmm, I don't see anything similar to timings.

I only found in this doc from hynix that timings for PC-10600 are 9-9-9. And I don't know where to find the last number there. It's written "Hynix H5TQ2G83BFR H9C 111V" on chips.
 

Slavon-93

Ok, plugged the remaining two DIMMs back in and set the timings to auto-auto-auto-10 (auto = 9). We'll see if it helps or not. And I see memtest is still reporting that CAS is 9-9-9-24, though I've definitely set the last value to 10. And I really don't get why the lower timing values (this means lower delays, faster work, right?) should be better. Or shouldn't they? Will post the results in the morning. Thank you.
 

DeathAndPain


Well, what will you do if your computer catches fire during your tests? There is no point worrying about possible outcomes until/unless they have become true. I would be surprised if there was no problem with any of the modules when tested individually.


Well, there are dozens of timing values that can be set in BIOS. You need to make sure you alter the right ones, the ones that correspond to the 9-9-9. The 24 is a different value, and it is fine as it is. If you find the corresponding setting in your BIOS, you may want to set it to a more conservative 27 and see whether that helps.

9-9-9 is pretty normal for such modules. For testing purposes, you may want to try 10-10-10.


Look, a RAM chip is an integrated circuit with a number of address and data legs. When the CPU wants to access (e.g. know the content of) a certain memory cell, it puts the address information on the address legs of the chip. The chip then needs a certain time until the voltages on its data legs properly reflect the content of that cell.

These values represent how long the computer waits after sending the target address over the address pins until it processes the voltages on the data pins and interprets them as result data. If this delay time is too short, the voltages may not yet have reached valid levels, and the content of that cell may be misinterpreted.

These delays are given as multiples of clock cycles. Based on the clock speed with which you run your RAM you can compute the actual length of a clock cycle in nanoseconds. This is also why these numbers usually need to be higher when you clock your RAM faster (e.g. use DDR3-1600 instead of DDR3-1333) and set your BIOS accordingly. The faster clock speed means shorter cycles (in terms of nanoseconds), so you need more of them for the same delay.

Shorter delays obviously mean a performance increase, as the CPU needs to wait less for the result data. Longer delays increase reliability. That is why it may make sense for you to experimentally increase your delay settings, just to find out whether the problem is in any way related to them. If the errors go away, they were caused by delays that were too short. If they do not go away, then they had nothing to do with your memory timing settings, and you can revert to your original settings (either that, or you did not alter the right one of the numerous timing settings available in your BIOS, most of which are usually poorly documented).
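To put numbers on the cycles-versus-nanoseconds point, here is a rough Python sketch. It assumes the usual DDR convention that the number in the name (DDR3-1333, DDR3-1600) is the transfer rate in MT/s and that the clock runs at half of that; the function names are made up for illustration.

```python
import math


def cycle_time_ns(data_rate_mts: float) -> float:
    """Length of one memory clock cycle in nanoseconds.

    DDR moves data twice per clock, so the clock in MHz is half the
    transfer rate, and one cycle lasts 1000 / (rate / 2) ns.
    """
    return 2000.0 / data_rate_mts


def delay_ns(cycles: int, data_rate_mts: float) -> float:
    """Convert a timing given in clock cycles into nanoseconds."""
    return cycles * cycle_time_ns(data_rate_mts)


def cycles_needed(target_ns: float, data_rate_mts: float) -> int:
    """Smallest whole number of cycles that waits at least target_ns."""
    return math.ceil(target_ns / cycle_time_ns(data_rate_mts))


# The same 9 cycles are a shorter wait at a faster clock
print(delay_ns(9, 1333))   # roughly 13.5 ns
print(delay_ns(9, 1600))   # 11.25 ns
# Cycles needed at DDR3-1600 to wait as long as 9 cycles at DDR3-1333
print(cycles_needed(13.5, 1600))
```

Going from 9 to 10 cycles at DDR3-1333 adds about 1.5 ns of waiting per access, which is the kind of extra margin the 10-10-10 test is meant to provide.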
 

Slavon-93

Well, after setting the last timing value to 10 (instead of 24) and testing memory with memtest86+ for around 15 hours (6 passes), I've got no errors. It's still not clear to me why it works this way, but anyway, it does. Now I'll just boot into the different OSes and see if this really solved my problems. Thank you all for your help.