New home-built PC has frozen a few times; how should I diagnose it?

brownbat

Oct 23, 2009
I recently built a system; the details are here.

In the three weeks I've owned the system, I've had a total system freeze* four times. I'm a little worried that I did something wrong applying the thermal paste, only because I've never done that before. But these freezes seem to happen after I've left the machine alone for a while under light load, or shortly after I've resumed from sleep. It's never happened while gaming, and RealTemp (even though I haven't calibrated it with an infrared thermometer or anything) always seems to stay below 30 degrees C. (That's fine, right?)

Maybe it's a RAM thing, or maybe Windows 7 64-bit is just a little buggy.

It isn't consistently repeatable and I can't trigger it, but it does happen every so often.

Any tips on how I can catch this event so I can fix it? And can you help me rule out the thermal paste, which worries me the most?

*By "freeze" I mean: no blue screen, the mouse and keyboard stop responding, and the screen stays fixed on whatever it was showing. Num Lock and Caps Lock don't respond either.

Other than these pretty rare events, the system runs beautifully.
 

kufan64

May 12, 2009
There's not really much you can do directly if it happens intermittently. I'd note what you were doing and what was running each time it crashes, to see if you can spot a connection (writing things out helps me sometimes). You could also check the logs in Event Viewer for clues.

I'd run Memtest and Prime95 if you haven't already.
 
You're describing exactly what happens if the RAM is not getting enough voltage. What brand/model of RAM do you have, and how many sticks? And what motherboard -- couldn't hurt to know.

Could also be faulty RAM, so it wouldn't hurt to check. The fact that there's usually no "trigger" for the problem does seem kind of strange, since voltage issues often cause freezing when you start to run programs that put a load on the memory.

But with any luck, all you'll have to do is go into the BIOS and manually adjust the RAM voltage to the correct level that your memory requires. Start with the easy stuff, that's my motto.
 

brownbat

Oct 23, 2009
ASUS P7P55D PRO LGA 1156 Intel P55 ATX Intel Motherboard
http://www.newegg.com/Product/Product.aspx?Item=N82E16813131405

G.SKILL Ripjaws Series 4GB (2 x 2GB) 240-Pin DDR3 SDRAM 1600 (PC3 12800)
http://www.newegg.com/Product/Product.aspx?Item=N82E16820231277

One of the Newegg reviews of the RAM comments on its voltage and BIOS settings, blaming the board for some inconsistency.

From the BIOS:
DRAM Voltage: Auto
Current DRAM Voltage: 1.593V

1. Should I just be specifying this voltage as 1.333 or 1.6?
2. I couldn't find a way to specify timings in that BIOS. Maybe it's there and I missed it, maybe it appears after I take the voltage off Auto, or maybe it's not important. I'm not sure what I'd set them to anyway, so it's probably for the best that I couldn't find it.
 


Hmm, from the specs, it doesn't look like that RAM should be giving you any problems. 1.5V is probably the easiest voltage to deal with for DDR3, and often works right out of the box with no adjustments. Only thing I can think of in that respect is that perhaps the motherboard is autodetecting it as something other than a standard 1.5V stick, and needs to be told the right setting manually. But that would surprise me.

I'm not as well versed in timings, but from what I saw of the Newegg comments, some people were just having trouble getting it to run at the full 1600MHz without changing other settings on the motherboard. That's a pretty common problem with higher-end RAM, but it shouldn't be freezing your system. In all likelihood, if it were just defaulting to 1333MHz, you wouldn't even notice unless you checked.

I'd go with shortstuff's recommendation of testing the RAM itself for flaws.

There's always an outside chance that some software issue you don't know about is causing trouble -- e.g. a driver that's causing a conflict because there are bugs in the Windows 7 version. But try the RAM first.

 


That doesn't look like the problem. You could try manually setting the voltage to exactly 1.5V and see if that helps, but I doubt it, since it looks like the board is already giving it enough juice. Being over the recommended voltage won't cause RAM issues unless you're WAY over and you fry it. In your case, the number to watch out for is 1.65V, which I believe is the maximum your motherboard recommends without harming the CPU. Don't bother trying to set it to a lower voltage like 1.333V; that'll just cause problems, not fix them.

And yes, if you do decide to do it, the option to set voltage manually is often hidden in the BIOS until you disable the "auto" setting.
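To make that rule of thumb concrete, here's a rough sketch that sorts a DRAM voltage reading into the buckets described above. It assumes sticks rated for 1.5V and treats 1.65V as the ceiling, which is the figure mentioned in this post rather than a verified spec for every board, so check your own documentation before relying on it.

```python
# Rough DRAM voltage sanity check using the figures from this thread.
# Assumptions: the sticks are rated for 1.5V, and 1.65V is the ceiling
# mentioned above (verify against your own board/CPU documentation).
RATED_V = 1.5
CEILING_V = 1.65


def classify_dram_voltage(volts: float) -> str:
    """Sort a DRAM voltage reading into the buckets discussed above."""
    if volts > CEILING_V:
        return "too high: above the 1.65V ceiling"
    if volts < RATED_V:
        return "too low: below the rated voltage, likely to cause problems"
    return "ok: at or above rated, under the ceiling"


# 1.593V is the "Current DRAM Voltage" reading from the BIOS earlier in the thread;
# 1.333V is the mistaken "voltage" asked about, and 1.7V is an over-the-limit example.
for reading in (1.593, 1.333, 1.7):
    print(f"{reading:.3f}V -> {classify_dram_voltage(reading)}")
```

With these assumptions, the 1.593V Auto reading lands in the "ok" bucket, which matches the conclusion that the board is already giving the RAM enough voltage.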
 

brownbat

Oct 23, 2009
Memtest86+ 4.00 has run for 5:30:00 so far without errors. I can leave it up for another 12 hours or so as we visit family. I wonder if there's such a thing as leaving memtest on too long...
 
You can leave memtest running for days if you want -- in fact, a lot of people recommend running it for 12+ hours to really be sure nothing's wrong with the RAM.

But it's sounding less and less likely that you have a RAM issue, or at least an easily identifiable one. If memtest doesn't find any problems, the next thing I'd do is start checking for software issues: maybe an antivirus program that runs a specific task at a specific time every day and causes problems, or a device driver that's known to be buggy with Windows 7. Hmm. This is getting fairly puzzling.
 

ekoostik

Sep 9, 2009

Just to be clear: I think someone, somewhere misinterpreted speed as voltage. 1.333 and 1.6 are not common voltage settings. They are speeds for the RAM, more commonly written as 1333 and 1600. What people are probably complaining about is that the RAM only runs at 1333MHz. That's true under default settings. And with an i5 750, getting the RAM to run at 1600 means overclocking your CPU, which typically disables Turbo. So if all you're trying to do is bump the RAM, the sacrifice is likely too high; most people just don't notice that this happens.
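The arithmetic behind "you have to overclock your CPU" can be sketched like this. It rests on assumptions not stated in the thread: a stock base clock (BCLK) of 133.33 MHz, a 20x CPU multiplier, and a highest memory multiplier of 10x on the i5 750. Turbo multipliers are ignored, so treat it as an illustration of the relationship rather than exact BIOS values.

```python
# Why DDR3-1600 on an i5 750 implies a CPU overclock.
# Assumptions (verify for your own chip and board): stock BCLK of 133.33 MHz,
# a 20x CPU multiplier, and a highest memory multiplier of 10x. Turbo is ignored.
STOCK_BCLK_MHZ = 400 / 3
CPU_MULT = 20
MAX_MEM_MULT = 10


def clocks_for_ram(target_rate: float) -> tuple[float, float]:
    """Return (BCLK, CPU clock) in MHz needed to reach a DDR3 rate such as 1600."""
    bclk = target_rate / MAX_MEM_MULT  # RAM rate = BCLK x memory multiplier
    return bclk, bclk * CPU_MULT       # CPU clock = BCLK x CPU multiplier


print(f"Stock: BCLK {STOCK_BCLK_MHZ:.1f} MHz, CPU {STOCK_BCLK_MHZ * CPU_MULT:.0f} MHz")
for rate in (1333, 1600):
    bclk, cpu = clocks_for_ram(rate)
    print(f"DDR3-{rate}: BCLK {bclk:.1f} MHz, CPU {cpu:.0f} MHz")
```

Under those assumptions, reaching 1600 needs a BCLK of about 160 MHz, which drags the cores from roughly 2.67 GHz to roughly 3.2 GHz at the base multiplier, because the CPU and RAM clocks both hang off the same BCLK.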

Anyway, just wanted to clarify 1.333 and 1.6 were likely speeds and NOT voltages. Before trying to punch the RAM up to 1600 (if you even decide to do that) you may want to get the problem sorted out. Have you updated your motherboard's BIOS recently? If not you may want to do that, let it run as default and see what happens.

Also, after a freeze you should open Computer Management (Start > Administrative Tools > Computer Management). Then expand Computer Management > System Tools > Event Viewer > Custom Views > Administrative Events. When you click on Administrative Events it will show all warnings and errors. Is there anything reported at the time of your freeze?
 

brownbat

Oct 23, 2009
1. Thermal grease, temperature

There was a question about my cooling situation. I used MX-2 thermal grease, and RealTemp reports temperatures of 24-28 deg. C. Sometimes I've seen it go just over 30 after a few hours of games on high settings, maybe as high as 33. Cores 1 and 3 seem to run a few degrees hotter than 2 and 4; maybe I didn't spread the grease very evenly.

2. I don't plan on using nonstandard RAM settings until this is sorted, maybe not after that.

3. Thanks for the Computer Management tip. I've located six Critical Kernel-Power errors from mid-to-late November; those are probably my crashes. If I read correctly, these are the reboot events, so I should look for errors/warnings just before them.

11/17 12:15:30 AM Critical Kernel-Power reboot thing
11/16 11:48:37 PM ATI EEU Client event error in ATIeRecord

Maybe it's an ATI thing? I'll need to stress test the GPU.

11/19 9:30PM Reboot
11/19 9:22PM Google Update Error
Nothing good here.

11/20 1PM Reboot
11/20 12:51:18 RTL8167 Warning.
11/20 12:51:19 DNS Client Events Warning.
Closest error was at 9:30, DHCP timeout.
Nothing good here.

11/21 10:18PM Reboot
Warnings about DHCP and DNS, last error at 8:30PM: volsnap, shadow copies aborted.

11/24 8:48PM Reboot
In the previous half hour there were eight RTL8167 warnings about DNS timeouts and disconnections from the network.

11/30 2AM Reboot
1:50AM: DNS, RTL8167 errors.
11/29 11:51PM: System Restore Errors, failure to create restore point.

OK, so there's definitely a trend of network errors before the crashes. But that might be because the network hardware is noisier about problems than other components, reporting every time there's no connection or every time I've messed with configs (and while setting things up, I've been doing a lot of that).
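One way to check that trend more systematically is to pull every warning or error logged within a fixed window before each Critical Kernel-Power event. A minimal sketch, assuming the events have already been exported from Event Viewer and parsed into (timestamp, level, source) tuples; the tuple layout and the 30-minute window are illustrative choices, and the sample rows are a subset of the notes above with the year assumed to be 2009.

```python
from datetime import datetime, timedelta

# Hypothetical parsed export: (timestamp, level, source). Rows come from the notes above.
events = [
    (datetime(2009, 11, 16, 23, 48, 37), "Error", "ATI EEU Client"),
    (datetime(2009, 11, 17, 0, 15, 30), "Critical", "Kernel-Power"),
    (datetime(2009, 11, 19, 21, 22), "Error", "Google Update"),
    (datetime(2009, 11, 19, 21, 30), "Critical", "Kernel-Power"),
    (datetime(2009, 11, 20, 12, 51, 18), "Warning", "RTL8167"),
    (datetime(2009, 11, 20, 12, 51, 19), "Warning", "DNS Client Events"),
    (datetime(2009, 11, 20, 13, 0), "Critical", "Kernel-Power"),
    (datetime(2009, 11, 29, 23, 51), "Error", "System Restore"),
    (datetime(2009, 11, 30, 1, 50), "Error", "DNS Client Events"),
    (datetime(2009, 11, 30, 1, 50), "Error", "RTL8167"),
    (datetime(2009, 11, 30, 2, 0), "Critical", "Kernel-Power"),
]


def events_before_reboots(events, window=timedelta(minutes=30)):
    """Map each Critical Kernel-Power event to the warnings/errors logged in `window` before it."""
    reboots = [t for t, level, src in events if level == "Critical" and src == "Kernel-Power"]
    return {
        boot: [(t, level, src) for t, level, src in events
               if level in ("Warning", "Error") and boot - window <= t < boot]
        for boot in reboots
    }


for boot, preceding in events_before_reboots(events).items():
    print(boot, "<-", [src for _, _, src in preceding] or "nothing in window")
```

Keep in mind that the Kernel-Power timestamp is when the machine came back up, so the freeze itself happened some time before it; widening `window` trades precision for coverage.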

 
Well, in any case, I'd say if you're getting those errors, it would be a good idea to update your motherboard's ethernet drivers to the latest versions. Probably all of your motherboard drivers, actually.

When you first built the system, did you install the motherboard drivers off the CD that came with it? I wonder if the drivers that came with it were from an old CD that was out of date for Windows 7. If you can find newer ones online, try them.

I'm worried about the failure to create a system restore point. That's happened on a couple of my machines, and it's always meant a problem with the OS that was fiendishly difficult to figure out (although I admit I'm worse at software problems than hardware problems). Usually, it got fixed by reinstalling Windows. But this is kind of a side point; try updating the mobo drivers first.

Your heat does not look like a problem, by the way. Those are perfectly normal temperatures.
 

ekoostik

Sep 9, 2009
Along with capt_taco's suggestions, make sure your video card drivers are up to date. AMD was releasing updates fairly regularly throughout September and October as they addressed issues from early adopters and Windows 7.
http://support.amd.com/us/gpudownload/Pages/index.aspx

I have read some user reports of problems with Catalyst Control Center (CCC), so I recommend installing just the drivers first. If that fixes your problem and you want CCC, then install CCC. If the problem happens again after that, you'll know you can just revert to drivers only.
 

brownbat

Oct 23, 2009
Happened again this morning shortly after resuming from sleep.

System Restore Approach
Found some system restore point errors, and I think I've solved them. Going to Control Panel > System > System Protection, I noticed that System Restore was really trying to save the state of some external drive G, and didn't care about C. Some fiddling fixed this. (Strangely, I had to turn protection off for G before it would let me set the properties of C.)

Driver Approach
1. A new RealTek driver hit Windows Update Dec 8th, so maybe that will help.

2. There's an ATI driver on their site from November 17th. I'm updating, but I'll keep CCC this time, moving to drivers only if I have another crash.

3. I have a lot of other drivers to check.

The first four events were about a day apart, but now they've spread out to 6, then 9 days. I wish I could think of something specific in my behavior that changed over that period...
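To put numbers on the spacing, here's a small sketch that computes the gap between consecutive logged reboots. It only uses the six Kernel-Power times listed earlier (the crash from "this morning" isn't included because its date isn't given), the year is assumed to be 2009, and these are reboot times rather than the exact moments of each freeze.

```python
from datetime import datetime

# Kernel-Power reboot times from the Event Viewer notes above (year assumed: 2009).
reboots = [
    datetime(2009, 11, 17, 0, 15, 30),
    datetime(2009, 11, 19, 21, 30),
    datetime(2009, 11, 20, 13, 0),
    datetime(2009, 11, 21, 22, 18),
    datetime(2009, 11, 24, 20, 48),
    datetime(2009, 11, 30, 2, 0),
]

# Gap between each reboot and the next one, in days.
gaps = [(later - earlier).total_seconds() / 86400
        for earlier, later in zip(reboots, reboots[1:])]

for earlier, gap in zip(reboots, gaps):
    print(f"{earlier:%m/%d %H:%M} -> next reboot after {gap:.1f} days")
```

Appending new crash times to the list as they happen makes it easy to see whether the gaps really are growing.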
 
Did you have a USB device (like an iPod or USB memory stick) plugged in when the restore error was happening? That could explain the restore error, anyway, although who knows if it has anything to do with the freezing problem.

I also noticed: It looks like most of your crashes are happening late at night, between 9 p.m. and 1 a.m. I wonder if your antivirus software is trying to do something automatically at roughly the same time every day, like download updates, and that's causing a conflict somewhere?
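That pattern is easy to check against the logged times. A minimal sketch, using the six Kernel-Power times from the Event Viewer notes (these are reboot times, so each freeze happened somewhat earlier) and a 9 PM to 1 AM window that wraps past midnight:

```python
from datetime import time

# Reboot times of day from the Event Viewer notes earlier in the thread.
reboot_times = [time(0, 15), time(21, 30), time(13, 0), time(22, 18), time(20, 48), time(2, 0)]


def in_window(t: time, start: time = time(21, 0), end: time = time(1, 0)) -> bool:
    """True if t falls in [start, end), handling windows that wrap past midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


hits = [t for t in reboot_times if in_window(t)]
print(f"{len(hits)} of {len(reboot_times)} reboots between 9 PM and 1 AM")
```

By this count, three of the six reboots fall inside the window, with the 8:48 PM one just outside it, so late evening is common but not the whole story.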

I have also heard of programs that display pop-up messages (e.g. antivirus software doing things automatically) conflicting with graphics card drivers when they try to use the same resources as something that's already running. On the machines I've seen with that problem, it usually caused a blue screen or a Windows error message, but it's not out of the question that it would just freeze.

Hard to say for sure, though. But maybe try a) seeing if you can induce a crash by having your antivirus program update itself while something else is running, or b) putting it in silent mode for 4 hours or so at night and seeing if you go crash-free when it's not active. If it is the antivirus software, that could at least diagnose it.
 

brownbat

Oct 23, 2009
No USB drives, but I think I know what caused those errors. After the crashes started occurring, I cloned and swapped my hard drive. (I had borrowed one while waiting for my own to arrive; the HDD came about a week after everything else, and it took me at least another week to get around to swapping back.) System Restore's desire to back up "G" and the volume label problem were probably left over from that process. I should've mentioned that, but the swap didn't seem to have any effect on the crashes, which occurred both before and after it.

Just now, Device Manager was letting me know that Intel's SMBus controller was not starting. Reinstalling via an Intel utility didn't seem to help, but removing that driver and letting Windows reinstall it seemed to fix the error.

I don't have any scheduled tasks, but I was using avast!, which interrupts the system all the time. I've ditched it. Since I haven't had much trouble with viruses, and since I'm mostly just concerned with OS integration, I'm going to give MS Security Essentials a go. At least while I get this sorted.
 

db17k

Dec 10, 2009
I'm having the same issue described here. I just built my computer last week; here is the hardware list:

Intel Core i5 750 CPU
Gigabyte P55-UD3R mobo
2x2GB Corsair XMS3 RAM, 1600MHz

All my drivers seemed to be up to date until I read that a new Realtek driver was out as of 12/8/09.

Anyway, while I was researching the RTL8167 error online, I came across an article that said to use Performance Monitor and add extra counters to help diagnose what's causing the system to crash.

As I was watching Performance Monitor with the counters (Event Tracing for Windows & Event Tracing for Windows Session) added, I noticed that my (Events logged per second - MsMpPsSession7) counter kept spiking to around 100 every second, and the (Events logged per second - Audio) counter would also randomly spike up to 100. After I'd been watching this behavior for maybe 5 minutes, all of a sudden (Events logged per second - Audio) spiked to 100 for 3 seconds back to back, and the (Events logged per second - Circular Kernel Context Logger) counter almost mimicked those audio spikes. Then BAM, the computer froze. No blue screen, just a frozen screen, the same thing that's been happening several times a day since I built this mofo. This leads me to believe it's something with the Realtek audio device. I've since disabled it in Device Manager, but as of this post it's only had about half an hour of run time.

With the Realtek audio device disabled, the (Events logged per second - MsMpPsSession7) counter now reads around 10 events per second instead of the 100 I saw earlier. There are also no random audio spikes (obviously), because the device is disabled.
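If you log those counters instead of watching them live, a small script can flag sustained spikes like the three back-to-back seconds described above. A minimal sketch, assuming you already have one sample per second as a plain list of numbers; the threshold of 100 and the 3-sample minimum mirror the description, and the sample values are made up for illustration.

```python
def sustained_spikes(samples, threshold=100, min_run=3):
    """Return (start_index, length) for each run of at least `min_run`
    consecutive samples at or above `threshold`."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = i  # a new run of high samples begins here
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Close a run that continues to the end of the samples.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs


# Hypothetical per-second readings of "Events logged per second - Audio".
audio = [0, 0, 100, 0, 0, 0, 100, 100, 100, 0, 5]
print(sustained_spikes(audio))
```

Single-second blips are ignored on purpose, so only the kind of sustained burst that preceded the freeze gets reported.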

I'll post again if my computer freezes, and if it doesn't, I'll post in the morning to let you know. Hope this helps.
 

db17k

Dec 10, 2009
Yeah, I woke up today and my system was frozen again, so I guess that's not it. I was using the Performance Monitor that comes with Windows 7.

From the Start menu search bar, type in "performance monitor" and the program should pop up.

On to more troubleshooting.
 

ekoostik

Sep 9, 2009

It may be a long shot, but did you ever try with CCC uninstalled?

Also, what BIOS version do you have installed? The latest for your board is 1102, released on 11/27, and Asus indicates that at least one of the earlier updates "Improves system stability."
http://asus.com/product.aspx?P_ID=j02KziJq95KbCQNm&templete=2
 

brownbat

Oct 23, 2009
"It may be a long shot, but did you ever try with CCC uninstalled?"

Not yet, since it hasn't crashed again. Based on my previous timeline, it could be up to two weeks before I know for sure whether the problem is still around. I'll check out the BIOS, though. Thanks.

*BIOS updated. Fingers crossed. With all the changes, if the problem goes away, I won't really be sure what fixed it.