
At a complete loss.

September 22, 2009 11:20:11 PM

I am experiencing system crashes, seemingly at random (though no doubt not!). In Windows they occur while idling, while booting, or while playing (some) games. In Ubuntu they are rarer, and so far have only occurred when resizing video windows.

The crashes manifest in two different ways. In Windows, the screen freezes completely and the Num Lock key no longer toggles its light. In a game, the screen becomes "blocky", showing either two different shades of a colour or the image that was last on the screen (I will try to get a picture of this next time it happens). Also in a game, the sound keeps playing for a short time, and if I'm using voice chat it is still possible to send and receive audio for a few seconds.

These crashes have occurred on Ubuntu 9.04, Ubuntu 8.10, Windows Vista 64-bit and Windows 7 64-bit (RC).

The nature of the crashes led me to suspect the graphics card, so I tried numerous driver versions, none of which made any difference. I then swapped the graphics card (ATI Radeon 4870) for an older Nvidia card. Same problem.

Next I suspected the motherboard, so I sent it back to the supplier, who tested it and found that there was indeed a fault. As that board (ASUS P5KC) was no longer in stock, they sent me a different model (ASUS P5Q-E).

I received this new board earlier today. After rebuilding the system, what do you know: I get the same crashes! I don't have the resources to test each individual component in a known working rig, and I am at my wits' end! Since the crashes started on the new motherboard, I have run Prime95 for an hour to try to induce a crash. The CPU temperature (specs below) reaches around 70 degrees in SpeedFan, but this did not cause a crash. (I know that seems a little high, but I read somewhere that some values for my processor (Q6700) are wrong, so it reads slightly higher than it really is.) I will run 3DMark06 after posting this, but more out of frustration than any hope it will help.

I have also been looking at the Windows Event Viewer logs, and several errors occur. They don't seem to coincide with the crashes, but here they are:

Event filter with query "SELECT * FROM __InstanceModificationEvent WITHIN 60 WHERE TargetInstance ISA "Win32_Processor" AND TargetInstance.LoadPercentage > 99" could not be reactivated in namespace "//./root/CIMV2" because of error 0x80041003. Events cannot be delivered through this filter until the problem is corrected.

I have googled this, and it doesn't seem relevant.


1 x Sony DRU-190S 20X DVD±RW DL & DVD-RAM Serial ATA - Retail Multi Bezel & Nero

1 x Corsair 650W TX Series PSU - 120mm Fan, 80+% Efficiency, Single +12V Rail 135514

1 x Samsung SpinPoint F1 HD103UJ 1TB Hard Drive SATAII *32MB Cache* - OEM

1 x NZXT Tempest Steel Mid Tower Case - No PSU

1 x Sapphire HD 4870 512MB GDDR5 Dual DVI HDTV Out PCI-E Graphics Card

1 x Corsair 4GB (2x2GB) DDR2 800MHz/PC2-6400 XMS2 DHX Memory Kit Non-ECC Unbuffered CL5

1 x Intel Core 2 Quad Q6700 2.66GHz Socket 775 1066MHz 8MB (2x4MB(4MB per core pair)) L2 Cache OEM Processor

1 x Black Multimedia Keyboard - USB/PS2 Connection - UK Layout

1 x OCZ Vanquisher CPU Cooler For Sockets AMD 754/939/AM2, Intel LGA775

1 x Samsung Aqua SM2253LW 22" TFT Monitor 1680x1050 1000:1 300CD/M2 5ms DVI widescreen Gloss Black

1 x Asus P5Q-E P45 Socket 775 8 channel audio ATX Motherboard




Any help would be much appreciated - I simply don't know what to do anymore.



Tl;dr

I am crap at computers.


September 23, 2009 12:06:04 AM

Power issue? Try disconnecting the non-essentials and then try to induce a crash to see what happens. Also run Memtest on your RAM.
September 23, 2009 12:24:59 AM

Your data is wrong about Q6700 temps. 70C is HOT and unsafe.

Here is the reference
http://www.tomshardware.com/forum/221745-29-sticky-core...

Also, use the latest version of Core Temp or Real Temp.

1. Remove the OCZ Vanquisher from the CPU.
2. Completely clean both surfaces with alcohol.
3. Apply a single drop of thermal paste to the center of the CPU. About the size of a cooked grain of rice.
4. Re-attach the cooler, making sure that all four push-pins are completely engaged.
September 23, 2009 1:28:10 AM

^+1 This does sound like a heat issue. Follow Prox's steps, including the choice of temperature monitor.

If it ain't heat, the next step is to swap out the PSU.
September 23, 2009 1:51:20 AM

The heat from the CPU can affect the board as well. I recently had one fail: heat from the CPU killed part of the power stage after two years of use. The symptoms were simple app crashes, like Firefox suddenly closing with the crash reporter coming up, and the dreaded blue screens. By the way, are you overclocking your CPU? What are the typical room and case temps?
September 23, 2009 7:30:42 AM

Thanks for the replies, guys!

Quote:
By the way are you overclocking your cpu? What are the typical room and case temps?



I'm not overclocking the CPU. Typical room temperature is around 25 degrees. I'm not sure which reading corresponds to the case temperature, but at the moment SpeedFan shows System at 35 degrees, CPU at 19 degrees, and AUX at 24 degrees.

Quote:
Your data is wrong about Q6700 temps. 70C is HOT and unsafe.

Here is the reference
http://www.tomshardware.com/forum/ [...] ture-guide

Also, use the latest version of Coretemp or Realtemp.

1. remove the OCZ vanquisher from the CPU.
2. Completely clean both surfaces with alcohol.
3. Apply a single drop of thermal paste to the center of the CPU. About the size of a cooked grain of rice.
4. Re-attach the cooler, making sure that all four push-pins are completely engaged.


Because the board is new, I only recently applied the thermal paste, and I cleaned both surfaces with alcohol first. While applying the paste (for the first time! Originally the CPU had some pre-applied), I read that it was best to put a small amount on the CPU, spread it evenly with a card and then attach the heat sink. Is this incorrect? I will give your method a go and report back, however.

Also, if this were a heat issue, why would the system crash rather than get an automatic shutdown from the BIOS?



Quote:
Power issue? Try disconnecting the non essentials and try to induce a crash see what happens, also run a memtest for your ram.


I forgot to mention that I have run Memtest86+ overnight, and it found no problems. (I have also tried running each stick separately.) I will give the power idea a go, but unfortunately the only non-essential component I am running is my wireless card, which is not in use!


Thanks very much for the replies.

September 23, 2009 7:37:04 AM

You either missed this or are ignoring it:

Please use CPU-Z, GPU-Z, or CPUID Hardware Monitor to report temps here. We do not rely on Speedfan.

I hope the CPU did not come with paste on it; the stock heat sink had the paste.

Heat can make the system unstable before the CPU's destructive threshold is reached.

Memtest doesn't catch all memory errors, but if you can run Prime95 for a couple of hours on all threads (with "Detect rounding errors" checked) without melting down or failing, you're good to go. But do not do this unattended until you are sure heat isn't an issue, i.e. watch it for an hour.

Download links for the above programs:
http://forums.tweaktown.com/f69/latest-overclocking-pro...
September 23, 2009 7:45:43 AM

Quote:
I hope the cpu did not come with paste on it; the stock heat sink had the paste.


You are quite correct, my mistake.


I did miss that SpeedFan isn't relied on here. Installing CPU-Z now.


I will run Prime95 after breakfast (after re-applying the thermal paste and heatsink).
September 23, 2009 7:48:40 AM

Ok. We're obviously a few time zones apart - almost bed time for me, 4AM EDT.
September 23, 2009 7:50:07 AM

Ah, yep 8:50 BST.
September 23, 2009 7:59:52 AM

Sorry for this late PS: CPUID Hardware Monitor is probably better. It also tracks GPU temps, which may be your problem, and I believe it has a logging function, which is helpful for seeing what happened just before the crash.
September 23, 2009 10:18:27 AM

Looking at CPUID Hardware Monitor, which temperatures should I be mainly concerned by? There are the temps of the individual cores, as well as the CPUTIN. I don't know which to keep an eye on!


Thanks.
September 23, 2009 10:49:15 AM

Ok, sorry for the double post.


(This is all after re applying thermal paste / heatsink etc)

I installed Real Temp and used the TjMax values from http://forums.ebuyer.com/showthread.php?t=33001 (TjMax for my CPU is 90 degrees).

After running Prime95 for an hour, I get a maximum core temp of 64 degrees. I have not had a crash yet; I will now try starting a game that usually causes one.
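Why the TjMax value matters: on Core 2 CPUs the on-die digital thermal sensor does not report an absolute temperature. It reports how many degrees the core is below TjMax, and tools like Real Temp subtract that from whatever TjMax they assume. A tool using the wrong TjMax will therefore read consistently high or low by the difference. Below is a minimal Python sketch of that arithmetic; the per-core sensor readings are hypothetical, and the 100 degree alternative TjMax is only an illustrative wrong value, not one any particular tool is claimed to use.

```python
# Sketch of how "distance to TjMax" sensor readings become core temperatures.
# Assumption: readings are hypothetical; 90 C is the TjMax quoted in this thread.

TJMAX_C = 90  # TjMax for this Q6700, per the value the OP used


def core_temp(distance_to_tjmax: int, tjmax: int = TJMAX_C) -> int:
    """Convert a 'degrees below TjMax' sensor reading into degrees Celsius."""
    return tjmax - distance_to_tjmax


def headroom(core_temp_c: int, tjmax: int = TJMAX_C) -> int:
    """Degrees remaining before the core reaches TjMax."""
    return tjmax - core_temp_c


# Hypothetical per-core readings under a Prime95 load.
readings = [26, 28, 30, 27]
temps = [core_temp(d) for d in readings]
hottest = max(temps)

print("core temps:", temps)
print("hottest core:", hottest, "C, headroom:", headroom(hottest), "C")

# The same sensor reading interpreted with a (hypothetical) wrong TjMax of 100 C
# reads 10 degrees higher, which is how a tool can overstate temperatures.
print("with TjMax 100:", core_temp(readings[0], tjmax=100), "C")
```

Note that a wrong TjMax shifts every core by the same constant, so the relative spread between cores stays meaningful even when the absolute numbers are off.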
September 23, 2009 12:02:07 PM

That is promising, although 64C should not happen on a stock speed Q6700 with a better than stock cooler.... in a NZXT Tempest case.... in the UK. Now, if the ambient temp, in the room, was about 40C, that would be the right temp.

I would expect a max core temp around 52C.
September 23, 2009 12:15:22 PM

Well, 24 minutes into a game I had a crash. I tried to update the BIOS, but the ASUS website is not helpful. I downloaded the BIOS update utility for Vista 64, but it doesn't recognise my OS. Any suggestions?

I am now running a game in windowed mode, hopefully something new and helpful will happen (51 degrees on the hottest core at the moment).

I also had a thought. I realised that for my Windows installs I hadn't completely wiped the drive beforehand. Could this make a difference? I am mostly just grasping at straws now.
September 23, 2009 12:30:34 PM

Are you saying you did not use a fresh install when you installed the board? You used a HD with Windows already installed on it?

If that is the case, that can cause all sorts of issues.
September 23, 2009 12:37:16 PM

No, I mean that when you install Windows (i.e. boot from the CD to install), old files etc. from your previous install are still there (in a windows.old folder). If I had used GParted or something to format the partition beforehand, they wouldn't be there.

PS: I took a photo of the screen after a crash: http://img5.imageshack.us/i/photohy.jpg/ The sound was looping when this photo was taken.

September 23, 2009 1:23:32 PM

Yes, the partition should be reformatted before installing, but that doesn't sound like the issue.

What BIOS version are you using? You can update from a DOS disk, which is really the preferred method, but you should not update unless you see a change that addresses your issue.

Those artifacts on your screen could be heat related. What does Catalyst Control Center say about your card? If you enable ATI Overdrive, you can lock the fan speed at something higher; see if that helps.

You should certainly wipe your ATI drivers and install the latest. I myself am using 9.2 as that was happy for me and I haven't needed to change.
September 23, 2009 1:33:25 PM

I worked out how to update the BIOS using a flash drive. I am now on version 2101 (the latest).

I currently have version 9.9 of Catalyst Control Center running; this is the only version I have ever installed.

I have enabled Overdrive and set the fan to 50%; it sounds like some sort of aircraft taking off. However, temps are down to around 46 degrees now. I will try a game again and see if it changes anything.

edit: With the fan at 50%, it crashed after about 3 minutes. Damn.
September 23, 2009 2:11:21 PM

Yeah it was a long shot. I have my fan at 32%. 70-80C is fine for the GPU, although mine doesn't break 60C.

Did CPU temps improve any?



September 23, 2009 3:00:43 PM

Nope. Still around 64 at the hottest under Prime95. I just re-checked the RAM (running each stick separately etc.) and found nothing. I really don't know what to do.
September 23, 2009 3:17:39 PM

Yup, ran it overnight a while back (before the new motherboard). Forgot to mention it. It came back with no errors, in any case.
September 23, 2009 3:21:01 PM

It sounds like some hardware failure, but I can't see it right now. I'll come back to it later.

Could be something as simple as a damaged SATA cable or loose fan connector.
September 23, 2009 3:24:39 PM

Ok, well thanks very much for your efforts. I'll keep thinking.
September 23, 2009 7:16:55 PM

I have reviewed everything and I'm fresh.

I keep getting led to the PSU but then I look at your PSU and it's one of the best... this is odd. Your PSU might be defective.

Do you have a digital camera handy? I would like to see a bunch of pics of your BIOS... every single portion, if you have to scroll down anywhere then a second pic after scrolling. We have the same board and I know the BIOS well.

If you want to do this, shoot me a PM and I'll give you a way to send them to me. I have quite a bit of online storage space and can host them so they can be posted in this thread if needed.

Here is another handy tool... Belarc Advisor:
http://www.belarc.com/free_download.html
September 25, 2009 12:01:01 PM

Update:

Installed a new heat sink for CPU. This is definitely not the issue, I am now reaching maximum temperatures of 45 degrees.

I did notice that my SYSTIN temperature reading in CPUID Hardware Monitor idles at around 40, going up to maybe 41. Am I correct in thinking that this corresponds to the northbridge? If so, is it too hot?

I found the following threads by someone who had a similar problem. He did not post a solution, but as the threads are around a year old, I assume he fixed it:

http://www.tomshardware.co.uk/forum/page-253156_12_0.ht...

http://forums.mushkin.com/phpbb2/viewtopic.php?f=3&t=12...

Any more thoughts?
September 25, 2009 9:50:39 PM

No, the NB always runs a bit hotter.

Ok let me see here....

Under AI Tweaker in the BIOS, your DRAM frequency and voltage are both set to Auto. Set these to manual, please.

If you now have the option to set the frequency to 533, do so.

Set the voltage to 1.9V... this is a bit higher than default and will rule out any issues there.

Do you need Express Gate? Removing that would at least simplify the system and remove one thing that could cause issues.

Pictures in both posts are identical, so it's the same guy...

Artifacts in video cards usually look more like this:
Http://www.gwprox.com/spiders.jpg
September 26, 2009 12:20:20 AM

Artifacts? Hey, prox, the only unusual graphic in that screenshot is the guy in the robe carrying a two-hander.
September 26, 2009 6:31:45 AM

They're a bit hard to spot :)  Look at the spiders. Those spikes coming out of their backs stretch to infinity. That was AOC beta btw, and there is a sword wielding mage class called... uh... I forget now. Fire mage with sword.
That was the day the old 7900GT gave out.
I had intended to post more shots, but ran out of time right then and had to cut it off.

http://i16.tinypic.com/2qlu8ma.jpg

I'm glad you're in on this, twoboxer. Frh sent me his BIOS pics. Let me FTP them to my webspace and I'll put them up. I saw nothing serious, but have a look.
September 26, 2009 6:45:03 AM

[Images not preserved]

September 26, 2009 8:36:30 AM

Guys, I don't see much - but I guess if the problem were mine, I would try:

Ai Clock Twister - Disable
CPU Margin Enhancement - Compatible
Suspend Mode - S1 (POS) Only

My sole rationale is that something is happening over time or after time, but not at a specifically identifiable time. So let's stop BIOS from doing anything to change operating parameters that might cause a freeze. Also, don't go into S3, same sort of rationale.

I might also temporarily change "AddOn ROM Display Mode" from the "Force BIOS" option to "Keep Current", again in an effort to stop anything from changing without my knowledge.

Weak, but it's all I've got.
September 26, 2009 10:10:31 AM

PROBLEM SOLVED:


Ok guys, thank you so much for all the help. However, we were barking completely up the wrong tree.

It turns out that the specific graphics card I purchased (Sapphire HD 4870 512MB GDDR5 Dual DVI HDTV Out PCI-E Graphics Card), with P/N:188-01E85-001SA, SKU#11133-00, has a BIOS error. Many people were having problems with this specific card. The solution is on the Sapphire website: go to downloads, enter your graphics card details and download the new BIOS. Then it is just a matter of flashing the BIOS, and all will be well!


I am so happy that this is solved!

Thanks again for all your help.
September 26, 2009 8:28:48 PM

Graphics card BIOS, makes a lot of sense. I'm glad, and have a new thing to look for as well. I'll have to check mine now ;)  not that I have any issues.
September 26, 2009 9:36:05 PM

Maybe there is a graphics card bios problem, but to have the same system symptoms with *two* graphics cards?

"I then changed the graphics card (ATI Radeon 4870) for an older Nvidia card. Same problem."

Once we get a report like this, it's hard to suspect the graphics card as the problem. And if that report was correct, it still is lol.
September 26, 2009 9:38:53 PM

Yup. However, that was occurring on a different motherboard - it must have been some wildly unlikely coincidence.
September 27, 2009 1:29:24 AM

We shall see, eh?