
Very, very strange problem

May 26, 2007 10:55:53 PM

Hi all, I've had a very strange problem with my computer for a few years now (yeah, I know, probably should have put more effort into fixing it by now). I have scoured the internet for solutions, but simply can't find any. I also consider myself somewhat adept at troubleshooting, but I am at a complete loss. Here is my problem:

My computer, randomly, turns itself off. When it does, it tries to restart itself after a few seconds. I use the word "tries" because it will try to reboot itself even if I have unplugged it (it will come on ever so briefly using the electricity stored in the capacitors). This indicates to me that it is probably trying to reboot, as opposed to shut down, but I could be wrong.

Now, sometimes my computer will run fine for weeks on end. Other times, it will not even make it to the windows start up screen and continually reboot for hours until I unplug it (and take several days of fiddling with it before it is semi-stable again). Generally, when it is in one of its cycles of doom, the power button on my case is completely unresponsive.

It gets weirder. When it is in the worst shape and I finally bring it back, sometimes the clock multiplier in the BIOS has changed itself to a lower setting (and, surprisingly, it gives me an overclocked warning when it does the memtest at the beginning of the boot process).

I think the culprit is most likely the motherboard or the processor. It may also be the RAM, but I highly doubt it (I've done some not completely comprehensive mem tests with no errors). I know for a fact it isn't the PSU (because I replaced it), and something in the back of my mind tells me the power button on the case might be faulty.

I really can't tell, because I have not noticed any patterns related to when it is stable and when it is not. The only one is that it is generally more stable when it is cooler, but I'm not even sure if that is entirely true. As for overclocking, I've run with aggressive BIOS settings in the past, but nothing that really qualifies as overclocking. I have been running with factory settings for a long time though, and that seems to have improved it a little. What really gets me is why the computer, in general, still works.

Anyway, does anyone have any ideas? This is a real headscratcher if you ask me, but I would be curious if anyone else has ever had this problem. Is there a fix, or a part I should replace?


May 26, 2007 11:20:45 PM

What are your specs? Did you check your temps?
May 26, 2007 11:30:48 PM

When I see this at work I check

1) How dirty CPU heatsink is, then I re-grease the heatsink while I'm there
2) Temporarily hook up a different PSU
3) Swap out CPU if I have a spare lying around
4) Change mobo

I run sse3dnow to check for CPU errors, memtest86+ for RAM issues, and run cpuburn to heat up chip and check for stability under load.
May 27, 2007 12:10:57 AM

Specs (probably should have posted these to begin with):

AMD Athlon XP 2600+ running at 2,075 MHz (166MHz FSB x 12.5)
ASUS A7N8X
1GB RAM (higher end stuff with factory 5-2-2-2 timing)
Enermax 485W PSU
NVIDIA GeForce 7800 GS (yeah, didn't feel like getting a new processor and mobo, as well as a new card, to play Oblivion)

I think that's all that is relevant, but feel free to ask for more.
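For anyone checking the numbers, the clock figure in the spec list is just the FSB clock times the multiplier. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the 100 MHz line is only a what-if showing how far the core clock drops if the board falls back to a lower FSB with the same multiplier.

```python
def core_clock_mhz(fsb_mhz: float, multiplier: float) -> float:
    """Core clock is the front-side bus clock times the CPU multiplier."""
    return fsb_mhz * multiplier

# Values from the spec list: 166 MHz FSB x 12.5
print(core_clock_mhz(166, 12.5))   # 2075.0

# What-if: same multiplier at a 100 MHz FSB
print(core_clock_mhz(100, 12.5))   # 1250.0
```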

As for the heatsink being dusty, I have one of the early Koolance cases, so it is water cooled.

Temps are currently at (viewing them in the bios):

33*C MoBo
45*C CPU

My Koolance case has a probe that goes in a groove on the bottom of the water block (i.e. it is right in between the processor die and the heat transfer plate on the cooler) and it reads 30*C at the same time.

Voltages are (all set to auto):

1.66V - VCORE
3.26V - +3V
4.94V - +5V
12.22V - +12V

Some other BIOS settings:

FSB Spread Spectrum - 0.5%
ASP Spread Spectrum - Disabled
Graphics Aperture Size - 256MB
AGP Frequency - Auto
System BIOS Cacheable - Disabled
Video RAM Cacheable - Disabled
DDR Reference Voltage - 2.6V
AGP VDDQ Voltage - 1.5V
AGP 8x Support - Enabled
AGP Fast Write Capability - Disabled


If you need anything else, just ask. I may be able to get a processor from work and see if that helps. If not, do you all think I should just get a new MoBo?
May 27, 2007 12:19:58 AM

Quote:
I run sse3dnow to check for CPU errors, memtest86+ for RAM issues, and run cpuburn to heat up chip and check for stability under load.


I am aware of memtest86+, but where can I get the other two? Did a fellow named Roy Longbottom make sse3dnow? That is the site I found after searching, but I wanted to check. The website for cpuburn that I found also doesn't look that great.
May 27, 2007 12:25:36 AM

Unless the processor is overheating or has been severely over volted you can put it at the tail end of the list of things to check, IMO.

I would have guessed PSU but you have a new one already and still you have the problem.

The switching thing is odd; perhaps the power switch is simply shorting itself? You must at least rule this out. A simple way to do this is to unhook it from the board and manually short the pins on the panel to start her up. Try this.

If not that, then the motherboard is probably cracked or shorting, or has a dying component somewhere.
May 27, 2007 12:36:04 AM

Quote:
I run sse3dnow to check for CPU errors, memtest86+ for RAM issues, and run cpuburn to heat up chip and check for stability under load.


I am aware of memtest86+, but where can I get the other two? Did a fellow named Roy Longbottom make sse3dnow? That is the site I found after searching, but I wanted to check. The website for cpuburn that I found also doesn't look that great.

Yes, Longbottom's program. It's great for checking for CPU errors, especially when OC'ing, but even at stock it's useful.

You can use any program you like to run the CPU at 100%. Prime 95, etc. Just the one I happen to use is cpuburn-in

http://users.bigpond.net.au/cpuburn/

It's old, but it works. For dual-core, you just run two instances of it. All you are doing is trying to put a load on your PSU and see if your cooling is adequate.

While you run it, run ASUS PC Probe or a similar program to watch your voltages and temps. You could use a voltmeter to check your PSU while under load as well. These methods aren't totally solid, but it's the best you can do if you don't own some very expensive diagnostic equipment.

EDIT: My money is on the mobo at this point. Is there any airflow across the voltage regulators and chipset? The problem with watercooling is other mainboard components rely on the CPU cooler's fan for some air flow. Have a peek and see if you can see any bad caps as well. Pay special attention if you have gold-striped purple caps near the CPU.
May 27, 2007 12:43:51 AM

Quote:
Unless the processor is overheating or has been severely over volted you can put it at the tail end of the list of things to check, IMO.


I would have thought so too. I am running the sse3dnow tests now and so far no errors.

PSU was my first guess as well, which is why I replaced it (about two years ago now - like I said, have had this problem for a LONG time).

Over that time frame, I've given the MoBo several visual inspections and never seen a crack or burst capacitor (see why this is so hard to diagnose, all the obvious tests come up negative).

The inability to find a solution is also why I think it might be something as stupid as the power button. However, I have been too lazy (really too busy at work to have much time to use my computer, so maintenance has not been a high priority) to move it to a new case to test. Should I take a voltmeter to the power button or something, or just unplug the front panel connector and short it manually like you say and see if it is stable? (I am thinking I should disconnect the reset button as well.)
May 27, 2007 12:48:46 AM

Hey mate

Right it sounds like to me you've got a pretty fundamental hardware problem.

First reset all BIOS settings to factory defaults (but make sure everything is switched on, i.e. USB controllers etc).

Take down your memory timings to something like 3-3-3-10 or even 4-4-4-12 so we can eliminate your RAM (seeing as how your timings are so tight), though to really do it, put it in another machine and check all is OK. If that machine fails, you know it's your RAM.

Try switching out the power supply first, seeing as this is where your problems start (make sure it's known good).

If that fails, then it's most likely your motherboard or your CPU.

But anyway let us know how you get on. :D 

Hope that helps, dude.
May 27, 2007 12:51:40 AM

Quote:
Unless the processor is overheating or has been severely over volted you can put it at the tail end of the list of things to check, IMO.

I would have guessed PSU but you have a new one already and still you have the problem.

The switching thing is odd; perhaps the power switch is simply shorting itself? You must at least rule this out. A simple way to do this is to unhook it from the board and manually short the pins on the panel to start her up. Try this.

If not that, then the motherboard is probably cracked or shorting, or has a dying component somewhere.



Actually, notherdude has got it spot on. I should really read an entire forum before posting shizzen.
May 27, 2007 12:52:52 AM

Am running Longbottom's program now and so far no errors (about half way through). Will try cpuburn next.

As for cooling, I was aware of those issues when I built the system. I have a big fan in the front, two fans in the back, the two fans in the PSU and the fan on the video card. I think I have enough, but I will run pc-probe when I do the cpuburn test next. I also have the northbridge water cooled, even though it originally came with a tiny heat sink.

I will have a look for those purple caps next. As I said in another post, I've given the board several visual inspections (including a quick one a couple nights ago) and never noticed anything amiss.
May 27, 2007 1:07:53 AM

All right, finished with sse3dnow and got no errors. Will do cpuburn next.

My question is, as these shut downs are not regular at all and testing a solution will therefore take several weeks, what would you like me to try first after the diagnostics are finished?
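To put a rough number on "several weeks": if you assume the crashes arrive at random at a steady average rate (a Poisson process, which is a simplification given how clustered they seem to be in this thread), the chance of a crash-free stretch happening by pure luck is exp(-rate x time). The sketch below uses a made-up rate of one crash per week; the function names are illustrative only.

```python
import math

def chance_of_quiet_run(crashes_per_day: float, days: float) -> float:
    """P(no crash in `days`) if crashes arrive as a Poisson process."""
    return math.exp(-crashes_per_day * days)

def days_needed(crashes_per_day: float, confidence: float = 0.95) -> float:
    """Crash-free days needed before a quiet run is unlikely to be luck."""
    return -math.log(1.0 - confidence) / crashes_per_day

# Hypothetical rate: one crash a week on average (not a measured figure).
rate = 1 / 7
print(round(chance_of_quiet_run(rate, 7), 3))   # 0.368
print(round(days_needed(rate), 1))              # 21.0
```

Under that assumption, a single quiet week after a fix proves very little (it would happen about 37% of the time anyway), while roughly three quiet weeks starts to mean something.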
May 27, 2007 1:48:50 AM

Well I was hoping putting your machine under a load would force the problem to occur.

If it doesn't, and all your tests pass, then I would say it's the mobo or a switch.

Find an old ATX case from somewhere. Rip out the power switch and reset, then hook them up to your board and run them out of your case.

You could just jump the pins with a screw driver, but if it takes such a long time for the problem to crop up this could end up being a huge pain in the butt.

You won't be able to tell if a switch is bad from a voltmeter if the problem is intermittent. The switch could be working at the moment you test it :) 
May 27, 2007 2:07:11 AM

I've had this problem on one of our PCs; it drove us nuts for ages. It would sometimes run stable and sometimes not, and had the resetting of BIOS settings, etc.
Specs of it are an AMD 2200 or thereabouts and a PC Chips motherboard (cheap and cheerful). We had used memtest etc. and that showed no errors, so we assumed all was fine there. We tested it with different graphics cards and all possible BIOS settings, but it was still temperamental.
Anyway, we were building a new PC and so were swapping old hardware about between machines. It turns out that board only likes a couple of sticks; with all the other identical ones it carries on being temperamental as before. All the RAM sticks we tried in it were the same speed, make and size, and none showed errors in memtest on any of the PCs. None of the other PCs had problems with the RAM sticks. So my advice is to try other memory sticks if you can, or if you have two in it, try just one at a time and see if it doesn't like one of the sticks.
May 27, 2007 2:09:15 AM

Ok, so here are the results of the cpuburn test:

Idle (win xp desktop with typical background items running) temps according to pc probe:

CPU - 31*C (my Koolance probe says 28)
MoBo - 37*C

Voltages:

12.224-12.288
4.919-4.945
3.264-3.28
1.664-1.68

1st test: Lasted a few minutes before the system crashed. In that time, I didn't notice any voltage shifts and temperatures increased a degree centigrade max.

2nd test: I kept a closer eye on the voltages this time and noticed that some of them dropped slightly below the idle range almost instantly. All returned to normal rather quickly, except +5V which was a little lower than normal, with a range of 4.892-4.919 for the entire test. The +3.3V infrequently dropped to 3.248 (including at the beginning). This time the test lasted about 20 minutes before I posted this and is still going. Temps again barely increased (peaked at 32 cpu and 39 mobo).

I have zero experience with overclocking: Is it indicative of a major problem when voltages drop slightly like that? Should I try the test again with FSB spread spectrum disabled?

BTW, CPU Burn-in reported no errors the entire time. Also, windows task manager indicated that my CPU usage was at 100% the entire time as well.
May 27, 2007 2:21:56 AM

Update: A couple of the voltages briefly slid a little more (I am still running the second test without crash). The lows:

12.224
4.865
3.248
1.648
May 27, 2007 2:23:50 AM

Quote:
You won't be able to tell if a switch is bad from a voltmeter if the problem is intermittent. The switch could be working at the moment you test it :) 



Good point :oops: 

I'll try the power switch thing.
May 27, 2007 2:27:26 AM

Quote:
So my advice is to try other memory sticks if you can, or if you have two in it, try just one at a time and see if it doesn't like one of the sticks.


Thanks, I will try this too. Although, these shifty voltages have got me worried, so I'm curious what people have to say about that.
May 27, 2007 2:39:11 AM

Your voltages appear to be within normal variance. It is rare for them to be exact. I'm no expert on PSUs but I see variances like that all the time.
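As a sanity check on "normal variance": the ATX spec allows +/-5% on the +12V, +5V and +3.3V rails. The sketch below compares the lowest and highest readings posted in this thread against that window. Vcore is left out because it is set by the board, not the ATX spec, and the helper name is made up for illustration. Keep in mind that motherboard sensor readings are only approximate.

```python
# Nominal rail voltages; the ATX spec allows +/-5% on these positive rails.
NOMINAL = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05

def within_tolerance(rail: str, reading: float) -> bool:
    """True if the reading is within +/-5% of the rail's nominal voltage."""
    nominal = NOMINAL[rail]
    return abs(reading - nominal) <= nominal * TOLERANCE

# Lowest and highest readings posted in this thread for each rail.
readings = {
    "+12V": (12.224, 12.288),
    "+5V": (4.865, 4.945),
    "+3.3V": (3.248, 3.280),
}

for rail, (low, high) in readings.items():
    ok = within_tolerance(rail, low) and within_tolerance(rail, high)
    print(rail, "OK" if ok else "out of tolerance")
```

All three rails come out inside the window, including the 4.865 V low on +5V (the limit would be 4.75 V).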
May 27, 2007 2:52:12 AM

Alright cool. The 2nd test ended without crashing (I set it to last 1 hour). My MoBo temperature very rarely touched 40*C, but mostly sat at 39*C. My CPU sat at 32*C for most of the test.

So, conclusions:

My cooling seems adequate as the temps barely raised at all.
It doesn't seem like the processor is the problem as none of the tests reported errors (although it did crash once, but again, doesn't seem like the processor caused it).

The 1st and 2nd test illustrate just how random this is. Initially crashes very quickly, then keeps on chugging like a champion.

I'm going to leave the PC Probe recorder on in the mean time and play some Oblivion (which I usually play at almost maxed out settings) to see if that stresses it a lot (more RAM, Northbridge and GPU usage than CPU Burn-in).

So, here are the things I think I need to do next based on everyone's feedback (thanks, by the way!):

Try a different power switch for a couple weeks and see if that works.
Try using the pair of RAM sticks individually and see if that is the problem.

If those don't fix it, is it safe to assume that I need a new MoBo?

Anyway, I'm going to run memtest86+ again just to be sure, then try the power switch thing first. If you think I should proceed differently, please advise.
May 27, 2007 2:57:45 AM

It is really hard to diagnose an intermittent problem like that. Before you throw it out the window, try unplugging and replugging every connection to the MB and PSU, reseating the RAM and CPU, and disconnecting and hotwiring the power switch.
May 27, 2007 3:28:53 AM

Ok, it finally crashed again (was just idle at desktop) and the last recorded temps were:

CPU - 30*C
MoBo - 36*C

Voltages:

12.288
4.919
3.264
1.664

So everything seemed normal...
May 27, 2007 3:44:47 AM

Changing the CMOS battery on the board could be a cheap fix.
May 27, 2007 3:46:03 AM

Well, that's not right at all. You should be able to run cpuburn-in all night long.

The easiest swap would be the RAM. Does that 1 GB of RAM consist of more than one stick? If so, try one stick at a time and see if it crashes on a particular stick.

If you just have a 1GB stick in there, swap it out. Even if it's not 2700 and you have a 2100 stick lying around, try it and see if it crashes.

What BIOS revision are you running?

EDIT: Thanks Belinda. I have seen a few machines where memtest would pass for hours but there was still a RAM issue. Some boards just don't like certain brands, single-sided RAM, mixing different brands, etc.
May 27, 2007 3:47:48 AM

It is two sticks.

Bios Rev 1001.E

EDIT: Actually, PC Probe says Bios Rev 1019 Beta 003 T2, release date 11/27/02 (I seem to remember flashing it not too long ago, so that date has to be wrong at least). It says 1001.E at boot.
May 27, 2007 4:00:15 AM

Quote:
It is two sticks.

Bios Rev 1001.E


Well if your PCB is ver 1.03, 1.04 or 1.06 it is 8 versions out of date.

BIOS flashes 1002, 1003 and 1007 specifically mention fixing stability using certain types of RAM.

1009 is the latest. Just make damn sure your PCB has ver. 1.03/1.04/1.06 silkscreened on it if you intend to run this flash.

A7N8X and previous listed PCBs ONLY!!! Don't use for Deluxe, -X, -VM, -E Deluxe, etc.

http://dlsvr01.asus.com/pub/ASUS/mb/socka/nforce2/a7n8x/AN8B1009.zip

You will need this utility

http://dlsvr01.asus.com/pub/ASUS/mb/socka/nforce2/a7n8x-deluxe/awdflash.zip

Here's an image to make a boot disk for flashing

http://www.ts.nu/Files/drdflash.exe

Extract this image to a floppy, then extract the awdflash utility and the BIOS image to the disk.

Set your bios to boot to floppy first and you're ready to rock. (Don't reboot, shutdown, unplug while you flash or you'll have a dead mobo...)

EDIT:

After flashing, go into the BIOS and load the setup defaults. Save and reboot. Go into the BIOS again, and then make any changes you need to.
May 27, 2007 4:16:53 AM

Ok, my board is PCB revision 1.04 (I knew that window was stylish AND functional). Here is what I have installed:

Version 1001E 2003/01/13 update

Description A7N8X BIOS 1001E for PCB revision 1.03, 1.04, and 1.06 only.
Improve memory stability

Here are my options for more recent BIOSes:

Beta Version 1009 2005/09/27 update

Description A7N8X BIOS 1009 for PCB revision 1.03, 1.04, and 1.06 only.
Support AMD Sempron CPU
Patch 3D Labs AGP card compatibility issue

or:

Version 1007 2003/10/09 update

Description A7N8X BIOS 1007 for PCB revision 1.03, 1.04, and 1.06 only.
Improve system stability with Hynix and PSC memory modules.

I am thinking I go with 1007 since I don't have a Sempron or 3D Labs card and that way I avoid the Beta. Thoughts?

EDIT: Thanks VIC20, looks like you did the research as well. Should I go with the most recent, Beta Bios or the one before it?
May 27, 2007 5:14:39 AM

Try 1007 first. That could solve your issue right there. If it doesn't, then try the beta.
May 27, 2007 5:50:45 AM

Quote:
Changing the CMOS battery on the board could be a cheap fix.


I was thinking exactly that myself. If your BIOS settings get reset after a crash, then maybe the CMOS battery is borderline.

My 2cp
May 27, 2007 7:14:42 AM

In my opinion, the only way to fix an intermittent problem like this is to systematically swap out/interchange parts of the system to establish which parts of your system are *good*. Instead of poking around looking for a needle in a haystack, you have to first establish roughly which part of the haystack the problem resides in. You are going to need a friend's computer to do this.
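The bookkeeping behind that swap-out approach can be sketched in a few lines. This is only an illustration: the suspect list is pulled from parts mentioned in this thread, the function name is made up, and it assumes each replacement part really is known-good.

```python
def remaining_suspects(suspects, swap_results):
    """Return the parts that have not been cleared yet.

    swap_results maps a part to True if the crashes continued after that
    part was swapped for a known-good one, which clears the original part.
    """
    return [part for part in suspects if not swap_results.get(part, False)]

suspects = ["PSU", "CPU", "RAM", "motherboard", "power switch", "reset switch"]

# From the thread so far: the PSU was replaced and the crashes continued.
results = {"PSU": True}

print(remaining_suspects(suspects, results))
# ['CPU', 'RAM', 'motherboard', 'power switch', 'reset switch']
```

Each swap that leaves the problem in place shrinks the list by one; a swap that makes the crashes stop points at the part that was removed.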

One thing no one has mentioned yet that you should test as well is the power in your house. Voltage spikes caused by faulty appliances/wiring in your house could affect your system. Bring it over to a friend's house and try it there.
May 27, 2007 9:29:29 AM

Okay guys! Is it normal for the CPU to be running at 100%? It doesn't seem normal to me. What would produce this situation? Or was it just because of the test being run?
May 27, 2007 3:09:06 PM

Quote:
Try 1007 first. That could solve your issue right there. If it doesn't, then try the beta.


Alright, I am going to try this course of action (stopping when the problem does) and will report back on my results:

Flash to 1007 (and install the latest drivers for good measure).
Flash to 1009.
Try a new power switch.
Try one stick of ram at a time.

Here's hoping it doesn't crash during a flash!

If none of that works, I'll swap out the CMOS battery (this isn't high on my list as the clock multiplier getting changed - and it is only that one setting which changes - happens very rarely and only after many, many consecutive crashes).


Quote:
One thing no one has mentioned yet that you should test as well is the power in your house. Voltage spikes caused by faulty appliances/wiring in your house could affect your system. Bring it over to a friend's house and try it there.


I have a surge protector which should cover me on the high side at least. I don't think this is it though, as I have moved a total of 6 times since this started (and it happened in all 6 places I have lived).

As for isolating what the problem part is, I think I've pretty much got it down to the MoBo or RAM (or a combination of the two) or potentially the power switch for now. After seeing all those BIOS revisions talking specifically about stability issues with RAM, I am starting to lean in that direction.


Quote:
Okay guys! Is it normal for the CPU to be running at 100%? It doesn't seem normal to me. What would produce this situation? Or was it just because of the test being run?


Yeah, it was the test. CPU Burn-in puts max load on your processor. I think that test helped me establish that the processor and cooling set up are fine.
May 27, 2007 5:14:30 PM

Ok, so I flashed to 1007 and now my FSB won't run at anything but 100 MHz (no matter what I set it to in the BIOS). I ran memtest86+ and I got no errors. I think this is indicative of the MoBo and RAM not getting along. Thoughts?

BTW, I feel like a douche for not flashing the BIOS before coming here, so I apologize for that. I could have sworn that I had already done this, and maybe I did and ran into the same problem and flashed back, but I don't remember...
May 27, 2007 6:52:26 PM

It crashed while I was editing a BMP in MS Paint of all things, so I went ahead and flashed to 1009 Beta, which seems to have fixed the FSB issue. Fingers crossed it doesn't crash.
May 27, 2007 7:37:19 PM

Well, the BIOS updates don't seem to have fixed it, as it is still crashing.

I'll try using single sticks of RAM now.
May 27, 2007 9:06:15 PM

Quote:

Flash to 1007 (and install the latest drivers for good measure).
Flash to 1009.
Try a new power switch.
Try one stick of ram at a time.

Here's hoping it doesn't crash during a flash!

If none of that works, I'll swap out the CMOS battery (this isn't high on my list as the clock multiplier getting changed - and it is only that one setting which changes - happens very rarely and only after many, many consecutive crashes).

One thing no one has mentioned yet that you should test as well is the power in your house. Voltage spikes caused by faulty appliances/wiring in your house could affect your system. Bring it over to a friend's house and try it there.


I have a surge protector which should cover me on the high side at least. I don't think this is it though, as I have moved a total of 6 times since this started (and it happened in all 6 places I have lived).

As for isolating what the problem part is, I think I've pretty much got it down to the MoBo or RAM (or a combination of the two) or potentially the power switch for now. After seeing all those BIOS revisions talking specifically about stability issues with RAM, I am starting to lean in that direction.


surge protectors only keep your system from being fried by a dangerous event. they do not filter the power at all. they usually have little more than a $0.05 varistor or a circuit breaker which smokes open during a huge surge. they must be replaced/reset after each event. however, since you've moved to different houses a lot, this is a good sign -- you are right. wall power problems seem unlikely. the last thing to do here is to make sure it's not the surge protector itself that is the problem. to test this, try a different surge protector, or plug your computer directly into the wall.

you mentioned in your original post that the problem has been going on "for a few years", yet you do not mention whether or not the computer ever worked properly. and if it did, whether you changed RAM/BIOS since then. if it did work properly initially, and RAM/BIOS did not change before the problem arrived, then i think it is very unlikely that updating the BIOS will fix it. to establish that it does indeed have something to do with RAM i think you need to put other sticks in there, or put your sticks into someone else's mobo.

the power switch idea is a good one, but as per the initial suggestion, you do not need a new power switch to test whether or not your old one is shorting. you just have to unplug your current switch from the motherboard. front-panel power switches are momentary-on switches so you can leave your case open and short the header briefly (the two motherboard pins that your switch plugs into) manually with a metal object to start the system (be real careful, obviously...). some power supplies also have on/off switches, but it is easier to swap out the whole power supply to test that.

if that doesn't work, then you have to start swapping stuff. if there is a cracked trace or a short, it could be anywhere in your system (Ram card, PCI card, moboard, drives, case-touching-moboard, etc.) and this is the only way to find it. if some component has degraded and no longer meets spec, this is also the best way to isolate it. in extremely rare cases it could also be a malfunctioning peripheral (monitor, printer, keyboard, mouse) that momentarily causes a power problem. to test this, swap these out, or run your computer without them.
May 27, 2007 9:47:40 PM

aside from the suggested fixes, i'd just like to note that your AGP aperture size is 256mb. I read somewhere that the optimum is 64 or 128mb... and performance decreases if increased further. It's something about the video card being allowed to access RAM.

I know when I increased my aperture size before, it made my computer slower and act weird sometimes. But maybe if you already reset and flashed your BIOS, this might not be the problem.
May 27, 2007 10:00:43 PM

Quote:
the last thing to do here is to make sure it's not the surge protector itself that is the problem. have you always used the same one?


Yes, I have used the same one and it is pretty old at this point. If you think it is worth it, I will plug the computer directly into the wall or get a new one. I don't think that is the problem due to all the moves though and it sounds like you agree.


Quote:
you mentioned in your original post that the problem has been going on "for a few years", yet you do not mention whether or not the computer ever worked properly. and if it did, whether you changed RAM/BIOS since then.


Yes, it did work properly initially. Unfortunately, I can't remember exactly when it started any more. I have not changed RAM since I built the system.


Quote:
if there is a cracked trace or a short, it could be anywhere in your system (Ram card, PCI card, moboard, drives, case-touching-moboard, etc.) and this is the only way to find it. if some component has degraded and no longer meets spec, this is also the best way to isolate it.


I am under the impression that if there is a short or cracked trace, it shouldn't boot at all, right? Or, if it might, that it shouldn't run for very long. The reason I have ruled this kind of stuff out is, at times, it will run for weeks without crashing, and for months crashing very infrequently. Other times, it will continuously crash after a very short period, and the only way to fix it is to unplug it and try again the next day.

For instance, when I flashed to 1009, it started crashing consistently shortly after boot. I turned it off and went and ran some errands. When I came back, I turned on a rugby match which I have been watching now for 2 hours without a crash. It will change character that quickly. Is that consistent with a short?

Sometimes I wonder if all the moving hasn't caused it, which is why I think it might be the power switch, but I obviously have no idea. I'll try those things and get back to you all.
May 27, 2007 10:06:58 PM

Quote:
i'd just like to note that your AGP aperture size is 256mb. I read somewhere that the optimum is 64 or 128mb


I thought you were supposed to make it the same size as the RAM on the card. Should I change this?
May 27, 2007 10:27:09 PM

testing the surge protector is easy to do, so it's worth it. just plug your computer into the wall. also test your peripherals by removing them/swapping them out.

a crack in a trace could, but would not necessarily prevent the system from booting, which is why it's the hardest kind of intermittent problem to track down. a crack could cause exactly the behaviour you describe. metal traces on boards (moboard, graphics card, RAM card, hard drive electronics, basically every part in your system) are stuck to the motherboard plastic/bakelite-type material. if the board is bent or there is a large temperature gradient (large change in temperature over a short distance) the metal can crack resulting in an intermittent "open" -- severing the electrical connection the trace was meant to make (opposite of a "short"), just like a switch. as the temperature of the board changes during normal operation of your computer, the different materials in the board expand/contract at different rates and can cause the cracked trace to open/short back together intermittently. these cracks are really small and usually impossible to see even when on the surface. motherboards have several layers of traces (piled vertically inside the board) so unless it's a huge crack from some physical trauma, there is zero chance of seeing those. another type of crack is caused by a cold solder joint which is basically a manufacturing defect where they do not heat the solder enough.

the components that plug into boards (chips, power transistors, capacitors) can also age/fail with no externally visible signs. so the best way to locate these things is by swapping stuff. i am not saying you have a crack, or a bad component. i am just saying they are consistent with your observed problem.

it just occurred to me that a safer way of testing your front-panel power switch is to open your case, turn on your computer like normal using the switch, and then reach in and unplug the switch from the motherboard. if your system is stable this way, replace the switch.

May 28, 2007 12:58:21 AM

It just occurred to me that it probably isn't the power switch, but it may be the reset switch. The reason is I have the power set to soft off after 4 secs of depression. Maybe the power switch is shorting for that long, but, when I am in Windows, it will then start a normal shutdown instead of just turning off.

Anyway, I'm going to play around with the RAM some more since I think the MoBo - RAM relationship seems to be at the top of the list. Just to make sure, I'm going to unplug the reset switch and plug the computer directly into the wall.