Random System Failure, very odd please advise

teryglenn1

Jun 27, 2006
Hey guys, I have a box running CentOS 3.7 Linux. Here are the specs:

Pentium III 550 MHz
512 MB RAM
Intel 440BX mobo
D-Link 10/100 Ethernet

Here is the problem: randomly, usually over a period of a few days, I'll come to log into the system and notice that the power LED is still on, but the screen is black and I cannot SSH in or do anything except hold the power button to restart it. I know Linux is a very stable OS, so I'm having a hard time believing it's a software issue. I'm more inclined to say it's hardware related, but I have not been able to track it down. I have disabled most of the power management features I'm aware of, because at first I thought that was the issue. Anyway, any help would be much appreciated. Thank you.
 
It is very hard to isolate problems that happen infrequently.
You can test the power supply, check the voltages in the BIOS, or use a Linux program to read them.
I would not advise you to spend much on such an old machine.
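Once you have the voltage readings (typed in by hand from the BIOS health page, or from a tool such as lm_sensors' `sensors` command), a few lines of Python can tell you which rails are off. This is only a sketch: the ±5% window is my assumption, based on the tolerance commonly quoted for the ATX +3.3 V, +5 V and +12 V rails, so check your own PSU's documentation, and the readings in the example are made up.

```python
# Sketch: flag supply-rail readings that fall outside a tolerance window.
# ASSUMPTION: +/-5% around nominal, the window commonly quoted for the ATX
# +3.3 V, +5 V and +12 V rails. Check your own PSU's documentation.
NOMINAL = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def out_of_spec(readings, tolerance=TOLERANCE):
    """Return {rail: (reading, low, high)} for every reading outside the window."""
    bad = {}
    for rail, value in readings.items():
        nominal = NOMINAL[rail]
        low = nominal * (1 - tolerance)
        high = nominal * (1 + tolerance)
        if not low <= value <= high:
            bad[rail] = (value, round(low, 3), round(high, 3))
    return bad


if __name__ == "__main__":
    # Hypothetical numbers, e.g. copied by hand from the BIOS health page
    # or from the output of `sensors`.
    readings = {"+3.3V": 3.28, "+5V": 4.71, "+12V": 12.19}
    for rail, (value, low, high) in out_of_spec(readings).items():
        print(f"{rail}: read {value} V, expected {low}-{high} V")
```

With those made-up readings only the +5 V rail gets flagged, since 4.71 V is below the 4.75 V lower limit.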
 

bmouring

May 6, 2006
The next time it has this problem, I'd suggest looking at the system log to see if you can figure anything out from that (it has led me to a failing USB controller locking the PCI bus and crashing my system before). Here's the rundown:

After the next freeze,

Reboot

At the GRUB screen (assuming you use the CentOS default), hit 'e' to edit the boot parameters (you might need to press a key to bypass the automatic boot; read the message provided).

Highlight the line that starts with "kernel", press 'e' again to edit it, append " single" to the end of it (note the leading space), and hit Enter.

Back at the GRUB entry screen, hit 'b' to boot the system. It will continue booting without the fancy splash screen and stop at a "sh#" prompt before the system log daemon starts.

Review the tail end of the system messages with "tail /var/log/messages", or show more lines with "tail -n N /var/log/messages", which prints the last N lines. If that is too much to look at on one screen, pipe the output to "less" ("tail -n N /var/log/messages | less") or scroll back manually (Shift+PgUp/PgDn).

Post anything that looks suspect, or any outright errors (or, if you want to fix it yourself, try googling those error messages).
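If the log is long, a small script can do the first pass for you. This is a minimal sketch in Python 3, so you would run it on a copy of /var/log/messages on a newer machine (CentOS 3 ships a much older Python 2, where this won't run as written). The keyword list is my own guess at words that tend to show up in kernel and driver complaints, not anything official; it only narrows down what to read, it won't diagnose anything.

```python
import re
import sys
from collections import deque

# ASSUMPTION: a hand-picked, non-exhaustive list of words that often appear
# in kernel/driver complaints. Adjust it to taste.
SUSPECT = re.compile(r"error|fail|panic|oops|time(?:d )?out|lockup|machine check",
                     re.IGNORECASE)


def tail(path, n=200):
    """Return the last n lines of a file, like `tail -n N path`."""
    with open(path, errors="replace") as f:
        return list(deque(f, maxlen=n))


def suspect_lines(path, n=200):
    """Return the lines among the last n that contain a suspect keyword."""
    return [line.rstrip("\n") for line in tail(path, n) if SUSPECT.search(line)]


def main(argv):
    if len(argv) < 2:
        print("usage: python3 scan_log.py /path/to/messages [N]")
        return
    n = int(argv[2]) if len(argv) > 2 else 200
    for line in suspect_lines(argv[1], n):
        print(line)


if __name__ == "__main__":
    main(sys.argv)
```

For example, "python3 scan_log.py messages 500" (the script name is hypothetical) prints only the matching lines among the last 500. Anything it flags is worth reading in context with "less", since the lines just before an error are often the useful ones.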

If you don't see anything strange here, then something is going wrong, and going wrong quickly, before anything can be logged. It could be one of only a few major components: processor, power supply, memory, or possibly a card locking the PCI bus (though that's not likely, since the system will usually still complain to the system messages even after the bus locks).

Good luck
 

linux_0

Great advice from bmouring as always :)

I would suggest running memtest86 or memtest86+ overnight (8-12 hours).

Based on the age of the system, I would say the RAM, motherboard and PSU are the prime suspects, not necessarily in that order.

It may also be an overheating problem; one or more of your fans may be failing.

Please see the memtest howto below:

http://forumz.tomshardware.com/software/memtest86-ISO-burning-HOWTO-ftopict230767.html

You can find memtest86+ at:

http://www.memtest.org/


GL :-D