difficult hardware diagnosis

frank

Distinguished
Dec 31, 2007
1,588
0
19,780
Archived from groups: alt.comp.hardware.pc-homebuilt (More info?)

Winxp Home. Updated except for the buggy KB835732.
Gigabyte mobo GA-8S650GXM (socket 478).
Celeron 2400
DDR 2100 - 128mb.

Experiencing frequent (several times a day) hardware crashes and automatic
reboots. I have turned off the control panel option (system>advanced) to
automatically reboot after a crash, but it still does so anyway.

Machine_Check_Exception.
There is a blue screen error called Machine_Check_Exception, just prior to
the crash & reboot, but it's impossible to read it as it's only on for an
instant. Definitely not a Windows screen. I believe it's an Intel message
from the cpu diagnosing a hardware error (perhaps cpu or mobo).

There is no pattern to the crashes and no relation to what programs may be
running - sometimes it happens when the machine is standing idle.

On reboot the following is sometimes displayed (not every time) - you have
recovered from a serious error:
BC Code:9c BCP1:00000000 BCP2: 8005366FO BCP3:CC0000FF BCP4:20040189
OSVER:5_1_2600 SP:1_0 Product 768_1


Memtest86.
I ran Memtest86 as I suspected a faulty ram module (128 & 256 installed). I
replaced the 256 module that was probably faulty but the crashes continued.
So I removed the new module and am only running the original 128 module
which tests good. Crashes continue.

PSU.
I suspected the PSU - Verified voltages - all seem ok (within 5%) - checked
the 3.3, 5 & 12v rails on the main mobo connector with an autosensing digital
multimeter - all seem ok.

Fans.
All fans are working (cpu, case and psu).

MBM5 (MotherboardMonitor).
The MBM temperature results are different from the Bios readings.
case 141F/61C
cpu 30F/-1C
sensor3 32F/0C
core0 1.6v
core1 .00v
+3.3 3.39v
+5 5.00v
-12v -12.27v
-5 -4.89v
fan1 5625rpm
fan2 33750rpm
fan3 16875rpm
cpu 2424mhz
cpu0 0%
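
Some of the MBM5 values above look physically implausible (a CPU below freezing, fans above 10,000 rpm), which may mean MBM5 is mapped to the wrong sensor chip rather than that the hardware is really in that state. A minimal sketch of that sanity check follows; the two thresholds are assumptions picked for illustration and do not come from MBM5 or Gigabyte.

```python
# Hypothetical plausibility limits (assumptions, not MBM5 or vendor values).
MAX_FAN_RPM = 10000      # typical case/CPU fans spin well below this
MIN_CPU_TEMP_C = 15      # a running CPU should not read below room temperature


def implausible_readings(readings):
    """Return the names of readings that fall outside the assumed limits."""
    flagged = []
    for name, (value, unit) in readings.items():
        if unit == "rpm" and value > MAX_FAN_RPM:
            flagged.append(name)
        elif unit == "C" and name == "cpu" and value < MIN_CPU_TEMP_C:
            flagged.append(name)
    return flagged


# Readings copied from the MBM5 list in the post.
mbm5 = {
    "case": (61, "C"),
    "cpu": (-1, "C"),
    "fan1": (5625, "rpm"),
    "fan2": (33750, "rpm"),
    "fan3": (16875, "rpm"),
}

print(implausible_readings(mbm5))
```

Anything flagged here is better read from the BIOS screen until the MBM5 sensor configuration has been checked.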

Bios:
system temp=32C/89F
cpu temp=fluctuates from 39C/100F to 41C/105F
cpu fan=3125rpm
system fan=2766rpm
vcore=1.58v
+3.3=3.39v
+5=5.02v
+12=11.97v
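
The "within 5%" check mentioned for the PSU can be made explicit. The sketch below applies it to the three BIOS rail readings listed above; the 5% figure is the one quoted in the post, and the function name is just an illustrative choice.

```python
def within_tolerance(nominal, measured, pct=5.0):
    """True if measured is within pct percent of the nominal rail voltage."""
    return abs(measured - nominal) <= abs(nominal) * pct / 100.0


# Nominal rail -> BIOS reading, copied from the post.
bios_rails = {3.3: 3.39, 5.0: 5.02, 12.0: 11.97}

for nominal, measured in bios_rails.items():
    deviation = (measured - nominal) / nominal * 100
    status = "ok" if within_tolerance(nominal, measured) else "OUT OF RANGE"
    print(f"+{nominal}V rail: {measured}V ({deviation:+.2f}%) {status}")
```

A static reading inside the band only shows the rails look fine at that moment; it says nothing about ripple or brief dips under load.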

Event Viewer.
I looked at the Event Viewer errors and am continually getting STOP
0x0000009c errors, which point to hardware problems.

Dumpchk.
I used Dumpchk.exe to analyse Minidump files (created by XP in the
windows>minidump folder) and have copy/pasted one below as an example, and
to see if it offers any clues:

C:\WINDOWS>dumpchk minidump\mini042704-02.dmp
Loading dump file minidump\mini042704-02.dmp
----- 32 bit Kernel Mini Dump Analysis

DUMP_HEADER32:
MajorVersion 0000000f
MinorVersion 00000a28
DirectoryTableBase 00039000
PfnDataBase 81053000
PsLoadedModuleList 8054be30
PsActiveProcessHead 8054de78
MachineImageType 0000014c
NumberProcessors 00000001
BugCheckCode 0000009c
BugCheckParameter1 00000000
BugCheckParameter2 8053f0f0
BugCheckParameter3 cc0000ff
BugCheckParameter4 20040189
PaeEnabled 00000000
KdDebuggerDataBlock 8053dde0
MiniDumpFields 00000dff

TRIAGE_DUMP32:
ServicePackBuild 00000100
SizeOfDump 00010000
ValidOffset 0000fffc
ContextOffset 00000320
ExceptionOffset 000007d0
MmOffset 00001068
UnloadedDriversOffset 000010a0
PrcbOffset 00001878
ProcessOffset 000024c8
ThreadOffset 00002720
CallStackOffset 00002978
SizeOfCallStack 00003000
DriverListOffset 00005c08
DriverCount 000000a3
StringPoolOffset 00008c70
StringPoolSize 00001680
BrokenDriverOffset 00000000
TriageOptions 00000041
TopOfStack f2ebd000
DebuggerDataOffset 00005978
DebuggerDataSize 00000290
DataBlocksOffset 0000a2f0
DataBlocksCount 00000003


Windows XP Kernel Version 2600 (Service Pack 1) UP Free x86 compatible
Kernel base = 0x804d4000 PsLoadedModuleList = 0x8054be30
Debug session time: Tue Apr 27 12:57:06 2004
System Uptime: 0 days 0:05:41
start end module name
804d4000 806c6980 nt Checksum: 0020230B Timestamp: Thu Aug 29 05:03:24 2002 (3D6DE35C)

Unloaded modules:
f309f000 f30af000 NAVENG.Sys Timestamp: unavailable (00000000)
f2dff000 f2e90000 NavEx15.Sys Timestamp: unavailable (00000000)
f2ea0000 f2eb0000 NAVENG.Sys Timestamp: unavailable (00000000)
f2dff000 f2e90000 NavEx15.Sys Timestamp: unavailable (00000000)
f78c8000 f78d8000 NAVENG.Sys Timestamp: unavailable (00000000)
f4238000 f42c9000 NavEx15.Sys Timestamp: unavailable (00000000)
f2f30000 f2f57000 kmixer.sys Timestamp: unavailable (00000000)
f38c1000 f38e8000 kmixer.sys Timestamp: unavailable (00000000)
f7de7000 f7de8000 drmkaud.sys Timestamp: unavailable (00000000)
f3abe000 f3acb000 DMusic.sys Timestamp: unavailable (00000000)
f7cb2000 f7cb4000 splitter.sys Timestamp: unavailable (00000000)
f3ace000 f3adc000 swmidi.sys Timestamp: unavailable (00000000)
f3923000 f3946000 aec.sys Timestamp: unavailable (00000000)
f7b80000 f7b85000 Cdaudio.SYS Timestamp: unavailable (00000000)
f757c000 f757f000 Sfloppy.SYS Timestamp: unavailable (00000000)
end.
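
The bug check parameters in the dump header carry more detail than the STOP code alone. As far as I understand Microsoft's description of bug check 0x9C on x86, parameter 1 is the machine-check bank number and parameters 3 and 4 are the high and low 32 bits of that bank's MCi_STATUS register. The sketch below splits that value into the flag bits and code fields laid out in Intel's machine-check architecture; it does not try to interpret the error codes themselves, and the function name is made up for illustration.

```python
def decode_mci_status(high, low):
    """Split a 64-bit MCi_STATUS value into its architectural flag bits and codes."""
    status = (high << 32) | low
    flag_bits = {
        "VAL": 63,    # register contains valid error information
        "OVER": 62,   # an earlier error was overwritten
        "UC": 61,     # error was not corrected by hardware
        "EN": 60,     # error reporting was enabled for this error
        "MISCV": 59,  # MCi_MISC register holds additional information
        "ADDRV": 58,  # MCi_ADDR register holds a valid address
        "PCC": 57,    # processor context may be corrupted
    }
    decoded = {name: bool((status >> bit) & 1) for name, bit in flag_bits.items()}
    decoded["mca_error_code"] = status & 0xFFFF
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF
    return decoded


# BugCheckParameter3 and BugCheckParameter4 from the minidump above.
info = decode_mci_status(0xCC0000FF, 0x20040189)
for key, value in info.items():
    print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

For this dump the sketch reports VAL, OVER, MISCV and ADDRV set, UC, EN and PCC clear, an MCA error code of 0x189 and a model-specific code of 0x2004, all in bank 0 (BugCheckParameter1).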

Prime95
I ran Prime95 to put load on the system and it failed the torture test. But
no clues as to why - not necessarily cpu ?
Readout - Beginning a continuous self test to check computer.
Test1, 4000 Lucas-Lehmer iterations of M19922945 using 1024k FFT length.
FATAL ERROR:Writing to temp file.
Error opening results file to output this message:
Unable to open log file.
Torture Test ran 0 minutes - 1 errors, 0 warnings.
Execution halted.

CPU Stability Test ver.6
I ran the Normal test mode, and it lasted about 9 minutes before crashing.
No telling if it was because of strain on the cpu as the machine crashes
like that anyway, even when not under a load.

So, definitely a hardware problem :) But how can I be specific and sure?
Would a POST diagnostic card tell me whether it's the cpu, the mobo or
something else? I don't have a spare cpu or mobo to swop in as known-good parts.
 
Guest

"Frank" <anyone_1@anywhere_1.com> wrote in message
news:V5qlc.17446$3Q4.275809@news20.bellglobal.com...
> Winxp Home. Updated except for the buggy KB835732.
> Gigabyte mobo GA-8S650GXM (socket 478).
> Celeron 2400
> DDR 2100 - 128mb.
>
> Experiencing frequent (several times a day) hardware crashes and automatic
> reboots. I have turned off the control panel option (system>advanced) to
> automatically reboot after a crash, but it still does so anyway.
>
> Machine_Check_Exception.
> There is a blue screen error called Machine_Check_Exception, just prior to
> the crash & reboot, but it's impossible to read it as it's only on for an
> instant. Definitely not a Windows screen. I believe it's an Intel message
> from the cpu diagnosing a hardware error (perhaps cpu or mobo).
>
> There is no pattern to the crashes and no relation to what programs may be
> running - sometimes it happens when the machine is standing idle.
>

Your symptoms point to an iffy power supply* or a motherboard component
failing. I'm wondering if you got one of those motherboards with the bad
batch of caps on it? Open the case and use a flashlight to carefully
examine all of the components on the motherboard, most especially the
capacitors. In case you don't know, those are the small coke can shaped
components standing upright and soldered directly to the motherboard. You
should probably see many of them on your motherboard, with likely a cluster
of several of them near your CPU socket. Examine all sides of them that you
can see, look CAREFULLY for bulges or deformities. Also look for capacitors
that are leaning to one side or another, with no reasonable explanation,
such as being crowded by another component. Also look for capacitors that
are discolored with a brownish discharge. If you notice any of these
symptoms, that points to a capacitor that has failed, and a bad cap could
EASILY explain the problems you are having.

Unless you see something obvious like a bad cap, this one is going to be
tough to trace. I'm afraid you will just have to replace components until
you have a stable system. Start with the power supply, replace it with
something by Seasonic in the ~400W range. If that doesn't help, consider
picking up a different motherboard, possibly off ebay. (it's always good to
save some money, where possible, and you won't want to spend too much money
on a Celeron motherboard) -Dave

* While a multimeter can be useful in diagnosing SEVERE problems with a
power supply, it doesn't take severe power supply problems to cause a system
to be unstable.
 

frank


> I'm wondering if you got one of those motherboards with the bad
> batch of caps on it?

Is this a known issue with this brand & model of mobo?


> * While a multimeter can be useful in diagnosing SEVERE problems with a
> power supply, it doesn't take severe power supply problems to cause a
> system to be unstable.
>
But running Prime95's "torture test" is supposed to simulate a max load on
the psu?
 
Guest

"Frank" <anyone_1@anywhere_1.com> wrote in message
news:nDulc.18327$ZJ5.579028@news20.bellglobal.com...
> > I'm wondering if you got one of those motherboards with the bad
> > batch of caps on it?
>
> Is this a known issue with this brand & model of mobo?
>
>
> > * While a multimeter can be useful in diagnosing SEVERE problems with a
> > power supply, it doesn't take severe power supply problems to cause a
> > system to be unstable.
> >
> But running Prime95's "torture test" is supposed to simulate a max load
> on the psu?

The bad caps are a known issue with all brands and models of motherboards.
I believe it was at its worst with boards built a couple of years ago. Max
load or not, an intermittent problem with a PSU could cause really bizarre
symptoms like you were describing earlier. If you've got time to kill, try
running the torture test while watching the PSU with a multimeter. If the
12V rail drops below 11.5V or fluctuates more than 0.5V, I'd replace that
puppy whether it's the cause of your current problems or not. :) -Dave
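
The rule of thumb above (12V rail below 11.5V, or swinging by more than 0.5V) can be written down as a small check over readings jotted down while the torture test runs. This is only a sketch: the two limits are the ones from this post, while the function name and the sample readings are hypothetical.

```python
def rail_needs_attention(samples, floor=11.5, max_swing=0.5):
    """Apply the rule of thumb: flag a 12V rail that dips below `floor`
    or whose readings vary by more than `max_swing` volts."""
    lowest, highest = min(samples), max(samples)
    return lowest < floor or (highest - lowest) > max_swing


# Hypothetical multimeter readings taken under load (not from the thread).
steady = [11.97, 11.92, 11.88, 11.95]
sagging = [11.97, 11.60, 11.40]

print(rail_needs_attention(steady))
print(rail_needs_attention(sagging))
```

A handheld multimeter updates slowly, so a pass here does not rule out short dips; a fail is a fairly strong hint.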
 

jad

Distinguished
Mar 30, 2004
1,324
0
19,280

Norton system works? Norton anti virus? Office Find FAST? Media
sniffer?
 
Guest

I had the same problem. It was the cpu overheating.
Check the fan and thermal paste.
 

frank


Was your fan working? Mine is.


> I had the same problem. It was the cpu overheating.
> check the fan and thermal paste.
>
>