Supermicro P6DGU: flaky SCSI disk I/O, how to diagnose?

Guest

Archived from groups: alt.comp.periphs.mainboard.supermicro,comp.periphs.scsi

About 6 months ago I started having SCSI disk problems on my
Supermicro P6DGU. On a cold start it would take the machine 2-3 minutes
to recognize the disk at all; after that it would run fine. Then it took
longer and longer, and then I started getting disk errors. I figured the
disk was old and bad, so I swapped it out for an identical spare. Same
issues. Hmmm. I figured something else was wrong, so as a test I
switched to a completely different disk, not even the same technology
(the old Maxtor Atlas 10K was U160; the new Quantum can do U320,
although of course the motherboard will only drive it at U160).

But I'm still having problems, and in fact they're getting worse: more
random errors, longer startup from power-off, etc. I can't believe
3 different drives are all bad, so I'm wondering if this 5-year-old SM
motherboard is giving up the ghost as far as SCSI is concerned. But
I'm at a loss as to how even to begin debugging this, short of buying a
replacement motherboard on eBay and swapping the current one out.

Ideas? Similar experiences? Thoughts on approaching this? I've liked
the P6DGU (this is destined for a server application, eventually), but
at this point maybe I should just junk it and get another mobo?

tia, -jonathan r-

PS: The disk is attached directly to the SCSI connector on the
motherboard (using the SM-supplied cable with an active terminator
built in). It's the only device on the SCSI chain. I'm running a
single PIII 1 GHz Coppermine; the second CPU slot has a terminator in
it. OS is Win98SE (yeah, it's old, sue me). Not overclocked.

PPS: It's not termination. The bus is terminated with an active
terminator and was running fine before this with the same
cables/hardware.
 
Guest

"Jonathan Rogers" <thatseattleguy@hotmail.com> wrote in message news:ba7d6941.0409291910.45f3738@posting.google.com...
> About 6 months ago I started having SCSI disk problems on my
> Supermicro P6DGU. It would take the machine 2-3 minutes to recognize
> the disk at all on cold startup, then it would run fine. Then it took
> longer and longer, then I started getting disk errors...
>
Try different cables, and check all the power supply voltages.
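For anyone following the voltage suggestion with the BIOS hardware monitor or a multimeter, here is a minimal sketch of the comparison. It assumes the usual ATX rail tolerances (±5% on +3.3 V, +5 V and +12 V; ±10% on -12 V); the function names are hypothetical, and the readings in the example are made-up values, not measurements from this board.

```python
# Minimal sketch: compare measured PSU rail voltages against nominal values.
# ASSUMPTION: standard ATX tolerances (+/-5% for +3.3V, +5V, +12V; +/-10% for -12V).
RAIL_SPECS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def rail_in_spec(rail: str, measured: float) -> bool:
    """Return True if `measured` lies within the assumed tolerance band for `rail`."""
    nominal, tolerance = RAIL_SPECS[rail]
    margin = abs(nominal) * tolerance  # abs() so the band is correct for the negative rail
    return nominal - margin <= measured <= nominal + margin


def check_rails(readings: dict) -> dict:
    """Map each rail name in `readings` to True (in spec) or False (out of spec)."""
    return {rail: rail_in_spec(rail, volts) for rail, volts in readings.items()}


if __name__ == "__main__":
    # Hypothetical example readings, not taken from the P6DGU in this thread.
    example = {"+3.3V": 3.28, "+5V": 4.80, "+12V": 11.30, "-12V": -11.50}
    for rail, ok in check_rails(example).items():
        print(f"{rail:>6}: {'OK' if ok else 'OUT OF SPEC'}")
```

A rail that reads in spec at idle can still sag under load (for example while a drive spins up), so it is worth watching the readings at power-on as well.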
 
Guest

"Jonathan Rogers" <thatseattleguy@hotmail.com> wrote in message news:ba7d6941.0409291910.45f3738@posting.google.com
> ...so I'm wondering if this 5-year-old SM
> motherboard is giving up the ghost as far as SCSI is concerned. But
> I'm at a loss to even begin to debug this...short of buying a
> replacement motherboard on Ebay and swapping the current one out.

Right, when there's no setup utility in the SCSI controller and no jumpers on
the drives to fiddle with, and your brain doesn't work, what else can one do?

> PPS: It's not termination. The bus is terminated with an active terminator
> and was running fine before this with the same cables/hardware.)

Can I have it, please?
I have always liked indestructible objects that have unlimited lifetimes.
 
Guest

Jonathan Rogers wrote:

> About 6 months ago I started having SCSI disk problems on my
> Supermicro P6DGU. It would take the machine 2-3 minutes to recognize
> the disk at all on cold startup, then it would run fine. Then it took
> longer and longer, then I started getting disk errors...

You may want to check the board and make sure all the traces are still
intact. I actually had one of them burn out. It caused odd problems like
the ones you described, but only once in a while, when the SCSI disks
were under load.

Matt
 
Guest

Archived from groups: alt.comp.periphs.mainboard.supermicro

It may also be the power supply.

If the power supply is no longer putting out full current, then it would
take longer and longer for the HDs to spin up to speed.

"Matt Reuther" <mreuther@umich.edu> wrote in message
news:CJI9d.720$tM3.422@news.itd.umich.edu...
> Jonathan Rogers wrote:
>
>> About 6 months ago I started having SCSI disk problems on my
>> Supermicro P6DGU. It would take the machine 2-3 minutes to recognize
>> the disk at all on cold startup, then it would run fine. Then it took
>> longer and longer, then I started getting disk errors...
>
> You may want to check the board and make sure all the traces are still
> there. I actually had one of them burn. It caused odd problems like you
> described, but only once in a while when it the SCSI disks were under
> load.
>
> Matt
 
Guest

> "Jonathan Rogers" <thatseattleguy@hotmail.com> wrote:
> > About 6 months ago I started having SCSI disk problems on my
> > Supermicro P6DGU. It would take the machine 2-3 minutes to recognize
> > the disk at all on cold startup, then it would run fine. Then it took
> > longer and longer, then I started getting disk errors...

Following up on my original post and the suggestions I received:
the motherboard has onboard monitors for all the power supply voltages,
and they all turned out to be in spec. Ditto for the SCSI cable, which
I tested and rebuilt (to reduce its length and to try a new active
terminator). A run of a S.M.A.R.T. diagnostic tool from Quantum
confirmed that the drive was happy and had no internal errors, but also
that the system occasionally couldn't even read from or write to the
drive's buffer. So I suspected something "upstream"...

I pulled the motherboard out of the server completely, and the
problem's source instantly became clearer :). One of the UART ICs near
the SCSI chips didn't look right; I tapped it lightly with a pen and
it (literally) fell off the board into my lap. Ditto for the other two
next to it. I suspect really poor soldering, which in my experience is
unusual for Supermicro, whose boards are normally of very high build
quality. I'm sending the board out to get the capacitors replaced (some
are bulging slightly; see www.badcaps.com for an interesting perspective
on this problem) and to get the UARTs resoldered. Unfortunately it's out
of warranty, but this is the easiest route to getting it fixed at this point.

Many thanks to the helpful respondents (L.David, Timothy, Matt) on
this thread for their suggestions on this - appreciated. NO thanks to
the other respondent, whose life is apparently so empty that he feels
a compulsion to fill its yawning void by spending all his waking hours
on Usenet posting snide, unhelpful asides that serve only to show he's
an ignorant Eurotrash wannabe who doesn't even fully read messages
before launching into his infantile little
I-am-so-smart-you-are-obviously-SO-stoopid routine. Or something like
that. :)

yours, /jr/