Archived from groups: comp.sys.ibm.pc.hardware.storage
In article <2nsgfuF47rjnU1@uni-berlin.de>,
Folkert Rienstra <folkertdotrienstra@freeler.nl> wrote:
>"Al Dykes" <adykes@panix.com> wrote in message news:cf9ee2$935$1@panix3.panix.com...
>> In article <cf92g2$pf8$1@omega-3a.right.here>, Robert Nichols <SEE_SIGNATURE@localhost.localdomain.invalid> wrote:
>> > In article <cf8hti$igq$1@panix3.panix.com>, Al Dykes <adykes@panix.com> wrote:
>> > :
>> > : I've just installed a SMART utility (everest) on a Y/O laptop
>> > : and have questions about the SMART numbers it produces.
>
>> > : (I know that disks do ECC error recovery routinely,
>
>You do now, do you? It's about time.
>Oh, and while ECC 'on the fly' error recovery is routine, 'routinely'
>isn't as often as it sounds but more often than that 113 in the statistics.
>
>> > : and an individual event isn't a reason to replace the disk.)
>
>Yeah, you would be replacing them on a daily basis if it were.
>The 'Hardware ECC Recovered' count appears to be linked to the
>ERP count (Read error retries).
>
>> > :
>> > : I can't make sense of the relationship of the "Threshold", "Value",
>> > : "Worst", and "Data" columns because the "Data" value is frequently
>> > : in excess of "Worst", but the status is still OK.
>> > :
>> > : I see a few numbers here that might worry me. How is my disk doing?
>> > :
>> > : [ HITACHI_DK23DA-20 (14L6TL) ]
>> > :
>> > : Attribute                            Threshold  Value  Worst  Data  Status
>> > : Raw Read Error Rate                         50    100    100   101  OK: normal
>> > : Throughput Performance                      50    100    100  4010  OK: normal
>> > : Start/Stop Count                             0     98     98  2422  OK: passing
>> > : Reallocated Sector Count                    10    100    100     5  OK: normal
>> > : Seek Error Rate                             50    100    100   452  OK: normal
>> > : G-Sense Error Rate                           0    100     99   145  OK: passing
>> > : Hardware ECC Recovered                       0    100     90   113  OK: passing
>> > : Reallocation Event Count                     0    100    100     5  OK: passing
>> > : Current Pending Sector Count                 0     99     99     1  OK: passing
>> > : Off-Line Uncorrectable Sector Count          0     99     99     4  OK: passing
>> > : Ultra ATA CRC Error Rate                     0    200    200    13  OK: passing
>> > : Load Retry Count                             0    100    100   384  OK: passing
>> > : Read Error Retry Rate                        0    100      1   514  OK: passing
>> >
>> > The numbers in the "Data" column are raw values. Their exact meaning is
>> > arbitrary, but can sometimes be inferred.
>
>'Guessed at', as it is 'vendor specific and proprietary', and hopefully every
>manufacturer uses the same spot and data width in the 'Device Attribute Data
>Structure'.
>
>> > For example, 5 defective sectors have been reallocated to spares.
>> > Those raw numbers are then normalized, by formulas known only to
>> > the manufacturer, usually to a range of 0 (worst) to 100 (best).
>> > That result is what is shown in the "Value" column,
>
>> > and an alarm condition is indicated
>
>What type of alarm is indicated depends on the value of the
>'Pre-failure/Advisory' bit.
>
>> > when that number drops below the "Threshold" number.
>> > The "Worst" column shows the worst (lowest) normalized value seen
>> > during the life of the device.
>> >
>> > What I see in the above numbers is that in the past something happened
>> > to the drive that caused a high Read Error Retry Rate (Worst = 1).
>> > I would guess that resulted in the reallocation of 5 sectors, with 1
>> > additional bad sector currently flagged for reallocation the next time
>> > that sector is written. The Read Error Retry Rate is now back to a
>> > normalized value of 100 (good).
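
A minimal sketch in Python of the Threshold/Value/Worst rule described in
the quoted text, for readers who want to see it spelled out. The
SmartAttribute class and the assess() function are hypothetical names
invented for illustration, not part of Everest or any SMART tool. The
numbers are copied from the table above, and the strict "below threshold"
comparison follows the wording in the post.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    # Hypothetical container; the fields mirror the table columns.
    name: str
    threshold: int
    value: int   # current normalized value
    worst: int   # lowest normalized value ever recorded
    raw: int     # vendor-specific "Data" column, not used for the status


def assess(attr: SmartAttribute) -> str:
    """Classify one attribute using the rule described in the post."""
    if attr.value < attr.threshold:
        return "ALARM"        # normalized value dropped below the threshold
    if attr.worst < attr.value:
        return "past event"   # was worse at some point, has since recovered
    return "ok"


# A few rows copied from the Everest output quoted above.
ATTRIBUTES = [
    SmartAttribute("Reallocated Sector Count", 10, 100, 100, 5),
    SmartAttribute("G-Sense Error Rate", 0, 100, 99, 145),
    SmartAttribute("Hardware ECC Recovered", 0, 100, 90, 113),
    SmartAttribute("Current Pending Sector Count", 0, 99, 99, 1),
    SmartAttribute("Read Error Retry Rate", 0, 100, 1, 514),
]

for attr in ATTRIBUTES:
    print(f"{attr.name:30} {assess(attr)}")
```

The raw "Data" figure never enters the comparison, which is why a Data
number larger than Worst does not change the status.
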
>
>> > I'd worry if the Reallocated Sector Count continues to grow,
>
>Why? Why not worry now? It has happened before, so it can happen
>again (unless you happen to know what it was and that it won't happen again,
>if you can help it).
>It may happen again and also stop again, and how would that then be different
>from the first time?
>
>Or did you mean to say 'keeps growing steadily'? That would make more sense.
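
To make the "one event" versus "keeps growing steadily" distinction
concrete, here is a small Python sketch. The function name, the sample
histories, and the cutoff (more than one increase between successive
readings counts as growing) are all assumptions made for illustration.
Only the raw count of 5 comes from the posted table.

```python
def classify_growth(readings):
    """Classify raw Reallocated Sector Count samples, oldest first."""
    # Difference between each reading and the one before it.
    steps = [later - earlier for earlier, later in zip(readings, readings[1:])]
    increases = sum(1 for step in steps if step > 0)
    if increases == 0:
        return "stable"
    if increases == 1:
        return "one-off event"   # a single jump that then stopped
    return "growing"             # assumed cutoff: more than one increase


# Made-up histories; neither was posted in the thread.
one_jump = [0, 0, 5, 5, 5, 5]    # jumps to 5 once and stays there
steady = [0, 1, 3, 4, 7]         # rises at every reading

print(classify_growth(one_jump))
print(classify_growth(steady))
```

This only works if the count is logged periodically. A single snapshot
like the posted table cannot tell the two cases apart.
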
>
>> > but otherwise the drive appears to be in good shape.
>
>Yup, it looks like a temporary event that went by, after which the predictive
>statistics returned to safe values, either over time all by themselves or
>because of the LLF (low-level format).
>The question is:
>what did happen, and can it happen again if you don't do anything about it?
>Maybe the 'G-Sense Error Rate' has something to do with it?
>
>> >
>> >--
>> > Bob Nichols AT comcast.net I am "rnichols42"
>>
>>
>> Bingo. Right on.
>>
>> I had a BSOD crash that resulted in an unbootable XP system.
>> It would come half-way up, crash, and reboot. It smelled like a disk
>> problem.
>>
>> I did a low-level format and ran the procedure that Compaq wanted, and
>> it gave an OK, so I didn't have a way to get Compaq to give me a new disk.
>> Then I booted Linux and ran badblocks overnight, and it didn't show any
>> problems, so I reimaged from a backup and it's been running fine.
>> That was months ago.
>
>Who was that again who said:
>" Yank the drive. Life's too short to have unexpected total failures. "
>
Me.
Since it was a personal machine and I didn't have a spare (which I
always have on a business site), I took the time to experiment. The
fact that it was passing tests and Compaq wanted it to fail before
they'd swap it meant I'd be out $150.
When it comes to keeping an employee productive, I'm set up to swap a
machine out and reimage it. Fast and cost-effective, but I don't
learn much about anything but imaging.
(And you must be mistaking me for someone else; I've always stated
that ECC/FEC is heavily used on disks and has been for
decades. There's maximum-likelihood detection stuff that makes reading
a data track a lot like demodulating a radio signal on a noisy channel.)
--
Al Dykes
-----------
adykes at p a n i x . c o m