A Question About SMART numbers

Guest · Aug 9, 2004

Archived from groups: comp.sys.ibm.pc.hardware.storage (More info?)

I've just installed a SMART utility (everest) on a Y/O laptop
and have questions about the SMART numbers it produces.
(I know that disks do ECC error recovery routinely, and
an individual event isn't a reason to replace the disk.)

I can't make sense of the relationship of the "Threshold", "Value",
"Worst", and "Data" columns because the "data" value is frequently in
excess of "worst", but the status is still OK.

I see a few numbers here that might worry me. How is my disk doing?

[ HITACHI_DK23DA-20 (14L6TL) ]

Attribute                            Threshold  Value  Worst   Data  Status
Raw Read Error Rate                         50    100    100    101  OK: normal
Throughput Performance                      50    100    100   4010  OK: normal
Start/Stop Count                             0     98     98   2422  OK: passing
Reallocated Sector Count                    10    100    100      5  OK: normal
Seek Error Rate                             50    100    100    452  OK: normal
G-Sense Error Rate                           0    100     99    145  OK: passing
Hardware ECC Recovered                       0    100     90    113  OK: passing
Reallocation Event Count                     0    100    100      5  OK: passing
Current Pending Sector Count                 0     99     99      1  OK: passing
Off-Line Uncorrectable Sector Count          0     99     99      4  OK: passing
Ultra ATA CRC Error Rate                     0    200    200     13  OK: passing
Load Retry Count                             0    100    100    384  OK: passing
Read Error Retry Rate                        0    100      1    514  OK: passing

Thanks

--
Al Dykes
-----------
adykes at p a n i x . c o m

Guest · Aug 10, 2004

In article <cf8hti$igq$1@panix3.panix.com>, Al Dykes <adykes@panix.com> wrote:
:
:I've just installed a SMART utility (everest) on a Y/O laptop
:and have questions about the SMART numbers it produces.
:(I know that disks do ECC error recovery routinely, and
:an individual event isn't a reason to replace the disk.)
:
:I can't make sense of the relationship of the "Threshold", "Value",
:"Worst", and "Data" columns because the "data" value is frequently in
:excess of "worst", but the status is still OK.
:
:I see a few numbers here that might worry me. How is my disk doing?
:
: [ HITACHI_DK23DA-20 (14L6TL) ]
:
:Attribute                            Threshold  Value  Worst   Data  Status
:Raw Read Error Rate                         50    100    100    101  OK: normal
:Throughput Performance                      50    100    100   4010  OK: normal
:Start/Stop Count                             0     98     98   2422  OK: passing
:Reallocated Sector Count                    10    100    100      5  OK: normal
:Seek Error Rate                             50    100    100    452  OK: normal
:G-Sense Error Rate                           0    100     99    145  OK: passing
:Hardware ECC Recovered                       0    100     90    113  OK: passing
:Reallocation Event Count                     0    100    100      5  OK: passing
:Current Pending Sector Count                 0     99     99      1  OK: passing
:Off-Line Uncorrectable Sector Count          0     99     99      4  OK: passing
:Ultra ATA CRC Error Rate                     0    200    200     13  OK: passing
:Load Retry Count                             0    100    100    384  OK: passing
:Read Error Retry Rate                        0    100      1    514  OK: passing

The numbers in the "Data" column are raw values. Their exact meaning is
arbitrary, but can sometimes be inferred. For example, 5 defective
sectors have been reallocated to spares. Those raw numbers are then
normalized, by formulas known only to the manufacturer, usually to a
range of 0 (worst) to 100 (best). That result is what is shown in the
"Value" column, and an alarm condition is indicated when that number
drops below the "Threshold" number. The "Worst" column shows the worst
(lowest) normalized value seen during the life of the device.
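
To make the comparison concrete, here is a minimal Python sketch of the check described above, using three rows copied from the posted table. The names SmartAttr and has_failed are made up for illustration. One assumption to flag: the wording above is "drops below", while smartmontools documents the test as "less than or equal to" the threshold, and that is the form used here. For the posted numbers both forms give the same answer.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    name: str
    threshold: int
    value: int   # current normalized value
    worst: int   # lowest normalized value recorded so far
    raw: int     # vendor-specific "Data" column


def has_failed(attr: SmartAttr) -> bool:
    """True when the normalized value has reached or dropped below the threshold.

    Assumption: uses <= as smartmontools documents it; the raw value is ignored.
    """
    return attr.value <= attr.threshold


# Three rows copied from the table in the original post.
attrs = [
    SmartAttr("Raw Read Error Rate", 50, 100, 100, 101),
    SmartAttr("Reallocated Sector Count", 10, 100, 100, 5),
    SmartAttr("Read Error Retry Rate", 0, 100, 1, 514),
]

for a in attrs:
    status = "FAILED" if has_failed(a) else "OK"
    print(f"{a.name}: value={a.value} threshold={a.threshold} raw={a.raw} -> {status}")
```

Note that the raw "Data" number never enters the comparison, which is why it can exceed "Worst" without changing the status.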

What I see in the above numbers is that in the past something happened
to the drive that caused a high Read Error Retry Rate (Worst = 1). I
would guess that resulted in the reallocation of 5 sectors, with 1
additional bad sector currently flagged for reallocation the next time
that sector is written. The Read Error Retry Rate is now back to a
normalized value of 100 (good). I'd worry if the Reallocated Sector
Count continues to grow, but otherwise the drive appears to be in good
shape.
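
The same reading can be done mechanically: look for attributes whose "Worst" is lower than the current "Value", which means the normalized value dipped at some point and has since recovered. A rough Python sketch, with rows copied from the posted table and a hypothetical helper name past_dips:

```python
# (name, threshold, value, worst, raw) rows copied from the original post.
rows = [
    ("G-Sense Error Rate", 0, 100, 99, 145),
    ("Hardware ECC Recovered", 0, 100, 90, 113),
    ("Reallocation Event Count", 0, 100, 100, 5),
    ("Current Pending Sector Count", 0, 99, 99, 1),
    ("Read Error Retry Rate", 0, 100, 1, 514),
]


def past_dips(rows):
    """Return (name, worst, value) for attributes whose worst is below the current value."""
    return [
        (name, worst, value)
        for name, _threshold, value, worst, _raw in rows
        if worst < value
    ]


# List the deepest dip first.
for name, worst, value in sorted(past_dips(rows), key=lambda d: d[1]):
    print(f"{name}: dipped to {worst}, now {value}")
```

This only flags that a dip happened. It says nothing about when it happened or what caused it.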

--
Bob Nichols AT comcast.net I am "rnichols42"

Guest · Aug 10, 2004

In article <cf92g2$pf8$1@omega-3a.right.here>,
Robert Nichols <SEE_SIGNATURE@localhost.localdomain.invalid> wrote:
>In article <cf8hti$igq$1@panix3.panix.com>, Al Dykes <adykes@panix.com> wrote:
>:
>:I've just installed a SMART utility (everest) on a Y/O laptop
>:and have questions about the SMART numbers it produces.
>:(I know that disks do ECC error recovery routinely, and
>:an individual event isn't a reason to replace the disk.)
>:
>:I can't make sense of the relationship of the "Threshold", "Value",
>:"Worst", and "Data" columns because the "data" value is frequently in
>:excess of "worst", but the status is still OK.
>:
>:I see a few numbers here that might worry me. How is my disk doing?
>:
>: [ HITACHI_DK23DA-20 (14L6TL) ]
>:
>:Attribute                            Threshold  Value  Worst   Data  Status
>:Raw Read Error Rate                         50    100    100    101  OK: normal
>:Throughput Performance                      50    100    100   4010  OK: normal
>:Start/Stop Count                             0     98     98   2422  OK: passing
>:Reallocated Sector Count                    10    100    100      5  OK: normal
>:Seek Error Rate                             50    100    100    452  OK: normal
>:G-Sense Error Rate                           0    100     99    145  OK: passing
>:Hardware ECC Recovered                       0    100     90    113  OK: passing
>:Reallocation Event Count                     0    100    100      5  OK: passing
>:Current Pending Sector Count                 0     99     99      1  OK: passing
>:Off-Line Uncorrectable Sector Count          0     99     99      4  OK: passing
>:Ultra ATA CRC Error Rate                     0    200    200     13  OK: passing
>:Load Retry Count                             0    100    100    384  OK: passing
>:Read Error Retry Rate                        0    100      1    514  OK: passing
>
>The numbers in the "Data" column are raw values. Their exact meaning is
>arbitrary, but can sometimes be inferred. For example, 5 defective
>sectors have been reallocated to spares. Those raw numbers are then
>normalized, by formulas known only to the manufacturer, usually to a
>range of 0 (worst) to 100 (best). That result is what is shown in the
>"Value" column, and an alarm condition is indicated when that number
>drops below the "Threshold" number. The "Worst" column shows the worst
>(lowest) normalized value seen during the life of the device.
>
>What I see in the above numbers is that in the past something happened
>to the drive that caused a high Read Error Retry Rate (Worst = 1). I
>would guess that resulted in the reallocation of 5 sectors, with 1
>additional bad sector currently flagged for reallocation the next time
>that sector is written. The Read Error Retry Rate is now back to a
>normalized value of 100 (good). I'd worry if the Reallocated Sector
>Count continues to grow, but otherwise the drive appears to be in good
>shape.
>
>--
>Bob Nichols AT comcast.net I am "rnichols42"

Bingo. Right on.

I had a BSOD crash that resulted in an unbootable XP system. It
would come halfway up, crash, and reboot. It smelled like a disk
problem.

I did a low-level format and ran the procedure that Compaq wanted, and
it gave an OK, so I didn't have a way to get Compaq to give me a new
disk. Then I booted Linux and ran badblocks overnight and it didn't
show any problems, so I reimaged from a backup and it's been running
fine. That was months ago.

Thanks

--
Al Dykes
-----------
adykes at p a n i x . c o m

joeP · Aug 10, 2004

"Al Dykes" <adykes@panix.com> wrote in message
news:cf8hti$igq$1@panix3.panix.com...
>
> I've just installed a SMART utility (everest) on a Y/O laptop
> and have questions about the SMART numbers it produces.
> (I know that disks do ECC error recovery routinely, and
> an individual event isn't a reason to replace the disk.)
>
> I can't make sense of the relationship of the "Threshold", "Value",
> "Worst", and "Data" columns because the "data" value is frequently in
> excess of "worst", but the status is still OK.
>

The 'worst' values relate to the 'threshold' values.

--
Joep

Guest · Aug 11, 2004

"Al Dykes" <adykes@panix.com> wrote in message news:cf9ee2$935$1@panix3.panix.com...
> In article <cf92g2$pf8$1@omega-3a.right.here>, Robert Nichols <SEE_SIGNATURE@localhost.localdomain.invalid> wrote:
> > In article <cf8hti$igq$1@panix3.panix.com>, Al Dykes <adykes@panix.com> wrote:
> > :
> > : I've just installed a SMART utility (everest) on a Y/O laptop
> > : and have questions about the SMART numbers it produces.

> > : (I know that disks do ECC error recovery routinely,

You do now, do you? It's about time.
Oh, and while ECC 'on the fly' error recovery is routine, 'routinely'
isn't as often as it sounds but more often than that 113 in the statistics.

> > : and an individual event isn't a reason to replace the disk.)

Yeah, you would be replacing them on a daily basis if it did.
The 'Hardware ECC Recovered' count appears to be linked to the
ERP count (Read error retries).

> > :
> > : I can't make sense of the relationship of the "Threshold", "Value",
> > : "Worst", and "Data" columns because the "data" value is frequently
> > : in excess of "worst", but the status is still OK.
> > :
> > : I see a few numbers here that might worry me. How is my disk doing?
> > :
> > : [ HITACHI_DK23DA-20 (14L6TL) ]
> > :
> > : Attribute                            Threshold  Value  Worst   Data  Status
> > : Raw Read Error Rate                         50    100    100    101  OK: normal
> > : Throughput Performance                      50    100    100   4010  OK: normal
> > : Start/Stop Count                             0     98     98   2422  OK: passing
> > : Reallocated Sector Count                    10    100    100      5  OK: normal
> > : Seek Error Rate                             50    100    100    452  OK: normal
> > : G-Sense Error Rate                           0    100     99    145  OK: passing
> > : Hardware ECC Recovered                       0    100     90    113  OK: passing
> > : Reallocation Event Count                     0    100    100      5  OK: passing
> > : Current Pending Sector Count                 0     99     99      1  OK: passing
> > : Off-Line Uncorrectable Sector Count          0     99     99      4  OK: passing
> > : Ultra ATA CRC Error Rate                     0    200    200     13  OK: passing
> > : Load Retry Count                             0    100    100    384  OK: passing
> > : Read Error Retry Rate                        0    100      1    514  OK: passing
> >
> > The numbers in the "Data" column are raw values. Their exact meaning is
> > arbitrary, but can sometimes be inferred.

'Guessed at' as it is 'vendor specific and proprietary' and hopefully every
manufacturer uses the same spot and datawidth in the 'Device Attribute Data
Structure'.

> > For example, 5 defective sectors have been reallocated to spares.
> > Those raw numbers are then normalized, by formulas known only to
> > the manufacturer, usually to a range of 0 (worst) to 100 (best).
> > That result is what is shown in the "Value" column,

> > and an alarm condition is indicated

It depends on the value of the 'Pre-failure/Advisory bit' what type of
alarm is indicated.
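
As a rough illustration of that distinction, here is a small Python sketch. It assumes the pre-failure/advisory bit is bit 0 of the attribute's flags word, which is how smartmontools decodes it as far as I know. The flag words below are hypothetical, since the Everest report in this thread does not show them, and alarm_type is a made-up helper name.

```python
PREFAILURE_BIT = 0x0001  # assumption: bit 0 of the attribute's flags word


def alarm_type(flags: int, value: int, threshold: int) -> str:
    """Classify an attribute as 'ok', 'pre-failure' or 'advisory'."""
    if value > threshold:
        return "ok"
    # The threshold has been reached: the flag bit decides how serious it is.
    return "pre-failure" if flags & PREFAILURE_BIT else "advisory"


# Hypothetical flag words and values, for illustration only.
print(alarm_type(0x0001, 5, 10))    # bit set, threshold reached
print(alarm_type(0x0002, 5, 10))    # bit clear, threshold reached
print(alarm_type(0x0001, 100, 50))  # value still above threshold
```

A pre-failure alarm is the one that predicts imminent failure; an advisory alarm typically only reports wear or usage beyond the design figure.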

> > when that number drops below the "Threshold" number.
> > The "Worst" column shows the worst (lowest) normalized value seen
> > during the life of the device.
> >
> > What I see in the above numbers is that in the past something happened
> > to the drive that caused a high Read Error Retry Rate (Worst = 1).
> > I would guess that resulted in the reallocation of 5 sectors, with 1
> > additional bad sector currently flagged for reallocation the next time
> > that sector is written. The Read Error Retry Rate is now back to a
> > normalized value of 100 (good).

> > I'd worry if the Reallocated Sector Count continues to grow,

Why? Why not worry now? It has happened before, so it can happen
again (unless you happen to know what it was and that it won't happen again,
if you can help it).
It may happen again and also stop again, and how will that then be different
from the first time?

Or did you mean to say 'keeps growing steadily'? That would make more sense.

> > but otherwise the drive appears to be in good shape.

Yup, it looks like a temporary event that went by and the predictive
statistics returned to safe values, either over time all by itself or through the LLF.
Question is:
what did happen, and can it happen again if you don't do anything about it?
Maybe the 'G-Sense Error Rate' has something to do with it?

> >
> >--
> > Bob Nichols AT comcast.net I am "rnichols42"
>
>
> Bingo. Right on.
>
> I had a BSOD crash that resulted in an unbootable XP system.
> It would come halfway up, crash, and reboot. It smelled like a disk
> problem.
>
> I did a low-level format and ran the procedure that Compaq wanted, and
> it gave an OK, so I didn't have a way to get Compaq to give me a new disk.
> Then I booted Linux and ran badblocks overnight and it didn't show any
> problems, so I reimaged from a backup and it's been running fine.
> That was months ago.

Who was that again who said:
" Yank the drive. Life's too short to have unexpected total failures. "

>
> Thanks
>
> --
> Al Dykes

Guest · Aug 11, 2004

In article <2nsgfuF47rjnU1@uni-berlin.de>,
Folkert Rienstra <folkertdotrienstra@freeler.nl> wrote:
>"Al Dykes" <adykes@panix.com> wrote in message news:cf9ee2$935$1@panix3.panix.com...
>> In article <cf92g2$pf8$1@omega-3a.right.here>, Robert Nichols <SEE_SIGNATURE@localhost.localdomain.invalid> wrote:
>> > In article <cf8hti$igq$1@panix3.panix.com>, Al Dykes <adykes@panix.com> wrote:
>> > :
>> > : I've just installed a SMART utility (everest) on a Y/O laptop
>> > : and have questions about the SMART numbers it produces.
>
>> > : (I know that disks do ECC error recovery routinely,
>
>You do now, do you? It's about time.
>Oh, and while ECC 'on the fly' error recovery is routine, 'routinely'
>isn't as often as it sounds but more often than that 113 in the statistics.
>
>> > : and an individual event isn't a reason to replace the disk.)
>
>Yeah, you would be replacing them on a daily basis if it did.
>The 'Hardware ECC Recovered' count appears to be linked to the
>ERP count (Read error retries).
>
>> > :
>> > : I can't make sense of the relationship of the "Threshold", "Value",
>> > : "Worst", and "Data" columns because the "data" value is frequently
>> > : in excess of "worst", but the status is still OK.
>> > :
>> > : I see a few numbers here that might worry me. How is my disk doing?
>> > :
>> > : [ HITACHI_DK23DA-20 (14L6TL) ]
>> > :
>> > : Attribute                            Threshold  Value  Worst   Data  Status
>> > : Raw Read Error Rate                         50    100    100    101  OK: normal
>> > : Throughput Performance                      50    100    100   4010  OK: normal
>> > : Start/Stop Count                             0     98     98   2422  OK: passing
>> > : Reallocated Sector Count                    10    100    100      5  OK: normal
>> > : Seek Error Rate                             50    100    100    452  OK: normal
>> > : G-Sense Error Rate                           0    100     99    145  OK: passing
>> > : Hardware ECC Recovered                       0    100     90    113  OK: passing
>> > : Reallocation Event Count                     0    100    100      5  OK: passing
>> > : Current Pending Sector Count                 0     99     99      1  OK: passing
>> > : Off-Line Uncorrectable Sector Count          0     99     99      4  OK: passing
>> > : Ultra ATA CRC Error Rate                     0    200    200     13  OK: passing
>> > : Load Retry Count                             0    100    100    384  OK: passing
>> > : Read Error Retry Rate                        0    100      1    514  OK: passing
>> >
>> > The numbers in the "Data" column are raw values. Their exact meaning is
>> > arbitrary, but can sometimes be inferred.
>
>'Guessed at' as it is 'vendor specific and proprietary' and hopefully every
>manufacturer uses the same spot and datawidth in the 'Device Attribute Data
>Structure'.
>
>> > For example, 5 defective sectors have been reallocated to spares.
>> > Those raw numbers are then normalized, by formulas known only to
>> > the manufacturer, usually to a range of 0 (worst) to 100 (best).
>> > That result is what is shown in the "Value" column,
>
>> > and an alarm condition is indicated
>
>It depends on the value of the 'Pre-failure/Advisory bit' what type of
>alarm is indicated.
>
>> > when that number drops below the "Threshold" number.
>> > The "Worst" column shows the worst (lowest) normalized value seen
>> > during the life of the device.
>> >
>> > What I see in the above numbers is that in the past something happened
>> > to the drive that caused a high Read Error Retry Rate (Worst = 1).
>> > I would guess that resulted in the reallocation of 5 sectors, with 1
>> > additional bad sector currently flagged for reallocation the next time
>> > that sector is written. The Read Error Retry Rate is now back to a
>> > normalized value of 100 (good).
>
>> > I'd worry if the Reallocated Sector Count continues to grow,
>
>Why? Why not worry now? It has happened before, so it can happen
>again (unless you happen to know what it was and that it won't happen again,
>if you can help it).
>It may happen again and also stop again, and how will that then be different
>from the first time?
>
>Or did you mean to say 'keeps growing steadily'? That would make more sense.
>
>> > but otherwise the drive appears to be in good shape.
>
>Yup, it looks like a temporary event that went by and the predictive
>statistics returned to safe values, either over time all by itself or through the LLF.
>Question is:
>what did happen, and can it happen again if you don't do anything about it?
>Maybe the 'G-Sense Error Rate' has something to do with it?
>
>> >
>> >--
>> > Bob Nichols AT comcast.net I am "rnichols42"
>>
>>
>> Bingo. Right on.
>>
>> I had a BSOD crash that resulted in an unbootable XP system.
>> It would come halfway up, crash, and reboot. It smelled like a disk
>> problem.
>>
>> I did a low-level format and ran the procedure that Compaq wanted, and
>> it gave an OK, so I didn't have a way to get Compaq to give me a new disk.
>> Then I booted Linux and ran badblocks overnight and it didn't show any
>> problems, so I reimaged from a backup and it's been running fine.
>> That was months ago.
>
>Who was that again who said:
>" Yank the drive. Life's too short to have unexpected total failures. "
>

Me.

Since it was a personal machine, I wanted to experiment, and I had the
time, and didn't have a spare (which I always have on a business
site). So I took the time to experiment. The fact that it was passing
tests, and Compaq wanted it to fail before they'd swap it, meant I'd be
out $150.

If it comes to keeping an employee productive I'm set up to swap a
machine out and reimage it. Fast and cost effective, but I don't
learn much about anything but imaging.

(And you must be mistaking me for someone else. I've always stated
that ECC/FEC is heavily used on disks and has been for
decades. There's Maximal Probability stuff that makes reading a data
track a lot like demodulating a radio signal on a noisy channel.)

--
Al Dykes
-----------
adykes at p a n i x . c o m

Guest · Aug 13, 2004

"Al Dykes" <adykes@panix.com> wrote in message news:cfdb4n$ick$1@panix3.panix.com
> In article 2nsgfuF47rjnU1@uni-berlin.de, Folkert Rienstra <xxxxxxxx> wrote:

You really enjoy posting people's reply addresses on the internet, don't you, Al?
Maybe the abuse department at Panix.com has more influence on you than I have
to get you to conform to good Usenet practice.

> > "Al Dykes" <adykes@panix.com> wrote in message news:cf9ee2$935$1@panix3.panix.com...
> > > In article <cf92g2$pf8$1@omega-3a.right.here>, Robert Nichols <SEE_SIGNATURE@localhost.localdomain.invalid> wrote:
> > > > In article <cf8hti$igq$1@panix3.panix.com>, Al Dykes <adykes@panix.com> wrote:
> > > > >
> > > > > I've just installed a SMART utility (everest) on a Y/O laptop
> > > > > and have questions about the SMART numbers it produces.
> >
> > > > > (I know that disks do ECC error recovery routinely,
> >
> > You do now, do you? It's about time.
> > Oh, and while ECC 'on the fly' error recovery is routine, 'routinely'
> > isn't as often as it sounds but more often than that 113 in the statistics.
> >
> > > > > and an individual event isn't a reason to replace the disk.)
> >
> > Yeah, you would be replacing them on a daily basis if it did.
> > The 'Hardware ECC Recovered' count appears to be linked to the
> > ERP count (Read error retries).
> >
> > > > >
> > > > > I can't make sense of the relationship of the "Threshold", "Value",
> > > > > "Worst", and "Data" columns because the "data" value is frequently
> > > > > in excess of "worst", but the status is still OK.

[snip]

> ..... the drive appears to be in good shape.
> >
> > Yup, it looks like a temporary event that went by and the predictive
> > statistics returned to safe values, either over time all by itself or through the LLF.
> > Question is:
> > what did happen, and can it happen again if you don't do anything about it?
> > Maybe the 'G-Sense Error Rate' has something to do with it?
> >

[snip]

> >
> > Who was that again who said:
> > " Yank the drive. Life's too short to have unexpected total failures. "
> >
>
> Me.
>
> Since it was a personal machine, I wanted to experiment, and I had the
> time, and didn't have a spare (which I always have on a business site).
> So I took the time to experiment. The fact that it was passing tests, and
> Compaq wanted it to fail before they'd swap it, meant I'd be out $150.

Right.
Your beliefs vary with whose pocket the replacement cost has to come out of.

>
> If it comes to keeping an employee productive I'm set up to swap a
> machine out and reimage it. Fast and cost effective, but I don't
> learn much about anything but imaging.

Yes, except that is not at all what you meant when you said that quote.

>
> (And you must be mistaking me for someone else, I've always stated
> that ECC/FEC is heavily used on disks and has been for decades.

So it is your opinion then that the "Al Dykes" <adykes@panix.com> that
said in the thread:
"Re: More life from hard disk with bad sectors"

> > > When ECC recovery happens the block is redirected to a spare, the data
> > > is written there and the application has no idea anything happened.

was an imposter, it can't have possibly come from you?

Or, alternatively, are you now saying that 'heavily used' ECC recovery,
resulting (by your own words) in heavy redirection to spares, 'creating' many
bad blocks, is daily practice (about once every minute) and therefore quite OK?

Quite a change for someone whose life motto is:

"
> IMHO once I see bad blocks the disk gets yanked and replaced on any
> PC that's being used for business purposes. Life's too short to have
> unexpected total failures.
"

You often haven't got a clue about what you are saying, have you, Al?

> There's Maximal Probability stuff that makes reading a data
> track a lot like demodulating a radio signal on a noisy channel.)

That about confirms it.
