Help me interpret Adaptec 71605 reporting SMART errors (RAID6)

pushpull

Honorable
Oct 27, 2013
21
0
10,510
Hi you brilliant guys & girls, I would much appreciate some input on this. I recently spent a vast amount of money building a new home server with the following relevant specs:
-Adaptec 71605 RAID controller
-14 x 6TB Western Digital Red disks (RAID6)
-Windows 2012 R2 Datacenter OS

My issue is that maxView Storage Manager reports the following for one of the physical disks:
-State: Optimal
-S.M.A.R.T. Error: Yes
-S.M.A.R.T. Warnings: 1


I am familiar with SMART values, but looking at this disk's values in maxView Storage Manager I do not see any actual errors, so I have no way of pinpointing what is causing the alert. Some SMART values can be ignored, some can tolerate a few failures but not many, and some mean the disk should be replaced right away.

So I contacted Adaptec support and they replied the following:
"SMART error interpretation is specific to the drive manufacturer. The controller only alerts you if a SMART error is received. For further information regarding the error you must contact the drive manufacturer or use their disk tools offline to test the drive. In the case of a RAID member reporting SMART, we recommend you fail the drive, replace it and take the drive with errors offline for further analysis."

So if I listen to them, I basically have to buy a brand new disk, replace the one reporting errors, and then run offline checks, only to potentially find no errors or an error that won't qualify for a warranty replacement. So what should I do? The hardware was so expensive that this is a bad time to go out and purchase yet another disk. See the screenshots below.

Summary of disk reporting errors:
http://imgur.com/YZFWbSB

SMART values on disk, page 1:
http://imgur.com/fqHIwT0

SMART values on disk, page 2:
http://imgur.com/kNx8cau
(raw values on the right are both 0)

All suggestions are much, much appreciated. Is there a way to perform a check without taking the disk out of the RAID, or to pinpoint what the alert actually is without messing with my RAID? What would you guys do? Thanks in advance :)

 
Hmm, I don't really see any kind of SMART errors there, though.

Since they are SATA drives and you are running RAID 6, I would try this.

Power off the server. Remove the drive reporting the error. Plug it into another PC and run another SMART program. I use CrystalDiskInfo, but I have just started using Hard Disk Sentinel since it also handles SAS and SCSI drives, which CrystalDiskInfo does not.

See if anything pops up. This shouldn't mess up your RAID since 1) the server is powered off and 2) as long as you don't write any data to the drive or format it, putting it back when you are done should not trigger a rebuild.
 

pushpull

Hi, and thanks for your reply. That's the thing: I don't see any errors myself. Running the offline tests on another box while the server is powered down is a brilliant idea, though. I will keep the thread updated. Thanks!
 

pushpull

Hi again guys. I was planning on doing as drtweak suggested, but I have been quite busy, and I didn't think there was any major rush since these were warnings and the disk status was "Optimal". I did see a second warning about a week later, and it stayed like that for over a week until I checked in again and found that they are all gone. How is that even possible, and how would you guys interpret this? I thought SMART errors stay recorded on the disk regardless of reboots etc. As of right now all disks are green again, and the disk that previously had 2 SMART warnings now looks totally fine and reports 0 warnings:
http://imgur.com/yS11JRg

Should I consider this a flaw in the Adaptec software and just leave things the way they are, or would you guys still remove the disk to run tests? I use the box 24/7, so ideally I don't want to take it down for days for a disk check if it's not needed. I appreciate all input. Thanks in advance.
 
This is how the 3 main SMART error counters work.

Reallocated Sectors: The drive keeps spare sectors in reserve in case of bad sectors. When this number goes up, it means sectors elsewhere on the disk were failing, but the drive was able to recover the data and move it to spares.

Pending Sector Count: There is a sector that the drive is having trouble reading from or writing to. It is trying to get the data off that sector and move it to a spare sector.

Uncorrectable Sector Count: A sector that is bad and from which no data can be recovered. This is your typical bad sector.

This is how it works.

A sector starts to go bad. It is now marked as a pending sector. If the drive recovers the data, the sector is reallocated.

So what may have happened is that Pending went up by 1. Then the drive recovered the data and moved it to a spare sector, so Pending went down by 1 and Reallocated went up by 1.

This could have been what happened. I would still suggest pulling the drive and checking the SMART status on another PC while the drive is not attached to the RAID card.
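To make the counter movement concrete, here is a rough Python sketch that compares two snapshots of the three raw values and describes what changed. The attribute IDs 5, 197 and 198 are the ones SATA drives commonly use for these counters; the function name `interpret` and the sample numbers are made up for illustration and are not readings from the disk in this thread.

```python
# SMART attribute IDs as commonly reported by SATA drives.
REALLOCATED = 5      # Reallocated Sectors Count
PENDING = 197        # Current Pending Sector Count
UNCORRECTABLE = 198  # Offline Uncorrectable Sector Count


def interpret(before, after):
    """Compare two {attribute_id: raw_value} snapshots and describe the change.

    Hypothetical helper for illustration only.
    """
    d_realloc = after[REALLOCATED] - before[REALLOCATED]
    d_pending = after[PENDING] - before[PENDING]
    d_uncorr = after[UNCORRECTABLE] - before[UNCORRECTABLE]

    notes = []
    if d_pending > 0:
        notes.append(f"{d_pending} sector(s) newly pending")
    if d_pending < 0 and d_realloc > 0:
        # Pending dropped while Reallocated rose: data was moved to spares.
        moved = min(-d_pending, d_realloc)
        notes.append(f"{moved} pending sector(s) remapped to spares")
    elif d_pending < 0:
        # Pending dropped with no remap: the sector was rewritten successfully.
        notes.append(f"{-d_pending} pending sector(s) cleared without remapping")
    if d_realloc > 0 and d_pending >= 0:
        notes.append(f"{d_realloc} sector(s) reallocated")
    if d_uncorr > 0:
        notes.append(f"{d_uncorr} sector(s) became uncorrectable")
    return notes or ["no change"]


# Made-up example: one pending sector while the warning was showing,
# then one reallocated sector after the warning cleared.
during_warning = {REALLOCATED: 0, PENDING: 1, UNCORRECTABLE: 0}
after_warning = {REALLOCATED: 1, PENDING: 0, UNCORRECTABLE: 0}
print(interpret(during_warning, after_warning))
```

If you note down the raw values from the SMART pages while a warning is showing and again after it clears, a comparison like this tells you whether a sector was actually remapped or the warning went away for some other reason.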
 
Solution