NAS server and failing drives

miken2137

Honorable
Jul 20, 2012
31
0
10,540
Greetings,

I am building a NAS server for my company, but I'm running into a problem with the HDDs failing, or at least not being read by the system. Here are the specs:

SERVER CHASSIS: Norco RPC-4224 http://www.newegg.com/Product/Product.aspx?Item=N82E16811219038
MOTHERBOARD: Asus Z8NA-D6C http://www.newegg.com/Product/Product.aspx?Item=N82E16813131378
PSU: Athena 2 x 800W Redundant http://www.newegg.com/Product/Product.aspx?Item=N82E16817338062&Tpk=ATHENA AP-RRP4ATX6808
PROCESSOR: Intel Xeon E5645 http://www.amazon.com/Intel-E5645-P...=1348586998&sr=8-1&keywords=Intel+Xeon+E5645
HDD: Seagate Barracuda ST2000DM001 2TB http://www.newegg.com/Product/Product.aspx?Item=N82E16822148834
RAID CARD: Areca ARC-1231ML-2G http://www.newegg.com/Product/Product.aspx?Item=N82E16816151033

We're running this on FreeNAS, with the drives in a RAID 6 array. Basically what happens is everything runs fine for a while, then one day multiple drives will be down. At first 5 drives "failed", so we RMA'd them. Once we got that batch back, 2 of those 5 and 2 more of the 12 "failed". I sent all 12 drives back and they sent me 12 new ones. Now I'm at a point where I don't know whether I should attempt to use these 12 drives or replace them with a different brand. Perhaps it isn't even drive failure but another piece of hardware, perhaps the RAID card? I don't even know where to begin testing this, so any input would be greatly appreciated.
 

ELMO_2006

Honorable
Aug 29, 2012
368
0
10,810


It sounds like the drives are being dropped because they are timing out during error recovery; once the controller's timeout threshold is exceeded, the array drops them. This is the problem WD's TLER setting addresses at the HDD firmware level.

I would seriously look into HDDs that are designed for your requirements. The Seagate drives you have listed are desktop HDDs and as such should not be used in anything other than RAID 0/1.
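One hedged way to check whether your drives even support a controllable error-recovery timeout (SCT ERC, the generic equivalent of WD's TLER) is with smartmontools; the device name below is an example, so substitute your own:

```shell
# Query the drive's current SCT error-recovery setting.
# Desktop drives often report "SCT Error Recovery Control command not supported",
# which is exactly why they time out and get dropped from RAID arrays.
smartctl -l scterc /dev/ada0

# If the drive supports it, cap read/write recovery at 7 seconds
# (values are in tenths of a second) so the controller doesn't drop it:
smartctl -l scterc,70,70 /dev/ada0
```

Note that on many desktop drives this setting resets on every power cycle, so it has to be reapplied at boot.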

Hope this helps.
 

njxc500

Distinguished
Apr 30, 2008
181
0
18,710
Is there a reason you're running a RAID card instead of straight ZFS with RAID-Z or RAID-Z2? FreeNAS is super popular because of these options, and considering that your issues may trace back to compatibility, the solution may be to simplify.
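For reference, the RAID-Z2 setup suggested above can be sketched like this (pool name and device names are hypothetical examples):

```shell
# Create a RAID-Z2 pool: like RAID 6, it tolerates two simultaneous
# drive failures, but ZFS handles parity, checksums, and repair itself,
# so no hardware RAID card is involved.
zpool create tank raidz2 da0 da1 da2 da3 da4 da5

# Check pool health; ZFS lists any drives it has faulted or degraded:
zpool status tank
```

One design consequence: because ZFS checksums every block, it can tell a drive that returned bad data from a drive that merely responded slowly, which a hardware controller generally cannot.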

I think that could be an area of potential conflict; the Areca card could be interpreting something wrong. Yes, 5 drives "could" fail, but I doubt it. Something isn't playing well. I've had two FreeNAS servers up for a couple of years with desktop drives and ZFS, and I've had no failures yet other than the operating system drive, but that was an easy fix. I'm running 5 drives in one unit, 4 in the other.

Have you checked out the logs in FreeNAS? There may be clues in there.
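A sketch of where to look, assuming shell access to the FreeNAS box (it's FreeBSD-based, so log paths follow FreeBSD conventions):

```shell
# Recent kernel messages often show drive timeouts or controller resets
# right around the moment a drive "fails":
dmesg | grep -i -E 'error|timeout|retry'

# The persistent system log is worth searching too; ata/ada/da are the
# usual FreeBSD disk device prefixes:
grep -i -E 'ata|ada|da[0-9]' /var/log/messages | tail -n 50
```

If the same handful of port numbers keeps showing up regardless of which drives are installed, that points at the controller, backplane, or cabling rather than the drives.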

Nick

 

vegettonox

Distinguished
Oct 11, 2006
617
0
19,060
There generally should be no issue running desktop drives in a RAID environment, but that being said, certain models have been known to not play well. The aforementioned WD drives would time out and cause problems in RAID.

I built a 24-drive, 40 TB RAID 6 system for the business I work at, using standard Hitachi GST Deskstar 7K2000 HDS722020ALA330 drives. The system has been running non-stop with no failed drives. The only issue I had was a firmware-related bug for XFS filesystems on the 3ware 9650SE-24M8 RAID controller. I applied a driver and firmware update, repaired the few small corrupt files there were, and the problem was gone.

It would be nice to be able to buy enterprise drives for all of our needs, or for that matter, multiple systems with redundancies built in, but sometimes small companies can't afford it. I read a report on this site a while back from a data recovery service (German or Russian, I don't recall) stating there is no evidence that enterprise drives are in any way more durable or longer-lasting than standard models.

I will tell you this, however: before putting this system into operation, I systematically tested every single drive with the manufacturer's diagnostic software. I would suspect a cable or your RAID controller, to be honest with you. Try OpenFiler, the Linux NAS distro I use on my home and business RAID arrays; it's absolutely fantastic. Use version 2.3 if you can.

I recommend you test these "failed" drives with diagnostic software before you label them as such. If they are indeed all going bad, there's probably something else afoot.
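As a sketch of that testing step using smartmontools rather than the vendor tools (the device name is an example; run this per suspect drive):

```shell
# Overall health verdict plus the drive's logged hardware errors:
smartctl -H -l error /dev/ada0

# Start the drive's built-in extended (long) self-test; this runs on the
# drive itself and takes several hours on a 2 TB disk:
smartctl -t long /dev/ada0

# Once it finishes, read back the self-test results:
smartctl -l selftest /dev/ada0
```

A drive that passes its extended self-test but still gets dropped by the array is a strong hint that the controller, backplane, or timeout behavior is the real culprit.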

In response to a statement from njxc500 regarding why he is using a RAID controller instead of ZFS: you do realize he is using a roughly 24-bay RAID chassis, right? 99 percent of motherboards don't have that many connections, for a start. RAID controllers have dedicated hardware and specialized software to handle all the background tasks, and the array appears as a single drive to software. You can move RAID controllers between different systems and software without altering the data in any way. If he chose to go with another software distribution later on, he could simply install it and configure it.

Software RAID has major drawbacks in speed, throughput, data consistency, and data protection (hardware cards offer battery backup), and let's not forget that the system's processor has to handle all the requests the entire time.

In case anyone was wondering, here is the hardware our system is using:

24 x Hitachi GST Deskstar 7K2000 HDS722020ALA330
3ware 9650SE-24M8 PCI Express x8 SATA II (3.0 Gb/s) RAID 6 Controller Card
Intel Xeon X3360 Yorkfield 2.83GHz
2x Kingston ValueRAM 4GB (2 x 2GB) 240-Pin DDR2 SDRAM ECC
Intel S3210SHLX Server Motherboard

Chenbro RM51924M2-R1350G Server Case (awesome btw)
http://usa.chenbro.com/corporatesite/products_detail.php?sku=44

We use a domain environment with Active Directory that is connected through OpenFiler for share administration; file-level permissions are controlled by the Windows server from that point. OpenFiler is a really excellent distribution, although I know it's not perfect for everyone.
 