2 RAIDs, 1 chassis

gbm_14

Distinguished
Dec 8, 2010
If I build two 8-drive RAIDs, each connected to its own controller, in the same server chassis, will they be more or less reliable if all of the drives spin at the same rate? The answer is not obvious, and here's why:

It's best if all drives in a RAID have the same capacity and rotational speed, because (all other things being equal) the RAID will be faster and larger than otherwise. I don't need to be convinced on this point.
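For anyone reading along, the capacity half of that is easy to make concrete (drive sizes below are made up; most controllers treat every member as if it were the size of the smallest drive):

# Rough sketch of usable capacity when a RAID is built from mismatched drives.
# The drive sizes are made-up illustrative figures, not my actual hardware.

def raid_capacity_tb(drive_sizes_tb, parity_drives=1):
    """Most controllers size every member to the smallest drive, then
    subtract the parity drives (1 for RAID 5, 2 for RAID 6)."""
    usable_per_drive = min(drive_sizes_tb)
    return usable_per_drive * (len(drive_sizes_tb) - parity_drives)

matched = [2.0] * 8            # eight 2 TB drives
mixed = [2.0] * 7 + [1.0]      # one smaller 1 TB drive drags the whole array down

print(raid_capacity_tb(matched))   # 14.0 TB usable as RAID 5
print(raid_capacity_tb(mixed))     # 7.0 TB usable -- half the capacity lost to one mismatched drive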

It's also true that RAIDs are best built with drives that create as little vibration as possible, and that can tolerate vibration better than average drives, because excess vibration causes bearing wear and may lead to premature drive failure.

With 16 drives in one chassis, all spinning at almost exactly the same rate, it seems plausible to this lapsed physicist that the peak amplitude of vibrations created by the drives is potentially higher than if there are 8 drives spinning at each of two different rates. But is the difference enough to affect the reliability of the servers, or is it insignificant, or even cancelled out by something else I haven't considered?

The question is not an idle one, because I need to configure not just one but a rack full of such servers. I'm looking for reliability (because the servers will be physically inaccessible for long intervals), not raw speed, and cost is definitely an issue (otherwise I'd be using terabytes of SSDs!). I'm fortunate to have a climate-controlled space with good power and adequate UPSs. I have the chance to build these from scratch and I'd like to get it right the first time.

My googling suggests this is not a frequently asked question. I haven't found anything relevant -- although there is no shortage of advice to avoid mixing drives of different speeds in a single RAID, which is not what I want to do.

Any relevant advice or experience would be very welcome. I'm particularly interested to know if anyone has actually built servers using multiple RAIDs spinning at different speeds, and if there is any evidence that doing so influences reliability, one way or the other.

Thanks!
 

lysinger

Distinguished
Nov 26, 2010
This is a doozy of a question.

What exactly are you going to use a rack of these servers for?

Let me first say that I have not put 2 RAIDs of different-speed drives in one server.

If I remember rightly, the vibrational amplitude you are concerned about would only take place if all your drives were spindle-synchronized so they spin in lockstep. Some RAID controllers can do this, but your drives need to support the feature as well. Some SCSI drives can, and I assume SAS drives can too; I am not sure SATA drives can.

Some additional things to consider:

The heat of 16 drives is going to require some extra fans blowing on the drives to ensure proper cooling. Assuming 10 W of heat per WD Caviar Black drive (I run 3 in a RAID 5, and that's their peak power draw), that would be 160 W of potential heat (probably less, but you do need to keep things cool).
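In rough numbers (per-drive wattages are assumptions from typical spec sheets, not measurements):

# Back-of-the-envelope heat load for a 16-drive chassis.
# Per-drive wattages are assumed typical values, not measured figures.

drives = 16
watts_idle = 6.0     # assumed idle draw of a 7200 rpm drive
watts_peak = 10.0    # assumed seek/peak draw (the figure quoted above)

print(f"idle heat: {drives * watts_idle:.0f} W")    # ~96 W
print(f"peak heat: {drives * watts_peak:.0f} W")    # 160 W, the worst case above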

If you mix drive speeds, you could have 5400 rpm, 7200 rpm, 10K, or 15K drives. The slower drives put out less heat and less vibration. 10K and 15K drives tended to wear out sooner due to their faster motors.

You could use rubber washers to reduce the vibration of each drive you bolt into a chassis, but then you run into a problem: swapping out the drive when it goes bad.

Do you shut down your server to remove and replace a drive, or do you get a hot-swap chassis, pull the failed drive, put in a new one, and let the RAID array rebuild itself? The hot-swap chassis costs more, but what is the time cost of shutting down a server?

The next thing to consider is your drives. Do you go with SATA or SAS? SAS drives are built to be sturdier and to handle larger workloads, or so they say. SAN engineers will tell you that SATA drives work great, but that when one drive goes, the next drive or two tend to die while the array is in the middle of rebuilding onto the replacement. I don't know how true that is today, as that info is 5 years old and they wanted to sell us SAS drives at a MUCH higher price point at the time.
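If you want a rough number on that rebuild risk, the usual back-of-the-envelope sketch looks like this (the unrecoverable-read-error rates are assumed spec-sheet figures, not anything I've measured):

import math

# Rough odds that a RAID 5 rebuild hits an unrecoverable read error (URE),
# since the rebuild must read every surviving drive end to end.
# URE rates below are assumed spec-sheet figures: ~1 per 1e14 bits for
# consumer SATA, ~1 per 1e15 for enterprise-class drives.

def p_rebuild_hits_ure(surviving_drives, drive_tb, ure_rate_bits=1e14):
    bits_read = surviving_drives * drive_tb * 1e12 * 8   # bits the rebuild reads
    return 1 - math.exp(-bits_read / ure_rate_bits)      # Poisson approximation

# 8-drive RAID 5 of 2 TB drives: the rebuild reads the 7 survivors.
print(f"{p_rebuild_hits_ure(7, 2.0):.0%}")          # ~67%, roughly a 2-in-3 chance
print(f"{p_rebuild_hits_ure(7, 2.0, 1e15):.0%}")    # ~11% with enterprise-class drives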

You may want to consider a SAN and a blade server chassis. They have ridiculous amounts of capacity and redundancy depending on budget, direct interfaces into tape backup units, and so on. Booting over the SAN is fast with Fibre Channel or with iSCSI over 10 Gb Ethernet; you can use 1 Gb Ethernet if you are on a budget and speed is not an issue. The list goes on.

 

gbm_14

Distinguished
Dec 8, 2010
lysinger, thanks for your thoughtful and detailed reply. What I eventually decided was to use 7200 rpm enterprise drives throughout, since I can't find any evidence that mixing the speeds will make a measurable difference in drive failure rate, and choosing a single speed means my spare-parts inventory can be smaller. If anyone does ever try mixing speeds, I'd still like to know what happens. With any luck, though, SSDs will get cheap enough so I won't need rotating drives to replace these in 5 years!

You raised a number of interesting points to which I've added my comments below, even though my original question is moot. Thanks again for writing.

What exactly are you going to use a rack of these servers for?

They log (large amounts of) experimental data and make them available for remote retrieval. My application is medical research, but you can easily imagine many other situations in which one might want high-reliability data logging in relatively inaccessible settings, such as weather, oceanographic, geophysical, or astronomical studies.

If I remember rightly, the vibrational amplitude you are concerned about would only take place if all your drives were spindle-synchronized so they spin in lockstep.

This could result in sustained vibration at a high amplitude if the drives were not only in sync but also wobbled in phase. (For the same reason, soldiers don't march in step across bridges.) Without the drives being in sync, their slightly different rotational speeds result in their individual wobbles adding together in a time-varying way (you may recall the concept of "beat frequency" from high-school physics or from learning how to tune a stringed instrument). If you have a RAID of unsynced drives, it's easy to observe this by touching the enclosure for a minute or so. My guess (and it's just a guess -- that's why I asked the original question) is that the peak amplitude of vibration is a more critical factor in drive failure than mean amplitude.
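Since I brought up beats, here's a toy version of the arithmetic (two idealized drives modelled as pure sinusoids with made-up rpm figures; real drives are far messier):

import math

# Toy illustration of the beat effect: two idealized drives at slightly
# different rotational speeds, each modelled as a unit-amplitude sinusoid.
# The rpm values, and the model itself, are assumptions for illustration only.

rpm_a, rpm_b = 7200.0, 7212.0            # 12 rpm apart
f_a, f_b = rpm_a / 60.0, rpm_b / 60.0    # 120.0 Hz and 120.2 Hz
beat_hz = f_b - f_a                      # 0.2 Hz -> one beat every 5 seconds

# sin(a) + sin(b) = 2*sin(mean)*cos(half the difference), so the combined vibration
# is a ~120 Hz wobble whose amplitude swings slowly between 0 and 2:
for i in range(11):
    t = i * 0.5                          # sample every half second
    envelope = 2 * abs(math.cos(math.pi * beat_hz * t))
    print(f"t = {t:4.1f} s   amplitude envelope ~ {envelope:4.2f}")

That slow swing is the throb you can feel by hand; drives locked to exactly the same speed with their wobbles in phase would instead sit at the maximum all the time.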

The heat of 16 drives is going to require some extra fans blowing on the drives to ensure proper cooling.

That's certainly true. Anyone considering something similar should do the math and be sure they have fans that can provide adequate airflow, and they should be monitoring the temperature in the enclosure to be sure the fans are doing their job.
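The rule-of-thumb arithmetic looks roughly like this (the heat load and the allowed temperature rise are assumptions; check your own drives' spec sheets):

# Rule-of-thumb airflow estimate: CFM ~ 3.16 * watts / delta-T (in degrees F),
# which follows from the heat capacity of air. Inputs below are assumptions.

heat_watts = 160.0      # worst-case figure for 16 drives, from the earlier post
delta_t_f = 18.0        # let the air warm about 10 C (18 F) crossing the drive bays

cfm_needed = 3.16 * heat_watts / delta_t_f
print(f"~{cfm_needed:.0f} CFM across the drive bays")    # ~28 CFM

A single ordinary case fan is rated well above that on paper, but only if the airflow actually passes over the drives rather than around them.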

If you mix drive speeds, you could have 5400 rpm, 7200 rpm, 10K, or 15K drives. The slower drives put out less heat and less vibration. 10K and 15K drives tended to wear out sooner due to their faster motors.

Agreed. In my application, reliability is of highest importance and even the slowest available drives are faster than needed.

You could use rubber washers to reduce the vibration of each drive you bolt into a chassis, but then you run into a problem: swapping out the drive when it goes bad.

I've used the rubber (or silicone) grommets to reduce noise in my workstation, and they work. In rack-mounted servers the drives are usually packed together very tightly and there may be no room for grommets -- and more importantly, drive cooling in these machines relies on large contact areas between the drives and the thermally conductive drive sleds, so I worry that even if I can use grommets, I may be trading one problem for another. The hot-swap drive sleds make replacing failing/failed drives very easy.

Do you shut down your server to remove and replace a drive, or do you get a hot-swap chassis, pull the failed drive, put in a new one, and let the RAID array rebuild itself? The hot-swap chassis costs more, but what is the time cost of shutting down a server?

In my case, using a hot-swap chassis is a no-brainer. The extra cost is about $20/drive, and replacing a drive takes a minute or two at most, with no downtime. With a conventional chassis that requires taking the server out of the rack to swap a drive, it's more like 20-30 minutes of work, and downtime. If either your time or your server's uptime is worth anything, the cost is worth it. I figure that I could use the $20/drive to buy 20% more drives at $100 each, but (once again) reliability is the issue. YMMV.
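The arithmetic behind that, roughly (the labor rate and times are my own assumptions, and downtime isn't even counted):

# Rough per-replacement cost comparison, hot-swap bay vs. conventional chassis.
# Labor rate and times are assumed figures; downtime cost is left out entirely,
# which understates the hot-swap advantage.

hotswap_premium = 20.0     # extra cost per drive bay, dollars
labor_rate = 60.0          # assumed dollars per hour for whoever swaps the drive

hot_minutes, cold_minutes = 2.0, 25.0
hot_cost = hotswap_premium + labor_rate * hot_minutes / 60
cold_cost = labor_rate * cold_minutes / 60

print(f"hot-swap:  ${hot_cost:.2f} per replacement")              # ~$22
print(f"cold-swap: ${cold_cost:.2f} per replacement + downtime")  # ~$25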

The next thing to consider is your drives. Do you go with SATA or SAS? SAS drives are built to be sturdier and to handle larger workloads, or so they say. SAN engineers will tell you that SATA drives work great, but that when one drive goes, the next drive or two tend to die while the array is in the middle of rebuilding onto the replacement. I don't know how true that is today, as that info is 5 years old and they wanted to sell us SAS drives at a MUCH higher price point at the time.

I looked briefly at SAS, but didn't seriously consider it because of the cost. The same physical drives sold with SATA interfaces are often available with SAS interfaces, so it's not a given that a SAS drive is more robust than a similar SATA drive, although this might be true of drives designed specifically for SAS interfaces, if there are any.

AFAIK, SAS drives might offer a speed advantage (although both SATA and SAS interfaces top out at 6 gigabits per second, much faster than I need). If I needed more speed, I would want to consider hybrid SSD-cached SATA RAIDs, which appear to be faster and cheaper than SAS and much cheaper than an all-SSD solution.
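A quick sanity check on the interface-speed point (the sustained-transfer figure is an assumed typical value for a 7200 rpm drive, not a benchmark):

# Why the 6 Gb/s link is nowhere near the bottleneck for a spinning drive.
# The sustained-transfer figure is an assumed typical spec, not a measurement.

link_gbps = 6.0
link_mb_s = link_gbps * 1000 / 10     # ~600 MB/s usable after 8b/10b encoding
drive_mb_s = 150.0                    # assumed sustained rate of a 7200 rpm SATA drive

print(f"link:  ~{link_mb_s:.0f} MB/s")
print(f"drive: ~{drive_mb_s:.0f} MB/s ({drive_mb_s / link_mb_s:.0%} of the link)")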

You may want to consider a SAN and a blade server chassis. They have ridiculous amounts of capacity and redundancy depending on budget, direct interfaces into tape backup units, and so on. Booting over the SAN is fast with Fibre Channel or with iSCSI over 10 Gb Ethernet; you can use 1 Gb Ethernet if you are on a budget and speed is not an issue. The list goes on.

I use several NAS units over 1 Gbps Ethernet for backups. I haven't used tape backup for years (too expensive, too slow, too little capacity, too fragile, too labor-intensive, but aside from that, I've nothing against them :) ).