Need Hot Swap Backup advice

February 9, 2011 6:40:14 PM

Hello, Tom's Hardware forums!

I need a bit of advice. I'm organizing a backup solution for a small business, and I need someone to reality-check my idea for me.

The business has very few people (<20), so if someone's hard drive fails catastrophically, it's a huge issue. I want to minimize an employee's downtime as much as possible, understandably.

My idea is to have a huge rack/array of hard drives, with each employee having their own dedicated backup drive. Every week, we will run a backup program (Acronis True Image Home 2011, most likely) that clones the user's hard drive over the network to their dedicated drive. The goal is that if the hard drive of an employee (let's call him Bob) fails, crashes, BSODs, explodes, has coffee spilled on it, etc., he can simply walk over to the server room, grab Bob's Hard Drive out of the hot-swap array, walk back to his computer, plug it in, and be up and running instantly.

We've got about 10 desktops that need backing up, with hard drive sizes ranging from 75GB to 1TB. Some of the computers are IDE, while others are SATA, so I need some sort of mounting solution that accommodates both types.


Am I a genius? Am I crazy? Does this make sense? Is this possible?

Let me know what you think!

a c 415 G Storage
February 9, 2011 7:14:00 PM

Are you saying you want swappable drives in the desktops too? I don't see how your solution would work unless the desktops are configured to boot right off a swappable drive. If that's true, then you're going to be retrofitting the desktops with a swappable solution so you could probably standardize on the interface type at the same time - in that case eSATA seems like it's likely to be the easiest solution.

If this is the only backup that's going to be made of these machines then IMHO it's not such a great idea. The cloned drives would only be a recent image and wouldn't provide a way for a user to recover from a file he accidentally deleted or screwed up more than a week ago. And it doesn't sound like it provides for offsite copies of the backups to recover from a common-mode failure (like theft, for example) that affects all the drives.
a c 172 G Storage
February 9, 2011 7:41:42 PM

1) Your backups will not be very current, perhaps a week old. Is this OK? Can a week's work be recovered quickly? I suspect not.
2) Clone backups take longer than incremental backups.
3) If a drive is virus infected, how would you know not to back it up to its clone and overwrite a good backup?
4) I do not think it is a good idea to have possibly non-tech-savvy people walking into a server room and taking out hardware.
5) Is there any need for business security? Theft of a hard drive would be easy to do.

-------------------------------------------

I suggest a more conventional solution.
Take an image copy and then do incremental backups frequently, according to recovery requirements.

Make some sort of plan for offsite storage to cover the possibility of premises damage, such as fire.

If recovery is required, you can then do it by file, or the complete drive.

Have a couple of good spare drives of each type available if a hard drive should fail.
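
To illustrate why an image plus incrementals allows recovery "by file, or the complete drive", here is a minimal Python sketch. It is a toy model only: the file names, the in-memory dictionaries, and the `restore` function are hypothetical and do not reflect how Acronis or any other backup product stores its data.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory model: a snapshot maps file path -> contents.
Snapshot = Dict[str, str]
# An incremental records only what changed; None marks a deleted file.
Incremental = Dict[str, Optional[str]]


def restore(base: Snapshot, incrementals: List[Incremental], upto: int) -> Snapshot:
    """Rebuild the drive state after applying the first `upto` incrementals."""
    state = dict(base)  # start from the full image copy
    for delta in incrementals[:upto]:
        for path, contents in delta.items():
            if contents is None:
                state.pop(path, None)   # file was deleted in this increment
            else:
                state[path] = contents  # file was created or changed
    return state


base = {"report.doc": "draft 1", "design.sch": "rev A"}
incrementals = [
    {"report.doc": "draft 2"},  # Monday
    {"design.sch": None},       # Tuesday: file deleted by mistake
    {"report.doc": "draft 3"},  # Wednesday
]

latest = restore(base, incrementals, upto=3)         # complete drive, most recent state
before_delete = restore(base, incrementals, upto=1)  # state before the accidental delete
```

Restoring to an earlier point brings back the deleted `design.sch`, which a single weekly clone that overwrites itself could not do.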
February 9, 2011 7:47:11 PM

The drives in the desktops will be swappable in that the (hypothetically) failed hard drive can be removed, and the cloned hard drive can be plugged into the same cables. I suppose that's just plain-old swapping, not hot-swapping. I asked about hot-swapping so that if his drive fails and Bob walks into the server room, he doesn't have to shut down the whole backup server in order to pull his backup hard drive out.

The main objective of this backup solution is to provide fast bounce-back in the event of drive failure. In the past, several employees have had drives fail on them, which has cost thousands of dollars in lost productivity (and any WIP projects).

I was brought in to come up with a solution that lets an employee resume work within minutes instead of days, with a minimum of lost time/effort. The weekly-backup timeframe is admittedly arbitrary. I just thought that running backups over the weekend would be convenient, so that computer resources weren't taken up by running the backup during the week.



True, it wouldn't protect against physical threats of theft or fire unless we were to make TWO weekly clones of each machine, and store one of the clones off-site. However, that sounds to me like it would then take twice as long to back up a user's machine.

---------------------------------------------------------------------------------------------------------------------

EDIT: Ooh, another reply while I was typing. You guys are fast!

Geofelt:
1) A week's work can be recovered more quickly than rebuilding from a total meltdown.

2) True, but incremental backups (to my knowledge) aren't directly bootable, are they?

3) I've been running around all week sticking Avast Antivirus on everybody's workstations. Hopefully that should guard against that.

4) Admittedly, this is Silicon Valley and our company is full of electronics engineers, so the default level of tech-savviness is fairly high. That's still a good point, but what would the alternative be? Having me drop everything and run down to the office every time a drive fails just so I can unplug it from one box and plug it into another? I feel like it's not THAT hard for a layperson to do.

4a) (....said the master juggler to his audience. :??:  )

5) The office building has lots of locks and security on it, and the little office we've rented has its own lock. To my knowledge, physical security isn't an issue.
a b G Storage
February 9, 2011 8:07:43 PM

If these are all desktops you could mirror the drives on the machines and there wouldn't be any downtime from a failed drive. This wouldn't help with other computer failures and doesn't replace the need for proper off-site backups.
a c 172 G Storage
February 9, 2011 8:20:02 PM

Many hard drives have a mean time to failure of >1,000,000 hours.
That is over 100 years. If you are having many failures, get some new drives, and keep them cool.
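
The arithmetic behind that figure can be sketched in a few lines of Python. This assumes a constant failure rate (a simple exponential model), which is an assumption of the sketch and not something a datasheet guarantees; the rated figure describes a large population of drives, not the lifetime of any single one.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mtbf_years(mtbf_hours: float) -> float:
    """Convert a rated mean time between failures from hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def annual_failure_rate(mtbf_hours: float) -> float:
    """Chance that one drive fails within a year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(drive_count: int, mtbf_hours: float) -> float:
    """Expected number of failed drives per year across a small fleet."""
    return drive_count * annual_failure_rate(mtbf_hours)


years = mtbf_years(1_000_000)                     # roughly 114 years
afr = annual_failure_rate(1_000_000)              # a bit under 1% per drive per year
fleet = expected_failures_per_year(10, 1_000_000) # for the 10 desktops in this thread
```

Under these assumptions a 10-desktop office would expect well under one drive failure per year, so a run of failures suggests something other than bad luck, such as old or overheated drives.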

If you want the cold swap capability, install mobile racks in the 5.25-inch drive bays of each PC. You can then put in the clone and boot.

Or...
If you have room for two, put a second drive in there, and let each individual determine how often they want to do a part or full backup.
a c 415 G Storage
February 9, 2011 9:51:20 PM

I agree with Jim - if all you're trying to do is to protect against drive failure, RAID 1 is the obvious solution. It'll be simpler than dealing with network cloning, will completely eliminate downtime due to drive failure, and the cost should be minimal if the desktop systems have chipsets that support "fake RAID". Even if you need to buy RAID controllers for the desktops, the cost for 10 of them is pretty small compared to the downtime costs you're talking about.

But if you go that route make sure you have some software in place that will report drive failures to the IT staff so that they can schedule replacement of the drives. RAID-1 that's unmonitored isn't particularly useful.

And don't forget that if there are important files stored on the client machines then you still need to back them up.
a c 115 G Storage
February 9, 2011 10:54:12 PM

Doesn't make any sense to me. A NAS or server in RAID 1 solves this issue easily and cheaply. Do all the people run the same programs? Same hardware? Is data stored locally or in a central location?
February 9, 2011 10:57:40 PM

A quick Wikipedia perusal later, I have to agree that RAID 1 sounds a lot like what I'm going for. From what I understand, anything and everything that happens on Hard Disk A gets instantly mirrored onto Hard Disk B. Is that right?

A lot of the machines in our office are old. Only one or two are running Windows 7, while the rest are lumbering around with Windows XP SP2 or 3. Will that still support fancy RAID stuff?

Can RAID 1 be done over a LAN?
a c 415 G Storage
February 9, 2011 11:06:27 PM

Oh dear. I don't mean to be condescending, but if you need to look up RAID in Wikipedia then I'm not sure you're the best person to be designing a fault tolerance strategy for a business...

You certainly don't want to get too fancy. I'd suggest using the RAID capabilities built into the motherboard for the newer machines, and getting cheap RAID cards that have XP drivers available for the older ones. That's probably the path that's the least likely to get you into trouble.

Be sure to test and document the recovery procedures for each system before you commit live data to it. The last thing you want is for a technician to replace the wrong drive after a drive failure.
February 10, 2011 12:02:10 AM

I know what RAID *IS*, but having never actually used it in my personal computer at home, I don't know the ins-and-outs of every RAID variant. I'll also admit that I'm doing this at entry-level, not really any sort of IT guru.

I'll keep poking around overnight and tomorrow, and make a few more posts then as I continue to wrangle my way through this. Everyone's got to start somewhere, right?

Thanks, sminlal and everyone!
February 10, 2011 3:27:09 PM

Another day, and some more research.

I talked with my administrator and he agrees that RAID 1 is a great idea. Unlike the various software backup solutions, a RAID 1 array is automatic, constant, doesn't require a great amount of fiddling, doesn't require purchases of a billion expensive licenses, etc. As we understand it, you just plug it in, do a little bit of config, and just go. The only concern that he brought up is that it might slow down performance on some of these machines because it's actively maintaining two disks at once. Given that some machines have as little as 256MB RAM (reducing me to fits of rage when I try to open a simple web browser), this is certainly a concern. However, when stacked against the benefits RAID 1 can offer, it's minor at best.

I also brought up your points about fire, theft, etc. He agreed that those are certainly legitimate concerns, but our largest priority by far is simply recovering from a failure with a minimum of downtime.

The more I look into it and discuss it, RAID 1 sounds like the perfect solution we're looking for. Props, sminlal!

Now, I'm looking around Newegg and I'm thinking not a lot of purchases will be required to get this done. A SATA/ATA RAID controller card, duplicate drives for the various desktops, and (possibly) some sort of movable drive bay so that the HDs can be swapped quickly without cracking open the case.

To wit:

http://www.newegg.com/Product/Product.aspx?Item=N82E168... For our 3 IDE machines

http://www.newegg.com/Product/Product.aspx?Item=N82E168... For our 5 SATA machines

I'm looking around for some kind of mobile/removable HD enclosure for a 5.25 bay that would let the user swap disks quickly in the event of a failure, without having to open the case up. I'm not finding anything on Newegg, but I admit I don't know how to precisely search for that.

However, I've got a nagging sense of unease... Does a RAID 1 array automatically switch to the other disk if the primary disk fails? Or does that failure get mirrored as part of the RAID process?

It also occurs to me that any viruses we get would get automatically mirrored to the backup drive, so we'll need a REALLY good antivirus solution. Does the free edition of Avast work well enough? Or are there other programs you'd recommend instead?
a c 415 G Storage
February 10, 2011 4:05:43 PM

I'd buy only ONE of each of these cards and test it before doing anything else. You want to make sure that the system can boot from it even when a drive is degraded - that depends on whether the cards have a BIOS which supports that capability.

Don't skimp on the testing and documentation of recovery procedures - doing so will assure you of ugly problems down the road. I'd still be leery of letting the users swap their own drives in the event of failure, whether they're technical or not.

RAID 1 is slower for writes but it SHOULD be faster for reads since the system can read information from the disk whose heads are closest to the requested data. That's another reason to buy only one of each RAID card - to make sure it doesn't give you performance issues.

And yes, RAID protects ONLY against drive failure, it doesn't protect against viruses, accidental deletion, etc. I'll say it again: if the client systems have important data on them then you really need a backup strategy for them in addition to RAID.

Here's a swappable drive bay for a SATA drive: http://www.ncix.com/products/?sku=39274&vpn=easySATA&ma...

IDE bays are more complex because the IDE and power connectors aren't guaranteed to be in the same position on every drive, so you typically need a "bay" in the computer and a "drive carrier" that the drive goes into. That's going to be harder to find now that IDE drives are largely obsolete. I think you'll find it easier just to buy SATA RAID controllers and switch everyone over to SATA drives.
February 10, 2011 4:53:17 PM

Theoretically, if RAID 1 automatically switches disks when one disk fails, then there would be no need for the user to swap their own hard drives. However, I don't know for certain if RAID 1 actually does that.

I've been thinking about the old computers in the office with IDE drives and I'm beginning to conclude that trying to retrofit them with RAID setups would be a big pain, not to mention time spent wrangling and setting it all up. It might just be a better idea to buy new computers, plug in the old drives, and then set those up with the RAID.

On top of all this, all this RAID is great for our desktops, but we've got at least 2 employees who use laptops that have critical data. I'm sure I can run Acronis True Image on those to back them up to our NAS server, but that still requires a lengthy rebuild/recovery time to a new drive in case of failure/theft. Perhaps it would be a better idea to simply move everyone to desktop workstations. I'll think on that.



EDIT: Moving everyone over to SATA sounds better and better the more I think about it. One of the newer desktops in our office is an Acer AX3910 ( http://www.newegg.com/Product/Product.aspx?Item=N82E168... ), which is a small form factor. I'm searching around Newegg for a SATA RAID card that might fit it, but most cards I see either look designed for full desktop form factors, or don't list any size/form-factor information that I can find.


EDIT 2: Did some more poking, and I've found something possibly suitable:

http://www.newegg.com/Product/Product.aspx?Item=N82E168...

Among the listed features, they say "RAID 1+S (Mirrored-sparing) automatically replaces a failed hard drive and rebuilds the system when the booting hard drive has failed". That SOUNDS great, but I've never heard of RAID 1+S in my life. Is it the same as RAID 1+0?
a c 415 G Storage
February 10, 2011 8:10:22 PM

There are a few laptops with dual bays for hard drives which I'm guessing you could set up with RAID 1, but it's obviously not going to be hot swap.

Every RAID-1 system should be able to keep working if one drive fails. It's not a matter of "swapping" to the remaining drive, it's simply a matter of continuing to use only that drive since it already has all of the data on it.

The "swapping" part comes after a drive has failed and you need to replace it - that allows the RAID controller to copy data from the remaining good drive to the new, replaced drive so that you again have two copies. The problem is that if you "swap" the wrong drive then you've screwed yourself. That's why I think it's a mistake to leave that in the hands of the users.

"Mirrored sparing" means that you have THREE drives in the system - two for mirrored copies of the data and a third, idle drive that sits there doing nothing. If one of the mirrored drives fails, the RAID controller immediately replaces it with the spare drive and starts copying data from the good drive over to it. That minimizes the amount of time the system is at risk due to having only one good copy of the data, but it doesn't eliminate the requirement to go in there at some point and replace the failed drive.
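
The difference between a plain mirror and mirrored sparing can be sketched as a toy state model in Python. The `MirrorSet` class and the drive names are hypothetical, and the sketch ignores rebuild time entirely; it is an illustration of the behaviour described above, not a model of any particular controller.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MirrorSet:
    """Toy model of a two-way mirror with an optional idle hot spare."""
    active: List[str]
    spare: Optional[str] = None
    failed: List[str] = field(default_factory=list)

    def fail(self, drive: str) -> None:
        """Mark an active drive as failed and promote the spare if one exists."""
        self.active.remove(drive)
        self.failed.append(drive)
        if self.spare is not None and self.active:
            # A real controller would now copy data from the surviving
            # drive to the spare; this sketch treats that as instantaneous.
            self.active.append(self.spare)
            self.spare = None

    @property
    def online(self) -> bool:
        """The array keeps working as long as one good copy remains."""
        return len(self.active) >= 1

    @property
    def redundant(self) -> bool:
        """Two copies exist, so one more failure would not lose data."""
        return len(self.active) >= 2


plain = MirrorSet(active=["disk0", "disk1"])
plain.fail("disk0")    # still online, but running on a single copy

spared = MirrorSet(active=["disk0", "disk1"], spare="disk2")
spared.fail("disk0")   # spare promoted; redundant again once rebuilt
```

In both cases the failed drive stays in the `failed` list until someone physically replaces it, which is the step that still needs a person.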
February 10, 2011 9:03:01 PM

Geofelt: SSDs are very nifty indeed (I have one at home for my gaming computer that has the OS on it, with a 1TB holding my actual games). However, I don't think mass deployment of SSDs would be cost-efficient here. Using SSDs would require that every drive in the RAID array also be an SSD, which would get expensive. Also, most of the machines around here are still fairly slow, so even if the HD is blazingly fast, there's not much that can be done if it's limping along with 256-512MB of RAM.

---------------------

Sminlal: I see what you mean about swapping the wrong drive. As I understand it, you need EXACT clones of drives in order to do RAID properly, right down to the individual model number. That makes sense.

"Mirrored sparing" sounds very interesting indeed. The automatic swapping to the empty drive is definitely good, because it means that in the event of a failure then the RAID array does my job for me (going in and doing the driveswap). Then while I place an order for a replacement drive for the failed one, the employee can continue working, and can also continue to be RAID-mirrored.

I talked with my admin about it, and he pointed out that a three-drive RAID bay would require a big honking tower chassis to support, which most of the employees (save for one, who brought his own beastly rig from home) don't have. He also pointed out that it'd be expensive. I told him about how the Mirrored Sparing would continue to protect the employee's data while they took their time replacing the failed drive, and he just shrugged and replied that they'd just make sure they responded and replaced it quickly when it happened.


I opened up the Acer AX3910 and found to my dismay that the unit was designed so compactly that there was absolutely no room for any other drives inside. However, many of the RAID controllers I'm looking at have external ports, which would (theoretically) let me hook up an external hard drive via SATA and still accomplish the RAID 1.
February 11, 2011 12:00:22 AM

Well, orders have been placed for a RAID card, extra HD, and enclosure. When they arrive on Monday we'll test this whole idea out and see if it flies.

Fingers crossed!
February 11, 2011 2:30:47 PM

Finding an extra HD for that Acer was extremely hard, since it was a pre-built box and not OEM. If HDs from premade machines are going to become unavailable this quickly (he only bought the machine 3 weeks ago), this could become a serious problem.

In a RAID 1 array, is it absolutely required that the second HD be a perfect duplicate of the first one? Or can you just get a hard drive of the same size from the same manufacturer and call it good?
a c 415 G Storage
February 12, 2011 5:48:32 AM

It depends on the RAID controller, but generally the drives in a RAID set don't have to be identical. They can be different models, manufacturers, or even sizes. In a RAID-1 set what typically happens is that if one drive is bigger than another, the excess space on the larger drive is simply not used. If you have to replace a drive, you can typically use any drive as long as it's at least as large as the drive being replaced.
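
As a rough sketch of those two rules of thumb in Python (the function names and the sizes are made up for illustration, and an individual controller may behave differently):

```python
from typing import Sequence


def raid1_usable_gb(drive_sizes_gb: Sequence[int]) -> int:
    """Usable capacity of a mirror: limited by the smallest member drive."""
    if len(drive_sizes_gb) < 2:
        raise ValueError("a mirror needs at least two drives")
    return min(drive_sizes_gb)


def can_replace(failed_drive_gb: int, replacement_gb: int) -> bool:
    """A replacement typically works if it is at least as large as the failed drive."""
    return replacement_gb >= failed_drive_gb


usable = raid1_usable_gb([500, 1000])  # the extra 500 GB on the larger drive goes unused
ok = can_replace(500, 640)             # a larger drive of another model is acceptable
too_small = can_replace(500, 320)      # a smaller drive is not
```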