I have a server at home running Ubuntu Server 12.04.1. It's running on an i3 3220T with 4GB of RAM. I have a large case for it that has upwards of 8 HDD bays. Right now I'm running an OS drive and two 500GB HDDs. They're not RAID'd, and currently at midnight the main one rsyncs to the other, providing me with a backup.
I've thought about getting a much larger RAID setup going and then putting one of these 500s into an external USB HDD enclosure, and just copying the most important bits over via USB every now and then. The drives I'm looking at are four of the 2TB WD Green HDDs. I've done some reading but I'm still a little unsure about where to go. I was hoping to have a "think out loud" type of discussion, so here I am.
I'll be using mdadm with Linux (software RAID). My server does quite a bit for being a home server... OwnCloud, Subsonic music streaming, file, print, backup, full time video surveillance recording, etc. In time I might also host all of my media from this server and stream it to the HTPC upstairs.
RAID 5 - This isn't really appealing to me in the slightest, since if a single HDD goes down I'm kind of at the mercy of the array rebuild time.
RAID 6 - Sounds solid. Some people suggest that RAID 6's read/write performance is worse than the alternatives. But most of what I use the server for is network based (gigabit LAN), and from what I've heard even RAID 6 should be able to saturate a gigabit connection, which largely negates any performance hit RAID 6 presents. Using four 2TB drives, I'd get an effective storage space of 4TB, which is huge for me. On top of that, I get two drives' worth of parity, so the array survives any two drive failures. Sounds good...
RAID 10 - I understand RAID 10 is quite a bit faster than RAID 6, however RAID 10 also has a bit of a coin flip to it. If I lose a 2nd drive and it's from the same mirror pair, I lose the entire array. If I luck out and lose a 2nd drive from the other pair, at least I can move forward and recover the data. I also understand that using four 2TB drives with RAID 10 I'd get 4TB of usable space. So if the usable space with 6 and 10 is the same, the real difference will be performance and level of redundancy...
The more I think about it, the more RAID 6 sounds like the most logical option as it provides the best redundancy. I'm just not sure if I'm considering all of the pros and cons enough to really make the best choice here. I've also heard a lot about URE, but I'm not sure I fully understand what it is or what consequences come with it.
What do you guys think? Is 6 the ticket with an external HDD for having a 2nd copy of the super important data? Maybe RAID 10 instead? Or no RAID with the nightly rsyncs?
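To put rough numbers on the RAID 6 vs RAID 10 comparison, here's a minimal Python sketch. It assumes equal-size drives and a classic RAID 10 made of two-drive mirror pairs (mdadm's own RAID 10 layouts can differ), and the function names are just made up for illustration.

```python
from fractions import Fraction

def usable_tb(level, drives, size_tb):
    """Usable capacity in TB for equal-size drives (simplified model)."""
    if level == "raid5":
        return (drives - 1) * size_tb   # one drive's worth of parity
    if level == "raid6":
        return (drives - 2) * size_tb   # two drives' worth of parity
    if level == "raid10":
        return (drives // 2) * size_tb  # assumes two-drive mirror pairs
    raise ValueError(f"unknown level: {level}")

def raid10_second_failure_survival(drives):
    """Chance a random 2nd failure misses the dead drive's mirror partner."""
    # After one failure, drives - 1 remain; only one of them is the partner.
    return Fraction(drives - 2, drives - 1)

for level in ("raid5", "raid6", "raid10"):
    print(level, usable_tb(level, drives=4, size_tb=2), "TB usable")

print("RAID 10 survives a random 2nd failure with probability",
      raid10_second_failure_survival(4))
```

Under those assumptions, four 2TB drives give 4TB usable on both RAID 6 and RAID 10, and a random second failure on the RAID 10 is survivable two times out of three, whereas RAID 6 survives any second failure.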
I'd second ZFS. It's far superior to conventional RAID. I've never actually used it on Linux, but I gather it can be done with a little work. Works a treat on FreeBSD. I think it would be advisable to use a little more RAM, say 8GB, particularly as it's so cheap nowadays.
That topic came up, but I spoke to a guy whose entire job revolves around high capacity storage integrations. His opinion was that ZFS, while awesome on BSD-oriented systems, really isn't too terrific on Linux. I know RAM is pretty easy to come by but I just built this system, so I'd really like to stick with what I have if at all possible. It's already going to sting quite a bit to pick up the HDDs to begin with. Converting it from Linux to BSD just isn't going to be in the cards either. Really the options I'm looking at are either no RAID and just keep rsyncing drives (which gives me a sort of backup but zero redundancy), RAID 6, or RAID 10. Since I have some experience with mdadm, I at least know it's pretty rock solid, however I've only used it for mirrors. I did think about doing a 2x3TB mirror, but eh. I like the idea of having a more modular setup so I can add drives if needed.
EDIT - the more reading I do the more I continually see the URE risk being talked about. I understand if you get hit with a URE, your array just stops rebuilding and could be easily compromised right then and there. That being said, are there any RAID levels that prevent UREs from happening?
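For a feel of how big the URE risk is, here's a back-of-the-envelope Python sketch. A URE comes from the drive itself, so a RAID level can only tolerate one, not prevent it. The big assumptions here: a rate of 1 URE per 10^14 bits read (a figure commonly quoted on consumer drive spec sheets, not something measured on these drives), errors that are independent of each other, and a rebuild that reads the surviving drives in full. Real drives may well do better than the spec sheet worst case.

```python
import math

TB = 10**12  # drive makers count in decimal terabytes

def p_ure(bytes_read, ure_per_bit=1e-14):
    """Chance of at least one URE while reading bytes_read bytes."""
    bits = bytes_read * 8
    # 1 - (1 - rate)^bits, via log1p/expm1 to stay accurate for tiny rates
    return -math.expm1(bits * math.log1p(-ure_per_bit))

# How much has to be read to rebuild one failed 2TB drive (assumed full reads)
scenarios = {
    "RAID 10 (read the 2TB mirror partner)": 2 * TB,
    "RAID 5 (read the 3 surviving drives)": 6 * TB,
}

for name, bytes_read in scenarios.items():
    print(f"{name}: {p_ure(bytes_read):.1%}")
```

With those assumptions it works out to roughly 15% for the 2TB read and roughly 38% for the 6TB read. A RAID 6 rebuild after a single failure reads the same 6TB, but it still has a second set of parity to reconstruct an unreadable sector from, which is the argument for it.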
Part of me is really starting to lean towards running regular HDDs and rsyncing data. It's just frustrating, because I was hoping to get at least a 4TB usable volume out of it. I could buy a single 4TB HDD... but eh. That price stings quite a bit... Having a RAID array I could build onto if needed and expand my space is why RAID looked attractive (along with the redundancy factor of it all). If URE is a decent risk and no RAID can circumvent it entirely, and considering the bacon I'd drop to pick up 4 HDDs, I'm beginning to wonder if it's worthwhile.
I don't have much experience with software RAID in demanding situations, but hardware RAID is almost 100% reliable IMO. But you have to understand that neither RAID nor mirroring the disks by other means is a backup strategy. It fails the main purpose of a backup, which is to recover accidentally deleted or corrupted files. And if your server were to be destroyed - say it caught fire - well, you know what they say about eggs and baskets.
Redundancy (or reliability) is one thing; a backup strategy is something else altogether.
I know hardware RAID can be pretty awesome, but you're typically talking some serious bacon before you get into that. I've seen hardware RAID controllers die at work and take the entire RAID with them - something that isn't an issue with Linux mdadm software RAID. I've actually built a whole new computer, moved my array over, installed mdadm, rebooted, set a mount point, and bingo bango - my array was running and alive again. Because of the versatility of mdadm, if I'm going to use ANY RAID, it'll be mdadm. That being said, I'm not trying to get into software vs hardware, but software vs no RAID at all. I'm just not sure if software RAID makes that much sense for me. I keep reading about this URE thing and the risk that comes with it and I begin to wonder why I would even consider it. It's braindead easy for me to move my backup drive to be the primary drive in a matter of 2-3 commands. That of course doesn't grant me any sort of actual reliability in terms of 24/7 uptime, but dang it would come without a potentially large headache, which is a bonus.
Someone recommended that I just do the RAID mirror and utilize another drive as a backup, which honestly doesn't seem so bad. The only thing is I'm looking at two volumes of 2TB storage space instead of one larger volume, but in reality I guess it doesn't matter, since where I store the data won't be visible anyway: you can't tell which samba shares exist where. That would give me two mirrored arrays, 2x2TB each.
Alright, I did quite a bit of digging over the weekend. I decided not to go with a 4x2TB HDD setup simply due to cost. I mean, this is for a home server, so some massive RAID array just feels like a fly/bazooka situation. That said, the cost is still the main thing pushing me away from it. I'm just not sure it makes sense for my uses. Anyway, I would still like some sort of redundancy, so I'm opting for a mirror. Considering the sizes of HDDs these days, it becomes much more doable to have a massive volume that is simply mirrored. I'm looking at getting a pair of WD Reds @ 3TB. I'll be using Linux mdadm software RAID.
Now, one thing I am a little curious about and I'm hoping somebody can clear this up. I understand that with something like a RAID 6, having two parities helps significantly against UREs. But what about a mirror? For example, let's say I'm running 2x3TB and one drive tanks. Okay, I pull the dead drive, install a new one, and sync up. Let's say I hit a URE. Then what? Is that just game over, since I had no redundancy left while the sync was in progress? Is my array toast at that point? Am I just stuck with having to blow away the array and redo everything and pull my data from a backup?
If I understand right, a URE during a RAID 1 re-sync can actually tank the array. This confuses me a bit because RAID 1 is meant for redundancy and staying online during a single drive failure. If UREs are more common in larger HDDs, I find it quite the kicker that your server is staying online and running with one drive but due to a URE you might have to go offline anyway to rebuild the array and pull the data from a backup.
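For the 2x3TB mirror specifically, here's a rough Python estimate of how likely a URE is during one full re-sync. It treats UREs as rare independent events (a Poisson approximation) and assumes the whole 3TB surviving drive gets read. The two error rates are just commonly quoted spec sheet classes that I'm assuming for comparison, not the confirmed figure for any particular drive. It only estimates the odds of the read error, not what mdadm does when it hits one.

```python
import math

TB = 10**12  # decimal terabytes

def resync_ure_chance(drive_bytes, ure_per_bit):
    """Poisson estimate of at least one URE while reading a whole drive."""
    expected_ures = drive_bytes * 8 * ure_per_bit  # mean number of UREs
    return -math.expm1(-expected_ures)             # 1 - exp(-mean)

# Assumed spec sheet rates, for comparison only
rates = {
    "1 URE per 1e14 bits": 1e-14,
    "1 URE per 1e15 bits": 1e-15,
}

for label, rate in rates.items():
    chance = resync_ure_chance(3 * TB, rate)
    print(f"3TB mirror re-sync at {label}: {chance:.1%}")
```

Under those assumptions the 1e14 class comes out at roughly 21% per full re-sync and the 1e15 class at roughly 2%, so the rated error class of the drive matters a lot more for this than the RAID level does.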
Anyway, just curious what you guys think on the RAID 1/URE/re-sync matter. Thanks for your continued insight!