RAID Controller Volatility

commissar_mo

Distinguished
Jan 23, 2011
96
0
18,630
I haven't been able to find a good hardware document on RAID controllers and how they work exactly.

I'm wondering in particular how the volatile memory on the controller works - I've been told that the RAID metadata gets written onto the drives themselves, and isn't stored on the controller...

1. But the controller has a battery - what's that for?

2. More importantly, I recently lost a RAID 0 because I unplugged the drives (I believe this was why, and I've seen similar reports that this can happen; though I unplugged the drives while the system was OFF - which confuses me).

3. What exactly is going on mechanically when one unplugs the RAID controller, or the drives? And how (if it can) can this disrupt the array?

4. If I get a RAID controller card with a backup battery unit - will this solve the issues associated with unplugging the drives, or the controller card?


***Basically, I'm confused about volatility, batteries, and the rules concerning unplugging RAID drives/controller cards in a system, powered ON or OFF
 
My take follows. Be warned that I haven't tried these things in the last fifteen years, so I may well be off.

1. The battery, which is a fairly expensive option, is for the controller cache. The system may think that it has written to the array, but the writes may be in the cache of the controller when the system, or a drive, loses power. If the memory is protected with a battery, that cached data will be written when the power is restored.

2. If you unplugged the drive with power off, then applied power, I would expect you to lose the RAID. The question is: If you powered down, replaced the drive, and then powered up again, did it recognize the drive and re-establish the RAID?

If you unplugged the drive with power off, re-plugged it with the power still off, and then powered up and lost the RAID, that is odd.

3. What happens is that the pieces can no longer communicate with each other. If the system was shut down in an orderly fashion first, with all caches flushed, including the cache on the disk drive, then you should be able to take them all apart, ship the pieces to Wisconsin, re-assemble them, and have it work. If you unplug them while it is running, that's pretty sure to have a bad effect, especially on the least resilient array, RAID 0.
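To make the RAID 0 point concrete, here is a toy sketch of striping. It is not how any particular controller lays out data; the chunk size, the two-drive array, and the function names `stripe` and `reassemble` are all assumptions made for illustration. It shows why a stripe set has no resilience: every member holds only every n-th chunk, so a missing or mis-ordered member leaves nothing usable.

```python
# Toy RAID 0 model: data is split into fixed-size chunks dealt round-robin
# across the member drives. Chunk size and drive count are illustrative only.

def stripe(data: bytes, n_drives: int, chunk: int = 4) -> list[list[bytes]]:
    """Deal `data` across n_drives in chunk-sized pieces, round-robin."""
    drives = [[] for _ in range(n_drives)]
    for i in range(0, len(data), chunk):
        drives[(i // chunk) % n_drives].append(data[i:i + chunk])
    return drives

def reassemble(drives: list[list[bytes]]) -> bytes:
    """Read the chunks back row by row, in the order the drives are listed."""
    out = []
    longest = max(len(d) for d in drives)
    for row in range(longest):
        for d in drives:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)

original = b"The quick brown fox jumps over the lazy dog."
members = stripe(original, n_drives=2)

same_order = reassemble(members)                  # members in original order
swapped = reassemble(list(reversed(members)))     # members read in wrong order
one_missing = reassemble([members[0]])            # one member absent

print(same_order == original)   # intact array reads back correctly
print(swapped == original)      # wrong order scrambles the data
print(one_missing)              # half the chunks are simply gone
```

Note that the "swapped" case only applies to a controller that trusts port order; one that records each member's position on the disks themselves would not be fooled by a cable swap.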


To ask a potentially annoying question, have you read the manual for your RAID controller? What model is it?
 

commissar_mo

1. Ok that makes sense - the battery protects the cache in the event of acute power loss.

3. Interestingly enough - that's exactly what I did... I shipped the computer (HDDs separate since I wanted them to be extra safe) to Ireland... reassembled it, and it died.

Actually - the RAID 0 was on the Intel mobo controller.


2. I did NOT know this (well I kind of do now since I did it... but now you're confirming it) I indeed unplugged the drives, shipped the stuff around the world, then plugged them back in and powered on.

Now despite my best efforts to label the drives, I believe I actually did reassemble them in the correct configuration... but here's a rub...

I use a Koolance liquid cooling system which needs to be primed when you refill it. This is accomplished by filling the reservoir and using a piece of wire to jumpstart the mobo power connectors so the pump (which is rigged into the power supply of the PC) will start moving liquid.

I performed this action BEFORE THE DRIVES were inserted. I was under the impression (according to Koolance instruction guides) that the 4th and 6th pins of the Mobo 24 pin connector which get jumped only applied power to the fans and peripheral elements of the machine... but I'm guessing now it IS possible that the RAID controller somehow got powered up as well, and didn't see any of the RAID 0 drives.

(presumably, I'm guessing that disconnecting, then reconnecting the drives in the WRONG order would also have the same effect as powering on with the drives not connected?)

** Now this RAID 0 that was lost (I managed to rebuild it) happened on the mobo Intel controller...

Are you saying that even my RAID 10 on HW controller (I have an Adaptec 2405) will suffer this same fate? I have yet to reconnect it and try to boot it on my newly recovered RAID 0. I'll be trying that in the next hour....

 
No idea.

When you rebuilt it, did it find all of the data? If so, that is great. Ireland is even further than Wisconsin, if you are starting from where I am.

You've gone far beyond what I've done playing with these things.
 
Every RAID controller has its own peculiarities - it really isn't possible to give a blanket answer to the kinds of questions you're asking. What you really have to do is to experiment with it yourself and make a note of what you discover so that you know what to expect when a real problem occurs.

Most casual people who set RAID up on their systems don't do this, and many end up paying the consequences when a drive fails and they can't figure out how to recover from it.
 

commissar_mo

From what I've been reading on the forums here and others, that definitely seems to be the case, sminlal.

It seems RAID is just not a very standardized area, and hardware familiarity is essential.

As a note - When I plugged my Adaptec 2405 back in to the motherboard (I had taken it out, powered up, down, etc. as I tried to fix the RAID 0 OS drive) it booted without a problem.

On the other hand... what if I (right now) flipped the drives around - i.e. took drive 0 and put it where drive 3 is, etc.

I might do it just to find out (I have backups). Hopefully because the RAID controller builds the array based on serial numbers of the drives (SSDs in this case) it shouldn't care which actual plug they're connected to - if indeed that's the case, mass points to the card makers - since that's definitely the way to go... having to tape and label plugs makes me feel like it's the first 10 years since the first pc came out or something... very primitive.
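For what it's worth, the idea of building the array from information stored on the drives, rather than from the port each drive is plugged into, can be sketched roughly as below. This is only an illustration under stated assumptions: the `Disk` record, its `array_id` and `slot` fields, and the `assemble` function are hypothetical, and whether the Adaptec 2405 or the Intel onboard controller actually works this way is something the manual or a test would have to confirm.

```python
from dataclasses import dataclass

@dataclass
class Disk:
    serial: str     # hypothetical drive serial number
    array_id: str   # hypothetical array identifier recorded on the disk
    slot: int       # hypothetical position of this disk within the array

def assemble(ports: dict[int, Disk], array_id: str, size: int) -> list[Disk]:
    """Order members by the slot recorded on each disk, ignoring port numbers."""
    members = [d for d in ports.values() if d.array_id == array_id]
    by_slot = {d.slot: d for d in members}
    missing = [s for s in range(size) if s not in by_slot]
    if missing:
        # A member is absent: refuse to assemble rather than guess.
        raise RuntimeError(f"array {array_id}: missing slots {missing}")
    return [by_slot[s] for s in range(size)]

disks = [Disk(f"SSD{i}", "A1", i) for i in range(4)]

# Drives on their original ports, then with drive 0 and drive 3 swapped.
original_ports = {0: disks[0], 1: disks[1], 2: disks[2], 3: disks[3]}
swapped_ports = {0: disks[3], 1: disks[1], 2: disks[2], 3: disks[0]}

before = [d.serial for d in assemble(original_ports, "A1", 4)]
after = [d.serial for d in assemble(swapped_ports, "A1", 4)]
print(before == after)  # member order comes from the disks, not the ports
```

In this model, swapping cables changes nothing, while powering up with a member absent is detected and reported instead of silently producing a broken array.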

So apparently, for this card in RAID 10 (4 drives), since I soft-powered down every time, the following holds. (I think I have to replace the battery, which is a whole 'nother issue. I assume I can do it with the power off, no problem, since the battery is essentially just for power-loss caching, but I have to consult the Adaptec manual about that.)

You can detach it from the motherboard AND detach the drives from it - if you plug them back in correctly (need to check on whether the order matters) AND the mobo BIOS didn't die or get corrupted, etc. the drives will work.

Another interesting concept is (I believe you can do this also, judging by others' comments) you can take the controller/drives to another mobo and just plug it in - and the BIOS should boot it, since the RAID card has its own boot loader which it reads...

I'll be doing all this testing - and hopefully I'll write it all up to post somewhere - does Tom's have a place where people can post self-help guides?

Also: Thanks everyone!
 

baddad

Distinguished
Oct 20, 2006
1,249
0
19,310
Jumping the power connector to prime or purge the air out of your Koolance system has no effect on anything else. My experience with RAID 5, 1 or 0 is that if you unplug them with the machine off and, say, move everything to a new case, then put them back in the same order, you will have no problem. With onboard RAID it's always a good idea to go into your BIOS settings and make sure that they are still set up correctly before you boot.
I've always marked or color-coded my drives and double-checked the port they are connected to, and have not had a problem moving drives. I also make a backup of each RAID set on a single drive for cover-my-ass reasons. As others have said, there appears to be no standard, but there should be, because using RAID is becoming more mainstream.
 

commissar_mo

Fair enough, sounds like I probably put the drives back in the incorrect order the first time... and when I booted perhaps it messed up the Intel controller?

I'm not sure that makes sense. I could just assume the causes are unknown, I'm just trying to figure it out so I don't do it again.

I'd hate to be worried about ever unplugging the drives again. I'll be testing it rigorously with the controller cards... and resolve to NEVER trust the motherboard controller again.
 

commissar_mo

Does anyone know if the backup battery unit on a RAID card saves the entire array from failure, or just the data it was writing when it died?

i.e. if a power failure could take out the entire RAID, I'll get a BBU. If however, the worst a power failure could do is lose immediate data that hadn't been written to the array yet... I won't.

 
Just the data that it had cached.

On the other hand, keep in mind that unflushed data can corrupt files or your entire filesystem, if it happens to be just the right unflushed data. How about an update to the directory structure, after a file is changed? That would be bad to lose.
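A rough way to picture this is the toy model below. It assumes a write-back cache that acknowledges writes before they reach the disks; the `Controller` class and its methods are made up for illustration and do not correspond to any real controller's firmware. The battery only decides whether the cached writes survive the outage. It does nothing for the array itself.

```python
class Controller:
    """Toy write-back cache: writes are acknowledged before they hit disk."""

    def __init__(self, has_battery: bool):
        self.has_battery = has_battery
        self.cache = []   # acknowledged writes not yet on disk
        self.disk = {}    # block number -> data actually on disk

    def write(self, block: int, data: str) -> str:
        self.cache.append((block, data))
        return "ack"      # the OS now believes the write is done

    def flush(self) -> None:
        for block, data in self.cache:
            self.disk[block] = data
        self.cache.clear()

    def power_loss(self) -> None:
        if not self.has_battery:
            self.cache.clear()   # volatile cache contents are gone

    def power_restored(self) -> None:
        self.flush()             # a battery-backed cache is replayed to disk

def run(has_battery: bool) -> dict:
    c = Controller(has_battery)
    c.write(0, "file contents v2")
    c.flush()                             # file data reaches the disk
    c.write(1, "directory entry -> v2")   # directory update still cached
    c.power_loss()
    c.power_restored()
    return c.disk

with_bbu = run(has_battery=True)
without_bbu = run(has_battery=False)
print(with_bbu)      # both the file and its directory entry are on disk
print(without_bbu)   # the directory update was lost with the cache
```

The second case is the dangerous one described above: the file data made it to disk, but the directory update that points to it did not.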
 

commissar_mo

OK - good point. Of course, one could also just disable the write-cache buffers to the drives... I don't know how extensive the performance hit would be on SSDs, but I've read: "you'll feel like your drives were made in the 1980s"...

We'll see...