RAID 10 Hardware Card MAJOR POTENTIAL FAIL | Questions
Tags:
- Hardware
- NAS / RAID
- Storage
commissarmo
August 11, 2014 8:34:05 PM
I run a RAID 10 on an LSI hardware controller with 4x SSDs for my OS (Win7).
For reasons I simply cannot determine, my OS volume disappeared after a routine move: I powered the machine down, moved the computer a bit to reposition it under a desk, and when I rebooted, the motherboard sounded an alarm code indicating that it couldn't detect the OS volume.
Further, the RAID card's boot screen reported the volume as degraded. After MUCH trouble (the USB mouse wasn't working) I managed to load the WebBIOS utility for the RAID, and it claimed one of the SSDs had failed. I have a feeling this isn't true, as it makes little sense (I had another RAID 10 fail just a few weeks ago, and that also happened during a reboot, which seems suspicious).
1. I went to rebuild the RAID 10 and removed Drive 1, which the utility reported as "BAD" (my drives are labeled to match the stickers that come with the splitter cable). But the controller freaked out and reported that TWO DRIVES WERE NOW 'BAD'. It apparently numbers the drives the usual way, 0, 1, 2, 3, rather than 1, 2, 3, 4 as the splitter stickers do, and I neglected to check the numbering before pulling the drive.
2. Horrifically, I assume that by doing this I simulated a powered-on DUAL DISK FAILURE, which destroyed the RAID 10 ARRAY, even though only one drive was actually bad and I was attempting to rebuild it.
3. I have rebuilt arrays before and only passingly considered that pulling the wrong cable could destroy the array. Is this true, or am I mistaken? Are RAIDs really this fragile, or did I do something else wrong?
Is it actually possible that, after all the effort that went into building a redundant, high-uptime system, I killed my RAID 10 by mixing up disks 0 and 1 when replacing a drive? (I have three sets of labels on the disks specifically to ensure this didn't happen. I assumed LSI numbered from 1 because of the splitter cable they provide; my Adaptec numbers from 0-3 and is labeled as such.)
4. Now the machine POSTs the RAID controller, and when I go into the utility, 2 disks are reported dead, the virtual drive is gone, and ALL disks are reported as FOREIGN.
5. If, as I believe, I have answered my own questions above (though I would love a reality check), are there any options for recovering at least some of the data on this OS volume?
*I did not have MAIN data on this volume (of course), but some little things may have been lost. It's possible I have an older backup somewhere, and I might have a bootable clone as well, but I'd like to hear about recovery options.
6. I'm rather devastated by this, as I spent months on these forums building a double RAID 10 with separate live HDD backup clones running in my machine to ensure both uptime and redundancy, and after all that I pulled the wrong cable and killed the OS RAID 10... but I'd love to hear some thoughts...
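Why pulling the "wrong" drive was fatal follows from how a four-drive RAID 10 is laid out: it is two mirrored pairs striped together, so it survives any single failure and some double failures, but not the loss of both halves of one pair. The Python sketch below models that rule together with the 1-based sticker vs. 0-based slot mix-up. The pairing of slots 0+1 and 2+3 is an assumption (the real span layout would have to be checked in WebBIOS), and `sticker_to_slot` and `array_survives` are hypothetical helpers for illustration, not part of any LSI tool.

```python
from itertools import combinations

# Assumed layout: two spanned RAID 1 pairs, slots 0+1 (pair A) and 2+3 (pair B).
MIRROR_PAIRS = [(0, 1), (2, 3)]


def sticker_to_slot(sticker_label: int) -> int:
    """Map a 1-based splitter-cable sticker (1-4) to a 0-based controller slot (0-3)."""
    if not 1 <= sticker_label <= 4:
        raise ValueError("sticker labels run from 1 to 4")
    return sticker_label - 1


def array_survives(missing_slots: set[int]) -> bool:
    """A RAID 10 stays readable while every mirror pair keeps at least one member."""
    return all(not set(pair) <= missing_slots for pair in MIRROR_PAIRS)


if __name__ == "__main__":
    failed = {1}                 # controller reports slot 1 as BAD
    pulled = sticker_to_slot(1)  # the drive stickered "1" is actually slot 0
    print(array_survives(failed))             # degraded but readable
    print(array_survives(failed | {pulled}))  # both halves of pair A are gone
    for a, b in combinations(range(4), 2):
        print((a, b), array_survives({a, b}))
```

Under this assumed layout, 4 of the 6 possible two-disk losses leave the array readable; the two fatal ones are losing both members of the same pair, which is exactly what pulling slot 0 while slot 1 was already marked failed does.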
Best solution, posted by TyrOd:
Yep, you botched it by pulling the wrong drive.
Obviously, DNS failover is the only setup that will prevent downtime from this kind of user error, though no data loss should occur if you've got an image backup of the device that you can restore from.
Edit: I reread the last part. If you want to recover those "little things" that weren't backed up, it will likely cost you several thousand dollars with a professional recovery lab.
There's nothing you can do safely on your own unless you're willing to make images of all the drives first and then use R-Studio or RAID Reconstructor to try to do this yourself.
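The core of software-based RAID 10 reconstruction is de-striping: take one intact image from each mirror pair and interleave their stripes back into a single logical volume. The Python sketch below shows that idea on in-memory images. The pair order, the stripe size, and the assumption that user data starts at offset 0 of each member are all hypothetical; on a real LSI array these parameters (and any metadata offsets) have to be worked out from the disk images, and all of this should be done on images, never on the original drives. `rebuild_raid10` and `stripe_raid10` are illustrative names, not functions from R-Studio or RAID Reconstructor.

```python
# Hypothetical parameters: real stripe size, pair order and data offset
# must be identified from the actual disk images.
def rebuild_raid10(images: dict[int, bytes],
                   pairs: list[tuple[int, int]],
                   stripe_size: int) -> bytes:
    """De-stripe a RAID 10 volume from images of its member drives.

    images maps controller slot -> raw image, only for drives believed intact.
    pairs lists the mirror pairs in stripe order (assumed: stripe 0 on pairs[0],
    stripe 1 on pairs[1], and so on).
    """
    sources = []
    for pair in pairs:
        # Both halves of a mirror hold the same data; take the first intact one.
        member = next((slot for slot in pair if slot in images), None)
        if member is None:
            raise ValueError(f"no intact image for mirror pair {pair}")
        sources.append(images[member])

    rows = min(len(img) for img in sources) // stripe_size
    volume = bytearray()
    # Logical stripe k sits on pair k % len(pairs), in row k // len(pairs).
    for row in range(rows):
        for src in sources:
            volume += src[row * stripe_size:(row + 1) * stripe_size]
    return bytes(volume)


def stripe_raid10(volume: bytes,
                  pairs: list[tuple[int, int]],
                  stripe_size: int) -> dict[int, bytes]:
    """Inverse operation, used here only to fabricate test images."""
    halves = [bytearray() for _ in pairs]
    for k, start in enumerate(range(0, len(volume), stripe_size)):
        halves[k % len(pairs)] += volume[start:start + stripe_size]
    return {slot: bytes(data) for pair, data in zip(pairs, halves) for slot in pair}


if __name__ == "__main__":
    PAIRS = [(0, 1), (2, 3)]
    original = bytes(range(64))
    images = stripe_raid10(original, PAIRS, stripe_size=8)
    del images[1]  # slot 1 reported BAD; treat its image as unusable
    # Slots 0, 2 and 3 remain, so one member per pair is still available.
    print(rebuild_raid10(images, PAIRS, stripe_size=8) == original)
```

The tiny 8-byte stripe is only for the demo. The design point is that any one member per pair is enough in principle, which is why an array whose drives are physically healthy but logically "FOREIGN" can sometimes be reassembled from images.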
commissarmo
August 12, 2014 6:29:10 PM
1. Something was already wrong with Windows beforehand. Even with a single dead disk, the RAID 10 should have been able to boot the OS, and it wasn't booting, which is why I started looking at the RAID to begin with. I saw disk 1 was dead, so I pulled the drive labeled disk 1 (it turned out to be disk 0, a healthy disk).
2. Yes, I have already spoken to some recovery specialists. They are going to assess the damage; I know it's unlikely that the volume will EVER boot again. I don't know about costs yet.
3. I have healed a 2-disk RAID 0 before with a similar problem: it got unplugged during a move and was plugged back in the wrong way round (50/50 bad luck...), which broke the RAID. But I was able to use the software you mentioned to repair it and image it to a single HDD, and it booted again, which was great.
4. I don't know how likely recovery with that type of software is in the case of a RAID 10 (though 3 of the 4 drives are/were healthy, and I'm not even sure the 4th drive actually failed, so it's purely a logical problem).
5. For now I'm going to see what the data recovery people say about this. Very frustrating: fairly minimal data loss, but in a system designed to have zero data loss... (I usually run Casper 8.0, which creates a continuously updated bootable backup of the OS volume. I have one that's a few months old, but unfortunately it wasn't running at the time because of maintenance work on the machine...)