RAID 1 Degraded/Failed – With a Twist

SBCat

Reputable
Jun 25, 2014
4
0
4,510
I apologize up front for the length of this – I think I need to explain my situation thoroughly. Thanks in advance for any help.

Context:
• ASUS P5E Motherboard, with Intel ICH9R RAID Controller
• 2 identical WD 500GB SATA Hard drives, configured as RAID 1 (Mirror)
• Windows 7, 64-bit

Two stages to this story. First, a couple of months ago, I got a blue screen, and on rebooting I found that the RAID volume had degraded, but was bootable. The status of one of the drives was listed as “Member Disk (0)” and the other as “Error Occurred (0).” Continuing with the boot, everything came up fine and as far as I could see all my files were intact. So it appeared that one of the drives had crapped out and the other was taking over on its own. I made a mental note that I would need to replace one or both of the hard drives.

I procrastinated. And then stage 2 of the story began yesterday. I got another blue screen, and on rebooting this time it indicated that the RAID volume had failed and it listed BOTH drives as “Error Occurred (0).” I went to the recovery menu in the Intel Matrix Storage Manager, but I was unsure what the options listed would do, so I rebooted several times while I grew more apprehensive. And then, on one reboot, I got the same sort of message I had when I first rebooted a couple of months earlier. The RAID volume had degraded, but was bootable. Once I was able to boot into Windows I subscribed to an online backup service and backed up all the data files I could.

And everything appears to be there EXCEPT there are no files with timestamps between April 21 and June 24!! Similarly, historical emails between those dates are gone. (I use Outlook and all the emails would have been in a single Outlook.ost file.)

So my theory is that the process of successively rebooting somehow kicked the drive that had been dormant for two months back into action, and now the other drive is dormant.

I really would like to recover files from that two-month gap, and I’d certainly prefer not to reinstall all my software. Questions:
• Does my theory make sense? Is there a plausible way to kick the now-dormant drive back into action, with or without the other drive?
• If I remove either of the drives (and reset the SATA configuration to IDE, rather than RAID), will the remaining drive boot up on its own? Will I lose any data if I try?
• Suggestions?
 
If you reset the RAID array to non-RAID, you WILL lose ALL of your data. Dormant drive? Try rebooting. After 10 failed tries, shut the computer down for about 3 hours. After that, start the computer back up. If BOTH drives load up, that is that. Try unplugging the bootable drive and attempt to boot the dormant drive. If the drive boots, connect the other drive, and if it fails to come up, restart the computer. If BOTH drives come up, back up the rest of your data (e.g. Program Files, located at the root of the C:\ drive) and replace BOTH drives. Reinstall Windows 7 and restore your backups.
 
And if you follow CompGee's suggestion you run the risk of never seeing that data again. Work with only ONE drive. Your best shot at recovery is to connect the drive you suspect of having the most current data to another computer as an external data only drive and extract the data.
 

SBCat

Many thanks to both of you for your responses.

Compgee, you said, "...If BOTH drives load up, that is that." My concern there is whether the RAID controller will freak out because the two drives have different images. Maybe that's why ex_bubblehead says that would run the risk of never seeing the data again?

ex_bubblehead, you suggested that I work with only one drive. Specifically how do I do that? Should I physically disconnect the other drive and then just try to boot? If so, will the RAID controller freak out because I have it configured for RAID 1, but it will really see only one physical drive?

Thanks.
 

As I already stated: remove the drive that you suspect holds the most recent data, connect it externally to a different computer, and treat it as a data drive only.
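Once the drive is attached to the second computer, one way to check whether it really is the one with the most recent data is to look at the newest file modification times on it. The following is only a minimal sketch: the function name `newest_files` and the idea of pointing it at a drive letter such as `E:\` are assumptions for illustration, not something from the posts above. It only reads file metadata and never writes to the drive.

```python
import os


def newest_files(root, limit=5):
    """Return up to `limit` (mtime, path) pairs under `root`, newest first.

    Only file metadata is read; nothing on the drive is modified.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # An unreadable entry (possible on a failing disk) is skipped.
                continue
            found.append((mtime, path))
    found.sort(reverse=True)
    return found[:limit]
```

Calling `newest_files("E:\\")` (the drive letter is hypothetical) and converting the timestamps with `datetime.fromtimestamp` would show whether anything on that drive was modified after April 21.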
 
The RAID controller could freak out and treat the array as failed because it is NOT seeing the other drive. If that happens, the controller could destroy all your data by trashing the filesystem on the non-dormant drive (e.g. by copying from the beginning of the disk to a random spot on it, such as start sector 1152, LBA 354, cylinder 3445), irrecoverably destroying all data on the disk and possibly even corrupting the system area. If the system area is trashed, your disk is dead and your data will never see the light of day again.

EDIT: Follow ex_bubblehead's post as well.
 

SBCat

OK, so let me lay out this action plan step by step. Sorry to be pedantic, but my understanding is limited and I want to make sure I don’t screw it up:

Step 1: Remove the currently dormant drive (the one that crapped out most recently and therefore has, I hope, data covering my two-month gap – I’ll call this drive Alpha), and install it as a slave in another machine. I should leave the first machine turned off so the RAID controller doesn’t get confused by the absence of a drive. The second machine would have its own boot drive, of course, so the newly installed drive will be recognized as a data-only drive and will not be altered in any way.

Step 2: Hopefully the second machine can actually read Drive Alpha, and I can then back up the files I am looking for to other storage that I can access as needed.
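For Step 2, the files from the gap could be picked out by modification date instead of by hand. This is a rough sketch under some assumptions: the function name `copy_window`, the drive letters, and the year 2014 for the April 21 to June 24 gap are all illustrative guesses. It reads from the source and writes only to the destination, and it skips anything it cannot read.

```python
import os
import shutil
from datetime import datetime


def copy_window(src_root, dst_root, start, end):
    """Copy files under src_root whose mtime is in [start, end) to dst_root.

    The source tree is only read. Returns the relative paths that were copied.
    """
    lo, hi = start.timestamp(), end.timestamp()
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                mtime = os.stat(src).st_mtime
                if not (lo <= mtime < hi):
                    continue
                rel = os.path.relpath(src, src_root)
                dst = os.path.join(dst_root, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 keeps the original timestamps
            except OSError:
                # Unreadable file or failed copy: move on to the next file.
                continue
            copied.append(rel)
    return copied
```

A call might look like `copy_window("E:\\", "F:\\rescued", datetime(2014, 4, 21), datetime(2014, 6, 25))`, where `E:` is Drive Alpha and `F:` is the external backup disk (both hypothetical). A large file that changes often, such as Outlook.ost, would be caught by this only if its last modification falls inside the window.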

Step 3a: Reinstall Drive Alpha back in the first machine, and use the Matrix Storage Manager to rebuild by mirroring the image of Drive Beta (the currently working drive) onto Drive Alpha. Hopefully, the controller will then recognize Drive Alpha as a “Member Disk” for the array.

---OR---

Step 3b: Buy a new 500 GB drive (Drive Gamma), install it in the first machine, and have the Matrix Storage Manager rebuild by mirroring the image of Drive Beta (the currently working drive) onto Drive Gamma.

Question: Am I understanding correctly what “rebuild” means here? I am assuming it means to copy an exact image of one drive onto another and then use the pair as a RAID 1 volume, but I can’t find that definition spelled out anywhere.

And, how do I actually execute such a “rebuild?”

Step 4: Boot up into, hopefully, a fully functional RAID 1, and then copy in the files from my two-month gap.

Step 5: Remove Drive Beta (since this had crapped out at one point and could be compromised), and replace it with a new drive (Drive Delta). Have the Matrix Storage Manager mirror the image of Drive Gamma (or Alpha) onto Drive Delta.

I really appreciate the help you guys have provided and I hope you have the patience to stick with me here. PLEASE tell me about anything I am misunderstanding or mis-inferring.
 
If you mirror drive Beta onto Alpha, all data currently on drive Alpha will be overwritten and will be irrecoverable. Rebuild in RAID terms means to restore the array. To do that, you will need to have the failed member disk present and running to complete the rebuild. After that, the failed member disk can be reset to non-RAID and removed. From there, I HIGHLY recommend that you replace the still-working drive (again, have the drive present along with the new blank disk) and rebuild to the NEW drive. After that, error scan both old drives for bad sectors. Anything above 100MB in bad sectors is unacceptable. If that is the case, low-level format the drive(s) and toss them. This freeware tool can be downloaded here:
http://hddguru.com/software/HDD-LLF-Low-Level-Format-Tool/

WARNING! THIS TOOL WILL PERMANENTLY DESTROY ALL DATA ON A PHYSICAL DISK! I AM NOT HELD RESPONSIBLE FOR ERASURE OF A DISK CRITICAL TO YOUR ARRAY!
 

SBCat

Compgee -- thanks for your response.

In response to your first sentence, I would be OK with having drive Alpha overwritten, since the idea (per my step 2) would be to first extract all the data files I am interested in from drive Alpha and copy them to a third place (USB drive, external hard drive, whatever). If that doesn’t work, then presumably my two-month gap is totally unrecoverable anyway (which would suck).

In your second sentence you say that “rebuild” means to “restore the array.” Forgive me, but can you define this more specifically? Does it mean just having two physical drives that the controller recognizes as being “Members”? Or does it ALSO mean that the data on one of the drives has been maintained and copied/mirrored to the other?
 


Rebuilding the array means re-writing the data to the drive/array.
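To make that concrete, here is a toy model of a two-disk mirror. It is not how the Intel controller is implemented; the class `Mirror` and its methods are invented purely for illustration. It shows why a mirror that has been running on one disk ends up with two different images, and why a rebuild overwrites whatever was on the target disk.

```python
class Mirror:
    """Toy RAID 1: each disk is a dict mapping block number -> data."""

    def __init__(self):
        self.disks = {"Alpha": {}, "Beta": {}}
        self.active = {"Alpha", "Beta"}

    def write(self, block, data):
        # A write only reaches the disks that are currently in the array.
        for name in self.active:
            self.disks[name][block] = data

    def use_only(self, name):
        # Degraded mode: the array runs on a single member.
        self.active = {name}

    def rebuild(self, source, target):
        # Rebuild copies the source image over the target, block for block.
        self.disks[target] = dict(self.disks[source])
        self.active = {source, target}


m = Mirror()
m.write(0, "file from March")   # both disks healthy: both get the block
m.use_only("Alpha")             # first blue screen: Beta drops out
m.write(1, "file from May")     # lands on Alpha only
m.use_only("Beta")              # second blue screen: the roles flip
stale_view = dict(m.disks["Beta"])   # what Windows sees now: no May file
m.rebuild(source="Beta", target="Alpha")
# Alpha now matches Beta, so the May file that existed only on Alpha is gone.
```

In this model the May file survives only if it is copied off Alpha before the rebuild, which is the reason for doing Step 2 before Step 3a.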