Rebuilding RAID1 array after strange failure

fauxpride

Distinguished
Feb 17, 2013
15
0
18,510
Hi,

System specs are:

CPU: i7-3770K
Mobo: Asus Maximus V Formula
RAM: 1 x 8GB DDR3/1600mhz
OS: Windows 7 SP1 64bit, up to date
Storage: 1x 500GB Samsung 850 EVO SSD connected to one Intel SATA3 port, and 2x 2TB WD Black drives set up in hardware RAID1 and connected to two of the Intel SATA2 ports. I also have several other HDDs connected to the remaining Intel and ASMedia SATA ports.

The situation is:

I was trying to troubleshoot a BSOD by running the faulty program again to generate a new minidump that would hopefully give me more information about the culprit. On the first restart, Windows entered recovery mode automatically. I then foolishly reset the computer while the recovery console was starting up, hoping to skip it on the next boot and log in to Windows so I could read the minidump.

However, at the next boot chkdsk said that drive H (a partition I didn't recognize) needed to be checked for inconsistencies, only to fail and tell me to restore from a previous restore point.

When I got into Windows, strangely enough, my 2x2TB RAID1 drives were showing as separate partitions - one appeared to have the original data on it, while the other (the one that chkdsk wanted to check) had less free space, showing disparity between them.

I rebooted the machine, switched from AHCI to RAID, and entered the Intel RAID utility, which reported the RAID array as OK.

I restarted again, and followed this procedure: http://www.thewindowsclub.com/check-disk-runs-at-every-startup-windows

It involves running a deep chkdsk so that Windows stops flagging the drive as "dirty". I did that in the hope that, once it finished, the array would function properly at the next boot.

Question:

chkdsk is still scanning the "damaged" 2TB disk, but I'm not sure that was the best action to take. I'm worried that running chkdsk on only one of the disks will cause further disparity and also damage the HDD that has the data intact.

So the question is: What would be the best way to deal with this situation? I want to make sure that I rebuild the array with the same two drives, and clone the data from the disk that's intact over to the second one.

Is there any method of triggering a rebuild without taking out one of the HDDs? Maybe formatting the faulty drive and trying to rebuild it from Intel Rapid Storage via Windows?

Thanks in advance.
 
Rookie_MIB

Distinguished
If your drives are set up in RAID1, that's a mirror. What you should be able to do is take the 'faulty/dirty' drive (the one that doesn't have the data safely stored) and wipe it completely, as in delete the partition(s) on it.

Then you should be able to format it using NTFS. Run a full chkdsk on it, including a scan for bad blocks. Then pull the SMART data off it and verify that it doesn't report any SMART problems.

Then delete the partitions again, and you should be able to re-insert it into the RAID1 array and rebuild it with the clean data from the 'good' 2TB drive. I've never used the Intel RAID tools, so I'm not sure exactly how that would work, but that is the general process for fixing a RAID1 problem.

BTW, if chkdsk or SMART reports any serious problems (uncorrectable or pending sectors, failing sectors, etc.), then replace the drive. I would hope that would be obvious.
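As a rough illustration of the SMART check described above, here is a minimal Python sketch. It assumes you have saved the attribute table printed by smartmontools' `smartctl -A` for the drive (any SMART viewer that shows the same attributes would do just as well). The function name `failing_attributes`, the list of watched attributes, and the sample table are made up for this example; the sample values are not readings from the drives in this thread.

```python
import re

# Attribute names as printed by `smartctl -A`; a non-zero raw value on any of
# these is commonly treated as a reason to distrust the drive. The selection
# is an assumption for this sketch, not an official threshold.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def failing_attributes(smartctl_output: str) -> dict:
    """Return {attribute_name: raw_value} for watched attributes whose raw value is non-zero."""
    problems = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row and blank lines are skipped by this check.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED:
            match = re.match(r"\d+", raw)
            value = int(match.group()) if match else 0
            if value > 0:
                problems[name] = value
    return problems


# Illustrative sample only (hypothetical values).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(failing_attributes(SAMPLE))
```

An empty result only means none of the watched counters is non-zero; it does not prove the drive is healthy, so treat it as one input alongside the chkdsk results.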
 

fauxpride

Thanks for the answer.

Update on this:

I haven't got the luxury of another machine right now that I could use to wipe the dirty disk. What I did was let chkdsk finish (it said it repaired a couple of indexes), then restart the computer.

For reference, before this happened, the normal RAID partition had 50GB of free space, and the partition letter was G. After running chkdsk and restarting, I'm left with H: (the dirty drive) with 11GB of free space. I know chkdsk sometimes eats up free space, but I'll look into how to fix that later.

After seeing this, I downloaded the latest Intel Rapid Storage drivers, restarted the computer and switched from AHCI to RAID from UEFI. Upon boot, the IRST console says that it's verifying and repairing the array.

I can see that most of the stuff on the HDD is intact, but double clicking on some of the folders spews an error message that they're corrupt - is this normal during the raid verification process, or can I just say goodbye to my data?

PS: I'm almost sure that the HDD has no mechanical/hardware fault - it's just a filesystem integrity problem that happened when I abruptly restarted the computer during the time the Windows Recovery Console was booting up.
 

Rookie_MIB

Distinguished
Restarting from Windows Recovery shouldn't have damaged the array; normally that only operates on the boot drive. Something else might have corrupted the array, which is odd since you're not using a striped or parity array (RAID 0 or RAID 5/6). It's a straight mirror, so data written to one drive should also have been written to the other. That's why it would have been better to wipe the 'dirty' drive. You don't want the Intel drivers to decide on their own which drive to use as the new 'source': if they pick the 'dirty' drive, it can hose your files.

With a good drive and a completely blank drive, the drivers are usually smart enough to ask 'which drive to use as source for rebuild?' Then you tell them which one is the good source.
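Before telling the RAID software which drive is the source, it may help to confirm which member actually holds the intact data. The following is a minimal Python sketch of one way to do that while both drives are still visible as separate volumes (as they were in AHCI mode): it hashes every file under the presumed good volume and compares it with the same path on the suspect volume. The function names and the report layout are made up for this example, and nothing here talks to the Intel tools.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(good_root, suspect_root) -> dict:
    """Compare every file under good_root with the same relative path under suspect_root.

    Returns relative paths grouped as missing, different, or unreadable.
    Both trees are only read, never modified.
    """
    good_root, suspect_root = Path(good_root), Path(suspect_root)
    report = {"missing": [], "different": [], "unreadable": []}
    for good_file in sorted(good_root.rglob("*")):
        if not good_file.is_file():
            continue
        rel = good_file.relative_to(good_root)
        other = suspect_root / rel
        try:
            if not other.is_file():
                report["missing"].append(rel.as_posix())
            elif file_digest(good_file) != file_digest(other):
                report["different"].append(rel.as_posix())
        except OSError:
            # Read errors on either copy end up here (e.g. corrupt folders or files).
            report["unreadable"].append(rel.as_posix())
    return report
```

On a setup like the one in this thread the call might look like `compare_trees("G:/", "H:/")`, but the drive letters are an assumption and should be checked first. A long list of different or unreadable entries on one side is a strong hint that it should not be the rebuild source.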
 