rtothedizzy

Distinguished
Nov 29, 2009
29
0
18,540
I've got an unusual and annoying problem.

I had a RAID 5 array of four 1TB disks. I have a bunch of very important info on there (roughly 3-6 months of work). I backed up the parts I can't bear to lose, but I can't back up the more than 2TB of other data that I would really not like to lose.

Anyway, I wanted to expand the array and add another disk. I'm using Intel Matrix RAID, so I just put another disk in, modified the existing array to expand it, and let the expansion begin. It looked like it would take about 2 days, so I didn't pay close attention to it.

In the first half hour of the expansion the computer crashed. I booted back up and found that two of the disks in the array (not the newly installed disk) were marked as failed, and thus the whole array was marked as failed. It seemed very unlikely to me that both disks had physically failed.
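
For context, here is a minimal sketch of why two failed members take down the whole array. It assumes plain single-parity XOR as in textbook RAID 5; the actual on-disk format of Intel Matrix RAID is not described in this thread, and the chunk contents and names (`xor_chunks`, `d0`, `d1`, `d2`) are made up for illustration.

```python
from functools import reduce


def xor_chunks(chunks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# One stripe on a 4-disk RAID 5: three data chunks plus one parity chunk.
d0, d1, d2 = b"work", b"docs", b"pics"
parity = xor_chunks([d0, d1, d2])

# One member lost (d1): XOR of the three survivors gives it back.
recovered = xor_chunks([d0, d2, parity])
print(recovered == d1)

# Two members lost (d1 and d2): the survivors only yield d1 XOR d2,
# which is not enough to separate the two missing chunks.
leftover = xor_chunks([d0, parity])
print(leftover == xor_chunks([d1, d2]))
```

With one parity chunk per stripe there is only one equation per stripe, so only one unknown can be solved for. That is why a controller marks the array as failed as soon as it considers a second member bad, whether or not the disks are physically damaged.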

I marked the two disks as safe and it asked if I wanted to reinitialize the array. I said OK. This took about 8 hours. After it was done I could see all of my files but couldn't open any of them. I restarted and Windows did an extensive disk check (it took an hour or two) and said it fixed a bunch of orphaned files. Now I can open all my files again, but they are corrupted in a very particular way.

Each file has a bit of its own data, then a bit of another file's data, then a bit of garbage, etc. That is probably what you might expect from the way RAID 5 stripes data across disks.

It seems there has got to be a way to get my data back, since it looks like it's all there, just spread out between all the different files. I'm guessing that the data is still laid out the way it would be with only 4 disks, since the expansion didn't have nearly enough time to finish. Can I somehow trick it into going back to 4 disks? Any other suggestions?
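
The "bit of its own data, bit of another file's data, bit of garbage" pattern can be reproduced with a toy model. This is only a sketch under stated assumptions: a simple rotating-parity layout, a 4-byte chunk size, and an empty fifth disk. The real Intel Matrix RAID layout, stripe size, and migration state are unknown here, and `raid5_write` / `raid5_read` are hypothetical helpers, not any vendor's API.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def parity_disk(stripe: int, n_disks: int) -> int:
    # Assumed layout: parity starts on the last disk and rotates backwards.
    return (n_disks - 1 - stripe) % n_disks


def raid5_write(data: bytes, n_disks: int, chunk: int) -> list:
    """Stripe data over n_disks with one XOR parity chunk per stripe."""
    disks = [bytearray() for _ in range(n_disks)]
    per_stripe = (n_disks - 1) * chunk
    data = data + bytes(-len(data) % per_stripe)  # pad to whole stripes
    for s in range(len(data) // per_stripe):
        stripe = data[s * per_stripe:(s + 1) * per_stripe]
        chunks = [stripe[i * chunk:(i + 1) * chunk] for i in range(n_disks - 1)]
        parity = bytes(chunk)
        for c in chunks:
            parity = xor(parity, c)
        remaining = iter(chunks)
        for d in range(n_disks):
            disks[d] += parity if d == parity_disk(s, n_disks) else next(remaining)
    return disks


def raid5_read(disks: list, n_disks: int, chunk: int) -> bytes:
    """Read the data chunks back, skipping where parity is assumed to be."""
    out = bytearray()
    for s in range(len(disks[0]) // chunk):
        for d in range(n_disks):
            if d != parity_disk(s, n_disks):
                out += disks[d][s * chunk:(s + 1) * chunk]
    return bytes(out)


data = bytes(range(48))                      # stand-in for file contents
disks = raid5_write(data, n_disks=4, chunk=4)

good = raid5_read(disks, n_disks=4, chunk=4)  # correct geometry
five = disks + [bytearray(len(disks[0]))]     # add an empty 5th disk
wrong = raid5_read(five, n_disks=5, chunk=4)  # wrong geometry

print(good == data)
print(wrong[:12] == data[:12], wrong == data)
```

Read with the 4-disk geometry, the data comes back intact. Read as a 5-disk array, the first three chunks are still correct, but the next chunk is parity (garbage from the file's point of view), and later stripes mix in out-of-place chunks and blank chunks from the new disk. Recovery tools that let you specify the member order, stripe size, and disk count by hand work on this principle, but only on disks that have not been written to since the failure.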
 

gtvr

Distinguished
Jun 13, 2009
1,166
0
19,460
If it were me, at this point I'd be calling vendor support. These are really complex, low-level issues. Not to be a jerk, but if the data is that important, I'd go that route. Personally, I would have done that before marking the disks as safe and re-initializing. I believe you hosed yourself at that point.

Lesson learned - back stuff up BEFORE messing with an array. A little paranoia goes a long way.
 

sub mesa

Distinguished
You permanently destroyed your data by:
- using FakeRAID5 without a full backup
- expanding the array (a very dangerous operation!)
- probably not surface-checking all the drives and checking your RAM prior to expanding
- initiating yet another rebuild after the expansion failed; this is the one that destroyed your data. Your data is still there, but all kinds of gaps and parts have had corrupted data written to them.
- running a disk check, which writes to the array; this again permanently destroys the opportunity to recover otherwise recoverable data.

So, in my eyes, you made some bad calls that led to the destruction of your data. That's the risk with RAID, and that's why RAID cannot replace a good backup. Using RAID5 is very unsafe; expanding a RAID5 is like gambling with your data: think of a 50% chance your data is gone and a 50% chance everything works out OK. You can improve those chances by doing proper tests prior to expanding. Expanding a RAID5 is an extremely delicate operation where a minor error will cause a lot of headaches.

Perhaps you should look into ZFS to prevent data loss in the future. Right now, I don't think there's any hope of getting your files back uncorrupted.
 
Solution

rtothedizzy

After doing some more reading I'm beginning to come around to your point of view.

I backed up what I could but I don't have the controller or disk space to back up everything.

I checked all disks before the expansion. My RAM has been checked before but you're right, I didn't check it right before the expansion.

The disk check was inadvertent. I didn't get back to the computer in time to stop the automatic one.

In the end it seems that I did a couple of stupid things that made a bad situation worse. You live and learn, I suppose. Hopefully this doesn't happen to too many folks out there.

I really appreciate your reply.