Help! AMD RAID 5 broke down, now Windows (on SSD) won't start

Angelusz

Distinguished
Jul 20, 2009
20
0
18,510
Hi Tom's Hardware. I'm a frequent reader but an infrequent poster. Now I'm in such trouble that I can't figure it out myself. My Google-fu has also proven too weak, so I turn to you for help.

I've been using a RAID5 setup in my NAS/mediacenter/server PC for a while now. It's always been dandy: 15TB of raw storage and happy usage overall.
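For reference: 15TB is the raw total of five 3TB disks. RAID5 spends one disk's worth of space on parity, so the usable capacity is lower. A minimal sketch of that arithmetic, assuming all members are the same size (the function name is just for illustration):

```python
# Usable capacity of a RAID5 array: one disk's worth of space holds parity.
# Assumes every member disk has the same size (the specs list 5x 3TB).

def raid5_usable_tb(num_disks: int, disk_tb: float) -> float:
    """Return the usable capacity in TB of an n-disk RAID5 array."""
    if num_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (num_disks - 1) * disk_tb


raw = 5 * 3.0
usable = raid5_usable_tb(5, 3.0)
print(f"raw: {raw} TB, usable: {usable} TB")  # raw: 15.0 TB, usable: 12.0 TB
```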

Some time ago, it started acting up a bit: sometimes the PC wouldn't reboot properly after an update. After a few tries it went back to normal and everything worked as it should.

However, today it wouldn't boot anymore. I started diagnosing. What I've found so far:
- Windows 8 broke because of the crash. I 'refreshed' it, meaning a clean Windows install on the SSD.
- Windows still wouldn't boot with the RAID disks attached. When I pulled them out, Windows sometimes did boot (WEIRD!). Recently that stopped working too, because:
- After a single boot attempt with the disks attached, Windows is broken afterwards and can only start in Safe Mode (probably because of the RAID drivers?).
- The RAID controller reports a broken array. I can 'see' only 3 of the 5 disks in the manager. (CRAP! I need 4 to recover it!)

I've opened up the case and checked all the cables. I unplugged them and reattached them for good measure. I've also updated the mobo BIOS, to no avail.

I'm at a loss. I can boot into Win8, but only without the disks attached. I have no idea where to go from here. How can I recover my disks? What broke down?

Thanks in advance for your help! Any is appreciated.

Specs:
APU: AMD A10-5800K
Mobo: Gigabyte F2A85XM-D3H (newest BIOS, F4e)
SSD: Corsair Force 3 180GB
HDDs: 5x Seagate 3TB (exact model: have to check)

Paperdoc

Polypheme
Ambassador
If your RAID management system in BIOS tells you that only 3 of the 5 HDDs in the RAID5 array are working, you have NO hope of recovering your data from the array. Your only recourse is to recover the data from a backup. Do you have a recent good backup?

RAID5 can recover from one failed drive in the array, but not more. RAID6 can recover from two simultaneously failed HDD units, but that is not what you have.
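The rule above reduces to counting missing members. A simplified sketch (hypothetical helper; it only counts whole-disk failures and ignores things like unreadable sectors on the surviving disks):

```python
# Simplified model: counts whole-disk failures only.
# Maximum number of simultaneously missing members each level tolerates.
FAULT_TOLERANCE = {"RAID5": 1, "RAID6": 2}


def is_recoverable(level: str, total_disks: int, working_disks: int) -> bool:
    """True if the array can still reconstruct its data from the working disks."""
    failed = total_disks - working_disks
    return failed <= FAULT_TOLERANCE[level]


print(is_recoverable("RAID5", 5, 3))  # False: two disks missing
print(is_recoverable("RAID5", 5, 4))  # True: one disk missing
print(is_recoverable("RAID6", 5, 3))  # True: RAID6 survives two missing disks
```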

Angelusz

Update: I kept at it (6AM here now) and found that when I disconnect one of the disks, the PC boots once again and even detects the RAID array. I can access my data again!

Looks like this disk broke to such an extent that it wrecks my startup when connected. Time to get it replaced.

It's the Seagate Desktop HDD, 3TB. Stay away from these disks; it's the second one in this system to break. I've got another in my main PC, and it's also giving trouble every now and then. Getting ready to replace that one too. Wish me luck.

EDIT: Thanks for your reply, Paperdoc. That was exactly what I was afraid of. Fortunately, it seems that the HDD simply 'confused' my system to the point where it gave all kinds of random errors that weren't directly related to the actual problem.

Angelusz

The defective HDD was part of the RAID array, but it messed with my mobo/controller to such an extent that even the SSD, which was not part of the array, started malfunctioning while it was connected. Now that I've taken the defective HDD out, not only does my SSD work again, but the array (now with only 4 disks) works again too.

So I'm still not sure how this came to be, but whenever this defective disk is connected, my PC goes FUBAR.

Thanks for your time!

Paperdoc

You need to check into this further. When a RAID5 array "loses" an HDD unit due to failure, that unit needs to be replaced. ALSO, normally the RAID5 array cannot function without that unit. It can only rebuild itself AFTER the failed unit has been replaced, and only after that rebuild of data is at least partially done can it let you access its data. So I can imagine two possible explanations:
1. Your RAID5 management system can "rebuild" parts of the data on demand, giving you apparent access to anything you need right now, even though there is no replacement HDD unit installed; or,
2. Your RAID5 system actually used only four HDD units PLUS a "Hot Spare" that was kept unused but in readiness, and it has automatically been substituted into the array to replace the failed unit, leading to an automatic rebuild of the array.
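The difference between these two cases can be modeled as a tiny state machine. This is a hypothetical sketch, not how any particular AMD/Gigabyte controller works: with a hot spare, a failed member is swapped out and the array rebuilds onto the spare (the rebuild is treated as instantaneous here); without one, the array runs degraded until a second failure takes it down.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Raid5Array:
    """Hypothetical model of a RAID5 array with optional hot spares."""
    members: List[str]
    spares: List[str] = field(default_factory=list)
    degraded: bool = False
    failed: bool = False

    def fail_disk(self, disk: str) -> str:
        self.members.remove(disk)
        if self.spares:
            # Case 2 above: a hot spare is promoted and the array rebuilds onto it.
            # Simplification: the rebuild is assumed to finish immediately.
            spare = self.spares.pop(0)
            self.members.append(spare)
            return f"rebuilding onto {spare}"
        if self.degraded:
            # A second missing member exceeds RAID5's single-disk tolerance.
            self.failed = True
            return "failed"
        # No spare: keep serving data from parity until the disk is replaced.
        self.degraded = True
        return "degraded"


with_spare = Raid5Array(["d1", "d2", "d3", "d4"], spares=["d5"])
print(with_spare.fail_disk("d2"))  # rebuilding onto d5

no_spare = Raid5Array(["d1", "d2", "d3", "d4", "d5"])
print(no_spare.fail_disk("d2"))  # degraded
print(no_spare.fail_disk("d4"))  # failed
```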

I don't know if either of these actually has happened. But whatever has happened, you still have one failed HDD that has been disconnected and needs to be replaced, so the RAID5 system can be completely restored.

Oh, I guess there is a third possibility: the unit that failed never was part of your RAID5 array, but its failure caused the system enough confusion to generate faulty messages and temporarily disable a RAID5 array that actually had NO problem. Or maybe it actually happened to be the "Hot Spare" unit that was not really being used in the array.

Angelusz

Thank you for your input, Paperdoc. I'm not sure what's happening, then. It has happened to me before that one disk of a 5-disk RAID 5 array broke down and I was still able to access the data. RAID 5 as I understand it:
It stores parity information alongside the data, spread across all disks, so that the contents of any ONE failed disk can be recomputed from the remaining disks. If more than one disk breaks down, however, the data is lost. When one disk breaks, the controller software can reconstruct the missing data in real time, keeping it accessible even though a disk is missing. To restore the array completely, a new disk has to be added, after which the rebuild process can be started.
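The "decode in real time" part works because RAID5 parity is the byte-wise XOR of the data blocks in a stripe: XOR-ing all surviving blocks (data plus parity) yields the missing one. A minimal sketch for a single stripe across five disks; real RAID5 also rotates which disk holds parity from stripe to stripe, which this example leaves out:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across five disks: four data blocks plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)
stripe = data + [parity]

# Disk 2 (index 1) fails: recompute its block from the four survivors.
# This is what a degraded read does on the fly, and what a rebuild writes
# onto the replacement disk.
survivors = [blk for i, blk in enumerate(stripe) if i != 1]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```

With two blocks missing, the XOR of the survivors no longer determines either one, which is why a second failure loses the array.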

I've tested the defective disk in another system now. I started SeaTools and it cannot even 'find' the disk. I cannot initialize it in Windows either. I tried HD Tune. That tool can find the disk, but it reports a 100% error rate.

I'm guessing the electronics of the disk broke down, which is in line with the fact that it mucked up my system when it was installed with the RAID enabled.

I'm now scanning the second disk for errors, since my machine was unable to restore the RAID. I'm going to do this one by one, hoping to recover my data.

Best guess right now: the MoBo is also giving me trouble, and I've got multiple points of failure: disk and MoBo.

To be continued.