Degraded RAID 5, No Detectable HW Failure, Next Steps?

July 30, 2009 7:56:38 PM

I am familiar with the basics of RAID, but not what to do when an advanced array such as this has problems beyond basic troubleshooting.

The system this is happening on was built around 11/20/2008. Specs (Pulled from HW info link for convenience, modified to remove future changes and irrelevant info):

Infinity Rising (A/V & VM Powerhouse):
PSU: Silverstone 800W Modular (DA800)
Mobo: EVGA 780i SLI FTW (132-YW-E178-A1, manual HERE)
CPU: Core 2 Quad Q9650 (BX80569Q9650)
GPU: BFG Tech GeForce GTX 280 OC2 1GB (BFGEGTX2801024OC2E) <-Temporarily an Asus 9600 GSO, 280's cooling fan failed and is on RMA
RAM: 8GB Corsair XMS2, DDR2 1066, (TWIN2X4096-8500C5 x2)
Sound Card: Creative SoundBlaster X-Fi Elite Pro (70SB055A00000)
HDDs: 1x WD Caviar SE16 640GB (WD6400AAKS), 3x WD Caviar Black 1TB (WD1001FALS) in Kingwin Hot-Swap Rack (KF-4000-BK)
Opticals: 2x Lite-On 20x DVD-+R/RW & DVD-RAM Drives (DH-20A4P-04)
OS: Vista Ultimate SP2 64-bit and XP SP2 64-bit

Screenshot from nVidia Control Panel:


The degraded array is the 3x WD 1TB drives in RAID 5 on chipset-based NVIDIA Media Shield RAID. There are no hot-spares configured currently.

So far I have not found anything of use to get a more specific idea of what the problem is via the RAID controller or nvCpl. I have done basic troubleshooting such as trying various SATA and power cables with the drive and trying different SATA ports, with no progress. Just to ensure it's not drive hardware failing, I ran them through a gauntlet of nondestructive testing individually (with RAID disabled, and without booting Windows so as to not destroy the array):
  • HDAT2 - Pass
  • MHDD - Pass
  • WD DLG - Pass
  • SpinRite 6 - No errors before SR's ~540GB limit was reached, resulting in SR crashing (known SR6 issue)

    I have since disconnected the array from the data and power cables until I could ask for advice to protect it from any further damage. There were no pre-failure signs. I shut down my computer about 10PM the night before, went to an air show to shoot photos, and when I came back the RAID BIOS screen was flashing "Error" for drive 3. Upon starting into Vista, I was able to watch video stored on the array successfully without any errors (done just to check parity integrity, then stopped) and then immediately began the troubleshooting steps above.

    I believe my next step should be to do a rebuild on the array, but I don't want to lose anything due to factors I'm unaware of as I don't know what is involved in the rebuild process. My backup drive is currently down for RMA as well, so I would not be able to do a pre-rebuild backup via the parity fault-tolerance.

    My question at this point is whether my knowledge and instincts are pointing to the correct next step, or whether I should do something different, such as waiting for my backup drive. Any thoughts? And is there any possibility that this could be a controller issue and not a file system one?

    EDIT: Should I connect the error-state drive on its own and do an array deletion on that drive before doing the rebuild? Or will the rebuild work without this step?


    Side notes: I know the perils of southbridge-based RAID now through my research into this problem, please DO NOT sidetrack the thread into that issue unnecessarily. I intend to disable RAID in the near future anyways so I can get my board's hot plug feature to work.

    And does anyone know if IDE emulation can be turned off for the SATA ports on this chipset?
    July 31, 2009 12:55:28 AM

    First, my compliments to your nice and well formatted first post. :) 

    You seem to be a fairly knowledgeable user and have followed the correct steps. Disconnecting the array to prevent damage was a good move. As long as no writes happen to your disks, your data is still there. As I understand it, you tested this by booting with the array degraded and found no obvious problems.

    Normally, you can just rebuild the "failed" disk. I'm assuming the disk is just fine but some I/O timed out and it was kicked out of the RAID. Sometimes you need to remove the failed disk so it shows up as a free/unused disk member, and then restart a rebuild.

    A feature called TLER can reduce these errors, but it may have been a simple hiccup or a minor bug as well. TLER is usually found on "RAID-edition" disks, but WD's normal disks appear to support it too. On those it is disabled by default, so you have to enable it manually with the utility from their website.

    However, stop for a minute and think about how vulnerable your data really is, because you're trusting one "RAID system" to be 100% reliable. Although RAID5 can help, it's not that difficult for a user to do the wrong thing when an issue like this appears, causing total data loss. For example, those who delete the RAID5 and re-create it may find it being created with a different disk order. The rebuild then destroys all data on the drive, making recovery impossible.
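
To make the disk-order point concrete, here is a minimal Python sketch of a 3-disk RAID5 with rotating XOR parity. The rotation scheme and the function names (`write_stripes`, `read_stripes`, `rebuild_disk`) are assumptions made for illustration; the on-disk layout MediaShield actually uses is not documented in this thread. It shows that a missing member can be recomputed from the others, and that reading the same raw blocks with two members swapped no longer returns the original data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a sequence of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def parity_disk_for(stripe_no, n_disks):
    # Assumed rotation: parity starts on the last disk and moves left.
    return (n_disks - 1 - stripe_no) % n_disks

def write_stripes(data_blocks, n_disks):
    """Lay data out on n_disks, one parity block per stripe.

    len(data_blocks) must be a multiple of n_disks - 1.
    """
    disks = [[] for _ in range(n_disks)]
    per_stripe = n_disks - 1
    for start in range(0, len(data_blocks), per_stripe):
        stripe = data_blocks[start:start + per_stripe]
        p = parity_disk_for(start // per_stripe, n_disks)
        remaining = iter(stripe)
        for d in range(n_disks):
            disks[d].append(xor_blocks(stripe) if d == p else next(remaining))
    return disks

def read_stripes(disks):
    """Read the logical data back, skipping the parity block of each stripe."""
    n = len(disks)
    out = []
    for stripe_no in range(len(disks[0])):
        p = parity_disk_for(stripe_no, n)
        out += [disks[d][stripe_no] for d in range(n) if d != p]
    return out

def rebuild_disk(disks, missing):
    """Recompute every block of one member from all the other members."""
    others = [d for i, d in enumerate(disks) if i != missing]
    return [xor_blocks(blocks) for blocks in zip(*others)]

data = [b"AA", b"BB", b"CC", b"DD", b"EE", b"FF"]
disks = write_stripes(data, 3)

print(read_stripes(disks) == data)            # correct order reads back fine
print(rebuild_disk(disks, 2) == disks[2])     # lost member can be recomputed
swapped = [disks[1], disks[0], disks[2]]      # same disks, wrong member order
print(read_stripes(swapped) == data)          # logical data is now scrambled
```

Nothing is lost while the array is only read with the wrong order. The damage starts once a controller writes parity or rebuilt blocks based on that wrong assumption.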

    I found, by hard experience, that I had to take further steps than just running one big RAID5. I ended up with two RAID5 arrays of 8 disks each, in 2 separate systems. But now I use a combination of ZFS + RAID-Z on the backup server and plain RAID0 (4x1TB WD Green) as the main array, which is online 24/7. This works pretty well, for several reasons:

  • you have two filesystems, so corruption in one filesystem cannot affect the other
  • bad memory (RAM bitflips) cannot corrupt both filesystems, which it can if both are in the one system
  • physical trauma/impact/fire/power issues are less likely to affect both systems
  • viruses, accidental deletions and ordinary disk failures are covered; up to two disks can fail without data loss (one on the RAID-Z and one on the RAID0 still leaves the RAID-Z operational)
  • best of all: because of ZFS's cool snapshot feature, all backups are incremental. If a virus wiped all the data on my main array and the automatic nightly backup script (rsync) then triggered, the deletion would be copied to the backup array as well, but I can go back in time to a snapshot and my files are still there. Incremental backups only store the changes you make to a filesystem, much like "restore points" in Windows. So having many snapshots won't eat up space if you haven't changed much in the meantime.

    Back to your situation: if something happens to your RAID5, you're still facing data loss, whereas I have the impression you thought you were quite safe with a RAID5. These issues also persist with enterprise-level hardware RAID like Areca, as I've experienced myself. I'd rather use software RAID to deal with this, as that works best for me. Your mileage may vary.

    The disks on your onboard RAID controller can run in three modes: IDE, AHCI and RAID. IDE is compatible with pre-Vista operating systems, as XP doesn't support AHCI without loading drivers from a floppy during installation. AHCI is the "native SATA" mode, which is what you're looking for when you talk about turning off IDE emulation. As you've got Vista you should be able to just switch it to AHCI; my guess is Vista will just boot.

    So do not delete your array and re-create it and start a rebuild - that is dangerous and can destroy otherwise salvageable data.
    July 31, 2009 1:06:11 AM

    So you have two options:
    1) wait until you can backup the data while in degraded mode
    2) delete the separated disk (on top in your screenshot), so it becomes free again, then attach the disk to the array by starting a rebuild. Check the manual to be sure.

    Personally I would never risk really important data on such a procedure, as I know Windows RAID drivers generally suck. Always back up data you never want to lose, and don't compromise on that, as it would only be a painful lesson should it go wrong. In the past I made some bad mistakes that caused me to lose irreplaceable data. With today's networking it's easy to drop a large drive into another PC on the network and use it as a backup. That can save you a lot of headaches. Cheers. :) 
    July 31, 2009 3:23:06 PM

    Thanks for all the info sub mesa! :) 

    Unfortunately I lost a fairly long reply to a Firefox crash, but here's the summary of what I was going to say:

    -The Media Shield RAID manual is pretty useless in this situation; its Rebuild steps are aimed at physically replacing the drive and don't go into any detail whatsoever, especially with regard to reusing a drive.

    -I'm working on setting up a modified copy of the UBCD right now with the TLER utility. Would you use the default of 7 seconds on both read and write, or should I feed it a custom command with the 7s read 0s write values used by WD RE3 drives?

    -Should I turn on TLER before or after the rebuild? (I'm thinking before, to avoid more issues, but is there a risk to this?)

    -Unfortunately something like you run is incredibly outside my budget now, and wouldn't work very well in my dorm, but I'll keep it in mind for the future as it sounds like a very very good idea.

    -I don't have the IDE/AHCI/RAID option at all on the 780i, I was expecting to see that but instead I only have RAID on/off selectable. Oh well....

    -I never intended to delete the entire array and recreate it, I wouldn't do that unless ALL other options were exhausted.

    -The risk of doing a backup first is Windows' insatiable appetite for writing to a disk, NTFS Last Access timestamps for example...

    -I have been fully intending to do the separated disk deletion through the RAID BIOS to avoid the perils of the Windows drivers as you mentioned

    As of right now, I'm intending to do a rebuild tonight or tomorrow, since the full backup option seems like it would actually risk more damage at this point. I have some of my most important data from this drive sitting on my laptop and will pull the last few key files before proceeding, so even though it would suck to lose data, it's not critical that everything survive.
    July 31, 2009 5:05:19 PM

    Sorry to hear you lost a long post; I hate it when that happens. Normally Firefox saves form data and can restore it even after a crash, but that needs proper caching headers from the website. This website doesn't allow caching: it marks the page as expired to force the browser to request a new copy, which might be why your form data was not restored. You can try the "Bettercache" Firefox addon, which removes anti-caching headers. It also makes the back button work instantaneously, without having to reload the pages.

    With regard to your questions:

    I wouldn't disable error recovery altogether, but limit it to 7-10 seconds or so. If you're using good software RAID, you should leave it as unlimited or at the maximum setting. TLER is no silver bullet that can prevent disks from splitting from the array; that's done by the RAID driver, which rejects a component it considers faulty.

    What TLER can do is make sure that a high-end company server, like an important financial transaction database server, doesn't have "hiccups" of over a minute because a disk is trying to fix the damage. You'd rather have that disk kicked out of the array and replaced by another one, as they've got plenty of hot spares in those kinds of servers. Without TLER, that database would stop working for over a minute, which can be catastrophic depending on its importance.

    For home users, TLER may result in the same thing: a broken array because one disk has a bad sector and the RAID engine kicked it out of the array. As it's not clear what happened in your case, sure, you can try setting TLER to 7 seconds or so. But it's no remedy for split arrays if there's a hardware failure of some kind. Bad cables, for example, or interference (EMI) could also be responsible for temporary glitches, timeouts, etc. So it could be that all your drives are just fine and always have been.

    About "my setup": it's really not that expensive, since no RAID controller is required. Many onboard ports plus some cheap PCIe x1 cards can make an 8-disk RAID array; with 1TB disks in a RAID5 that's already 7TB, or 5TB when using only the motherboard's 6 SATA ports. The OS can be on the disks themselves, or on a cheap flash PATA card which plugs directly into the PATA connector on the motherboard.

    So in terms of cost, it would be:
    motherboard - 75
    cpu - 35
    mem - 40
    casing&psu - 100
    = 250 EUR or $300 I guess. With 4x1TB WD Green drives it would be about $350 extra.
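
For anyone who wants to check the capacity and cost figures above, the arithmetic can be sketched in a few lines of Python. The helper name `raid5_usable_tb` is made up for this example; the only rule it encodes is that a RAID5 set gives up one disk's worth of space to parity.

```python
def raid5_usable_tb(n_disks: int, disk_tb: float = 1.0) -> float:
    """Usable capacity of a RAID5 set: one disk's worth goes to parity."""
    if n_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (n_disks - 1) * disk_tb

# Base system prices in EUR, taken from the list above.
parts_eur = {"motherboard": 75, "cpu": 35, "mem": 40, "casing&psu": 100}

print(raid5_usable_tb(8))        # 8 x 1TB disks      -> 7.0
print(raid5_usable_tb(6))        # 6 onboard ports    -> 5.0
print(sum(parts_eur.values()))   # base system in EUR -> 250
```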

    Ergo, it doesn't need to cost you multiple thousands of dollars or euros; you can do it the cheap way with free Linux/BSD software. For example, FreeNAS is a really easy way to let BSD store your data, and version 0.7 also has initial support for ZFS.

    Anyway, if you ever want to make a cheap but powerful NAS to store data safely, you can always PM me or open a topic on the forums here.

    Good luck with your rebuild. :) 
    July 31, 2009 5:30:58 PM

    Thank you for the quick reply, Firefox tip and additional RAID info. :) 

    I guess I'll be proceeding with the rebuild in a few hours, now that I feel I have a much better understanding of the process and risks.

    With regard to the Firefox issues, that plug-in will help a lot for now, but there are much more serious underlying issues with my Firefox and Vista installs right now (a very long story, will save it for somewhere else). I'm currently just waiting to get an external drive with a failed controller and a Seagate 7200.11 1TB drive with a dead I/O board back from RMA to use for temp storage of data and will be doing a full nuke and fresh start, which is when I intend to move to a more basic storage solution and simple regular backups until I can implement a self-built NAS with a solution such as yours. (And yes, I have really, really bad luck with my personal hardware and software. The day someone can explain it I'll be amazed... :p  )
    August 1, 2009 1:42:58 AM

    So, uh, I'm really not sure what to think now. The rebuild seemed to go excellently, I followed every procedure for nVidia MS RAID (confirmed by other sources and what little the manual says) to the letter. I was somewhat surprised when it finished in about 6 hours without errors or any oddities, as I expected it to take longer. And then I see this:



    What the ****?! :??: 

    I've deassigned the drive letter and I'm in the process of gathering my file and data recovery tools (unfortunately the installed copies were sitting ON the RAID array, dumb move) as I write this post, and will be RMAing my external drive ASAP so I have somewhere to dump recovered data to.
    August 1, 2009 2:24:22 AM

    Update: After initial scans, it looks like all my data is still there, it's just a matter of restoring the partition table.
    August 1, 2009 2:41:32 AM

    Partition's still there, scan will probably take another 7 hours or so, may be able to recover in-place (worst case scenario this app backs up the partition data and can restore it, and I could use filesystem-based recovery to the external drive):

    August 1, 2009 9:08:30 AM

    Without knowing what prompted the separation of your array, it's possible some "damage" happened to your array, causing minor corruption. Data should still be salvageable with the right tools. However, I'm sure this wasn't what you expected.

    I hope you can make a full recovery.
    August 1, 2009 12:16:18 PM

    Recovery still looks good, as I have the partition records stored, but I'm not so sure what exactly is going on with the array now. I turned on TLER before the rebuild last night, and this morning I woke up to a completed partition scan and this:



    I'm going to look into what the heck happened to drive 1 now, after getting myself out of the "land of confusion."

    -Bill

    EDIT: Note that this time it's drive 1 (0.1) and not drive 3 (1.1), and that this time it has a status of Error instead of Healthy. I'm starting to suspect an I/O board failure on drive 1 now. :pfff: 
    August 1, 2009 12:34:04 PM

    From the department of "WTF?!"...



    Drive 1 (0.1) SMART says read element of the test failed on two SMART self-tests. Research time, and maybe time to buy a spare drive...

    (This is really starting to irk me, it's almost a year to the day since I had a Seagate 7200.11 1TB drive fail and lose everything on it due to I/O board failure.)
    August 1, 2009 1:05:44 PM

    Could RAW copying the failing drive to another one with backup software do any good in recovering the RAID array? I can order another 1TB drive and do that if it'll help the situation.
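
As a rough illustration of what a RAW copy that tolerates read errors does, here is a minimal Python sketch. It is not a replacement for a dedicated imaging tool such as GNU ddrescue, and whether a cloned member would be accepted back into a MediaShield array is not something this thread settles. `rescue_copy` and `FlakySource` are hypothetical names; `FlakySource` only simulates a drive with one unreadable region so the example can run without real hardware.

```python
import io

def rescue_copy(src, dst, total_size, chunk=64 * 1024):
    """Copy total_size bytes from src to dst, zero-filling unreadable chunks.

    src and dst are seekable binary file objects. Returns a list of
    (offset, length) ranges that could not be read from src.
    """
    bad = []
    offset = 0
    while offset < total_size:
        length = min(chunk, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(length)
        except OSError:
            data = b""                      # treat the whole chunk as unreadable
        if len(data) < length:
            bad.append((offset + len(data), length - len(data)))
            data += b"\x00" * (length - len(data))
        dst.seek(offset)                    # keep offsets identical on the copy
        dst.write(data)
        offset += length
    return bad

class FlakySource(io.BytesIO):
    """Stand-in for a failing disk: reads starting at offset 200 raise an error."""
    def read(self, n=-1):
        if self.tell() == 200:
            raise OSError("simulated unreadable sector")
        return super().read(n)

payload = bytes(range(256)) * 4             # 1024 bytes of test data
copy = io.BytesIO()
bad_ranges = rescue_copy(FlakySource(payload), copy, len(payload), chunk=100)
print(bad_ranges)                           # ranges to retry or write off
```

Keeping every block at its original offset matters here, because the RAID layout depends on where each block sits on the member disk. The list of bad ranges shows how much of the clone is filler rather than real data.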
    August 1, 2009 4:42:16 PM

    Just looked up the SMART logs, and this just gets more strange by the minute:

    Log:

    Level Date and Time Source Event ID Task Category
    Information 7/31/2009 9:13:21 PM NVRAIDSERVICE 1024 None Disk(s) were polled for SMART status.
    Information 7/31/2009 9:17:02 PM NVRAIDSERVICE 1009 None Array NVIDIA RAID 5 1.81T rebuild finish.
    Information 7/31/2009 9:18:21 PM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 7/31/2009 9:18:21 PM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 7/31/2009 9:18:21 PM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 7/31/2009 9:18:21 PM NVRAIDSERVICE 1024 None Disk(s) were polled for SMART status.
    Information 7/31/2009 9:26:12 PM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 7/31/2009 9:26:12 PM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 7/31/2009 9:26:12 PM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 7/31/2009 9:26:12 PM NVRAIDSERVICE 1024 None Disk(s) were polled for SMART status.

    (continues like this until...)

    Information 8/1/2009 2:46:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:46:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:46:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:46:33 AM NVRAIDSERVICE 1024 None Disk(s) were polled for SMART status.
    Information 8/1/2009 2:51:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:51:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:51:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:51:33 AM NVRAIDSERVICE 1024 None Disk(s) were polled for SMART status.
    Error 8/1/2009 2:52:08 AM NVRAIDSERVICE 1006 None Access failure: Critical error on disk WDC WD1001FALS-00J7B0 (Port: SATA 0.1).
    Error 8/1/2009 2:52:14 AM NVRAIDSERVICE 1006 None Access failure: Critical error on disk WDC WD1001FALS-00J7B0 (Port: SATA 0.1).
    Error 8/1/2009 2:52:14 AM NVRAIDSERVICE 1006 None Access failure: Critical error on disk WDC WD1001FALS-00J7B0 (Port: SATA 0.1).
    Error 8/1/2009 2:52:19 AM NVRAIDSERVICE 1006 None Access failure: Critical error on disk WDC WD1001FALS-00J7B0 (Port: SATA 0.1).
    Error 8/1/2009 2:52:19 AM NVRAIDSERVICE 1006 None Access failure: Critical error on disk WDC WD1001FALS-00J7B0 (Port: SATA 0.1).
    Error 8/1/2009 2:52:22 AM NVRAIDSERVICE 1006 None Access failure: Critical error on disk WDC WD1001FALS-00J7B0 (Port: SATA 0.1).
    Information 8/1/2009 2:52:22 AM NVRAIDSERVICE 1001 None New disk detected: WDC WD1001FALS-00J7B0.
    Warning 8/1/2009 2:52:22 AM NVRAIDSERVICE 999 None Disk WDC WD1001FALS-00J7B0 has been removed from array NVIDIA RAID 5 1.81T.
    Information 8/1/2009 2:52:23 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:52:23 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:52:23 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:56:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:56:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:56:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 2:56:33 AM NVRAIDSERVICE 1024 None Disk(s) were polled for SMART status.
    Information 8/1/2009 3:01:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 3:01:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 3:01:33 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 3:01:33 AM NVRAIDSERVICE 1024 None Disk(s) were polled for SMART status.

    (continues like this until I run the SMART Self Test, after noticing the drive separated...)

    Information 8/1/2009 8:21:53 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 8:21:53 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 8:21:53 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 8:21:53 AM NVRAIDSERVICE 1024 None Disk(s) were polled for SMART status.
    Error 8/1/2009 8:23:01 AM NVRAIDSERVICE 1035 None The SMART self-test on disk WDC WD1001FALS-00J7B0 on port SATA 0.1 has detected a failure. A read element of the test failed. Please back up your data and replace the hard disk.
    Information 8/1/2009 8:26:01 AM NVRAIDSERVICE 1030 None The SMART self-test on disk WDC WD1001FALS-00J7B0 on port SATA 1.0 completed without error.
    Information 8/1/2009 8:26:53 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 8:26:53 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 8:26:53 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 8:26:53 AM NVRAIDSERVICE 1024 None Disk(s) were polled for SMART status.
    Information 8/1/2009 8:28:35 AM NVRAIDSERVICE 1030 None The SMART self-test on disk WDC WD1001FALS-00J7B0 on port SATA 1.1 completed without error.
    Error 8/1/2009 8:31:25 AM NVRAIDSERVICE 1035 None The SMART self-test on disk WDC WD1001FALS-00J7B0 on port SATA 0.1 has detected a failure. A read element of the test failed. Please back up your data and replace the hard disk.
    Information 8/1/2009 8:31:54 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 8:31:54 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 8:31:54 AM NVRAIDSERVICE 1017 None SMART status for disk WDC WD1001FALS-00J7B0 returned OK.
    Information 8/1/2009 8:31:54 AM NVRAIDSERVICE 1024 None Disk(s) were polled for SMART status.

    (continues like this through when I pulled the log at 8/1/2009 12:22:07 PM...)


    I just tested drive 1 (0.1) once again for good measure, and it returned the same failure, so this leaves me wondering why the automatic check keeps returning an OK.
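
A log this repetitive is easier to read once the routine polling entries are filtered out. The sketch below parses lines in the format shown above and keeps only the warnings and errors, grouped by SATA port. The regular expressions and field names are assumptions derived from the pasted lines, not from any NVIDIA documentation. One plausible reading of the conflicting results, and only an assumption since the service does not say what it queries, is that the periodic "returned OK" entries report the drive's overall SMART health status, which is a separate check from an on-demand self-test and can stay OK while a self-test fails.

```python
import re
from collections import Counter
from datetime import datetime

LINE = re.compile(
    r"^(?P<level>Information|Warning|Error)\s+"
    r"(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+"
    r"(?P<source>\S+)\s+(?P<event_id>\d+)\s+(?P<category>\S+)\s+"
    r"(?P<message>.*)$"
)
PORT = re.compile(r"SATA (\d+\.\d+)")

def parse(lines):
    """Yield one dict per recognised log line; unrecognised lines are skipped."""
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        rec = m.groupdict()
        rec["when"] = datetime.strptime(rec["ts"], "%m/%d/%Y %I:%M:%S %p")
        port = PORT.search(rec["message"])
        rec["port"] = port.group(1) if port else None
        yield rec

# A few lines copied from the log above.
sample = """\
Information 8/1/2009 2:51:33 AM NVRAIDSERVICE 1024 None Disk(s) were polled for SMART status.
Error 8/1/2009 2:52:08 AM NVRAIDSERVICE 1006 None Access failure: Critical error on disk WDC WD1001FALS-00J7B0 (Port: SATA 0.1).
Warning 8/1/2009 2:52:22 AM NVRAIDSERVICE 999 None Disk WDC WD1001FALS-00J7B0 has been removed from array NVIDIA RAID 5 1.81T.
Error 8/1/2009 8:23:01 AM NVRAIDSERVICE 1035 None The SMART self-test on disk WDC WD1001FALS-00J7B0 on port SATA 0.1 has detected a failure. A read element of the test failed. Please back up your data and replace the hard disk.
Information 8/1/2009 8:26:01 AM NVRAIDSERVICE 1030 None The SMART self-test on disk WDC WD1001FALS-00J7B0 on port SATA 1.0 completed without error.
"""

records = list(parse(sample.splitlines()))
problems = [r for r in records if r["level"] != "Information"]
by_port = Counter(r["port"] for r in problems if r["port"])

for r in problems:
    print(r["when"], r["level"], r["event_id"], r["port"])
print(dict(by_port))
```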
    August 3, 2009 4:09:51 AM

    At this point, my next steps (unless advised otherwise) are going to be to do a RAW sector-by-sector copy of the drive, see if it will reintegrate with the array, and, if it does integrate and bring the array back online, start partition recovery to two WD Green 1TB drives from an external enclosure. If not, then I think I'm just going to try to pull individual files using filesystem-based recovery to the Green drives, before attempting partition-based recovery in degraded mode if possible. If that fails, I'll do a rebuild once more (in hopes it did a raw rebuild to drive 3 despite the partition table damage) and attempt in-place partition restoration. If anyone has any suggestions/warnings, PLEASE let me know.

    Thanks,
    Bill
    August 3, 2009 9:45:25 AM

    If your disks have swapped, for example the "failed" disk is now in operation and another one is disconnected, you have serious problems. Disk members shouldn't swap like that; since you have written to the array in degraded mode, the data from the initially failed drive should not be used anymore. Doing so might corrupt the areas you've written to in the degraded state.
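
The stale-member concern can be shown with a single stripe. This is a simplified Python sketch under assumed conditions (one stripe, two data blocks plus XOR parity, made-up block contents), not a model of what the NVIDIA driver actually did in this case.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# Healthy stripe: two data blocks and their parity, one block per disk.
d0, d1 = b"AAAA", b"BBBB"
parity = xor(d0, d1)

# The disk holding d1 drops out and keeps its old content from here on.
stale_d1 = d1

# Degraded write to the logical block that lived on the dropped disk:
# only the parity on the surviving disks can record the new content.
new_d1 = b"CCCC"
parity = xor(d0, new_d1)

# Reading in degraded mode still returns the new data.
print(xor(d0, parity) == new_d1)

# Now the roles swap: the stale disk is trusted and the disk holding d0 is
# the one missing. Reconstructing d0 mixes old and new data.
recovered_d0 = xor(stale_d1, parity)
print(recovered_d0 == d0)
```

If nothing was written while the array was degraded, the stale member still agrees with the parity and this mismatch does not occur.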

    At this point, I'm not sure if your data can be recovered. It would be possible to write all array data to a single 2TB disk, so you can do recovery without the RAID part. That could improve the chance of recovery, but I must say it's getting complicated at this point. I do wish you a lot of luck and feel sorry this happened to you.
    August 3, 2009 7:43:56 PM

    Thank you for all the help you've been so far, and for sticking with this. :) 

    As far as I know there was no actual writing to the array, since it's not an OS drive and no pagefile is stored there. I don't know where drive letter associations are stored by Windows (and have yet to find any useful info regarding that), but I would assume from what I've seen that they are stored in the registry, which would reside on the OS drive. The partition recovery software I'm working with is Active@ Partition Recovery, which as far as I know does not do any writing to the drive/array it is working on unless explicitly told to do so once the scan is complete and the recovery process has begun. Any table info pulled from the array was saved to my OS drive as a standalone file type used by A@PR; nothing was rewritten to the array, and I never got to that point before drive 1 dropped.

    Unfortunately I have a snowball's chance you-know-where of coming up with the money for a 2TB drive right now and don't know anyone who has one, but I could get my external RMAed and set it to run RAID 0 to have the same effect. Also, with nVidia MS RAID there's no way that I can tell to force drive 1 back into the array (which would be really handy right now...) without a second rebuild, unless I do the RAW duplication that I mentioned. I guess my number 1 question right now is whether you have any opinion on the chances of that succeeding.

    It's not life-or-death if I don't get this recovered, but it'd be really nice to be able to, since I have about 400GB of working video on the array (replaceable, just time consuming). If it comes down to it, most of the content is in formats that would tolerate a small amount of corruption; I just want to recover what I can and say good riddance to the rest if I have to. None of this is super time-critical, so if I have to I might be able to let the disks sit idle until I have the money for a 2TB drive in a few weeks.
    August 6, 2009 7:21:50 AM

    Based upon the circumstances in the post above, does anyone have any other thoughts on this before I proceed?
    August 10, 2009 11:13:31 PM

    Looks like everything will be recoverable based on initial filesystem-based recovery of a few video files. I need to take care of a few RMAs still before I can do a full recovery but it seems like everything is there and in good condition. Samples coming later tonight, just "because I can" mainly.