Another RAID 0 Failure

After scanning the web for RAID 0 failures, I will preface this by saying that I DO have a valid backup of my data and can recover it easily. My system came with RAID 0, which failed about 3 months ago after 2 years of perfect service. The Intel Matrix Storage Console informed me of an error, and there was plenty of time to complete a backup before the system became unbootable 2 days later. I attributed the error to excessive heat in the cabinet (despite the presence of multiple ventilation fans) and ordered two new hard drives (with hard drive coolers) to set the system up again with new components. I successfully created the new RAID 0 array and had zero data loss. The failure also prompted me to create an even more comprehensive backup system for all of my machines and to significantly improve ventilation.

Now I am faced with exactly the same situation 3 months later. Everything ran perfectly (and very cool) for three months, and then I got the dreaded Intel notification of a RAID 0 drive failing. The backup was already complete... 2 days later the system is unbootable and Acronis cannot find the partitions. I have not reformatted the drives yet for a reinstall. I am thinking of setting up the reinstall with one SATA drive for the OS and the other for data, and forgoing RAID altogether.

The remaining question is: if the SATA drives are okay after a reformat, what could the issue be? Driver, controller... what is happening? Can the mobo (or any other hardware, for that matter) have an issue that is sporadic and spontaneous? Why does the machine function for days after the console informs me of an issue instead of failing instantly? Opinions on the cause, please.
  1. Have you checked the cables? Maybe you have a bad one, and when the Intel Matrix Storage Manager has too much trouble it flags the array as bad; then eventually you just lose it due to too many failed transfers across that cable.

    Most times for me, a bad cable has produced lots of errors in the Windows Event Viewer about a drive not being ready.
  2. Check the SMART data. UDMA CRC errors indicate cable problems; recoverable ECC errors mean media problems.

    Your disk might just have been thrown out of the array because it was healing itself. Some ATA/SATA disks just don't respond for anywhere from 20 seconds to a few minutes.
  3. Good point, nukemaster. It is unlikely that the problem is the HDDs (although not impossible). I would change those cables when you set up your new install, and watch for any further problems.
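
For anyone who wants to apply the SMART rule of thumb from answer 2, here is a rough Python sketch. It assumes the attribute table has been saved as text in the layout that smartmontools' `smartctl -A` prints; the sample values are made up for illustration and are not from the original poster's drives. Treating reallocated and pending sector counts as media hints alongside recovered ECC is my own assumption, and the function names are hypothetical. Some drives report large raw ECC values in normal operation, so take the output as a hint, not a verdict.

```python
# Illustrative sample in the layout printed by `smartctl -A`.
# The numbers are invented for this example.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       57
"""

# Attribute 199 (UDMA CRC errors) points at the cable or connector.
CABLE_ATTRS = {199}
# Attribute 195 (recovered ECC) follows answer 2; 5 and 197 (reallocated and
# pending sectors) are an added assumption, not something stated in the thread.
MEDIA_ATTRS = {5, 195, 197}


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for rows with a numeric raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def diagnose(attrs):
    """Turn non-zero raw counts into rough 'cable' or 'media' hints."""
    hints = []
    for attr_id, (name, raw) in sorted(attrs.items()):
        if raw == 0:
            continue
        if attr_id in CABLE_ATTRS:
            hints.append(f"{name}={raw}: suspect cable or connector")
        elif attr_id in MEDIA_ATTRS:
            hints.append(f"{name}={raw}: suspect disk media")
    return hints


if __name__ == "__main__":
    for hint in diagnose(parse_attributes(SAMPLE)):
        print(hint)
```

With the sample above, only the CRC counter is non-zero, so the single hint points at the cable, which matches the advice in answers 1 and 3 to swap the cables first.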