degraded RAID 1 array!

July 25, 2004 1:50:39 AM

I built a system about 3 weeks ago, and now I'm getting a degraded array warning. How do I fix this? Why would this happen so quickly? Is there a hardware problem somewhere?

I have a VNF3-250 motherboard with 2 WD 800JB drives in a serial RAID 1 array. I have the most recent drivers I could find at the time of installation.

One thing worth noting: I got this system up and running initially, then decided to wipe it clean and redo it to tweak a few things, and I had just received a degraded array message when I started the whole process over. Is this indicative of a bad drive, a bad motherboard, or what exactly? I'm new to RAID. I also don't know how to rebuild the array in a way that preserves my data. I messed with it the first go-round (before the second clean install), but that made both drives inaccessible. I'm sure it was my fault, but I'd like to know the right way to go about fixing this.

Thanks all.

Do you have a chicken hat?


July 25, 2004 5:48:44 AM

It is likely that the message you are receiving is being issued by the RAID software that you are using.

From your post it was unclear whether you are actually using the RAID controller on the motherboard or using Windows software RAID.

I would presume you are using the onboard RAID controller.

Generally, any RAID message about a "degraded" or "critical" situation is referring to one or more drives being inaccessible.

This could be the result of an actual physical drive going down, a drive temporarily not communicating properly, or even a routine drive check determining that the data on the drive is NOT what was expected.

Since the drives you are using are (again, I am presuming) relatively new, it is likely that the problem is temporary and NOT permanent.

The first thing to do, unless you have already done so, is to find and install the RAID management software that came with your system. (It should be the NVIDIA RAID management software, I would presume.)

This software should be able to tell you what has/is happening in more detail.

If a drive has physically failed, the software should point this out. Replace the drive with another to see if things work; if they do, then RMA the failed drive.

If the drive is still online, then there should be a "REBUILD" or "SYNCHRONIZE" option available. This will scan the drive that the system presumes is correct and make sure ALL of the data is copied to the other drive. (This will probably take less than 1 hour.)
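
To make the idea concrete, here is a minimal Python sketch of what a RAID 1 rebuild/synchronize pass amounts to. It is an assumption-laden illustration, not the NVIDIA software's actual implementation, and good_path/stale_path are hypothetical stand-ins (e.g. disk image files) for the trusted and out-of-date mirror members:

```python
# Conceptual sketch of a RAID 1 rebuild: copy every block from the member
# the controller trusts onto the stale member, then verify the copy.
# Ordinary file paths stand in for the raw drives a real controller works on.

BLOCK_SIZE = 64 * 1024  # rebuild granularity (an arbitrary choice here)

def rebuild_mirror(good_path, stale_path):
    """Copy the good member onto the stale member, block by block, then verify."""
    with open(good_path, "rb") as good, open(stale_path, "r+b") as stale:
        offset = 0
        while True:
            block = good.read(BLOCK_SIZE)
            if not block:
                break
            stale.seek(offset)
            stale.write(block)
            offset += len(block)
        stale.flush()

    # Verification pass: the mirror is only healthy if both members now match.
    with open(good_path, "rb") as good, open(stale_path, "rb") as stale:
        while True:
            a = good.read(BLOCK_SIZE)
            b = stale.read(BLOCK_SIZE)
            if a != b:
                raise IOError("mirror members differ after rebuild")
            if not a:
                return  # reached the end of both members with no mismatch
```

The point the sketch makes is that a rebuild only copies in one direction, from the member presumed good to the other, which is why picking the wrong source (see the note below) destroys the good copy.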

NOTE: This software is ALL running under your OS. I am NOT recommending restarting your PC and using the RAID Controller BIOS for this.

NOTE: I think what happened last time is that one drive failed, partially or otherwise, and then when you tried to recover you may have copied the data from the failed drive to the good drive. Hence nothing worked. Just a thought.

You will also need to track this. Does it recur every so often after "REBUILDING"? If you replace the drive, does the problem come back? Is the problem related to the physical location of the drive in your system, or to the physical drive itself?

On my systems, currently using Promise RAID controllers but soon to be using LSI Logic, I have the mirrored pairs "Synchronized" once a week. Additionally, once a month I break the mirroring by removing one of the drives and "Rebuild" onto a spare drive, just to make sure everything is still functioning as expected.

NOTE: These are actual servers though, so this may be overkill for general purpose home systems.

NOTE: Many of the, shall we say, "less-expensive" RAID controllers lock up if you take a drive offline while running and replace that drive with one that has substantially the same data on it. This "problem" is overcome on ALL controllers I have ever used, at least, if you make sure to delete any active partitions from a drive before it is put into a running RAID configuration.
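
As a rough, assumption-laden illustration of that last point (not a vendor procedure; normally you would do this with the OS's disk management tools): on the MBR-partitioned drives of this era the partition table lives in the first 512-byte sector, so "deleting any active partitions" before inserting a drive into a running array amounts to wiping that sector. In the sketch below, target_path is a hypothetical stand-in such as a disk image or raw device path:

```python
# DESTRUCTIVE conceptual sketch: zero the first sector (the MBR, which holds
# the partition table) of the drive that is about to join a running array.
# Run this only against the spare/replacement member, never the data drive.

SECTOR_SIZE = 512

def clear_partition_table(target_path):
    """Zero the first sector of target_path (a disk image or raw device path)."""
    with open(target_path, "r+b") as disk:
        disk.seek(0)
        disk.write(b"\x00" * SECTOR_SIZE)
        disk.flush()
```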

As far as what can cause this?

Could just be a bad drive?
Could be the internals in the case are getting too hot?
Maybe one drive is getting hotter than the other?
(More likely if the drives are mounted in plastic removable carriers, or if they are installed in the 3.5" bays of a case one on top of another.)
Could be bad cables?
Maybe an improperly seated cable?
Could be a driver problem?

Most likely: heat, or a bad drive (or drives).

Additionally, ALWAYS make sure that the motherboard BIOS, the RAID controller BIOS, and the RAID driver software are all current. (Of course, when changing these on a running/functioning system, be extra cautious.)

I hope this gives you some ideas, and let us know what the outcome is!

July 25, 2004 11:29:14 PM

Well, the next time I booted after my post (the next day), the RAID array "fixed" itself and now says it's healthy, but it isn't. At each boot I'm getting chkdsk requests for all 4 partitions on both drives. Once in Windows, I get a message from the NVRAID software that the array is starting the rebuilding process, but the process never ends and resets frequently, even when the system is left unattended for long periods of time. As I write this, the process is at around 10%, but I don't expect it to ever get all the way to 100%. When I go into the RAID software, I can see no way to find out which disk is acting up, or how to rebuild in any way other than hitting the "rebuild" button on the whole array or on one drive or the other, which may rebuild things in the wrong direction.

Thanks for the diagnosis possibilities. I wouldn't be shocked if heat were to blame, as my case (Antec P160) has only one fan, though it is a large one (120mm, I believe). I have two system temperature sensors running from the case, one to the motherboard and one directly beneath (touching) one of the two drives. The temp readings I'm getting right now are 39C/41C (no idea which is which, or how to find out); the norm is in the 43C-ish range, with the highest I've seen being 48C. Are these high? I can order a second 120mm fan to install in the front of the case, right in front of the hard drives.

Ah geez, there goes the "Rebuild In Progress" button again. I'll check back later. At least my system still works ok. I don't know what I'd do if I were using RAID 0.

Do you have a chicken hat?
July 26, 2004 4:05:50 AM

This is getting ridiculous. The array will rebuild all the way up to 96.66% (which probably takes 3 or 4 hours), then hit an access failure and stop. I can't use Photoshop now without re-registering it, because it thinks I have a new hardware configuration. This is a big deal.

Do you have a chicken hat?
July 27, 2004 2:43:01 AM

I don't know?

While I haven't used the NVIDIA RAID software extensively, I have not run into ANY RAID software that didn't give some indication as to which drive failed.

There are some controllers that have a "1GB boundary" option. I am under the impression that this is to compensate for drives that are slightly different in size. In your case, since both drives are from the same manufacturer, this should not be an issue.

I don't think your temperatures are too far out of whack. 48 Celsius shouldn't be a problem. (By specification this drive has an operational temperature range of 5C to 55C, so 48C should work.)
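
For reference, a trivial Python check of the readings mentioned in this thread against that 5C-55C operational range (the 5-degree "close to the ceiling" margin below is an arbitrary choice of mine, not something from the spec):

```python
# Compare the reported drive temperatures against the quoted 5C-55C
# operational range and flag anything within a few degrees of the top.

SPEC_MIN_C, SPEC_MAX_C = 5, 55

def within_spec(temp_c, margin_c=5):
    """Return (in_spec, near_limit) for a temperature reading in Celsius."""
    in_spec = SPEC_MIN_C <= temp_c <= SPEC_MAX_C
    near_limit = in_spec and temp_c >= SPEC_MAX_C - margin_c
    return in_spec, near_limit

for reading in (39, 41, 43, 48):  # readings reported earlier in the thread
    in_spec, near_limit = within_spec(reading)
    print(f"{reading}C -> in spec: {in_spec}, within 5C of the ceiling: {near_limit}")
```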

Make sure that you have downloaded, and have installed, the most recent BIOSes and drivers.

It does seem that it is taking a long time to rebuild the drives, but I don't have any experience with those drives or that RAID controller.
July 27, 2004 4:35:54 AM

I appreciate the information. I need to find out how to update my BIOS now; in my experience that is surprisingly difficult information to find online. Does anyone else know what's going on here? Is it safe to install new software in a situation like this?

Do you have a chicken hat?