I've worked with a variety of RAID setups - software RAID on Linux and BSD, as well as hardware RAID controllers such as 3ware.
One question I have is: how exactly does a RAID controller (either software or hardware) really know that a drive has failed?
In my experience, most of the time when a non-RAID single drive fails, it doesn't simply stop working all of a sudden. It keeps kicking along - the BIOS still detects it, and it mostly continues to work - but random I/O errors start happening. You know your drive is toast when you type ls on the shell and get an I/O error back. Sometimes the problems are more subtle, though: random, weird behavior, like everything working fine except that you get an "out of disk space" error even though df -h reports space available, or I/O operations that work but are extremely slow.
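For context, when I suspect a single drive is going bad, this is roughly what I end up checking. It's only a sketch of my own habits - the mount point and /dev/sdb are placeholders, smartctl assumes smartmontools is installed, and /proc/mdstat only applies to Linux software RAID:

    ls /mnt/data          # a random "Input/output error" here is usually my first clue
    df -h /mnt/data       # can still report free space even while writes are failing
    dmesg | tail -n 20    # the kernel normally logs ATA/SCSI errors for the sick disk
    smartctl -H /dev/sdb  # quick SMART health verdict (needs smartmontools)
    cat /proc/mdstat      # with Linux md RAID, shows whether a member has been marked failed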
The point is - the symptoms of a bad drive range from subtle errors to outright failures (where the drive isn't even detected by the BIOS). How can a RAID setup compensate for this whole range of potential behavior?