Verify RAID on crash-BSOD, what to do

MikeTampa

Nov 28, 2007
I have a 4-disk, 1.5TB RAID 5 array using motherboard RAID (Intel ESB2). It works OK, but it is dramatically faster if I use the write-back caching option.

Any time the box crashes or gets a BSOD, the RAID software needs to "verify" the array on the subsequent reboot, which takes about a day and slows everything down significantly. This has happened once even when I didn't have write-back caching enabled, but it is much more likely when caching is on.

I'm guessing main memory is used for the disk caching here, so data in the cache cannot be guaranteed to be written after a crash, and the writes may not even be guaranteed to reach the different disks in sync. That is in contrast to, say, Compaq/HP RAID adapters with their own cache RAM and battery.
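To make the "not in sync" worry concrete, here is a minimal sketch of a single RAID 5 stripe whose data and parity get out of step after a crash. It assumes plain XOR parity over three data blocks; the function names and block contents are made up for illustration, and it says nothing about how the Intel driver actually orders or caches its writes.

```python
from functools import reduce

def xor_parity(blocks):
    """XOR equal-length byte blocks together to get the stripe's parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_is_consistent(data_blocks, parity):
    """What a verify pass checks: does the stored parity match the stored data?"""
    return xor_parity(data_blocks) == parity

# One stripe on a hypothetical 4-disk RAID 5: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)
consistent_before = stripe_is_consistent(data, parity)

# Simulated crash: the new data block reaches its disk, but the matching
# parity update was still in volatile cache and never gets written.
data[0] = b"ZZZZ"
consistent_after_crash = stripe_is_consistent(data, parity)

# Repair in this sketch: recompute parity from the data found on disk.
parity = xor_parity(data)
consistent_after_verify = stripe_is_consistent(data, parity)

print(consistent_before, consistent_after_crash, consistent_after_verify)
```

Since the array cannot know which stripes were in flight when the box went down, it would have to run this kind of check across every stripe, which fits with the verify taking so long.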

2 questions:

1) Any suggestions on what I can do to improve the current state of affairs? I know I can disable the caching, but it's so much of an improvement that I always re-enable it eventually. I can't be the only one who's had to deal with this.

2) If switching to a different RAID adapter would solve this problem, do you know of some inexpensive ones? I see a lot of inexpensive adapters reviewed here, but I'm not sure they'll address this problem. In all of the reviews/tests I've read here and on other sites, I've never seen any discussion of what happens when, say, power is disconnected in the middle of a big test. How does the array deal with it? I've only used either motherboard RAID or cheap adapters at home for the past several years, and all of them pretty much stink at this. But at work we have Compaq/HP adapters that never skip a beat and do not suffer from this problem.

One suggestion: I think this would be a good test to add to the mix when performing reviews. Pull the power cord. And if you have a way to force a BSOD, that would also be good. Does the array come right back up, or does it spend the next day or two verifying (or, God forbid, rebuilding)?

PS: Interestingly, I have another array on this box, RAID 1 SAS using the onboard LSI MegaRAID, and it never suffers from this problem (well, only once), even though both arrays go through exactly the same crashes. Either LSI is doing something better, or maybe RAID 1 or the faster 15k SAS system is just fast enough to keep data synced before disaster strikes.
 

JefUK

Dec 11, 2004
I'm afraid I can't help you much with your questions, but I find this verifying action by the IMSM very annoying. Unfortunately this verification is not well reported or very well known; otherwise I might not have used an IMSM RAID setup. It did not happen with some of the earlier versions of IMSM/RAID. Besides system crashes, a power failure or a system reset will also make it start verifying the array automatically after the reboot. Although the system can be used, disc access is slowed to a small fraction of what it is normally.

I built a system with a 1TB RAID10 array, and the verification process takes about 3 1/2 hours. Fortunately the Vista 32 system is extremely stable, so it normally only happens when there is a power cut.
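For a rough sense of scale, the two verification times reported in this thread can be turned into average throughput. This is only back-of-envelope arithmetic under stated assumptions: the quoted sizes are treated as usable capacity in decimal terabytes, and "about a day" is taken as 24 hours. The helper name is made up for illustration.

```python
def verify_rate_mb_per_s(usable_tb, hours):
    """Average usable capacity covered per second during a verify, in decimal MB/s."""
    return usable_tb * 1e12 / (hours * 3600) / 1e6

# 1.5TB RAID 5, "about a day" (assumed to mean 24 hours)
raid5_rate = verify_rate_mb_per_s(1.5, 24)

# 1TB RAID10, about 3 1/2 hours
raid10_rate = verify_rate_mb_per_s(1.0, 3.5)

print(f"RAID 5:  {raid5_rate:.1f} MB/s")
print(f"RAID 10: {raid10_rate:.1f} MB/s")
```

Under those assumptions the RAID 5 verify averages roughly 17 MB/s and the RAID10 verify roughly 79 MB/s.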

One method of reducing the annoyance value is to stop the verification process, and restart the process later so that it runs overnight. It would be better if the option to postpone the verification was presented to the user at system reboot and a reminder given at closedown, with the messages only removed after the verification had been actioned.