Gigabyte P35-DQ6 RAID Errors when Formatting

basspig

Distinguished
Aug 7, 2007
73
0
18,640
Last week I assembled a system with the components listed below.
I have four WD hard drives in this machine. Two are standalone drives, and the other two form a RAID0 array on the board's Intel controller.
When I format the RAID0 array from within Windows XP SP2, an error writing data to one of the array's drives appears at about 27% completion. The Intel Matrix Storage Console reports an error on Port 0 only. Port 1 is fine.

After deleting and rebuilding the RAID volume a couple of times, the error consistently appears on Port 0.

Interestingly enough, I can write data to the array and read it back, and it behaves just fine. However, I don't trust it.

At first I suspected a bad drive, so I disabled RAID and ran WinDLG, which found no errors on either of the two RAID members (in non-RAID mode). Same controller chip, same cabling.

Not satisfied with that, I shut down the machine, pulled the two drives and swapped them, so that the suspect drive would be on the "good" port. I rebuilt the RAID and started formatting. (It's a 1TB volume, BTW.) At 27% completion the error appeared again, and Matrix once again showed it on Port 0. So it's not the drive. It seems to be the controller chip, but only in RAID mode: in non-RAID mode on this same controller, both drives passed WinDLG diagnostics.
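As a rough illustration of why the swap points away from the drives: on a two-drive RAID0 volume, consecutive stripes alternate between the members, so the 27% mark of the volume lands about 27% into *each* drive at the same time. A media defect at that spot would follow the drive to the other port. This is a hedged sketch, not anything from Intel's tools; it assumes a 128 KiB stripe size and ignores RAID metadata and partition offsets, and `locate` and the constants are hypothetical names.

```python
STRIPE_BYTES = 128 * 1024             # assumed stripe size (128 KiB); use the value shown in the RAID BIOS
MEMBERS = 2                           # two WD5000AAKS drives in RAID0
VOLUME_BYTES = MEMBERS * 500 * 10**9  # ~1 TB, ignoring RAID metadata and partition offset


def locate(logical_offset: int, stripe: int = STRIPE_BYTES, members: int = MEMBERS) -> tuple[int, int]:
    """Map a RAID0 logical byte offset to (member index, byte offset on that member)."""
    stripe_number, within = divmod(logical_offset, stripe)  # which stripe, and where inside it
    member = stripe_number % members                        # stripes alternate between members
    row = stripe_number // members                          # full stripe rows before this one
    return member, row * stripe + within


if __name__ == "__main__":
    offset = VOLUME_BYTES * 27 // 100  # roughly where the format fails
    member, member_offset = locate(offset)
    print(f"27% of volume = {offset / 1e9:.0f} GB logical")
    # member index is the position within the array, not necessarily the SATA port number
    print(f"-> member {member}, ~{member_offset / 1e9:.0f} GB into that drive")
```

Under those assumptions the 27% point (about 270 GB logical) maps to roughly 135 GB into a member drive, and the neighbouring stripe sits at the same depth on the other drive. If one drive had a bad region there, swapping the drives should have moved the error to the other port.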

I'm not sure what to make of the problem. RMA'ing the motherboard looms ahead, though I'd hate to trash a perfectly good Windows/applications environment by reinstalling after swapping motherboards, especially since there's no guarantee that a new board won't exhibit the same error if it's a compatibility problem with the drives.

Has anyone seen this error before? Do you know whether it's a critical error that would endanger data on that volume, or a false positive that can be ignored?

I'm going to try one more thing: swapping the SATA cables, just to rule out a bad cable. Beyond that, it looks like a lot of hassle and headaches changing out the motherboard...



[fixed]Gigabyte GA-P35-DQ6
XFX PVT80GTHF9 GeForce 8800GTS 640MB
Intel Core 2 Quad Q6600
Mushkin HP2-6400 DDR2 4GB Kit
Western Digital Caviar HD 500G|WD 7K 16M SATA2 WD5000AAKS
SILVERSTONE TEMJIN SST-TJ06S-W Silver Aluminum
Seasonic S12 Energy Plus SS-650HT Power Supply
Pioneer BDR-202A
HP LP3065 30" LCD Display
Turtle Beach Montego DDL 7.1 Dolby Digital
Pioneer DVR-112D
NEC Beige 1.44MB 3.5" Floppy Drive Model FD1231H
ZALMAN 9700 LED 110mm 2 Ball CPU Cooler
Blackmagic Design BINTSPRO Intensity Pro
Contour Shuttle-Pro [/fixed]


I've been moving cables around and now suspect that one of the cables is bad. I reached this conclusion when I physically swapped the cables instead of just swapping the drives. After swapping the cables, the next boot into the RAID BIOS showed the error on Port 1, while Port 0 now showed "Member drive". That suggests the cable is bad.

So I replaced BOTH cables with new ones from a package of spares I had bought. After reassembling the computer and booting, I went into the RAID BIOS and saw that Port 0 was once again showing an error and Port 1 was showing "Member drive". So either I have to clear the error by rebuilding the array and then formatting, or one of the new cables is bad, just like the original (not too likely). I'm going with the theory that the RAID volume retains the last error until it is deleted and rebuilt. But if the error status is static, why did it move to the other channel when I swapped cables earlier, yet not disappear when I replaced both cables with new ones?

Ah, no good. New cables, but the format failed at the same point: 27%. "Some data requests to a hard drive in RAID 0 volume failed, but a backup may be possible." is popping up from the taskbar again.

What are the odds of having two bad sets of cables in a row?

I have one more unopened pair of SATA II cables. If this doesn't fix it, I don't know what could be causing the error. Swapping the drives didn't move the error, but swapping the cables did, which suggests a bad cable. But replacing both cables, the suspect one and the one on the error-free channel, still produces the error, so I'm at a loss to imagine what the problem could be.
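The swap tests can be treated as a simple single-fault elimination: keep only the components that sat on the failing port in every trial. This is a hedged sketch; the drive and cable labels are hypothetical, and it assumes the drives stayed on the same ports when the cables were swapped.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    description: str
    error_port: int                 # port flagged by Matrix Storage Manager / RAID BIOS
    drive_on_port: dict[int, str]   # port -> drive label (hypothetical labels)
    cable_on_port: dict[int, str]   # port -> cable label (hypothetical labels)


def suspects(trials: list[Trial]) -> set[str]:
    """Components that sat on the failing port in every trial (assumes a single fault)."""
    candidates = None
    for t in trials:
        on_failing_port = {
            f"port {t.error_port}",
            t.drive_on_port[t.error_port],
            t.cable_on_port[t.error_port],
        }
        # Intersect: a single faulty part must be present every time the error shows up
        candidates = on_failing_port if candidates is None else candidates & on_failing_port
    return candidates or set()


TRIALS = [
    Trial("original build", 0, {0: "drive A", 1: "drive B"}, {0: "cable X", 1: "cable Y"}),
    Trial("drives swapped", 0, {0: "drive B", 1: "drive A"}, {0: "cable X", 1: "cable Y"}),
    Trial("cables swapped", 1, {0: "drive B", 1: "drive A"}, {0: "cable Y", 1: "cable X"}),
    Trial("both cables new", 0, {0: "drive B", 1: "drive A"}, {0: "cable N1", 1: "cable N2"}),
]

if __name__ == "__main__":
    print("all trials:", suspects(TRIALS) or "no single consistent suspect")
    print("without 'cables swapped':",
          suspects([t for t in TRIALS if t.description != "cables swapped"]))
```

With all four observations the intersection comes out empty: no single drive, cable, or port explains every result. If the "cables swapped" reading was a stale status retained by the RAID BIOS, dropping it leaves only `port 0` as the common factor.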

Could it be cable dress? Should I NOT have zip-tied the cables into neat bundles and routed them neatly from the motherboard to the drives? Is there a crosstalk problem between the cables that the format is bringing out?

UPDATE:

I tried setting the Intel RAID controller in the BIOS to Native mode instead of Legacy IDE mode, and noticed that the array benchmarks 5% faster and more consistently on read/write operations. So I decided to do a little torture test: I used Adobe Premiere Pro CS3 to write a 965GB uncompressed 1920x1080 HD video. The drive was 2/3 full with this huge file, and at no point during the write did the RAID controller generate any errors. The next test was to play the file back in Windows Media Player. I let it play for 90 minutes, keeping the drives really busy. Again, no hint of errors, and I noticed no corruption in the file.
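A torture test like the Premiere render can also be done with a small script that writes known data and reads it back. A minimal sketch, assuming Python 3.9+ (for `random.Random.randbytes`) and a hypothetical drive letter for the RAID volume: each chunk is generated from a fixed seed, so it can be regenerated at verify time and compared byte for byte.

```python
import os
import random

CHUNK = 8 * 1024 * 1024  # 8 MiB per write (arbitrary choice)


def pattern_chunk(index: int, size: int = CHUNK) -> bytes:
    """Deterministic pseudo-random data for chunk `index`; reproducible at verify time."""
    return random.Random(index).randbytes(size)


def write_test_file(path: str, chunks: int, size: int = CHUNK) -> None:
    with open(path, "wb") as f:
        for i in range(chunks):
            f.write(pattern_chunk(i, size))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to flush its write cache to the volume


def verify_test_file(path: str, chunks: int, size: int = CHUNK) -> list[int]:
    """Return the indices of chunks that read back differently from what was written."""
    mismatched = []
    with open(path, "rb") as f:
        for i in range(chunks):
            if f.read(size) != pattern_chunk(i, size):
                mismatched.append(i)
    return mismatched
```

To exercise the disks rather than the OS cache, the test file should be larger than installed RAM (more than 4GB here), e.g. `write_test_file(r"R:\raid_test.bin", 1024)` for 8 GiB followed by `verify_test_file(r"R:\raid_test.bin", 1024)`, where `R:` is a placeholder for the array's drive letter. An empty list means every chunk read back intact.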
It seemed the problem was solved. But to know for certain, I shut down, booted into the BIOS and deleted the RAID volume. Then I built a new RAID volume, booted into Windows XP SP2, and initialized and NTFS-formatted the volume. I thought this time it might be clear sailing, but, with extreme repeatability, right at 27% into the format Intel's Matrix Storage Manager popped up that RAID volume error from the taskbar. When I opened Matrix Storage Manager, once again port 0 had an X on it. It's always port 0 that gets marked bad.

As far as I can tell from testing the array all afternoon and well into the evening, this error has no effect on the array's performance or data integrity. I'm starting to believe it's an anomaly, a false hit, perhaps caused by a timing issue somewhere, though I have no idea where. All I know is that I was able to write a 965GB uncompressed video file to this volume and read it back with no errors. I marked the drive "normal" and will use the volume cautiously and back it up frequently.