Weird Intel RAID issue

Lantorax

Reputable
Apr 23, 2014
4
0
4,510
Here are the specs:
Processor: Intel i7 875k
Motherboard: MSI P55-GD80
Memory: Corsair Dominator 4x2GB (forget the exact model, 1600 MHz I believe, though)
Graphics: Diamond AMD HD5870 1GB
PSU: Antec TruePower Blue 750 watt
HDD: 4x WD Black 500GB (WDC WD5001AALS-0) drives in RAID 10 (0+1, so says the BIOS) mode
Sound: On-board Realtek
NIC: On-board Marvell
DVD: Cheap Lite-On thingy, connected via SATA

I've spent a couple of days trying to figure out which part went bad. At the bottom I'll list my entire debug procedure, but I'll get to the meat of the problem here. I rebooted this machine on Sunday and it took over an hour to boot. There were no warnings in the logs about anything failing beforehand, and no updates, software, or drivers had been installed recently. My testing seems to show that the RAID chip has gone slightly bad and that the Windows drivers are failing on that piece of hardware. I've run this machine from MiniXP, WinPE, a Win7 Recovery Disc, and an ESET Recovery Disc. All have the same problem regardless of the RAID driver I use: extremely slow access times and slow read speeds. I decided to try a Linux distro, Knoppix, and, lo and behold, the drives run just like they're supposed to. Have a look:

Windows MiniXP and HD Tune 2.55

On this one I'm not sure why it's reading both volumes as one, but that's neither here nor there: all Windows kernel boots ran at the same speed, and the drives were accessible from Windows Explorer as separate volumes, so the speed is the important bit in that pic.

And Knoppix palimpsest


Now I can only figure the RAID chip did die, at least a little, but somehow the Linux drivers are bypassing whatever function failed. I may have missed something, but I'm unsure what, and I'd love some feedback on figuring this one out.
The motherboard is literally 6 months out of warranty so RMA is out of the question. I don't know much about compiling my own drivers, so this kind of problem is quite beyond me. I'd love to save the board since I wasn't quite ready for an upgrade yet.

And the boring part, my testing:
First reboot: ran SpyBot, nothing wrong. The antivirus timed out and would not load because drive access was too slow.
Made an ESET antivirus rescue disk, which runs from WinPE, and set it to scan. Cancelled after 19 hours with only about 20% of the drive scanned, no errors.
Swapped SATA cables, no change. Swapped ports, no change. SMART scan, no problems.
Fiddled with BIOS settings a bunch, no difference.
Did a BIOS update from 1.0a to 1.0c, no change.
Made a Knoppix USB flash drive and found that a 51.5 GB transfer took near the normal amount of time. Installed Bitdefender, scanned the full drive, nothing suspicious.
Made a MiniXP flash drive and retested the drives, still snail slow on a Windows kernel. Ran HD Tune 2.55 and recorded the results, then loaded Knoppix back up for palimpsest and recorded those.
Additional info:
I also tried with and without the DVD drive connected while I was swapping ports.
Drivers I’ve tried: MSI: 8.9.0.1023; 12.0.0.49974
Intel: 12.9.0.1001; 8.9.0.1023; 12.0.0.49974; 10.1.0.1008
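
For anyone who wants to repeat the Windows-versus-Linux comparison above with the same tool on both sides, here is a minimal sketch of a sequential read timer in Python. It is not what HD Tune or palimpsest do internally; the function name `measure_read_throughput` and its defaults are made up for illustration, and the OS page cache can inflate the numbers on a second run.

```python
import time


def measure_read_throughput(path, block_size=1024 * 1024, max_bytes=256 * 1024 * 1024):
    """Read up to max_bytes sequentially from path.

    Returns (bytes_read, seconds, megabytes_per_second).
    path can be a large file or, with enough privileges, a raw device.
    """
    bytes_read = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS cache may still be involved.
    with open(path, "rb", buffering=0) as f:
        while bytes_read < max_bytes:
            chunk = f.read(min(block_size, max_bytes - bytes_read))
            if not chunk:  # reached the end of the file or device
                break
            bytes_read += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_s = (bytes_read / (1024 * 1024)) / elapsed if elapsed > 0 else float("inf")
    return bytes_read, elapsed, mb_per_s
```

Calling it on the same large file from a Windows boot and from Knoppix would give two directly comparable figures, which is roughly the comparison the two screenshots make.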

Thanks for reading
 
Lantorax

Thanks for the reply, but I don't believe a reinstall will work. If the installed copy of Windows were at fault, I should have seen some kind of improvement from one of the alternate Windows boots, and I did not.
 

jbseven

Distinguished
Dec 2, 2011
646
0
19,160
A reinstall is the best diagnostic step at this point, seeing that you've tried quite a lot to diagnose the problem. Linux running perfectly fine tells me that there is nothing wrong with the drive itself or the SATA controller and that this is likely a software issue.

The 10-15 minutes this would take is well worth the trouble. Just make sure you delete all partitions on the drive during install.

It is also less tedious than the next step, which would be to recreate the RAID array.

EDIT: since you are still able to boot Windows, run a simple "chkdsk c: /f" for each drive letter that is part of the array, and if that does not help, "chkdsk c: /f /r", which will take the better part of a day to complete.
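
If there are several drive letters on the array, the chkdsk pass above could be scripted. This is only a sketch: the helper names `build_chkdsk_commands` and `run_chkdsk` are invented here, the drive letters in the usage note are placeholders, and it defaults to a dry run that just prints the commands. Actually running chkdsk needs an elevated prompt on Windows.

```python
import string
import subprocess


def build_chkdsk_commands(drive_letters, surface_scan=False):
    """Return one chkdsk argument list per drive letter.

    surface_scan=False gives "chkdsk X: /f"; True gives "chkdsk X: /f /r".
    """
    commands = []
    for letter in drive_letters:
        letter = letter.strip().rstrip(":").upper()
        if len(letter) != 1 or letter not in string.ascii_uppercase:
            raise ValueError(f"not a drive letter: {letter!r}")
        cmd = ["chkdsk", f"{letter}:", "/f"]
        if surface_scan:
            cmd.append("/r")  # bad-sector scan, much slower
        commands.append(cmd)
    return commands


def run_chkdsk(drive_letters, surface_scan=False, dry_run=True):
    """Print the commands (dry_run=True) or run them one after another."""
    for cmd in build_chkdsk_commands(drive_letters, surface_scan):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=False)
```

For example, `run_chkdsk(["C", "D"])` prints `chkdsk C: /f` and `chkdsk D: /f` without touching the disks; pass `dry_run=False` only once the list looks right.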
 
Solution

Lantorax


I had run a chkdsk. Sorry, I forgot to mention that up top. I ran it before the antivirus, though not with the /r switch; that would probably take over 10 days, and WinPE (where I ran it from) only has a 72-hour uptime. It didn't find any errors, not even the usual ones. I have also run various Linux hard drive tests on it, but I can't be sure how well they do their job. I have not reformatted. I'm curious to know why you think that might help? If Linux is able to access the MFT and partition table, I'd think they'd be OK.

Thanks again
 

Lantorax

All right, I went further into the SMART testing, and it turns out it was a hard drive. The first SMART test passed, but I went back and ran it again, and Data Lifeguard froze 5 times in a row trying to test one of the drives. I unplugged that drive, and now Windows boots and everything works great. I wonder if Windows and Linux might be reading from different drives in each mirrored pair? Interesting, but at least it's solved. Thanks again for all the help.
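
To make that guess about mirrored reads concrete, here is a toy model in Python. Everything in it is an assumption: the latencies are made-up numbers, the three read policies are hypothetical, and nothing here says which policy the Intel Windows driver or the Linux driver actually uses. It only shows that, with one sick drive in a mirrored pair, the choice of which member serves reads could plausibly explain a large speed difference.

```python
def total_read_time_ms(pairs, policy, chunks=1000):
    """Sum per-chunk latency for a striped set of mirrored pairs.

    pairs:  list of (member0_ms, member1_ms) latencies, one tuple per mirror.
    policy: function(pair_index, nth_request_to_that_pair) -> 0 or 1,
            choosing which mirror member serves the read.
    """
    total = 0.0
    for chunk in range(chunks):
        pair = chunk % len(pairs)   # striping: chunks rotate across the pairs
        nth = chunk // len(pairs)   # how many reads this pair has already served
        member = policy(pair, nth)
        total += pairs[pair][member]
    return total


def always_first(pair, nth):
    return 0          # every read goes to the first member of each mirror


def always_second(pair, nth):
    return 1          # every read goes to the second member


def alternate(pair, nth):
    return nth % 2    # reads alternate between the two members


# Made-up numbers: healthy drives answer in 10 ms, the failing one in 500 ms.
example_pairs = [(10, 10), (10, 500)]
```

With these invented numbers, 1000 chunks take 10,000 ms under `always_first`, 255,000 ms under `always_second`, and 132,500 ms under `alternate`, so a driver that happened to avoid the bad member would look perfectly healthy.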
 

jbseven

My experience with Windows tells me that it can sometimes be far less tolerant than Linux when there are problems with the partition table or file indexes. In cases where the conventional chkdsk /f /r fails, deleting the partition and formatting the drive(s) is the only solution.

Glad you were able to find the bad drive!