Assembled a storage server, had it running for 4 months, then all hell broke loose...
PSU: 500 watt Antec Earthpower which I had used prior to upgrading my desktop's PSU and never had an issue with
Mobo: Gen. Intel something or other (don't have exact part number handy, appears to have 82801 chipsets and onboard 945 graphics...more than likely not of concern for my purposes here)
Proc: E5200 2.5 GHz DC
2GB of Ram
Hdd 0: WD 40 GB IDE
Hdd 1: WD Caviar Black 1TB OEM
Hdd 2: WD Caviar Black 1TB OEM
Hdd 3: WD Caviar Black 1TB OEM
Hdd 4: Seagate ST310003 33AS 1TB Recert.
Hdd 5: Seagate ST310003 33AS 1TB Recert.
Hdd 6: Seagate ST310003 40AS 1TB Recert.
Optical: LG GDR-8164b
PCI 1: Syba SD-SATA-4P 4 Port PCI-Sata controller
PCI 2: Syba SD-SATA-4P 4 Port PCI-Sata controller
External Drive 1: Seagate Freeagent 1.5TB
External Drive 2: Seagate Freeagent 1.5TB
External Drive 3: Cavalry 1TB USB external (WD 1TB "Green" inside, code "EACS")
OS: Server 03 RC 2 32-bit
Drive configuration: Hdd's 1-6 as dynamic volume, raid 5 (Opted for software RAID as it's easier to recover in the event of controller failure, performance is not a concern)
Externals used to backup array (and thank god they were)
And this thing ran fine for about 4 months. Then all hell broke loose. One day, I noticed that I was getting some really crappy performance out of the array, checked it out, and hdd #3 was reporting errors. Removed the drive, tested it, determined it had failed, but due to WD somehow not having RMA replacements in stock, I am still awaiting my replacement. No problem, I have my data backed up anyway. Ran the server 1 drive short. Less than 24 hours later, another WD exhibited similar symptoms, and the system began locking up for minutes at a time. OK, bad luck, or so I thought...
Purchased 2 new Seagate 1TB drives to replace the 2 failed WD's, put them in, went to recreate the array, and no sooner did the array start to format than I was greeted with the oh so wonderful "FT Orphaning" message from Windows telling me it lost contact with a drive. Checked diskmgmt, and it now comes up as hdd #4 with errors.
Cracked the server open, checked cables and such, replaced a SATA cable that I didn't like the looks of, found nothing wrong. Booted back up, looked at the event log, and saw errors indicating one of my controllers "did not respond within the timeout period". Now, in looking at the physical configuration, I noticed all the drives reporting errors were attached to one of the Syba cards, which I have since removed, using the other card and the 4 onboard SATA ports for my drives (that being why I bought 2 of them).
For humor's sake, I took hdd #2, the WD which seemed to be failing, hooked it up to the onboard SATA, and it's currently running like a champ. Formatted it as a single drive no problem, copied stuff to and from it properly (stuff being a pile of ISO backups totaling 239.4GB), proper byte count and such. But now, one of the other Seagates is throwing errors, even after I tried the following:
- Changed Controller port
- Changed SATA cable
Same issues. I also deleted the volume and formatted all the other drives individually, and all format fine except the one. This has been driving me nuts, and of course, I am unable to register on the Syba forums currently as they tell me 3 e-mail addresses, one of which was my work e-mail, were banned. I also have a "pleasant" e-mail in to their webadmin about that.
I haven't pulled the Seagate drive out to test it on another box yet mainly because I am not so sure it's the problem. I am having a hard time believing that 2 OEM WD's and a recert seagate would all fail within 72 hours of each other.
And yes, I know that I don't have "enterprise class" drives and they aren't designed for RAID and such, but the server sits idle for better than 70% of its life. So much so that I am considering shutting it down during the day to conserve power, as it's only used when I am home.
So here I sit in week 2 of this nightmare, with fewer brain cells, less hair, and less liver function than a week ago, and no definitive answers as to what the hell is going on. My questions stand at the following:
- Does anyone have a phone # for Syba tech support? They want you to print and fax or e-mail their support, and I just don't have time for that crap. I'd rather spend an hour on hold because at least then I can get some work done and I will get an answer sooner than later. Plus a live head to bite off is sounding good right about now.
- I am sure this started when the one WD drive blew, but the cascade of crap since has me just stunned. I have never had this much trouble with any other machine I've ever built. Is there something I am overlooking here?
- I am not inclined to think this, but could it be a resource issue in Server 03? Given the errors also show up when working with the drives individually, I am not sure it is.
Here are the changes in my system config from the original
HDD's 2 & 3 are now ST31000528AS drives
previous hdd 2 was re-added as hdd #7
a new Cavalry external 1TB, which I believe has the same green drive in it, was added as my 4th external
I'm honestly scratching my head over this. My only suggestion is that you try running the drives on another computer, independent of each other, and see if the failures continue. I'm thinking there could be a problem with the PSU on your server, causing fluctuations leading to failure.
It turned out to be my PSU after all. I had a 550 that was just running out of juice. EcoGreen or not, with an OC'd proc, depending on your video card, you're out of juice. Those drives, being power efficient at idle or not, are still using a bit of juice under load, and in a raid configuration, the odds are those drives aren't idle much if at all. That was why I strayed away from the "green" drives in my server.
Here's the quick math I did off the top of my head to figure that out:
Q6600 = 95w core stock, I don't know how far you OC'd, but you're using more than that, so for argument's sake, we'll say 105w
500 - 105 = 395w left
Mobo is going to use some power, say 20w or so depending on usb device count
395 - 20w = 375w left
This board does not have on-board video, so depending on your GFX card, you're using at least 300w (going off my old 8800 GT OC2 here); anything newer and you're using more
375 - 300w+ for video = 75w
and that 75w has to power your hdd's (4 drives at, say, 15w each under load = 60w), leaving you with maybe 15w for your case fans, which I am guessing you have multiple of and which will use anywhere from 3-5w apiece depending on size (my 120mm Thermaltake stock fans in my desktop here use 7.2w apiece). And since you overclocked, I am guessing you aren't using a stock CPU fan, which also uses more power. Oh, now throw the DVD burner on top.
In short, I think the reason you're seeing this issue is not the Syba controller, but that your PSU is too small. Here's what you can try, providing you have the technical know-how to do this without zapping anything:
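For anyone who wants to redo that budget with their own numbers, here is a minimal Python sketch of the same back-of-envelope math. It only adds up the rough estimates quoted above; none of them are measured. It assumes a 500 W supply, since that is the rating the running totals (395w left after the CPU) work out to, and the names `PSU_RATING_W`, `loads_w` and `headroom` are made up for illustration.

```python
# Rough PSU budget using the estimates quoted in the post (all guesses, not measurements).
PSU_RATING_W = 500  # assumption: the rating implied by "395w left" after the CPU

loads_w = {
    "cpu_overclocked": 105,   # Q6600, 95w stock, padded for the overclock
    "motherboard": 20,        # depends on USB device count
    "video_card": 300,        # the post's estimate for the GFX card
    "hard_drives": 4 * 15,    # 4 drives at roughly 15w each under load
}


def headroom(rating_w, loads):
    """Return the watts left after subtracting every listed load from the rating."""
    return rating_w - sum(loads.values())


total = sum(loads_w.values())
remaining = headroom(PSU_RATING_W, loads_w)
print(f"Total estimated load: {total} W")
print(f"Left for fans, CPU cooler and optical drive: {remaining} W")
```

Swap in your own rating and per-part estimates; if the leftover figure is in the teens before fans and the optical drive are counted, the supply is probably the weak link.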
Hook up the 4 hdd's to the 430w psu by themselves, leave them connected to the syba board
Jump the 430w psu with a paper clip
fire up the machine
see what happens. If they work fine, no errors, that's your issue.
If not, try:
- not using raid 10, leave them as individual drives, format them, chkdsk, etc.
- hook the drives up to your mobo and do software RAID through Windows (I don't know if 7 can do RAID 10, but you can do a 4 drive stripe). Then copy / download some BS file to it, like the XP SP3 offline installer, and copy it like 20 times. This is easiest done via batch file: use copy "<array drive letter>:\<path to file>\<filename>" "<array drive letter>:\<destination path>\<filename><number>", incrementing <number> for each line. Then check Disk Management and the system event log to see if you get errors.
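If you'd rather not hand-write 20 copy lines, the same copy test can be sketched in Python. This is not the batch file described above, just an equivalent under a few assumptions: it copies one file a number of times, numbers the copies the same way, and compares a SHA-256 hash of each copy against the original. The function names `sha256_of` and `stress_copy` are made up for this example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large installers or ISOs don't need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stress_copy(source, dest_dir, copies=20):
    """Copy source into dest_dir `copies` times; return the names of copies that differ."""
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    expected = sha256_of(source)
    bad = []
    for number in range(1, copies + 1):
        # Mirrors the <filename><number> naming from the batch-file suggestion.
        target = dest_dir / f"{source.stem}{number}{source.suffix}"
        shutil.copyfile(source, target)
        if sha256_of(target) != expected:
            bad.append(target.name)
    return bad
```

Call it with the path to your test file and a folder on the array, for example stress_copy(r"E:\test\installer.exe", r"E:\test\copies"), where both paths are placeholders. An empty list back means every copy matched. The read-back may be served from the OS cache rather than the disk, so still check Disk Management and the system event log afterwards.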
If you happen to be interested, here are clickies for the 2 new psu's I bought, one for the server, one for my new desktop courtesy of the egg:
http://www.newegg.com/Product/Product.aspx?Item=N82E168... - this is what I put in my server. Beware though: in hindsight, after 1 DOA (modular rails had no juice) and reading some horrid reviews posted after I bought it, I wouldn't buy one, since it's A) only partially modular, and B) well, DOA on round one. In its defense, the second one works like a champ, and I now have more than enough juice.