File server: HDD click on startup, slow boot, not consistent

SirXena

Commendable
Apr 20, 2016
6
0
1,510
Ok this is the puzzle: different HDD clicks each time.

Full history: FreeNAS system built with random used parts. It worked great for a month or so, then a HDD would randomly show as "removed" after a period of time. A reboot would fix it. I swapped out and tested the supposedly removed drive 3 times, so I know it isn't a HDD issue. I then thought it was a HDD designation issue within the software, since it was always the drive assigned ada1. I dummied up ada1 with a drive I didn't intend to use, and all was well for about a day. Then the server went offline again citing a removed drive, and I rebooted. This time the drive came back, but a different drive was missing from the start.
Trying to track down the issue, I shut down the PC again, unplugged and replugged each data and power cable, and powered up the machine, but a drive started clicking right away. Previously, when this all started and I replaced 3 drives, I heard clicking off and on during post-error bootups. Now I have disconnected all the drives at different times and still get the click on startup, plus a screen saying there was an error loading one of the drives (a different one each time) into the volumes to boot FreeNAS. I then thought maybe my PSU was overburdened with the 9 HDDs installed, so I tried unplugging a couple at a time, with no success. There was less clicking, but it still couldn't finish loading disks into FreeNAS. I unplugged 5 disks this time and got no clicking, so I thought it was the PSU, but I still can't complete the boot, and the s/n of the drive having an issue as well as the port # are different every time.
I've checked all the fans in case, on heatsink, and in PSU, and they are not the click. Have used different cables on drives and different mobo data ports to make sure that's not the issue. I'm at a loss now. Last night I was trying to figure why sometimes it would take nearly an hour to load the server, but I hadn't been monitoring the display at startup to see the issues going on there, and it always booted up whether it took 10 or 50 minutes.
Hardware is my thing, software not so much, but I just cannot figure out where else to look for the clicking and the various hard drive issues encountered within FreeNAS or its startup.
 

SirXena

Raidmax 530w, not sure model but it's modular and wasn't very old when I retired the machine it was in prior.

I think the FreeNAS part of the HDD issue could be a WD firmware issue of the era... the drives are a variety, 5-12 years old, the smaller ones being in the 10-year region, and those are what's having the issues in the server. Not all the WD drives are having issues, but most are. Then this morning it was a non-WD drive, then a WD drive.

Far as PSU goes, if that's the issue, then the fewer drives should help, but doesn't explain why I only get HDD clicking after a reboot, not after a drive "removed" before a reboot. If it's the PSU I think it's an overload of capacity issue, but the drives aren't running all the time and though most failures were after a lengthy transfer, some were within minutes of boot, before a transfer could even begin.

Have had no performance issues w/ the other hardware. The Phenom II X6 CPU and 10GB of 1066 RAM have been running great, temps have stayed around 32C, fans haven't kicked into high gear, and there's no heat buildup in the case when I open it, even after running a couple of days. There's no vid card in it to suck up power, and half the drives are only 2.5", so it's not even a full 9-HDD draw.

Every time I think I found the answer and fixed it, machine runs great to make me think it's all good, then 1-3 days later it starts all over again, with each cycle speeding up the fail rate.

Removing the first 2 offending drives did this the first time, after it had gotten to failing within 30 minutes. Then the machine ran great for 3 days, with almost nonstop transfers to rebuild the lost data. Then the cycle began again, speeding up along the way for a couple of days until I decided it was the ada1 assignment, put the dummy drive in to block that slot from use by drives I want, and started back up. Again it ran great, for about 36 hrs, making me think I'd fixed it, when the cycle began all over again. Now it has finally loaded, after about 45 minutes of failing to load something over and over, with half my drives disconnected. There are still older WD drives in it, and the ada1 slot is again in use. I won't know whether it's going to fail again until it does. If it doesn't, in about a week I guess I'll know I'd overloaded the PSU, but until it's been at least 4 or 5 days I'll just be expecting it to go back down.

The false starts after repairs each time is what gets me. If it wasn't fixed, why start the cycle over and fake being fixed, with different types of fixes altogether? I know there's more early on I tried too and thought was fixed but can't remember now as this process has gone on for a month now.
 

Ralston18

Titan
Moderator
Just strikes me as being PSU related. 530 watts trying to power older drives seems to be pushing the limits some. (9+ drives plus everything else in the FreeNAS box.)

Worked with FreeNAS some time back, but only with a couple of drives installed. Not sure if you are surpassing some limitation or specification with your setup as I understand it.

All sorts of "flakey" (serious technical terminology here) things can happen with a failing and/or faulty PSU.

Is it viable to try another PSU? Do you know how to use a multimeter to test your PSU?

Would really like to be sure that the PSU is not at all the culprit here.
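For anyone following along with a multimeter, here is a minimal sketch of checking readings against the nominal ATX rail voltages, assuming the usual +/-5% tolerance on the positive rails. The readings in it are made-up placeholders, not measurements from this machine, and `check_rails` is just an illustrative helper name.

```python
# Nominal ATX positive rail voltages; +/-5% is the commonly cited tolerance.
ATX_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def check_rails(measured):
    """Return {rail: (volts, in_spec)} for each measured rail."""
    report = {}
    for rail, volts in measured.items():
        nominal = ATX_RAILS[rail]
        low = nominal * (1 - TOLERANCE)
        high = nominal * (1 + TOLERANCE)
        report[rail] = (volts, low <= volts <= high)
    return report


# Hypothetical multimeter readings (assumption, for illustration only).
readings = {"+12V": 11.2, "+5V": 5.05, "+3.3V": 3.31}

for rail, (volts, ok) in check_rails(readings).items():
    status = "OK" if ok else "OUT OF SPEC"
    print(f"{rail}: {volts:.2f} V {status}")
```

Readings taken with the drives spinning up are likely more telling than idle readings, since a sagging +12V rail would mostly show under load.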
 

SirXena

Here's an update: more confusion.

Found and disconnected a clicking drive (not sure how, as I disconnected all of them at some point during the clicking). HOWEVER...

I then tried to load FreeNAS, and it kept cycling the same operation and reporting a failure on HDD this or that. It never completed boot.

So I tried openmediavault, which I had previously installed on another USB. This loaded fine, but returned errors every time I tried to access anything pertaining to the drives (and thus create any storage), other than simply viewing the HDDs.

So... I said screw it (or rather, don't do any more screwing or unscrewing if it's to do with the kernel) and went back to Windows. I tried installing Home Server, got to the 51 min remaining mark, went AFK, and came back to a BSOD. I think I tried a second time with the same result, maybe not. Then I tried XP64: BSOD before it even asked for the CD key, 3 tries. I ran SeaTools, though I'm not sure if any of the drives on the mobo are Seagate or just the ones on the RAID card (it didn't read the card, but none of the drives on the card ever showed an issue in FreeNAS like the onboard ones did). It failed a drive, so I unplugged it and plugged in another. I'm running 4 drives at a time for testing, with the 5th SATA port used for the CD drive. This time other drives that had previously passed failed, all WD, again. The Samsung and Hitachi drives all passed.

I thought maybe that's because it's for Seagate drives, though I've read elsewhere that SeaTools works perfectly on WD. So at this point I unplugged all but 1 drive, a non-WD one, and tried the XP64 install again. Same BSOD during the initial preparing of files. This is using a second XP64 ISO, not just a newly burned disc of the first one.

The PSU is plenty big for one 1TB HDD with this board and 10 gigs of RAM. The mobo came with 8GB and a 1TB drive, along with this very CPU and a much smaller PSU (380?). So unless the PSU is bad, that's not it. If it is the PSU, why is it always WD?

I'm currently running memtest and will run other tests through a live CD. The pieces I've tried so far have worked fine, except that one part of this live CD froze. I can't remember which, or even which live CD this is... it's 47MB and has a mini DOS install as well as memtest and some other stuff. No issues so far on memtest, 1 hr in.

Planning to download, burn, and run several other live CDs tonight, but I don't really know where to look right now. I'm going to run a couple of other HDD test/repair utilities to see if I get the same result. If I do, then I lose about 70% of my available HDDs for this server, still without a cause. If that many are failing, it certainly points to a bad PSU with surges frying the HDD boards, but why only the WD drives? Oh, and the SeaTools test flagged my 1TB WD Black drive as bad too, which is several years newer than all my 320GB Blues and the yellow, and is the original HDD for this mobo. So now it's not just the old drives but the middle-aged ones too.

I'm debating ordering a psu tester. I used to have one when I had my pc shop but it disappeared years ago. My problem with that is that if it isn't the issue, that's the end of my cash to deal with this problem. I do live with an electrician I could have manually test all the rails, but I'm hoping someone here has other ideas.

Memory seems fine, no errors yet. The HDDs return errors in weird instances and never specific ones, and always WD drives too. That's too many drives for it to be a drive issue itself; these drives were well within their usual lifespan when this mobo was manufactured.

I do have another mobo I could swap in, using the same CPU, as its CPU/fan are bigger energy hogs and run loud. Does anyone think this would help? It's a biggish undertaking, and I prefer replacing mobos last, but at least it's easy in this case since the tray comes out the back.

I've unplugged and replugged everything. I'm using the onboard SATA for the CD-ROM, so it's not that (and I've swapped that around to be sure), and memtest says clean. I figure it's either the PSU or the mobo, but I'm not sure which to try first.
 

Ralston18

Do two things:

First, add up the total wattage load being imposed on the existing PSU. Drives, video cards, CPU, motherboard, memory, fans - everything. Remember that the PSU's working wattage is not likely to truly be its full rated wattage, and that all of the load devices may actually be using more wattage than they claim.

Most power ratings or specifications, either way, were established using ideal conditions. Your NAS may simply be pushing the thresholds: too much load and/or not enough available power.

Second, consider that even if the PSU is apparently "pretty big", it may be failing in some manner.

Testing will certainly help. Especially if you can do so under load conditions. If anything, the PSU may be eliminated as the problem source.
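As a rough illustration of the "add it all up" step, here is a minimal sketch of a power budget that compares steady-state draw with the moment all the drives spin up at once. Every wattage figure, the inventory counts, and the 80% derating factor are illustrative assumptions rather than datasheet values, so substitute the numbers from the actual part labels.

```python
# All wattages below are illustrative assumptions, not datasheet values.
STEADY_WATTS = {
    "cpu": 125,
    "motherboard_and_ram": 50,
    "hdd_3.5in": 8,
    "hdd_2.5in": 3,
    "fan": 3,
    "pci_card": 10,
}

# Assumed per-drive draw while the platters are spinning up.
SPINUP_WATTS = {"hdd_3.5in": 25, "hdd_2.5in": 5}


def total_load(inventory, spinning_up=False):
    """Sum the wattage of every part; use spin-up figures for drives if asked."""
    total = 0
    for part, count in inventory.items():
        watts = STEADY_WATTS[part]
        if spinning_up:
            watts = SPINUP_WATTS.get(part, watts)
        total += watts * count
    return total


def headroom(psu_rated_watts, load_watts, derate_percent=80):
    """Watts left over after derating the PSU's label rating."""
    return psu_rated_watts * derate_percent / 100 - load_watts


# Hypothetical inventory; adjust the counts to match the real box.
inventory = {
    "cpu": 1,
    "motherboard_and_ram": 1,
    "hdd_3.5in": 5,
    "hdd_2.5in": 4,
    "fan": 4,
    "pci_card": 1,
}

steady = total_load(inventory)
startup = total_load(inventory, spinning_up=True)
print(f"steady: {steady} W, spin-up: {startup} W")
print(f"headroom at spin-up on a 530 W unit: {headroom(530, startup):.0f} W")
```

The total is only half the picture: drive motors pull from the +12V rail, so the +12V amperage printed on the PSU label matters as much as the headline wattage.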

 

SirXena

Every PSU calculator I use recommends a 285 to 310W PSU, which isn't surprising at all. No vid card, and ordinarily no monitor, keyboard, or mouse (but I left them in the calculator). Five 2.5" drives and five 3.5" 7200rpm drives. One of the 2.5s is powered by USB, but I counted it as both a USB device and a 2.5" HDD in the calcs. One wimpy PCI RAID card, not PCIe, just PCI. Running 24x7, no gaming etc., no video needs whatsoever other than this testing. Not a high-end gaming board but OK: it's from an $800 Dell XPS from about 2012. 2 or 3 fans plus the CPU fan. The case fans are all controlled on the case (not sure if there really is a 3rd fan in the front of the case, but I assume it's there by the looks of it; it's inaccessible). No liquid cooling, 4 sticks of DDR3.

Anyway, I haven't tried testing the PSU yet, but I have been running live CD tests and all the drives are testing fine except one. It wasn't a problem previously, and it only shows as bad in Partition Magic and at the very first sector in a physical test elsewhere, so I can bypass that when I finish zeroing the drive. Unfortunately it's one of my 2 big drives, so I've got to try to rescue it if it's just a bad sector at the beginning.

Still checking everything I can eliminate before messing with the PSU, because that's not likely something I can replace in the near future, even with the low wattage requirement. The spare I have (original to this mobo/CPU) doesn't have nearly enough rails for the 9 drives in the case, with or without adapters, and I'd hate to use messy unsheathed and non-modular ones in this project with all the data cables to route.

Haven't had anything pointing to the PSU during all this testing, so idk. No revving of its fan, no heat buildup, no performance drop from multiple device usage... except that I'm running 32-bit XP off the live CD to facilitate the tests, and it's "installed" in the same 3.5 gigs of RAM it's limiting itself to using.

I'm hoping wiping all the drives will somehow fix things. I know it's possible, but if it doesn't, I may end up swapping mobos before seriously looking at the PSU, because I can't fix that and I'd rather exhaust the fixable options first. Maybe by then, if it's not fixed, I'll be able to buy a new PSU instead of new/bigger HDDs.
 

Ralston18

Willing to set aside the PSU for the time being but 9 or 10 drives at once just seems to be problematic to me. Subjective at this point but may have some more objective thoughts later on.....

Good idea to keep checking before wiping everything.

Try running Task Manager and Resource Monitor and watch the drives.

May spot something with respect to one drive or another. Or even some other issue may show up.

Be sure to be methodical and consistent in your testing. Change only one thing at a time and keep some notes.

Hopefully a pattern will appear and something fixable will be noted.
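One way to keep those notes so a pattern can actually show up is a simple log with one entry per test run, tallied by whichever factor you suspect. This is only a sketch: the entries below are invented placeholders, and the field names (`drive`, `port`, `cable`, `result`) are my own assumptions about what is worth recording.

```python
from collections import Counter

# Hypothetical test log: one entry per boot or test attempt (made-up data).
test_log = [
    {"drive": "WD 320GB #1", "port": "SATA0", "cable": "A", "result": "fail"},
    {"drive": "WD 320GB #1", "port": "SATA2", "cable": "B", "result": "fail"},
    {"drive": "Samsung #1", "port": "SATA0", "cable": "A", "result": "pass"},
    {"drive": "WD 1TB Black", "port": "SATA1", "cable": "C", "result": "fail"},
    {"drive": "Hitachi #1", "port": "SATA1", "cable": "C", "result": "pass"},
]


def failure_counts(log, field):
    """Count failed runs grouped by the given field (drive, port, or cable)."""
    return Counter(entry[field] for entry in log if entry["result"] == "fail")


for field in ("drive", "port", "cable"):
    print(field, dict(failure_counts(test_log, field)))
```

If failures pile up on one port or cable no matter which drive is attached, that points away from the drives. If they follow the drive everywhere, that points back at it.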

 
Solution