external USB enclosure - multiple HDD - controller errors

virgule

Honorable
May 4, 2013
3
0
10,510
I've been struggling with a home-made system for the past few weeks (it had been running fine for years before that). The symptom is that the system hangs in the middle of the night (i.e., after a very long period without user interaction).

Hanging means a frozen, unresponsive system. No BSoD, no reboot, no crashed application, and no obvious errors in the event log timestamped before the freeze. The keyboard's Num Lock does not respond; nothing works. The system is not asleep: the fans are on, the LED on the Wi-Fi USB adapter is on (but not blinking), and the system is powered. All my drivers are up to date, Win7/64 is up to date, and Avast/S&D/CC/etc. do not find anything unusual.

After spending some time running diagnostic tests, I discovered a failing HD in my external, powered USB enclosure (4 x 1TB disks). Haha. I ran the manufacturer's diags, confirmed its bad health and took it out.

My freezing symptoms continued, and the event viewer still shows occasional HD controller errors. The error message takes the typical cryptic form "the driver detected a controller error on \device\harddisk2\DR14", which doesn't tell you precisely which disk is affected: with USB, the disk number is assigned dynamically when the disk is connected, and DR14 has no clear meaning (I do not have 13, 14 or 15 partitions or disks). Forums are full of people complaining that they can't make sense of this message. All I know is that Win7 is not happy with something on some disk(s).
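To at least pin down which physical disk such a message refers to, the number after "harddisk" is the useful part. As far as I know (this mapping is my understanding, not something confirmed in this thread), it matches the disk number in Disk Management and the Index column of `wmic diskdrive get Index,Model,SerialNumber`, while the DR number is an internal device-object counter that keeps climbing as USB disks are re-enumerated. Below is a minimal Python sketch (`parse_disk_path` is a hypothetical helper name) that pulls both numbers out of the event text, so the Harddisk index can be matched against a wmic snapshot taken while the same disks are still connected.

```python
import re
from typing import Optional, Tuple

# Matches device paths such as \Device\Harddisk2\DR14. Case-insensitive,
# because event log messages are not consistent about capitalisation.
_DISK_PATH = re.compile(r"\\device\\harddisk(\d+)\\dr(\d+)", re.IGNORECASE)


def parse_disk_path(message: str) -> Optional[Tuple[int, int]]:
    """Return (harddisk_index, dr_number) found in an event message, or None."""
    match = _DISK_PATH.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    msg = r"The driver detected a controller error on \device\harddisk2\DR14."
    print(parse_disk_path(msg))  # (2, 14)
```

Since USB disk numbers can change on every reconnect, the wmic listing is only meaningful if the disks have not been unplugged since the error was logged.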

I suspected something might be wrong with the external enclosure (already 3 years old), so I went out and bought a new one (Probox, externally powered, USB3). Symptoms continued :-(

Next I suspected something was wrong with the motherboard (already 4 years old), and went out and bought a new one. Symptoms continued :-(((

I can run burn-in tests on the system all day without seeing any malfunction or any unusual system event. Then the system just freezes, roughly once a day. I've run SeaTools and DLGdiag, and get conflicting reports. SMART always passes, while self-tests occasionally fail in one tool but not the other, and vice versa, and the error codes are meaningless (e.g. "aborted by host", with no description of it on the manufacturer's website).

I'm beginning to suspect the controller in the enclosure, because it's the only component that is transparent to the OS. The disks' own controllers are visible to the HD diags, and the motherboard controllers must be visible to the OS or diag software, so I would expect event log entries or diag errors if the problem were on the motherboard. The controller that combines the 4 SATA interfaces into a single USB link, however, sits inside the enclosure, and I can't find any way to identify, test or diagnose it. It seems crazy that both the old and the new one would fail, but it's the only option left that I can think of.

I would be very interested in:
- feedback from anyone else who has experienced such symptoms on Win7/64 (I've never had a system freeze before, only BSoDs and event log errors, which always eventually led to the root cause)
- ideas on testing the enclosure's controller.

Thanks
Gus

 
Solution
I would start by stripping your PC down to bare bones. Take out extra Wi-Fi cards and unplug all USB devices other than the keyboard and mouse. If the system still locks up, it could be a power issue or a bad main hard drive controller. When people with external hard drives think the case's controller is bad, or the drive in the device is out of warranty, I pull the drive and hook it up to a motherboard SATA port.

virgule

Smart man! Forgot about that one... Indeed, I can remove all the extra stuff, INCLUDING mounting all the HDs directly on the mobo SATA controllers to take the enclosure controller out of the equation. That will definitely help pinpoint whatever the problem is. Will try and report back asap. Thanks!

 

virgule

This is an update to close the thread. I haven't found anything wrong with my external USB enclosure, nor with its embedded SATA to USB controller.
What I did do is follow the tried-and-tested advice to strip the PC down to bare bones and put the components back one by one until I could spot the problem. I still haven't found anything conclusive, but the PC does work fine now, so here's what I ended up doing, in case someone comes across this thread:
- I had (2 x 2GB) + (2 x 4GB) = 4 sticks of RAM from 2 different makers (old PC + new PC). While they were compatible in speed and specs (according to the mobo manuals), I read somewhere on the net that some overclockers find the default 1.5V for RAM is sometimes insufficient for gaming. I'm not a gamer and I don't overclock, but I do have an 8GB ramdisk where I put TMP and the browser cache. I set the voltage to 1.6V and ran memtest86+ overnight without any problems. The system has seemed very happy since.
- I had an uncorrected/unreadable sector on one disk. It did not cause any I/O errors, but a full chkdsk would not remap the faulty sector either. I backed up the disk, ran a Seagate low-level read/write pass over the entire disk, and voilà, it was back to 100% health.
- I changed the USB3 cable just in case
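The strip-down-and-reintroduce procedure above can be written out as a simple loop. This is a hypothetical Python sketch: `find_first_culprit` and the `system_is_stable` callback are illustrative names, and in practice "is it stable?" means leaving the machine running overnight, so each call is slow and a single pass can miss a fault that only shows up once a day.

```python
from typing import Callable, List, Optional, Sequence


def find_first_culprit(
    components: Sequence[str],
    system_is_stable: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add components back one at a time onto a bare-bones baseline.

    Returns the first component whose addition makes the system unstable,
    or None if everything can be reinstalled without a failure.
    """
    installed: List[str] = []  # start from the bare-bones configuration
    if not system_is_stable(installed):
        # Even the baseline fails: suspect PSU, motherboard, RAM or boot disk.
        raise RuntimeError("bare-bones baseline is already unstable")
    for component in components:
        installed.append(component)
        if not system_is_stable(installed):
            return component
    return None


def overnight_test(installed: List[str]) -> bool:
    # Hypothetical stand-in for "leave it running overnight and see if it freezes".
    return "second RAM pair" not in installed


if __name__ == "__main__":
    parts = ["Wi-Fi USB adapter", "second RAM pair", "USB3 enclosure"]
    print(find_first_culprit(parts, overnight_test))  # second RAM pair
```

Adding parts back one at a time matches what is described in this thread; a bisection (reinstalling half of the remaining parts at once) would need fewer overnight runs, but is harder to interpret when two faults interact.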

Everything seemed perfect at that point, until the PC died in front of me without any warning. The only thing running was uTorrent, so I decided to look into possible causes there. I realized I didn't have any IP filters set up, and started suspecting some anti-P2P activity. I installed "David Moore's ipfilter updater"... and haven't had a blue screen since! I do have a shitload of Chinese IPs being blocked every second.
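For anyone curious what the IP filter actually does: uTorrent can read a list of blocked address ranges from an ipfilter.dat file, and updater tools of this kind typically regenerate that file from published blocklists. The sketch below assumes the common eMule-style line format `start - end , level , description`; it ignores the level field (an assumption: every listed range is treated as blocked), merges overlapping ranges and answers "is this IPv4 address blocked?" with a binary search. The function names are hypothetical, and the sample entries use the 203.0.113.0/24 documentation range, not real blocklist data.

```python
import bisect
import ipaddress
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (start, end), as integer IPv4 addresses


def _to_int(text: str) -> int:
    # Blocklists often zero-pad octets ("203.000.113.000"); strip the padding,
    # since recent Python versions reject leading zeros in IPv4 strings.
    octets = [str(int(part)) for part in text.strip().split(".")]
    return int(ipaddress.IPv4Address(".".join(octets)))


def load_ranges(lines: Iterable[str]) -> List[Range]:
    """Parse 'start - end , level , description' lines into merged, sorted ranges.

    Assumption: the level field is ignored and every listed range counts as blocked.
    """
    parsed: List[Range] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "//")):
            continue  # skip blank lines and comments
        span = line.split(",", 1)[0]  # keep only the 'start - end' part
        start, end = span.split("-")
        parsed.append((_to_int(start), _to_int(end)))

    merged: List[Range] = []
    for start, end in sorted(parsed):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent range: extend the previous one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_blocked(ip: str, ranges: List[Range]) -> bool:
    """Binary-search the merged ranges for the one that could contain ip."""
    value = _to_int(ip)
    # Index of the last range whose start is <= value.
    i = bisect.bisect_right(ranges, (value, 0xFFFFFFFF)) - 1
    return i >= 0 and ranges[i][1] >= value


if __name__ == "__main__":
    sample = [
        "# hypothetical entries using the 203.0.113.0/24 documentation range",
        "203.000.113.000 - 203.000.113.127 , 000 , Example A",
        "203.000.113.100 - 203.000.113.199 , 000 , Example B (overlaps A)",
    ]
    ranges = load_ranges(sample)
    print(is_blocked("203.0.113.150", ranges))  # True
    print(is_blocked("203.0.113.250", ranges))  # False
```

Merging first keeps the lookup correct even when ranges overlap, which is common when several published lists are concatenated into one file.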

Bottom line: nothing wrong with the controller. But it pays to keep your system clean!