I have 3xSATA hard drives and they are all failing

alex_ncfc

Honorable
May 3, 2012
16
0
10,510
Hi all, my first post here, which I can't believe considering how often I visit this site.

Anyway, I have what is now an ageing system, but it still performs very well (based on an Asus A8N-SLI Deluxe board, 2GB RAM, 4200+ Dual Core CPU). It's always done very well and I see no need to upgrade it, as I don't need the latest games etc., and Windows 7 flies along perfectly well.

However, in the system I have three hard drives, 2xMaxtor (160GB and 80GB) and my main boot drive, a 320GB Seagate Barracuda. The Maxtor drives have been in the system since I built it, circa 2005, and I could understand if one or both of them had started to go a bit dodgy (which they have), but my Seagate drive is also on its last legs it seems, and all of this has happened fairly recently.

I've run SeaTools on the Seagate drive, and it found errors and said it had 'repaired them'. From reading the help file, I assume it did this by using spare sectors. However, it won't test the Maxtor drives (despite Seagate now owning Maxtor and the website saying to use SeaTools on legacy drives too).

Is it just a bad coincidence that my drives could all be failing at this time? I would usually back up files to one of the other drives, but I don't trust any of them. What sort of factors could cause faults like this? I have tried replacing the SATA cables, as I thought maybe they had gone bad (they were the originals that came with the motherboard), but I am not sure if this was the issue. I'm at the stage where I'm thinking: is it the drives that are completely knackered, or is the SATA controller/OS causing problems? I have S.M.A.R.T. enabled in the BIOS and all the drives pass on this.

Any help would be brilliant.
 
Something like that happened to me, once. I suspected the southbridge and replaced the motherboard. Unfortunately, one of the drives was unrecoverable - the damage done by the failing controller was too great. The others had actual errors caused by the previous motherboard but could be re-used.

So my most-likely, but totally unfounded, guess is the motherboard. If you have an SATA controller lying around, you could try installing it and moving the drives to it.

In the meantime, if you haven't already, back up everything that is important to you to an external drive. Preferably with the source drives attached to a spare machine!
 

alex_ncfc

Thanks for the response - unfortunately I do not have either a SATA controller card to try these on, or another machine to put the hard drives in and scan/backup.

Oddly, I've disconnected the two Maxtor drives and just left the 320GB Windows drive in place. I booted Windows fine and there were no odd spin-down sounds being heard (as there were before, which would also be accompanied by a long, 2 minute or so, delay in Windows where I wouldn't be able to click on anything or use the keyboard)

I do have 4 SATA connectors on the board, but I also have 4 red SATA connectors which use a separate controller (Silicon Image). These are for RAID, I think, and I have no idea how any of that stuff works. Would I be able to use these as standard SATA connectors?
 

ram1009

Distinguished
Recently I had a HDD failure that cost me money to recover data. This was a first and a last for me. My new policy is to replace HDDs at 3 years regardless of symptoms or the lack thereof. Recent spikes in HDD prices make this a little less palatable but I believe this is already subsiding.
 

Paperdoc

Polypheme
Ambassador
This may not help, but it might, and it costs nothing. On older machines I have often found that intermittent performance can be caused simply by slightly dirty contacts at cable / plug junctions, caused by slow oxidation of the metal surfaces. To solve this (at least for a while), you disconnect power and open up the case. For each connector, you carefully unplug it, then reconnect, repeating several times. This action "scrubs" any oxide film on the metal pins and re-establishes good contact. In your case with SATA drives, there are three connections per drive to address: the power input, the SATA data connector at the drive, and that same cable at the mobo port. When you have done this, re-check the case to be sure you did not loosen something else by mistake. Then re-check the connections to the SATA drives - some connectors are a bit loose and can droop under their own weight and loosen.

All of this is in the hope that the root of your problems is intermittent poor connections that cause the drives to experience read / write errors when used, producing long delays as the operation is re-tried until it works. This kind of problem will not show up as a SMART error. Often it will not show up in a diagnostic test because it just does not happen when that test is running, but sometimes it does.
 

Dogsnake

Distinguished
Since HD prices are falling, get yourself a 1TB drive and back up everything as soon as possible. You can create separate partitions to preserve the separations. If you want, disconnect it after the backup if you suspect it is a MB issue. However, all HDDs have a life cycle, and it has always been my practice to back up and plan a replacement when they show signs of failure (new louder noises of any kind and a sudden increase in bad sectors...).
 

alex_ncfc

Well this is bizarre - since trying the 320GB on its own (the drive that was making clunky sounds) SeaTools (which found loads of errors before) now finds none and the drive passes without any box coming up at the end saying that it has failed some of the tests.

So I decided to run a full CHKDSK /R too and again, this found nothing bad, nothing new - but it did report I had bad clusters in the end report summary after all the checks were done.

So then I tried CHKDSK /B which is to re-analyse bad sectors and I saw many messages that "bad cluster has been removed" or something similar that I can't recall. The end report now said this:



Checking file system on C:
The type of the file system is NTFS.

A disk check has been scheduled.
Windows will now check the disk.

CHKDSK is verifying files (stage 1 of 5)...
174848 file records processed.
File verification completed.
327 large file records processed.
0 bad file records processed.
0 EA records processed.
92 reparse records processed.
CHKDSK is verifying indexes (stage 2 of 5)...
228176 index entries processed.
Index verification completed.
0 unindexed files scanned.
0 unindexed files recovered.
CHKDSK is verifying security descriptors (stage 3 of 5)...
174848 file SDs/SIDs processed.
Cleaning up 4 unused index entries from index $SII of file 0x9.
Cleaning up 4 unused index entries from index $SDH of file 0x9.
Cleaning up 4 unused security descriptors.
Security descriptor verification completed.
26665 data files processed.
CHKDSK is verifying Usn Journal...
36523728 USN bytes processed.
Usn Journal verification completed.
CHKDSK is verifying file data (stage 4 of 5)...
174832 files processed.
File data verification completed.
CHKDSK is verifying free space (stage 5 of 5)...
52076417 free clusters processed.
Free space verification is complete.
CHKDSK discovered free space marked as allocated in the volume bitmap.
Windows has made corrections to the file system.

312568831 KB total disk space.
103887944 KB in 145544 files.
87812 KB in 26666 indexes.
0 KB in bad sectors.
287407 KB in use by the system.
65536 KB occupied by the log file.
208305668 KB available on disk.

4096 bytes in each allocation unit.
78142207 total allocation units on disk.
52076417 allocation units available on disk.

Internal Info:
00 ab 02 00 be a0 02 00 67 c3 04 00 00 00 00 00 ........g.......
83 65 00 00 5c 00 00 00 00 00 00 00 00 00 00 00 .e..\...........
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................

Windows has finished checking your disk.
Please wait while your computer restarts.

So now it's telling me there are no bad sectors!!
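
For anyone wanting to sanity-check the summary figures in that report, here is a rough Python sketch. The dictionary keys and function names are made up for illustration; the numbers are copied from the report above. The 65536 KB log file is left out of the sum on the assumption that it is already counted within the "in use by the system" figure, since the totals only balance that way.

```python
# Figures copied from the CHKDSK summary above (all in KB).
# The key names are hypothetical labels, not anything CHKDSK itself uses.
report_kb = {
    "total": 312568831,
    "files": 103887944,
    "indexes": 87812,
    "bad_sectors": 0,
    "system": 287407,
    "available": 208305668,
}

ALLOCATION_UNIT_BYTES = 4096   # "4096 bytes in each allocation unit"
FREE_UNITS = 52076417          # "allocation units available on disk"


def accounted_kb(report):
    """Add up every category that should make up the total disk space."""
    return (report["files"] + report["indexes"] + report["bad_sectors"]
            + report["system"] + report["available"])


def units_to_kb(units, unit_bytes=ALLOCATION_UNIT_BYTES):
    """Convert a count of allocation units into kilobytes."""
    return units * unit_bytes // 1024


print(accounted_kb(report_kb) == report_kb["total"])
print(units_to_kb(FREE_UNITS) == report_kb["available"])
```

Both checks balance, so the report is at least internally consistent about space, including the 0 KB attributed to bad sectors.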

The drive seems to be running smoothly and I have heard no unusual clicking sounds or any spinning down, but this does beg the question, are the other hard drives being connected causing my boot drive to fail? I can't really test them independently as they have no OS on them.

The one thing I have noticed is in Defraggler, the health (SMART) of my 320GB Seagate (still running on its own) doesn't look too good:

Driveonitsown.jpg


Could someone with a bit more knowledge please help me understand what the massive 4 billion numbers are towards the bottom of the report? I have trouble understanding all of these SMART parameters.

Thank you for all your ideas and responses, I may well look into getting a 1TB drive (I only wish SSD drives were cheaper!!) Also Paperdoc, thanks for your suggestion. I am thinking of dismantling the entire insides of the machine, even the motherboard, and refitting it all. I am starting to wonder if my hard drives are too close together (when I am running the main boot Seagate drive on its own, I have unpowered the other 2 Maxtor drives completely so they do not warm up), but the trouble is I have a long graphics card which is sticking right into the area near one of the hard drive bays.

Any more help would be fantastic on this very weird occurrence.
 

Dogsnake

Distinguished
First, bad sectors are not repaired. Markers are placed on them so they are excluded from the drive's usable sectors. Once this is done, they are no longer seen by the drive. So when reanalyzed, the analysis sees no bad sectors. I will say it again: get a new drive(s), back up your data and consider replacing the drives. It is far less costly in time and dollars to do this before you have a failure. The clunking noises can come and go as a drive deteriorates. GL
 
alex_ncfc, your drive looks very sick. However, the author's interpretations of your drive's SMART data are incorrect in several places.

Firstly, your drive cannot possibly have 4 billion pending sectors. That would amount to a capacity of 2TiB. Instead, the number is most likely 0xFFFF, ie the lower 16 bits of the 48-bit raw value. That amounts to 65535 sectors.

You can use Google's calculator for your hexadecimal arithmetic, eg:
http://www.google.com/search?q=0xFFFF+in+decimal
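
If you would rather do the arithmetic locally, the following is a minimal Python sketch of the same reasoning. It assumes traditional 512-byte sectors and assumes that the "4 billion" figure shown by the tool is 0xFFFFFFFF; neither is confirmed by the screenshot, and the function names are just illustrative.

```python
SECTOR_BYTES = 512  # assumption: traditional 512-byte sectors


def sectors_to_tib(sectors, sector_bytes=SECTOR_BYTES):
    """Capacity, in TiB, that a given number of sectors would represent."""
    return sectors * sector_bytes / 2**40


def low_16_bits(raw_value):
    """Keep only the lower 16 bits of a SMART raw value."""
    return raw_value & 0xFFFF


displayed = 0xFFFFFFFF  # hypothetical "4 billion" reading (4294967295)

# Roughly 2 TiB of pending sectors, which is impossible on a 320GB drive.
print(sectors_to_tib(displayed))

# The lower 16 bits on their own give a far more plausible count.
print(low_16_bits(displayed))

# Same conversion as the Google query above.
print(int("FFFF", 16))
```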

The other error is Airflow Temperature. In fact this is a misnomer. The actual meaning of this attribute is ...

100 - Temperature

Therefore a normalised value of 65 actually equates to a temperature of 35C (= 100 - 65). Furthermore, the raw value is composed of several byte values, namely 0x24 (= 36C), 0x15 (= 21C), and 0x23 (= 35C). I suspect that these are the max/min/current temperature readings for the current power cycle.
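
As a rough illustration of that decoding, here is a short Python sketch. The function names are hypothetical, and reading the three bytes as max/min/current is only my suspicion as stated above, not something documented.

```python
def airflow_temp_from_normalised(normalised):
    """Apply the '100 - Temperature' rule to a normalised attribute value."""
    return 100 - normalised


def decode_hex_bytes(byte_values):
    """Convert hexadecimal byte strings to decimal degrees Celsius."""
    return [int(b, 16) for b in byte_values]


# Normalised value of 65 from the report.
print(airflow_temp_from_normalised(65))

# The three bytes picked out of the raw value, in the order listed above.
print(decode_hex_bytes(["0x24", "0x15", "0x23"]))
```

The last byte (35C) agrees with the temperature derived from the normalised value, which is why I take it to be the current reading.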

See this article for an explanation of the SMART attributes:
http://en.wikipedia.org/wiki/S.M.A.R.T.

The following article is my attempt to understand Seagate's peculiar SMART data.

Seagate's Seek Error Rate, Raw Read Error Rate, and Hardware ECC Recovered SMART attributes:
http://www.users.on.net/~fzabkar/HDD/Seagate_SER_RRER_HEC.html