Several consecutive new hard disks crashing

andresbl

Mar 26, 2014
Hello. I'm using Ubuntu, and over the last two years I've had several consecutive hard disk crashes (fortunately covered by the warranty, but that is now running out). They were WD and Seagate disks, unsealed in front of me by the seller. Everything works fine for a few months, but then the system starts freezing for about 30 seconds at a time, more and more often, until the hard disk fails. Before the latest crash I found entries like these appearing in syslog whenever those freezes happened:

Mar 23 14:14:58 CYA-Ubu kernel: [18704.992050] ata1: lost interrupt (Status 0x50)
Mar 23 14:14:58 CYA-Ubu kernel: [18704.992077] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 23 14:14:58 CYA-Ubu kernel: [18704.992085] ata1.00: failed command: READ DMA EXT
Mar 23 14:14:58 CYA-Ubu kernel: [18704.992095] ata1.00: cmd 25/00:40:00:cd:52/00:00:59:00:00/e0 tag 0 dma 32768 in
Mar 23 14:14:58 CYA-Ubu kernel: [18704.992095] res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
Mar 23 14:14:58 CYA-Ubu kernel: [18704.992101] ata1.00: status: { DRDY }
Mar 23 14:14:58 CYA-Ubu kernel: [18704.992115] ata1: soft resetting link
Mar 23 14:14:59 CYA-Ubu kernel: [18705.172386] ata1.00: configured for UDMA/133
Mar 23 14:14:59 CYA-Ubu kernel: [18705.172400] ata1.00: device reported invalid CHS sector 0
Mar 23 14:14:59 CYA-Ubu kernel: [18705.172418] ata1: EH complete

I searched for the error and found it could be related to a bad SATA cable, so I replaced the cable with a new one, but the error kept appearing and eventually the disk crashed. A new hard disk was installed, and now, after three months, the 30-second freezes and the errors are back. It seems that some other component in my computer is affecting the hard disk. I have already replaced the SATA cable, and I don't know what else to do to find the root of the problem. Thank you very much for any help you can offer.
 

TyrOd

Aug 16, 2013


Gotcha.

The error report shows the drive timing out on a read (the failed READ DMA EXT command), after which the link is soft-reset and the drive is reconfigured. This is usually due to failing heads.

Considering that we're talking about a period of two years, and that modern drives are becoming more prone to these kinds of issues, it's quite possible you've just been particularly unlucky with these drives.
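
If it helps to see whether the freezes are really getting more frequent, here is a minimal Python sketch that counts the "exception ... frozen" lines per day and per ATA port. It assumes the syslog lines look exactly like the excerpt posted above; the function name `count_freezes` and the regular expression are illustrative only, not part of any standard tool.

```python
import re
from collections import Counter

# Matches kernel lines shaped like the one in the excerpt, e.g.
#   Mar 23 14:14:58 CYA-Ubu kernel: [18704.992077] ata1.00: exception Emask ... frozen
# Assumption: traditional syslog timestamps ("Mon DD HH:MM:SS host kernel: [uptime] ...").
EXCEPTION_RE = re.compile(
    r"^(?P<day>\w{3}\s+\d+) \S+ \S+ kernel: \[\s*[\d.]+\] "
    r"(?P<port>ata\d+)(?:\.\d+)?: exception .*frozen"
)


def count_freezes(lines):
    """Return a Counter mapping (day, port) to the number of frozen exceptions."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.match(line)
        if match:
            counts[(match.group("day"), match.group("port"))] += 1
    return counts


# Example use (the log path is an assumption, adjust it for your system):
#   with open("/var/log/syslog", errors="replace") as fh:
#       for (day, port), n in count_freezes(fh).items():
#           print(day, port, n)
```

A count that climbs from day to day on the same port would match the "more and more often" pattern described above. If the errors follow the port rather than the drive, that would point away from the disks themselves, but the script cannot tell you that on its own.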