When I came home today, my Windows 7 installation was displaying an error along the lines of "Windows has detected a hard disk problem. Back up your disk before it fails and send the computer back to its manufacturer."
In the "Details" section of the dialog box, it listed the failing disk as, to my surprise, my 60GB system SSD, an OCZ Agility 3 on a SATA III port.
Figuring that the drive might simply have reached its write/erase cycle limit, the point at which SSDs begin to wear out, I checked the disk's SMART data and found that all attributes seemed fine except for one: "Temperature."
The Temperature attribute is confusing to me. As far as I know, the word "temperature" refers to little other than a measure of the kinetic energy of particles, which can easily be measured with a thermometer. However, it was my understanding that SSDs have no thermal probe in them, since they neither produce much heat nor are very sensitive to it.
The SMART data shows a temperature of 9 (no units given), where the threshold is 10.
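For context, SMART attributes are not necessarily reported in physical units. Each attribute carries a vendor-normalized value (by convention, higher is healthier) and a threshold, and a drive is flagged as failing an attribute when the normalized value drops to or below that threshold; a threshold of 0 is conventionally treated as "no threshold," so that attribute can never trip. The sketch below is a simplified model under those assumptions, not any particular tool's code. It shows why a normalized 9 against a threshold of 10 is enough to raise the alarm regardless of what the attribute is called. The `SmartAttribute` class and `is_failing` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """Hypothetical model of one SMART attribute row (not a real library type)."""
    name: str       # label as shown by whatever tool is reading the drive
    value: int      # vendor-normalized current value (higher is healthier)
    threshold: int  # failure threshold; 0 means "no threshold"


def is_failing(attr: SmartAttribute) -> bool:
    """Return True if the normalized value has dropped to or below its threshold."""
    if attr.threshold == 0:
        # A zero threshold is conventionally treated as "never fails".
        return False
    return attr.value <= attr.threshold


# The reading from the post: normalized 9 against a threshold of 10.
flagged = SmartAttribute("Temperature", value=9, threshold=10)
# A hypothetical attribute with no threshold can never trip, whatever its value.
unthresholded = SmartAttribute("Example", value=1, threshold=0)

print(is_failing(flagged))        # True
print(is_failing(unthresholded))  # False
```

Note that the name plays no part in the decision: only the normalized value and the threshold do.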
I have not made any system modifications in the last couple of weeks. The other temperatures in my system are reading a bit on the warm side: my core temperatures (on my i5-2500K) are 50, 52, 52, and 49C, and my GPU (a Radeon HD 6850) is at 65C. I'm using an old Lian Li case with little ventilation, so these readings are about normal for this machine; I haven't had any stability issues with the rig since I built it nearly a year and a half ago.
What may be causing the sudden SMART panic, and is it an actual concern to my SSD's health?
EDIT - additional information:
The Windows event log shows these two events:
The driver has detected that device \Device\Harddisk2\DR2 has predicted that it will fail. Immediately back up your data and replace your hard disk drive. A failure may be imminent.
Windows Disk Diagnostic detected a S.M.A.R.T. fault on disk OCZ-AGILITY3 ATA Device (volumes C:\). This disk might fail; back up your computer now. All data on the hard disk, including files, documents, pictures, programs, and settings might be lost if your hard disk fails. To determine if the hard disk needs to be repaired or replaced, contact the manufacturer of your computer. If you can't back up (for example, you have no CDs or other backup media), you should shut down your computer and restart when you have backup media available. In the meantime, do not save any critical files to this disk.
Both of these events were recorded a hair under two hours ago.
That's what I thought, but the drive now seems to be having actual problems.
A couple of hours after my last post, I came back into the room to find that the PC had rebooted and the SATA controller was displaying a "Disk may be not available!" message. I presume the reboot was caused by a bluescreen, since the system practically never reboots on its own.
The SATA message didn't surprise me, since the drive had been reporting a SMART error that I believed didn't actually exist. I then booted into an Ubuntu 12.04 live CD, where I was greeted by a screen full of "Hard Disk Problems Detected" dialog boxes. When I checked the SMART data in Ubuntu's vastly superior disk manager, I found the same "failing" Temperature attribute, alongside ANOTHER "Temperature" attribute reporting 30C. I don't know what to make of either one, but the attribute reporting 30C has no threshold.
Ubuntu also reported a "warning" on the reallocated sector count, which at first made me think the drive had deteriorated further, but the count is still ten sectors, as it was in the SMART data linked above; Ubuntu is just pickier.
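One plausible explanation for why Ubuntu warned while the other tools stayed quiet: the classic SMART test looks only at the normalized value against its threshold, whereas a stricter tool may additionally warn whenever the raw reallocated-sector count is nonzero. The sketch below models that difference under that assumption; it is not the actual logic of Ubuntu's disk utility. The function names, and the normalized value of 100 with a threshold of 3, are hypothetical placeholders; only the raw count of ten sectors comes from this post.

```python
def threshold_status(value: int, threshold: int) -> str:
    """Classic SMART check: only the normalized value vs. its threshold matters."""
    if threshold != 0 and value <= threshold:
        return "failing"
    return "ok"


def strict_status(value: int, threshold: int, raw: int) -> str:
    """Hypothetical stricter policy: also warn if any sectors were reallocated."""
    base = threshold_status(value, threshold)
    if base == "failing":
        return base
    return "warning" if raw > 0 else "ok"


# Ten reallocated sectors (from the post); the normalized value and threshold
# are illustrative placeholders, not readings from the drive.
print(threshold_status(100, 3))   # ok
print(strict_status(100, 3, 10))  # warning
```

Under this model, the same unchanged count of ten sectors passes one check and draws a warning from the other, without the drive having changed at all.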
Just to be on the safe side, I'm going to try to clone the disk to a partition on one of my mechanical drives, then flash the firmware you linked above to try to make it stop whining about some sort of imaginary Temperature.
Unfortunately, I was actually planning on using the computer sometime in the next few months, rather than spending them trying to get CS6 to install properly.
Seriously, though, I'll see how it goes. The cloning process seems to be working, which is promising for the condition of the SSD.
UPDATE: I couldn't burn the SSD Tools ISO from the live CD, so I'm now downloading it in Windows. I was able to boot into Windows (though I'm not sure which drive it booted from, since I now have identical partitions on two drives), but I can't tell what happened last night to make the machine reboot. All the information it gives me is "The previous system shutdown at 9:29:54 PM on 4/23/2013 was unexpected."
After booting into the Toolbox, I found the drive's SMART data reporting "Temperature" at 30C, which is the correct value. The value of 9 with a threshold of 10 was labeled "SSD Life Left."
After booting back into my OS, I noticed that I am actually seeing two Temperature attributes: one reporting 30 and one reporting 9.
So the "Temperature" reading that tripped the alarm was never a temperature at all: it is the drive's "SSD Life Left" value, which tracks the percentage of rated write/erase cycles remaining, and the other tools I used were simply labeling it "Temperature." My assumption that temperature is a measure of kinetic energy was fine; the label was wrong. Now I know.
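The double "Temperature" listing is consistent with a naming problem rather than a sensor problem: SMART tools map numeric attribute IDs to names using their own tables, and a generic table can give an ID a different meaning than the drive vendor does. The sketch below assumes, for illustration, that the disputed attribute is ID 231, which SandForce-based drives such as the Agility 3 use for "SSD Life Left," and that the real temperature sits at the standard ID 194. Both name tables are simplified and hypothetical, not the contents of any real tool.

```python
# Hypothetical, simplified ID-to-name tables; real tools ship much larger ones.
GENERIC_NAMES = {194: "Temperature", 231: "Temperature"}
VENDOR_NAMES = {194: "Temperature", 231: "SSD Life Left"}

# (attribute ID, value) pairs; 30 and 9 echo the values in the post,
# but the ID assignment is an assumption.
readings = [(194, 30), (231, 9)]


def label(readings, names):
    """Attach a human-readable name to each (id, value) pair."""
    return [(names.get(attr_id, f"Unknown_{attr_id}"), value)
            for attr_id, value in readings]


print(label(readings, GENERIC_NAMES))  # [('Temperature', 30), ('Temperature', 9)]
print(label(readings, VENDOR_NAMES))   # [('Temperature', 30), ('SSD Life Left', 9)]
```

The drive reports the same numbers either way; only the lookup table decides whether the 9 reads as a temperature or as remaining life.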
I'll go ahead and replace the drive. I've read that the reported value of this attribute can change after a firmware flash, but the 60GB drive had become too small for me anyway, and I don't want to take any chances.