Storing Data For The Next 1000 Years

Santa Cruz (CA) – Have you ever thought about how vulnerable your data may be simply because you store your entire digital life on a single hard drive? A single drive can hold tens of thousands of pictures, thousands of music files, videos, letters and countless other documents. One malfunctioning drive can wipe out your virtual life in the blink of an eye. A scary thought. On a greater scale, at least portions of the digital information describing our generation may be put at risk by current storage technologies. Tape and disk storage last only a few decades these days, but a team of researchers claims to have come up with a power-efficient, scalable way to reliably store data on regular hard drives for an estimated (theoretical) 1400 years.

Data storage faces several huge challenges. Available storage capacity is no longer much of a problem. Hard drives in particular are very cheap these days, and unless you are recording and saving lots of high-definition movies on your home network, you are unlikely to be forced to upgrade the 500 GB, 640 GB or even 1 TB of standard storage capacity that comes with today’s PCs. Security is a greater problem: How do you safeguard your data? How do you archive it, and how do you preserve it over the years in a manageable, reliable, cost-efficient and power-efficient way? Think in larger proportions – corporations, libraries, governments – and each of those individual questions turns into a huge problem.

Researchers from the University of California Santa Cruz (UCSC) have come up with a new idea that could allow individuals and larger organizations to store their data efficiently and reliably over a longer time frame, and at least offer a way to preserve data for future generations. "There is a risk that an entire generation’s cultural history could be lost if people aren’t able to retrieve that data," said Mark Storer, a graduate student at UCSC. "Everyone is switching to digital cameras, but we’ve never demonstrated that digital data can be reliably preserved for a long time," Storer said.

At this time no such systems are deployed, and much of the data is based on test runs of a prototype system and on estimates. Those estimates, at least, are promising. A 10 PB storage system could be built for about $4700 with an annual operational cost (power for running and cooling the system) of about $50. To gauge the system’s likely reliability, the researchers used the expected mean time to data loss (MTTDL) of a deployed system of their design, which they call Pergamum. The estimate assumes that each active device transfers a constant 2 MB/s and that on-disk sector errors occur at a rate of 1/13245 hours. Each disk in the system was estimated to fail at a rate of 1/100000 hours and was subject to a full “scrub” every year, or every 8640 hours. The rebuild time of a single device in this simulated system was put at 100 hours, or 3 MB/s.
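As a rough sanity check, the stated parameters can be converted into more familiar units. The sketch below is only an illustration: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a 1 TB drive as used in the researchers' simulation, and a 365-day year. The variable names are made up for this example and do not come from the researchers.

```python
# Back-of-the-envelope unit conversions for the quoted simulation parameters.
# Assumptions (not from the researchers): decimal units, 365-day year.

HOURS_PER_YEAR = 24 * 365      # assumption: 365-day year

DISK_BYTES = 1e12              # assumption: 1 TB drive, decimal units
REBUILD_RATE_BPS = 3e6         # quoted rebuild rate of 3 MB/s, decimal units
DISK_MTTF_HOURS = 100_000      # inverse of the quoted 1/100000 hours failure rate
SCRUB_INTERVAL_HOURS = 8640    # quoted scrub interval

# Time to rewrite one full drive at the quoted rebuild rate.
rebuild_hours = DISK_BYTES / REBUILD_RATE_BPS / 3600

# Mean time to failure of a single disk, expressed in years.
disk_mttf_years = DISK_MTTF_HOURS / HOURS_PER_YEAR

# Scrub interval expressed in days.
scrub_interval_days = SCRUB_INTERVAL_HOURS / 24

print(f"Rebuild time at 3 MB/s: {rebuild_hours:.1f} hours")
print(f"Single-disk MTTF:       {disk_mttf_years:.1f} years")
print(f"Scrub interval:         {scrub_interval_days:.0f} days")
```

Under these assumptions the rebuild works out to roughly 93 hours, in line with the rounded 100 hours quoted, and 8640 hours corresponds to 360 days, which the article rounds to a year.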

In a simulation that uses 1 TB hard drives in a 10 PB Pergamum system structured with three inter-disk parity segments per 16-disk reliability group and three intra-disk parity blocks per segment, the estimated reliability came out at an MTTDL of 1.25×10^7 hours, or about 1400 years. Of course, that is just an estimate. But this estimate is far beyond what we have heard so far, and it is certainly the first true long-term storage idea we have come across.
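The conversion from hours to years, and the rough scale of such a system, can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: a 365-day year, decimal units, and a drive count based on raw capacity only. The article does not say whether the 16 disks per reliability group include the parity segments, so the group count here is merely indicative, and all names are illustrative.

```python
# Converting the quoted MTTDL to years and estimating the system's scale.
# Assumptions (not from the researchers): 365-day year, decimal units,
# raw capacity only (parity overhead is ignored).

HOURS_PER_YEAR = 24 * 365      # assumption: 365-day year

MTTDL_HOURS = 1.25e7           # quoted mean time to data loss
SYSTEM_BYTES = 10e15           # 10 PB system, decimal units
DISK_BYTES = 1e12              # 1 TB drives, decimal units
DISKS_PER_GROUP = 16           # quoted reliability group size

mttdl_years = MTTDL_HOURS / HOURS_PER_YEAR

# Number of drives needed for 10 PB of raw capacity.
disk_count = round(SYSTEM_BYTES / DISK_BYTES)

# Number of 16-disk reliability groups those drives would form.
group_count = disk_count // DISKS_PER_GROUP

print(f"MTTDL: {mttdl_years:.0f} years")
print(f"Drives for 10 PB raw: {disk_count}")
print(f"16-disk reliability groups: {group_count}")
```

With these inputs the MTTDL comes to a little over 1400 years, matching the rounded figure in the article, spread across on the order of 10,000 drives.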

Wolfgang Gruener
Contributor