Storing Data For The Next 1000 Years

Santa Cruz (CA) – Have you ever thought about how vulnerable your data may be simply because you store your entire digital life on a single hard drive? A single drive can hold tens of thousands of pictures, thousands of music files, videos, letters and countless other documents. One malfunctioning drive can wipe out your virtual life in the blink of an eye. A scary thought. On a greater scale, at least portions of the digital information describing our generation may be put at risk by current storage technologies. Tape and disk storage last only a few decades these days, but a team of researchers claims to have come up with a power-efficient, scalable way to reliably store data on regular hard drives for an estimated (theoretical) 1400 years.

Data storage poses several huge challenges. Available storage capacity is no longer much of a problem. Hard drives in particular are very cheap these days, and unless you are recording and saving lots of high-definition movies on your home network, you are unlikely to be forced to upgrade the 500 GB, 640 GB or even 1 TB of standard storage capacity that comes with today’s PCs. Security is a greater problem: How do you safeguard your data? How do you archive it, and how do you preserve it over the years in a manageable, reliable, cost-efficient and power-efficient way? Think in larger proportions – corporations, libraries, governments – and each of those individual questions turns into a huge problem.

Researchers from the University of California, Santa Cruz (UCSC) have come up with a new idea that could allow individuals and larger organizations to efficiently and reliably store their data over a longer time frame – and at least offer a way to preserve data for future generations. "Everyone is switching to digital cameras, but we’ve never demonstrated that digital data can be reliably preserved for a long time," said Mark Storer, a graduate student at UCSC. "There is a risk that an entire generation’s cultural history could be lost if people aren’t able to retrieve that data," Storer said.

At this time there are no systems in place, and much of the data is based on test runs of a prototype system and on estimates. At least those estimates are promising. A 10 PB storage system could be built for about $4700 with an annual operational cost (power for running and cooling the system) of about $50. To gauge the system’s likely reliability, the researchers used the expected mean time to data loss (MTTDL) of a deployed Pergamum system, as the design is called. The estimate assumes that each active device transfers a constant 2 MB/s and that the on-disk sector error rate is 1/13245 hours. Each disk in the system was estimated to fail at a rate of 1/100000 hours and was subject to a full “scrub” once a year, or every 8640 hours. The rebuild time of a single device in this simulated system was put at 100 hours, or 3 MB/s.
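As a rough sanity check, the quoted simulation parameters can be converted into more familiar units. The sketch below is only back-of-the-envelope arithmetic, not the researchers' model. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a 1 TB drive and a 365-day year, and the function names are made up for illustration.

```python
# Unit conversions for the simulation parameters quoted in the article.
# Assumptions (not from the article): decimal units, a 1 TB drive,
# and a 365-day year.

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def rebuild_hours(capacity_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to rewrite a whole drive at a constant transfer rate."""
    return capacity_bytes / rate_bytes_per_s / 3600


def hours_to_years(hours: float) -> float:
    """Convert a duration in hours to years."""
    return hours / HOURS_PER_YEAR


# Rebuilding 1 TB at 3 MB/s takes a little under the quoted 100 hours.
rebuild = rebuild_hours(1e12, 3e6)

# A failure rate of 1/100000 hours is a mean time to failure of ~11 years.
disk_mttf_years = hours_to_years(100_000)

# The 8640-hour scrub interval corresponds to 360 days.
scrub_days = 8640 / 24

print(f"rebuild time: {rebuild:.1f} hours")
print(f"disk mean time to failure: {disk_mttf_years:.1f} years")
print(f"scrub interval: {scrub_days:.0f} days")
```

Under these assumptions the rebuild takes roughly 93 hours, which is consistent with the rounded 100-hour figure, and the "yearly" scrub interval of 8640 hours is in fact 360 days.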

In a simulation that uses 1 TB hard drives in a 10 PB Pergamum system structured with three inter-disk parity segments per 16-disk reliability group and three intra-disk parity blocks per segment, the estimated reliability came out at an MTTDL of 1.25×10^7 hours, or about 1400 years. Of course, that is just an estimate. But it is far beyond what we have heard so far, and it is certainly the first true long-term storage idea we have come across.
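The conversion from MTTDL hours to years, and the number of drives such a layout implies, can be worked through in a few lines. This is a sketch under stated assumptions, not the researchers' calculation: it treats the 10 PB as usable capacity, reads the three inter-disk parity segments as three drives' worth of parity per 16-disk group, ignores the space taken by intra-disk parity blocks, and uses decimal units (10 PB = 10,000 TB). The function name `drives_needed` is hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years

# MTTDL quoted in the article, converted from hours to years.
mttdl_years = 1.25e7 / HOURS_PER_YEAR


def drives_needed(usable_tb: int, drive_tb: int = 1,
                  group_size: int = 16, parity_per_group: int = 3) -> int:
    """Drive count for a given usable capacity.

    Assumes each reliability group gives up `parity_per_group` drives'
    worth of space to inter-disk parity; intra-disk parity is ignored.
    """
    data_tb_per_group = (group_size - parity_per_group) * drive_tb
    groups = math.ceil(usable_tb / data_tb_per_group)
    return groups * group_size


total_drives = drives_needed(10_000)  # 10 PB usable, 1 TB drives

print(f"MTTDL: about {mttdl_years:.0f} years")
print(f"drives for 10 PB usable: {total_drives}")
```

With a 365-day year the MTTDL works out to about 1427 years, which the article rounds to 1400. Under the assumptions above, 10 PB of usable space would take 770 groups, or 12,320 one-terabyte drives.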

Wolfgang Gruener
Contributor

Wolfgang Gruener is an experienced professional in digital strategy and content, specializing in web strategy, content architecture, user experience, and applying AI in content operations within the insurtech industry. His previous roles include Director, Digital Strategy and Content Experience at American Eagle, Managing Editor at TG Daily, and contributing to publications like Tom's Guide and Tom's Hardware.

  • wavetrex
    "A 10 PB storage system could be built for about $4700"

    That is 4.7$ per 1TB disk, without considering ANY redundancy or extra hardware to handle the wear spread...
    I wish I knew where to buy such cheap 1TB drives...

    Something is wrong with that sum, or the storage size.
  • rkhpedersen
    Yeah, they probably mean a 10 terabyte system.
  • alangiv
    "throughput of 3.25 MB/s"???

    Kind of slow... Is that a typo by any chance?

  • ntrceptr
    And how did they calculate 1400 years? I guess I'm not as imaginative to come up with it. It's about as fast as some old ATA33 drives that need defragging.
    It's a good thought, we need some type of longer term storage but maybe they should look to new mediums (crystal, holographic, etc...)
    Anything with moving parts will fail!
  • So in each group of 16 drives 6 of the drives are providing redundant parity stripes? So, for 10 petabytes you'd need 1639 1-Terabyte harddrives, each with their own dedicated ARM 9 proc, ethernet port, and NVRAM. So hardware costs for a single unit:
    ARM 9 Board with SATA and Ethernet: $229
    1 TB SATA Harddrive: $159
    8GB SD Card NVRAM: $30
    -------------------------
    $418
    To hit 10 PB 1639 units
    --------------------------
    $685,102.00 Dollars

    You still need to add in network switches, and a chassis for all this, etc.