Storing Data For The Next 1000 Years

Santa Cruz (CA) – Have you ever thought about how vulnerable your data may be simply because you store your entire digital life on a single hard drive? A single drive can hold tens of thousands of pictures, thousands of music files, videos, letters and countless other documents. One malfunctioning drive can wipe out your virtual life in the blink of an eye. A scary thought. On a greater scale, at least portions of the digital information describing our generation may be put at risk by current storage technologies. Tape and disk storage offer only a few decades of life these days, but a team of researchers claims to have come up with a power-efficient, scalable way to reliably store data on regular hard drives for an estimated (theoretical) 1400 years.

Data storage faces several huge challenges. Available storage capacity is no longer the real problem. Hard drives in particular are very cheap these days, and unless you are recording and saving lots of high-definition movies on your home network, you are unlikely to be forced to upgrade the 500 GB, 640 GB or even 1 TB of standard storage capacity that comes with today’s PCs. Security is a greater problem: How do you safeguard your data? How do you archive it, and how do you preserve it over the years in a manageable, reliable, cost-efficient and power-efficient way? Think in larger proportions (corporations, libraries, governments) and each of those individual questions turns into a huge problem.

Archiving data, which means that you don’t just back up data but also want fast access to it, is primarily done with simple hard drives today. Depending on the hard drive you purchase, the device will have an indicated reliability of somewhere between 300,000 and more than 1 million hours of mean time between failures (MTBF). This MTBF number is a bit misleading for customers who don’t purchase a large number of drives, as it suggests that even a 300K MTBF drive will work for more than 34 years. However, the MTBF number is only intended to be used across a large number of drives; for individual use, the service life of a hard drive is a better guideline to how reliable a drive really is. Typically, the service life of a general hard drive is somewhere between five and seven years.
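To make the MTBF arithmetic concrete, here is a minimal sketch. It assumes a constant failure rate of 1/MTBF per drive and a 8,760-hour year; the function names and the 1,000-drive fleet are illustrative and not taken from any vendor specification.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mtbf_to_years(mtbf_hours: float) -> float:
    """Naive reading of MTBF as a lifetime for one drive (the misleading view)."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures_per_year(mtbf_hours: float, drive_count: int) -> float:
    """Fleet view: expected failures per year, assuming each drive
    fails at a constant rate of 1/MTBF (an assumption of this sketch)."""
    return drive_count * HOURS_PER_YEAR / mtbf_hours


# A 300,000-hour MTBF looks like about 34 years for a single drive ...
print(round(mtbf_to_years(300_000), 1))
# ... but across a hypothetical fleet of 1,000 drives it means dozens of failures a year.
print(round(expected_failures_per_year(300_000, 1_000), 1))
```

Read this way, the MTBF figure says little about how long one particular drive will last, which is why the service life is the more useful number for individual buyers.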

Tape drives are usually used for backups, since data can’t be easily accessed through such a system. Tape media, by the way, have a service life of somewhere between 10 and 30 years. Archiving data can, of course, also be done with optical discs, which have an expected life somewhere in the range of 30 to 50 years.

Researchers from the University of California Santa Cruz (UCSC) have come up with a new idea that could allow individuals and larger organizations to efficiently and reliably store their data over a longer time frame – and at least offer a way to preserve data for future generations. "There is a risk that an entire generation’s cultural history could be lost if people aren’t able to retrieve that data," said Mark Storer, a graduate student at UCSC. "Everyone is switching to digital cameras, but we’ve never demonstrated that digital data can be reliably preserved for a long time," Storer said.

Together with Kevin Greenan, associate professor Ethan Miller, and Kaladhar Voruganti, a researcher at NetApp, he developed the idea of Pergamum, a new disk-based approach for archiving data.

Pergamum, named after the ancient Greek library that made the transition from fragile papyrus to more durable parchment, is designed as a distributed network of individual, fully functional network storage devices. Unlike current MAIDs (Massive Arrays of Idle Disks), each Pergamum node adds NAND flash memory (described within the project as non-volatile random access memory, or NVRAM) to store data signatures, metadata, and other small items, allowing deferred writes, metadata requests and inter-disk data verification to be performed while the disk is powered off. Since the NVRAM can serve frequent searches without the need to spin up a hard drive, the disk media can remain powered down more often, effectively reducing both wear and power consumption compared with a MAID.
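The role of the NVRAM can be illustrated with a toy model. This is only a sketch of the general idea described above, not the Pergamum software: the `Tome` class, its method names and the dictionary-based "disk" are all hypothetical. Metadata requests and writes are absorbed by the NVRAM, and the disk spins up only when data actually has to be read from or flushed to the platters.

```python
from dataclasses import dataclass, field


@dataclass
class Tome:
    """Hypothetical model of one storage node: a disk plus a small NVRAM store."""
    nvram_metadata: dict = field(default_factory=dict)   # name -> size in bytes
    deferred_writes: list = field(default_factory=list)  # (name, data) held in NVRAM
    disk: dict = field(default_factory=dict)             # name -> data on the platters
    spun_up: bool = False
    spin_ups: int = 0                                     # how often the disk was woken

    def _spin_up(self) -> None:
        if not self.spun_up:
            self.spun_up = True
            self.spin_ups += 1

    def spin_down(self) -> None:
        self.spun_up = False

    def write(self, name: str, data: bytes) -> None:
        # Deferred write: record metadata and buffer the data without waking the disk.
        self.nvram_metadata[name] = len(data)
        self.deferred_writes.append((name, data))

    def stat(self, name: str):
        # Metadata request answered from NVRAM only; the disk stays powered down.
        return self.nvram_metadata.get(name)

    def flush(self) -> None:
        # One spin-up writes out every buffered item, then the disk goes idle again.
        if not self.deferred_writes:
            return
        self._spin_up()
        for name, data in self.deferred_writes:
            self.disk[name] = data
        self.deferred_writes.clear()
        self.spin_down()

    def read(self, name: str) -> bytes:
        # Serve from the NVRAM buffer if possible; otherwise the disk must spin up.
        for pending_name, data in reversed(self.deferred_writes):
            if pending_name == name:
                return data
        self._spin_up()
        return self.disk[name]
```

In this model, any number of `write` and `stat` calls cost zero spin-ups; only `flush` and a `read` of data that is no longer buffered wake the drive.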

According to the project group, Pergamum uses both intra-disk and inter-disk redundancy to guard against data loss, relying on hash tree-like structures of algebraic signatures to efficiently verify the correctness of stored data. If failures occur, Pergamum uses a staggered rebuild to reduce peak energy usage while rebuilding large redundancy stripes. In a typical scenario, 95% of the disks would remain spun down at any given time in such a setup, the researchers claim.
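Why signatures help with verification can be shown with a small example. The sketch below is not Pergamum's actual scheme: it assumes a one-byte algebraic signature over GF(2^8) with the reduction polynomial 0x11D, and the helper names are made up for illustration. The useful property is that such a signature is linear, so the signature of an XOR parity block equals the XOR of the data blocks' signatures. Nodes could therefore compare a few small values instead of exchanging whole blocks.

```python
from functools import reduce

GF_POLY = 0x11D  # assumed reduction polynomial x^8 + x^4 + x^3 + x^2 + 1
ALPHA = 2        # assumed generator element of the field


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) (carry-less multiply with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
        b >>= 1
    return result


def signature(block: bytes) -> int:
    """One-byte algebraic signature: XOR-sum of byte_i * ALPHA**i in GF(2^8)."""
    sig = 0
    power = 1  # ALPHA**0
    for byte in block:
        sig ^= gf_mul(byte, power)
        power = gf_mul(power, ALPHA)
    return sig


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR parity of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


d1, d2, d3 = b"archive-block-1", b"archive-block-2", b"archive-block-3"
parity = xor_blocks(d1, d2, d3)

# Linearity: the parity block's signature can be predicted from the data signatures.
print(signature(parity) == signature(d1) ^ signature(d2) ^ signature(d3))
```

A one-byte signature is far too short for real use, since unrelated blocks collide with probability of roughly 1 in 256; a practical system would use wider signatures, combined in a tree so that a mismatch can be narrowed down to a specific region.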

According to a published research paper called “Pergamum: Replacing Tape with Energy Efficient, Reliable, Disk-Based Archival Storage”, a data archive built from this technology would consist of individual, independent storage appliances or “tomes”. Each tome consists of an ARM-based processor to manage the unit (and run processes such as virus scans), a SATA-class hard drive, NVRAM, as well as an Ethernet controller and network port. Networking the drives would be supported by a “star-type structure of commodity switches” at the outer branches of the network and “potentially higher-performance switches in the core”.

The NVRAM plays the critical role of keeping the power consumption as well as the wear on hard drives down. But since flash memory has a limit on its life - based on the number of writes in each of its blocks – there is suddenly another variable that can impact the reliability of this system. However, the researchers claim that since NVRAM primarily holds metadata such as algebraic signatures and index information, flash writes are relatively rare – only about 1000 times during a year, while it is generally assumed that high-quality NAND flash memory supports more than 1 million rewrites.
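The endurance argument comes down to one division. The sketch below uses the two figures quoted above and makes the pessimistic assumption that every write lands on the same flash block; the function name is illustrative.

```python
def years_until_wear_out(rated_rewrites: int, writes_per_year: int) -> float:
    """Years until a flash block reaches its rated rewrite limit,
    assuming (pessimistically) that all writes hit the same block."""
    return rated_rewrites / writes_per_year


# Figures from the article: about 1,000 writes a year against 1 million rated rewrites.
print(years_until_wear_out(1_000_000, 1_000))  # 1000.0
```

Under those numbers, flash wear would take on the order of a thousand years to become the limiting factor, and wear leveling across blocks would stretch that further.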

So, how effective can Pergamum be?

At this time there are no systems in place, and much of the data is based on estimates and on test runs of a prototype system. At least those estimates are promising. A 10 PB storage system could be built for about $4700 with an annual operational cost (power for running and cooling the system) of about $50. To come up with an indication of the system’s likely reliability, the researchers used the expected mean time to data loss (MTTDL) of a deployed Pergamum system as their metric. The estimate assumes that each active device transfers a constant 2 MB/s and that the on-disk sector error rate is 1/13245 hours. Each disk in the system was estimated to fail at a rate of 1/100000 hours and was subject to a full “scrub” every year, or every 8640 hours. The rebuild time of a single device in this simulated system was put at 100 hours, or 3 MB/s.
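The rebuild and scrub figures can be cross-checked with a few lines of arithmetic. This sketch assumes a 1 TB drive (the capacity used in the simulation) with decimal units, so 1 TB is 10^12 bytes and 1 MB is 10^6 bytes; the function names are illustrative.

```python
def rebuild_rate_mb_per_s(capacity_bytes: float, rebuild_hours: float) -> float:
    """Average transfer rate needed to rebuild a whole drive in the given time."""
    return capacity_bytes / (rebuild_hours * 3600) / 1e6


def hours_to_days(hours: float) -> float:
    return hours / 24


ONE_TB = 1e12  # assumed decimal terabyte

# 1 TB rebuilt in 100 hours works out to just under 3 MB/s.
print(round(rebuild_rate_mb_per_s(ONE_TB, 100), 2))
# A scrub interval of 8,640 hours is a 360-day "year".
print(hours_to_days(8640))
```

So the quoted 3 MB/s is a rounded figure, and the "year" between scrubs is slightly shorter than a calendar year.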

In a simulation that uses 1 TB hard drives in a 10 PB Pergamum system structured with three inter-disk parity segments per 16-disk reliability group and three intra-disk parity blocks per segment, the estimated reliability came out at an MTTDL of 1.25×10^7 hours, or about 1400 years. Of course, that is just an estimate. But this estimate is far beyond what we have heard so far, and it is certainly the first true long-term storage idea we have come across.
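For readers who want to check the conversion, the sketch below turns the MTTDL into years. It also estimates the chance of losing data within a given period under the assumption, made here only for illustration and not stated by the researchers, that data-loss events follow an exponential distribution with that mean.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years
MTTDL_HOURS = 1.25e7       # figure quoted in the article


def hours_to_years(hours: float) -> float:
    return hours / HOURS_PER_YEAR


def loss_probability(years: float, mttdl_hours: float = MTTDL_HOURS) -> float:
    """Probability of at least one data-loss event within `years`,
    assuming exponentially distributed time to data loss (an assumption)."""
    return 1 - math.exp(-years / hours_to_years(mttdl_hours))


print(round(hours_to_years(MTTDL_HOURS)))  # 1427
print(round(loss_probability(100), 3))     # chance of data loss within a century
```

An MTTDL of roughly 1400 years is a mean, not a guarantee: under the exponential assumption there is still a chance of several percent that data is lost within the first hundred years.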

Performance is another aspect of this system. Using 400 MHz ARM 9 CPUs, 7200 rpm SATA drives as well as 1 GB of flash metadata storage resulted in a Pergamum tome-side throughput of 3.25 MB/s, the researchers said.

  • wavetrex
    "A 10 PB storage system could be built for about $4700"

    That is 4.7$ per 1TB disk, without considering ANY redundancy or extra hardware to handle the wear spread...
    I wish I knew where to buy such cheap 1TB drives...

    Something is wrong with that sum, or the storage size.
  • rkhpedersen
    Yeah, they probably mean a 10 terabyte system.
  • alangiv
    "throughput of 3.25 MB/s"???

    Kind of slow... Is that a typo by any chance?

  • ntrceptr
    And how did they calculate 1400 years? I guess I'm not as imaginative to come up with it. It's about as fast as some old ATA33 drives that need defragging.
    It's a good thought, we need some type of longer term storage but maybe they should look to new mediums (crystal, holographic, etc...)
    Anything with moving parts will fail!
  • So in each group of 16 drives 6 of the drives are providing redundant parity stripes? So, for 10 petabytes you'd need 1639 1-Terabyte harddrives, each with their own dedicated ARM 9 proc, ethernet port, and NVRAM. So hardware costs for a single unit:
    ARM 9 Board with SATA and Ethernet: $229
    1 TB SATA Harddrive: $159
    8GB SD Card NVRAM: $30
    To hit 10 PB 1639 units
    $685,102.00 Dollars

    You still need to add in network switches, and a chassis for all this, etc.