How do we solve the digital archiving problem?

Vast quantities of digital data we aren't equipped to preserve

When it comes to digital data, we're churning out more than ever, but as new storage media and file formats emerge and make others obsolete, it becomes harder to access data we have previously archived. There's a good chance the digital data we are currently generating will become unusable within our lifetimes unless we take steps to preserve it. We look at the drawbacks of digital storage.

Online survivability

What about online storage? These are hard drives that are turned on and ready for immediate access. Here, the data can be constantly checked for integrity and easily replicated. But it can also be corrupted quickly, and the long-term reliability necessary for archiving is not on the horizon, complains David S.H. Rosenthal, chief scientist for the 'Lots of Copies Keep Stuff Safe' (LOCKSS) program, a Stanford University Libraries initiative.

Rosenthal has investigated what would be required for a petabyte stored online to have a 50 percent chance of being usable after a century. Analysing the drive maintenance data published by various storage farms, he found that to reach the petabyte-century goal, the reliability of online storage has to be improved by a factor of 10^9 (ie 1 billion).
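To get a feel for how demanding the petabyte-century goal is, here is a rough back-of-the-envelope sketch. It is not Rosenthal's published method: it assumes a simplified model in which every bit decays independently with the same half-life, a decimal petabyte (10^15 bytes), and that "usable" means every single bit survives. The function names are illustrative only.

```python
import math

# Assumption: a decimal petabyte, 10**15 bytes of 8 bits each.
BITS_PER_PETABYTE = 8 * 10**15
YEARS = 100
TARGET_SURVIVAL = 0.5  # 50 percent chance that every bit is still intact


def survival_probability(n_bits, years, half_life_years):
    """Chance that all n_bits survive, assuming independent decay.

    Each bit survives with probability 2 ** (-years / half_life_years),
    so all of them survive with that probability raised to the power n_bits.
    """
    return 2.0 ** (-n_bits * years / half_life_years)


def required_bit_half_life(n_bits, years, target):
    """Solve survival_probability(...) == target for the half-life."""
    return n_bits * years * math.log(2) / -math.log(target)


half_life = required_bit_half_life(BITS_PER_PETABYTE, YEARS, TARGET_SURVIVAL)
print(f"Required bit half-life: about {half_life:.1e} years")
print(f"Check: {survival_probability(BITS_PER_PETABYTE, YEARS, half_life):.2f}")
```

Under these assumptions, each bit would need a half-life of roughly the number of bits multiplied by the number of years, which is why no realistic test period could confirm that a system meets the goal.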

But even if we could honestly achieve a billion-fold improvement in online storage reliability, there would be no realistic way to test such a system short of plugging it in and waiting 100 years, he points out.

With the odds of digital survival being so low, and with so much information originating in digital form, "we could be facing a digital dark age 50 years from now, and future scholars will not be able to understand our culture," says Andy Maltz, director of the science and technology council of the Academy of Motion Picture Arts and Sciences - the group that awards the Oscars.

Preservation standards

With awareness of the problem growing, various organisations have been working on approaches to the archiving problem, focusing primarily on ways to reduce the danger of format obsolescence.

Preventing obsolescence usually involves developing dictionaries of metadata - information about a file that is stored with the file. That way, future users won't be stuck like the scientists in 1999 who were unable to make any sense of magnetic tapes containing NASA's Mars probe data from 1975. (After finding some printouts, the scientists were able to analyse about one-third of the data.)

Beyond standards, there is also a more subtle management issue. "Most organisations could not tell you how long certain electronic content needs to be kept, and only 5 percent to 10 percent are tagging the content with metadata in sufficient detail" for employees to know how long to keep the data, says Donald Post, a SNIA spokesman and a partner at Imerge Consulting, a firm specialising in records management. "Meanwhile, 80 percent of what they are trying to keep are duplicates, but they are not taking the time to discard the duplicates. And 95 percent think that making a routine backup is [sufficient] protection."

Enterprise IT managers aren't pushing for commercial solutions to the problem, and therefore vendors aren't rushing to offer any, says Post, but he also expects the situation to change within the next three years as vendors realise the commercial potential for digital preservation products.
