When it comes to digital data, we're churning out more than ever. But as new storage media and file formats emerge and render older ones obsolete, it becomes harder to access data we have already archived. Much of the digital data we are generating today could well become unusable within our lifetimes unless we take steps to preserve it. We look at the drawbacks of digital storage.
What about online storage? This means hard drives that are powered on and ready for immediate access, where the data can be continually checked for integrity and easily replicated. But it can also be corrupted quickly, and the long-term reliability needed for archiving is not on the horizon, complains David S.H. Rosenthal, chief scientist of the 'Lots of Copies Keep Stuff Safe' (LOCKSS) program, a Stanford University Libraries initiative.
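Because online copies can be read at any time, a storage system can periodically recompute a checksum of each copy, compare it with a digest recorded when the data was archived, and repair damaged copies from healthy replicas. The sketch below illustrates that idea with SHA-256 and a simple majority vote. The function names are hypothetical, and this is a simplified illustration, not LOCKSS's actual audit protocol.

```python
import hashlib
from collections import Counter

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

def audit_replicas(replicas: dict[str, bytes], expected_digest: str) -> list[str]:
    """List the replicas whose current content no longer matches the digest recorded at ingest."""
    return [name for name, data in replicas.items() if sha256_hex(data) != expected_digest]

def repair_from_majority(replicas: dict[str, bytes]) -> dict[str, bytes]:
    """Replace every copy with the content that most replicas agree on.

    Caveat: if most copies are damaged in the same way, the damage wins the vote.
    """
    counts = Counter(sha256_hex(data) for data in replicas.values())
    majority_digest, _ = counts.most_common(1)[0]
    good = next(data for data in replicas.values() if sha256_hex(data) == majority_digest)
    return {name: good for name in replicas}

# Usage: three hypothetical replicas, one with a single altered character.
original = b"archived record"
replicas = {"a": original, "b": original, "c": b"archived recorD"}
print(audit_replicas(replicas, sha256_hex(original)))  # ['c']
```

The same property that makes this easy, that every copy is always online, also means an error or a bad write can spread to the replicas quickly, which is the risk Rosenthal points to.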
Rosenthal has investigated what would be required for a petabyte stored online to have a 50 percent chance of being usable after a century. Analysing the drive maintenance data published by various storage farms, he found that to reach the petabyte-century goal, the reliability of online storage would have to improve by a factor of 10^9 (ie 1 billion).
But even if we could achieve a billion-fold improvement in online storage reliability, there would be no realistic way to test such a system short of plugging it in and waiting 100 years, he points out.
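The scale of the petabyte-century target can be made concrete with a deliberately simplified model, assumed here rather than taken from Rosenthal's analysis: treat each bit as being lost independently with a fixed half-life, and ask what half-life gives a 50 percent chance that every bit of a (decimal) petabyte, 8 x 10^15 bits, survives 100 years. Under that assumption all N bits survive t years with probability 2^(-N*t/h), so a 50 percent chance requires h = N*t. The helper names below are hypothetical.

```python
import math

BITS_PER_PETABYTE = 8 * 10**15  # assumes 1 PB = 10**15 bytes (decimal), 8 bits per byte

def survival_probability(n_bits: int, years: float, half_life_years: float) -> float:
    """Probability that every bit survives, if each bit is lost independently with the given half-life."""
    # One bit survives with probability 2**(-years / half_life); n independent bits: 2**(-n * years / half_life).
    return 2.0 ** (-n_bits * years / half_life_years)

def required_half_life(n_bits: int, years: float, target: float = 0.5) -> float:
    """Bit half-life needed so that all n_bits survive `years` with probability `target`."""
    # Solve 2**(-n * t / h) = target for h.
    return n_bits * years / (-math.log2(target))

h = required_half_life(BITS_PER_PETABYTE, 100)
print(f"{h:.1e}")  # 8.0e+17
```

In this toy model each bit would need a half-life of about 8 x 10^17 years, roughly 58 million times the commonly cited 13.8-billion-year age of the universe, which shows why no test short of an impractically long experiment could confirm it.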
With the odds of digital survival being so low, and with so much information originating in digital form, "we could be facing a digital dark age 50 years from now, and future scholars will not be able to understand our culture," says Andy Maltz, director of the science and technology council of the Academy of Motion Picture Arts and Sciences - the group that awards the Oscars.
With awareness of the problem growing, various organisations have been working on approaches to the archiving problem, focusing primarily on ways to reduce the danger of format obsolescence.
Preventing obsolescence usually involves developing dictionaries of metadata, information about a file that is stored with the file. That way, future users won't be stuck like the scientists in 1999 who were unable to make any sense of magnetic tapes containing NASA's Mars probe data from 1975. (After finding some printouts, the scientists were able to analyse about one-third of the data.)
Beyond standards, there is also a more subtle management issue. "Most organisations could not tell you how long certain electronic content needs to be kept, and only 5 percent to 10 percent are tagging the content with metadata in sufficient detail" for employees to know how long to keep the data, says Donald Post, a SNIA spokesman and a partner at Imerge Consulting, a firm specialising in records management. "Meanwhile, 80 percent of what they are trying to keep are duplicates, but they are not taking the time to discard the duplicates. And 95 percent think that making a routine backup is [sufficient] protection."
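Of the problems Post lists, exact duplicates are the most mechanical to find: files with identical contents have identical cryptographic hashes. A minimal sketch, assuming the files sit in a local directory tree; `find_duplicates` and `file_sha256` are hypothetical helpers, and they catch only byte-identical copies, not near-duplicates or the same content saved in different formats.

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; return only groups holding more than one file."""
    groups: dict[str, list[Path]] = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        groups.setdefault(file_sha256(path), []).append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Finding the copies is the easy half; deciding which one to keep, and for how long, is still the records-management question Post describes.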
Enterprise IT managers aren't pushing for commercial solutions to the problem, and therefore vendors aren't rushing to offer any, says Post, but he also expects the situation to change within the next three years as vendors realise the commercial potential for digital preservation products.