When it comes to digital data, we're churning out more than ever, but as new storage media and file formats emerge, making older ones obsolete, it becomes harder to access data we have previously archived. The digital data we are currently generating is likely to become unusable within our lifetimes unless we take steps to preserve it. We look at the drawbacks of digital storage.
Troubled Oscars and libraries
The movie industry got a nasty shock when it became evident that digital storage is impermanent. Before Hollywood adopted digital technology, it relied on celluloid film, and movies archived on that medium have lasted a century, according to Maltz of the Academy of Motion Picture Arts and Sciences. A 2007 study by the Academy found that the long-term cost of archiving the master material of a commercial movie on film is $1,059 (£692) per year. In digital format, the cost is 11 times higher: $12,514 (£8,175) per year.
With digital technology, "you have to migrate your data formats and storage media - your technology infrastructure - every three to five years, or your data may be unrecoverable," he says.
The Academy has undertaken several projects to try to address the problem. For example, it has launched an effort to develop image file interchange conversion formats and metadata standards that would work for the movie industry. It also built an experimental digital preservation system. "I can say that it turned out to be way more complicated than we understood when we began," Maltz said of Hollywood's digital initiatives.
Digital impermanence has also been a problem for libraries, says Vicky Reich, head of the LOCKSS program at Stanford University Libraries. Not only can material disappear in a twinkling, but troublemakers can tamper with things without leaving any evidence.
"Paper libraries are under attack a lot," she says, explaining that the challenges librarians face include people who remove books or magazine articles on topics they don't approve of. But with printed publications, there are usually multiple copies in libraries scattered across a particular jurisdiction, so it's unlikely that a crusade to eliminate a specific piece of material could be completely successful.
The LOCKSS project takes the same decentralised approach in the digital domain. Participating libraries (currently about 200, predominantly at universities) first set up a PC devoted to the archiving project; the machine must have an internet connection and at least two terabytes of storage, and be equipped with open-source LOCKSS software. Each library then chooses material from a list of about 420 publishers that have granted permission to archive their publications, or a library can obtain permission elsewhere on its own. The machines then crawl the sources and copy their material. The library machines act as proxies for the original sites, serving up pages when the original sites can't.
LOCKSS machines holding copies of the same originals compare their content and repair it as necessary. There's no tape backup - the machines back each other up, Reich says. The 'magic number' of copies needed to ensure preservation appears to be six or seven, she adds, and it results from random overlapping among the preservation choices made by the participating libraries.
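The compare-and-repair idea can be sketched in a few lines: each peer holding the same original publishes a hash of its copy, the majority hash wins, and outvoted peers refetch from a peer in the majority. This is a toy illustration of the principle only, not the actual LOCKSS polling protocol (which uses a more elaborate cryptographic sampled poll); the function names here are invented for the example.

```python
import hashlib

def digest(content: bytes) -> str:
    """Hash a preserved file so peers can compare copies without
    exchanging the full content."""
    return hashlib.sha256(content).hexdigest()

def poll_and_repair(replicas: dict[str, bytes]) -> dict[str, bytes]:
    """Toy majority poll (illustrative, not the real LOCKSS protocol):
    each peer 'votes' with its content hash; peers in the minority
    replace their copy with one fetched from a majority peer."""
    votes: dict[str, list[str]] = {}
    for peer, content in replicas.items():
        votes.setdefault(digest(content), []).append(peer)
    # The hash held by the most peers is assumed to be the good copy.
    winning_hash = max(votes, key=lambda h: len(votes[h]))
    good_copy = replicas[votes[winning_hash][0]]
    return {
        peer: content if digest(content) == winning_hash else good_copy
        for peer, content in replicas.items()
    }

# Example: five libraries hold the same article; one copy was tampered with.
replicas = {f"lib{i}": b"original article text" for i in range(4)}
replicas["lib4"] = b"tampered text"
repaired = poll_and_repair(replicas)
# After the poll, lib4's copy matches the majority again.
```

With six or seven independent replicas, as Reich suggests, a tamperer would have to corrupt a majority of the copies simultaneously to survive such a poll, which is what makes the decentralised design robust.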