When it comes to digital data, we're churning out more than ever. But as new storage media and file formats emerge and make older ones obsolete, it becomes harder to access data we have already archived. There's a good chance that the digital data we are generating now will become unusable within our lifetimes unless we take steps to preserve it. We look at the drawbacks of digital storage.
Keeping bits alive
Of course, there are some organisations that are successfully dealing with the challenge of digital archiving.
"Most countries have this problem of digital preservation," notes Dyung Le, director of systems engineering for the Electronic Records Archive initiative of the US National Archives and Records Administration. There, archived tapes are recopied every 10 years, and the National Archives tries to have at least three copies of everything, with at least one copy being off-site. The agency manages more than 400 terabytes of data, he estimates.
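The replication rules Le describes (at least three copies, at least one off-site, tapes recopied every 10 years) are simple enough to check mechanically. The sketch below is a hypothetical illustration, not NARA's actual tooling: the `Copy` record, the `policy_violations` function and the vault names are invented for the example, and the thresholds are simply the figures quoted above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Copy:
    """One stored copy of an archived item (hypothetical record layout)."""
    location: str
    off_site: bool
    last_recopied: date


def policy_violations(copies, today, min_copies=3, refresh_years=10):
    """Return human-readable problems with a set of copies.

    Assumed policy, taken from the figures quoted in the article:
    at least `min_copies` copies, at least one off-site, and every
    copy rewritten to fresh media within `refresh_years` years.
    """
    problems = []
    if len(copies) < min_copies:
        problems.append(f"only {len(copies)} copies; need at least {min_copies}")
    if not any(c.off_site for c in copies):
        problems.append("no off-site copy")
    for c in copies:
        # Approximate age in years, allowing for leap years.
        age_years = (today - c.last_recopied).days / 365.25
        if age_years >= refresh_years:
            problems.append(f"copy at {c.location} is due for recopying")
    return problems
```

For example, two copies where the off-site one was last rewritten in 2010 would be flagged twice when checked in 2024: once for having too few copies and once for the stale off-site tape.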
Since there's no telling what computer applications will be in use centuries from now, text-based material is typically converted to XML, a plain-text format built on ASCII-compatible character encodings. Various forms of metadata are preserved in the file, including descriptive data that can serve as a search aid. Le says the XML files store this metadata using an extension of PREMIS (Preservation Metadata: Implementation Strategies), a digital preservation standard that is itself expressed in plain-text XML and was created by the Online Computer Library Center.
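To give a feel for what such a record contains, here is a minimal sketch that builds a PREMIS-style object description with Python's standard `xml.etree.ElementTree`. It is loosely modelled on element names from the PREMIS 3 data dictionary (`objectIdentifier`, `fixity`, `size`, `format`), but it is simplified and not schema-valid (it omits, for instance, the `xsi:type` attribute a real PREMIS object carries), and it does not reflect NARA's own PREMIS extension, whose details the article does not give. The function name `premis_object` and the identifier values are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_object(identifier, data, format_name):
    """Build a simplified, PREMIS-style <object> record for a byte string."""
    obj = ET.Element(q("object"))

    ident = ET.SubElement(obj, q("objectIdentifier"))
    ET.SubElement(ident, q("objectIdentifierType")).text = "local"
    ET.SubElement(ident, q("objectIdentifierValue")).text = identifier

    chars = ET.SubElement(obj, q("objectCharacteristics"))
    ET.SubElement(chars, q("compositionLevel")).text = "0"
    # Fixity information: the digest lets a future reader confirm the bytes are unchanged.
    fixity = ET.SubElement(chars, q("fixity"))
    ET.SubElement(fixity, q("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, q("messageDigest")).text = hashlib.sha256(data).hexdigest()
    ET.SubElement(chars, q("size")).text = str(len(data))
    # Record the format so a later migration can be planned.
    fmt = ET.SubElement(chars, q("format"))
    designation = ET.SubElement(fmt, q("formatDesignation"))
    ET.SubElement(designation, q("formatName")).text = format_name

    return ET.tostring(obj, encoding="unicode")
```

Because the output is ordinary XML text, it can be read back with nothing more than a text editor or any XML parser, which is the point of choosing it as an archival format.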
There's no intermediate format like XML for non-text data, Le says. The best an archiving organisation can do is record what format the material is in and plan to migrate it eventually to whatever application format is dominant in the future. Crucially, that migration must happen while systems that can convert from the original format are still available. In other words, organisations must make their best guess about which formats will be used in the future and convert to them while they still can.
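The migration strategy Le outlines comes down to a recurring check: for each holding, is it already in the format the archive expects to dominate, and if not, does a working converter to that format still exist? The sketch below assumes a hypothetical inventory (`holdings`), a hypothetical set of still-available converters, and a single "preferred" target format; the format names in the example are illustrative only.

```python
def plan_migrations(holdings, converters, preferred):
    """Split holdings into items to migrate now and items already stranded.

    holdings:   dict mapping item id -> current format name
    converters: set of (source_format, target_format) pairs that still work
    preferred:  the format the archive has guessed will dominate in future
    """
    migrate_now, stranded = [], []
    for item, fmt in holdings.items():
        if fmt == preferred:
            continue  # already in the target format
        if (fmt, preferred) in converters:
            migrate_now.append(item)  # convert while the tools still exist
        else:
            stranded.append(item)  # no converter left: the risk Le warns about
    return migrate_now, stranded
```

Running such a check periodically surfaces items whose conversion path is about to disappear, rather than discovering them after the last converter has gone.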
An archivist must also be able to certify that the material being saved is an authentic copy, he explains. That's done by computing a hash key (a cryptographic checksum) for each file; the hash key travels with the file, so any later copy can be hashed again and compared against it. When copies are supplied, the archivist must also certify that none of the file's characteristics have been changed in a way that would alter the meaning of the material. For that reason, text must sometimes be preserved in its original format, because the formatting is deemed essential to its meaning, Le adds.
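A minimal sketch of the hash-key approach, assuming SHA-256 as the algorithm and a "sidecar" file stored next to each archived file as the way the hash travels with it (the article does not say which algorithm or storage scheme NARA uses). The sidecar follows the `digest  filename` layout that the common `sha256sum` tool produces; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large archival files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_sidecar(path):
    """Store the file's hash next to it as '<file>.sha256'."""
    sidecar = Path(str(path) + ".sha256")
    sidecar.write_text(f"{hash_file(path)}  {Path(path).name}\n")
    return sidecar


def verify(path):
    """Recompute the hash and compare it with the stored value."""
    sidecar = Path(str(path) + ".sha256")
    expected = sidecar.read_text().split()[0]
    return hash_file(path) == expected
```

A matching hash shows the bytes are identical to what was archived; it cannot by itself show that a converted copy preserves meaning, which is why the formatting question Le raises still needs human judgement.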
Other government agencies, state archives and libraries, and sometimes even private individuals are also facing the problem of digital preservation. For them, the Library of Congress, at the direction of Congress, has set up the National Digital Information Infrastructure and Preservation Program (NDIIPP), says LeFurgy.
NDIIPP officials are working with about 170 stakeholders, including trade organisations and foreign governments, and they publish an inventory of tools and services at DigitalPreservation.gov.