Understanding Standards for Digital Newspapers

Digital preservation standards and practices grow annually in number and complexity. As a result, it is hard to know where to start, or how to define workflows that will last a reasonable length of time.

An institution that seeks to preserve its digital newspapers may turn to several authoritative sources to get its bearings. It might consult the Library of Congress, or the newspaper sections and working groups of library and archival organizations such as the Center for Research Libraries Global Resources Network (CRL GRN), the American Library Association (ALA), the Society of American Archivists (SAA), or the International Federation of Library Associations (IFLA). It might also turn to professional listservs such as newslib, digi-pres, code4lib, or digital-curation.

Along the way, the institution will almost certainly become familiar with two standards: the Reference Model for an Open Archival Information System (OAIS) and ISO 16363, Audit and Certification of Trustworthy Digital Repositories. Both have been instrumental in establishing the general concepts and terminology needed to implement a digital archive. They also outline the organizational and technical practices that auditors and stakeholders should be able to evaluate. These standards aim less to prescribe particular implementations than to set out the full range of requirements for carrying out preservation responsibly.

The institution will likely also encounter the National Digital Newspaper Program (NDNP) Technical Guidelines. First released in 2007 and updated for each phase of NDNP, these specifications address scanning resolution and establish standard, high-quality file formats for digitization (e.g., TIFF 6.0). They also set requirements for uniform metadata (e.g., CONSER-derived bibliographic records), for structural and text encoding (METS and ALTO), and for derivative file formats (e.g., JPEG 2000 and PDF with hidden text). Each of these requirements reflects current, widely accepted standards for image-based, archival-quality digitization and prepares the collections for long-term preservation.
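
As a minimal illustration of how an institution might spot-check that files in a batch are in the formats named above, the sketch below identifies TIFF, JP2 (JPEG 2000), and PDF files by their leading signature bytes. It is a hypothetical helper, not part of the NDNP guidelines: the names `SIGNATURES`, `identify_format`, and `identify_file` are invented for this example, and a signature check only recognizes a file type. It does not validate conformance to TIFF 6.0 or any other profile, a job better suited to a dedicated validation tool such as JHOVE.

```python
from pathlib import Path
from typing import Optional

# Hypothetical signature table for this sketch: (leading bytes, label).
SIGNATURES = [
    # TIFF comes in two byte orders: little-endian "II*\0" and big-endian "MM\0*".
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    # JP2 files begin with the 12-byte JPEG 2000 signature box.
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JP2"),
    # PDF files begin with the "%PDF-" header.
    (b"%PDF-", "PDF"),
]


def identify_format(data: bytes) -> Optional[str]:
    """Return a format label if the bytes start with a known signature, else None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def identify_file(path: Path) -> Optional[str]:
    """Read only the first 16 bytes of a file and identify its format."""
    with path.open("rb") as fh:
        return identify_format(fh.read(16))
```

Reading only the first few bytes keeps the check cheap enough to run across a large legacy collection before investing in full validation.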

The institution will also grapple with recommendations regarding preservation metadata standards and schemas. In particular, two standards, the Metadata Encoding and Transmission Standard (METS) and Preservation Metadata: Implementation Strategies (PREMIS), are designed to capture the widest possible range of preservation-oriented information about digital objects and collections. Their goal is to help institutions manage these objects more effectively across their lifecycle.
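
To make "preservation-oriented information" concrete, the sketch below builds a minimal PREMIS 3 file object recording an identifier, a SHA-256 fixity value, the file size, and a format name. It is an illustrative sketch under stated assumptions, not a complete or schema-validated PREMIS record: the function `premis_file_object` and the identifier type `"local"` are hypothetical choices for this example. In practice, a record like this might be embedded in the administrative metadata section (`amdSec`) of a METS document.

```python
import hashlib
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_file_object(identifier: str, data: bytes, format_name: str) -> ET.Element:
    """Build a minimal PREMIS file object (hypothetical helper for this sketch)."""
    obj = ET.Element(q("object"), {f"{{{XSI_NS}}}type": "premis:file"})

    # Identity: "local" is an assumed identifier type, not a required value.
    ident = ET.SubElement(obj, q("objectIdentifier"))
    ET.SubElement(ident, q("objectIdentifierType")).text = "local"
    ET.SubElement(ident, q("objectIdentifierValue")).text = identifier

    chars = ET.SubElement(obj, q("objectCharacteristics"))
    # Composition level 0: the bytes are not compressed or packaged further.
    ET.SubElement(chars, q("compositionLevel")).text = "0"

    # Fixity: a SHA-256 digest that later audits can recompute and compare.
    fixity = ET.SubElement(chars, q("fixity"))
    ET.SubElement(fixity, q("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, q("messageDigest")).text = hashlib.sha256(data).hexdigest()

    ET.SubElement(chars, q("size")).text = str(len(data))

    # Format: a free-text name supplied by the caller (e.g., "TIFF").
    fmt = ET.SubElement(chars, q("format"))
    designation = ET.SubElement(fmt, q("formatDesignation"))
    ET.SubElement(designation, q("formatName")).text = format_name
    return obj
```

For example, `ET.tostring(premis_file_object("page-0001.tif", data, "TIFF"), encoding="unicode")` serializes the record as XML. Recording fixity at ingest is one of the simplest preservation metadata practices to adopt first.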

Each of these standards documents a comprehensive strategy for accomplishing some part of the complex task of preserving digital content. However, such comprehensive standards can seem formidable, even to experienced preservationists. After reading the standards literature, an institution might worry that it would need to completely rethink or reverse its practices before it could begin preserving its content.

If an institution can engage in an incremental process that allows it to begin preserving content now, while slowly and steadily building toward an optimal level of preservation readiness, it will be more likely to begin participating in preservation activities. Once institutions begin preserving content, they also will begin building the requisite expertise and knowledge in this area to prepare new collections and normalize legacy collections according to optimal standards.