The previous sections of these Guidelines have focused on preparing digital news collections and content for packaging. Whether an institution plans to store digital news content locally or to exchange its digital newspapers with an external preservation service provider, properly packaged content gives curators the information, controls, and linkages they need to manage digital newspapers over time, including through changes in storage media, while maintaining the integrity of both the objects and the collections.
Packaging can be accomplished in multiple ways, including:
- Using simple (text-based) documentation strategies;
- Deploying globally unique identifiers (GUIDs) coupled with Name Assigning Authorities and other local metadata and management tools; and/or
- Using lossless packaging formats such as TAR or WARC, or self-describing preservation specifications such as BagIt (among other approaches).
Using lossless packaging formats helps to hold content together as it is moved to archival storage, requested for routine audit purposes, and/or sent to and from preservation partners over time. Each of these approaches is covered in the sub-sections ahead.
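As a rough illustration of packaging with a lossless container, the Python sketch below uses the standard-library tarfile module to write a folder of files into an uncompressed TAR and then read one file back unchanged. The folder and file names (issue_1901-01-01, page_0001.xml) and the helper function names are hypothetical placeholders, not a recommended naming convention.

```python
import tarfile
import tempfile
from pathlib import Path


def package_as_tar(source_dir: str, tar_path: str) -> list[str]:
    """Write source_dir into an uncompressed TAR and return its member names.

    TAR stores each file byte-for-byte, so nothing is lost in packaging.
    The folder's own name is kept as the top-level entry in the archive.
    """
    src = Path(source_dir)
    with tarfile.open(tar_path, mode="w") as tar:
        tar.add(src, arcname=src.name)  # adds the folder and, recursively, its contents
    with tarfile.open(tar_path, mode="r") as tar:
        return sorted(tar.getnames())


def read_member(tar_path: str, member: str) -> bytes:
    """Read one file back out of the TAR without unpacking the whole archive."""
    with tarfile.open(tar_path, mode="r") as tar:
        handle = tar.extractfile(member)
        if handle is None:  # directories and links have no file content
            raise ValueError(f"{member} is not a regular file")
        return handle.read()


def round_trip_demo() -> tuple[list[str], bytes]:
    """Package a hypothetical one-page issue folder and read the page back."""
    with tempfile.TemporaryDirectory() as tmp:
        issue = Path(tmp) / "issue_1901-01-01"
        issue.mkdir()
        (issue / "page_0001.xml").write_bytes(b"<page/>")
        tar_path = str(Path(tmp) / "issue.tar")
        names = package_as_tar(str(issue), tar_path)
        return names, read_member(tar_path, "issue_1901-01-01/page_0001.xml")


if __name__ == "__main__":
    print(round_trip_demo())
```

Uncompressed mode ("w") is used here for simplicity; gzip-compressed TARs are also lossless but add a layer that must be decompressed before any file can be inspected.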
The Reference Model for an Open Archival Information System (OAIS) uses the term Preservation Description Information (PDI) for the range of information elements that support long-term preservation and access. The Reference Model also describes three information package models: Submission Information Packages (SIPs), Archival Information Packages (AIPs), and Dissemination Information Packages (DIPs). These models build upon one another, and for the purposes of these Guidelines they are discussed as highly interrelated. See the Definitions on the next page for more information about each of these package concepts and models.
The initial packaging of digital newspapers for long-term preservation (i.e., forming a SIP) should account for (1) the information necessary to ultimately produce an AIP, and (2) all information essential to restoring the institution’s digital news content as DIPs in the event of later loss or corruption, which is part of what PDI is meant to capture.
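One way to keep both requirements in view is to record, for each SIP, the content files together with the categories of PDI that OAIS names: reference, provenance, context, fixity, and access rights information. The Python sketch below is a hypothetical record structure; the class names, field names, and sample values are illustrative assumptions, not part of OAIS or of any repository system.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PreservationDescription:
    """PDI categories named in OAIS; field contents here are illustrative only."""
    reference: str    # identifier(s) for the content, e.g. a GUID
    provenance: str   # origin and custody history (who digitized it, when)
    context: str      # relationships to other content (title, issue, run)
    fixity: dict[str, str] = field(default_factory=dict)  # file path -> checksum
    access_rights: str = ""  # terms under which content may be preserved/accessed


@dataclass
class SubmissionPackage:
    """A hypothetical SIP record: the content files plus their PDI."""
    content_files: list[str]
    pdi: PreservationDescription

    def to_json(self) -> str:
        # asdict() also converts the nested dataclass, so the whole record
        # can be saved next to the files as a plain-text sidecar.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    sip = SubmissionPackage(
        content_files=["data/page_0001.tif"],
        pdi=PreservationDescription(
            reference="example-id-0001",                      # hypothetical
            provenance="Digitized in-house (hypothetical)",
            context="Example Gazette, 1901-01-01 issue (hypothetical)",
            fixity={"data/page_0001.tif": "<sha256 checksum>"},
        ),
    )
    print(sip.to_json())
```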
The work involved in packaging content for preservation can vary widely. At the high end, institutions may closely follow packaging standards such as those set out in OAIS, assign globally unique identifiers managed with metadata or management tools, and package content in lossless packaging formats to ensure long-term stability. These OAIS concepts and information models can be a “high-bar” standard to achieve, but an institution that has the resources to create such packages for its digital newspaper collections will be well served in the long term.
Curators at smaller and/or less-resourced institutions can package digital news content effectively using practices that do not require a comprehensive understanding of all the relevant standards. An institution that has followed the lightweight practices recommended in previous sections (including inventorying digital news content, packaging metadata, and organizing content) will be ready to create “preservation-ready” packages sufficient for long-term preservation. More on this under “Essential Readiness” ahead.
For well-resourced institutions, packaging digital newspapers for local preservation may include designating and applying a globally unique identifier (GUID) scheme and using a Name Assigning Authority for digital newspaper content. GUIDs (which can differ from persistent URLs and do not always rely directly on filenames) are algorithmically assigned identifiers that are unique to the items for which they are created, in this case digital newspaper objects. Once stored and indexed in relation to those objects, GUIDs help curators locate, access, and manage digital newspaper files in archival storage environments. GUIDs can also be recorded in metadata schemas such as METS and/or PREMIS (see Section 3: “Metadata Packaging for Digital Preservation”) and, combined with additional scripting, used to automate many preservation management functions. How much of this is possible depends on the underlying repository software and its configuration, and on how that system is designed to integrate GUIDs. Addressing any particular system or approach is beyond the scope of this document; curators should consult their repository system documentation or their repository architects to determine what GUID support is available. Finally, when packaging digital newspapers for exchange with an external preservation service provider, an institution should export and include any existing GUIDs. If these identifiers have been central to a repository infrastructure, they will be foundational if and when the institution needs to rebuild and restore its collections from its preservation copies.
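As a minimal sketch of algorithmic assignment, the Python example below mints version 4 UUIDs (one common kind of GUID, written in the standard urn:uuid: form) for a list of object paths, keeps any identifiers that were already assigned, and exports the resulting map as CSV so it can travel with an exchange package. It does not implement a Name Assigning Authority scheme such as ARK, and the function names and two-column CSV layout are hypothetical.

```python
import csv
import io
import uuid
from typing import Optional


def assign_guids(paths: list[str], existing: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Map each object path to a GUID, keeping any GUID already assigned.

    uuid4() is built from random bits, so two independently generated
    values are vanishingly unlikely to collide. Reusing `existing` keeps
    identifiers stable when the script is run again on the same content.
    """
    guids = dict(existing or {})
    for path in paths:
        if path not in guids:
            guids[path] = f"urn:uuid:{uuid.uuid4()}"
    return guids


def export_guid_map(guids: dict[str, str]) -> str:
    """Serialize the path-to-GUID map as CSV for inclusion in an exchange package."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["guid", "path"])
    for path, guid in sorted(guids.items()):
        writer.writerow([guid, path])
    return buf.getvalue()


if __name__ == "__main__":
    first = assign_guids(["data/page_0001.tif"])
    # A later run adds a page but leaves the earlier GUID untouched.
    second = assign_guids(["data/page_0001.tif", "data/page_0002.tif"], existing=first)
    print(export_guid_map(second))
```

Keeping previously assigned GUIDs, rather than regenerating them, matters for the restoration case described above: the exported map is only useful if the identifiers it records are the same ones the repository relied on.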
Beyond GUIDs, preservation curators (both local and external) will need ways to validate the integrity (completeness and correctness) of the objects that they receive, monitor, and manage. To that end, an authoritative record or list of all the files and their checksums should be included in the archival package (see Section 1: “Inventorying Digital Newspapers” and Section 4: “Checksum Management for Digital Newspapers”). BagIt, which creates a per-file inventory (recording each file’s full name, including its extension) along with each file’s checksum, is one lightweight approach.
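The Python sketch below shows the core of a BagIt-style inventory: it walks the data/ folder of a bag, computes a SHA-256 checksum for each file, and writes manifest-sha256.txt and bagit.txt following the layout described in the BagIt specification (RFC 8493). It works under simplifying assumptions (for example, it does not percent-encode file names containing line breaks and writes no optional tag files), and the function names are hypothetical; established tools such as the Library of Congress’s bagit-python library handle these details.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large page images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_bag_manifest(bag_dir: str) -> Path:
    """Write bagit.txt and manifest-sha256.txt for the files under bag_dir/data.

    Each manifest line is "<checksum>  <path>", with the path relative to
    bag_dir and written with forward slashes (e.g. data/page_0001.tif).
    """
    base = Path(bag_dir)
    lines = [
        f"{sha256_of(f)}  {f.relative_to(base).as_posix()}"
        for f in sorted((base / "data").rglob("*"))
        if f.is_file()
    ]
    (base / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    manifest = base / "manifest-sha256.txt"
    manifest.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return manifest


def manifest_demo() -> str:
    """Build a one-file bag in a temporary folder and return its manifest text."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "data").mkdir()
        (Path(tmp) / "data" / "page_0001.txt").write_bytes(b"hello")
        return write_bag_manifest(tmp).read_text(encoding="utf-8")


if __name__ == "__main__":
    print(manifest_demo())
```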
Finally, placing digital newspaper objects or collection units into lossless archival packaging formats helps to maintain their integrity over extended periods of active preservation management and storage media changes. This can provide multiple stakeholders with manageable units of data that can be traced and validated both across and between institutions.
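Validation on receipt, or during a routine audit, then comes down to comparing a package’s recorded checksums with the files actually present. The Python sketch below reads a manifest-sha256.txt of the kind BagIt produces and reports files that are missing, files that are present but unlisted, and files whose checksums no longer match. The function names and report format are hypothetical, and a full BagIt validator checks more (for example, bagit.txt and any tag manifests).

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large page images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_manifest(bag_dir: str) -> dict[str, list[str]]:
    """Check completeness and correctness of bag_dir/data against its manifest.

    Returns sorted lists of paths that are listed but missing, present but
    unlisted, and present but with a checksum that no longer matches.
    """
    base = Path(bag_dir)
    recorded: dict[str, str] = {}
    manifest = (base / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if line.strip():
            checksum, rel = line.split(maxsplit=1)  # "<checksum> <path>"
            recorded[rel] = checksum.lower()
    present = {
        f.relative_to(base).as_posix()
        for f in (base / "data").rglob("*")
        if f.is_file()
    }
    return {
        "missing": sorted(set(recorded) - present),
        "unlisted": sorted(present - set(recorded)),
        "altered": sorted(
            rel for rel in set(recorded) & present
            if sha256_of(base / rel) != recorded[rel]
        ),
    }


def audit_demo() -> dict[str, list[str]]:
    """Simulate loss, corruption, and a stray file in a small temporary bag."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        (data / "p1.txt").write_bytes(b"hello")
        (data / "p2.txt").write_bytes(b"world")
        lines = [f"{sha256_of(data / n)}  data/{n}\n" for n in ("p1.txt", "p2.txt")]
        (Path(tmp) / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
        (data / "p1.txt").unlink()               # a file goes missing
        (data / "p2.txt").write_bytes(b"w0rld")  # a file is silently altered
        (data / "p3.txt").write_bytes(b"extra")  # an unlisted file appears
        return validate_manifest(tmp)


if __name__ == "__main__":
    print(audit_demo())
```

Because the manifest travels inside the package, the same check can be run by the sending institution before transfer and by a preservation partner after receipt, and the two results compared.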