Institutions with more resources can take guidance from OAIS concepts such as Preservation Description Information (PDI) as they form Submission Information Packages (SIPs) and Archival Information Packages (AIPs). These packages should include information such as:
- Reference Information that can assist the retrieval of the collection from within an archival storage environment, be that local or an external preservation service provider (globally unique identifiers can be helpful here);
- Context Information that can help to explain the relationship between a digital newspaper collection and its environment and other digital newspaper collections (METS can be helpful for defining these relationships);
- Provenance Information that can help archival managers and users understand the chain of stewardship for the collection and what sort of preservation actions have taken place over time (METS and PREMIS are increasingly being used for recording such information);
- Fixity Information that can assist with auditing a collection and its digital objects over the course of its archival management (BagIt is one simple approach to creating per-file fixity information and storing it alongside a collection); and
- Access Rights that can help curators or external preservation service providers understand what levels of access and handling are permitted for the collection as it is managed over time (generic descriptive metadata can make this evident and can also be used within METS).
As mentioned in Section 3: Metadata Packaging for Digital Newspapers, institutions with greater resources should consider implementing METS and/or PREMIS for their digital newspaper content because these standards explicitly retain linkages between associated digital newspaper objects and their metadata. METS and PREMIS can also accommodate and leverage globally unique identifiers (GUIDs). Both are expressed as XML schemas, which helps with automating various archival management and access processes for any digital objects that they describe and encapsulate. Using METS and/or PREMIS in specification-conformant ways is an ideal approach to packaging digital newspaper collections.
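As an illustration of how METS retains linkages between objects and their structure, the following minimal sketch builds a METS file section and a structural map for the pages of one newspaper issue. The function name `build_mets`, the `USE` and `TYPE` values, the file paths, and the placeholder identifier are assumptions made for this example. A production METS document would normally also carry a header plus descriptive and administrative metadata sections (where PREMIS is typically embedded), and should be validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(object_id, files):
    """Return a METS root element that links each file to a page division.

    object_id: identifier stored in the OBJID attribute (for example, a GUID).
    files: list of (file_id, href, mimetype, label) tuples (hypothetical layout).
    """
    root = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})

    # fileSec inventories the files; structMap arranges them into issue and pages.
    file_sec = ET.SubElement(root, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "master"})
    struct_map = ET.SubElement(root, f"{{{METS_NS}}}structMap", {"TYPE": "physical"})
    issue = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "issue"})

    for file_id, href, mimetype, label in files:
        file_el = ET.SubElement(
            file_grp, f"{{{METS_NS}}}file", {"ID": file_id, "MIMETYPE": mimetype}
        )
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
        )
        # The fptr element is the explicit link from a page back to its file.
        page = ET.SubElement(
            issue, f"{{{METS_NS}}}div", {"TYPE": "page", "LABEL": label}
        )
        ET.SubElement(page, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return root


if __name__ == "__main__":
    mets = build_mets(
        "ark:/99999/example",  # placeholder identifier, not a registered ARK
        [("FILE0001", "data/issue/page-0001.tif", "image/tiff", "Page 1")],
    )
    print(ET.tostring(mets, encoding="unicode"))
```

Because every `fptr` refers to a `file` by its `ID`, software can walk from a page in the structural map to the file that represents it without relying on folder or file naming conventions.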
Globally unique identifiers (GUIDs) should be a priority for institutions using METS and/or PREMIS. GUIDs require a Name Assigning Authority (NAA) that can register the institution and mint a unique identifier. As mentioned above, one of the increasingly popular NAAs is the one maintained by the California Digital Library (CDL) and mirrored at the National Library of Medicine and the Bibliothèque Nationale de France. Each institution that registers is given a Name Assigning Authority Number (NAAN) that it can use as a prefix for the identifier. Archival Resource Keys (ARKs) or Handles can be used to produce GUIDs for digital newspaper collections or individual digital newspaper objects. NOID, or Nice Opaque Identifier, is a micro-service application that can assist with minting unique ARKs or Handles that can then be coupled with the institution’s NAAN.
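To make the relationship between a NAAN and a minted name concrete, the sketch below composes ARK-style identifiers from the two parts. It is not NOID and does not use NOID's interface: the function `mint_ark`, the name length, and the NAAN value `99999` are assumptions chosen for illustration, and a real minter would also persist its record of issued names and might add a check character.

```python
import secrets

# Digits and consonants only; omitting vowels (plus "l" and "y") keeps
# minted names opaque and avoids accidentally spelling words.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def mint_ark(naan, issued, length=8):
    """Mint an ARK-style string that is not already in `issued`, and record it.

    naan: the institution's Name Assigning Authority Number, as a string.
    issued: a set of identifiers minted so far (in memory only in this sketch).
    """
    while True:
        name = "".join(secrets.choice(ALPHABET) for _ in range(length))
        ark = f"ark:/{naan}/{name}"
        if ark not in issued:
            issued.add(ark)
            return ark


if __name__ == "__main__":
    issued = set()
    # "99999" is a placeholder NAAN; substitute the institution's registered NAAN.
    print(mint_ark("99999", issued))
```

The NAAN prefix is what makes a locally minted name globally unique: two institutions can mint the same opaque name without producing the same identifier.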
Once an institution has assigned identifiers and associated metadata with its newspapers, the institution can encapsulate news data by applying BagIt and placing the collections in an archival format such as TAR or WARC. BagIt produces a full inventory of included files, along with per-file checksums, that assists in validating content during exchanges and ongoing audits. The best packaging model may differ between born-digital and digitized content types. For example, TAR works very well for packaging well-organized, hierarchical folders of digitized newspaper collections. WARC, on the other hand, is geared primarily toward packaging website-oriented collections.
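The packaging steps described here can be sketched with the Python standard library alone. The example below lays out a minimal BagIt-style bag (a `bagit.txt` declaration, a `data/` payload directory, and a SHA-256 manifest), re-checks the payload against the manifest, and wraps the bag in an uncompressed TAR file. The function names and directory names are assumptions for illustration. The sketch omits optional BagIt features such as `bag-info.txt` and tag manifests, so institutions may prefer an established BagIt tool for production use.

```python
import hashlib
import shutil
import tarfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write bagit.txt plus a manifest."""
    bag_dir = Path(bag_dir)
    shutil.copytree(source_dir, bag_dir / "data")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    payload = sorted(p for p in (bag_dir / "data").rglob("*") if p.is_file())
    # One manifest line per payload file: checksum, then path relative to the bag.
    lines = [
        f"{sha256_of(p)}  {p.relative_to(bag_dir).as_posix()}\n" for p in payload
    ]
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def verify_bag(bag_dir):
    """Return the payload paths whose checksum no longer matches the manifest."""
    bag_dir = Path(bag_dir)
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    failures = []
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        if sha256_of(bag_dir / rel_path) != expected:
            failures.append(rel_path)
    return failures


def tar_bag(bag_dir, tar_path):
    """Write the whole bag directory into an uncompressed TAR file."""
    bag_dir = Path(bag_dir)
    with tarfile.open(tar_path, "w") as archive:
        archive.add(bag_dir, arcname=bag_dir.name)
    return tar_path
```

A typical sequence would be `make_bag("issue-folder", "issue-bag")`, then `tar_bag("issue-bag", "issue-bag.tar")` for transfer or storage, with `verify_bag` rerun at each exchange and during periodic audits. An empty list from `verify_bag` means every payload file still matches its recorded checksum. The sketch raises an error for a file that is listed in the manifest but missing from the bag, and it does not detect files added to `data/` after bagging.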