Readiness Spectrum

Approaches to organizing digital newspaper content vary across a wide spectrum of practice, yet all share a basic goal: giving future curators the information they need to understand the structure of each digital newspaper collection. That understanding, in turn, helps curators preserve and render content reliably over time.

At the lower end of the preservation readiness spectrum, an institution may focus upon four core tasks:

  1. Identifying problems in file names that could compromise those files in the future;
  2. Using basic systems tools to perform batch renaming of these files (starting in a test-bed environment!);
  3. Documenting institutional conventions in a text-based document that can help a future curator understand the collection logic; and
  4. Updating the digital news inventory to reflect all changes.
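
The first of these tasks can be sketched in a few lines of Python. This is a minimal illustration only: the "safe" character set, the 255-character limit, and the function names `name_problems` and `scan` are assumptions made for this example, not rules drawn from any standard, and each institution should substitute its own criteria.

```python
import re
from pathlib import Path

# Assumed "safe" character set: ASCII letters, digits, dot, underscore, hyphen.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")
MAX_NAME_LENGTH = 255  # assumed per-name limit; check your own file systems


def name_problems(name: str) -> list[str]:
    """Return a list of human-readable problems found in one file name."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them for the character check.
    if not SAFE_NAME.fullmatch(name.replace(" ", "")):
        problems.append("contains characters outside A-Z, a-z, 0-9, '.', '_', '-'")
    if name.startswith((".", "-")):
        problems.append("starts with '.' or '-'")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than 255 characters")
    return problems


def scan(root: str) -> dict[str, list[str]]:
    """Walk a directory tree and map each problematic file path to its problems."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            found = name_problems(path.name)
            if found:
                report[str(path)] = found
    return report
```

Calling `scan()` on a test-bed copy of a collection returns only the files that need attention, which can then be reviewed by hand before any renaming takes place.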

At the higher end of the preservation readiness spectrum, institutions may also streamline file-naming and folder usage practices into one or more well-documented and unified convention(s). Institutions may then use Unix or Windows tools to remediate content according to their chosen convention(s). After completing this remediation work, institutions should always update their inventories to reflect these changes.
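
As one possible alternative to Unix or Windows shell tools, the remediation step might be scripted as follows. This is a sketch under stated assumptions: the convention applied by `normalize` (lowercase names, runs of unsafe characters replaced by a single underscore) is hypothetical, as are the function names and the CSV log layout. The script defaults to a dry run and records every old and new path so that the inventory can be updated afterwards.

```python
import csv
import re
from pathlib import Path


def normalize(name: str) -> str:
    """Apply a hypothetical convention: lowercase, unsafe runs become '_'."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("_").lower()
    return f"{stem}.{ext.lower()}" if ext else stem


def plan_renames(root: str) -> list[tuple[Path, Path]]:
    """List (old, new) path pairs without touching any file."""
    plan = []
    for old in sorted(Path(root).rglob("*")):
        if old.is_file():
            new = old.with_name(normalize(old.name))
            if new != old:
                plan.append((old, new))
    return plan


def apply_renames(plan, log_path: str, dry_run: bool = True) -> None:
    """Write the mapping to a CSV log; rename only when dry_run is False."""
    with open(log_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["old_path", "new_path"])
        for old, new in plan:
            if not dry_run:
                # Refuse to overwrite. Note: on case-insensitive file systems
                # a case-only rename will also trip this check.
                if new.exists():
                    raise FileExistsError(f"refusing to overwrite {new}")
                old.rename(new)
            writer.writerow([str(old), str(new)])
```

Running `apply_renames(plan_renames("testbed"), "renames.csv")` against a copy of the collection (the folder and log names here are placeholders) produces the log without changing anything; passing `dry_run=False` performs the renames once the log has been reviewed.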

Organizing digital news content ultimately makes collections intelligible and recoverable in both the short and long term. With that in mind, the goal of this activity is to refine and communicate collection structures, file identifications, and relationships so that curators and preservation partners can care for these collections. Both machine-based and human-based approaches can be taken across the readiness spectrum to achieve these goals.

Case Study: File Naming Conventions

Below are some real-world examples of file name conventions that do a good job of encoding title, issue, date, and other unique identifiers. They include examples from both digitized and born-digital newspaper collections. They are examples only, not standards.

Digitized Newspaper Examples (pdf, tif & jp2)
051-AAR-1873-09-24-001-SINGLE.pdf (title code/date)
DCC_19601125-19600101_DLH_217.tif (title code/dates)
bcheights_20040406_0001.jp2 (title code/date)

Born-Digital Newspaper Examples (eprint & web)
an970607.pdf (title code/date)
morning.725.5977.html (Morning Ed/7:25am/May 9, 1977)
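
A well-formed convention can be read back by a machine as well as by a person. The sketch below parses names shaped like the bcheights example above. The pattern (a lowercase title code, an eight-digit YYYYMMDD date, and a four-digit page number) is inferred from that single example and is an assumption, as is the function name `parse_page_file`; the other conventions listed here would each need their own pattern.

```python
import re
from datetime import date

# Assumed pattern: <title code>_<YYYYMMDD>_<4-digit page>.<extension>
PAGE_FILE = re.compile(
    r"(?P<title>[a-z]+)_(?P<ymd>\d{8})_(?P<page>\d{4})\.(?P<ext>[a-z0-9]+)"
)


def parse_page_file(name: str) -> dict:
    """Split a page-level file name into title code, issue date, page, extension."""
    match = PAGE_FILE.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the assumed convention")
    ymd = match["ymd"]
    # date() raises ValueError for impossible dates such as month 13.
    issue_date = date(int(ymd[:4]), int(ymd[4:6]), int(ymd[6:]))
    return {
        "title": match["title"],
        "date": issue_date,
        "page": int(match["page"]),
        "extension": match["ext"],
    }


info = parse_page_file("bcheights_20040406_0001.jp2")
print(info["title"], info["date"].isoformat(), info["page"])
# prints: bcheights 2004-04-06 1
```

Names that do not match, such as an970607.pdf, raise a ValueError rather than being guessed at, which keeps files from other conventions out of the results.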

Born-digital eprints and web files often use the same filenames and extensions for both preservation and access copies. Make sure changes to preservation copies adhere to their current access copy filename conventions.

Case Study: Boston College

Below is an example of a digital newspaper collection organization scheme as employed by Boston College. It is just an orderly example and not a standard.

bcheights/ ………. (collection title folder)
  2004/ ………. (annual volume folder)
    04/ ………. (monthly volume folder)
      06/ ………. (daily issue folder)
        bcheights_20040406.pdf
        bcheights_20040406_0001.jp2
        bcheights_20040406_0001.xml
        bcheights_20040406_0002.jp2
        bcheights_20040406_0002.xml
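
Because each file name in this scheme carries the title code and issue date, the folder a file belongs in can be derived from the name alone. The following is a minimal sketch of that mapping, assuming names always begin with `<title code>_<YYYYMMDD>`; the function name `target_path` is hypothetical, and this is an illustration of the layout rather than Boston College's own tooling.

```python
from pathlib import PurePosixPath


def target_path(filename: str) -> PurePosixPath:
    """Derive title/year/month/day/filename from a name like the ones above."""
    parts = filename.split("_")
    if len(parts) < 2:
        raise ValueError(f"{filename!r} has no title code and date")
    title = parts[0]
    ymd = parts[1].split(".")[0]  # drop the extension on issue-level files
    if len(ymd) != 8 or not ymd.isdigit():
        raise ValueError(f"{filename!r} lacks a YYYYMMDD date")
    return PurePosixPath(title, ymd[:4], ymd[4:6], ymd[6:8], filename)


print(target_path("bcheights_20040406.pdf"))
# prints: bcheights/2004/04/06/bcheights_20040406.pdf
print(target_path("bcheights_20040406_0002.xml"))
# prints: bcheights/2004/04/06/bcheights_20040406_0002.xml
```

A mapping like this can also be run in reverse as a consistency check: any file whose actual location differs from its derived location is a candidate for review.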