As described above, organizing collections for preservation should begin with an analysis of file-naming conventions.
File-naming conventions for digital newspapers should follow established good practices (documented below), including those specific to digital news content. Examining and adjusting filenames prior to preservation action is imperative because many repository systems (both preservation-oriented and access-oriented) may refuse to handle content that does not conform to standard practices. At best, in these cases the files will not render properly. At worst, poorly named files cannot be ingested into a repository and preserved at all.
General good practices for folder and file naming recommend:
• Avoiding the use of special characters in a file name, such as \ / : * ? " < > | [ ] & $ , ;
• Using underscores instead of periods or spaces;
• Avoiding lengthy filenames where possible;
• Including all necessary descriptive information independent of where it is stored;
• Including dates and formatting them consistently (YYYY_MM_DD or YYYYMMDD); and
• Including a version number on documents to more easily manage drafts and revisions.
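The general recommendations above can be checked mechanically. The following is a minimal Python sketch, not a definitive tool: the function names and the 64-character length limit are assumptions made for illustration (the guidance above does not define "lengthy"), and the character set simply mirrors the special characters listed above.

```python
import re

# Characters the guidance above recommends avoiding in file names.
SPECIAL_CHARS = r'\/:*?"<>|[]&$,;'
# Hypothetical limit: the guidance does not say how long is too long.
MAX_LENGTH = 64


def filename_problems(name: str) -> list[str]:
    """Return human-readable problems found in a single file name."""
    stem, dot, _ext = name.rpartition(".")
    if not dot:  # no extension separator at all
        stem = name
    problems = []
    if any(ch in SPECIAL_CHARS for ch in name):
        problems.append("contains special characters")
    if " " in name:
        problems.append("contains spaces")
    if "." in stem:
        problems.append("contains periods other than the extension separator")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


def suggest_name(name: str) -> str:
    """Suggest a cleaned name: spaces, periods, and special characters
    in the stem become underscores; the extension is left untouched."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        stem, ext = name, ""
    cleaned = re.sub(r"[\s.]|[" + re.escape(SPECIAL_CHARS) + "]", "_", stem)
    cleaned = re.sub(r"_+", "_", cleaned).strip("_")
    return cleaned + (dot + ext if dot else "")
```

A report built from `filename_problems` can be reviewed before any file is changed. `suggest_name` only proposes a replacement and does not rename anything on disk.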
Good practices* for applying consistent folder and file naming conventions to digital newspaper content more specifically include:
• Retaining any repository system-defined folder naming conventions if supplied—this can be helpful for restoring collections to those systems at a later date;
• Following a simple title, year, volume, issue, month, day schema for folder and sub-folder conventions;
• Clearly identifying the newspaper title in the file name;
• Including the year, month, and day of the issue publication in the file name;
• Including the page or article sequence number in the file name when appropriate;
• Including the corresponding newspaper section name where helpful; and
• Making sure that each file includes its accurate file extension (e.g., txt, pdf, tif, jp2).
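As an illustration of the newspaper-specific practices above, the sketch below assembles a file name and a folder path from an issue's descriptive elements. The element order (title, date, section, page), the four-digit page padding, and the function names are assumptions chosen for this example. The guidance does not prescribe a single pattern, and the folder helper omits the volume and issue levels for brevity.

```python
import re
from datetime import date
from typing import Optional


def slug(text: str) -> str:
    """Lowercase a label and reduce it to letters, digits, and underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def page_filename(title: str, issued: date, page: int, ext: str,
                  section: Optional[str] = None) -> str:
    """Build a name such as title_YYYYMMDD_section_p0001.ext."""
    parts = [slug(title), issued.strftime("%Y%m%d")]
    if section:
        parts.append(slug(section))
    parts.append(f"p{page:04d}")  # zero-padded so pages sort in sequence
    return "_".join(parts) + "." + ext.lower().lstrip(".")


def issue_folder(title: str, issued: date) -> str:
    """Build a title/year/month/day folder path for one issue."""
    return "/".join([slug(title), f"{issued:%Y}", f"{issued:%m}", f"{issued:%d}"])


# "The Daily Tribune" is a made-up title used only for illustration.
name = page_filename("The Daily Tribune", date(2012, 3, 4), 1, "tif", section="Sports")
# name == "the_daily_tribune_20120304_sports_p0001.tif"
folder = issue_folder("The Daily Tribune", date(2012, 3, 4))
# folder == "the_daily_tribune/2012/03/04"
```

Because the title and date are in the file name itself, each file stays identifiable even when it is separated from its folder.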
Depending upon the number of digital news files an institution is managing and how many of these are problematic, rectifying filename problems may be done “by hand” (on a small scale) or through the use of software tools. Such tools as those mentioned above allow for batch renaming of files, so that a recurring problem (e.g., a space or special character that needs to be replaced collection-wide) can be corrected across a large number of files in one pass. Be sure to thoroughly test tools and batch processing prior to implementation. Wherever possible, create a copy of each collection that needs attention and work with those copies to ensure that accidental damage is not done to the originals as these file-name problems are corrected.
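The copy-first, test-first workflow described above might look like the following Python sketch. It is an illustration under stated assumptions rather than a replacement for a tested batch-renaming tool: the function names are hypothetical, only spaces are replaced, only files (not folders) are renamed, and the script refuses to overwrite an existing file.

```python
import shutil
from pathlib import Path


def make_working_copy(original: Path, working: Path) -> Path:
    """Copy the collection so the originals are never modified.
    shutil.copytree raises an error if the working folder already exists."""
    shutil.copytree(original, working)
    return working


def plan_renames(root: Path) -> list[tuple[Path, Path]]:
    """Dry run: list (old, new) pairs for files whose names contain spaces."""
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and " " in path.name:
            plan.append((path, path.with_name(path.name.replace(" ", "_"))))
    return plan


def apply_renames(plan: list[tuple[Path, Path]]) -> list[tuple[Path, Path]]:
    """Carry out a reviewed plan, stopping rather than overwriting a file."""
    done = []
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"{new} already exists; refusing to overwrite")
        old.rename(new)
        done.append((old, new))
    return done
```

A typical sequence would be to call `make_working_copy`, print and review the output of `plan_renames` on the copy, and call `apply_renames` only once the plan looks correct.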
This renaming process, including the tools used, should be documented, and the documentation should be included with the collection upon packaging (see Section 6: “Packaging Digital Newspapers for Preservation”).
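One simple way to produce such documentation is a plain CSV log listing each original name, its new name, and the tool used. The sketch below assumes a hypothetical column layout and function name. Institutions may prefer a different format, and nothing here is prescribed by the packaging guidance.

```python
import csv
from datetime import datetime, timezone


def write_rename_log(renames, log_path, tool):
    """Write one CSV row per renamed file and return the number of rows.

    renames  -- iterable of (original_name, new_name) pairs
    log_path -- where to write the log (hypothetical location)
    tool     -- free-text name and version of the renaming tool used
    """
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    count = 0
    with open(log_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["original_name", "new_name", "tool", "logged_at_utc"])
        for old, new in renames:
            writer.writerow([str(old), str(new), tool, stamp])
            count += 1
    return count
```

A log in this form can travel with the collection as an ordinary text file and can be read without special software.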
After an institution addresses potential gaps in its file-naming conventions, it can begin analyzing and documenting its overall collection structures, including folder and sub-folder usage. For institutions with limited resources, this may simply mean creating a text-based document that explains the collection structures as they currently exist and what data elements, such as unique identifiers, are vital to preserving the file relationships.
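For institutions that want a starting point for such a text-based document, the following Python sketch generates an indented outline of a collection's folders with a count of files by extension in each. The function name and output layout are assumptions for illustration. The generated outline would still need a human-written explanation of identifiers and file relationships.

```python
from collections import Counter
from pathlib import Path


def describe_structure(root: Path) -> str:
    """Return an indented text outline of folders under root,
    with the number of files per extension in each folder."""
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    lines = []
    for folder in folders:
        depth = len(folder.relative_to(root).parts)  # 0 for the root itself
        counts = Counter(
            f.suffix.lower() or "(no extension)"
            for f in folder.iterdir() if f.is_file()
        )
        summary = ", ".join(f"{n} {ext}" for ext, n in sorted(counts.items()))
        lines.append(f"{'  ' * depth}{folder.name}/  [{summary or 'no files'}]")
    return "\n".join(lines)
```

The returned text can be saved alongside the collection, for example with `Path("structure.txt").write_text(describe_structure(root))`, where `structure.txt` is a placeholder file name.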
* The examples of file and folder naming conventions for digital newspapers provided here are derived from analyses of digital newspaper collections supplied by the Chronicles in Preservation (2011-2014) project partners.