Optimal Readiness

For institutions with more time, resources, and expertise, command-line programs such as md5sum or sha1sum offer more flexibility in how checksums are applied and more control over the output format (BagIt gives you a quick solution but has some mild dependencies on the BagIt utilities). As mentioned, there are also versatile tools such as md5deep/hashdeep that facilitate batch creation of checksums and provide a suite of features for comparing checksum digests.
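To make the batch-creation idea concrete, here is a minimal sketch in Python of what such a job might look like. It is not how md5sum, sha1sum, or hashdeep are implemented; it only uses Python's standard hashlib module to walk a directory and write one "digest  path" line per file, the same layout that md5sum and sha1sum print by default. The function names (file_digest, build_manifest, write_manifest) and the choice of SHA-1 as the default are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Return the hex digest of one file, reading it in chunks so that
    large page images do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, algorithm="sha1"):
    """Map every file under root (as a relative POSIX path) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path, algorithm)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def write_manifest(manifest, manifest_path):
    """Write 'digest  path' lines, sorted by path (hypothetical helper)."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for rel_path, digest in sorted(manifest.items()):
            out.write(f"{digest}  {rel_path}\n")
```

A call such as write_manifest(build_manifest("newspapers"), "manifest-sha1.txt") would then produce the manifest, where both names are placeholders. Writing the manifest outside the directory being hashed keeps it from being picked up as content on the next run.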

After checksums have been created, they must be properly managed over time. An institution should store its checksums in secure locations, develop logical schemas and approaches for associating checksums with the digital newspaper objects for which they were generated, and establish reasonable schedules and workflows for recreating checksums and comparing them against their previously generated counterparts. This might start out as a fairly manual process, but over time it should become an automated one, perhaps as part of a larger repository environment. Establishing regular audit schedules and enforcing them within the institution’s broader digital preservation policy can help ensure that the practice is carried out routinely. Audit schedules should be realistic and take into account the total amount of data that needs to have checksums generated and compared at any given interval (checksum creation and comparison operations, particularly when they involve large amounts of data, can be time- and CPU-intensive).
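The recreate-and-compare step of an audit could be automated along the following lines. This is only a sketch under stated assumptions: the stored manifest is a text file of "digest  path" lines (the default sha1sum layout, without the binary-mode "*" marker), the digests are SHA-1, and the function names (sha1_of, read_manifest, audit) are hypothetical. A scheduler such as cron could run it at whatever interval the audit policy sets.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1024 * 1024):
    """Recompute the SHA-1 digest of one file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(manifest_path):
    """Parse 'digest  path' lines into a {path: digest} dictionary."""
    stored = {}
    with open(manifest_path, encoding="utf-8") as lines:
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the first run of whitespace only, so that
            # file names containing spaces stay intact.
            digest, rel_path = line.split(None, 1)
            stored[rel_path] = digest
    return stored


def audit(root, manifest_path):
    """Compare fresh digests against stored ones and sort each path
    into 'ok', 'changed', or 'missing'."""
    root = Path(root)
    report = {"ok": [], "changed": [], "missing": []}
    for rel_path, expected in sorted(read_manifest(manifest_path).items()):
        target = root / rel_path
        if not target.is_file():
            report["missing"].append(rel_path)
        elif sha1_of(target) == expected:
            report["ok"].append(rel_path)
        else:
            report["changed"].append(rel_path)
    return report
```

Anything listed under "changed" or "missing" is a candidate for investigation and repair from another copy. Because every file is read in full on each run, the time and CPU cost grows with the size of the collection, which is why the audit interval needs to be chosen with the data volume in mind.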

2 thoughts on “Optimal Readiness”

    1. As you pointed out, ACE does check the integrity of collections; however, it generates new hashes for any collection it manages. A potential automated solution would be a lightweight cron script that validates a set of bags with a utility like bagit.py. We’ll research this topic more for our final publication.
