Distribution vs. Backup

It is crucial to distinguish between creating backups and engaging in preservation. Backup systems are for the most part non-intelligent and will merely produce a limited set of direct copies of assigned data, regardless of its sustainability or integrity. Many backup systems tend to overwrite older, healthy copies of data with more recent (and possibly corrupted) bits unless a curator is proactively monitoring the data. Backup systems may instill a false sense of protection and security (i.e., "let the machines handle it"). Most backup systems do not keep multiple copies of data in sync with one another in any audited sense. Backup systems are also often maintained in close proximity to master copies and by the same staff members, making it plausible that one catastrophic event or human act of malice or error could destroy or corrupt all copies.
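To make the contrast concrete, the following is a minimal sketch of the kind of audit a plain backup job typically omits: each copy is compared against a checksum recorded when the content was first ingested, and a source file is only propagated if it still matches. The function names (`sha256_of`, `audit_copies`, `safe_to_propagate`) and the choice of SHA-256 are illustrative assumptions, not features of any particular backup or preservation system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_copies(expected_digest: str, copies: list[Path]) -> dict[Path, bool]:
    """Map each copy to True if it exists and still matches the recorded digest."""
    return {
        copy: copy.exists() and sha256_of(copy) == expected_digest
        for copy in copies
    }


def safe_to_propagate(source: Path, expected_digest: str) -> bool:
    """Hypothetical guard: only let a copy overwrite others if it is intact.

    A naive backup would copy `source` unconditionally, which is how a
    corrupted file can replace older, healthy copies.
    """
    return source.exists() and sha256_of(source) == expected_digest
```

In this sketch, a curator would record the digest once at ingest and rerun `audit_copies` on a schedule; any copy that maps to `False` is a candidate for repair from a copy that still matches, rather than a source for the next backup run.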

Digital news curation, like other digital content curation, demands attention to content risks. Sound practice recommends providing for at least three copies of data that do not share a similar set of natural or man-made threats. Some distributed digital preservation (DDP) systems replicate across a limited geographic distance, for example one institution with replications at multiple sites within the same city or neighboring locale or region (e.g., the University of North Texas and its Coda Repository). Others support three replications of data across a large nation (e.g., Chronopolis), and still others provide for up to seven replications of data across continents (e.g., MetaArchive Cooperative). These are just a few examples, and ones that have explicitly benchmarked their systems for preserving digital newspaper data (see Chronicles in Preservation Project: Comparative Analysis of DDP Frameworks). It is worth noting that these three systems have also worked together to replicate each other's digital collections to demonstrate the importance of replicating content across multiple heterogeneous storage infrastructures.
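The guideline of at least three copies that do not share a similar set of threats can be checked mechanically once each storage location is described. The sketch below is one strict interpretation, offered under stated assumptions: the `Replica` type, the free-text threat labels, and the rule that no two copies may share any labeled threat are all hypothetical and are not drawn from Coda, Chronopolis, or MetaArchive.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Replica:
    """One stored copy, described by the threats it is exposed to (hypothetical model)."""
    site: str
    threats: frozenset[str]


def shared_threats(replicas: list[Replica]) -> list[tuple[str, str, frozenset[str]]]:
    """Return every pair of sites that shares at least one labeled threat."""
    conflicts = []
    for first, second in combinations(replicas, 2):
        overlap = first.threats & second.threats
        if overlap:
            conflicts.append((first.site, second.site, overlap))
    return conflicts


def plan_meets_guideline(replicas: list[Replica], minimum: int = 3) -> bool:
    """True if there are at least `minimum` copies and no pair shares a threat."""
    return len(replicas) >= minimum and not shared_threats(replicas)
```

For example, two copies held in the same building and managed by the same staff would carry the same labels and be reported by `shared_threats`, which mirrors the weakness of backups kept close to the master copy. Real risk assessments are rarely this binary, so the output is better read as a prompt for review than as a verdict.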