2. Preservation Readiness: Format Management for Digital Newspapers

For more than a decade, newspapers have been digitized by a variety of institutions (libraries, commercial vendors, etc.) according to a variety of image, document and text output specifications. During this same timeframe, institutions have been acquiring “born-digital” newspaper content, including both e-prints (often through FTP or hard drive exchanges from a publisher to a library) and web-based files (often “harvested” using web-capture tools like Heritrix or obtained via FTP exchanges). The resulting digital newspaper files come in a variety of flavors, including those typical for digitized newspapers (e.g., TIFF, PDF/A, JPEG 2000, XML) and for “born-digital” newspaper content (e.g., PDF; various image, audio and multimedia formats; HTML, XHTML, CSS, JavaScript).