A Decision Support System to Facilitate File Format Selection for Digital Preservation

Roman Graf, Heather M. Ryan, Tibaut Houzanme, Sergiu Gordea

DOI: http://dx.doi.org/10.15291/libellarium.v9i2.274


This paper presents a method to facilitate decision making for the preservation of digital content in libraries and archives. The method uses institutional risk profiles that highlight endangered file formats, that is, formats in danger of becoming inaccessible or unusable. The primary contribution of this work is the combined use of machine-mined data and human-expert input to select and configure institution-specific preservation risk profiles. The machine-mined data were gathered with the File Format Metadata Aggregator (FFMA) developed by the authors, and the crowdsourced expert input was collected via two surveys of digital preservation practitioners. A by-product of this endeavor is the ability to visualize risk factors for analysis. The underlying decision support system uses the cosine similarity algorithm to recommend risk profiles that match the selected institutional risk settings. This method improves the interpretability of risk factor values and the quality of the digital preservation process. The aggregated information about the risk factors is presented as a multidimensional vector that shows a particular analysis focus and its resulting impact on selected file formats. Sample risk profile calculations and the visualization of risk factor dimensions are shared in the evaluation section.
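As a rough illustration of the matching step described above, the following Python sketch ranks candidate risk profiles by their cosine similarity to an institution's risk settings. It is a minimal sketch under stated assumptions, not the system described in the paper: the risk factor names, the profile names, and all weights are hypothetical and chosen only to show the calculation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as matching nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_profiles(settings, profiles):
    """Sort (name, score) pairs from most to least similar to the settings."""
    scored = [(name, cosine_similarity(settings, vector))
              for name, vector in profiles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical risk factor dimensions (assumed for illustration only).
RISK_FACTORS = ["software_support", "documentation", "adoption", "complexity"]

# Hypothetical risk profiles: one weight per risk factor, in the order above.
profiles = {
    "access_focused": [0.9, 0.3, 0.8, 0.2],
    "long_term_archive": [0.4, 0.9, 0.5, 0.7],
    "minimal_effort": [0.2, 0.2, 0.9, 0.1],
}

# Hypothetical institutional risk settings on the same dimensions.
institution_settings = [0.5, 0.8, 0.4, 0.6]

ranking = rank_profiles(institution_settings, profiles)
best_profile = ranking[0][0]

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Cosine similarity compares the direction of two vectors rather than their magnitude, so a profile is recommended when it emphasizes the same risk factors in similar proportions as the institution does, regardless of the overall scale of the weights.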


Keywords: digital preservation, file format, institutional risk profiles, decision support system, information aggregation

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Libellarium (Online). ISSN 1846-9213 © 2008

