A Passion to Preserve: Text Corpus of AV Materials

UWM Libraries

This corpus was created as part of a project to develop workflows and best practices for using machine learning tools to extract text from archival AV materials, with a focus on the LGBTQ+ collections held by the UWM Archives. In addition to creating the corpus, the project developed a prototype dashboard that uses text analysis to demonstrate the corpus's teaching and research potential and to support new modes of discovery.

Creation of this corpus was funded by the Andrew W. Mellon Foundation as part of the second cohort for Collections as Data: Part to Whole.


The collection consists of the records of Will Fellows's A Passion to Preserve book project, which focused on gay men working as preservationists throughout the United States. The text corpus is derived from the AV materials in the collection.

The full finding aid for the Passion to Preserve Project Records, 1997-2015, is available at: http://digital.library.wisc.edu/1711.dl/wiarchives.uw-mil-uwmmss0374