UWM Libraries

This file covers all of the LGBTQ+ collections identified as part of the project and organizes the extracted text, along with associated metadata, in an Excel spreadsheet. Transcripts are in Column B, "Text."
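
As a rough illustration of how the transcripts might be read programmatically, the sketch below assumes the spreadsheet has first been exported to CSV with its header row intact. Only the "Text" column name comes from the description above; the "Title" column, the sample rows, and the helper names `load_transcripts` and `word_counts` are hypothetical and stand in for the real file.

```python
import csv
import io
from collections import Counter


def load_transcripts(csv_file, text_column="Text"):
    """Return the non-empty transcripts found in the given column."""
    reader = csv.DictReader(csv_file)
    return [
        row[text_column].strip()
        for row in reader
        if (row.get(text_column) or "").strip()
    ]


def word_counts(transcripts):
    """Count lowercase, whitespace-separated tokens across all transcripts."""
    counts = Counter()
    for text in transcripts:
        counts.update(text.lower().split())
    return counts


# Made-up sample standing in for a CSV export of the spreadsheet.
# "Title" is a hypothetical Column A; "Text" is Column B as described above.
sample = io.StringIO(
    'Title,Text\n'
    'Example A,"hello world"\n'
    'Example B,""\n'
    'Example C,"Hello again"\n'
)

transcripts = load_transcripts(sample)
counts = word_counts(transcripts)
print(len(transcripts), "transcripts loaded")
print(counts.most_common(3))
```

With an actual export, the `io.StringIO` sample would be replaced by something like `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the CSV was saved.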


This corpus was created as part of a project to develop workflows and best practices for using machine learning tools to extract text from archival AV materials, with a focus on the LGBTQ+ collections in the UWM Archives. In addition to creating the corpus, the project developed a prototype dashboard that demonstrates the corpus's teaching and research potential through text analysis and new modes of discovery.

Creation of this corpus was funded by the Andrew W. Mellon Foundation as part of the second cohort for Collections as Data: Part to Whole.