Crowdsourced transcriptions are making historical documents more accessible for those who use screen readers
Volunteers from around the world are changing the way we interact with cultural heritage materials. Hundreds of crowdsourcing projects are harnessing the collective power of internet-connected individuals who are eager to contribute their time and skills to meticulously transcribing historical documents, letters, and diaries from libraries, archives, and museums (LAMs). These individuals are transforming inaccessible, image-based documents into searchable text, thereby unlocking vast repositories of knowledge that were previously closed to many.
Transcription makes it possible for people who use screen readers, including those who are blind or have low vision, to access and engage with cultural heritage materials in meaningful ways. This commitment to accessibility not only enriches the lives of individuals but also broadens the reach and impact of LAM collections.
“People participate in crowdsourcing for a whole lot of reasons, and it can be because they enjoy the subject matter or because they just want to do something that furthers human knowledge,” says University of Maryland College of Information Studies (INFO) Assistant Professor Victoria Van Hyning. “It’s a very common motivation, a sort of altruistic motivation to help researchers of the future.”
“I really don’t think there are many other modes of engagement or interaction between cultural heritage and members of the public quite like crowdsourcing. We’re not asking you to give your stuff; we’re not asking you to deposit. We’re asking you to create, to be a part of the historical record.”
Despite the evident enthusiasm for and benefits of crowdsourcing, LAM professionals confront a myriad of technical and logistical hurdles that hinder the full realization of these projects’ potential. Through her early-career IMLS-funded grant “Crowdsourced Data: Accuracy, Accessibility, Authority (CDAAA),” Van Hyning, INFO colleague Bern Jordan, and doctoral student Mace Jones are investigating the sociotechnical barriers that LAMs face when they try to integrate crowdsourced transcriptions into their discovery platforms and make their content discoverable and accessible.
Challenges ranging from restrictive character limits and ill-configured metadata fields to inefficient search functionality that slows down or even crashes systems underscore the complexities of integrating crowdsourced data into content management systems (CMSs). These obstacles not only threaten the sustainability of volunteer efforts but also impede access to invaluable cultural resources.
The integration of crowdsourced transcriptions into CMSs also raises essential questions about the preservation of information, the authority and accuracy of volunteer-generated content, and, fundamentally, the accessibility of digital archives. Through in-depth interviews with LAM practitioners at 12 organizations, Van Hyning and her team are investigating these challenges as well as identifying successful strategies that can be replicated across institutions. By identifying and addressing the barriers to effective crowdsourced data integration, LAMs can take a critical step toward making those transcriptions accessible.
Bridging Accessibility and Expectations
Van Hyning and her team have found that although the concept of using crowdsourcing to enhance accessibility has gained significant attention in LAMs over the last five years or so, a persistent gap remains between these intentions and the real expectations and experiences of users, particularly those with low or no vision. Through intensive usability testing with 12 blind or low-vision testers, the team is finding that many print-disabled users are not familiar with crowdsourced transcription resources and have trouble navigating LAM systems to locate them.
Traditionally, visually impaired individuals have been conditioned to expect minimal accommodation from digital platforms. Their experience has been typified by encountering images labeled with minimal alt text such as “image” or, more dishearteningly, finding valuable resources locked behind paywalls or restricted to those with specific institutional affiliations.
Van Hyning and her team will look into ways that institutions can address these gaps, and work with print-disabled users to raise awareness of the rich resources in their collections that crowdsourcing can make more accessible. Closing these gaps requires more than just technological solutions; it necessitates a holistic approach combining technology with human-centered strategies. “I think, understandably, a lot of institutions have been focusing on that technology piece,” says Van Hyning. “How do we do crowdsourcing? How do we do virtual online crowdsourcing to do these transcriptions? And now this is the next job of, ‘How do we communicate the results?’”
According to Van Hyning, institutions must go beyond merely adopting crowdsourcing technologies for transcription. There’s an imperative need for effective outreach, clear messaging, and intuitive navigation within collection platforms. Even the most meticulously crowdsourced transcriptions can fail to serve their purpose if they’re not effectively integrated within the systems that people use. Her team recommends resources like specialized guides that can serve as navigational aids, informing users about the availability of accessible content and guiding them on how to leverage it.
Ultimately, the journey towards using crowdsourcing to increase accessibility is an ongoing one. It demands a commitment not only to technological innovation but also to user-centric communication and outreach. By aligning the values that drive crowdsourcing initiatives with the actual experiences and expectations of users, institutions can move closer to sharing their collections with everyone.