Environmental DH Seminar: Jim Clifford & Jacob Polay, "Solving OCR: Using olmOCR to Follow Commodities across the British World"
Wednesday 11 March 2026, 3:00pm to 4:00pm
Venue
Online, Lancaster, United Kingdom, LA1 4YD
Open to
External Organisations, Postgraduates, Public, Staff, Undergraduates
Registration
Free to attend - registration required
Registration Info
https://www.eventbrite.co.uk/e/solving-ocr-using-olmocr-to-follow-commodities-across-the-british-world-tickets-1856518660289
Event Details
Join the Lancaster-Manchester Environmental DH Seminar for a presentation by Jim Clifford and Jacob Polay (University of Saskatchewan) on how recent advances in open-source Optical Character Recognition (OCR) technologies are transforming the possibilities of digital history.
Abstract
OCR has long posed a challenge for historians working with digitised archives. Irregular typefaces, the long s, uneven print quality, and low-resolution scans, particularly those derived from microfilm, have severely constrained the use of text mining and computational analysis in historical research. In this talk, Clifford works with olmOCR, an energy-efficient, low-cost, and open-source OCR system developed by Allen AI, which surpasses the performance of expensive multimodal large language models.
Clifford and his collaborators have developed a pipeline capable of downloading and processing hundreds of thousands of pages from the Internet Archive. In partnership with Canadiana.org, they are now reprocessing extensive collections to generate high-quality OCR outputs at scale. With cleaner text data, the team is constructing named entity recognition pipelines to identify and interlink people, places, and commodities, ultimately producing Linked Open Data and knowledge graphs that trace the development of extractivist commodity economies across the British World System from the 1650s to the 1960s.
This presentation will introduce the technical foundations of the project, share early findings, and reflect on how improved OCR and data infrastructures can support large-scale, open, and reproducible research in environmental and economic history.
About the Speakers
Jacob Polay is a second-year Master's student in the Department of History at the University of Saskatchewan. His thesis research focuses on adapting small open-weight language models for text mining pipelines designed to work with early modern English texts. He is the developer of Early Modern NER, an open-source named entity recognition tool for early modern English (github.com/polayj/earlymodernner).
Jim Clifford is an Associate Professor of History at the University of Saskatchewan, specializing in environmental history and digital history. His research examines the extractivism of the long nineteenth century that supplied the raw materials fuelling urban industrial growth in Britain and across the British World. He employs digital methods, including GIS, knowledge graphs, and large language models, to trace the transnational environmental transformations driven by industrialization. He is a co-editor of Historical Methods.
About the Environmental Digital Humanities Seminar (EDHS)
The Environmental Digital Humanities Seminar (EDHS) brings together scholars from across the humanities who use digital methods to understand environments past, present, and future. EDHS is inclusive of urban, rural, and suburban spaces and places, and while we explore environments globally, we also showcase local work from and about the North of England.
EDHS is supported by the N8, the Lancaster Data Science Institute, the Digital Humanities Centre at Lancaster, the Centre for Digital Humanities, Cultures, and Media at the University of Manchester, and the MCGIS research group at Manchester.
Organisers
Giulia Grisot (Manchester), Katherine McDonough (Lancaster), Luca Scholz (Manchester), Joanna Taylor (Manchester)
Contact Details
| Name | Katherine McDonough |