This public repository contains code used in: Hill, M. J. & Hengchen, S. (2018). Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study. Digital Scholarship in the Humanities (DSH), accepted/in press.
This research is part of the Helsinki Computational History Group’s (COMHIS: https://www.helsinki.fi/en/comhis) larger project on ECCO and ESTC. We would like to thank Gale for providing our group with ECCO data. Special thanks go to Prof. Mäkelä for providing figures and for drawing our attention to OCR accuracy estimations.