layout	title	nav_order	has_children	has_toc	blank
default	Lessons	5	true	false	true

Pre-Processing Digitized Texts

Lesson roadmap

An overview of the textual data analysis workflow
Data provenance
Initial data analysis (IDA)
Correcting OCR errors in scanned documents with OpenRefine

Basic tokenization (to work with unstructured text)
Filtering and faceting (to target specific errors)
Regular expressions (to expand the application of error correction strategies)

Exporting your data from OpenRefine

Lesson format

The hands-on components of this lesson are available as videos, each followed by written instructions and screenshots of the video content. We encourage you to watch the videos first and keep the written sections for later reference, but feel free to approach the lesson however you prefer!

Lesson prerequisites

You will need basic computer literacy: creating and working with different file types, navigating interfaces, using keyboard shortcuts, and so on. The lesson uses OpenRefine, a spreadsheet-like tool, so prior experience with OpenRefine would be an asset! If you are very familiar with spreadsheet software, expect some differences in how OpenRefine works.
