---
layout: default
title: Lessons
nav_order: 5
has_children: true
has_toc: false
blank: true
---

Pre-Processing Digitized Texts

Lesson roadmap

  1. An overview of the textual data analysis workflow
  2. Data provenance
  3. Initial data analysis (IDA)
  4. Correcting OCR errors in scanned documents with OpenRefine
     • Basic tokenization (to work with unstructured text)
     • Filtering and faceting (to target specific errors)
     • Regular expressions (to expand the application of error correction strategies)
  5. Exporting your data from OpenRefine
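
For readers who want a feel for the three techniques listed under step 4 before opening OpenRefine, here is a minimal Python sketch of the same ideas. It is an illustration only, not the lesson's method: the lesson itself works inside OpenRefine, and the sample sentence and the "Tlie"/"The" misreading are hypothetical, chosen to resemble a common kind of OCR error.

```python
import re
from collections import Counter

# Hypothetical OCR output (not from the lesson's dataset): "The" misread as "Tlie".
ocr_text = "Tlie quick brown fox jumps over tlie lazy dog. Tlie end."

# Basic tokenization: split the unstructured text into word tokens.
tokens = re.findall(r"[A-Za-z]+", ocr_text)

# Faceting: count each distinct token (ignoring case) to surface odd values.
facet = Counter(token.lower() for token in tokens)

# Filtering: keep only the tokens that match the suspected error pattern.
suspects = [token for token in tokens if re.fullmatch(r"[Tt]lie", token)]

# Regular expression: fix every occurrence at once, keeping the original capitalisation.
corrected = re.sub(r"\b([Tt])lie\b", r"\1he", ocr_text)

print(facet.most_common(3))
print(suspects)
print(corrected)
```

The facet count shows which tokens are worth inspecting, the filter narrows the view to one suspected error, and the regular expression applies a single correction to every matching occurrence.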

Lesson format

The hands-on components of the lesson are available as videos, accompanied below by written instructions and screenshots of the video content. We encourage you to watch the videos and keep the written sections for later reference, but feel free to approach the lesson however you prefer!

Lesson prerequisites

You will need basic computer literacy, e.g. creating and working with different file types, navigating interfaces, and using shortcut keys. The lesson uses OpenRefine, a spreadsheet-like application; prior experience with OpenRefine would be an asset. If you are very familiar with spreadsheet software, expect some differences in how OpenRefine works.