layout | title | nav_order | has_children | has_toc | blank |
---|---|---|---|---|---|
default | Lessons | 5 | true | false | true |
- An overview of the textual data analysis workflow
- Data provenance
- Initial data analysis (IDA)
- Correcting OCR errors in scanned documents with OpenRefine
- Basic tokenization (to work with unstructured text)
- Filtering and faceting (to target specific errors)
- Regular expressions (to expand the application of error correction strategies)
- Exporting your data from OpenRefine
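
To give a rough sense of what the tokenization, faceting, and regular-expression steps listed above involve, here is a minimal sketch in Python. It is an illustration only: the lesson itself carries out these steps inside OpenRefine, and the sample text, the `tokenize` and `correct_ocr` helpers, and the "Tbe" misreading are all hypothetical examples rather than material from the lesson.

```python
import re
from collections import Counter

# Hypothetical OCR output (not taken from the lesson's dataset):
# the scanner has misread "The" as "Tbe" in several places.
ocr_text = "Tbe quick brown fox. Tbe lazy dog and tbe cat."


def tokenize(text):
    """Split unstructured text into word tokens (runs of letters)."""
    return re.findall(r"[A-Za-z]+", text)


def correct_ocr(text):
    """Fix the whole-word misreading 'Tbe'/'tbe', keeping the original case."""
    return re.sub(r"\b([Tt])be\b", r"\1he", text)


tokens = tokenize(ocr_text)

# Counting tokens is a rough analogue of faceting: it surfaces
# suspicious values so that a correction can target them specifically.
counts = Counter(tokens)

corrected = correct_ocr(ocr_text)
print(counts.most_common(3))
print(corrected)
```

The regular expression uses word boundaries (`\b`) so that only the standalone token is corrected, which is the kind of targeting that filtering and faceting are meant to support.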
The hands-on components of the lesson are available as videos, accompanied below by written instructions and screenshots of the video content. You are encouraged to watch the videos and keep the written sections for later reference, but feel free to approach the lesson however you prefer!
This lesson assumes basic computer literacy, e.g. creating and working with different file types, navigating interfaces, and using shortcut keys. We will use a spreadsheet-like application called OpenRefine, so prior experience with OpenRefine would be an asset. If you are very familiar with spreadsheet software, be aware that OpenRefine works differently in some respects.