This repository was created to manage the files used in Kristjan Poska's bachelor's thesis.
The protocol texts originate from the crowdsourcing project of the National Archives of Estonia, and the manual annotations were created in the project "Possibilities of automatic analysis of historical texts by the example of 19th-century Estonian communal court minutes", which is funded by the national programme "Estonian Language and Culture in the Digital Age 2019-2027".
Named Entity Recognition in 19th Century Parish Court Protocols
Siim Orasmaa, PhD
- Python 3.7
- EstNLTK (version 1.6). The files from the estner folder of branch devel_1.6 were also used (state of files at commit ebf1451).
- Jupyter Notebooks (+ Anaconda)
The training has also been tested and works with Python 3.8 and EstNLTK version 1.6.8b.
- nervaluate 0.1.8
- pandas 1.2.0
- os, json, random, sys, re (Python standard library modules)
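
To compare a local environment against the versions listed above, a small check along the following lines may help. This is only a sketch and not part of the thesis code: the helper names `installed_version` and `environment_report` are hypothetical, and it assumes each package exposes a `__version__` attribute, falling back to "unknown" when it does not.

```python
import importlib
import sys

# Versions as listed in this README; the check itself is an illustrative sketch.
EXPECTED = {"estnltk": "1.6", "nervaluate": "0.1.8", "pandas": "1.2.0"}


def installed_version(module_name):
    """Return the module's __version__ as a string, "unknown" if the module
    has no such attribute, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return str(getattr(module, "__version__", "unknown"))


def environment_report(expected=EXPECTED):
    """Build one line for the Python version and one line per listed package."""
    lines = ["Python {}.{}".format(*sys.version_info[:2])]
    for name, wanted in expected.items():
        found = installed_version(name)
        status = "not installed" if found is None else found
        lines.append("{}: found {}, README lists {}".format(name, status, wanted))
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```

The report only prints what it finds next to what the README lists; it does not judge whether a differing version is compatible.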