Releases: CLARIAH/wp6-missieven

Data version 0.4

16 Nov 15:39

All footnotes resolve.
A fairly good dataset: still full of OCR mistakes, but structurally clean.

New data release

27 Oct 16:22

Footnote bodies are almost all checked and corrected (12247 in total).
Footnote marks have been checked and corrected for volumes 1-4;
at least 300 pages with unlinked footnotes remain,
out of the 5270 pages that have footnotes.
Editorial text is now in the main text, on equal footing with the original letter content,
but separable from it in a number of ways.

Decent conversion to tf

13 Oct 07:56

The source is messy.
Many problems have been detected and corrected, but by no means all issues have been addressed.

Still outstanding:

  • correcting footnote marks so that the proper footnote bodies can be linked to the proper marks
  • tables have bad TEI encodings, especially tables in landscape mode. Better OCR is available and can be manually encoded
  • words and punctuation must be separated
  • etc.
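
As an illustration of the "words and punctuation must be separated" item, here is a minimal sketch of one way to split whitespace-delimited tokens into leading punctuation, word, and trailing punctuation. It is not taken from the repository's conversion code: the names `TOKEN_RE`, `split_token`, and `tokenize` are hypothetical, and the rule that punctuation inside a word (such as an apostrophe) stays with the word is an assumption.

```python
import re

# Hypothetical helper, not part of the actual conversion code.
# Group 1: leading non-word characters, group 2: the word itself
# (may contain internal punctuation), group 3: trailing non-word characters.
TOKEN_RE = re.compile(r"^(\W*)(.*?)(\W*)$", re.DOTALL)


def split_token(token):
    """Split one token into (leading punctuation, word, trailing punctuation)."""
    lead, word, trail = TOKEN_RE.match(token).groups()
    return lead, word, trail


def tokenize(line):
    """Split a line on whitespace and separate punctuation from each token."""
    return [split_token(token) for token in line.split()]


if __name__ == "__main__":
    for parts in tokenize("Batavia, den 3 (1610)."):
        print(parts)
```

A token that consists only of punctuation ends up entirely in the leading slot, with an empty word. OCR noise inside words is left untouched by this sketch.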

Initial commit

02 Sep 10:15

Everything is set up: preliminary docs, provenance, and acknowledgements, but no code yet.