new data release
dirkroorda committed Oct 27, 2020
1 parent d61a642 commit f21ce14
Showing 3 changed files with 17 additions and 5 deletions.
10 changes: 9 additions & 1 deletion README.md
@@ -18,6 +18,13 @@ Status

This is **work in progress!**

+* 2020-10-13 A new TF version (0.3) has been delivered
+Footnote bodies are almost all checked and corrected (12247 in total),
+footnote marks have been checked
+and corrected for volumes 1-4; there remain at least (300) pages with unlinked footnotes
+out of the 5270 pages that have footnotes.
+Editorial text is now in the main text, on equal footing with the original letter content,
+but separable from it in a number of ways.
* 2020-10-13 A new TF version (0.2) has been delivered, and there is now a TF-app
[missieven](https://github.com/annotation/app-missieven) for this corpus.
That means that functions like the Text-Fabric browser and easy downloading of data are supported.
@@ -27,7 +34,8 @@ This is **work in progress!**
and some pages that are altogether missing.
See [trimTei0.py](https://github.com/Dans-labs/clariah-gm/blob/master/programs/trimTei0.py) where some of those
pages have already been added.
-* 2020-10-07 Many checks have been performed, many structural corrections w.r.t the TEI source have been performed,
+* 2020-10-07 Many checks have been performed, many structural corrections
+w.r.t the TEI source have been performed,
the metadata of all metadata has been thoroughly checked and corrected.
See the reports in
[trimreport2](trimreport2).
10 changes: 7 additions & 3 deletions docs/transcription.md
@@ -41,18 +41,22 @@ Whitespace will be normalized to single spaces or newlines.

These are the words of the corpus, the basic units, a.k.a *slots*.

-Only the letter contents are stored word by word.
-Editorial remarks are stored in bigger chunks, as values of features.
+Only the original letter contents and the editorial remarks are stored word by word.
+The footnotes are stored one by one, as values of the feature `fnote`, see below.

feature | type | description
--- | --- | ---
emph | 1 or absent | whether the word is set in emphatic typography
folio | string | an indication of an original folio at this point
punc | string | punctuation and/or whitespace after the word
-remark | string | an editorial remark at this point
+punco | string | as `punc`, but only for original letter content
+puncr | string | as `punc`, but only for editorial content
+remark | 1 or absent | whether the word belongs to editorial content
super | 1 or absent | whether the word is in superscript, possibly the numerator of a fraction
special | 1 or absent | whether the word has extreme typography or a strange value (possibly OCR effects)
trans | string | the value of the word
+transo | string | as `trans`, but only for original letter content
+transr | string | as `trans`, but only for editorial content
und | 1 or absent | whether the word is underlined, possibly the total amount in a calculation

## Additional annotations
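The feature table above gives two ways to separate original letter content from editorial content: the `remark` flag marks editorial words, and the `transo`/`punco` and `transr`/`puncr` pairs carry values only for one kind of content. The Python sketch below models that on plain dicts. The dict layout, the toy words (taken from the letter text and a remark in this commit), and the helper names `text_of` and `split_by_remark` are hypothetical; this is not the Text-Fabric API for the corpus.

```python
# Hypothetical model: one dict per word (slot), keyed by the feature names in the
# table. Absent features are simply missing keys. This is NOT the Text-Fabric API.
from typing import Dict, List, Tuple

Word = Dict[str, object]

# Which (value, punctuation) feature pair to read for each view of the text.
VIEWS = {
    "all": ("trans", "punc"),
    "original": ("transo", "punco"),
    "editorial": ("transr", "puncr"),
}


def text_of(words: List[Word], view: str = "all") -> str:
    """Join word values with their trailing punctuation/whitespace.

    Words lacking the view-specific features (e.g. editorial words have
    no `transo`) contribute nothing to that view.
    """
    value_key, punc_key = VIEWS[view]
    return "".join(str(w.get(value_key, "")) + str(w.get(punc_key, "")) for w in words)


def split_by_remark(words: List[Word]) -> Tuple[str, str]:
    """Alternative route: use the `remark` flag together with trans/punc."""
    original: List[str] = []
    editorial: List[str] = []
    for w in words:
        chunk = str(w["trans"]) + str(w.get("punc", ""))
        (editorial if w.get("remark") == 1 else original).append(chunk)
    return "".join(original), "".join(editorial)


# Toy data: two original words followed by three editorial words.
sample: List[Word] = [
    {"trans": "Batavia", "punc": " ", "transo": "Batavia", "punco": " "},
    {"trans": "heeft", "punc": " ", "transo": "heeft", "punco": " "},
    {"trans": "Zending", "punc": " ", "remark": 1, "transr": "Zending", "puncr": " "},
    {"trans": "van", "punc": " ", "remark": 1, "transr": "van", "puncr": " "},
    {"trans": "schepen", "punc": "", "remark": 1, "transr": "schepen", "puncr": ""},
]

print(text_of(sample))                     # Batavia heeft Zending van schepen
print(repr(text_of(sample, "original")))   # 'Batavia heeft '
print(repr(text_of(sample, "editorial")))  # 'Zending van schepen'
```

Both routes should give the same split when the features are filled in as the table describes; in the real corpus the values would be read through Text-Fabric rather than from dicts.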
2 changes: 1 addition & 1 deletion xml/02/p0673.xml
@@ -1192,7 +1192,7 @@ doch de quaetwillicheyt van den Sousouhounangh ende de dierte van den rijs op<lb
Batavia heeft ons genootdruct dat werek te verhaesten.<lb/>
</para>
<remark>«Zending van schepen naar AraJcan,
-<ref>Daghregisters, 31 </ref>aug ., p. 122, met cargasoen<lb/>
+<ref>Daghregisters, 31 aug </ref>., p. 122, met cargasoen<lb/>
van f. 22776.12.10; de Leeuwarden komt 26 febr. van Siam, ibid., p. 18-19; 22 maart<lb/>
de Gecroonde Liefde, 31 mei de Witte Valck, p. 74; klachten van den koopman Westerwolt<lb/>
aldaar;<lb/>
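The change in `xml/02/p0673.xml` moves the month abbreviation `aug` from after the closing `</ref>` tag into the `<ref>` element, so the reference text now carries the full date. A minimal sketch with Python's standard `xml.etree.ElementTree` makes the difference visible. The trimmed `<remark>` fragments and the helper `ref_parts` are illustrative only and do not reflect how the project's own programs process the XML.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# The <remark> fragment before and after the commit, trimmed to the relevant part.
BEFORE = "<remark><ref>Daghregisters, 31 </ref>aug ., p. 122</remark>"
AFTER = "<remark><ref>Daghregisters, 31 aug </ref>., p. 122</remark>"


def ref_parts(fragment: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (text inside <ref>, text right after </ref>) for the first <ref>."""
    ref = ET.fromstring(fragment).find("ref")
    # .text is the content inside the element; .tail is the text following it.
    return ref.text, ref.tail


print(ref_parts(BEFORE))  # ('Daghregisters, 31 ', 'aug ., p. 122')
print(ref_parts(AFTER))   # ('Daghregisters, 31 aug ', '., p. 122')
```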