Releases: CLARIAH/wp6-missieven

With entity and ent nodes

27 Mar 07:15

Now the entity occurrences are represented as ent nodes and these nodes have the features eid and kind for entity ID and entity kind. There are also entity nodes that collect entity occurrences with the same eid and kind.
The edge feature eoccs links entity nodes to their occurrences, the ent nodes.

So, each multiword entity occurrence now corresponds to a single ent node, linked to the words the entity occupies.
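The relationship between entity nodes, ent nodes and words can be sketched as follows. This is a minimal illustration, not the project's own code: the helper name entity_occurrences is hypothetical, and it assumes that the slot type of the corpus is called word and that the eoccs edges carry no values, so that E.eoccs.f(n) yields plain nodes. It takes the Text-Fabric API object (A.api) as an argument.

```python
def entity_occurrences(api, entity_node):
    """Return (eid, kind, occurrences) for one entity node.

    Each occurrence is the tuple of word nodes covered by one ent node.
    `entity_occurrences` is an illustrative helper, not part of Text-Fabric.
    """
    F, E, L = api.F, api.E, api.L
    eid = F.eid.v(entity_node)    # entity ID
    kind = F.kind.v(entity_node)  # entity kind
    occurrences = [
        # Assumption: the slot type is "word" and eoccs edges have no values.
        tuple(L.d(ent, otype="word"))
        for ent in E.eoccs.f(entity_node)  # follow eoccs from entity to ent nodes
    ]
    return eid, kind, occurrences


# Possible usage, once the corpus has been loaded:
#
#   from tf.app import use
#   A = use("CLARIAH/wp6-missieven")
#   for entity in A.api.F.otype.s("entity")[:5]:
#       print(entity_occurrences(A.api, entity))
```

Passing the API object in, rather than relying on global F, E and L, keeps the helper usable with several loaded versions side by side.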

The ent and entity nodes are added to the original dataset. The version of the dataset is still 1.0e.

Note that most tutorials work with version 1.0, but not with version 1.0e.

If you need to work with earlier versions of the missieven, specify the version in the use command, like so:

from tf.app import use
A = use("CLARIAH/wp6-missieven", version="1.0")

This works best if you have installed Text-Fabric as

pip install --upgrade 'text-fabric[all]'

because then TF can use the GitHub API to fetch the data.

If you only work with the latest version (1.0e) this is not needed.

With entities as nodes

26 Jan 14:49

Now the entities are represented as nodes and these nodes have the features eid and kind for entity ID and entity kind.
So, a multiword entity occurrence now corresponds to a single entity node, linked to the words the entity occupies.

The entity nodes are added to the original dataset. The new version of the dataset is 1.0e.

Note that all tutorials work with version 1.0, but not with version 1.0e.

If you need to work with earlier versions of the missieven, specify the version in the use command, like so:

from tf.app import use
A = use("CLARIAH/wp6-missieven", version="1.0")

This works best if you have installed Text-Fabric as

pip install --upgrade 'text-fabric[all]'

because then TF can use the GitHub API to fetch the data.

If you only work with the latest version (1.0e) this is not needed.

With a new entities export

12 Oct 08:47

Metadata added for harvesting by CLARIAH.

New version of entity annotations by Sophie Arnoult.

Note on the attachments:

No need to download them manually. They will be fetched by Text-Fabric when needed.

  • tf: the main corpus
  • voc-missives-export: entity annotations as produced in cltl/voc-missives
  • voc-missives-migrated: entity annotations as migrated from an earlier version in cltl/voc-missives
  • exercises-entities: results of toy example of creating entity annotations
  • exercises-numerics: results of toy example of creating other annotations

Includes volume 14

04 May 14:30

Volume 14 had not been included until now.
It has two bands.
We converted this material from a textual pdf, produced the same kind of xml as for volumes 1-13,
and generated TF from the result.

All letters in a page, space corrections

22 Jul 15:19

All letters are now part of a page, including the letters that do not have a <pb> element in their text.
The space corrections by Sophie have been applied.

Spaces

17 Jun 08:17

Added spaces to feature punc and friends on the basis of a correction set by Sophie Arnoult.

Fixed words outside lines

20 May 14:40

When multiple letters occur on a single page,
the non-first letters on such pages ended up with the words on their first line not wrapped in a line node.
This hindered a space optimization in the layered search app.
Corrected.

New data version 0.7

30 Jan 14:31

Data version 0.7 has a different treatment of footnotes.
Before, the footnote bodies were mere feature values.
Now they occupy slots and lines themselves.

v0.6

07 Dec 11:03

Data version 0.6.

Small fixes in folio references.

There is also a simple data export of all words plus basic information.
You can use this as input for natural language tools and named entity recognition.
The data can be used to run these tools on original words and editorial words separately.

See the export notebook for a detailed description and the way it is generated.

Data version 0.5

17 Nov 08:39

Fixed the generation of spurious newlines in footnote bodies