Treebanking 1: morphosyntactic annotation
Thursday March 1, 2018, 16h00-17h15 Greenwich Mean Time
Convenors: Polina Yordanova (Sofia & London), Marja Vierros (Helsinki)
YouTube link: https://youtu.be/nmPx7pl-36Q
Slides: Slides PDF
This lecture introduces the morphosyntactic annotation of Ancient Greek and Latin according to Dependency Grammar. We will demonstrate how to annotate Greek and Latin texts with the Arethusa tool on the Perseids platform, and also show the Sematia tool for the parallel annotation of diplomatic and restored papyrological texts.
- Introduction to Treebanking (10 minutes, MV)
- Treebanking with Arethusa - annotation demo, Greek (35 minutes, PY)
- Sematia - annotating papyri and ostraca, Latin (15 minutes, MV)
- Mambrini, F. (2016). "The Ancient Greek Dependency Treebank: Linguistic Annotation in a Teaching Environment." In Romanello M. & Bodard G, Digital Classics Outside the Echo-Chamber. London: Ubiquity Press. Available: https://doi.org/10.5334/bat.f
- Nicola Reggiani (2017). Digital Papyrology I: Methods, Tools and Trends. (Berlin/Boston: De Gruyter), ch. 7.1, "Quantitative Analysis of Textual Data: Past and Future of Computational Linguistics Applied to Papyrology". Available: https://www.degruyter.com/downloadpdf/books/9783110547474/9783110547474-007/9783110547474-007.pdf ; read pp. 178–189
- Universal Dependencies: http://universaldependencies.github.io/docs/#language-en (In particular: Introduction and Syntax: General Principles)
- Celano, Giuseppe G. A. 2014. Guidelines for the annotation of the Ancient Greek Dependency Treebank 2.0. https://github.com/PerseusDL/treebank_data/edit/master/AGDT2/guidelines (only Chapter 3, including analysis of the hyperlinked examples)
- Bamman, David, et al. 2008. Guidelines for the Syntactic Annotation of Latin Treebanks (v. 1.3). http://nlp.perseus.tufts.edu/syntax/treebank/1.3/docs/guidelines.pdf (only pp. 3-21, 24, 26)
- Celano, Giuseppe G.A., Gregory Crane, Saeed Majidi. 2016. Part of Speech Tagging for Ancient Greek. Available: https://www.degruyter.com/view/j/opli.2016.2.issue-1/opli-2016-0020/opli-2016-0020.xml
- Vierros, Marja and Erik Henriksson. 2017. "Preprocessing Greek Papyri for Linguistic Annotation." Journal of Data Mining and Digital Humanities. Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages, edited by Marco Büchler and Laurence Mellerin. Available: http://jdmdh.episciences.org/paper/view/id/1385
Discuss what benefits treebanking of ancient texts can have from the points of view of the annotator (student and/or researcher) and the research community. Consider pedagogical value, attention to detail and understanding dependency grammar in relation to constituency grammar and the different ways grammar is taught in the classrooms.
- Ancient Greek: https://github.com/PerseusDL/treebank_data/blob/master/AGDT2/guidelines/Greek_guidelines.md
- Latin: https://github.com/PerseusDL/treebank_data/blob/master/v1/latin/docs/guidelines.pdf
Create an account in Perseids (http://sosol.perseids.org/sosol/), log in, and create a new treebank annotation.
Example sentences:
- Greek:
- τῇ δὲ ὑστεραίᾳ ἐπορεύοντο διὰ τοῦ πεδίου καὶ Τισσαφέρνης εἵπετο.
- ἦν τις ἐν τῇ στρατιᾷ Ξενοφῶν Ἀθηναῖος , ὃς οὔτε στρατηγὸς οὔτε στρατιώτης ὢν συνηκολούθει.
- “ Ἀλλ' , ὦ οὗτος , σοί γε , κἂν θῦλαξ γένῃ , οὐ προσελεύσομαι ”.
- προσέταξεν δὲ καὶ τοὺς καταπορευομένους ἔκ τε τῶν μαχίμων καὶ τῶν ἄλλων τῶν ἀλλότρια φρονησάντων ἐν τοῖς κατὰ τὴν ταραχὴν καιροῖς κατελθόντας μένειν ἐπὶ τῶν ἰδίων κτήσεων.
- Homework:
Ἀλώπηξ καὶ πίθηκος βασιλεὺς αἱρεθείς. Ἐν συνόδῳ τῶν ἀλόγων ζῴων πίθηκος ὀρχησάμενος καὶ εὐδοκιμήσας βασιλεὺς ὑπ' αὐτῶν ἐχειροτονήθη. Ἀλώπηξ δὲ αὐτῷ φθονήσασα, ὡς ἐθεάσατο ἔν τινι πάγῃ κρέας κείμενον, ἀγαγοῦσα αὐτὸν ἐνταῦθα ἔλεγεν ὡς εὑροῦσα θησαυρὸν αὐτὴ μὲν οὐκ ἐχρήσατο, γέρας δὲ αὐτῷ τῆς βασιλείας τετήρηκε, καὶ παρῄνει αὐτῷ λαμβάνειν. Τοῦ δὲ ἀτημελήτως ἐπελθόντος καὶ ὑπὸ τῆς πάγης συλληφθέντος, αἰτιωμένου τε τὴν ἀλώπεκα ὡς ἐνεδρεύσασαν αὐτῷ, ἐκείνη ἔφη· " Ὦ πίθηκε, σὺ δὲ τοιαύτην μωρίαν ἔχων τῶν ἀλόγων ζῴων βασιλεύεις;" Οὕτως οἱ τοῖς πράγμασιν ἀπερισκέπτως ἐπιχειροῦντες ἐπὶ τῷ δυστυχεῖν καὶ γέλωτα ὀφλισκάνουσιν.
Ad rivum eundem lupus et agnus venerant siti compulsi; superior stabat lupus longeque inferior agnus. Tunc fauce improba latro incitatus iurgii causam intulit. Cur, inquit, turbulentam fecisti mihi aquam bibenti? Laniger contra timens: Qui possum, quaeso, facere, quod quereris, lupe? A te decurrit ad meos haustus liquor. Repulsus ille veritatis viribus: Ante hos sex menses male, ait, dixisti mihi. Respondit agnus: Equidem natus non eram. Pater hercle tuus, ille inquit, male dixit mihi. Atque ita correptum lacerat iniusta nece. Haec propter illos scripta est homines fabula, qui fictis causis innocentes opprimunt.
By thirst incited, to the brook The Wolf and Lamb themselves betook. The Wolf high up the current drank, The Lamb far lower down the bank. Then, bent his ravenous maw to cram, The Wolf took umbrage at the Lamb. "How dare you trouble all the flood, And mingle my good drink with mud?" "Sir," says the Lambkin, sore afraid, "How should I act, as you upbraid? The thing you mention cannot be, The stream descends from you to me." Abash'd by facts, says he, " I know 'Tis now exact six months ago You strove my honest fame to blot"- "Six months ago, sir, I was not." "Then 'twas th' old ram thy sire," he cried, And so he tore him, till he died. To those this fable I address Who are determined to oppress, And trump up any false pretence, But they will injure innocence. (Translation of Phaedrus 1.1. by Christopher Smart, A. M., 1913; The Wolf and the Lamb. You may want to modernise the English first and/or annotate syntax only)
53075 - example sentences
44899 - Aesop's fable treebanked
More texts at: http://cts.perseids.org/
SEMATIA ENTRY AND ANNOTATION - Optional exercise, where you can contribute to the scholarly community
Choose a Greek or Latin papyrus, ostracon, or tablet (any you like, or one from the lists provided below) that is available in the Papyrological Navigator (http://papyri.info). Ideally you should also be able to check the original edition (translations help too; you can ask your tutor/Marja for a scan). Then add the papyrus to the Sematia portal as instructed in class, add writer metadata if possible, and annotate both layers in the Arethusa annotation environment according to the Treebanking Guidelines (you need to have a Perseids account set up already at http://sosol.perseids.org/sosol/). Where the Standard and Original layers are identical, you can copy and paste in the XML, but remember to account for every difference between the layers. Then submit the annotations to the Sematia board. If you are new to treebanking, you might want to start with a relatively short and not too fragmentary papyrus. (NB: the same text cannot be edited by more than one person in Sematia.)
- bgu.1.100 = Trismegistos 8875 (letter from Komon to Pekysis)
- bgu.3.701 = Trismegistos 28077 (letter to the sitologoi of Karanis)
- p.cair.zen.2.59224 = Trismegistos 869 (fragment of a letter to Zenon from Teos)
- p.col.3.10 = Trismegistos 1731 (letter from Mnasistratos to Zenon)
- p.col.3.16 = Trismegistos 1736 (letter of Charoppas to Zenon about vine crop survey)
- p.mich.1.6 = Trismegistos 1912 (letter from Sostratos to Zenon)
- p.mich.1.97 = Trismegistos 1996 (memorandum to Zenon)
- p.mich.mchl.23 = Trismegistos 40933 (letter from Nilos to Nemesion)
- p.oslo.2.50 = Trismegistos 25904 (letter from Severus to Limnaios)
- p.zen.pestm.67 = Trismegistos 1898 (letter of introduction to Zenon)
- c.ep.lat.242 = Trismegistos 17309 (official letter)
- chla.4.269 = Trismegistos 20419 (request for a guardian)
- c.ep.lat.177 = Trismegistos 21149 (letter of recommendation)
- c.ep.lat.219 = Trismegistos 29894 (private letter, fragmentary)
- chla.6.315 = Trismegistos 44782 (circular letters)
- o.berenike.2.123 = Trismegistos 89148 (remains of a letter of recommendation)
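If you do end up copying and pasting between layers in the XML, it helps to know roughly what an annotated sentence looks like. The sketch below assumes the Perseus-style treebank XML that Arethusa works with, in which each `word` element carries `id`, `form`, `lemma`, `postag`, `relation` and `head` attributes. The sample is a hand-written two-word fragment from the Phaedrus fable, not an export from Perseids or Sematia, and the function name `dependencies` is only illustrative.

```python
import xml.etree.ElementTree as ET

# Hand-written illustration (assumption): a simplified two-word Latin clause.
# Attribute values follow the usual 9-position postag and PRED/SBJ labels,
# but check them against the guidelines before reusing them.
SAMPLE = """
<treebank>
  <sentence id="1">
    <word id="1" form="Respondit" lemma="respondeo" postag="v3sria---" relation="PRED" head="0"/>
    <word id="2" form="agnus" lemma="agnus" postag="n-s---mn-" relation="SBJ" head="1"/>
  </sentence>
</treebank>
"""

def dependencies(xml_text):
    """Return (form, relation, head_form) triples; head 0 is shown as ROOT."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for sentence in root.iter("sentence"):
        words = list(sentence.iter("word"))
        # Map each word id to its form so heads can be shown as words.
        forms = {w.get("id"): w.get("form") for w in words}
        for w in words:
            head_form = forms.get(w.get("head"), "ROOT")
            triples.append((w.get("form"), w.get("relation"), head_form))
    return triples

for form, relation, head in dependencies(SAMPLE):
    print(f"{form} --{relation}--> {head}")
```

Each word points to exactly one head, which is why an edit to one layer's `head` or `relation` value has to be checked against the other layer wherever the two texts differ.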
At https://sematia.hum.helsinki.fi/tools/ you can run simple queries on the treebanked papyri and ostraca (a limited corpus at the moment). The search fields accept regular expressions. For the exercise, think about how you could find the following:
- different case used in the original text than what the editor suggests to be the standard
- some linguistic structure, like the genitive absolute or accusative with infinitive
- certain words in certain functions, e.g. the verb 'send' as an object (or the head of a subordinate clause acting as an object)
Test them!
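As a starting point for the first two queries, here is a minimal Python sketch of the kind of pattern you might try. It assumes the 9-position morphological tag used in the treebank guidelines (part of speech, person, number, tense, mood, voice, gender, case, degree). It runs locally on a few hand-tagged forms and does not reproduce the Sematia search interface, whose field names are not shown here. The forms ἐπελθόντος, συλληφθέντος and ἐχειροτονήθη come from the Aesop homework text. The tokens in `original` and `standard` are hypothetical.

```python
import re

# Assumed postag layout: POS, person, number, tense, mood, voice,
# gender, case, degree (9 characters, "-" where a slot does not apply).
# Matches a verb (v) in the participle mood (p) and genitive case (g).
GENITIVE_PARTICIPLE = re.compile(r"^v...p..g.$")

# Hand-tagged forms from the homework fable (tags are my own reading).
tagged = [
    ("ἐπελθόντος", "v-sapamg-"),    # aorist active participle, gen. sg. masc.
    ("συλληφθέντος", "v-sappmg-"),  # aorist passive participle, gen. sg. masc.
    ("ἐχειροτονήθη", "v3saip---"),  # 3rd sg. aorist indicative passive
]

candidates = [form for form, tag in tagged if GENITIVE_PARTICIPLE.match(tag)]
print(candidates)

def case_mismatches(original, standard):
    """Pair up aligned tokens whose case slot (position 8) differs."""
    return [
        (o_form, s_form)
        for (o_form, o_tag), (s_form, s_tag) in zip(original, standard)
        if o_tag[7] != s_tag[7]
    ]

# Hypothetical aligned tokens: dative written where the editor restores genitive.
original = [("τοῦ", "l-s---mg-"), ("πατρί", "n-s---md-")]
standard = [("τοῦ", "l-s---mg-"), ("πατρός", "n-s---mg-")]
print(case_mismatches(original, standard))
```

A genitive participle is only a candidate for a genitive absolute. To confirm one you would also need to look at the relation label and the head of the participle, so a morphology-only pattern will over-match.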