Treebanking 2: using treebanks
Thursday March 8, 2018, 16h00-17h15 Greenwich Mean Time
Convenors: Dag Haug (Oslo), Francesco Mambrini (DAI, Berlin)
YouTube link: https://youtu.be/fYaUMVfCwms
Slides: Dag Haug + Francesco Mambrini
We will start with an introduction to regular expressions. Then we will look at use cases for treebanks and learn how to query them in the INESS tool (http://iness.uib.no) using regular expressions. In the last part of the lecture, we will discuss methods to query the Ancient Greek and Latin Dependency Treebanks using the related technologies mentioned on the project's website and the (still experimental) gAGDT.
- Dag Haug (2015). "Treebanks in historical linguistic research." In Carlotta Viti (ed.), Perspectives on Historical Syntax, Benjamins, pp. 188-202. Preprint available: http://folk.uio.no/daghaug/historical-treebanks.pdf
- D. Neel Smith (2016). "Morphological Analysis of Historical Languages." Bulletin of the Institute of Classical Studies 59.2, pp. 89-102. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-5370.2016.12040.x/epdf
- Francesco Mambrini and Marco Passarotti (2016). "Subject-Verb Agreement with Coordinated Subjects in Ancient Greek - A Treebank-Based Study." Journal of Greek Linguistics 16.1, pp. 87-116. Available: https://doi.org/10.1163/15699846-01601003
tba
Regular expressions:
- Match all inflectional forms of filia
- Match all participles (and as little else as possible)
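A minimal Python sketch of one possible approach to the two regular-expression exercises, using the standard `re` module. The `filia` pattern lists the first-declension endings (including the special dative/ablative plural `filiabus`). The participle pattern is an assumption: it works on the 9-character AGDT part-of-speech tag (part of speech, person, number, tense, mood, voice, gender, case, degree), where participles have "v" in the first and "p" in the fifth position; matching participles on word forms alone would need a different pattern.

```python
import re

# All inflectional forms of Latin "filia" (1st declension), without macrons.
FILIA = re.compile(r"fili(?:a|ae|am|as|arum|is|abus)")

# Assumption: AGDT-style 9-character postag; participles = verb ("v") with
# mood "p" in position 5. Each "." stands for one other tag position.
PARTICIPLE_TAG = re.compile(r"v...p....")

words = ["filia", "filiae", "filiarum", "filiabus", "filius", "filiola"]
tags = ["v-sapamn-", "v3spia---", "n-s---fn-"]

# fullmatch() anchors the pattern to the whole string.
print([w for w in words if FILIA.fullmatch(w)])
print([t for t in tags if PARTICIPLE_TAG.fullmatch(t)])
```

Using `fullmatch()` rather than `search()` keeps the patterns from matching longer words such as `filiola` that merely contain a valid ending.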
INESS queries: start with the query we developed in class and:
- change it so that we also find pronominal subjects and objects
- change it so we only look at word order in complement clauses
- more advanced exercises in the final slide
AGDT queries: Mambrini and Passarotti 2016 (see above on "Other Resources") discuss verb-subject number agreement with multiple coordinated subjects. The article shows that in Greek, singular agreement of a verb with multiple coordinated subjects is not only allowed but numerically predominant over "resolved" (plural/dual) agreement.
How would you replicate the queries to study this issue? Hint: you need to set the following constraints:
- A verb governing at least two coordinated subjects. This translates to the dependency chain: verb > conj > subject
- the verb must have a testable number feature: this excludes some forms; which ones?
- The study also shows that there is a difference between "and" and "or" coordinators; you might want to limit the query to the most frequent "and" conjunction (καί). Bonus question: it is best to use the "lemma" feature even though καί is invariable; why?
Try to replicate these queries using Structural Search and (optionally) the gAGDT.
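Outside Structural Search, the same constraints can be sketched in plain Python with `xml.etree.ElementTree`. This is not the method used in the article, only an illustration under stated assumptions: the treebank follows the AGDT XML layout (`<word>` elements with `id`, `form`, `lemma`, `postag`, `relation` and `head` attributes), coordinators carry the relation `COORD`, coordinated subjects carry `SBJ_CO`, and number is the third position of the postag (`s`, `d`, `p`). The function name `find_coordinated_subjects` and the inline sentence are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical constructed sample in the assumed AGDT XML layout.
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="ἦλθε" lemma="ἔρχομαι" postag="v3saia---" relation="PRED" head="0"/>
    <word id="2" form="Ἀχιλλεύς" lemma="Ἀχιλλεύς" postag="n-s---mn-" relation="SBJ_CO" head="3"/>
    <word id="3" form="καί" lemma="καί" postag="c--------" relation="COORD" head="1"/>
    <word id="4" form="Πάτροκλος" lemma="Πάτροκλος" postag="n-s---mn-" relation="SBJ_CO" head="3"/>
  </sentence>
</treebank>"""


def find_coordinated_subjects(root, coord_lemma="καί"):
    """Return (sentence id, verb form, verb number, subject forms) for every
    verb > COORD > SBJ_CO chain with at least two coordinated subjects."""
    hits = []
    for sentence in root.iter("sentence"):
        words = {w.get("id"): w for w in sentence.iter("word")}
        # Index each word's dependents by the id of its head.
        children = defaultdict(list)
        for w in words.values():
            children[w.get("head")].append(w)
        for coord in words.values():
            # Coordinator node, matched on lemma as the exercise suggests.
            if coord.get("relation") != "COORD" or coord.get("lemma") != coord_lemma:
                continue
            verb = words.get(coord.get("head"))
            if verb is None:
                continue
            tag = verb.get("postag", "")
            # Keep only verbs whose postag has a number value in position 3.
            if len(tag) != 9 or tag[0] != "v" or tag[2] not in "sdp":
                continue
            subjects = [c for c in children[coord.get("id")]
                        if c.get("relation") == "SBJ_CO"]
            if len(subjects) >= 2:
                hits.append((sentence.get("id"), verb.get("form"), tag[2],
                             [s.get("form") for s in subjects]))
    return hits


for hit in find_coordinated_subjects(ET.fromstring(SAMPLE)):
    print(hit)
```

Grouping the hits by the returned number value (`s` versus `p`/`d`) gives the kind of singular versus "resolved" agreement counts the article discusses; on a real treebank file, `ET.parse(path).getroot()` would replace the inline sample.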