
Treebanking 2: using treebanks

Monica Berti edited this page Mar 31, 2018 · 14 revisions

Sunoikisis Digital Classics Spring/Summer 2018

Session 7. Treebanking 2: using treebanks

Thursday March 8, 2018, 16h00-17h15 Greenwich Mean Time

Convenors: Dag Haug (Oslo), Francesco Mambrini (DAI, Berlin)

YouTube link: https://youtu.be/fYaUMVfCwms

Slides: Dag Haug + Francesco Mambrini

Description

We will start with an introduction to regular expressions. Then we will look at use cases for treebanks and learn how to query them in the INESS tool (http://iness.uib.no) using regular expressions. In the last part of the lecture, we will discuss methods for querying the Ancient Greek and Latin Dependency Treebanks using the related technologies mentioned on the project's website and the (still experimental) gAGDT.

Seminar readings

Other resources

  • Francesco Mambrini and Marco Passarotti (2016). "Subject-Verb Agreement with Coordinated Subjects in Ancient Greek - A Treebank-Based Study", Journal of Greek Linguistics 16.1, pp. 87-116. Available: https://doi.org/10.1163/15699846-01601003

Essay title

tba

Exercise

Regular expressions:

  1. Match all inflectional forms of filia
  2. Match all participles (and as little else as possible)
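As a warm-up for the first exercise, the general technique (one stem followed by an alternation of endings, anchored by word boundaries) can be sketched in Python. The noun "rosa", the sample sentence and the helper name `find_forms` are chosen here for illustration only and are not part of the session material; the sketch also ignores vowel length and any irregular forms.

```python
import re

# Illustrative pattern for the first-declension noun "rosa" (not the exercise word).
# Stem "ros" + one of the regular endings, with \b on both sides so that
# longer words sharing the same opening letters are not matched.
ROSA_FORMS = re.compile(r"\bros(?:arum|ae|am|as|is|a)\b", re.IGNORECASE)


def find_forms(text):
    """Return every token in `text` matched by the pattern, in order."""
    return ROSA_FORMS.findall(text)


# Invented sample text; "rosmarinus" shares the stem but must not match.
sample = "Rosa pulchra est; rosarum odor et rosis corona, sed non rosmarinus."
print(find_forms(sample))  # ['Rosa', 'rosarum', 'rosis']
```

The word boundaries are what keep the pattern from matching too much; the same concern ("as little else as possible") applies to the participle exercise.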

INESS queries: start with the query we developed in class and

  1. change it so that we also find pronominal subjects and objects
  2. change it so we only look at word order in complement clauses
  3. more advanced exercises in the final slide

AGDT queries: Mambrini and Passarotti 2016 (see "Other resources" above) discuss verb-subject number agreement with multiple coordinated subjects. The article shows that in Greek a singular verb with multiple coordinated subjects is not only allowed but numerically predominant over "resolved" (plural/dual) agreement.

How would you replicate the queries to study this issue? Hint: you need to set the following constraints:

  • A verb governing at least 2 coordinated subjects. This translates to a dependency chain of: verb > conj > subject
  • The verb must have a testable number feature: this excludes some forms; which ones?
  • The study also shows that there is a difference between "and-" and "or-" coordinators; you might want to limit the query to the most frequent "and"-conjunction (καί). Bonus question: it is best to use the lemma feature even though καί is invariable; why?
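Outside the query tools, the three constraints above can also be checked with a short script over the treebank XML. The following is only a rough sketch under several assumptions that should be verified against the actual files: that each `word` element carries `id`, `head`, `lemma`, `postag` and `relation` attributes, that the coordinator is labelled `COORD` and its conjuncts `SBJ_CO`, and that the third character of `postag` encodes number. The function name and the one-sentence sample (invented, not taken from the treebank) are hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented toy sentence in an AGDT-like layout (assumed attribute names).
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="Σωκράτης" lemma="Σωκράτης" postag="n-s---mn-" relation="SBJ_CO" head="2"/>
    <word id="2" form="καὶ" lemma="καί" postag="c--------" relation="COORD" head="4"/>
    <word id="3" form="Πλάτων" lemma="Πλάτων" postag="n-s---mn-" relation="SBJ_CO" head="2"/>
    <word id="4" form="λέγει" lemma="λέγω" postag="v3spia---" relation="PRED" head="0"/>
  </sentence>
</treebank>"""


def nfc(text):
    """Normalise to NFC so that differently encoded accents compare equal."""
    return unicodedata.normalize("NFC", text)


def coordinated_subject_verbs(xml_text, conj_lemma="καί"):
    """Find the chain verb > conj > subject with at least two subjects."""
    hits = []
    for sentence in ET.fromstring(xml_text).iter("sentence"):
        words = list(sentence.iter("word"))
        children = defaultdict(list)  # head id -> dependent words
        for word in words:
            children[word.get("head")].append(word)
        for verb in words:
            tag = verb.get("postag", "")
            # Keep only verbs whose number slot (assumed: 3rd character) is filled.
            if len(tag) < 3 or tag[0] != "v" or tag[2] == "-":
                continue
            for conj in children[verb.get("id")]:
                if conj.get("relation") != "COORD":
                    continue
                if nfc(conj.get("lemma", "")) != nfc(conj_lemma):
                    continue
                subjects = [c.get("form") for c in children[conj.get("id")]
                            if c.get("relation") == "SBJ_CO"]
                if len(subjects) >= 2:
                    hits.append((sentence.get("id"), verb.get("form"), tag[2], subjects))
    return hits


print(coordinated_subject_verbs(SAMPLE))
# [('1', 'λέγει', 's', ['Σωκράτης', 'Πλάτων'])]
```

Each hit records the verb's number code, so counting singular against plural/dual hits gives a first approximation of the comparison made in the article; it does not replace the checks the authors describe.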

Try to replicate these queries using Structural Search and (optionally) the gAGDT.
