DC Session 7: Translation alignment
Gabriel Bodard edited this page Mar 19, 2020
Thursday 27 February 2020, 16:00 UK (17:00 CET)
Convenors: Chiara Palladino (Furman University), Tariq Yousef (Leipzig)
YouTube link: https://youtu.be/bEkxR0QUU2E

Session outline:
- Introduction: what is text alignment? (5 mins)
- Ugarit: a tool for text alignment (10 mins)
- Live demo of Ugarit (10 mins)
- Case studies: translation alignment in the classroom (10 mins)
- Applications: automatic translation alignment, graph databases, dynamic lexicon (15 mins)
- Presentation of the exercise: low stakes and high stakes (15 mins)
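Automatic translation alignment, including the "IBM alignments" produced with NLTK in the exercise, rests on the IBM word-alignment models. The sketch below is a minimal, pure-Python illustration of IBM Model 1 trained with expectation-maximization, simplified as in Koehn's textbook presentation (no NULL source word, no fertility or distortion). It is not Tariq's notebook code, and the function names are hypothetical; with NLTK installed, `nltk.translate.IBMModel1` provides a full implementation.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """Estimate t(target_word | source_word) with IBM Model 1 EM.

    Simplified sketch (assumption): no NULL source word is added.
    bitext: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict mapping (target_word, source_word) -> probability.
    """
    target_vocab = {f for _, tgt in bitext for f in tgt}
    # Start from a uniform distribution over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked at all
        for src, tgt in bitext:
            for f in tgt:
                # E-step: share one unit of count for f among all source words.
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: turn expected counts into conditional probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_alignment(src, tgt, t):
    """Link each target position j to the source position i with the highest t(f_j | e_i)."""
    links = set()
    for j, f in enumerate(tgt):
        best_i, best_p = None, 0.0
        for i, e in enumerate(src):
            p = t.get((f, e), 0.0)
            if p > best_p:
                best_i, best_p = i, p
        if best_i is not None:  # words with no nonzero probability stay unaligned
            links.add((best_i, j))
    return links


# Toy German-English corpus modelled on Koehn's "Word-Based Models" chapter.
bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(bitext)
for src, tgt in bitext:
    print(src, tgt, sorted(best_alignment(src, tgt, t)))
```

EM raises t("the" | "das") and lowers t("the" | "Haus") because "das" co-occurs with "the" in two sentences while "Haus" does so in only one. Sorting `t` by probability for a given source word yields a rough word-translation list, one simple way to picture a lexicon derived from alignments.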

Readings:
- Gregory Crane (2019). "Beyond Translation: Language Hacking and Philology." Harvard Data Science Review 1.2. Available: https://doi.org/10.1162/99608f92.282ad764
- Tamara Pataridze & Bastien Kindt (2018). "Text Alignment in Ancient Greek and Georgian: A Case-Study on the First Homily of Gregory of Nazianzus." Journal of Data Mining and Digital Humanities. Available: https://jdmdh.episciences.org/4182/pdf
- David Bamman, Alison Babeu & Gregory Crane (2010). "Transferring Structural Markup Across Translations Using Multilingual Alignment and Projection." Available: http://www.perseus.tufts.edu/publications/jcdl27-bamman.pdf
- Yuri Bizzoni, Federico Boschetti et al. (2014). "The Making of Ancient Greek WordNet." Proceedings of LREC 2014. Available: http://www.lrec-conf.org/proceedings/lrec2014/pdf/1071_Paper.pdf
- Lucia Cocci (2009). "CAT Tools for Beginners." Translation Journal. Available: http://translationjournal.net/journal/50caten.htm
- Gregory Crane et al. (2019). "Confronting Complexity of Babel in a Global and Digital Age." DH2019 Book of Abstracts. Available: https://dev.clariah.nl/files/dh2019/boa/0611.html
- João Graça, Joana Paulo Pardal, Luísa Coheur & Diamantino Caseiro (2008). "Building a Golden Collection of Parallel Multi-Language Word Alignment." LREC: International Conference on Language Resources and Evaluation. Available: https://www.aclweb.org/anthology/L08-1185/
- Philipp Koehn (2009). Statistical Machine Translation, Chapter 4: "Word-Based Models." Cambridge University Press.
- Rada Mihalcea & Ted Pedersen (2003). "An Evaluation Exercise for Word Alignment." Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts, 1–10. Available: http://www.aclweb.org/anthology/W03-0301
- Despoina Panou (2013). "Equivalence in Translation Theories: A Critical Evaluation." Theory and Practice in Language Studies 3.1, 1–6. Available: http://www.academypublication.com/issues/past/tpls/vol03/01/01.pdf
- Raquel de Pedro (1999). "The Translatability of Texts: A Historical Overview." Meta 44.4. Available: http://www3.uji.es/~aferna/EA0921/4a-Translatability.pdf
- Jean Véronis (ed.) (2000). Parallel Text Processing: Alignment and Use of Translation Corpora. Text, Speech and Language Technology. Springer Netherlands. Available: https://www.springer.com/la/book/9780792365464

Exercise:
- Go to Ugarit and create a bilingual alignment of a parallel corpus of your choice (or use our suggestion, the Bible parallel corpus in several languages: https://github.com/SunoikisisDC/SunoikisisDC-2019-2020/tree/master/2020-Digital-Classics-slides/Translation%20Alignment/data/txt). Choose two languages that you are familiar with and focus on the differences between the translations: which words align perfectly? Which words align imperfectly, or not at all? Which words are missing from one of the two texts? What is the overall percentage of matches?
- After completing the bilingual alignment, choose a parallel text in a third language that you do not know and perform a trilingual alignment. See how much of the third language you can align by using the other two languages as an aid to understanding.
- Look at Tariq's Jupyter notebook on translation alignment with NLTK/Python. Edit the notebook to compare two texts of your choice and examine the results. Report any interesting features back to your class.
- Update: Tariq has very kindly added a new Jupyter notebook that should allow you to (a) visualize the automatic IBM-model alignments from the previous exercise directly in the browser; and (b) use the Ugarit visualization with already-aligned sentences from the NLTK Comtrans corpus. Documentation will be added shortly. Please feel free to get in touch if you have any questions about this process.
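For the "percentage of matches" and "missing words" questions, and as a rough text stand-in for a graphical alignment view, here is a minimal sketch. It assumes an alignment is stored as a set of (source index, target index) links; Ugarit and the notebooks may store alignments differently. "Coverage" here (the share of tokens with at least one link) is only one possible definition of the percentage of matches, not Ugarit's own statistic. The links below are an illustrative hand alignment of Genesis 1:1 in German (Luther 1912) and English (King James Version).

```python
# Assumed representation (hypothetical, not Ugarit's export format):
# an alignment is a set of (source_index, target_index) links.


def coverage(src, tgt, links):
    """Share of all tokens (both sides) that take part in at least one link."""
    aligned_src = {i for i, _ in links}
    aligned_tgt = {j for _, j in links}
    total = len(src) + len(tgt)
    return (len(aligned_src) + len(aligned_tgt)) / total if total else 0.0


def unaligned_words(src, tgt, links):
    """Words with no link on each side: candidates for 'missing' words."""
    aligned_src = {i for i, _ in links}
    aligned_tgt = {j for _, j in links}
    return ([w for i, w in enumerate(src) if i not in aligned_src],
            [w for j, w in enumerate(tgt) if j not in aligned_tgt])


def render_grid(src, tgt, links):
    """Plain-text alignment matrix: one row per source word, one column per target word."""
    width = max(len(w) for w in src)
    header = " " * (width + 1) + " ".join(f"{j:>2}" for j in range(len(tgt)))
    rows = [header]
    for i, word in enumerate(src):
        cells = " ".join(" X" if (i, j) in links else " ." for j in range(len(tgt)))
        rows.append(f"{word:<{width}} {cells}")
    # Legend maps column numbers back to target words.
    legend = "  ".join(f"{j}={w}" for j, w in enumerate(tgt))
    return "\n".join(rows + [legend])


# Genesis 1:1 with an illustrative hand alignment ("Am" = "an dem" -> "In the").
de = "Am Anfang schuf Gott Himmel und Erde".split()
en = "In the beginning God created the heaven and the earth".split()
links = {(0, 0), (0, 1), (1, 2), (2, 4), (3, 3), (4, 6), (5, 7), (6, 9)}

print(render_grid(de, en, links))
print(f"coverage: {coverage(de, en, links):.0%}")
print("unaligned:", unaligned_words(de, en, links))
```

Counting tokens on both sides treats a one-to-many link such as "Am" to "In the" as covering three tokens; counting links instead would give a different figure, so it is worth stating which definition you used when reporting back.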