DC Session 4 Python
Thursday Feb 6, 16:00 UK = 17:00 CET
Convenors: Paula Granados García (Open University & The Watercolour World), Matteo Romanello (École polytechnique fédérale de Lausanne)
YouTube link: https://youtu.be/JDxRd-RYkXA
Presentation: run the Jupyter notebooks on Binder:
NB: if you want to edit the notebooks on Binder (a cloud-based platform) – for example to do the exercise – without losing your edits when the session expires (usually after 15 minutes of inactivity), make sure to read this tutorial.
This session will begin with a general discussion of programming for the humanities with a specific focus on how programming languages can be useful to humanists, followed by a general introduction to the Python programming language. We will then look at two key Python libraries (collections of code that enhance Python functionality for specific purposes): Pandas (for structuring and analysing data) and Beautiful Soup (for parsing HTML and XML). These skills will then be illustrated with specific examples and exercises, all of which will be available for your use and adaptation in the Jupyter notebook linked from this session page.
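To give a first flavour of what Pandas offers before the session, here is a minimal, self-contained sketch; the table of inscriptions is invented purely for illustration and is not the session data:

```python
import pandas as pd

# A toy table of inscriptions (invented data, for illustration only)
df = pd.DataFrame({
    "place": ["Athens", "Rome", "Athens", "Ephesus"],
    "century": [-5, 1, -4, 2],
})

# Pandas makes quick summaries easy, e.g. counting occurrences per category
print(df["place"].value_counts())
```

Running this prints a count of inscriptions per findspot, the kind of one-line summary that would otherwise take a loop and a dictionary in plain Python.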
In preparation for this session, please install or activate a version of Jupyter Notebooks on your computer or in the cloud (see below under "Exercise" for links).
- Kestemont, Mike & Justin A. Stover (2016), "The Authorship of the Historia Augusta: Two new computational studies." Bulletin of the Institute of Classical Studies 59.2. Pp. 140–157. Available: https://onlinelibrary.wiley.com/doi/epdf/10.1111/j.2041-5370.2016.12043.x
- Romanello, Matteo (2016). "Exploring Intertextuality in Classics through Citation Networks" Digital Humanities Quarterly 10.2. Available: http://www.digitalhumanities.org/dhq/vol/10/2/000255/000255.html
- Büchler, Marco, et al. (2013), "Measuring the Influence of a Work by Text-Reuse." In ed. Dunn/Mahony, The Digital Classicist 2013. Bulletin of the Institute of Classical Studies, Supplement 122. Pp. 63–79.
- Hawkins, Laura F. (2018). "Computational Models for Analyzing Data Collected from Reconstructed Cuneiform Syllabaries." Digital Humanities Quarterly 12.1. Available: http://digitalhumanities.org:8081/dhq/vol/12/1/000368/000368.html (Wayback Machine)
- McKinney, W. (2011). "pandas: a foundational Python library for data analysis and statistics." Python for High Performance and Scientific Computing 14. Available: https://www.dlr.de/sc/portaldata/15/resources/dokumente/pyhpc2011/submissions/pyhpc2011_submission_9.pdf
- Teodora Petkova (2017). "Semantic Information Extraction: From Data Bits to Knowledge Bytes." Ontotext blog, 22 June 2017. Available: https://ontotext.com/semantic-information-extraction-data-bits-knowledge-bytes/
- Python Programming for the Humanities http://www.karsdorp.io/python-course/
- Programming Historian: https://programminghistorian.org/en/lessons/
- Python for Everybody: https://www.py4e.com/
- Pandas: https://pandas.pydata.org/pandas-docs/version/0.15/tutorials.html
- Charlie Harper (2018). "Visualizing Data with Bokeh and Pandas." Programming Historian. Available: https://programminghistorian.org/en/lessons/visualizing-with-bokeh
- Jeri Wieringa (2012), "Intro to Beautiful Soup." Programming Historian. Available: https://programminghistorian.org/en/lessons/intro-to-beautiful-soup
- To set up your own Jupyter Notebook environment:
- Install Jupyter on your desktop (easiest as part of the Anaconda package) (getting started with Jupyter)
- Set up a Microsoft Azure Notebooks instance online (if you have a Microsoft 365 or Skype account)
- Set up a Google Colab notebooks instance online (if you have Gmail or Google account) (getting started with Colab)
- The session notebooks can also be downloaded as a bundle and run locally using any of the above tools
Exercise description
- You are asked to write a simple Python program by modifying the code we provided in the notebook `Pandas_BeautifulSoup.ipynb`, section "XML data → DataFrame"; the current code looks for the `<name>` element and creates a `DataFrame` out of it. For the exercise you are asked to do something similar, but for a different set of TEI/EpiDoc elements of your choice.
- These are the steps to follow:
  - identify one or more TEI elements of interest (these can be lemmata, variants, bibliographic elements, metadata, etc.);
  - specify what information you want to retain from them, and extract it from the XML (via `BeautifulSoup`) by modifying the code provided;
  - convert it to a `pandas.DataFrame` and explore some statistics (for example by using `value_counts()`).
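The exercise steps above can be sketched as follows. This is only an illustrative outline, not the notebook's own code: the `<persName>` element and the inline XML snippet are invented assumptions standing in for whichever TEI/EpiDoc elements you choose.

```python
# Requires beautifulsoup4 (and lxml for the "xml" parser) plus pandas.
from bs4 import BeautifulSoup
import pandas as pd

# Invented TEI-like snippet, standing in for the real EpiDoc data
xml = """
<TEI>
  <body>
    <persName type="divine">Zeus</persName>
    <persName type="mortal">Achilles</persName>
    <persName type="divine">Athena</persName>
  </body>
</TEI>
"""

# Steps 1-2: pick an element of interest and extract the information to retain
soup = BeautifulSoup(xml, "xml")
records = [
    {"name": el.get_text(), "type": el.get("type")}
    for el in soup.find_all("persName")
]

# Step 3: convert to a DataFrame and explore some statistics
df = pd.DataFrame(records)
print(df["type"].value_counts())
```

The same shape works for any element: change the tag name in `find_all()` and the fields collected in the dictionary, and the rest of the pipeline stays identical.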