Digital Methods for Analysing Texts

This course familiarises PhD students with the main text mining techniques in social science and develops basic skills in digital methods. After completing it, you will be familiar with the theoretical and methodological underpinnings of natural language processing and be able to conduct a basic text analysis. Throughout the course we will focus on applying text analysis to empirical data, where possible related to the students' own research.

Students will become familiar with digital methods in text analysis as a flexible approach that comes with a practical set of research instruments to empirically investigate a range of questions in social science. They will learn how to approach and manage text data, analyse texts, and visualise the results.
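As a rough illustration of what "a basic text analysis" can look like, the sketch below lowercases a few documents, splits them into word tokens with a regular expression, drops stop words, and counts word frequencies. It is a minimal sketch and not taken from the course notebooks: the sample documents, the tiny stop-word list, and the function names `tokenize` and `word_frequencies` are all assumptions made for illustration, and it uses only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; in practice this would be your own research texts.
DOCS = [
    "Text mining helps social scientists analyse large collections of text.",
    "Social scientists analyse text to study language and society.",
]

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOPWORDS = {"a", "and", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(docs):
    """Count non-stop-word tokens across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts


if __name__ == "__main__":
    # Print the five most frequent words with their counts.
    for word, n in word_frequencies(DOCS).most_common(5):
        print(word, n)
```

Libraries such as NLTK and spaCy, both listed in the General Bibliography, provide more robust tokenisers and stop-word lists; the regular expression here only handles unaccented lowercase letters.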

Course schedule

Session date	Session	Lecture topic	Seminar topic
12 April	1	Introduction to text mining	Import text data
14 April	2	Analysing text	Methods for text preprocessing
19 April	3	Analysing words	Methods for word analysis
21 April	4	Topic modelling	Methods for analysing topics
26 April	5	NLP Ethics + live coding	Biases and real example analysis
28 April	6	Text mining in the real world	Analysing your own text

Eligibility

You must be a PhD student at King’s, Queen Mary or Imperial, and you must have already registered as a LISS DTP student via the following link: https://www.liss-dtp.ac.uk/registration/.

Reading list

1. Intro

📝 Turing, A.M. and Haugeland, J., 1950. Computing machinery and intelligence. The Turing Test: Verbal Behavior as the Hallmark of Intelligence, pp.29-56.

📝 Weizenbaum, J., 1966. ELIZA—a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), pp.36-45.

📝 Hutchins, W.J., 2004, September. The Georgetown-IBM experiment demonstrated in January 1954. In Conference of the Association for Machine Translation in the Americas (pp. 102-114). Springer, Berlin, Heidelberg.

🌍 https://www.ibm.com/ibm/history/exhibits/701/701_translator.html

📝 Bender, E.M., Hovy, D. and Schofield, A., 2020, July. Integrating ethics into the NLP curriculum. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts (pp. 6-9).

2. Analysing Text

📝 Friedl, J.E., 2006. Mastering regular expressions. O'Reilly Media, Inc. [Introduction]

📝 Anandarajan, M., Hill, C. and Nolan, T., 2019. Term-document representation. In Practical text analytics (pp. 61-73). Springer, Cham. [Chapters 4 and 5]

📝 Bird, S., Klein, E. and Loper, E., 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc. [Chapters 3 and 7]

3. Analysing words

📝 Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

📝 Rong, X., 2014. word2vec parameter learning explained. arXiv preprint arXiv:1411.2738.

📝 Bird, S., Klein, E. and Loper, E., 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc. [Chapter 9]

4. Topic modelling

📝 Blei, D.M., Ng, A.Y. and Jordan, M.I., 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), pp.993-1022.

📝 Anandarajan, M., Hill, C. and Nolan, T., 2019. Practical text analytics. Springer, Cham. [Chapter 7]

📝 Bird, S., Klein, E. and Loper, E., 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc. [Chapter 6]

5. NLP Ethics

📝 Bender, E.M., Gebru, T., McMillan-Major, A. and Shmitchell, S., 2021, March. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?🦜. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623).

📝 Bolukbasi, T., Chang, K.W., Zou, J.Y., Saligrama, V. and Kalai, A.T., 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 29.

📝 Caliskan, A., Bryson, J.J. and Narayanan, A., 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), pp.183-186.

📝 Garg, N., Schiebinger, L., Jurafsky, D. and Zou, J., 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), pp.E3635-E3644.

6. Text mining in the real world

📝 Anandarajan, M., Hill, C. and Nolan, T., 2019. Practical text analytics. Springer, Cham. [Chapter 12]

📝 Bird, S., Klein, E. and Loper, E., 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc. [Chapter 11]

General Bibliography

📕 Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied text analysis with python: Enabling language-aware data products with machine learning. O'Reilly Media, Inc.

📕 Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc.

📕 Eisenstein, J. (2018). Natural language processing.

📕 Hovy, D. (2020). Text Analysis in Python for Social Scientists: Discovery and Exploration. Cambridge University Press.

📕 Manning, C., & Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press.

🌍 https://course.spacy.io

🌍 https://www.nltk.org

Course featured by: LISS DTP
