Amsterdam University College -- Text Mining -- Winter/Spring 2021.
You can use the Hello World notebooks to check that everything is working.
Week | Topic | Materials |
---|---|---|
1 | Introduction and Python refresher | slides + notebooks 1, 2, 3, 4, 5 |
2 | Introduction to NLP and NLP pipelines | slides + notebook |
3 | Language modelling | slides + notebooks 1, 2 |
4 | Vector space semantics | slides + notebook |
5 | Word embeddings | slides + notebook |
6 | Machine learning fundamentals and PyTorch | slides + notebook |
7 | Text classification | |
8 | Advanced architectures and NER | |
9 | Web scraping and APIs | notebook |
10 | Recommender systems | slides + notebook |
11 | Creating annotated corpora and sentiment analysis | slides + notebook |
12 | Clustering and topic modelling | slides + notebook |
13 | Trendy research topics | |
See the projects folder for info.
- Clone the repository locally: `git clone https://github.com/Giovanni1085/AUC_TMCI_2021.git`
- Get updates (from time to time): `git pull`
- Create a conda environment: `conda create -n myenv python=3.7 anaconda` (where `myenv` is the environment name)
- Activate it: `conda activate myenv`
- Install packages (see the `requirements.txt` file), e.g. `conda install pandas`
- Launch a Jupyter notebook: `jupyter notebook`
- More on conda environments
- Conda cheatsheet
- Getting started with Jupyter notebooks
- On using git and GitHub for version control
Alternatively, use Binder (link above).
A more detailed guide to setting up your environment, with multiple options.
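Once the environment is activated, a quick sanity check along the following lines may help confirm that the packages you need are visible to Python. This is only a minimal sketch: the function name `missing_packages` and the package list are hypothetical, so replace the list with the names actually found in `requirements.txt`.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level module names in `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list for illustration; use the names from requirements.txt.
    wanted = ["pandas", "numpy", "notebook"]
    print("Python", sys.version.split()[0])
    missing = missing_packages(wanted)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Note that the import name of a package can differ from its install name (for example, `scikit-learn` is imported as `sklearn`), so the list should contain import names.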
- The previous-year edition of this course.
- Michael Repplinger, who ran the 2018/19 edition, and Gianluca Lebani, who ran the 2017/18 edition.
- Giovanni Colavizza and Matteo Romanello, Applied Data Analysis course for the Oxford Digital Humanities Summer School
- James Hetherington and Giovanni Colavizza, Research Software Engineering with Python
Everything in this repository which is not already attributed to someone else is released under CC BY 4.0.