This is a toy project used in the tutorial Software Engineering for ML Systems at the EibAIS 2025 school. It provides a starting point for building a machine learning project that follows MLOps practices. The project is organized as follows:
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── models             <- Trained and serialized models, model predictions, or model summaries.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── pyproject.toml     <- Project configuration file with package metadata for
│                         src and configuration for tools like black.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting.
│
└── src                <- Source code for use in this project.
    │
    ├── __init__.py    <- Makes src a Python module.
    │
    ├── config.py      <- Store useful variables and configuration.
    │
    └── main.py        <- Main script to run the project.
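The notebook naming convention in the tree can be checked mechanically. The sketch below is a structural check only, under the assumption that initials and description are lowercase; it cannot tell initials apart from the first word of the description. The pattern and `is_valid_notebook_name` are hypothetical helpers, not part of this project.

```python
import re

# Hypothetical pattern for the convention in the tree (lowercase is assumed):
# <number>[.<number>...]-<initials>-<dash-delimited description>[.ipynb]
NOTEBOOK_NAME = re.compile(
    r"\d+(?:\.\d+)*"              # ordering number, e.g. 1.0
    r"-[a-z]+"                    # creator's initials, e.g. jqp
    r"-[a-z0-9]+(?:-[a-z0-9]+)*"  # short dash-delimited description
    r"(?:\.ipynb)?"               # optional notebook file extension
)


def is_valid_notebook_name(name: str) -> bool:
    """Return True if the whole of `name` follows the naming convention."""
    return NOTEBOOK_NAME.fullmatch(name) is not None
```

For example, `1.0-jqp-initial-data-exploration.ipynb` passes, while `initial-data-exploration.ipynb` (no ordering number) does not.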
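`src/config.py` is described only as a place for useful variables and configuration. One common pattern, shown here as a hypothetical sketch rather than this project's actual contents, is to derive every directory in the tree from the project root so that scripts never hard-code paths. The function name `build_paths` and the dictionary keys are assumptions.

```python
from __future__ import annotations

from pathlib import Path


def build_paths(proj_root: Path) -> dict[str, Path]:
    """Map hypothetical names to the directories shown in the project tree."""
    data_dir = proj_root / "data"
    reports_dir = proj_root / "reports"
    return {
        "data": data_dir,
        "raw_data": data_dir / "raw",
        "interim_data": data_dir / "interim",
        "processed_data": data_dir / "processed",
        "external_data": data_dir / "external",
        "models": proj_root / "models",
        "reports": reports_dir,
        "figures": reports_dir / "figures",
    }


# Inside src/config.py the project root would be the parent of src/, e.g.:
# PATHS = build_paths(Path(__file__).resolve().parents[1])
```

Passing the root in explicitly, instead of computing it at import time, keeps the helper easy to test from any location.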
To complete this exercise, you will need access to the following:
- A GitHub account
- A DagsHub account linked to your GitHub account
- A Hugging Face account and an access token. You can create a token in your profile settings under "Access Tokens"; make sure to give the token the `write` scope. For more details, see the Hugging Face documentation on user access tokens.
- Git
- The uv dependency manager
- Python 3.11 or later (can be installed using uv)
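The accounts and tokens above cannot be verified locally, but the local tools can. The following sketch checks the Python version and whether `git` and `uv` are on `PATH`; the helper names are hypothetical, and `which` is a parameter only so the check can be exercised without the tools installed.

```python
from __future__ import annotations

import shutil
import sys
from typing import Callable, Iterable, Optional

# Minimum Python version and command-line tools from the list above.
MIN_PYTHON = (3, 11)
REQUIRED_TOOLS = ("git", "uv")


def python_is_supported(version: Iterable[int]) -> bool:
    """Return True if `version` (e.g. sys.version_info[:2]) meets MIN_PYTHON."""
    return tuple(version) >= MIN_PYTHON


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not python_is_supported(sys.version_info[:2]):
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {sys.version.split()[0]}")
    for tool in missing_tools():
        print(f"Not found on PATH: {tool}")
```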
If this is your first time using GitHub, follow GitHub's guide on connecting to GitHub with SSH to set up your local Git installation.
Note: Attendees using Windows 10/11 are strongly encouraged to use the Windows Subsystem for Linux (WSL). For setup instructions, see Microsoft's WSL installation guide.
- Fork this repository to your GitHub account (see GitHub's documentation on forking a repository).
- Connect your forked repository to your DagsHub account (see DagsHub's documentation on connecting a GitHub repository).
- Clone the forked repository to your local machine.
- Copy the `.env-template` file to `.env` and fill in the required variables. The `.env` file stores the environment variables used by the project.
- Install the required dependencies using `uv`: `uv sync`
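Before running the project, it can help to confirm that `.env` exists and has no empty required variables. This is a minimal sketch: the variable names in `REQUIRED_VARS` are hypothetical placeholders (check `.env-template` for the real ones), and the parser handles only plain `KEY=VALUE` lines, blank lines and `#` comments.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical variable names: replace with the ones listed in .env-template.
REQUIRED_VARS = ("HF_TOKEN", "DAGSHUB_USERNAME")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def missing_vars(env: dict[str, str], required=REQUIRED_VARS) -> list[str]:
    """Return required variables that are absent or empty."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print(".env not found: copy .env-template to .env first")
    else:
        missing = missing_vars(parse_env(env_file.read_text()))
        print(f"Fill in: {', '.join(missing)}" if missing else ".env looks complete")
```

For loading the variables into the process environment, a library such as python-dotenv (`load_dotenv`) handles more of the `.env` syntax than this sketch does.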
For this exercise, we will use the Emotion dataset introduced in the following paper:
@inproceedings{saravia-etal-2018-carer,
title = "{CARER}: Contextualized Affect Representations for Emotion Recognition",
author = "Saravia, Elvis and
Liu, Hsien-Chi Toby and
Huang, Yen-Hao and
Wu, Junlin and
Chen, Yi-Shin",
booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
month = oct # "-" # nov,
year = "2018",
address = "Brussels, Belgium",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D18-1404",
doi = "10.18653/v1/D18-1404",
pages = "3687--3697",
abstract = "Emotions are expressed in nuanced ways, which varies by collective or individual experiences, knowledge, and beliefs. Therefore, to understand emotion, as conveyed through text, a robust mechanism capable of capturing and modeling different linguistic nuances and phenomena is needed. We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. The pattern-based representations are further enriched with word embeddings and evaluated through several emotion recognition tasks. Our experimental results demonstrate that the proposed method outperforms state-of-the-art techniques on emotion recognition tasks.",
}
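The dataset labels each text with one of six emotions encoded as integers. The sketch below maps label ids to names and computes the class distribution, a quick sanity check before training. The label order is an assumption taken from the Hugging Face `dair-ai/emotion` dataset card; verify it against the copy of the data you actually use. The helper names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Assumed label order (from the dair-ai/emotion dataset card): id -> name.
EMOTION_LABELS = ("sadness", "joy", "love", "anger", "fear", "surprise")


def id_to_label(label_id: int) -> str:
    """Translate an integer label id into its emotion name."""
    if not 0 <= label_id < len(EMOTION_LABELS):
        raise ValueError(f"unknown label id: {label_id}")
    return EMOTION_LABELS[label_id]


def label_distribution(label_ids: Iterable[int]) -> dict[str, float]:
    """Fraction of examples per emotion, in the label order above."""
    counts = Counter(id_to_label(i) for i in label_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts[name] / total for name in EMOTION_LABELS}
```

Assuming the Hugging Face `datasets` library is installed, `load_dataset("dair-ai/emotion")` returns splits with `text` and `label` columns, so `label_distribution(ds["train"]["label"])` would show how imbalanced the training split is.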