This repository contains the code of our group project for the course Deep Learning (AS22 ETH Zürich).
We use the causal framework from ReLIC (Mitrovic et al., 2020) to modify the patch localization pretext task (Doersch et al., 2015). Given a center patch and one of its eight neighboring patches, we generate two patch localization tasks by applying two random style augmentations to the neighboring patch. In addition to minimizing the cross-entropy losses from the two localization tasks, we also minimize the KL divergence between the two output probability distributions to enforce style invariance in the embedding network.
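As a rough illustration of this objective, here is a minimal sketch assuming a PyTorch implementation; the tensor names (`logits_a`, `logits_b`, `target`) and the `kl_weight` factor are placeholders and do not mirror the exact interface of our loss in `src/loss.py`.

```python
import torch
import torch.nn.functional as F

def relic_patch_loss(logits_a, logits_b, target, kl_weight=1.0):
    """Cross-entropy on both augmented localization tasks plus a symmetric
    KL term that pulls the two predicted distributions together.

    logits_a, logits_b: (batch, 8) scores over the 8 neighbor positions,
                        one tensor per style augmentation of the neighbor patch.
    target:             (batch,) index of the true relative position.
    """
    ce = F.cross_entropy(logits_a, target) + F.cross_entropy(logits_b, target)

    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    # Symmetric KL divergence between the two output distributions.
    kl = F.kl_div(log_p_a, log_p_b, log_target=True, reduction="batchmean") \
       + F.kl_div(log_p_b, log_p_a, log_target=True, reduction="batchmean")

    return ce + kl_weight * kl
```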
- Elior Ben Arous
- Dustin Brunner
- Jonathan Manz
- Felix Yang
- Create an account and log into the ImageNet website.
- Download the following samples and labels. For computational reasons, we use the ImageNet validation set as our dataset. Note: the labels are only used to create stratified subsets of the validation set with respect to the image classes, so that we get a balanced dataset when experimenting with fewer samples.
- Unzip the samples and labels and put both folders into the `data` directory. All images should be located at `./data/ILSVRC2012_img_val/*.JPEG`.
- Download and unzip the following dataset.
- Put the directory again into the `data` directory. It should be located at `./data/tiny-imagenet-200` (a quick way to check both dataset locations is sketched below).
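This snippet is not part of the repository, but if you want to sanity-check the data layout after the steps above, something like the following should report 50,000 validation images and confirm the Tiny ImageNet directory:

```python
from pathlib import Path

data_dir = Path("./data")
n_val = len(list((data_dir / "ILSVRC2012_img_val").glob("*.JPEG")))
tiny_ok = (data_dir / "tiny-imagenet-200").is_dir()

print(f"ImageNet validation images found: {n_val}")  # expected: 50000
print(f"tiny-imagenet-200 present: {tiny_ok}")
```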
- Install miniconda (installation instructions).
- Create a new environment named `ESIPL` with Python and activate it:
  ```
  conda create -n ESIPL python
  conda activate ESIPL
  ```
- Install the requirements listed in `requirements.txt` in the root directory of the repository:
  ```
  pip install -r requirements.txt
  ```
- Optional: Install additional packages in the environment such as Jupyter Notebook or JupyterLab (installation instructions).
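As a quick sanity check that the environment works (assuming PyTorch is one of the packages in `requirements.txt`, which is not listed here), you can run:

```python
# Hypothetical sanity check -- assumes PyTorch was installed via requirements.txt.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```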
Follow these steps to train our final model and to reproduce our results from the downstream task:
- Follow the setup instructions above.
- Run `python run_pretext_script.py` to train our final pretext task model. All the best parameters are already set. You can optionally specify a pretext experiment ID (not necessary). Note: training takes about 8.5 hours on the Euler cluster.
- Run `python run_downstream_script.py` to evaluate the trained pretext task model on the downstream task. If you changed the pretext experiment ID in the previous step, you need to use the same pretext ID in this script!
- `run_pretext_script.py`: Script to run pretext task experiments.
- `run_downstream_script.py`: Script to run downstream task experiments.
- `optuna_pretext_script.py`: Script to find good hyperparameters for the pretext tasks using Optuna.
- `optuna_downstream_script.py`: Script to find good hyperparameters for the downstream tasks using Optuna.
- `StyleAugmentations.ipynb`: Visualizes the various transformations and augmentations we apply to the ImageNet images.
- `EmbeddingVisualization.ipynb`: Analyzes, compares, and visualizes the embedding spaces of the original method and our method.
- `HyperparameterOptimization.ipynb`: Explains how to optimize hyperparameters using Optuna.
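For readers unfamiliar with Optuna, the `optuna_*_script.py` files follow the standard objective/study pattern sketched below; the search space and the dummy objective here are purely illustrative and not our actual configuration.

```python
import optuna

def objective(trial):
    # Hypothetical search space -- the actual scripts in this repo define their own.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-3, log=True)
    # In the real scripts, a pretext/downstream model would be trained here and its
    # validation loss returned; we return a dummy score so the sketch runs on its own.
    return (lr - 1e-3) ** 2 + weight_decay

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print("Best parameters:", study.best_params)
```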
- `src/dataset.py`: Our custom dataset classes for the pretext and downstream tasks.
- `src/loss.py`: Custom loss function for our proposed pretext tasks.
- `src/models.py`: Our pretext and downstream models with support for different backbones.
- `src/optuna.py`: Functions to find good hyperparameters with Optuna.
- `src/train.py`: Training loop for both pretext and downstream tasks.
- `src/transforms.py`: Image transformations and augmentations.
- `src/utils.py`: Logging, model saving, checkpointing, plotting, etc.
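To make the pretext data flow concrete, here is a minimal sketch of how a training example could be assembled (a center patch, one of its eight neighbors, and two styled copies of that neighbor). It illustrates the idea described at the top of this README; the augmentations and the function interface are made up and do not mirror `src/dataset.py` or `src/transforms.py`.

```python
import random
import torch
import torchvision.transforms as T

# Hypothetical style augmentation; our actual augmentations live in src/transforms.py.
style_augment = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.2),
])

def make_patch_pair(image, patch_size=96):
    """Crop a 3x3 grid of patches, pick the center and one random neighbor,
    and return two differently styled versions of that neighbor.

    image: a (C, H, W) float tensor in [0, 1], at least 3 * patch_size on each side.
    """
    p = patch_size
    # The 8 neighbor positions around the center, as (row, col) grid coordinates.
    neighbors = [(r, c) for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    label = random.randrange(8)
    r, c = neighbors[label]

    center = image[:, p:2 * p, p:2 * p]
    neighbor = image[:, r * p:(r + 1) * p, c * p:(c + 1) * p]

    # Two independent style augmentations of the same neighbor patch.
    view_a, view_b = style_augment(neighbor), style_augment(neighbor)
    return center, view_a, view_b, label
```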
We wrote most of the code from scratch, using this repository by Microsoft as a guide for the structure of our training loop and for logging the training progress.