This repo contains experiments comparing the accuracy of transfer learning versus training from scratch, across several network architectures, for image classification.
I implemented several simple CNN architectures with an increasing number of feature maps at each layer plus dropout. I also transfer-learned from the convolutional bases of the pre-trained Inception-ResNet V2 and VGG16 architectures. On top of those bases, I experimented with:

- flattening versus global average pooling the final convolutional layer;
- the depth and regularization of the final fully connected layers;
- "fine-tune" training the last few layers of the pre-trained models at very low learning rates;
- "warm-start" training that continues from the most promising previously trained models;
- applying different image augmentations (stretching, rotating, cropping) to the training data set.
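For concreteness, here is a minimal sketch of the transfer-learning setup described above, assuming TensorFlow/Keras. The layer sizes, dropout rate, and optimizer are illustrative assumptions; see code/build_models.py for the exact configurations tested.

```python
# Sketch of a transfer-learning model: frozen VGG16 base, global average
# pooling, and a small regularized dense head. Values are illustrative.
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import VGG16

# Load the pre-trained convolutional base without its classifier head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the base for the initial training phase

# Global average pooling (one of the two pooling strategies compared).
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(6, activation="softmax")(x)  # 5 recyclable classes + trash

model = Model(base.input, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# For "fine-tune" training, unfreeze the last few base layers and recompile
# with a very low learning rate, e.g.:
# for layer in base.layers[-4:]:
#     layer.trainable = True
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), ...)
```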
This project is an attempt to practice iterative, recorded, and reproducible search for optimal hyper-parameters and architecture.
The nbrun package is used to execute a base experiment notebook with different combinations of parameters specifying the model architecture and other hyper-parameters. Each time an experiment is run, a copy of the notebook is saved (as .ipynb and .html), allowing reproducibility and later reference.
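A minimal sketch of launching such parameterized runs, assuming nbrun's `run_notebook(notebook, nb_kwargs=...)` interface; the keys passed in `nb_kwargs` are hypothetical, not this repo's exact parameter names.

```python
# Execute the base notebook once per parameter combination; nbrun injects
# nb_kwargs as variables into the notebook and saves an executed copy.
from nbrun import run_notebook

for model_name in ["VGG16 Model", "Inception-ResNet V2 w. Dropout Model"]:
    run_notebook(
        "code/Base Experiment.ipynb",
        nb_kwargs={"model_name": model_name, "epochs": 70},  # hypothetical keys
    )
```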
A logging framework is also defined which records metrics from each experiment as well as specifications of the data generators and models. Combined with saved model weights, the logged specifications allow the model pipeline to be reconstructed to predict on new data in a new Python session without re-training the models.
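A minimal sketch of what rebuilding a logged model in a fresh session might look like; the spec file layout and the `build_model` helper are hypothetical stand-ins for this repo's actual logging framework and build_models.py.

```python
# Rebuild a trained pipeline from its logged specification and saved weights.
# The spec keys and build_model helper below are hypothetical illustrations.
import json
import numpy as np

with open("spec.json") as f:                  # hypothetical path to a logged spec
    spec = json.load(f)

model = build_model(**spec["model_kwargs"])   # rebuild the architecture
model.load_weights(spec["weights_path"])      # restore trained weights

new_images = np.zeros((1, 224, 224, 3))       # placeholder image batch
predictions = model.predict(new_images)       # predict without re-training
```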
A plot of the training loss and accuracy is also saved from each experiment.
Example Training Plot
- `code/Base Experiment.ipynb` - the base notebook used to run experiments.
- `code/build_models.py` - defines all of the model architectures tested.
- `code/build_data_gens.py` - defines the data augmentation generators used (an illustrative sketch follows this list).
- `code/saved experiments/` - stores the saved copies of each experiment notebook.
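As an illustration of the kind of augmentation generator defined in code/build_data_gens.py, here is a hedged sketch using Keras's ImageDataGenerator; the parameter values and data directory are assumptions, not this repo's exact settings.

```python
# An augmentation generator applying the kinds of transforms described
# above (stretching, rotating, zooming); values are illustrative.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values
    shear_range=0.2,          # stretching
    rotation_range=20,        # rotating
    zoom_range=0.2,           # zooming/cropping
    horizontal_flip=True,
).flow_from_directory(
    "data/train",             # hypothetical data directory
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
)
```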
The data comes from this Kaggle data set: https://www.kaggle.com/asdasdasasdas/garbage-classification
The data consists of images of five classes of recyclable material and one category of trash. The high degree of regularity in the training images makes these models VERY poor at external generalization, i.e. while the models can accurately detect an image of paper from this dataset, they will do a poor job of identifying a random image of paper from the internet. That is because the images in the data set are all single items on a white background, under consistent lighting, at a uniform distance; random images from the internet will not share those features.
The top ten highest-performing models are listed below.
Note: See code/build_models.py for the exact model configurations represented by the (non-timestamp) names in the "MODEL" column.
Note: Names in the "MODEL" column that are timestamps represent "warm-started" models which continued training from a previously trained configuration.
Weighted Avg. Recall | EPOCH | MODEL | run_id |
---|---|---|---|
0.859756 | 70.0 | VGG16 Fine-tuning | 2020-02-09_21h53m57s |
0.841463 | 40.0 | 2020-02-07_01h10m05s | 2020-02-09_14h25m27s |
0.814024 | 40.0 | 2020-02-07_01h36m15s | 2020-02-09_14h07m18s |
0.807927 | 70.0 | 2020-02-08_23h29m06s | 2020-02-09_15h50m36s |
0.786585 | 100.0 | VGG16 Model | 2020-02-09_04h52m57s |
0.777439 | 40.0 | 2020-02-09_07h23m26s | 2020-02-09_17h27m39s |
0.777439 | 100.0 | Inception-ResNet V2 finetuning final-module | 2020-02-09_07h23m26s |
0.774390 | NaN | Lite Test | 2020-02-07_01h36m15s |
0.768293 | NaN | Lite Test | 2020-02-07_01h10m05s |
0.746951 | 70.0 | Inception-ResNet V2 w. Dropout Model | 2020-02-09_12h22m07s |