This is the repo for our course project in DD2424 Deep Learning in Data Science at KTH.
This project is a GoogLeNet implementation of Fully Convolutional Networks for Semantic Segmentation (CVPR 2015) in TensorFlow. Another TensorFlow implementation: FCN.tensorflow.
Our project is mainly based on these previous works, with several changes of our own. We attach our report and slides (with several introductory pages skipped for the presentation) here for reference.
Model Downloads
We provide models trained on two different datasets - PASCAL VOC 2012 and MIT Scene Parsing. Please download the corresponding folder, rename it to `logs`, and put it in your local repo to replace the old one. For more details, please read the subsection Visualize and test results.
Detailed Origins
- The "main" function was from FCN.tensorflow/FCN.py with a little bit adaptation from slim/train_image_classifier.py with a few FLAGs and the "two-step" training procedure.
- The network took inception_v3 directly and warm-started at checkpoint.
- The upsampling layers were defined using slim.
- The utility functions came from various projects with corresponding datasets.
- The bash scripts were written for PDC clusters with GPU-accelerated nodes.
In the original paper Fully Convolutional Networks for Semantic Segmentation, the authors reported several results for FCN-GoogLeNet and compared them with FCN-VGG16. The results showed worse performance for GoogLeNet than for VGG16 on semantic segmentation tasks. Two things make this conclusion questionable:
- Their GoogLeNet implementation is still not open-sourced, even though their repo documentation mentions that it is coming soon.
- When the authors performed their training, they used their own reimplementation of GoogLeNet as the pre-trained model since there was no publicly available version of GoogLeNet at that time.
Given the above two points, we were quite curious how it would perform if a publicly available version of GoogLeNet were actually put to use, and it also seemed a good opportunity to fill the gap of an open-source FCN-GoogLeNet. That is basically why we made this repo.
- Pre-trained model: VGG16 -> GoogLeNet (inception v3)
- Framework: Caffe -> TensorFlow
- Datasets: PASCAL VOC 2012 (20 classes) + MIT Scene Parsing (150 classes)
- Convolutionalize GoogLeNet into FCN-GoogLeNet
- Add upsampling layers on the top
- Fuse skip layers in network
- Fine-tune the whole net end to end (a rough sketch of these changes follows below)
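The last three items can be sketched in TF-Slim roughly as follows. This is an illustrative sketch, not the repo's actual code; the endpoint names (`Mixed_7c`, `Mixed_6e`), kernel sizes, and strides are assumptions based on the standard slim `inception_v3`.

```python
import tensorflow as tf
from tensorflow.contrib.slim.nets import inception

slim = tf.contrib.slim

NUM_OF_CLASSES = 21  # PASCAL VOC: 20 object classes + background


def fcn_googlenet(images, num_classes=NUM_OF_CLASSES, is_training=True):
    """Illustrative sketch: convolutionalized Inception v3 with upsampling and one skip fusion."""
    # Backbone: end_points exposes intermediate feature maps for skip connections.
    with slim.arg_scope(inception.inception_v3_arg_scope()):
        _, end_points = inception.inception_v3(
            images, num_classes=num_classes,
            is_training=is_training, spatial_squeeze=False)

    # Reopen the InceptionV3 scope so the new variables land under
    # InceptionV3/Upsampling, matching the trainable_scopes value used later.
    with tf.variable_scope('InceptionV3'):
        with tf.variable_scope('Upsampling'):
            # 1x1 convolution: per-class score map from the coarsest features.
            coarse = slim.conv2d(end_points['Mixed_7c'], num_classes, [1, 1],
                                 activation_fn=None, scope='score_coarse')
            # Learned 2x upsampling (transposed convolution).
            up2 = slim.conv2d_transpose(coarse, num_classes, [4, 4],
                                        stride=2, scope='up2')
            # Skip connection: score an earlier, finer feature map and fuse by addition.
            skip = slim.conv2d(end_points['Mixed_6e'], num_classes, [1, 1],
                               activation_fn=None, scope='score_skip')
            up2 = tf.image.resize_images(up2, tf.shape(skip)[1:3])
            fused = up2 + skip
            # Upsample the fused scores back to the input resolution.
            logits = slim.conv2d_transpose(fused, num_classes, [16, 16],
                                           stride=8, scope='up8')
            logits = tf.image.resize_images(logits, tf.shape(images)[1:3])
    return logits
```

Fusing the `Mixed_6e` scores here mirrors the FCN-16s idea; adding a second fusion from an even earlier endpoint would give an FCN-8s-style variant.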
- Python 3.5.0+
- TensorFlow 1.0+
- matplotlib 2.0.2+
- Cython 0.22+
- PyDenseCRF v2
First download the model checkpoints (PASCAL VOC and MIT Scene Parsing) we've trained and put them in the folder `logs`, replacing any existing checkpoints. Note that if the directory `logs/all` doesn't exist, please create it with `mkdir FCN-GoogLeNet/logs/all`. Then change the flag `tf.flags.DEFINE_string('mode', "visualize", "Mode: train/ test/ visualize")` at the beginning of the script `inception_FCN.py` to set the mode to visualize or test results. After that, run `python inception_FCN.py` from the terminal. The segmentation results are saved in the folder `results`.
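For orientation, the flags referenced in this README are plain `tf.flags` definitions near the top of `inception_FCN.py`; the snippet below is a rough reconstruction, and the default values and help strings are illustrative rather than the script's exact ones.

```python
import tensorflow as tf

# Illustrative flag definitions (defaults and help strings are approximate).
tf.flags.DEFINE_string('mode', "visualize", "Mode: train/ test/ visualize")
tf.flags.DEFINE_string('data_dir', '/path/to/dataset',
                       'Path to the PASCAL VOC / MIT Scene Parsing data.')
tf.flags.DEFINE_string('checkpoint_path', 'logs/inception_v3.ckpt',
                       'The path to a checkpoint from which to fine-tune.')

FLAGS = tf.flags.FLAGS
```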
After training the FCN (or downloading our models), you can launch TensorBoard by typing `tensorboard --logdir=logs/all` in the terminal from inside the folder `FCN-GoogLeNet`. Then open your web browser and navigate to `localhost:6006`. You should now see graphs of the pixelwise training loss and validation loss.
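Those curves are just scalar summaries written into `logs/all`; below is a minimal sketch of how such summaries can be produced (the actual tensor and tag names in the script may differ).

```python
import tensorflow as tf

# Stand-in for the pixelwise cross-entropy loss computed in the graph.
loss = tf.placeholder(tf.float32, name='pixelwise_loss')

train_summary = tf.summary.scalar('training_loss', loss)
val_summary = tf.summary.scalar('validation_loss', loss)

# logs/all is the directory passed to --logdir above.
writer = tf.summary.FileWriter('logs/all', graph=tf.get_default_graph())
# In the training loop:
#   writer.add_summary(sess.run(train_summary, {loss: current_loss}), global_step)
```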
The following operations are needed if you want to train your own model from scratch.
First delete all the files in `logs` and `logs/all`. After this, you need to provide the path to a checkpoint from which to fine-tune. You can download the Inception v3 model checkpoint and change `tf.flags.DEFINE_string('checkpoint_path', '/path/to/checkpoint', 'The path to a checkpoint from which to fine-tune.')` accordingly. To avoid problems, it is easiest to copy the Inception v3 checkpoint directly into `logs` and change the above flag to `tf.flags.DEFINE_string('checkpoint_path', 'logs/inception_v3.ckpt', ...)`, although this is admittedly not the most elegant approach. Training the whole net takes two steps:
(1) Add the upsampling layers on top of Inception v3; freeze the lower layers and train only the output layer of the pretrained model and the upsampling layers:
To achieve this, change `tf.flags.DEFINE_string('trainable_scopes', ...)` to `'InceptionV3/Logits,InceptionV3/Upsampling'`. Make sure you've set the `skip_layers` flag to the architecture you want. Set the mode to train and run `inception_FCN.py`. If the code is to run on PDC clusters, run `sbatch ./batchpyjobunix.sh` to submit the job to the Slurm queuing system.
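In other words, step (1) boils down to flag values like the following (only the values quoted above come from the script; the help strings are assumptions).

```python
import tensorflow as tf

# Step (1): freeze the backbone, train only the logits and the new upsampling layers.
tf.flags.DEFINE_string('mode', "train", "Mode: train/ test/ visualize")
tf.flags.DEFINE_string('trainable_scopes',
                       'InceptionV3/Logits,InceptionV3/Upsampling',
                       'Comma-separated list of scopes to train; None trains all variables.')
```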
(2) Fine-tune all the variables:
Change `tf.flags.DEFINE_string('trainable_scopes', ...)` to `None`. Also remember to change `tf.flags.DEFINE_string('checkpoint_path', ...)` to `'logs'`. Run `inception_FCN.py` again. As before, on PDC clusters submit the job with `sbatch ./batchpyjobunix.sh`.
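Step (2) then flips the same flags so that everything is fine-tuned from the checkpoint written in step (1) (again, values from the text above; help strings assumed).

```python
import tensorflow as tf

# Step (2): fine-tune all variables, restoring from the checkpoint saved to logs/ in step (1).
tf.flags.DEFINE_string('checkpoint_path', 'logs',
                       'The path to a checkpoint from which to fine-tune.')
tf.flags.DEFINE_string('trainable_scopes', None,
                       'Comma-separated list of scopes to train; None trains all variables.')
```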
To train and test the FCN on MIT Scene Parsing, two scripts need to be changed manually, as follows. Afterwards, you can play around with this new dataset according to the steps mentioned above.
(1) Script `inception_FCN.py`:
- Import the module `read_MITSceneParsingData` and comment out `read_PascalVocData` (lines 8-9);
- Change the `data_dir` flag to the path of the MIT Scene Parsing dataset (lines 20-21);
- Set the variable `NUM_OF_CLASSES = 151` (line 59).
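These edits to `inception_FCN.py` would look roughly like this (the import alias and the dataset path are placeholders, not the script's exact contents).

```python
import tensorflow as tf

# Lines 8-9: swap the dataset reader.
import read_MITSceneParsingData as scene_parsing
# import read_PascalVocData as scene_parsing

# Lines 20-21: point the data_dir flag at the MIT Scene Parsing dataset.
tf.flags.DEFINE_string('data_dir', '/path/to/MITSceneParsing/',
                       'Path to the MIT Scene Parsing dataset.')

# Line 59: 150 object classes plus background.
NUM_OF_CLASSES = 151
```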
(2) Script `BatchDatSetReader.py`:
- Change `...[np.expand_dims(self._transform(filename['annotation'], True), ...)` to `...[np.expand_dims(self._transform(filename['annotation'], False), ...)` (line 39).
- The VGG16 net outperforms GoogLeNet on semantic segmentation tasks;
- A performance drop is expected when the FCN is fed a dataset containing a large number of object classes;
- Objects with more examples in the training set, e.g. vehicles and humans, are easier to identify correctly.
- Rewrite the code so that some manual operations (like copying the pretrained model or changing file paths) can be avoided;
- Play around with the parameters of the FCN trained on PASCAL VOC and try to find a better initialization; try to implement grid search or random search for some major parameters;
- Train FCNs on other segmentation datasets such as MS COCO.
http://techtalks.tv/talks/fully-convolutional-networks-for-semantic-segmentation/61606/
http://cs231n.github.io/convolutional-networks/#convert
This blog and its TensorFlow Image Segmentation project can also be useful.