+
+## Contacts (Maintainers)
+
+* Liang-Chieh Chen, github: [aquariusjay](https://github.com/aquariusjay)
+* YuKun Zhu, github: [yknzhu](https://github.com/YknZhu)
+* George Papandreou, github: [gpapan](https://github.com/gpapan)
+* Hui Hui, github: [huihui-personal](https://github.com/huihui-personal)
+* Maxwell D. Collins, github: [mcollinswisc](https://github.com/mcollinswisc)
+* Ting Liu, github: [tingliu](https://github.com/tingliu)
+
+## Table of Contents
+
+Demo:
+
+* Colab notebook for off-the-shelf inference.
+
+Running:
+
+* Installation.
+* Running DeepLab on PASCAL VOC 2012 semantic segmentation dataset.
+* Running DeepLab on Cityscapes semantic segmentation dataset.
+* Running DeepLab on ADE20K semantic segmentation dataset.
+
+Models:
+
+* Checkpoints and frozen inference graphs.
+
+Misc:
+
+* Please check the FAQ if you have questions before reporting an issue.
+
+## Getting Help
+
+To get help with issues you may encounter while using the DeepLab TensorFlow
+implementation, create a new question on
+[StackOverflow](https://stackoverflow.com/) with the tag "tensorflow".
+
+Please report bugs (i.e., broken code, not usage questions) to the
+tensorflow/models GitHub [issue
+tracker](https://github.com/tensorflow/models/issues), prefixing the issue name
+with "deeplab".
+
+## License
+
+All the code in the deeplab folder is covered by the [LICENSE](https://github.com/tensorflow/models/blob/master/LICENSE)
+under tensorflow/models. Please refer to the LICENSE for details.
+
+## Change Logs
+
+### March 26, 2020
+* Supported EdgeTPU-DeepLab and EdgeTPU-DeepLab-slim on Cityscapes.
+**Contributor**: Yun Long.
+
+### November 20, 2019
+* Supported MobileNetV3 large and small model variants on Cityscapes.
+**Contributor**: Yukun Zhu.
+
+
+### March 27, 2019
+
+* Supported using different loss weights on different classes during training.
+**Contributor**: Yuwei Yang.
+
+
+### March 26, 2019
+
+* Supported ResNet-v1-18. **Contributor**: Michalis Raptis.
+
+
+### March 6, 2019
+
+* Released the evaluation code (under the `evaluation` folder) for image
+parsing, a.k.a. panoptic segmentation. In particular, the released code supports
+evaluating the parsing results in terms of both the parsing covering and
+panoptic quality metrics. **Contributors**: Maxwell Collins and Ting Liu.
+
+
+### February 6, 2019
+
+* Updated decoder module to exploit multiple low-level features with different
+output_strides.
+
+### December 3, 2018
+
+* Released the MobileNet-v2 checkpoint on ADE20K.
+
+
+### November 19, 2018
+
+* Supported NAS architecture for feature extraction. **Contributor**: Chenxi Liu.
+
+* Supported hard pixel mining during training.
+
+
+### October 1, 2018
+
+* Released MobileNet-v2 (depth-multiplier = 0.5) COCO-pretrained checkpoints on
+PASCAL VOC 2012, and an Xception-65 COCO-pretrained checkpoint (i.e., not
+pretrained on PASCAL).
+
+
+### September 5, 2018
+
+* Released Cityscapes pretrained checkpoints using the best-found dense prediction cell.
+
+
+### May 26, 2018
+
+* Updated ADE20K pretrained checkpoint.
+
+
+### May 18, 2018
+* Added builders for ResNet-v1 and Xception model variants.
+* Added ADE20K support, including colormap and pretrained Xception_65 checkpoint.
+* Fixed a bug when using a non-default depth_multiplier for MobileNet-v2.
+
+
+### March 22, 2018
+
+* Released checkpoints using MobileNet-V2 as network backbone and pretrained on
+PASCAL VOC 2012 and Cityscapes.
+
+
+### March 5, 2018
+
+* First release of DeepLab in TensorFlow including deeper Xception network
+backbone. Included checkpoints that have been pretrained on PASCAL VOC 2012
+and Cityscapes.
+
+## References
+
+1. **Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs**
+ Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille (+ equal
+ contribution).
+ [[link]](https://arxiv.org/abs/1412.7062). In ICLR, 2015.
+
+2. **DeepLab: Semantic Image Segmentation with Deep Convolutional Nets,**
+ **Atrous Convolution, and Fully Connected CRFs**
+    Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille (+ equal
+ contribution).
+ [[link]](http://arxiv.org/abs/1606.00915). TPAMI 2017.
+
+3. **Rethinking Atrous Convolution for Semantic Image Segmentation**
+ Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam.
+ [[link]](http://arxiv.org/abs/1706.05587). arXiv: 1706.05587, 2017.
+
+4. **Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation**
+ Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam.
+ [[link]](https://arxiv.org/abs/1802.02611). In ECCV, 2018.
+
+5. **ParseNet: Looking Wider to See Better**
+    Wei Liu, Andrew Rabinovich, Alexander C. Berg
+ [[link]](https://arxiv.org/abs/1506.04579). arXiv:1506.04579, 2015.
+
+6. **Pyramid Scene Parsing Network**
+ Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
+ [[link]](https://arxiv.org/abs/1612.01105). In CVPR, 2017.
+
+7. **Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift**
+ Sergey Ioffe, Christian Szegedy
+ [[link]](https://arxiv.org/abs/1502.03167). In ICML, 2015.
+
+8. **MobileNetV2: Inverted Residuals and Linear Bottlenecks**
+ Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
+ [[link]](https://arxiv.org/abs/1801.04381). In CVPR, 2018.
+
+9. **Xception: Deep Learning with Depthwise Separable Convolutions**
+ François Chollet
+ [[link]](https://arxiv.org/abs/1610.02357). In CVPR, 2017.
+
+10. **Deformable Convolutional Networks -- COCO Detection and Segmentation Challenge 2017 Entry**
+ Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, Jifeng Dai
+ [[link]](http://presentations.cocodataset.org/COCO17-Detect-MSRA.pdf). ICCV COCO Challenge
+ Workshop, 2017.
+
+11. **TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems**
+ M. Abadi, A. Agarwal, et al.
+ [[link]](https://arxiv.org/abs/1603.04467). arXiv:1603.04467, 2016.
+
+12. **The Pascal Visual Object Classes Challenge – A Retrospective**
+    Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John
+    Winn, and Andrew Zisserman.
+ [[link]](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/). IJCV, 2014.
+
+13. **The Cityscapes Dataset for Semantic Urban Scene Understanding**
+ Cordts, Marius, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele.
+ [[link]](https://www.cityscapes-dataset.com/). In CVPR, 2016.
+
+14. **Deep Residual Learning for Image Recognition**
+ Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
+ [[link]](https://arxiv.org/abs/1512.03385). In CVPR, 2016.
+
+15. **Progressive Neural Architecture Search**
+ Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy.
+ [[link]](https://arxiv.org/abs/1712.00559). In ECCV, 2018.
+
+16. **Searching for MobileNetV3**
+ Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam.
+ [[link]](https://arxiv.org/abs/1905.02244). In ICCV, 2019.
diff --git a/deeplab/models/research/deeplab/__init__.py b/deeplab/models/research/deeplab/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/deeplab/models/research/deeplab/common.py b/deeplab/models/research/deeplab/common.py
new file mode 100644
index 0000000..928f717
--- /dev/null
+++ b/deeplab/models/research/deeplab/common.py
@@ -0,0 +1,295 @@
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Provides flags that are common to scripts.
+
+Common flags from train/eval/vis/export_model.py are collected in this script.
+"""
+import collections
+import copy
+import json
+import tensorflow as tf
+
+flags = tf.app.flags
+
+# Flags for input preprocessing.
+
+flags.DEFINE_integer('min_resize_value', None,
+ 'Desired size of the smaller image side.')
+
+flags.DEFINE_integer('max_resize_value', None,
+ 'Maximum allowed size of the larger image side.')
+
+flags.DEFINE_integer('resize_factor', None,
+ 'Resized dimensions are multiple of factor plus one.')
+
+flags.DEFINE_boolean('keep_aspect_ratio', True,
+ 'Keep aspect ratio after resizing or not.')
+
+# Model dependent flags.
+
+flags.DEFINE_integer('logits_kernel_size', 1,
+ 'The kernel size for the convolutional kernel that '
+ 'generates logits.')
+
+# When using 'mobilenet_v2', we set atrous_rates = decoder_output_stride = None.
+# When using 'xception_65' or 'resnet_v1' model variants, we set
+# atrous_rates = [6, 12, 18] (output stride 16) and decoder_output_stride = 4.
+# See core/feature_extractor.py for supported model variants.
+flags.DEFINE_string('model_variant', 'mobilenet_v2', 'DeepLab model variant.')
+
+flags.DEFINE_multi_float('image_pyramid', None,
+ 'Input scales for multi-scale feature extraction.')
+
+flags.DEFINE_boolean('add_image_level_feature', True,
+ 'Add image level feature.')
+
+flags.DEFINE_list(
+ 'image_pooling_crop_size', None,
+ 'Image pooling crop size [height, width] used in the ASPP module. When '
+    'value is None, the model performs image pooling with "crop_size". This '
+ 'flag is useful when one likes to use different image pooling sizes.')
+
+flags.DEFINE_list(
+ 'image_pooling_stride', '1,1',
+ 'Image pooling stride [height, width] used in the ASPP image pooling. ')
+
+flags.DEFINE_boolean('aspp_with_batch_norm', True,
+ 'Use batch norm parameters for ASPP or not.')
+
+flags.DEFINE_boolean('aspp_with_separable_conv', True,
+ 'Use separable convolution for ASPP or not.')
+
+# Defaults to None. Set multi_grid = [1, 2, 4] when using provided
+# 'resnet_v1_{50,101}_beta' checkpoints.
+flags.DEFINE_multi_integer('multi_grid', None,
+ 'Employ a hierarchy of atrous rates for ResNet.')
+
+flags.DEFINE_float('depth_multiplier', 1.0,
+ 'Multiplier for the depth (number of channels) for all '
+ 'convolution ops used in MobileNet.')
+
+flags.DEFINE_integer('divisible_by', None,
+ 'An integer that ensures the layer # channels are '
+ 'divisible by this value. Used in MobileNet.')
+
+# For `xception_65`, use decoder_output_stride = 4. For `mobilenet_v2`, use
+# decoder_output_stride = None.
+flags.DEFINE_list('decoder_output_stride', None,
+ 'Comma-separated list of strings with the number specifying '
+                  'output stride of low-level features at each network level. '
+                  'The current semantic segmentation implementation assumes at '
+                  'most one output stride (i.e., either None or a list with '
+                  'only one element).')
+
+flags.DEFINE_boolean('decoder_use_separable_conv', True,
+ 'Employ separable convolution for decoder or not.')
+
+flags.DEFINE_enum('merge_method', 'max', ['max', 'avg'],
+ 'Scheme to merge multi scale features.')
+
+flags.DEFINE_boolean(
+ 'prediction_with_upsampled_logits', True,
+ 'When performing prediction, there are two options: (1) bilinear '
+ 'upsampling the logits followed by softmax, or (2) softmax followed by '
+ 'bilinear upsampling.')
+
+flags.DEFINE_string(
+ 'dense_prediction_cell_json',
+ '',
+ 'A JSON file that specifies the dense prediction cell.')
+
+flags.DEFINE_integer(
+ 'nas_stem_output_num_conv_filters', 20,
+ 'Number of filters of the stem output tensor in NAS models.')
+
+flags.DEFINE_bool('nas_use_classification_head', False,
+ 'Use image classification head for NAS model variants.')
+
+flags.DEFINE_bool('nas_remove_os32_stride', False,
+ 'Remove the stride in the output stride 32 branch.')
+
+flags.DEFINE_bool('use_bounded_activation', False,
+ 'Whether or not to use bounded activations. Bounded '
+ 'activations better lend themselves to quantized inference.')
+
+flags.DEFINE_boolean('aspp_with_concat_projection', True,
+ 'ASPP with concat projection.')
+
+flags.DEFINE_boolean('aspp_with_squeeze_and_excitation', False,
+ 'ASPP with squeeze and excitation.')
+
+flags.DEFINE_integer('aspp_convs_filters', 256, 'ASPP convolution filters.')
+
+flags.DEFINE_boolean('decoder_use_sum_merge', False,
+ 'Decoder uses simply sum merge.')
+
+flags.DEFINE_integer('decoder_filters', 256, 'Decoder filters.')
+
+flags.DEFINE_boolean('decoder_output_is_logits', False,
+ 'Use decoder output as logits or not.')
+
+flags.DEFINE_boolean('image_se_uses_qsigmoid', False, 'Use q-sigmoid.')
+
+flags.DEFINE_multi_float(
+ 'label_weights', None,
+ 'A list of label weights, each element represents the weight for the label '
+ 'of its index, for example, label_weights = [0.1, 0.5] means the weight '
+ 'for label 0 is 0.1 and the weight for label 1 is 0.5. If set as None, all '
+ 'the labels have the same weight 1.0.')
+
+flags.DEFINE_float('batch_norm_decay', 0.9997, 'Batchnorm decay.')
+
+FLAGS = flags.FLAGS
+
+# Constants
+
+# Perform semantic segmentation predictions.
+OUTPUT_TYPE = 'semantic'
+
+# Semantic segmentation item names.
+LABELS_CLASS = 'labels_class'
+IMAGE = 'image'
+HEIGHT = 'height'
+WIDTH = 'width'
+IMAGE_NAME = 'image_name'
+LABEL = 'label'
+ORIGINAL_IMAGE = 'original_image'
+
+# Test set name.
+TEST_SET = 'test'
+
+
+class ModelOptions(
+ collections.namedtuple('ModelOptions', [
+ 'outputs_to_num_classes',
+ 'crop_size',
+ 'atrous_rates',
+ 'output_stride',
+ 'preprocessed_images_dtype',
+ 'merge_method',
+ 'add_image_level_feature',
+ 'image_pooling_crop_size',
+ 'image_pooling_stride',
+ 'aspp_with_batch_norm',
+ 'aspp_with_separable_conv',
+ 'multi_grid',
+ 'decoder_output_stride',
+ 'decoder_use_separable_conv',
+ 'logits_kernel_size',
+ 'model_variant',
+ 'depth_multiplier',
+ 'divisible_by',
+ 'prediction_with_upsampled_logits',
+ 'dense_prediction_cell_config',
+ 'nas_architecture_options',
+ 'use_bounded_activation',
+ 'aspp_with_concat_projection',
+ 'aspp_with_squeeze_and_excitation',
+ 'aspp_convs_filters',
+ 'decoder_use_sum_merge',
+ 'decoder_filters',
+ 'decoder_output_is_logits',
+ 'image_se_uses_qsigmoid',
+ 'label_weights',
+ 'sync_batch_norm_method',
+ 'batch_norm_decay',
+ ])):
+ """Immutable class to hold model options."""
+
+ __slots__ = ()
+
+ def __new__(cls,
+ outputs_to_num_classes,
+ crop_size=None,
+ atrous_rates=None,
+ output_stride=8,
+ preprocessed_images_dtype=tf.float32):
+ """Constructor to set default values.
+
+ Args:
+ outputs_to_num_classes: A dictionary from output type to the number of
+ classes. For example, for the task of semantic segmentation with 21
+ semantic classes, we would have outputs_to_num_classes['semantic'] = 21.
+ crop_size: A tuple [crop_height, crop_width].
+ atrous_rates: A list of atrous convolution rates for ASPP.
+ output_stride: The ratio of input to output spatial resolution.
+ preprocessed_images_dtype: The type after the preprocessing function.
+
+ Returns:
+ A new ModelOptions instance.
+ """
+ dense_prediction_cell_config = None
+ if FLAGS.dense_prediction_cell_json:
+ with tf.gfile.Open(FLAGS.dense_prediction_cell_json, 'r') as f:
+ dense_prediction_cell_config = json.load(f)
+ decoder_output_stride = None
+ if FLAGS.decoder_output_stride:
+ decoder_output_stride = [
+ int(x) for x in FLAGS.decoder_output_stride]
+ if sorted(decoder_output_stride, reverse=True) != decoder_output_stride:
+        raise ValueError('Decoder output stride needs to be sorted in '
+                         'descending order.')
+ image_pooling_crop_size = None
+ if FLAGS.image_pooling_crop_size:
+ image_pooling_crop_size = [int(x) for x in FLAGS.image_pooling_crop_size]
+ image_pooling_stride = [1, 1]
+ if FLAGS.image_pooling_stride:
+ image_pooling_stride = [int(x) for x in FLAGS.image_pooling_stride]
+ label_weights = FLAGS.label_weights
+ if label_weights is None:
+ label_weights = 1.0
+ nas_architecture_options = {
+ 'nas_stem_output_num_conv_filters': (
+ FLAGS.nas_stem_output_num_conv_filters),
+ 'nas_use_classification_head': FLAGS.nas_use_classification_head,
+ 'nas_remove_os32_stride': FLAGS.nas_remove_os32_stride,
+ }
+ return super(ModelOptions, cls).__new__(
+ cls, outputs_to_num_classes, crop_size, atrous_rates, output_stride,
+ preprocessed_images_dtype,
+ FLAGS.merge_method,
+ FLAGS.add_image_level_feature,
+ image_pooling_crop_size,
+ image_pooling_stride,
+ FLAGS.aspp_with_batch_norm,
+ FLAGS.aspp_with_separable_conv,
+ FLAGS.multi_grid,
+ decoder_output_stride,
+ FLAGS.decoder_use_separable_conv,
+ FLAGS.logits_kernel_size,
+ FLAGS.model_variant,
+ FLAGS.depth_multiplier,
+ FLAGS.divisible_by,
+ FLAGS.prediction_with_upsampled_logits,
+ dense_prediction_cell_config,
+ nas_architecture_options,
+ FLAGS.use_bounded_activation,
+ FLAGS.aspp_with_concat_projection,
+ FLAGS.aspp_with_squeeze_and_excitation,
+ FLAGS.aspp_convs_filters,
+ FLAGS.decoder_use_sum_merge,
+ FLAGS.decoder_filters,
+ FLAGS.decoder_output_is_logits,
+ FLAGS.image_se_uses_qsigmoid,
+ label_weights,
+ 'None',
+ FLAGS.batch_norm_decay)
+
+ def __deepcopy__(self, memo):
+ return ModelOptions(copy.deepcopy(self.outputs_to_num_classes),
+ self.crop_size,
+ self.atrous_rates,
+ self.output_stride,
+ self.preprocessed_images_dtype)
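+
+
+# A minimal usage sketch (the class count, crop size, and atrous rates below
+# are illustrative values, not defaults defined in this file); command-line
+# flags must have been parsed first, e.g. via tf.app.run():
+#
+#   model_options = ModelOptions(
+#       outputs_to_num_classes={OUTPUT_TYPE: 21},
+#       crop_size=[513, 513],
+#       atrous_rates=[6, 12, 18],
+#       output_stride=16)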
diff --git a/deeplab/models/research/deeplab/common_test.py b/deeplab/models/research/deeplab/common_test.py
new file mode 100644
index 0000000..45b64e5
--- /dev/null
+++ b/deeplab/models/research/deeplab/common_test.py
@@ -0,0 +1,52 @@
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Tests for common.py."""
+import copy
+
+import tensorflow as tf
+
+from deeplab import common
+
+
+class CommonTest(tf.test.TestCase):
+
+ def testOutputsToNumClasses(self):
+ num_classes = 21
+ model_options = common.ModelOptions(
+ outputs_to_num_classes={common.OUTPUT_TYPE: num_classes})
+ self.assertEqual(model_options.outputs_to_num_classes[common.OUTPUT_TYPE],
+ num_classes)
+
+ def testDeepcopy(self):
+ num_classes = 21
+ model_options = common.ModelOptions(
+ outputs_to_num_classes={common.OUTPUT_TYPE: num_classes})
+ model_options_new = copy.deepcopy(model_options)
+ self.assertEqual((model_options_new.
+ outputs_to_num_classes[common.OUTPUT_TYPE]),
+ num_classes)
+
+ num_classes_new = 22
+ model_options_new.outputs_to_num_classes[common.OUTPUT_TYPE] = (
+ num_classes_new)
+ self.assertEqual(model_options.outputs_to_num_classes[common.OUTPUT_TYPE],
+ num_classes)
+ self.assertEqual((model_options_new.
+ outputs_to_num_classes[common.OUTPUT_TYPE]),
+ num_classes_new)
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/convert_to_tflite.py b/deeplab/models/research/deeplab/convert_to_tflite.py
new file mode 100644
index 0000000..d23ce9e
--- /dev/null
+++ b/deeplab/models/research/deeplab/convert_to_tflite.py
@@ -0,0 +1,112 @@
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tools to convert a quantized deeplab model to tflite."""
+
+from absl import app
+from absl import flags
+import numpy as np
+from PIL import Image
+import tensorflow as tf
+
+
+flags.DEFINE_string('quantized_graph_def_path', None,
+ 'Path to quantized graphdef.')
+flags.DEFINE_string('output_tflite_path', None, 'Output TFlite model path.')
+flags.DEFINE_string(
+ 'input_tensor_name', None,
+ 'Input tensor to TFlite model. This usually should be the input tensor to '
+ 'model backbone.'
+)
+flags.DEFINE_string(
+ 'output_tensor_name', 'ArgMax:0',
+ 'Output tensor name of TFlite model. By default we output the raw semantic '
+ 'label predictions.'
+)
+flags.DEFINE_string(
+ 'test_image_path', None,
+ 'Path to an image to test the consistency between input graphdef / '
+ 'converted tflite model.'
+)
+
+FLAGS = flags.FLAGS
+
+
+def convert_to_tflite(quantized_graphdef,
+ backbone_input_tensor,
+ output_tensor):
+ """Helper method to convert quantized deeplab model to TFlite."""
+ with tf.Graph().as_default() as graph:
+ tf.graph_util.import_graph_def(quantized_graphdef, name='')
+ sess = tf.compat.v1.Session()
+
+ tflite_input = graph.get_tensor_by_name(backbone_input_tensor)
+ tflite_output = graph.get_tensor_by_name(output_tensor)
+ converter = tf.compat.v1.lite.TFLiteConverter.from_session(
+ sess, [tflite_input], [tflite_output])
+ converter.inference_type = tf.compat.v1.lite.constants.QUANTIZED_UINT8
+ input_arrays = converter.get_input_arrays()
+ converter.quantized_input_stats = {input_arrays[0]: (127.5, 127.5)}
+ return converter.convert()
+
+
+def check_tflite_consistency(graph_def, tflite_model, image_path):
+ """Runs tflite and frozen graph on same input, check their outputs match."""
+ # Load tflite model and check input size.
+ interpreter = tf.lite.Interpreter(model_content=tflite_model)
+ interpreter.allocate_tensors()
+ input_details = interpreter.get_input_details()
+ output_details = interpreter.get_output_details()
+ height, width = input_details[0]['shape'][1:3]
+
+ # Prepare input image data.
+ with tf.io.gfile.GFile(image_path, 'rb') as f:
+ image = Image.open(f)
+ image = np.asarray(image.convert('RGB').resize((width, height)))
+ image = np.expand_dims(image, 0)
+
+ # Output from tflite model.
+ interpreter.set_tensor(input_details[0]['index'], image)
+ interpreter.invoke()
+ output_tflite = interpreter.get_tensor(output_details[0]['index'])
+
+ with tf.Graph().as_default():
+ tf.graph_util.import_graph_def(graph_def, name='')
+ with tf.compat.v1.Session() as sess:
+ # Note here the graph will include preprocessing part of the graph
+ # (e.g. resize, pad, normalize). Given the input image size is at the
+ # crop size (backbone input size), resize / pad should be an identity op.
+ output_graph = sess.run(
+ FLAGS.output_tensor_name, feed_dict={'ImageTensor:0': image})
+
+ print('%.2f%% pixels have matched semantic labels.' % (
+ 100 * np.mean(output_graph == output_tflite)))
+
+
+def main(unused_argv):
+ with tf.io.gfile.GFile(FLAGS.quantized_graph_def_path, 'rb') as f:
+ graph_def = tf.compat.v1.GraphDef.FromString(f.read())
+ tflite_model = convert_to_tflite(
+ graph_def, FLAGS.input_tensor_name, FLAGS.output_tensor_name)
+
+ if FLAGS.output_tflite_path:
+ with tf.io.gfile.GFile(FLAGS.output_tflite_path, 'wb') as f:
+ f.write(tflite_model)
+
+ if FLAGS.test_image_path:
+ check_tflite_consistency(graph_def, tflite_model, FLAGS.test_image_path)
+
+
+if __name__ == '__main__':
+ app.run(main)
diff --git a/deeplab/models/research/deeplab/core/__init__.py b/deeplab/models/research/deeplab/core/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/deeplab/models/research/deeplab/core/conv2d_ws.py b/deeplab/models/research/deeplab/core/conv2d_ws.py
new file mode 100644
index 0000000..9aaaf33
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/conv2d_ws.py
@@ -0,0 +1,369 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Augment slim.conv2d with optional Weight Standardization (WS).
+
+WS is a normalization method to accelerate micro-batch training. When used with
+Group Normalization and trained with 1 image/GPU, WS is able to match or
+outperform the performances of BN trained with large batch sizes.
+[1] Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille
+ Weight Standardization. arXiv:1903.10520
+[2] Lei Huang, Xianglong Liu, Yang Liu, Bo Lang, Dacheng Tao
+ Centered Weight Normalization in Accelerating Training of Deep Neural
+ Networks. ICCV 2017
+"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf
+from tensorflow.contrib import framework as contrib_framework
+from tensorflow.contrib import layers as contrib_layers
+
+from tensorflow.contrib.layers.python.layers import layers
+from tensorflow.contrib.layers.python.layers import utils
+
+
+class Conv2D(tf.keras.layers.Conv2D, tf.layers.Layer):
+ """2D convolution layer (e.g. spatial convolution over images).
+
+ This layer creates a convolution kernel that is convolved
+ (actually cross-correlated) with the layer input to produce a tensor of
+ outputs. If `use_bias` is True (and a `bias_initializer` is provided),
+ a bias vector is created and added to the outputs. Finally, if
+ `activation` is not `None`, it is applied to the outputs as well.
+ """
+
+ def __init__(self,
+ filters,
+ kernel_size,
+ strides=(1, 1),
+ padding='valid',
+ data_format='channels_last',
+ dilation_rate=(1, 1),
+ activation=None,
+ use_bias=True,
+ kernel_initializer=None,
+ bias_initializer=tf.zeros_initializer(),
+ kernel_regularizer=None,
+ bias_regularizer=None,
+ use_weight_standardization=False,
+ activity_regularizer=None,
+ kernel_constraint=None,
+ bias_constraint=None,
+ trainable=True,
+ name=None,
+ **kwargs):
+ """Constructs the 2D convolution layer.
+
+ Args:
+ filters: Integer, the dimensionality of the output space (i.e. the number
+ of filters in the convolution).
+ kernel_size: An integer or tuple/list of 2 integers, specifying the height
+ and width of the 2D convolution window. Can be a single integer to
+ specify the same value for all spatial dimensions.
+ strides: An integer or tuple/list of 2 integers, specifying the strides of
+ the convolution along the height and width. Can be a single integer to
+ specify the same value for all spatial dimensions. Specifying any stride
+ value != 1 is incompatible with specifying any `dilation_rate` value !=
+ 1.
+ padding: One of `"valid"` or `"same"` (case-insensitive).
+ data_format: A string, one of `channels_last` (default) or
+ `channels_first`. The ordering of the dimensions in the inputs.
+ `channels_last` corresponds to inputs with shape `(batch, height, width,
+ channels)` while `channels_first` corresponds to inputs with shape
+ `(batch, channels, height, width)`.
+ dilation_rate: An integer or tuple/list of 2 integers, specifying the
+ dilation rate to use for dilated convolution. Can be a single integer to
+ specify the same value for all spatial dimensions. Currently, specifying
+ any `dilation_rate` value != 1 is incompatible with specifying any
+ stride value != 1.
+ activation: Activation function. Set it to None to maintain a linear
+ activation.
+ use_bias: Boolean, whether the layer uses a bias.
+ kernel_initializer: An initializer for the convolution kernel.
+ bias_initializer: An initializer for the bias vector. If None, the default
+ initializer will be used.
+ kernel_regularizer: Optional regularizer for the convolution kernel.
+ bias_regularizer: Optional regularizer for the bias vector.
+ use_weight_standardization: Boolean, whether the layer uses weight
+ standardization.
+ activity_regularizer: Optional regularizer function for the output.
+ kernel_constraint: Optional projection function to be applied to the
+ kernel after being updated by an `Optimizer` (e.g. used to implement
+ norm constraints or value constraints for layer weights). The function
+ must take as input the unprojected variable and must return the
+ projected variable (which must have the same shape). Constraints are not
+ safe to use when doing asynchronous distributed training.
+ bias_constraint: Optional projection function to be applied to the bias
+ after being updated by an `Optimizer`.
+ trainable: Boolean, if `True` also add variables to the graph collection
+ `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
+ name: A string, the name of the layer.
+ **kwargs: Arbitrary keyword arguments passed to tf.keras.layers.Conv2D
+ """
+
+ super(Conv2D, self).__init__(
+ filters=filters,
+ kernel_size=kernel_size,
+ strides=strides,
+ padding=padding,
+ data_format=data_format,
+ dilation_rate=dilation_rate,
+ activation=activation,
+ use_bias=use_bias,
+ kernel_initializer=kernel_initializer,
+ bias_initializer=bias_initializer,
+ kernel_regularizer=kernel_regularizer,
+ bias_regularizer=bias_regularizer,
+ activity_regularizer=activity_regularizer,
+ kernel_constraint=kernel_constraint,
+ bias_constraint=bias_constraint,
+ trainable=trainable,
+ name=name,
+ **kwargs)
+ self.use_weight_standardization = use_weight_standardization
+
+ def call(self, inputs):
+ if self.use_weight_standardization:
+ mean, var = tf.nn.moments(self.kernel, [0, 1, 2], keep_dims=True)
+ kernel = (self.kernel - mean) / tf.sqrt(var + 1e-5)
+ outputs = self._convolution_op(inputs, kernel)
+ else:
+ outputs = self._convolution_op(inputs, self.kernel)
+
+ if self.use_bias:
+ if self.data_format == 'channels_first':
+ if self.rank == 1:
+ # tf.nn.bias_add does not accept a 1D input tensor.
+ bias = tf.reshape(self.bias, (1, self.filters, 1))
+ outputs += bias
+ else:
+ outputs = tf.nn.bias_add(outputs, self.bias, data_format='NCHW')
+ else:
+ outputs = tf.nn.bias_add(outputs, self.bias, data_format='NHWC')
+
+ if self.activation is not None:
+ return self.activation(outputs)
+ return outputs
+
+
+@contrib_framework.add_arg_scope
+def conv2d(inputs,
+ num_outputs,
+ kernel_size,
+ stride=1,
+ padding='SAME',
+ data_format=None,
+ rate=1,
+ activation_fn=tf.nn.relu,
+ normalizer_fn=None,
+ normalizer_params=None,
+ weights_initializer=contrib_layers.xavier_initializer(),
+ weights_regularizer=None,
+ biases_initializer=tf.zeros_initializer(),
+ biases_regularizer=None,
+ use_weight_standardization=False,
+ reuse=None,
+ variables_collections=None,
+ outputs_collections=None,
+ trainable=True,
+ scope=None):
+ """Adds a 2D convolution followed by an optional batch_norm layer.
+
+ `convolution` creates a variable called `weights`, representing the
+ convolutional kernel, that is convolved (actually cross-correlated) with the
+ `inputs` to produce a `Tensor` of activations. If a `normalizer_fn` is
+ provided (such as `batch_norm`), it is then applied. Otherwise, if
+ `normalizer_fn` is None and a `biases_initializer` is provided then a `biases`
+  variable would be created and added to the activations. Finally, if
+ `activation_fn` is not `None`, it is applied to the activations as well.
+
+ Performs atrous convolution with input stride/dilation rate equal to `rate`
+ if a value > 1 for any dimension of `rate` is specified. In this case
+ `stride` values != 1 are not supported.
+
+ Args:
+ inputs: A Tensor of rank N+2 of shape `[batch_size] + input_spatial_shape +
+ [in_channels]` if data_format does not start with "NC" (default), or
+ `[batch_size, in_channels] + input_spatial_shape` if data_format starts
+ with "NC".
+ num_outputs: Integer, the number of output filters.
+ kernel_size: A sequence of N positive integers specifying the spatial
+ dimensions of the filters. Can be a single integer to specify the same
+ value for all spatial dimensions.
+ stride: A sequence of N positive integers specifying the stride at which to
+ compute output. Can be a single integer to specify the same value for all
+ spatial dimensions. Specifying any `stride` value != 1 is incompatible
+ with specifying any `rate` value != 1.
+ padding: One of `"VALID"` or `"SAME"`.
+ data_format: A string or None. Specifies whether the channel dimension of
+ the `input` and output is the last dimension (default, or if `data_format`
+ does not start with "NC"), or the second dimension (if `data_format`
+ starts with "NC"). For N=1, the valid values are "NWC" (default) and
+ "NCW". For N=2, the valid values are "NHWC" (default) and "NCHW". For
+ N=3, the valid values are "NDHWC" (default) and "NCDHW".
+ rate: A sequence of N positive integers specifying the dilation rate to use
+ for atrous convolution. Can be a single integer to specify the same value
+ for all spatial dimensions. Specifying any `rate` value != 1 is
+ incompatible with specifying any `stride` value != 1.
+ activation_fn: Activation function. The default value is a ReLU function.
+ Explicitly set it to None to skip it and maintain a linear activation.
+ normalizer_fn: Normalization function to use instead of `biases`. If
+ `normalizer_fn` is provided then `biases_initializer` and
+ `biases_regularizer` are ignored and `biases` are not created nor added.
+      Default is None for no normalizer function.
+ normalizer_params: Normalization function parameters.
+ weights_initializer: An initializer for the weights.
+ weights_regularizer: Optional regularizer for the weights.
+ biases_initializer: An initializer for the biases. If None skip biases.
+ biases_regularizer: Optional regularizer for the biases.
+ use_weight_standardization: Boolean, whether the layer uses weight
+ standardization.
+ reuse: Whether or not the layer and its variables should be reused. To be
+ able to reuse the layer scope must be given.
+ variables_collections: Optional list of collections for all the variables or
+      a dictionary containing a different list of collections per variable.
+ outputs_collections: Collection to add the outputs.
+ trainable: If `True` also add variables to the graph collection
+ `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
+ scope: Optional scope for `variable_scope`.
+
+ Returns:
+ A tensor representing the output of the operation.
+
+ Raises:
+ ValueError: If `data_format` is invalid.
+    ValueError: If both `rate` and `stride` are not uniformly 1.
+ """
+ if data_format not in [None, 'NWC', 'NCW', 'NHWC', 'NCHW', 'NDHWC', 'NCDHW']:
+ raise ValueError('Invalid data_format: %r' % (data_format,))
+
+ # pylint: disable=protected-access
+ layer_variable_getter = layers._build_variable_getter({
+ 'bias': 'biases',
+ 'kernel': 'weights'
+ })
+ # pylint: enable=protected-access
+ with tf.variable_scope(
+ scope, 'Conv', [inputs], reuse=reuse,
+ custom_getter=layer_variable_getter) as sc:
+ inputs = tf.convert_to_tensor(inputs)
+ input_rank = inputs.get_shape().ndims
+
+ if input_rank != 4:
+ raise ValueError('Convolution expects input with rank %d, got %d' %
+ (4, input_rank))
+
+ data_format = ('channels_first' if data_format and
+ data_format.startswith('NC') else 'channels_last')
+ layer = Conv2D(
+ filters=num_outputs,
+ kernel_size=kernel_size,
+ strides=stride,
+ padding=padding,
+ data_format=data_format,
+ dilation_rate=rate,
+ activation=None,
+ use_bias=not normalizer_fn and biases_initializer,
+ kernel_initializer=weights_initializer,
+ bias_initializer=biases_initializer,
+ kernel_regularizer=weights_regularizer,
+ bias_regularizer=biases_regularizer,
+ use_weight_standardization=use_weight_standardization,
+ activity_regularizer=None,
+ trainable=trainable,
+ name=sc.name,
+ dtype=inputs.dtype.base_dtype,
+ _scope=sc,
+ _reuse=reuse)
+ outputs = layer.apply(inputs)
+
+ # Add variables to collections.
+ # pylint: disable=protected-access
+ layers._add_variable_to_collections(layer.kernel, variables_collections,
+ 'weights')
+ if layer.use_bias:
+ layers._add_variable_to_collections(layer.bias, variables_collections,
+ 'biases')
+ # pylint: enable=protected-access
+ if normalizer_fn is not None:
+ normalizer_params = normalizer_params or {}
+ outputs = normalizer_fn(outputs, **normalizer_params)
+
+ if activation_fn is not None:
+ outputs = activation_fn(outputs)
+ return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
+
+
+def conv2d_same(inputs, num_outputs, kernel_size, stride, rate=1, scope=None):
+ """Strided 2-D convolution with 'SAME' padding.
+
+ When stride > 1, then we do explicit zero-padding, followed by conv2d with
+ 'VALID' padding.
+
+ Note that
+
+ net = conv2d_same(inputs, num_outputs, 3, stride=stride)
+
+ is equivalent to
+
+ net = conv2d(inputs, num_outputs, 3, stride=1, padding='SAME')
+ net = subsample(net, factor=stride)
+
+ whereas
+
+ net = conv2d(inputs, num_outputs, 3, stride=stride, padding='SAME')
+
+ is different when the input's height or width is even, which is why we add the
+ current function. For more details, see ResnetUtilsTest.testConv2DSameEven().
+
+ Args:
+ inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
+ num_outputs: An integer, the number of output filters.
+ kernel_size: An int with the kernel_size of the filters.
+ stride: An integer, the output stride.
+ rate: An integer, rate for atrous convolution.
+ scope: Scope.
+
+ Returns:
+ output: A 4-D tensor of size [batch, height_out, width_out, channels] with
+ the convolution output.
+ """
+ if stride == 1:
+ return conv2d(
+ inputs,
+ num_outputs,
+ kernel_size,
+ stride=1,
+ rate=rate,
+ padding='SAME',
+ scope=scope)
+ else:
+ kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
+ pad_total = kernel_size_effective - 1
+ pad_beg = pad_total // 2
+ pad_end = pad_total - pad_beg
+ inputs = tf.pad(inputs,
+ [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
+ return conv2d(
+ inputs,
+ num_outputs,
+ kernel_size,
+ stride=stride,
+ rate=rate,
+ padding='VALID',
+ scope=scope)
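+
+
+# A minimal usage sketch (shapes and scope names are illustrative only),
+# mirroring the patterns exercised in conv2d_ws_test.py:
+#
+#   images = tf.random_uniform((1, 65, 65, 3))
+#   net = conv2d(images, 32, [3, 3], use_weight_standardization=True)
+#   net = conv2d_same(net, 64, 3, stride=2, scope='conv_same')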
diff --git a/deeplab/models/research/deeplab/core/conv2d_ws_test.py b/deeplab/models/research/deeplab/core/conv2d_ws_test.py
new file mode 100644
index 0000000..b6bea85
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/conv2d_ws_test.py
@@ -0,0 +1,420 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Tests for conv2d_ws."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+import tensorflow as tf
+from tensorflow.contrib import framework as contrib_framework
+from tensorflow.contrib import layers as contrib_layers
+from deeplab.core import conv2d_ws
+
+
+class ConvolutionTest(tf.test.TestCase):
+
+ def testInvalidShape(self):
+ with self.cached_session():
+ images_3d = tf.random_uniform((5, 6, 7, 9, 3), seed=1)
+ with self.assertRaisesRegexp(
+ ValueError, 'Convolution expects input with rank 4, got 5'):
+ conv2d_ws.conv2d(images_3d, 32, 3)
+
+ def testInvalidDataFormat(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ with self.assertRaisesRegexp(ValueError, 'data_format'):
+ conv2d_ws.conv2d(images, 32, 3, data_format='CHWN')
+
+ def testCreateConv(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = np.random.uniform(size=(5, height, width, 4)).astype(np.float32)
+ output = conv2d_ws.conv2d(images, 32, [3, 3])
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
+ weights = contrib_framework.get_variables_by_name('weights')[0]
+ self.assertListEqual(weights.get_shape().as_list(), [3, 3, 4, 32])
+ biases = contrib_framework.get_variables_by_name('biases')[0]
+ self.assertListEqual(biases.get_shape().as_list(), [32])
+
+ def testCreateConvWithWS(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = np.random.uniform(size=(5, height, width, 4)).astype(np.float32)
+ output = conv2d_ws.conv2d(
+ images, 32, [3, 3], use_weight_standardization=True)
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
+ weights = contrib_framework.get_variables_by_name('weights')[0]
+ self.assertListEqual(weights.get_shape().as_list(), [3, 3, 4, 32])
+ biases = contrib_framework.get_variables_by_name('biases')[0]
+ self.assertListEqual(biases.get_shape().as_list(), [32])
+
+ def testCreateConvNCHW(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = np.random.uniform(size=(5, 4, height, width)).astype(np.float32)
+ output = conv2d_ws.conv2d(images, 32, [3, 3], data_format='NCHW')
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(), [5, 32, height, width])
+ weights = contrib_framework.get_variables_by_name('weights')[0]
+ self.assertListEqual(weights.get_shape().as_list(), [3, 3, 4, 32])
+ biases = contrib_framework.get_variables_by_name('biases')[0]
+ self.assertListEqual(biases.get_shape().as_list(), [32])
+
+ def testCreateSquareConv(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ output = conv2d_ws.conv2d(images, 32, 3)
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
+
+ def testCreateConvWithTensorShape(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ output = conv2d_ws.conv2d(images, 32, images.get_shape()[1:3])
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
+
+ def testCreateFullyConv(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 32), seed=1)
+ output = conv2d_ws.conv2d(
+ images, 64, images.get_shape()[1:3], padding='VALID')
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(), [5, 1, 1, 64])
+ biases = contrib_framework.get_variables_by_name('biases')[0]
+ self.assertListEqual(biases.get_shape().as_list(), [64])
+
+ def testFullyConvWithCustomGetter(self):
+ height, width = 7, 9
+ with self.cached_session():
+ called = [0]
+
+ def custom_getter(getter, *args, **kwargs):
+ called[0] += 1
+ return getter(*args, **kwargs)
+
+ with tf.variable_scope('test', custom_getter=custom_getter):
+ images = tf.random_uniform((5, height, width, 32), seed=1)
+ conv2d_ws.conv2d(images, 64, images.get_shape()[1:3])
+ self.assertEqual(called[0], 2) # Custom getter called twice.
+
+ def testCreateVerticalConv(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 4), seed=1)
+ output = conv2d_ws.conv2d(images, 32, [3, 1])
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
+ weights = contrib_framework.get_variables_by_name('weights')[0]
+ self.assertListEqual(weights.get_shape().as_list(), [3, 1, 4, 32])
+ biases = contrib_framework.get_variables_by_name('biases')[0]
+ self.assertListEqual(biases.get_shape().as_list(), [32])
+
+ def testCreateHorizontalConv(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 4), seed=1)
+ output = conv2d_ws.conv2d(images, 32, [1, 3])
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(), [5, height, width, 32])
+ weights = contrib_framework.get_variables_by_name('weights')[0]
+ self.assertListEqual(weights.get_shape().as_list(), [1, 3, 4, 32])
+
+ def testCreateConvWithStride(self):
+ height, width = 6, 8
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ output = conv2d_ws.conv2d(images, 32, [3, 3], stride=2)
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(),
+ [5, height / 2, width / 2, 32])
+
+ def testCreateConvCreatesWeightsAndBiasesVars(self):
+ height, width = 7, 9
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ with self.cached_session():
+ self.assertFalse(contrib_framework.get_variables('conv1/weights'))
+ self.assertFalse(contrib_framework.get_variables('conv1/biases'))
+ conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1')
+ self.assertTrue(contrib_framework.get_variables('conv1/weights'))
+ self.assertTrue(contrib_framework.get_variables('conv1/biases'))
+
+ def testCreateConvWithScope(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ output = conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1')
+ self.assertEqual(output.op.name, 'conv1/Relu')
+
+ def testCreateConvWithCollection(self):
+ height, width = 7, 9
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ with tf.name_scope('fe'):
+ conv = conv2d_ws.conv2d(
+ images, 32, [3, 3], outputs_collections='outputs', scope='Conv')
+ output_collected = tf.get_collection('outputs')[0]
+ self.assertEqual(output_collected.aliases, ['Conv'])
+ self.assertEqual(output_collected, conv)
+
+ def testCreateConvWithoutActivation(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ output = conv2d_ws.conv2d(images, 32, [3, 3], activation_fn=None)
+ self.assertEqual(output.op.name, 'Conv/BiasAdd')
+
+ def testCreateConvValid(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ output = conv2d_ws.conv2d(images, 32, [3, 3], padding='VALID')
+ self.assertListEqual(output.get_shape().as_list(), [5, 5, 7, 32])
+
+ def testCreateConvWithWD(self):
+ height, width = 7, 9
+ weight_decay = 0.01
+ with self.cached_session() as sess:
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ regularizer = contrib_layers.l2_regularizer(weight_decay)
+ conv2d_ws.conv2d(images, 32, [3, 3], weights_regularizer=regularizer)
+ l2_loss = tf.nn.l2_loss(
+ contrib_framework.get_variables_by_name('weights')[0])
+ wd = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)[0]
+ self.assertEqual(wd.op.name, 'Conv/kernel/Regularizer/l2_regularizer')
+ sess.run(tf.global_variables_initializer())
+ self.assertAlmostEqual(sess.run(wd), weight_decay * l2_loss.eval())
+
+ def testCreateConvNoRegularizers(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ conv2d_ws.conv2d(images, 32, [3, 3])
+ self.assertEqual(
+ tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES), [])
+
+ def testReuseVars(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1')
+ self.assertEqual(len(contrib_framework.get_variables()), 2)
+ conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1', reuse=True)
+ self.assertEqual(len(contrib_framework.get_variables()), 2)
+
+ def testNonReuseVars(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ conv2d_ws.conv2d(images, 32, [3, 3])
+ self.assertEqual(len(contrib_framework.get_variables()), 2)
+ conv2d_ws.conv2d(images, 32, [3, 3])
+ self.assertEqual(len(contrib_framework.get_variables()), 4)
+
+ def testReuseConvWithWD(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ weight_decay = contrib_layers.l2_regularizer(0.01)
+ with contrib_framework.arg_scope([conv2d_ws.conv2d],
+ weights_regularizer=weight_decay):
+ conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1')
+ self.assertEqual(len(contrib_framework.get_variables()), 2)
+ self.assertEqual(
+ len(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)), 1)
+ conv2d_ws.conv2d(images, 32, [3, 3], scope='conv1', reuse=True)
+ self.assertEqual(len(contrib_framework.get_variables()), 2)
+ self.assertEqual(
+ len(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)), 1)
+
+ def testConvWithBatchNorm(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 32), seed=1)
+ with contrib_framework.arg_scope([conv2d_ws.conv2d],
+ normalizer_fn=contrib_layers.batch_norm,
+ normalizer_params={'decay': 0.9}):
+ net = conv2d_ws.conv2d(images, 32, [3, 3])
+ net = conv2d_ws.conv2d(net, 32, [3, 3])
+ self.assertEqual(len(contrib_framework.get_variables()), 8)
+ self.assertEqual(
+ len(contrib_framework.get_variables('Conv/BatchNorm')), 3)
+ self.assertEqual(
+ len(contrib_framework.get_variables('Conv_1/BatchNorm')), 3)
+
+ def testReuseConvWithBatchNorm(self):
+ height, width = 7, 9
+ with self.cached_session():
+ images = tf.random_uniform((5, height, width, 32), seed=1)
+ with contrib_framework.arg_scope([conv2d_ws.conv2d],
+ normalizer_fn=contrib_layers.batch_norm,
+ normalizer_params={'decay': 0.9}):
+ net = conv2d_ws.conv2d(images, 32, [3, 3], scope='Conv')
+ net = conv2d_ws.conv2d(net, 32, [3, 3], scope='Conv', reuse=True)
+ self.assertEqual(len(contrib_framework.get_variables()), 4)
+ self.assertEqual(
+ len(contrib_framework.get_variables('Conv/BatchNorm')), 3)
+ self.assertEqual(
+ len(contrib_framework.get_variables('Conv_1/BatchNorm')), 0)
+
+ def testCreateConvCreatesWeightsAndBiasesVarsWithRateTwo(self):
+ height, width = 7, 9
+ images = tf.random_uniform((5, height, width, 3), seed=1)
+ with self.cached_session():
+ self.assertFalse(contrib_framework.get_variables('conv1/weights'))
+ self.assertFalse(contrib_framework.get_variables('conv1/biases'))
+ conv2d_ws.conv2d(images, 32, [3, 3], rate=2, scope='conv1')
+ self.assertTrue(contrib_framework.get_variables('conv1/weights'))
+ self.assertTrue(contrib_framework.get_variables('conv1/biases'))
+
+ def testOutputSizeWithRateTwoSamePadding(self):
+ num_filters = 32
+ input_size = [5, 10, 12, 3]
+ expected_size = [5, 10, 12, num_filters]
+
+ images = tf.random_uniform(input_size, seed=1)
+ output = conv2d_ws.conv2d(
+ images, num_filters, [3, 3], rate=2, padding='SAME')
+ self.assertListEqual(list(output.get_shape().as_list()), expected_size)
+ with self.cached_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(list(output.eval().shape), expected_size)
+
+ def testOutputSizeWithRateTwoValidPadding(self):
+ num_filters = 32
+ input_size = [5, 10, 12, 3]
+ expected_size = [5, 6, 8, num_filters]
+
+ images = tf.random_uniform(input_size, seed=1)
+ output = conv2d_ws.conv2d(
+ images, num_filters, [3, 3], rate=2, padding='VALID')
+ self.assertListEqual(list(output.get_shape().as_list()), expected_size)
+ with self.cached_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(list(output.eval().shape), expected_size)
+
+ def testOutputSizeWithRateTwoThreeValidPadding(self):
+ num_filters = 32
+ input_size = [5, 10, 12, 3]
+ expected_size = [5, 6, 6, num_filters]
+
+ images = tf.random_uniform(input_size, seed=1)
+ output = conv2d_ws.conv2d(
+ images, num_filters, [3, 3], rate=[2, 3], padding='VALID')
+ self.assertListEqual(list(output.get_shape().as_list()), expected_size)
+ with self.cached_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(list(output.eval().shape), expected_size)
+
+ def testDynamicOutputSizeWithRateOneValidPadding(self):
+ num_filters = 32
+ input_size = [5, 9, 11, 3]
+ expected_size = [None, None, None, num_filters]
+ expected_size_dynamic = [5, 7, 9, num_filters]
+
+ with self.cached_session():
+ images = tf.placeholder(np.float32, [None, None, None, input_size[3]])
+ output = conv2d_ws.conv2d(
+ images, num_filters, [3, 3], rate=1, padding='VALID')
+ tf.global_variables_initializer().run()
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(), expected_size)
+ eval_output = output.eval({images: np.zeros(input_size, np.float32)})
+ self.assertListEqual(list(eval_output.shape), expected_size_dynamic)
+
+ def testDynamicOutputSizeWithRateOneValidPaddingNCHW(self):
+ if tf.test.is_gpu_available(cuda_only=True):
+ num_filters = 32
+ input_size = [5, 3, 9, 11]
+ expected_size = [None, num_filters, None, None]
+ expected_size_dynamic = [5, num_filters, 7, 9]
+
+ with self.session(use_gpu=True):
+ images = tf.placeholder(np.float32, [None, input_size[1], None, None])
+ output = conv2d_ws.conv2d(
+ images,
+ num_filters, [3, 3],
+ rate=1,
+ padding='VALID',
+ data_format='NCHW')
+ tf.global_variables_initializer().run()
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(), expected_size)
+ eval_output = output.eval({images: np.zeros(input_size, np.float32)})
+ self.assertListEqual(list(eval_output.shape), expected_size_dynamic)
+
+ def testDynamicOutputSizeWithRateTwoValidPadding(self):
+ num_filters = 32
+ input_size = [5, 9, 11, 3]
+ expected_size = [None, None, None, num_filters]
+ expected_size_dynamic = [5, 5, 7, num_filters]
+
+ with self.cached_session():
+ images = tf.placeholder(np.float32, [None, None, None, input_size[3]])
+ output = conv2d_ws.conv2d(
+ images, num_filters, [3, 3], rate=2, padding='VALID')
+ tf.global_variables_initializer().run()
+ self.assertEqual(output.op.name, 'Conv/Relu')
+ self.assertListEqual(output.get_shape().as_list(), expected_size)
+ eval_output = output.eval({images: np.zeros(input_size, np.float32)})
+ self.assertListEqual(list(eval_output.shape), expected_size_dynamic)
+
+ def testWithScope(self):
+ num_filters = 32
+ input_size = [5, 9, 11, 3]
+ expected_size = [5, 5, 7, num_filters]
+
+ images = tf.random_uniform(input_size, seed=1)
+ output = conv2d_ws.conv2d(
+ images, num_filters, [3, 3], rate=2, padding='VALID', scope='conv7')
+ with self.cached_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ self.assertEqual(output.op.name, 'conv7/Relu')
+ self.assertListEqual(list(output.eval().shape), expected_size)
+
+ def testWithScopeWithoutActivation(self):
+ num_filters = 32
+ input_size = [5, 9, 11, 3]
+ expected_size = [5, 5, 7, num_filters]
+
+ images = tf.random_uniform(input_size, seed=1)
+ output = conv2d_ws.conv2d(
+ images,
+ num_filters, [3, 3],
+ rate=2,
+ padding='VALID',
+ activation_fn=None,
+ scope='conv7')
+ with self.cached_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ self.assertEqual(output.op.name, 'conv7/BiasAdd')
+ self.assertListEqual(list(output.eval().shape), expected_size)
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/core/dense_prediction_cell.py b/deeplab/models/research/deeplab/core/dense_prediction_cell.py
new file mode 100644
index 0000000..8e32f8e
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/dense_prediction_cell.py
@@ -0,0 +1,290 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Dense Prediction Cell class that can be evolved in semantic segmentation.
+
+DensePredictionCell is used as a `layer` in semantic segmentation whose
+architecture is determined by the `config`, a dictionary specifying
+the architecture.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf
+from tensorflow.contrib import slim as contrib_slim
+
+from deeplab.core import utils
+
+slim = contrib_slim
+
+# Local constants.
+_META_ARCHITECTURE_SCOPE = 'meta_architecture'
+_CONCAT_PROJECTION_SCOPE = 'concat_projection'
+_OP = 'op'
+_CONV = 'conv'
+_PYRAMID_POOLING = 'pyramid_pooling'
+_KERNEL = 'kernel'
+_RATE = 'rate'
+_GRID_SIZE = 'grid_size'
+_TARGET_SIZE = 'target_size'
+_INPUT = 'input'
+
+
+def dense_prediction_cell_hparams():
+ """DensePredictionCell HParams.
+
+ Returns:
+ A dictionary of hyper-parameters used for dense prediction cell with keys:
+ - reduction_size: Integer, the number of output filters for each operation
+ inside the cell.
+ - dropout_on_concat_features: Boolean, apply dropout on the concatenated
+ features or not.
+ - dropout_on_projection_features: Boolean, apply dropout on the projection
+ features or not.
+ - dropout_keep_prob: Float, when `dropout_on_concat_features` or
+ `dropout_on_projection_features` is True, the `keep_prob` value used
+ in the dropout operation.
+ - concat_channels: Integer, the concatenated features will be
+ channel-reduced to `concat_channels` channels.
+ - conv_rate_multiplier: Integer, used to multiply the convolution rates.
+ This is useful when the output_stride is changed from 16 to 8, in which
+ case the convolution rates need to be doubled correspondingly.
+ """
+ return {
+ 'reduction_size': 256,
+ 'dropout_on_concat_features': True,
+ 'dropout_on_projection_features': False,
+ 'dropout_keep_prob': 0.9,
+ 'concat_channels': 256,
+ 'conv_rate_multiplier': 1,
+ }
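+# Example (illustrative, not executed): these defaults can be partially
+# overridden by passing an `hparams` dict to DensePredictionCell, which
+# updates the dictionary returned above, e.g.
+#
+#   cell = DensePredictionCell(config=my_config,
+#                              hparams={'conv_rate_multiplier': 2})
+#
+# where `my_config` is a placeholder for a parsed cell architecture config.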
+
+
+class DensePredictionCell(object):
+ """DensePredictionCell class used as a 'layer' in semantic segmentation."""
+
+ def __init__(self, config, hparams=None):
+ """Initializes the dense prediction cell.
+
+ Args:
+ config: A dictionary storing the architecture of a dense prediction cell.
+ hparams: A dictionary of hyper-parameters, provided by users. This
+ dictionary will be used to update the default dictionary returned by
+ dense_prediction_cell_hparams().
+
+ Raises:
+ ValueError: If `conv_rate_multiplier` has value < 1.
+ """
+ self.hparams = dense_prediction_cell_hparams()
+ if hparams is not None:
+ self.hparams.update(hparams)
+ self.config = config
+
+ # Check values in hparams are valid or not.
+ if self.hparams['conv_rate_multiplier'] < 1:
+ raise ValueError('conv_rate_multiplier cannot have value < 1.')
+
+ def _get_pyramid_pooling_arguments(
+ self, crop_size, output_stride, image_grid, image_pooling_crop_size=None):
+ """Gets arguments for pyramid pooling.
+
+ Args:
+ crop_size: A list of two integers, [crop_height, crop_width] specifying
+ whole patch crop size.
+ output_stride: Integer, output stride value for extracted features.
+ image_grid: A list of two integers, [image_grid_height, image_grid_width],
+ specifying the grid size of how the pyramid pooling will be performed.
+ image_pooling_crop_size: A list of two integers, [crop_height, crop_width]
+ specifying the crop size for image pooling operations. Note that we
+ decouple whole patch crop_size and image_pooling_crop_size as one could
+ perform the image_pooling with different crop sizes.
+
+ Returns:
+ A tuple ([resize_height, resize_width], [pooled_height, pooled_width]),
+ i.e., the resize target size and the pooled kernel size.
+ """
+ resize_height = utils.scale_dimension(crop_size[0], 1. / output_stride)
+ resize_width = utils.scale_dimension(crop_size[1], 1. / output_stride)
+ # If image_pooling_crop_size is not specified, use crop_size.
+ if image_pooling_crop_size is None:
+ image_pooling_crop_size = crop_size
+ pooled_height = utils.scale_dimension(
+ image_pooling_crop_size[0], 1. / (output_stride * image_grid[0]))
+ pooled_width = utils.scale_dimension(
+ image_pooling_crop_size[1], 1. / (output_stride * image_grid[1]))
+ return ([resize_height, resize_width], [pooled_height, pooled_width])
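+
+ # Worked example (illustrative): with crop_size=[513, 513], output_stride=16
+ # and image_grid=[4, 4], the resize target is [33, 33] and the pooled kernel
+ # is [9, 9], matching the expectations in dense_prediction_cell_test.py.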
+
+ def _parse_operation(self, config, crop_size, output_stride,
+ image_pooling_crop_size=None):
+ """Parses one operation.
+
+ When 'operation' is 'pyramid_pooling', we compute the required
+ hyper-parameters and save them in config.
+
+ Args:
+ config: A dictionary storing required hyper-parameters for one
+ operation.
+ crop_size: A list of two integers, [crop_height, crop_width] specifying
+ whole patch crop size.
+ output_stride: Integer, output stride value for extracted features.
+ image_pooling_crop_size: A list of two integers, [crop_height, crop_width]
+ specifying the crop size for image pooling operations. Note that we
+ decouple whole patch crop_size and image_pooling_crop_size as one could
+ perform the image_pooling with different crop sizes.
+
+ Returns:
+ A dictionary storing the related information for the operation.
+ """
+ if config[_OP] == _PYRAMID_POOLING:
+ (config[_TARGET_SIZE],
+ config[_KERNEL]) = self._get_pyramid_pooling_arguments(
+ crop_size=crop_size,
+ output_stride=output_stride,
+ image_grid=config[_GRID_SIZE],
+ image_pooling_crop_size=image_pooling_crop_size)
+
+ return config
+
+ def build_cell(self,
+ features,
+ output_stride=16,
+ crop_size=None,
+ image_pooling_crop_size=None,
+ weight_decay=0.00004,
+ reuse=None,
+ is_training=False,
+ fine_tune_batch_norm=False,
+ scope=None):
+ """Builds the dense prediction cell based on the config.
+
+ Args:
+ features: Input feature map of size [batch, height, width, channels].
+ output_stride: Int, output stride at which the features were extracted.
+ crop_size: A list [crop_height, crop_width], determining the input
+ features resolution.
+ image_pooling_crop_size: A list of two integers, [crop_height, crop_width]
+ specifying the crop size for image pooling operations. Note that we
+ decouple whole patch crop_size and image_pooling_crop_size as one could
+ perform the image_pooling with different crop sizes.
+ weight_decay: Float, the weight decay for model variables.
+ reuse: Reuse the model variables or not.
+ is_training: Boolean, is training or not.
+ fine_tune_batch_norm: Boolean, fine-tuning batch norm parameters or not.
+ scope: Optional string, specifying the variable scope.
+
+ Returns:
+ Features after passing through the constructed dense prediction cell with
+ shape = [batch, height, width, channels] where channels are determined
+ by `concat_channels` returned by dense_prediction_cell_hparams().
+
+ Raises:
+ ValueError: If a convolution operation uses a kernel size other than 1x1
+ or 3x3, or if the operation is not recognized.
+ """
+ batch_norm_params = {
+ 'is_training': is_training and fine_tune_batch_norm,
+ 'decay': 0.9997,
+ 'epsilon': 1e-5,
+ 'scale': True,
+ }
+ hparams = self.hparams
+ with slim.arg_scope(
+ [slim.conv2d, slim.separable_conv2d],
+ weights_regularizer=slim.l2_regularizer(weight_decay),
+ activation_fn=tf.nn.relu,
+ normalizer_fn=slim.batch_norm,
+ padding='SAME',
+ stride=1,
+ reuse=reuse):
+ with slim.arg_scope([slim.batch_norm], **batch_norm_params):
+ with tf.variable_scope(scope, _META_ARCHITECTURE_SCOPE, [features]):
+ depth = hparams['reduction_size']
+ branch_logits = []
+ for i, current_config in enumerate(self.config):
+ scope = 'branch%d' % i
+ current_config = self._parse_operation(
+ config=current_config,
+ crop_size=crop_size,
+ output_stride=output_stride,
+ image_pooling_crop_size=image_pooling_crop_size)
+ tf.logging.info(current_config)
+ if current_config[_INPUT] < 0:
+ operation_input = features
+ else:
+ operation_input = branch_logits[current_config[_INPUT]]
+ if current_config[_OP] == _CONV:
+ if current_config[_KERNEL] == [1, 1] or current_config[
+ _KERNEL] == 1:
+ branch_logits.append(
+ slim.conv2d(operation_input, depth, 1, scope=scope))
+ else:
+ conv_rate = [r * hparams['conv_rate_multiplier']
+ for r in current_config[_RATE]]
+ branch_logits.append(
+ utils.split_separable_conv2d(
+ operation_input,
+ filters=depth,
+ kernel_size=current_config[_KERNEL],
+ rate=conv_rate,
+ weight_decay=weight_decay,
+ scope=scope))
+ elif current_config[_OP] == _PYRAMID_POOLING:
+ pooled_features = slim.avg_pool2d(
+ operation_input,
+ kernel_size=current_config[_KERNEL],
+ stride=[1, 1],
+ padding='VALID')
+ pooled_features = slim.conv2d(
+ pooled_features,
+ depth,
+ 1,
+ scope=scope)
+ pooled_features = tf.image.resize_bilinear(
+ pooled_features,
+ current_config[_TARGET_SIZE],
+ align_corners=True)
+ # Set shape for resize_height/resize_width if they are not Tensor.
+ resize_height = current_config[_TARGET_SIZE][0]
+ resize_width = current_config[_TARGET_SIZE][1]
+ if isinstance(resize_height, tf.Tensor):
+ resize_height = None
+ if isinstance(resize_width, tf.Tensor):
+ resize_width = None
+ pooled_features.set_shape(
+ [None, resize_height, resize_width, depth])
+ branch_logits.append(pooled_features)
+ else:
+ raise ValueError('Unrecognized operation.')
+ # Merge branch logits.
+ concat_logits = tf.concat(branch_logits, 3)
+ if self.hparams['dropout_on_concat_features']:
+ concat_logits = slim.dropout(
+ concat_logits,
+ keep_prob=self.hparams['dropout_keep_prob'],
+ is_training=is_training,
+ scope=_CONCAT_PROJECTION_SCOPE + '_dropout')
+ concat_logits = slim.conv2d(concat_logits,
+ self.hparams['concat_channels'],
+ 1,
+ scope=_CONCAT_PROJECTION_SCOPE)
+ if self.hparams['dropout_on_projection_features']:
+ concat_logits = slim.dropout(
+ concat_logits,
+ keep_prob=self.hparams['dropout_keep_prob'],
+ is_training=is_training,
+ scope=_CONCAT_PROJECTION_SCOPE + '_dropout')
+ return concat_logits
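+
+
+# Example (illustrative sketch, not executed): building a cell from one of the
+# released JSON configs, e.g. dense_prediction_cell_branch5_top1_cityscapes.json.
+# The json module usage and the local names `config_path` and `features` are
+# assumptions for illustration only.
+#
+#   import json
+#   with open(config_path) as f:
+#     cell = DensePredictionCell(config=json.load(f))
+#   # `features` has shape [batch, height, width, channels].
+#   outputs = cell.build_cell(
+#       features, output_stride=16, crop_size=[513, 513], is_training=True)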
diff --git a/deeplab/models/research/deeplab/core/dense_prediction_cell_branch5_top1_cityscapes.json b/deeplab/models/research/deeplab/core/dense_prediction_cell_branch5_top1_cityscapes.json
new file mode 100644
index 0000000..12b093d
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/dense_prediction_cell_branch5_top1_cityscapes.json
@@ -0,0 +1 @@
+[{"kernel": 3, "rate": [1, 6], "op": "conv", "input": -1}, {"kernel": 3, "rate": [18, 15], "op": "conv", "input": 0}, {"kernel": 3, "rate": [6, 3], "op": "conv", "input": 1}, {"kernel": 3, "rate": [1, 1], "op": "conv", "input": 0}, {"kernel": 3, "rate": [6, 21], "op": "conv", "input": 0}]
\ No newline at end of file
diff --git a/deeplab/models/research/deeplab/core/dense_prediction_cell_test.py b/deeplab/models/research/deeplab/core/dense_prediction_cell_test.py
new file mode 100644
index 0000000..1396a73
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/dense_prediction_cell_test.py
@@ -0,0 +1,136 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Tests for dense_prediction_cell."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf
+
+from deeplab.core import dense_prediction_cell
+
+
+class DensePredictionCellTest(tf.test.TestCase):
+
+ def setUp(self):
+ self.segmentation_layer = dense_prediction_cell.DensePredictionCell(
+ config=[
+ {
+ dense_prediction_cell._INPUT: -1,
+ dense_prediction_cell._OP: dense_prediction_cell._CONV,
+ dense_prediction_cell._KERNEL: 1,
+ },
+ {
+ dense_prediction_cell._INPUT: 0,
+ dense_prediction_cell._OP: dense_prediction_cell._CONV,
+ dense_prediction_cell._KERNEL: 3,
+ dense_prediction_cell._RATE: [1, 3],
+ },
+ {
+ dense_prediction_cell._INPUT: 1,
+ dense_prediction_cell._OP: (
+ dense_prediction_cell._PYRAMID_POOLING),
+ dense_prediction_cell._GRID_SIZE: [1, 2],
+ },
+ ],
+ hparams={'conv_rate_multiplier': 2})
+
+ def testPyramidPoolingArguments(self):
+ features_size, pooled_kernel = (
+ self.segmentation_layer._get_pyramid_pooling_arguments(
+ crop_size=[513, 513],
+ output_stride=16,
+ image_grid=[4, 4]))
+ self.assertListEqual(features_size, [33, 33])
+ self.assertListEqual(pooled_kernel, [9, 9])
+
+ def testPyramidPoolingArgumentsWithImageGrid1x1(self):
+ features_size, pooled_kernel = (
+ self.segmentation_layer._get_pyramid_pooling_arguments(
+ crop_size=[257, 257],
+ output_stride=16,
+ image_grid=[1, 1]))
+ self.assertListEqual(features_size, [17, 17])
+ self.assertListEqual(pooled_kernel, [17, 17])
+
+ def testParseOperationStringWithConv1x1(self):
+ operation = self.segmentation_layer._parse_operation(
+ config={
+ dense_prediction_cell._OP: dense_prediction_cell._CONV,
+ dense_prediction_cell._KERNEL: [1, 1],
+ },
+ crop_size=[513, 513], output_stride=16)
+ self.assertEqual(operation[dense_prediction_cell._OP],
+ dense_prediction_cell._CONV)
+ self.assertListEqual(operation[dense_prediction_cell._KERNEL], [1, 1])
+
+ def testParseOperationStringWithConv3x3(self):
+ operation = self.segmentation_layer._parse_operation(
+ config={
+ dense_prediction_cell._OP: dense_prediction_cell._CONV,
+ dense_prediction_cell._KERNEL: [3, 3],
+ dense_prediction_cell._RATE: [9, 6],
+ },
+ crop_size=[513, 513], output_stride=16)
+ self.assertEqual(operation[dense_prediction_cell._OP],
+ dense_prediction_cell._CONV)
+ self.assertListEqual(operation[dense_prediction_cell._KERNEL], [3, 3])
+ self.assertEqual(operation[dense_prediction_cell._RATE], [9, 6])
+
+ def testParseOperationStringWithPyramidPooling2x2(self):
+ operation = self.segmentation_layer._parse_operation(
+ config={
+ dense_prediction_cell._OP: dense_prediction_cell._PYRAMID_POOLING,
+ dense_prediction_cell._GRID_SIZE: [2, 2],
+ },
+ crop_size=[513, 513],
+ output_stride=16)
+ self.assertEqual(operation[dense_prediction_cell._OP],
+ dense_prediction_cell._PYRAMID_POOLING)
+ # The feature maps of size [33, 33] should be covered by 2x2 kernels with
+ # size [17, 17].
+ self.assertListEqual(
+ operation[dense_prediction_cell._TARGET_SIZE], [33, 33])
+ self.assertListEqual(operation[dense_prediction_cell._KERNEL], [17, 17])
+
+ def testBuildCell(self):
+ with self.test_session(graph=tf.Graph()) as sess:
+ features = tf.random_normal([2, 33, 33, 5])
+ concat_logits = self.segmentation_layer.build_cell(
+ features,
+ output_stride=8,
+ crop_size=[257, 257])
+ sess.run(tf.global_variables_initializer())
+ concat_logits = sess.run(concat_logits)
+ self.assertTrue(concat_logits.any())
+
+ def testBuildCellWithImagePoolingCropSize(self):
+ with self.test_session(graph=tf.Graph()) as sess:
+ features = tf.random_normal([2, 33, 33, 5])
+ concat_logits = self.segmentation_layer.build_cell(
+ features,
+ output_stride=8,
+ crop_size=[257, 257],
+ image_pooling_crop_size=[129, 129])
+ sess.run(tf.global_variables_initializer())
+ concat_logits = sess.run(concat_logits)
+ self.assertTrue(concat_logits.any())
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/core/feature_extractor.py b/deeplab/models/research/deeplab/core/feature_extractor.py
new file mode 100644
index 0000000..553bd9b
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/feature_extractor.py
@@ -0,0 +1,711 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Extracts features for different models."""
+import copy
+import functools
+
+import tensorflow.compat.v1 as tf
+from tensorflow.contrib import slim as contrib_slim
+
+from deeplab.core import nas_network
+from deeplab.core import resnet_v1_beta
+from deeplab.core import xception
+from nets.mobilenet import conv_blocks
+from nets.mobilenet import mobilenet
+from nets.mobilenet import mobilenet_v2
+from nets.mobilenet import mobilenet_v3
+
+slim = contrib_slim
+
+# Default end point for MobileNetv2 (one-based indexing).
+_MOBILENET_V2_FINAL_ENDPOINT = 'layer_18'
+# Default end point for MobileNetv3.
+_MOBILENET_V3_LARGE_FINAL_ENDPOINT = 'layer_17'
+_MOBILENET_V3_SMALL_FINAL_ENDPOINT = 'layer_13'
+# Default end point for EdgeTPU Mobilenet.
+_MOBILENET_EDGETPU = 'layer_24'
+
+
+def _mobilenet_v2(net,
+ depth_multiplier,
+ output_stride,
+ conv_defs=None,
+ divisible_by=None,
+ reuse=None,
+ scope=None,
+ final_endpoint=None):
+ """Auxiliary function to add support for 'reuse' to mobilenet_v2.
+
+ Args:
+ net: Input tensor of shape [batch_size, height, width, channels].
+ depth_multiplier: Float multiplier for the depth (number of channels)
+ for all convolution ops. The value must be greater than zero. Typical
+ usage will be to set this value in (0, 1) to reduce the number of
+ parameters or computation cost of the model.
+ output_stride: An integer that specifies the requested ratio of input to
+ output spatial resolution. If not None, then we invoke atrous convolution
+ if necessary to prevent the network from reducing the spatial resolution
+ of the activation maps. Allowed values are 8 (accurate fully convolutional
+ mode), 16 (fast fully convolutional mode), 32 (classification mode).
+ conv_defs: MobileNet conv defs.
+ divisible_by: None (use default setting) or an integer that ensures the
+ number of channels in every layer is divisible by this number. Used in
+ MobileNet.
+ reuse: Reuse model variables.
+ scope: Optional variable scope.
+ final_endpoint: The endpoint to construct the network up to.
+
+ Returns:
+ Features extracted by MobileNetv2.
+ """
+ if divisible_by is None:
+ divisible_by = 8 if depth_multiplier == 1.0 else 1
+ if conv_defs is None:
+ conv_defs = mobilenet_v2.V2_DEF
+ with tf.variable_scope(
+ scope, 'MobilenetV2', [net], reuse=reuse) as scope:
+ return mobilenet_v2.mobilenet_base(
+ net,
+ conv_defs=conv_defs,
+ depth_multiplier=depth_multiplier,
+ min_depth=8 if depth_multiplier == 1.0 else 1,
+ divisible_by=divisible_by,
+ final_endpoint=final_endpoint or _MOBILENET_V2_FINAL_ENDPOINT,
+ output_stride=output_stride,
+ scope=scope)
+
+
+def _mobilenet_v3(net,
+ depth_multiplier,
+ output_stride,
+ conv_defs=None,
+ divisible_by=None,
+ reuse=None,
+ scope=None,
+ final_endpoint=None):
+ """Auxiliary function to build mobilenet v3.
+
+ Args:
+ net: Input tensor of shape [batch_size, height, width, channels].
+ depth_multiplier: Float multiplier for the depth (number of channels)
+ for all convolution ops. The value must be greater than zero. Typical
+ usage will be to set this value in (0, 1) to reduce the number of
+ parameters or computation cost of the model.
+ output_stride: An integer that specifies the requested ratio of input to
+ output spatial resolution. If not None, then we invoke atrous convolution
+ if necessary to prevent the network from reducing the spatial resolution
+ of the activation maps. Allowed values are 8 (accurate fully convolutional
+ mode), 16 (fast fully convolutional mode), 32 (classification mode).
+ conv_defs: A list of ConvDef namedtuples specifying the net architecture.
+ divisible_by: None (use default setting) or an integer that ensures the
+ number of channels in every layer is divisible by this number. Used in
+ MobileNet.
+ reuse: Reuse model variables.
+ scope: Optional variable scope.
+ final_endpoint: The endpoint to construct the network up to.
+
+ Returns:
+ net: The output tensor.
+ end_points: A set of activations for external use.
+
+ Raises:
+ ValueError: If conv_defs or final_endpoint is not specified.
+ """
+ del divisible_by
+ with tf.variable_scope(
+ scope, 'MobilenetV3', [net], reuse=reuse) as scope:
+ if conv_defs is None:
+ raise ValueError('conv_defs must be specified for mobilenet v3.')
+ if final_endpoint is None:
+ raise ValueError('Final endpoint must be specified for mobilenet v3.')
+ net, end_points = mobilenet_v3.mobilenet_base(
+ net,
+ depth_multiplier=depth_multiplier,
+ conv_defs=conv_defs,
+ output_stride=output_stride,
+ final_endpoint=final_endpoint,
+ scope=scope)
+
+ return net, end_points
+
+
+def mobilenet_v3_large_seg(net,
+ depth_multiplier,
+ output_stride,
+ divisible_by=None,
+ reuse=None,
+ scope=None,
+ final_endpoint=None):
+ """Final mobilenet v3 large model for segmentation task."""
+ del divisible_by
+ del final_endpoint
+ conv_defs = copy.deepcopy(mobilenet_v3.V3_LARGE)
+
+ # Reduce the filters by a factor of 2 in the last block.
+ for layer, expansion in [(13, 336), (14, 480), (15, 480), (16, None)]:
+ conv_defs['spec'][layer].params['num_outputs'] /= 2
+ # Update expansion size
+ if expansion is not None:
+ factor = expansion / conv_defs['spec'][layer - 1].params['num_outputs']
+ conv_defs['spec'][layer].params[
+ 'expansion_size'] = mobilenet_v3.expand_input(factor)
+
+ return _mobilenet_v3(
+ net,
+ depth_multiplier=depth_multiplier,
+ output_stride=output_stride,
+ divisible_by=8,
+ conv_defs=conv_defs,
+ reuse=reuse,
+ scope=scope,
+ final_endpoint=_MOBILENET_V3_LARGE_FINAL_ENDPOINT)
+
+
+def mobilenet_edgetpu(net,
+ depth_multiplier,
+ output_stride,
+ divisible_by=None,
+ reuse=None,
+ scope=None,
+ final_endpoint=None):
+ """EdgeTPU version of mobilenet model for segmentation task."""
+ del divisible_by
+ del final_endpoint
+ conv_defs = copy.deepcopy(mobilenet_v3.V3_EDGETPU)
+
+ return _mobilenet_v3(
+ net,
+ depth_multiplier=depth_multiplier,
+ output_stride=output_stride,
+ divisible_by=8,
+ conv_defs=conv_defs,
+ reuse=reuse,
+ scope=scope, # the scope is 'MobilenetEdgeTPU'
+ final_endpoint=_MOBILENET_EDGETPU)
+
+
+def mobilenet_v3_small_seg(net,
+ depth_multiplier,
+ output_stride,
+ divisible_by=None,
+ reuse=None,
+ scope=None,
+ final_endpoint=None):
+ """Final mobilenet v3 small model for segmentation task."""
+ del divisible_by
+ del final_endpoint
+ conv_defs = copy.deepcopy(mobilenet_v3.V3_SMALL)
+
+ # Reduce the filters by a factor of 2 in the last block.
+ for layer, expansion in [(9, 144), (10, 288), (11, 288), (12, None)]:
+ conv_defs['spec'][layer].params['num_outputs'] /= 2
+ # Update expansion size
+ if expansion is not None:
+ factor = expansion / conv_defs['spec'][layer - 1].params['num_outputs']
+ conv_defs['spec'][layer].params[
+ 'expansion_size'] = mobilenet_v3.expand_input(factor)
+
+ return _mobilenet_v3(
+ net,
+ depth_multiplier=depth_multiplier,
+ output_stride=output_stride,
+ divisible_by=8,
+ conv_defs=conv_defs,
+ reuse=reuse,
+ scope=scope,
+ final_endpoint=_MOBILENET_V3_SMALL_FINAL_ENDPOINT)
+
+
+# A map from network name to network function.
+networks_map = {
+ 'mobilenet_v2': _mobilenet_v2,
+ 'mobilenet_edgetpu': mobilenet_edgetpu,
+ 'mobilenet_v3_large_seg': mobilenet_v3_large_seg,
+ 'mobilenet_v3_small_seg': mobilenet_v3_small_seg,
+ 'resnet_v1_18': resnet_v1_beta.resnet_v1_18,
+ 'resnet_v1_18_beta': resnet_v1_beta.resnet_v1_18_beta,
+ 'resnet_v1_50': resnet_v1_beta.resnet_v1_50,
+ 'resnet_v1_50_beta': resnet_v1_beta.resnet_v1_50_beta,
+ 'resnet_v1_101': resnet_v1_beta.resnet_v1_101,
+ 'resnet_v1_101_beta': resnet_v1_beta.resnet_v1_101_beta,
+ 'xception_41': xception.xception_41,
+ 'xception_65': xception.xception_65,
+ 'xception_71': xception.xception_71,
+ 'nas_pnasnet': nas_network.pnasnet,
+ 'nas_hnasnet': nas_network.hnasnet,
+}
+
+
+def mobilenet_v2_arg_scope(is_training=True,
+ weight_decay=0.00004,
+ stddev=0.09,
+ activation=tf.nn.relu6,
+ bn_decay=0.997,
+ bn_epsilon=None,
+ bn_renorm=None):
+ """Defines the default MobilenetV2 arg scope.
+
+ Args:
+ is_training: Whether or not we're training the model. If this is set to
+ None, the is_training parameter in batch_norm is not set. Note that this
+ also leaves the is_training parameter in dropout unset.
+ weight_decay: The weight decay to use for regularizing the model.
+ stddev: Standard deviation for initialization, if negative uses xavier.
+ activation: Activation function to use (defaults to tf.nn.relu6).
+ bn_decay: Decay for the batch norm moving averages.
+ bn_epsilon: Batch normalization epsilon.
+ bn_renorm: Whether to use batch norm renormalization.
+
+ Returns:
+ An `arg_scope` to use for the mobilenet v2 model.
+ """
+ batch_norm_params = {
+ 'center': True,
+ 'scale': True,
+ 'decay': bn_decay,
+ }
+ if bn_epsilon is not None:
+ batch_norm_params['epsilon'] = bn_epsilon
+ if is_training is not None:
+ batch_norm_params['is_training'] = is_training
+ if bn_renorm is not None:
+ batch_norm_params['renorm'] = bn_renorm
+ dropout_params = {}
+ if is_training is not None:
+ dropout_params['is_training'] = is_training
+
+ instance_norm_params = {
+ 'center': True,
+ 'scale': True,
+ 'epsilon': 0.001,
+ }
+
+ if stddev < 0:
+ weight_initializer = slim.initializers.xavier_initializer()
+ else:
+ weight_initializer = tf.truncated_normal_initializer(stddev=stddev)
+
+ # Set weight_decay for weights in Conv and FC layers.
+ with slim.arg_scope(
+ [slim.conv2d, slim.fully_connected, slim.separable_conv2d],
+ weights_initializer=weight_initializer,
+ activation_fn=activation,
+ normalizer_fn=slim.batch_norm), \
+ slim.arg_scope(
+ [conv_blocks.expanded_conv], normalizer_fn=slim.batch_norm), \
+ slim.arg_scope([mobilenet.apply_activation], activation_fn=activation),\
+ slim.arg_scope([slim.batch_norm], **batch_norm_params), \
+ slim.arg_scope([mobilenet.mobilenet_base, mobilenet.mobilenet],
+ is_training=is_training),\
+ slim.arg_scope([slim.dropout], **dropout_params), \
+ slim.arg_scope([slim.instance_norm], **instance_norm_params), \
+ slim.arg_scope([slim.conv2d], \
+ weights_regularizer=slim.l2_regularizer(weight_decay)), \
+ slim.arg_scope([slim.separable_conv2d], weights_regularizer=None), \
+ slim.arg_scope([slim.conv2d, slim.separable_conv2d], padding='SAME') as s:
+ return s
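+
+
+# Example (illustrative, not executed): wrapping backbone construction in this
+# scope, as is done via arg_scopes_map below for the mobilenet_v3_*_seg and
+# mobilenet_edgetpu variants. `images` is a placeholder input tensor.
+#
+#   with slim.arg_scope(mobilenet_v2_arg_scope(is_training=False)):
+#     features, end_points = mobilenet_v3_large_seg(
+#         images, depth_multiplier=1.0, output_stride=16)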
+
+
+# A map from network name to network arg scope.
+arg_scopes_map = {
+ 'mobilenet_v2': mobilenet_v2.training_scope,
+ 'mobilenet_edgetpu': mobilenet_v2_arg_scope,
+ 'mobilenet_v3_large_seg': mobilenet_v2_arg_scope,
+ 'mobilenet_v3_small_seg': mobilenet_v2_arg_scope,
+ 'resnet_v1_18': resnet_v1_beta.resnet_arg_scope,
+ 'resnet_v1_18_beta': resnet_v1_beta.resnet_arg_scope,
+ 'resnet_v1_50': resnet_v1_beta.resnet_arg_scope,
+ 'resnet_v1_50_beta': resnet_v1_beta.resnet_arg_scope,
+ 'resnet_v1_101': resnet_v1_beta.resnet_arg_scope,
+ 'resnet_v1_101_beta': resnet_v1_beta.resnet_arg_scope,
+ 'xception_41': xception.xception_arg_scope,
+ 'xception_65': xception.xception_arg_scope,
+ 'xception_71': xception.xception_arg_scope,
+ 'nas_pnasnet': nas_network.nas_arg_scope,
+ 'nas_hnasnet': nas_network.nas_arg_scope,
+}
+
+# Names for end point features.
+DECODER_END_POINTS = 'decoder_end_points'
+
+# A dictionary from network name to a map of end point features.
+networks_to_feature_maps = {
+ 'mobilenet_v2': {
+ DECODER_END_POINTS: {
+ 4: ['layer_4/depthwise_output'],
+ 8: ['layer_7/depthwise_output'],
+ 16: ['layer_14/depthwise_output'],
+ },
+ },
+ 'mobilenet_v3_large_seg': {
+ DECODER_END_POINTS: {
+ 4: ['layer_4/depthwise_output'],
+ 8: ['layer_7/depthwise_output'],
+ 16: ['layer_13/depthwise_output'],
+ },
+ },
+ 'mobilenet_v3_small_seg': {
+ DECODER_END_POINTS: {
+ 4: ['layer_2/depthwise_output'],
+ 8: ['layer_4/depthwise_output'],
+ 16: ['layer_9/depthwise_output'],
+ },
+ },
+ 'resnet_v1_18': {
+ DECODER_END_POINTS: {
+ 4: ['block1/unit_1/lite_bottleneck_v1/conv2'],
+ 8: ['block2/unit_1/lite_bottleneck_v1/conv2'],
+ 16: ['block3/unit_1/lite_bottleneck_v1/conv2'],
+ },
+ },
+ 'resnet_v1_18_beta': {
+ DECODER_END_POINTS: {
+ 4: ['block1/unit_1/lite_bottleneck_v1/conv2'],
+ 8: ['block2/unit_1/lite_bottleneck_v1/conv2'],
+ 16: ['block3/unit_1/lite_bottleneck_v1/conv2'],
+ },
+ },
+ 'resnet_v1_50': {
+ DECODER_END_POINTS: {
+ 4: ['block1/unit_2/bottleneck_v1/conv3'],
+ 8: ['block2/unit_3/bottleneck_v1/conv3'],
+ 16: ['block3/unit_5/bottleneck_v1/conv3'],
+ },
+ },
+ 'resnet_v1_50_beta': {
+ DECODER_END_POINTS: {
+ 4: ['block1/unit_2/bottleneck_v1/conv3'],
+ 8: ['block2/unit_3/bottleneck_v1/conv3'],
+ 16: ['block3/unit_5/bottleneck_v1/conv3'],
+ },
+ },
+ 'resnet_v1_101': {
+ DECODER_END_POINTS: {
+ 4: ['block1/unit_2/bottleneck_v1/conv3'],
+ 8: ['block2/unit_3/bottleneck_v1/conv3'],
+ 16: ['block3/unit_22/bottleneck_v1/conv3'],
+ },
+ },
+ 'resnet_v1_101_beta': {
+ DECODER_END_POINTS: {
+ 4: ['block1/unit_2/bottleneck_v1/conv3'],
+ 8: ['block2/unit_3/bottleneck_v1/conv3'],
+ 16: ['block3/unit_22/bottleneck_v1/conv3'],
+ },
+ },
+ 'xception_41': {
+ DECODER_END_POINTS: {
+ 4: ['entry_flow/block2/unit_1/xception_module/'
+ 'separable_conv2_pointwise'],
+ 8: ['entry_flow/block3/unit_1/xception_module/'
+ 'separable_conv2_pointwise'],
+ 16: ['exit_flow/block1/unit_1/xception_module/'
+ 'separable_conv2_pointwise'],
+ },
+ },
+ 'xception_65': {
+ DECODER_END_POINTS: {
+ 4: ['entry_flow/block2/unit_1/xception_module/'
+ 'separable_conv2_pointwise'],
+ 8: ['entry_flow/block3/unit_1/xception_module/'
+ 'separable_conv2_pointwise'],
+ 16: ['exit_flow/block1/unit_1/xception_module/'
+ 'separable_conv2_pointwise'],
+ },
+ },
+ 'xception_71': {
+ DECODER_END_POINTS: {
+ 4: ['entry_flow/block3/unit_1/xception_module/'
+ 'separable_conv2_pointwise'],
+ 8: ['entry_flow/block5/unit_1/xception_module/'
+ 'separable_conv2_pointwise'],
+ 16: ['exit_flow/block1/unit_1/xception_module/'
+ 'separable_conv2_pointwise'],
+ },
+ },
+ 'nas_pnasnet': {
+ DECODER_END_POINTS: {
+ 4: ['Stem'],
+ 8: ['Cell_3'],
+ 16: ['Cell_7'],
+ },
+ },
+ 'nas_hnasnet': {
+ DECODER_END_POINTS: {
+ 4: ['Cell_2'],
+ 8: ['Cell_5'],
+ 16: ['Cell_7'],
+ },
+ },
+}
+
+# A map from feature extractor name to the network name scope used in the
+# ImageNet pretrained versions of these models.
+name_scope = {
+ 'mobilenet_v2': 'MobilenetV2',
+ 'mobilenet_edgetpu': 'MobilenetEdgeTPU',
+ 'mobilenet_v3_large_seg': 'MobilenetV3',
+ 'mobilenet_v3_small_seg': 'MobilenetV3',
+ 'resnet_v1_18': 'resnet_v1_18',
+ 'resnet_v1_18_beta': 'resnet_v1_18',
+ 'resnet_v1_50': 'resnet_v1_50',
+ 'resnet_v1_50_beta': 'resnet_v1_50',
+ 'resnet_v1_101': 'resnet_v1_101',
+ 'resnet_v1_101_beta': 'resnet_v1_101',
+ 'xception_41': 'xception_41',
+ 'xception_65': 'xception_65',
+ 'xception_71': 'xception_71',
+ 'nas_pnasnet': 'pnasnet',
+ 'nas_hnasnet': 'hnasnet',
+}
+
+# Mean pixel value.
+_MEAN_RGB = [123.15, 115.90, 103.06]
+
+
+def _preprocess_subtract_imagenet_mean(inputs, dtype=tf.float32):
+ """Subtract Imagenet mean RGB value."""
+ mean_rgb = tf.reshape(_MEAN_RGB, [1, 1, 1, 3])
+ num_channels = tf.shape(inputs)[-1]
+ # We set mean pixel as 0 for the non-RGB channels.
+ mean_rgb_extended = tf.concat(
+ [mean_rgb, tf.zeros([1, 1, 1, num_channels - 3])], axis=3)
+ return tf.cast(inputs - mean_rgb_extended, dtype=dtype)
+
+
+def _preprocess_zero_mean_unit_range(inputs, dtype=tf.float32):
+ """Map image values from [0, 255] to [-1, 1]."""
+ preprocessed_inputs = (2.0 / 255.0) * tf.to_float(inputs) - 1.0
+ return tf.cast(preprocessed_inputs, dtype=dtype)
+
+
+_PREPROCESS_FN = {
+ 'mobilenet_v2': _preprocess_zero_mean_unit_range,
+ 'mobilenet_edgetpu': _preprocess_zero_mean_unit_range,
+ 'mobilenet_v3_large_seg': _preprocess_zero_mean_unit_range,
+ 'mobilenet_v3_small_seg': _preprocess_zero_mean_unit_range,
+ 'resnet_v1_18': _preprocess_subtract_imagenet_mean,
+ 'resnet_v1_18_beta': _preprocess_zero_mean_unit_range,
+ 'resnet_v1_50': _preprocess_subtract_imagenet_mean,
+ 'resnet_v1_50_beta': _preprocess_zero_mean_unit_range,
+ 'resnet_v1_101': _preprocess_subtract_imagenet_mean,
+ 'resnet_v1_101_beta': _preprocess_zero_mean_unit_range,
+ 'xception_41': _preprocess_zero_mean_unit_range,
+ 'xception_65': _preprocess_zero_mean_unit_range,
+ 'xception_71': _preprocess_zero_mean_unit_range,
+ 'nas_pnasnet': _preprocess_zero_mean_unit_range,
+ 'nas_hnasnet': _preprocess_zero_mean_unit_range,
+}
+
+
+def mean_pixel(model_variant=None):
+ """Gets mean pixel value.
+
+ This function returns a different mean pixel value, depending on the input
+ model_variant, which adopts different preprocessing functions. We currently
+ handle the following preprocessing functions:
+ (1) _preprocess_subtract_imagenet_mean. We simply return mean pixel value.
+ (2) _preprocess_zero_mean_unit_range. We return [127.5, 127.5, 127.5].
+ The return values are chosen so that the padded regions will contain the
+ value 0 after pre-processing.
+
+ Args:
+ model_variant: Model variant (string) for feature extraction. For
+ backwards compatibility, model_variant=None returns _MEAN_RGB.
+
+ Returns:
+ Mean pixel value.
+ """
+ if model_variant in ['resnet_v1_50',
+ 'resnet_v1_101'] or model_variant is None:
+ return _MEAN_RGB
+ else:
+ return [127.5, 127.5, 127.5]
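+
+
+# Example (illustrative): mean_pixel('resnet_v1_50') returns _MEAN_RGB, while
+# mean_pixel('xception_65') returns [127.5, 127.5, 127.5]; in both cases,
+# padding an image with the returned value yields exactly 0 after the
+# corresponding preprocessing function is applied.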
+
+
+def extract_features(images,
+ output_stride=8,
+ multi_grid=None,
+ depth_multiplier=1.0,
+ divisible_by=None,
+ final_endpoint=None,
+ model_variant=None,
+ weight_decay=0.0001,
+ reuse=None,
+ is_training=False,
+ fine_tune_batch_norm=False,
+ regularize_depthwise=False,
+ preprocess_images=True,
+ preprocessed_images_dtype=tf.float32,
+ num_classes=None,
+ global_pool=False,
+ nas_architecture_options=None,
+ nas_training_hyper_parameters=None,
+ use_bounded_activation=False):
+ """Extracts features by the particular model_variant.
+
+ Args:
+ images: A tensor of size [batch, height, width, channels].
+ output_stride: The ratio of input to output spatial resolution.
+ multi_grid: Employ a hierarchy of different atrous rates within network.
+ depth_multiplier: Float multiplier for the depth (number of channels)
+ for all convolution ops used in MobileNet.
+ divisible_by: None (use default setting) or an integer that ensures the
+ number of channels in every layer is divisible by this number. Used in
+ MobileNet.
+ final_endpoint: The MobileNet endpoint to construct the network up to.
+ model_variant: Model variant for feature extraction.
+ weight_decay: The weight decay for model variables.
+ reuse: Reuse the model variables or not.
+ is_training: Is training or not.
+ fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
+ regularize_depthwise: Whether or not apply L2-norm regularization on the
+ depthwise convolution weights.
+ preprocess_images: Performs preprocessing on images or not. Defaults to
+ True. Set to False if preprocessing will be done by other functions. We
+ support two types of preprocessing: (1) Mean pixel subtraction and (2)
+ Pixel value normalization to [-1, 1].
+ preprocessed_images_dtype: The type after the preprocessing function.
+ num_classes: Number of classes for image classification task. Defaults
+ to None for dense prediction tasks.
+ global_pool: Global pooling for image classification task. Defaults to
+ False, since dense prediction tasks do not use this.
+ nas_architecture_options: A dictionary storing NAS architecture options.
+ It is either None or its keys are:
+ - `nas_stem_output_num_conv_filters`: Number of filters of the NAS stem
+ output tensor.
+ - `nas_use_classification_head`: Boolean, use image classification head.
+ nas_training_hyper_parameters: A dictionary storing hyper-parameters for
+ training nas models. It is either None or its keys are:
+ - `drop_path_keep_prob`: Probability to keep each path in the cell when
+ training.
+ - `total_training_steps`: Total training steps to help drop path
+ probability calculation.
+ use_bounded_activation: Whether or not to use bounded activations. Bounded
+ activations better lend themselves to quantized inference. Currently,
+ bounded activation is only used in xception model.
+
+ Returns:
+ features: A tensor of size [batch, feature_height, feature_width,
+ feature_channels], where feature_height/feature_width are determined
+ by the images height/width and output_stride.
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+
+ Raises:
+ ValueError: Unrecognized model variant.
+ """
+ if 'resnet' in model_variant:
+ arg_scope = arg_scopes_map[model_variant](
+ weight_decay=weight_decay,
+ batch_norm_decay=0.95,
+ batch_norm_epsilon=1e-5,
+ batch_norm_scale=True)
+ features, end_points = get_network(
+ model_variant, preprocess_images, preprocessed_images_dtype, arg_scope)(
+ inputs=images,
+ num_classes=num_classes,
+ is_training=(is_training and fine_tune_batch_norm),
+ global_pool=global_pool,
+ output_stride=output_stride,
+ multi_grid=multi_grid,
+ reuse=reuse,
+ scope=name_scope[model_variant])
+ elif 'xception' in model_variant:
+ arg_scope = arg_scopes_map[model_variant](
+ weight_decay=weight_decay,
+ batch_norm_decay=0.9997,
+ batch_norm_epsilon=1e-3,
+ batch_norm_scale=True,
+ regularize_depthwise=regularize_depthwise,
+ use_bounded_activation=use_bounded_activation)
+ features, end_points = get_network(
+ model_variant, preprocess_images, preprocessed_images_dtype, arg_scope)(
+ inputs=images,
+ num_classes=num_classes,
+ is_training=(is_training and fine_tune_batch_norm),
+ global_pool=global_pool,
+ output_stride=output_stride,
+ regularize_depthwise=regularize_depthwise,
+ multi_grid=multi_grid,
+ reuse=reuse,
+ scope=name_scope[model_variant])
+ elif 'mobilenet' in model_variant or model_variant.startswith('mnas'):
+ arg_scope = arg_scopes_map[model_variant](
+ is_training=(is_training and fine_tune_batch_norm),
+ weight_decay=weight_decay)
+ features, end_points = get_network(
+ model_variant, preprocess_images, preprocessed_images_dtype, arg_scope)(
+ inputs=images,
+ depth_multiplier=depth_multiplier,
+ divisible_by=divisible_by,
+ output_stride=output_stride,
+ reuse=reuse,
+ scope=name_scope[model_variant],
+ final_endpoint=final_endpoint)
+ elif model_variant.startswith('nas'):
+ arg_scope = arg_scopes_map[model_variant](
+ weight_decay=weight_decay,
+ batch_norm_decay=0.9997,
+ batch_norm_epsilon=1e-3)
+ features, end_points = get_network(
+ model_variant, preprocess_images, preprocessed_images_dtype, arg_scope)(
+ inputs=images,
+ num_classes=num_classes,
+ is_training=(is_training and fine_tune_batch_norm),
+ global_pool=global_pool,
+ output_stride=output_stride,
+ nas_architecture_options=nas_architecture_options,
+ nas_training_hyper_parameters=nas_training_hyper_parameters,
+ reuse=reuse,
+ scope=name_scope[model_variant])
+ else:
+ raise ValueError('Unknown model variant %s.' % model_variant)
+
+ return features, end_points
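+
+
+# Example (illustrative sketch, not executed): extracting output_stride=16
+# features with the xception_65 backbone. `images` is assumed to be a float32
+# tensor of shape [batch, height, width, 3] supplied by the caller.
+#
+#   features, end_points = extract_features(
+#       images,
+#       output_stride=16,
+#       model_variant='xception_65',
+#       is_training=False,
+#       fine_tune_batch_norm=False)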
+
+
+def get_network(network_name, preprocess_images,
+ preprocessed_images_dtype=tf.float32, arg_scope=None):
+ """Gets the network.
+
+ Args:
+ network_name: Network name.
+ preprocess_images: Preprocesses the images or not.
+ preprocessed_images_dtype: The type after the preprocessing function.
+ arg_scope: Optional, arg_scope to build the network. If not provided the
+ default arg_scope of the network would be used.
+
+ Returns:
+ A network function that is used to extract features.
+
+ Raises:
+ ValueError: network is not supported.
+ """
+ if network_name not in networks_map:
+ raise ValueError('Unsupported network %s.' % network_name)
+ arg_scope = arg_scope or arg_scopes_map[network_name]()
+ def _identity_function(inputs, dtype=preprocessed_images_dtype):
+ return tf.cast(inputs, dtype=dtype)
+ if preprocess_images:
+ preprocess_function = _PREPROCESS_FN[network_name]
+ else:
+ preprocess_function = _identity_function
+ func = networks_map[network_name]
+ @functools.wraps(func)
+ def network_fn(inputs, *args, **kwargs):
+ with slim.arg_scope(arg_scope):
+ return func(preprocess_function(inputs, preprocessed_images_dtype),
+ *args, **kwargs)
+ return network_fn
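+
+
+# Example (illustrative, not executed): the returned callable first applies the
+# registered preprocessing function and then builds the backbone under its
+# arg_scope, e.g.
+#
+#   network_fn = get_network('mobilenet_v2', preprocess_images=True)
+#   features, end_points = network_fn(
+#       images, depth_multiplier=1.0, output_stride=16)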
diff --git a/deeplab/models/research/deeplab/core/nas_cell.py b/deeplab/models/research/deeplab/core/nas_cell.py
new file mode 100644
index 0000000..d179082
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/nas_cell.py
@@ -0,0 +1,221 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Cell structure used by NAS."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import functools
+from six.moves import range
+from six.moves import zip
+import tensorflow as tf
+from tensorflow.contrib import framework as contrib_framework
+from tensorflow.contrib import slim as contrib_slim
+from deeplab.core import xception as xception_utils
+from deeplab.core.utils import resize_bilinear
+from deeplab.core.utils import scale_dimension
+from tensorflow.contrib.slim.nets import resnet_utils
+
+arg_scope = contrib_framework.arg_scope
+slim = contrib_slim
+
+separable_conv2d_same = functools.partial(xception_utils.separable_conv2d_same,
+ regularize_depthwise=True)
+
+
+class NASBaseCell(object):
+ """NASNet Cell class that is used as a 'layer' in image architectures."""
+
+ def __init__(self, num_conv_filters, operations, used_hiddenstates,
+ hiddenstate_indices, drop_path_keep_prob, total_num_cells,
+ total_training_steps, batch_norm_fn=slim.batch_norm):
+ """Init function.
+
+ For more details about NAS cell, see
+ https://arxiv.org/abs/1707.07012 and https://arxiv.org/abs/1712.00559.
+
+ Args:
+ num_conv_filters: The number of filters for each convolution operation.
+ operations: List of operations that are performed in the NASNet Cell in
+ order.
+ used_hiddenstates: Binary array that signals if the hiddenstate was used
+ within the cell. This is used to determine what outputs of the cell
+ should be concatenated together.
+ hiddenstate_indices: Determines what hiddenstates should be combined
+ together with the specified operations to create the NASNet cell.
+ drop_path_keep_prob: Float, drop path keep probability.
+ total_num_cells: Integer, total number of cells.
+ total_training_steps: Integer, total training steps.
+ batch_norm_fn: Function, batch norm function. Defaults to
+ slim.batch_norm.
+ """
+ if len(hiddenstate_indices) != len(operations):
+ raise ValueError(
+ 'Number of hiddenstate_indices and operations should be the same.')
+ if len(operations) % 2:
+ raise ValueError('Number of operations should be even.')
+ self._num_conv_filters = num_conv_filters
+ self._operations = operations
+ self._used_hiddenstates = used_hiddenstates
+ self._hiddenstate_indices = hiddenstate_indices
+ self._drop_path_keep_prob = drop_path_keep_prob
+ self._total_num_cells = total_num_cells
+ self._total_training_steps = total_training_steps
+ self._batch_norm_fn = batch_norm_fn
+
+ def __call__(self, net, scope, filter_scaling, stride, prev_layer, cell_num):
+ """Runs the conv cell."""
+ self._cell_num = cell_num
+ self._filter_scaling = filter_scaling
+ self._filter_size = int(self._num_conv_filters * filter_scaling)
+
+ with tf.variable_scope(scope):
+ net = self._cell_base(net, prev_layer)
+ for i in range(len(self._operations) // 2):
+ with tf.variable_scope('comb_iter_{}'.format(i)):
+ h1 = net[self._hiddenstate_indices[i * 2]]
+ h2 = net[self._hiddenstate_indices[i * 2 + 1]]
+ with tf.variable_scope('left'):
+ h1 = self._apply_conv_operation(
+ h1, self._operations[i * 2], stride,
+ self._hiddenstate_indices[i * 2] < 2)
+ with tf.variable_scope('right'):
+ h2 = self._apply_conv_operation(
+ h2, self._operations[i * 2 + 1], stride,
+ self._hiddenstate_indices[i * 2 + 1] < 2)
+ with tf.variable_scope('combine'):
+ h = h1 + h2
+ net.append(h)
+
+ with tf.variable_scope('cell_output'):
+ net = self._combine_unused_states(net)
+
+ return net
+
+ def _cell_base(self, net, prev_layer):
+ """Runs the beginning of the conv cell before the chosen ops are run."""
+ filter_size = self._filter_size
+
+ if prev_layer is None:
+ prev_layer = net
+ else:
+ if net.shape[2] != prev_layer.shape[2]:
+ prev_layer = resize_bilinear(
+ prev_layer, tf.shape(net)[1:3], prev_layer.dtype)
+ if filter_size != prev_layer.shape[3]:
+ prev_layer = tf.nn.relu(prev_layer)
+ prev_layer = slim.conv2d(prev_layer, filter_size, 1, scope='prev_1x1')
+ prev_layer = self._batch_norm_fn(prev_layer, scope='prev_bn')
+
+ net = tf.nn.relu(net)
+ net = slim.conv2d(net, filter_size, 1, scope='1x1')
+ net = self._batch_norm_fn(net, scope='beginning_bn')
+ net = tf.split(axis=3, num_or_size_splits=1, value=net)
+ net.append(prev_layer)
+ return net
+
+ def _apply_conv_operation(self, net, operation, stride,
+ is_from_original_input):
+ """Applies the predicted conv operation to net."""
+ if stride > 1 and not is_from_original_input:
+ stride = 1
+ input_filters = net.shape[3]
+ filter_size = self._filter_size
+ if 'separable' in operation:
+ num_layers = int(operation.split('_')[-1])
+ kernel_size = int(operation.split('x')[0][-1])
+ for layer_num in range(num_layers):
+ net = tf.nn.relu(net)
+ net = separable_conv2d_same(
+ net,
+ filter_size,
+ kernel_size,
+ depth_multiplier=1,
+ scope='separable_{0}x{0}_{1}'.format(kernel_size, layer_num + 1),
+ stride=stride)
+ net = self._batch_norm_fn(
+ net, scope='bn_sep_{0}x{0}_{1}'.format(kernel_size, layer_num + 1))
+ stride = 1
+ elif 'atrous' in operation:
+ kernel_size = int(operation.split('x')[0][-1])
+ net = tf.nn.relu(net)
+ if stride == 2:
+ scaled_height = scale_dimension(tf.shape(net)[1], 0.5)
+ scaled_width = scale_dimension(tf.shape(net)[2], 0.5)
+ net = resize_bilinear(net, [scaled_height, scaled_width], net.dtype)
+ net = resnet_utils.conv2d_same(
+ net, filter_size, kernel_size, rate=1, stride=1,
+ scope='atrous_{0}x{0}'.format(kernel_size))
+ else:
+ net = resnet_utils.conv2d_same(
+ net, filter_size, kernel_size, rate=2, stride=1,
+ scope='atrous_{0}x{0}'.format(kernel_size))
+ net = self._batch_norm_fn(net, scope='bn_atr_{0}x{0}'.format(kernel_size))
+ elif operation in ['none']:
+ if stride > 1 or (input_filters != filter_size):
+ net = tf.nn.relu(net)
+ net = slim.conv2d(net, filter_size, 1, stride=stride, scope='1x1')
+ net = self._batch_norm_fn(net, scope='bn_1')
+ elif 'pool' in operation:
+ pooling_type = operation.split('_')[0]
+ pooling_shape = int(operation.split('_')[-1].split('x')[0])
+ if pooling_type == 'avg':
+ net = slim.avg_pool2d(net, pooling_shape, stride=stride, padding='SAME')
+ elif pooling_type == 'max':
+ net = slim.max_pool2d(net, pooling_shape, stride=stride, padding='SAME')
+ else:
+ raise ValueError('Unimplemented pooling type: ', pooling_type)
+ if input_filters != filter_size:
+ net = slim.conv2d(net, filter_size, 1, stride=1, scope='1x1')
+ net = self._batch_norm_fn(net, scope='bn_1')
+ else:
+ raise ValueError('Unimplemented operation', operation)
+
+ if operation != 'none':
+ net = self._apply_drop_path(net)
+ return net
+
+ def _combine_unused_states(self, net):
+ """Concatenates the unused hidden states of the cell."""
+ used_hiddenstates = self._used_hiddenstates
+ states_to_combine = ([
+ h for h, is_used in zip(net, used_hiddenstates) if not is_used])
+ net = tf.concat(values=states_to_combine, axis=3)
+ return net
+
+ @contrib_framework.add_arg_scope
+ def _apply_drop_path(self, net):
+ """Apply drop_path regularization."""
+ drop_path_keep_prob = self._drop_path_keep_prob
+ if drop_path_keep_prob < 1.0:
+ # Scale keep prob by layer number.
+ assert self._cell_num != -1
+ layer_ratio = (self._cell_num + 1) / float(self._total_num_cells)
+ drop_path_keep_prob = 1 - layer_ratio * (1 - drop_path_keep_prob)
+ # Decrease keep prob over time.
+ current_step = tf.cast(tf.train.get_or_create_global_step(), tf.float32)
+ current_ratio = tf.minimum(1.0, current_step / self._total_training_steps)
+ drop_path_keep_prob = (1 - current_ratio * (1 - drop_path_keep_prob))
+ # Drop path.
+ noise_shape = [tf.shape(net)[0], 1, 1, 1]
+ random_tensor = drop_path_keep_prob
+ random_tensor += tf.random_uniform(noise_shape, dtype=tf.float32)
+ binary_tensor = tf.cast(tf.floor(random_tensor), net.dtype)
+ keep_prob_inv = tf.cast(1.0 / drop_path_keep_prob, net.dtype)
+ net = net * keep_prob_inv * binary_tensor
+ return net
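+
+
+# Worked example (illustrative): with drop_path_keep_prob=0.9, cell_num=3 and
+# total_num_cells=12, layer_ratio is 4/12, so the layer-scaled keep prob is
+# 1 - (4/12) * (1 - 0.9) ~= 0.967. Early in training (current_ratio close to
+# 0) this is further annealed towards 1.0, so almost no paths are dropped at
+# the start of training.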
diff --git a/deeplab/models/research/deeplab/core/nas_genotypes.py b/deeplab/models/research/deeplab/core/nas_genotypes.py
new file mode 100644
index 0000000..a2e6dd5
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/nas_genotypes.py
@@ -0,0 +1,45 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Genotypes used by NAS."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from tensorflow.contrib import slim as contrib_slim
+from deeplab.core import nas_cell
+
+slim = contrib_slim
+
+
+class PNASCell(nas_cell.NASBaseCell):
+ """Configuration and construction of the PNASNet-5 Cell."""
+
+ def __init__(self, num_conv_filters, drop_path_keep_prob, total_num_cells,
+ total_training_steps, batch_norm_fn=slim.batch_norm):
+ # Name of operations: op_kernel-size_num-layers.
+ operations = [
+ 'separable_5x5_2', 'max_pool_3x3', 'separable_7x7_2', 'max_pool_3x3',
+ 'separable_5x5_2', 'separable_3x3_2', 'separable_3x3_2', 'max_pool_3x3',
+ 'separable_3x3_2', 'none'
+ ]
+ used_hiddenstates = [1, 1, 0, 0, 0, 0, 0]
+ hiddenstate_indices = [1, 1, 0, 0, 0, 0, 4, 0, 1, 0]
+
+ super(PNASCell, self).__init__(
+ num_conv_filters, operations, used_hiddenstates, hiddenstate_indices,
+ drop_path_keep_prob, total_num_cells, total_training_steps,
+ batch_norm_fn)
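+
+
+# Example (illustrative, not executed): instantiating the PNASNet-5 cell; the
+# numeric values below are placeholders rather than recommended settings.
+#
+#   cell = PNASCell(num_conv_filters=48,
+#                   drop_path_keep_prob=1.0,
+#                   total_num_cells=12,
+#                   total_training_steps=500000)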
diff --git a/deeplab/models/research/deeplab/core/nas_network.py b/deeplab/models/research/deeplab/core/nas_network.py
new file mode 100644
index 0000000..1da2e04
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/nas_network.py
@@ -0,0 +1,368 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Network structure used by NAS.
+
+Here we provide a few NAS backbones for semantic segmentation.
+Currently, we have
+
+1. pnasnet
+"Progressive Neural Architecture Search", Chenxi Liu, Barret Zoph,
+Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei,
+Alan Yuille, Jonathan Huang, Kevin Murphy. In ECCV, 2018.
+
+2. hnasnet (also called Auto-DeepLab)
+"Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic
+Image Segmentation", Chenxi Liu, Liang-Chieh Chen, Florian Schroff,
+Hartwig Adam, Wei Hua, Alan Yuille, Li Fei-Fei. In CVPR, 2019.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from six.moves import range
+import tensorflow as tf
+from tensorflow.contrib import framework as contrib_framework
+from tensorflow.contrib import layers as contrib_layers
+from tensorflow.contrib import slim as contrib_slim
+from tensorflow.contrib import training as contrib_training
+
+from deeplab.core import nas_genotypes
+from deeplab.core import utils
+from deeplab.core.nas_cell import NASBaseCell
+from tensorflow.contrib.slim.nets import resnet_utils
+
+arg_scope = contrib_framework.arg_scope
+slim = contrib_slim
+resize_bilinear = utils.resize_bilinear
+scale_dimension = utils.scale_dimension
+
+
+def config(num_conv_filters=20,
+ total_training_steps=500000,
+ drop_path_keep_prob=1.0):
+ return contrib_training.HParams(
+ # Multiplier when spatial size is reduced by 2.
+ filter_scaling_rate=2.0,
+ # Number of filters of the stem output tensor.
+ num_conv_filters=num_conv_filters,
+ # Probability to keep each path in the cell when training.
+ drop_path_keep_prob=drop_path_keep_prob,
+ # Total training steps to help drop path probability calculation.
+ total_training_steps=total_training_steps,
+ )
+
+
+def nas_arg_scope(weight_decay=4e-5,
+ batch_norm_decay=0.9997,
+ batch_norm_epsilon=0.001,
+ sync_batch_norm_method='None'):
+ """Default arg scope for the NAS models."""
+ batch_norm_params = {
+ # Decay for the moving averages.
+ 'decay': batch_norm_decay,
+ # epsilon to prevent 0s in variance.
+ 'epsilon': batch_norm_epsilon,
+ 'scale': True,
+ }
+ batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
+ weights_regularizer = contrib_layers.l2_regularizer(weight_decay)
+ weights_initializer = contrib_layers.variance_scaling_initializer(
+ factor=1 / 3.0, mode='FAN_IN', uniform=True)
+ with arg_scope([slim.fully_connected, slim.conv2d, slim.separable_conv2d],
+ weights_regularizer=weights_regularizer,
+ weights_initializer=weights_initializer):
+ with arg_scope([slim.fully_connected],
+ activation_fn=None, scope='FC'):
+ with arg_scope([slim.conv2d, slim.separable_conv2d],
+ activation_fn=None, biases_initializer=None):
+ with arg_scope([batch_norm], **batch_norm_params) as sc:
+ return sc
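+
+
+# Example (illustrative, not executed): the returned scope is intended to wrap
+# NAS backbone construction; `images`, `my_cell` and the backbone list below
+# are placeholders.
+#
+#   with slim.arg_scope(nas_arg_scope(weight_decay=4e-5)):
+#     net, end_points = _build_nas_base(
+#         images, cell=my_cell, backbone=[0, 0, 1, 1, 2, 2],
+#         num_classes=None, hparams=config())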
+
+
+def _nas_stem(inputs,
+ batch_norm_fn=slim.batch_norm):
+ """Stem used for NAS models."""
+ net = resnet_utils.conv2d_same(inputs, 64, 3, stride=2, scope='conv0')
+ net = batch_norm_fn(net, scope='conv0_bn')
+ net = tf.nn.relu(net)
+ net = resnet_utils.conv2d_same(net, 64, 3, stride=1, scope='conv1')
+ net = batch_norm_fn(net, scope='conv1_bn')
+ cell_outputs = [net]
+ net = tf.nn.relu(net)
+ net = resnet_utils.conv2d_same(net, 128, 3, stride=2, scope='conv2')
+ net = batch_norm_fn(net, scope='conv2_bn')
+ cell_outputs.append(net)
+ return net, cell_outputs
+
+
+def _build_nas_base(images,
+ cell,
+ backbone,
+ num_classes,
+ hparams,
+ global_pool=False,
+ output_stride=16,
+ nas_use_classification_head=False,
+ reuse=None,
+ scope=None,
+ final_endpoint=None,
+ batch_norm_fn=slim.batch_norm,
+ nas_remove_os32_stride=False):
+ """Constructs a NAS model.
+
+ Args:
+ images: A tensor of size [batch, height, width, channels].
+ cell: Cell structure used in the network.
+ backbone: Backbone structure used in the network. A list of integers in
+ which value 0 means "output_stride=4", value 1 means "output_stride=8",
+ value 2 means "output_stride=16", and value 3 means "output_stride=32".
+ num_classes: Number of classes to predict.
+ hparams: Hyperparameters needed to construct the network.
+ global_pool: If True, we perform global average pooling before computing the
+ logits. Set to True for image classification, False for dense prediction.
+ output_stride: Integer, the stride of output feature maps.
+ nas_use_classification_head: Boolean, use image classification head.
+ reuse: Whether or not the network and its variables should be reused. To be
+ able to reuse, 'scope' must be given.
+ scope: Optional variable_scope.
+ final_endpoint: The endpoint to construct the network up to.
+ batch_norm_fn: Batch norm function.
+ nas_remove_os32_stride: Boolean, remove stride in output_stride 32 branch.
+
+ Returns:
+ net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+
+ Raises:
+ ValueError: If output_stride is not a multiple of backbone output stride.
+ """
+ with tf.variable_scope(scope, 'nas', [images], reuse=reuse):
+ end_points = {}
+ def add_and_check_endpoint(endpoint_name, net):
+ end_points[endpoint_name] = net
+ return final_endpoint and (endpoint_name == final_endpoint)
+
+ net, cell_outputs = _nas_stem(images,
+ batch_norm_fn=batch_norm_fn)
+ if add_and_check_endpoint('Stem', net):
+ return net, end_points
+
+ # Run the cells
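+ # backbone[i] encodes the output stride of cell i (0 -> 4, 1 -> 8, 2 -> 16,
+ # 3 -> 32; see the docstring above). Moving one step down the list
+ # (entry + 1) is implemented as a stride-2 cell with filter_scaling
+ # multiplied by filter_scaling_rate; moving one step up (entry - 1)
+ # bilinearly upsamples the features by 2 and divides filter_scaling by the
+ # same rate. With nas_remove_os32_stride=True, nominal output-stride-32
+ # cells keep the output-stride-16 resolution (no extra stride/upsampling).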
+ filter_scaling = 1.0
+ for cell_num in range(len(backbone)):
+ stride = 1
+ if cell_num == 0:
+ if backbone[0] == 1:
+ stride = 2
+ filter_scaling *= hparams.filter_scaling_rate
+ else:
+ if backbone[cell_num] == backbone[cell_num - 1] + 1:
+ stride = 2
+ if backbone[cell_num] == 3 and nas_remove_os32_stride:
+ stride = 1
+ filter_scaling *= hparams.filter_scaling_rate
+ elif backbone[cell_num] == backbone[cell_num - 1] - 1:
+ if backbone[cell_num - 1] == 3 and nas_remove_os32_stride:
+ # No need to rescale features.
+ pass
+ else:
+ # Scale features by a factor of 2.
+ scaled_height = scale_dimension(net.shape[1].value, 2)
+ scaled_width = scale_dimension(net.shape[2].value, 2)
+ net = resize_bilinear(net, [scaled_height, scaled_width], net.dtype)
+ filter_scaling /= hparams.filter_scaling_rate
+ net = cell(
+ net,
+ scope='cell_{}'.format(cell_num),
+ filter_scaling=filter_scaling,
+ stride=stride,
+ prev_layer=cell_outputs[-2],
+ cell_num=cell_num)
+ if add_and_check_endpoint('Cell_{}'.format(cell_num), net):
+ return net, end_points
+ cell_outputs.append(net)
+ net = tf.nn.relu(net)
+
+ if nas_use_classification_head:
+ # Add image classification head.
+ # We will expand the filters for different output_strides.
+ output_stride_to_expanded_filters = {8: 256, 16: 512, 32: 1024}
+ current_output_scale = 2 + backbone[-1]
+ current_output_stride = 2 ** current_output_scale
+ if output_stride % current_output_stride != 0:
+ raise ValueError(
+ 'output_stride must be a multiple of backbone output stride.')
+ output_stride //= current_output_stride
+ rate = 1
+ if current_output_stride != 32:
+ num_downsampling = 5 - current_output_scale
+ for i in range(num_downsampling):
+ # Gradually downsample feature maps to output stride = 32.
+ target_output_stride = 2 ** (current_output_scale + 1 + i)
+ target_filters = output_stride_to_expanded_filters[
+ target_output_stride]
+ scope = 'downsample_os{}'.format(target_output_stride)
+ if output_stride != 1:
+ stride = 2
+ output_stride //= 2
+ else:
+ stride = 1
+ rate *= 2
+ net = resnet_utils.conv2d_same(
+ net, target_filters, 3, stride=stride, rate=rate,
+ scope=scope + '_conv')
+ net = batch_norm_fn(net, scope=scope + '_bn')
+ add_and_check_endpoint(scope, net)
+ net = tf.nn.relu(net)
+ # Apply 1x1 convolution to expand dimension to 2048.
+ scope = 'classification_head'
+ net = slim.conv2d(net, 2048, 1, scope=scope + '_conv')
+ net = batch_norm_fn(net, scope=scope + '_bn')
+ add_and_check_endpoint(scope, net)
+ net = tf.nn.relu(net)
+ if global_pool:
+ # Global average pooling.
+ net = tf.reduce_mean(net, [1, 2], name='global_pool', keepdims=True)
+ if num_classes is not None:
+ net = slim.conv2d(net, num_classes, 1, activation_fn=None,
+ normalizer_fn=None, scope='logits')
+ end_points['predictions'] = slim.softmax(net, scope='predictions')
+ return net, end_points
+
+
+def pnasnet(images,
+ num_classes,
+ is_training=True,
+ global_pool=False,
+ output_stride=16,
+ nas_architecture_options=None,
+ nas_training_hyper_parameters=None,
+ reuse=None,
+ scope='pnasnet',
+ final_endpoint=None,
+ sync_batch_norm_method='None'):
+ """Builds PNASNet model."""
+ if nas_architecture_options is None:
+ raise ValueError(
+ 'Using NAS model variants. nas_architecture_options cannot be None.')
+ hparams = config(num_conv_filters=nas_architecture_options[
+ 'nas_stem_output_num_conv_filters'])
+ if nas_training_hyper_parameters:
+ hparams.set_hparam('drop_path_keep_prob',
+ nas_training_hyper_parameters['drop_path_keep_prob'])
+ hparams.set_hparam('total_training_steps',
+ nas_training_hyper_parameters['total_training_steps'])
+ if not is_training:
+ tf.logging.info('During inference, setting drop_path_keep_prob = 1.0.')
+ hparams.set_hparam('drop_path_keep_prob', 1.0)
+ tf.logging.info(hparams)
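+ # Each backbone entry selects the output stride of the corresponding cell
+ # (1 -> 8, 2 -> 16, 3 -> 32; see _build_nas_base). For example, with
+ # output_stride=16 the 12 PNASNet cells run 4 cells at output stride 8
+ # followed by 8 cells at output stride 16.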
+ if output_stride == 8:
+ backbone = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
+ elif output_stride == 16:
+ backbone = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
+ elif output_stride == 32:
+ backbone = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
+ else:
+ raise ValueError('Unsupported output_stride ', output_stride)
+ batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
+ cell = nas_genotypes.PNASCell(hparams.num_conv_filters,
+ hparams.drop_path_keep_prob,
+ len(backbone),
+ hparams.total_training_steps,
+ batch_norm_fn=batch_norm)
+ with arg_scope([slim.dropout, batch_norm], is_training=is_training):
+ return _build_nas_base(
+ images,
+ cell=cell,
+ backbone=backbone,
+ num_classes=num_classes,
+ hparams=hparams,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ nas_use_classification_head=nas_architecture_options[
+ 'nas_use_classification_head'],
+ reuse=reuse,
+ scope=scope,
+ final_endpoint=final_endpoint,
+ batch_norm_fn=batch_norm,
+ nas_remove_os32_stride=nas_architecture_options[
+ 'nas_remove_os32_stride'])
+
+
+# pylint: disable=unused-argument
+def hnasnet(images,
+ num_classes,
+ is_training=True,
+ global_pool=False,
+ output_stride=8,
+ nas_architecture_options=None,
+ nas_training_hyper_parameters=None,
+ reuse=None,
+ scope='hnasnet',
+ final_endpoint=None,
+ sync_batch_norm_method='None'):
+ """Builds hierarchical model."""
+ if nas_architecture_options is None:
+ raise ValueError(
+ 'Using NAS model variants. nas_architecture_options cannot be None.')
+ hparams = config(num_conv_filters=nas_architecture_options[
+ 'nas_stem_output_num_conv_filters'])
+ if nas_training_hyper_parameters:
+ hparams.set_hparam('drop_path_keep_prob',
+ nas_training_hyper_parameters['drop_path_keep_prob'])
+ hparams.set_hparam('total_training_steps',
+ nas_training_hyper_parameters['total_training_steps'])
+ if not is_training:
+ tf.logging.info('During inference, setting drop_path_keep_prob = 1.0.')
+ hparams.set_hparam('drop_path_keep_prob', 1.0)
+ tf.logging.info(hparams)
+ operations = [
+ 'atrous_5x5', 'separable_3x3_2', 'separable_3x3_2', 'atrous_3x3',
+ 'separable_3x3_2', 'separable_3x3_2', 'separable_5x5_2',
+ 'separable_5x5_2', 'separable_5x5_2', 'atrous_5x5'
+ ]
+ used_hiddenstates = [1, 1, 0, 0, 0, 0, 0]
+ hiddenstate_indices = [1, 0, 1, 0, 3, 1, 4, 2, 3, 5]
+ backbone = [0, 0, 0, 1, 2, 1, 2, 2, 3, 3, 2, 1]
+ batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
+ cell = NASBaseCell(hparams.num_conv_filters,
+ operations,
+ used_hiddenstates,
+ hiddenstate_indices,
+ hparams.drop_path_keep_prob,
+ len(backbone),
+ hparams.total_training_steps,
+ batch_norm_fn=batch_norm)
+ with arg_scope([slim.dropout, batch_norm], is_training=is_training):
+ return _build_nas_base(
+ images,
+ cell=cell,
+ backbone=backbone,
+ num_classes=num_classes,
+ hparams=hparams,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ nas_use_classification_head=nas_architecture_options[
+ 'nas_use_classification_head'],
+ reuse=reuse,
+ scope=scope,
+ final_endpoint=final_endpoint,
+ batch_norm_fn=batch_norm,
+ nas_remove_os32_stride=nas_architecture_options[
+ 'nas_remove_os32_stride'])
diff --git a/deeplab/models/research/deeplab/core/nas_network_test.py b/deeplab/models/research/deeplab/core/nas_network_test.py
new file mode 100644
index 0000000..18621b2
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/nas_network_test.py
@@ -0,0 +1,111 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Tests for resnet_v1_beta module."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+import tensorflow as tf
+from tensorflow.contrib import framework as contrib_framework
+from tensorflow.contrib import slim as contrib_slim
+from tensorflow.contrib import training as contrib_training
+
+from deeplab.core import nas_genotypes
+from deeplab.core import nas_network
+
+arg_scope = contrib_framework.arg_scope
+slim = contrib_slim
+
+
+def create_test_input(batch, height, width, channels):
+ """Creates test input tensor."""
+ if None in [batch, height, width, channels]:
+ return tf.placeholder(tf.float32, (batch, height, width, channels))
+ else:
+ return tf.to_float(
+ np.tile(
+ np.reshape(
+ np.reshape(np.arange(height), [height, 1]) +
+ np.reshape(np.arange(width), [1, width]),
+ [1, height, width, 1]),
+ [batch, 1, 1, channels]))
+
+
+class NASNetworkTest(tf.test.TestCase):
+ """Tests with complete small NAS networks."""
+
+ def _pnasnet(self,
+ images,
+ backbone,
+ num_classes,
+ is_training=True,
+ output_stride=16,
+ final_endpoint=None):
+ """Build PNASNet model backbone."""
+ hparams = contrib_training.HParams(
+ filter_scaling_rate=2.0,
+ num_conv_filters=10,
+ drop_path_keep_prob=1.0,
+ total_training_steps=200000,
+ )
+ if not is_training:
+ hparams.set_hparam('drop_path_keep_prob', 1.0)
+
+ cell = nas_genotypes.PNASCell(hparams.num_conv_filters,
+ hparams.drop_path_keep_prob,
+ len(backbone),
+ hparams.total_training_steps)
+ with arg_scope([slim.dropout, slim.batch_norm], is_training=is_training):
+ return nas_network._build_nas_base(
+ images,
+ cell=cell,
+ backbone=backbone,
+ num_classes=num_classes,
+ hparams=hparams,
+ reuse=tf.AUTO_REUSE,
+ scope='pnasnet_small',
+ final_endpoint=final_endpoint)
+
+ def testFullyConvolutionalEndpointShapes(self):
+ num_classes = 10
+ backbone = [0, 0, 0, 1, 2, 1, 2, 2, 3, 3, 2, 1]
+ inputs = create_test_input(None, 321, 321, 3)
+ with slim.arg_scope(nas_network.nas_arg_scope()):
+ _, end_points = self._pnasnet(inputs, backbone, num_classes)
+ endpoint_to_shape = {
+ 'Stem': [None, 81, 81, 128],
+ 'Cell_0': [None, 81, 81, 50],
+ 'Cell_1': [None, 81, 81, 50],
+ 'Cell_2': [None, 81, 81, 50],
+ 'Cell_3': [None, 41, 41, 100],
+ 'Cell_4': [None, 21, 21, 200],
+ 'Cell_5': [None, 41, 41, 100],
+ 'Cell_6': [None, 21, 21, 200],
+ 'Cell_7': [None, 21, 21, 200],
+ 'Cell_8': [None, 11, 11, 400],
+ 'Cell_9': [None, 11, 11, 400],
+ 'Cell_10': [None, 21, 21, 200],
+ 'Cell_11': [None, 41, 41, 100]
+ }
+ for endpoint, shape in endpoint_to_shape.items():
+ self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/core/preprocess_utils.py b/deeplab/models/research/deeplab/core/preprocess_utils.py
new file mode 100644
index 0000000..440717e
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/preprocess_utils.py
@@ -0,0 +1,533 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Utility functions related to preprocessing inputs."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from six.moves import range
+from six.moves import zip
+import tensorflow as tf
+
+
+def flip_dim(tensor_list, prob=0.5, dim=1):
+ """Randomly flips a dimension of the given tensor.
+
+ The decision to randomly flip the `Tensors` is made together. In other words,
+ all or none of the images passed in are flipped.
+
+ Note that tf.random_flip_left_right and tf.random_flip_up_down aren't used so
+ that we can control for the probability as well as ensure the same decision
+ is applied across the images.
+
+ Args:
+ tensor_list: A list of `Tensors` with the same number of dimensions.
+ prob: The probability of a left-right flip.
+ dim: The dimension to flip, 0, 1, ...
+
+ Returns:
+ outputs: A list of the possibly flipped `Tensors` as well as an indicator
+ `Tensor` at the end whose value is `True` if the inputs were flipped and
+ `False` otherwise.
+
+ Raises:
+ ValueError: If dim is negative or greater than the dimension of a `Tensor`.
+ """
+ random_value = tf.random_uniform([])
+
+ def flip():
+ flipped = []
+ for tensor in tensor_list:
+ if dim < 0 or dim >= len(tensor.get_shape().as_list()):
+ raise ValueError('dim must represent a valid dimension.')
+ flipped.append(tf.reverse_v2(tensor, [dim]))
+ return flipped
+
+ is_flipped = tf.less_equal(random_value, prob)
+ outputs = tf.cond(is_flipped, flip, lambda: tensor_list)
+ if not isinstance(outputs, (list, tuple)):
+ outputs = [outputs]
+ outputs.append(is_flipped)
+
+ return outputs
+
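+# Example usage (illustrative; mirrors preprocess_utils_test.py):
+#
+#   image, label, is_flipped = flip_dim([image, label], prob=0.5, dim=1)
+#
+# Both tensors are flipped (or left unchanged) together, and `is_flipped`
+# reports whether the flip was applied.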
+
+def _image_dimensions(image, rank):
+ """Returns the dimensions of an image tensor.
+
+ Args:
+ image: A Tensor of rank `rank`; for rank 3, of shape [height, width, channels].
+ rank: The expected rank of the image.
+
+ Returns:
+ A list corresponding to the dimensions of the input image. Dimensions
+ that are statically known are python integers, otherwise they are integer
+ scalar tensors.
+ """
+ if image.get_shape().is_fully_defined():
+ return image.get_shape().as_list()
+ else:
+ static_shape = image.get_shape().with_rank(rank).as_list()
+ dynamic_shape = tf.unstack(tf.shape(image), rank)
+ return [
+ s if s is not None else d for s, d in zip(static_shape, dynamic_shape)
+ ]
+
+
+def get_label_resize_method(label):
+ """Returns the resize method of labels depending on label dtype.
+
+ Args:
+ label: Groundtruth label tensor.
+
+ Returns:
+ tf.image.ResizeMethod.BILINEAR, if label dtype is floating.
+ tf.image.ResizeMethod.NEAREST_NEIGHBOR, if label dtype is integer.
+
+ Raises:
+ ValueError: If label is neither floating nor integer.
+ """
+ if label.dtype.is_floating:
+ return tf.image.ResizeMethod.BILINEAR
+ elif label.dtype.is_integer:
+ return tf.image.ResizeMethod.NEAREST_NEIGHBOR
+ else:
+ raise ValueError('Label type must be either floating or integer.')
+
+
+def pad_to_bounding_box(image, offset_height, offset_width, target_height,
+ target_width, pad_value):
+ """Pads the given image with the given pad_value.
+
+ Works like tf.image.pad_to_bounding_box, except it can pad the image
+ with any given arbitrary pad value and also handle images whose sizes are not
+ known during graph construction.
+
+ Args:
+ image: 3-D tensor with shape [height, width, channels]
+ offset_height: Number of rows of padding to add on top.
+ offset_width: Number of columns of padding to add on the left.
+ target_height: Height of output image.
+ target_width: Width of output image.
+ pad_value: Value to pad the image tensor with.
+
+ Returns:
+ 3-D tensor of shape [target_height, target_width, channels].
+
+ Raises:
+ ValueError: If the shape of image is incompatible with the offset_* or
+ target_* arguments.
+ """
+ with tf.name_scope(None, 'pad_to_bounding_box', [image]):
+ image = tf.convert_to_tensor(image, name='image')
+ original_dtype = image.dtype
+ if original_dtype != tf.float32 and original_dtype != tf.float64:
+ # If image dtype is not float, we convert it to int32 to avoid overflow.
+ image = tf.cast(image, tf.int32)
+ image_rank_assert = tf.Assert(
+ tf.logical_or(
+ tf.equal(tf.rank(image), 3),
+ tf.equal(tf.rank(image), 4)),
+ ['Wrong image tensor rank.'])
+ with tf.control_dependencies([image_rank_assert]):
+ image -= pad_value
+ image_shape = image.get_shape()
+ is_batch = True
+ if image_shape.ndims == 3:
+ is_batch = False
+ image = tf.expand_dims(image, 0)
+ elif image_shape.ndims is None:
+ is_batch = False
+ image = tf.expand_dims(image, 0)
+ image.set_shape([None] * 4)
+ elif image.get_shape().ndims != 4:
+ raise ValueError('Input image must have either 3 or 4 dimensions.')
+ _, height, width, _ = _image_dimensions(image, rank=4)
+ target_width_assert = tf.Assert(
+ tf.greater_equal(
+ target_width, width),
+ ['target_width must be >= width'])
+ target_height_assert = tf.Assert(
+ tf.greater_equal(target_height, height),
+ ['target_height must be >= height'])
+ with tf.control_dependencies([target_width_assert]):
+ after_padding_width = target_width - offset_width - width
+ with tf.control_dependencies([target_height_assert]):
+ after_padding_height = target_height - offset_height - height
+ offset_assert = tf.Assert(
+ tf.logical_and(
+ tf.greater_equal(after_padding_width, 0),
+ tf.greater_equal(after_padding_height, 0)),
+ ['target size not possible with the given target offsets'])
+ batch_params = tf.stack([0, 0])
+ height_params = tf.stack([offset_height, after_padding_height])
+ width_params = tf.stack([offset_width, after_padding_width])
+ channel_params = tf.stack([0, 0])
+ with tf.control_dependencies([offset_assert]):
+ paddings = tf.stack([batch_params, height_params, width_params,
+ channel_params])
+ padded = tf.pad(image, paddings)
+ if not is_batch:
+ padded = tf.squeeze(padded, axis=[0])
+ outputs = padded + pad_value
+ if outputs.dtype != original_dtype:
+ outputs = tf.cast(outputs, original_dtype)
+ return outputs
+
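+# Example usage (an illustrative sketch; the target size and pad value are
+# placeholders, and `image` is assumed to be a float32 [height, width, 3]
+# tensor):
+#
+#   padded = pad_to_bounding_box(image, 0, 0, 513, 513, pad_value=127.5)
+#
+# pads `image` on the bottom/right up to 513x513, filling with 127.5 (e.g. a
+# mean pixel value) rather than zeros.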
+
+def _crop(image, offset_height, offset_width, crop_height, crop_width):
+ """Crops the given image using the provided offsets and sizes.
+
+ Note that the method doesn't assume we know the input image size but it does
+ assume we know the input image rank.
+
+ Args:
+ image: an image of shape [height, width, channels].
+ offset_height: a scalar tensor indicating the height offset.
+ offset_width: a scalar tensor indicating the width offset.
+ crop_height: the height of the cropped image.
+ crop_width: the width of the cropped image.
+
+ Returns:
+ The cropped image.
+
+ Raises:
+ ValueError: if `image` doesn't have rank of 3.
+ InvalidArgumentError: if the rank is not 3 or if the image dimensions are
+ less than the crop size.
+ """
+ original_shape = tf.shape(image)
+
+ if len(image.get_shape().as_list()) != 3:
+ raise ValueError('input must have rank of 3')
+ original_channels = image.get_shape().as_list()[2]
+
+ rank_assertion = tf.Assert(
+ tf.equal(tf.rank(image), 3),
+ ['Rank of image must be equal to 3.'])
+ with tf.control_dependencies([rank_assertion]):
+ cropped_shape = tf.stack([crop_height, crop_width, original_shape[2]])
+
+ size_assertion = tf.Assert(
+ tf.logical_and(
+ tf.greater_equal(original_shape[0], crop_height),
+ tf.greater_equal(original_shape[1], crop_width)),
+ ['Crop size greater than the image size.'])
+
+ offsets = tf.cast(tf.stack([offset_height, offset_width, 0]), tf.int32)
+
+ # Use tf.slice instead of crop_to_bounding box as it accepts tensors to
+ # define the crop size.
+ with tf.control_dependencies([size_assertion]):
+ image = tf.slice(image, offsets, cropped_shape)
+ image = tf.reshape(image, cropped_shape)
+ image.set_shape([crop_height, crop_width, original_channels])
+ return image
+
+
+def random_crop(image_list, crop_height, crop_width):
+ """Crops the given list of images.
+
+ The function applies the same crop to each image in the list. This can be
+ effectively applied when there are multiple image inputs of the same
+ dimension such as:
+
+ image, depths, normals = random_crop([image, depths, normals], 120, 150)
+
+ Args:
+ image_list: a list of image tensors of the same dimension but possibly
+ varying channel.
+ crop_height: the new height.
+ crop_width: the new width.
+
+ Returns:
+ the image_list with cropped images.
+
+ Raises:
+ ValueError: if there are multiple image inputs provided with different sizes
+ or the images are smaller than the crop dimensions.
+ """
+ if not image_list:
+ raise ValueError('Empty image_list.')
+
+ # Compute the rank assertions.
+ rank_assertions = []
+ for i in range(len(image_list)):
+ image_rank = tf.rank(image_list[i])
+ rank_assert = tf.Assert(
+ tf.equal(image_rank, 3),
+ ['Wrong rank for tensor %s [expected] [actual]',
+ image_list[i].name, 3, image_rank])
+ rank_assertions.append(rank_assert)
+
+ with tf.control_dependencies([rank_assertions[0]]):
+ image_shape = tf.shape(image_list[0])
+ image_height = image_shape[0]
+ image_width = image_shape[1]
+ crop_size_assert = tf.Assert(
+ tf.logical_and(
+ tf.greater_equal(image_height, crop_height),
+ tf.greater_equal(image_width, crop_width)),
+ ['Crop size greater than the image size.'])
+
+ asserts = [rank_assertions[0], crop_size_assert]
+
+ for i in range(1, len(image_list)):
+ image = image_list[i]
+ asserts.append(rank_assertions[i])
+ with tf.control_dependencies([rank_assertions[i]]):
+ shape = tf.shape(image)
+ height = shape[0]
+ width = shape[1]
+
+ height_assert = tf.Assert(
+ tf.equal(height, image_height),
+ ['Wrong height for tensor %s [expected][actual]',
+ image.name, height, image_height])
+ width_assert = tf.Assert(
+ tf.equal(width, image_width),
+ ['Wrong width for tensor %s [expected][actual]',
+ image.name, width, image_width])
+ asserts.extend([height_assert, width_assert])
+
+ # Create a random bounding box.
+ #
+ # Use tf.random_uniform and not numpy.random.rand as doing the former would
+ # generate random numbers at graph eval time, unlike the latter which
+ # generates random numbers at graph definition time.
+ with tf.control_dependencies(asserts):
+ max_offset_height = tf.reshape(image_height - crop_height + 1, [])
+ max_offset_width = tf.reshape(image_width - crop_width + 1, [])
+ offset_height = tf.random_uniform(
+ [], maxval=max_offset_height, dtype=tf.int32)
+ offset_width = tf.random_uniform(
+ [], maxval=max_offset_width, dtype=tf.int32)
+
+ return [_crop(image, offset_height, offset_width,
+ crop_height, crop_width) for image in image_list]
+
+
+def get_random_scale(min_scale_factor, max_scale_factor, step_size):
+ """Gets a random scale value.
+
+ Args:
+ min_scale_factor: Minimum scale value.
+ max_scale_factor: Maximum scale value.
+ step_size: The step size from minimum to maximum value.
+
+ Returns:
+ A random scale value selected between minimum and maximum value.
+
+ Raises:
+ ValueError: min_scale_factor has unexpected value.
+ """
+ if min_scale_factor < 0 or min_scale_factor > max_scale_factor:
+ raise ValueError('Unexpected value of min_scale_factor.')
+
+ if min_scale_factor == max_scale_factor:
+ return tf.cast(min_scale_factor, tf.float32)
+
+ # When step_size = 0, we sample the value uniformly from [min, max).
+ if step_size == 0:
+ return tf.random_uniform([1],
+ minval=min_scale_factor,
+ maxval=max_scale_factor)
+
+ # When step_size != 0, we randomly select one discrete value from [min, max].
+ num_steps = int((max_scale_factor - min_scale_factor) / step_size + 1)
+ scale_factors = tf.lin_space(min_scale_factor, max_scale_factor, num_steps)
+ shuffled_scale_factors = tf.random_shuffle(scale_factors)
+ return shuffled_scale_factors[0]
+
+
+def randomly_scale_image_and_label(image, label=None, scale=1.0):
+ """Randomly scales image and label.
+
+ Args:
+ image: Image with shape [height, width, 3].
+ label: Label with shape [height, width, 1].
+ scale: The value to scale image and label.
+
+ Returns:
+ Scaled image and label.
+ """
+ # No random scaling if scale == 1.
+ if scale == 1.0:
+ return image, label
+ image_shape = tf.shape(image)
+ new_dim = tf.cast(
+ tf.cast([image_shape[0], image_shape[1]], tf.float32) * scale,
+ tf.int32)
+
+ # Need squeeze and expand_dims because image interpolation takes
+ # 4D tensors as input.
+ image = tf.squeeze(tf.image.resize_bilinear(
+ tf.expand_dims(image, 0),
+ new_dim,
+ align_corners=True), [0])
+ if label is not None:
+ label = tf.image.resize(
+ label,
+ new_dim,
+ method=get_label_resize_method(label),
+ align_corners=True)
+
+ return image, label
+
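+# Example usage (illustrative; the scale range is a placeholder):
+#
+#   scale = get_random_scale(0.5, 2.0, 0.25)
+#   image, label = randomly_scale_image_and_label(image, label, scale)
+#
+# draws a scale from {0.5, 0.75, ..., 2.0} and resizes the image bilinearly
+# and the label with nearest neighbor (for integer labels) by that factor.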
+
+def resolve_shape(tensor, rank=None, scope=None):
+ """Fully resolves the shape of a Tensor.
+
+ Use as much as possible the shape components already known during graph
+ creation and resolve the remaining ones during runtime.
+
+ Args:
+ tensor: Input tensor whose shape we query.
+ rank: The rank of the tensor, provided that we know it.
+ scope: Optional name scope.
+
+ Returns:
+ shape: The full shape of the tensor.
+ """
+ with tf.name_scope(scope, 'resolve_shape', [tensor]):
+ if rank is not None:
+ shape = tensor.get_shape().with_rank(rank).as_list()
+ else:
+ shape = tensor.get_shape().as_list()
+
+ if None in shape:
+ shape_dynamic = tf.shape(tensor)
+ for i in range(len(shape)):
+ if shape[i] is None:
+ shape[i] = shape_dynamic[i]
+
+ return shape
+
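+# Example: for a placeholder of shape [None, 513, 513, 3],
+# resolve_shape(tensor, rank=4) returns [<batch-size tensor>, 513, 513, 3]:
+# statically known dimensions stay Python ints and only the unknown batch
+# size is resolved at run time.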
+
+def resize_to_range(image,
+ label=None,
+ min_size=None,
+ max_size=None,
+ factor=None,
+ keep_aspect_ratio=True,
+ align_corners=True,
+ label_layout_is_chw=False,
+ scope=None,
+ method=tf.image.ResizeMethod.BILINEAR):
+ """Resizes image or label so their sides are within the provided range.
+
+ The output size can be described by two cases:
+ 1. If the image can be rescaled so its minimum size is equal to min_size
+ without the other side exceeding max_size, then do so.
+ 2. Otherwise, resize so the largest side is equal to max_size.
+
+ An integer in `range(factor)` is added to the computed sides so that the
+ final dimensions are multiples of `factor` plus one.
+
+ Args:
+ image: A 3D tensor of shape [height, width, channels].
+ label: (optional) A 3D tensor of shape [height, width, channels] (default)
+ or [channels, height, width] when label_layout_is_chw = True.
+ min_size: (scalar) desired size of the smaller image side.
+ max_size: (scalar) maximum allowed size of the larger image side. Note
+ that the output dimension is no larger than max_size and may be slightly
+ smaller than max_size when factor is not None.
+ factor: Make output size multiple of factor plus one.
+ keep_aspect_ratio: Boolean, keep aspect ratio or not. If True, the input
+ will be resized while keeping the original aspect ratio. If False, the
+ input will be resized to [max_resize_value, max_resize_value] without
+ keeping the original aspect ratio.
+ align_corners: If True, exactly align all 4 corners of input and output.
+ label_layout_is_chw: If true, the label has shape [channel, height, width].
+ We support this case because for some instance segmentation datasets, the
+ instance segmentation is saved as [num_instances, height, width].
+ scope: Optional name scope.
+ method: Image resize method. Defaults to tf.image.ResizeMethod.BILINEAR.
+
+ Returns:
+ A 3-D tensor of shape [new_height, new_width, channels], where the image
+ has been resized (with the specified method) so that
+ min(new_height, new_width) == ceil(min_size) or
+ max(new_height, new_width) == ceil(max_size).
+
+ Raises:
+ ValueError: If the image is not a 3D tensor.
+ """
+ with tf.name_scope(scope, 'resize_to_range', [image]):
+ new_tensor_list = []
+ min_size = tf.cast(min_size, tf.float32)
+ if max_size is not None:
+ max_size = tf.cast(max_size, tf.float32)
+ # Modify the max_size to be a multiple of factor plus 1 and make sure the
+ # max dimension after resizing is no larger than max_size.
+ if factor is not None:
+ max_size = (max_size - (max_size - 1) % factor)
+
+ [orig_height, orig_width, _] = resolve_shape(image, rank=3)
+ orig_height = tf.cast(orig_height, tf.float32)
+ orig_width = tf.cast(orig_width, tf.float32)
+ orig_min_size = tf.minimum(orig_height, orig_width)
+
+ # Calculate the larger of the possible sizes
+ large_scale_factor = min_size / orig_min_size
+ large_height = tf.cast(tf.floor(orig_height * large_scale_factor), tf.int32)
+ large_width = tf.cast(tf.floor(orig_width * large_scale_factor), tf.int32)
+ large_size = tf.stack([large_height, large_width])
+
+ new_size = large_size
+ if max_size is not None:
+ # Calculate the smaller of the possible sizes, use that if the larger
+ # is too big.
+ orig_max_size = tf.maximum(orig_height, orig_width)
+ small_scale_factor = max_size / orig_max_size
+ small_height = tf.cast(
+ tf.floor(orig_height * small_scale_factor), tf.int32)
+ small_width = tf.cast(tf.floor(orig_width * small_scale_factor), tf.int32)
+ small_size = tf.stack([small_height, small_width])
+ new_size = tf.cond(
+ tf.cast(tf.reduce_max(large_size), tf.float32) > max_size,
+ lambda: small_size,
+ lambda: large_size)
+ # Ensure that both output sides are multiples of factor plus one.
+ if factor is not None:
+ new_size += (factor - (new_size - 1) % factor) % factor
+ if not keep_aspect_ratio:
+ # If not keeping the aspect ratio, we resize everything to max_size, allowing
+ # us to do pre-processing without extra padding.
+ new_size = [tf.reduce_max(new_size), tf.reduce_max(new_size)]
+ new_tensor_list.append(tf.image.resize(
+ image, new_size, method=method, align_corners=align_corners))
+ if label is not None:
+ if label_layout_is_chw:
+ # Input label has shape [channel, height, width].
+ resized_label = tf.expand_dims(label, 3)
+ resized_label = tf.image.resize(
+ resized_label,
+ new_size,
+ method=get_label_resize_method(label),
+ align_corners=align_corners)
+ resized_label = tf.squeeze(resized_label, 3)
+ else:
+ # Input label has shape [height, width, channel].
+ resized_label = tf.image.resize(
+ label,
+ new_size,
+ method=get_label_resize_method(label),
+ align_corners=align_corners)
+ new_tensor_list.append(resized_label)
+ else:
+ new_tensor_list.append(None)
+ return new_tensor_list
diff --git a/deeplab/models/research/deeplab/core/preprocess_utils_test.py b/deeplab/models/research/deeplab/core/preprocess_utils_test.py
new file mode 100644
index 0000000..606fe46
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/preprocess_utils_test.py
@@ -0,0 +1,515 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Tests for preprocess_utils."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+from six.moves import range
+import tensorflow as tf
+
+from deeplab.core import preprocess_utils
+
+
+class PreprocessUtilsTest(tf.test.TestCase):
+
+ def testNoFlipWhenProbIsZero(self):
+ numpy_image = np.dstack([[[5., 6.],
+ [9., 0.]],
+ [[4., 3.],
+ [3., 5.]]])
+ image = tf.convert_to_tensor(numpy_image)
+
+ with self.test_session():
+ actual, is_flipped = preprocess_utils.flip_dim([image], prob=0, dim=0)
+ self.assertAllEqual(numpy_image, actual.eval())
+ self.assertAllEqual(False, is_flipped.eval())
+ actual, is_flipped = preprocess_utils.flip_dim([image], prob=0, dim=1)
+ self.assertAllEqual(numpy_image, actual.eval())
+ self.assertAllEqual(False, is_flipped.eval())
+ actual, is_flipped = preprocess_utils.flip_dim([image], prob=0, dim=2)
+ self.assertAllEqual(numpy_image, actual.eval())
+ self.assertAllEqual(False, is_flipped.eval())
+
+ def testFlipWhenProbIsOne(self):
+ numpy_image = np.dstack([[[5., 6.],
+ [9., 0.]],
+ [[4., 3.],
+ [3., 5.]]])
+ dim0_flipped = np.dstack([[[9., 0.],
+ [5., 6.]],
+ [[3., 5.],
+ [4., 3.]]])
+ dim1_flipped = np.dstack([[[6., 5.],
+ [0., 9.]],
+ [[3., 4.],
+ [5., 3.]]])
+ dim2_flipped = np.dstack([[[4., 3.],
+ [3., 5.]],
+ [[5., 6.],
+ [9., 0.]]])
+ image = tf.convert_to_tensor(numpy_image)
+
+ with self.test_session():
+ actual, is_flipped = preprocess_utils.flip_dim([image], prob=1, dim=0)
+ self.assertAllEqual(dim0_flipped, actual.eval())
+ self.assertAllEqual(True, is_flipped.eval())
+ actual, is_flipped = preprocess_utils.flip_dim([image], prob=1, dim=1)
+ self.assertAllEqual(dim1_flipped, actual.eval())
+ self.assertAllEqual(True, is_flipped.eval())
+ actual, is_flipped = preprocess_utils.flip_dim([image], prob=1, dim=2)
+ self.assertAllEqual(dim2_flipped, actual.eval())
+ self.assertAllEqual(True, is_flipped.eval())
+
+ def testFlipMultipleImagesConsistentlyWhenProbIsOne(self):
+ numpy_image = np.dstack([[[5., 6.],
+ [9., 0.]],
+ [[4., 3.],
+ [3., 5.]]])
+ numpy_label = np.dstack([[[0., 1.],
+ [2., 3.]]])
+ image_dim1_flipped = np.dstack([[[6., 5.],
+ [0., 9.]],
+ [[3., 4.],
+ [5., 3.]]])
+ label_dim1_flipped = np.dstack([[[1., 0.],
+ [3., 2.]]])
+ image = tf.convert_to_tensor(numpy_image)
+ label = tf.convert_to_tensor(numpy_label)
+
+ with self.test_session() as sess:
+ image, label, is_flipped = preprocess_utils.flip_dim(
+ [image, label], prob=1, dim=1)
+ actual_image, actual_label = sess.run([image, label])
+ self.assertAllEqual(image_dim1_flipped, actual_image)
+ self.assertAllEqual(label_dim1_flipped, actual_label)
+ self.assertEqual(True, is_flipped.eval())
+
+ def testReturnRandomFlipsOnMultipleEvals(self):
+ numpy_image = np.dstack([[[5., 6.],
+ [9., 0.]],
+ [[4., 3.],
+ [3., 5.]]])
+ dim1_flipped = np.dstack([[[6., 5.],
+ [0., 9.]],
+ [[3., 4.],
+ [5., 3.]]])
+ image = tf.convert_to_tensor(numpy_image)
+ tf.compat.v1.set_random_seed(53)
+
+ with self.test_session() as sess:
+ actual, is_flipped = preprocess_utils.flip_dim(
+ [image], prob=0.5, dim=1)
+ actual_image, actual_is_flipped = sess.run([actual, is_flipped])
+ self.assertAllEqual(numpy_image, actual_image)
+ self.assertEqual(False, actual_is_flipped)
+ actual_image, actual_is_flipped = sess.run([actual, is_flipped])
+ self.assertAllEqual(dim1_flipped, actual_image)
+ self.assertEqual(True, actual_is_flipped)
+
+ def testReturnCorrectCropOfSingleImage(self):
+ np.random.seed(0)
+
+ height, width = 10, 20
+ image = np.random.randint(0, 256, size=(height, width, 3))
+
+ crop_height, crop_width = 2, 4
+
+ image_placeholder = tf.placeholder(tf.int32, shape=(None, None, 3))
+ [cropped] = preprocess_utils.random_crop([image_placeholder],
+ crop_height,
+ crop_width)
+
+ with self.test_session():
+ cropped_image = cropped.eval(feed_dict={image_placeholder: image})
+
+ # Ensure we can find the cropped image in the original:
+ is_found = False
+ for x in range(0, width - crop_width + 1):
+ for y in range(0, height - crop_height + 1):
+ if np.isclose(image[y:y+crop_height, x:x+crop_width, :],
+ cropped_image).all():
+ is_found = True
+ break
+
+ self.assertTrue(is_found)
+
+ def testRandomCropMaintainsNumberOfChannels(self):
+ np.random.seed(0)
+
+ crop_height, crop_width = 10, 20
+ image = np.random.randint(0, 256, size=(100, 200, 3))
+
+ tf.compat.v1.set_random_seed(37)
+ image_placeholder = tf.placeholder(tf.int32, shape=(None, None, 3))
+ [cropped] = preprocess_utils.random_crop(
+ [image_placeholder], crop_height, crop_width)
+
+ with self.test_session():
+ cropped_image = cropped.eval(feed_dict={image_placeholder: image})
+ self.assertTupleEqual(cropped_image.shape, (crop_height, crop_width, 3))
+
+ def testReturnDifferentCropAreasOnTwoEvals(self):
+ tf.compat.v1.set_random_seed(0)
+
+ crop_height, crop_width = 2, 3
+ image = np.random.randint(0, 256, size=(100, 200, 3))
+ image_placeholder = tf.placeholder(tf.int32, shape=(None, None, 3))
+ [cropped] = preprocess_utils.random_crop(
+ [image_placeholder], crop_height, crop_width)
+
+ with self.test_session():
+ crop0 = cropped.eval(feed_dict={image_placeholder: image})
+ crop1 = cropped.eval(feed_dict={image_placeholder: image})
+ self.assertFalse(np.isclose(crop0, crop1).all())
+
+ def testReturnConsistentCropsOfImagesInTheList(self):
+ tf.compat.v1.set_random_seed(0)
+
+ height, width = 10, 20
+ crop_height, crop_width = 2, 3
+ labels = np.linspace(0, height * width-1, height * width)
+ labels = labels.reshape((height, width, 1))
+ image = np.tile(labels, (1, 1, 3))
+
+ image_placeholder = tf.placeholder(tf.int32, shape=(None, None, 3))
+ label_placeholder = tf.placeholder(tf.int32, shape=(None, None, 1))
+ [cropped_image, cropped_label] = preprocess_utils.random_crop(
+ [image_placeholder, label_placeholder], crop_height, crop_width)
+
+ with self.test_session() as sess:
+ cropped_image, cropped_labels = sess.run([cropped_image, cropped_label],
+ feed_dict={
+ image_placeholder: image,
+ label_placeholder: labels})
+ for i in range(3):
+ self.assertAllEqual(cropped_image[:, :, i], cropped_labels.squeeze())
+
+ def testDieOnRandomCropWhenImagesWithDifferentWidth(self):
+ crop_height, crop_width = 2, 3
+ image1 = tf.placeholder(tf.float32, name='image1', shape=(None, None, 3))
+ image2 = tf.placeholder(tf.float32, name='image2', shape=(None, None, 1))
+ cropped = preprocess_utils.random_crop(
+ [image1, image2], crop_height, crop_width)
+
+ with self.test_session() as sess:
+ with self.assertRaises(tf.errors.InvalidArgumentError):
+ sess.run(cropped, feed_dict={image1: np.random.rand(4, 5, 3),
+ image2: np.random.rand(4, 6, 1)})
+
+ def testDieOnRandomCropWhenImagesWithDifferentHeight(self):
+ crop_height, crop_width = 2, 3
+ image1 = tf.placeholder(tf.float32, name='image1', shape=(None, None, 3))
+ image2 = tf.placeholder(tf.float32, name='image2', shape=(None, None, 1))
+ cropped = preprocess_utils.random_crop(
+ [image1, image2], crop_height, crop_width)
+
+ with self.test_session() as sess:
+ with self.assertRaisesWithPredicateMatch(
+ tf.errors.InvalidArgumentError,
+ 'Wrong height for tensor'):
+ sess.run(cropped, feed_dict={image1: np.random.rand(4, 5, 3),
+ image2: np.random.rand(3, 5, 1)})
+
+ def testDieOnRandomCropWhenCropSizeIsGreaterThanImage(self):
+ crop_height, crop_width = 5, 9
+ image1 = tf.placeholder(tf.float32, name='image1', shape=(None, None, 3))
+ image2 = tf.placeholder(tf.float32, name='image2', shape=(None, None, 1))
+ cropped = preprocess_utils.random_crop(
+ [image1, image2], crop_height, crop_width)
+
+ with self.test_session() as sess:
+ with self.assertRaisesWithPredicateMatch(
+ tf.errors.InvalidArgumentError,
+ 'Crop size greater than the image size.'):
+ sess.run(cropped, feed_dict={image1: np.random.rand(4, 5, 3),
+ image2: np.random.rand(4, 5, 1)})
+
+ def testReturnPaddedImageWithNonZeroPadValue(self):
+ for dtype in [np.int32, np.int64, np.float32, np.float64]:
+ image = np.dstack([[[5, 6],
+ [9, 0]],
+ [[4, 3],
+ [3, 5]]]).astype(dtype)
+ expected_image = np.dstack([[[255, 255, 255, 255, 255],
+ [255, 255, 255, 255, 255],
+ [255, 5, 6, 255, 255],
+ [255, 9, 0, 255, 255],
+ [255, 255, 255, 255, 255]],
+ [[255, 255, 255, 255, 255],
+ [255, 255, 255, 255, 255],
+ [255, 4, 3, 255, 255],
+ [255, 3, 5, 255, 255],
+ [255, 255, 255, 255, 255]]]).astype(dtype)
+
+ with self.session() as sess:
+ padded_image = preprocess_utils.pad_to_bounding_box(
+ image, 2, 1, 5, 5, 255)
+ padded_image = sess.run(padded_image)
+ self.assertAllClose(padded_image, expected_image)
+ # Add batch size = 1 to image.
+ padded_image = preprocess_utils.pad_to_bounding_box(
+ np.expand_dims(image, 0), 2, 1, 5, 5, 255)
+ padded_image = sess.run(padded_image)
+ self.assertAllClose(padded_image, np.expand_dims(expected_image, 0))
+
+ def testReturnOriginalImageWhenTargetSizeIsEqualToImageSize(self):
+ image = np.dstack([[[5, 6],
+ [9, 0]],
+ [[4, 3],
+ [3, 5]]])
+ with self.session() as sess:
+ padded_image = preprocess_utils.pad_to_bounding_box(
+ image, 0, 0, 2, 2, 255)
+ padded_image = sess.run(padded_image)
+ self.assertAllClose(padded_image, image)
+
+ def testDieOnTargetSizeGreaterThanImageSize(self):
+ image = np.dstack([[[5, 6],
+ [9, 0]],
+ [[4, 3],
+ [3, 5]]])
+ with self.test_session():
+ image_placeholder = tf.placeholder(tf.float32)
+ padded_image = preprocess_utils.pad_to_bounding_box(
+ image_placeholder, 0, 0, 2, 1, 255)
+ with self.assertRaisesWithPredicateMatch(
+ tf.errors.InvalidArgumentError,
+ 'target_width must be >= width'):
+ padded_image.eval(feed_dict={image_placeholder: image})
+ padded_image = preprocess_utils.pad_to_bounding_box(
+ image_placeholder, 0, 0, 1, 2, 255)
+ with self.assertRaisesWithPredicateMatch(
+ tf.errors.InvalidArgumentError,
+ 'target_height must be >= height'):
+ padded_image.eval(feed_dict={image_placeholder: image})
+
+ def testDieIfTargetSizeNotPossibleWithGivenOffset(self):
+ image = np.dstack([[[5, 6],
+ [9, 0]],
+ [[4, 3],
+ [3, 5]]])
+ with self.test_session():
+ image_placeholder = tf.placeholder(tf.float32)
+ padded_image = preprocess_utils.pad_to_bounding_box(
+ image_placeholder, 3, 0, 4, 4, 255)
+ with self.assertRaisesWithPredicateMatch(
+ tf.errors.InvalidArgumentError,
+ 'target size not possible with the given target offsets'):
+ padded_image.eval(feed_dict={image_placeholder: image})
+
+ def testDieIfImageTensorRankIsTwo(self):
+ image = np.vstack([[5, 6],
+ [9, 0]])
+ with self.test_session():
+ image_placeholder = tf.placeholder(tf.float32)
+ padded_image = preprocess_utils.pad_to_bounding_box(
+ image_placeholder, 0, 0, 2, 2, 255)
+ with self.assertRaisesWithPredicateMatch(
+ tf.errors.InvalidArgumentError,
+ 'Wrong image tensor rank'):
+ padded_image.eval(feed_dict={image_placeholder: image})
+
+ def testResizeTensorsToRange(self):
+ test_shapes = [[60, 40],
+ [15, 30],
+ [15, 50]]
+ min_size = 50
+ max_size = 100
+ factor = None
+ expected_shape_list = [(75, 50, 3),
+ (50, 100, 3),
+ (30, 100, 3)]
+ for i, test_shape in enumerate(test_shapes):
+ image = tf.random.normal([test_shape[0], test_shape[1], 3])
+ new_tensor_list = preprocess_utils.resize_to_range(
+ image=image,
+ label=None,
+ min_size=min_size,
+ max_size=max_size,
+ factor=factor,
+ align_corners=True)
+ with self.test_session() as session:
+ resized_image = session.run(new_tensor_list[0])
+ self.assertEqual(resized_image.shape, expected_shape_list[i])
+
+ def testResizeTensorsToRangeWithFactor(self):
+ test_shapes = [[60, 40],
+ [15, 30],
+ [15, 50]]
+ min_size = 50
+ max_size = 98
+ factor = 8
+ expected_image_shape_list = [(81, 57, 3),
+ (49, 97, 3),
+ (33, 97, 3)]
+ expected_label_shape_list = [(81, 57, 1),
+ (49, 97, 1),
+ (33, 97, 1)]
+ for i, test_shape in enumerate(test_shapes):
+ image = tf.random.normal([test_shape[0], test_shape[1], 3])
+ label = tf.random.normal([test_shape[0], test_shape[1], 1])
+ new_tensor_list = preprocess_utils.resize_to_range(
+ image=image,
+ label=label,
+ min_size=min_size,
+ max_size=max_size,
+ factor=factor,
+ align_corners=True)
+ with self.test_session() as session:
+ new_tensor_list = session.run(new_tensor_list)
+ self.assertEqual(new_tensor_list[0].shape, expected_image_shape_list[i])
+ self.assertEqual(new_tensor_list[1].shape, expected_label_shape_list[i])
+
+ def testResizeTensorsToRangeWithFactorAndLabelShapeCHW(self):
+ test_shapes = [[60, 40],
+ [15, 30],
+ [15, 50]]
+ min_size = 50
+ max_size = 98
+ factor = 8
+ expected_image_shape_list = [(81, 57, 3),
+ (49, 97, 3),
+ (33, 97, 3)]
+ expected_label_shape_list = [(5, 81, 57),
+ (5, 49, 97),
+ (5, 33, 97)]
+ for i, test_shape in enumerate(test_shapes):
+ image = tf.random.normal([test_shape[0], test_shape[1], 3])
+ label = tf.random.normal([5, test_shape[0], test_shape[1]])
+ new_tensor_list = preprocess_utils.resize_to_range(
+ image=image,
+ label=label,
+ min_size=min_size,
+ max_size=max_size,
+ factor=factor,
+ align_corners=True,
+ label_layout_is_chw=True)
+ with self.test_session() as session:
+ new_tensor_list = session.run(new_tensor_list)
+ self.assertEqual(new_tensor_list[0].shape, expected_image_shape_list[i])
+ self.assertEqual(new_tensor_list[1].shape, expected_label_shape_list[i])
+
+ def testResizeTensorsToRangeWithSimilarMinMaxSizes(self):
+ test_shapes = [[60, 40],
+ [15, 30],
+ [15, 50]]
+ # Values set so that one of the side = 97.
+ min_size = 96
+ max_size = 98
+ factor = 8
+ expected_image_shape_list = [(97, 65, 3),
+ (49, 97, 3),
+ (33, 97, 3)]
+ expected_label_shape_list = [(97, 65, 1),
+ (49, 97, 1),
+ (33, 97, 1)]
+ for i, test_shape in enumerate(test_shapes):
+ image = tf.random.normal([test_shape[0], test_shape[1], 3])
+ label = tf.random.normal([test_shape[0], test_shape[1], 1])
+ new_tensor_list = preprocess_utils.resize_to_range(
+ image=image,
+ label=label,
+ min_size=min_size,
+ max_size=max_size,
+ factor=factor,
+ align_corners=True)
+ with self.test_session() as session:
+ new_tensor_list = session.run(new_tensor_list)
+ self.assertEqual(new_tensor_list[0].shape, expected_image_shape_list[i])
+ self.assertEqual(new_tensor_list[1].shape, expected_label_shape_list[i])
+
+ def testResizeTensorsToRangeWithEqualMaxSize(self):
+ test_shapes = [[97, 38],
+ [96, 97]]
+ # Make max_size equal to the larger value of test_shapes.
+ min_size = 97
+ max_size = 97
+ factor = 8
+ expected_image_shape_list = [(97, 41, 3),
+ (97, 97, 3)]
+ expected_label_shape_list = [(97, 41, 1),
+ (97, 97, 1)]
+ for i, test_shape in enumerate(test_shapes):
+ image = tf.random.normal([test_shape[0], test_shape[1], 3])
+ label = tf.random.normal([test_shape[0], test_shape[1], 1])
+ new_tensor_list = preprocess_utils.resize_to_range(
+ image=image,
+ label=label,
+ min_size=min_size,
+ max_size=max_size,
+ factor=factor,
+ align_corners=True)
+ with self.test_session() as session:
+ new_tensor_list = session.run(new_tensor_list)
+ self.assertEqual(new_tensor_list[0].shape, expected_image_shape_list[i])
+ self.assertEqual(new_tensor_list[1].shape, expected_label_shape_list[i])
+
+ def testResizeTensorsToRangeWithPotentialErrorInTFCeil(self):
+ test_shape = [3936, 5248]
+ # Make max_size equal to the larger value of test_shapes.
+ min_size = 1441
+ max_size = 1441
+ factor = 16
+ expected_image_shape = (1089, 1441, 3)
+ expected_label_shape = (1089, 1441, 1)
+ image = tf.random.normal([test_shape[0], test_shape[1], 3])
+ label = tf.random.normal([test_shape[0], test_shape[1], 1])
+ new_tensor_list = preprocess_utils.resize_to_range(
+ image=image,
+ label=label,
+ min_size=min_size,
+ max_size=max_size,
+ factor=factor,
+ align_corners=True)
+ with self.test_session() as session:
+ new_tensor_list = session.run(new_tensor_list)
+ self.assertEqual(new_tensor_list[0].shape, expected_image_shape)
+ self.assertEqual(new_tensor_list[1].shape, expected_label_shape)
+
+ def testResizeTensorsToRangeWithEqualMaxSizeWithoutAspectRatio(self):
+ test_shapes = [[97, 38],
+ [96, 97]]
+ # Make max_size equal to the larger value of test_shapes.
+ min_size = 97
+ max_size = 97
+ factor = 8
+ keep_aspect_ratio = False
+ expected_image_shape_list = [(97, 97, 3),
+ (97, 97, 3)]
+ expected_label_shape_list = [(97, 97, 1),
+ (97, 97, 1)]
+ for i, test_shape in enumerate(test_shapes):
+ image = tf.random.normal([test_shape[0], test_shape[1], 3])
+ label = tf.random.normal([test_shape[0], test_shape[1], 1])
+ new_tensor_list = preprocess_utils.resize_to_range(
+ image=image,
+ label=label,
+ min_size=min_size,
+ max_size=max_size,
+ factor=factor,
+ keep_aspect_ratio=keep_aspect_ratio,
+ align_corners=True)
+ with self.test_session() as session:
+ new_tensor_list = session.run(new_tensor_list)
+ self.assertEqual(new_tensor_list[0].shape, expected_image_shape_list[i])
+ self.assertEqual(new_tensor_list[1].shape, expected_label_shape_list[i])
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/core/resnet_v1_beta.py b/deeplab/models/research/deeplab/core/resnet_v1_beta.py
new file mode 100644
index 0000000..0d5f1f1
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/resnet_v1_beta.py
@@ -0,0 +1,827 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Resnet v1 model variants.
+
+Code branched out from slim/nets/resnet_v1.py; please refer to it for
+more details.
+
+The original ResNet-v1 was proposed in:
+[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
+ Deep Residual Learning for Image Recognition. arXiv:1512.03385
+
+The residual unit variants are further analyzed in:
+[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
+ Identity Mappings in Deep Residual Networks. arXiv:1603.05027
+"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import functools
+from six.moves import range
+import tensorflow as tf
+from tensorflow.contrib import slim as contrib_slim
+from deeplab.core import conv2d_ws
+from deeplab.core import utils
+from tensorflow.contrib.slim.nets import resnet_utils
+
+slim = contrib_slim
+
+_DEFAULT_MULTI_GRID = [1, 1, 1]
+_DEFAULT_MULTI_GRID_RESNET_18 = [1, 1]
+
+
+@slim.add_arg_scope
+def bottleneck(inputs,
+ depth,
+ depth_bottleneck,
+ stride,
+ unit_rate=1,
+ rate=1,
+ outputs_collections=None,
+ scope=None):
+ """Bottleneck residual unit variant with BN after convolutions.
+
+ This is the original residual unit proposed in [1]. See Fig. 1(a) of [2] for
+ its definition. Note that we use here the bottleneck variant which has an
+ extra bottleneck layer.
+
+ When putting together two consecutive ResNet blocks that use this unit, one
+ should use stride = 2 in the last unit of the first block.
+
+ Args:
+ inputs: A tensor of size [batch, height, width, channels].
+ depth: The depth of the ResNet unit output.
+ depth_bottleneck: The depth of the bottleneck layers.
+ stride: The ResNet unit's stride. Determines the amount of downsampling of
+ the unit's output compared to its input.
+ unit_rate: An integer, unit rate for atrous convolution.
+ rate: An integer, rate for atrous convolution.
+ outputs_collections: Collection to add the ResNet unit output.
+ scope: Optional variable_scope.
+
+ Returns:
+ The ResNet unit's output.
+ """
+ with tf.variable_scope(scope, 'bottleneck_v1', [inputs]) as sc:
+ depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
+ if depth == depth_in:
+ shortcut = resnet_utils.subsample(inputs, stride, 'shortcut')
+ else:
+ shortcut = conv2d_ws.conv2d(
+ inputs,
+ depth,
+ [1, 1],
+ stride=stride,
+ activation_fn=None,
+ scope='shortcut')
+
+ residual = conv2d_ws.conv2d(inputs, depth_bottleneck, [1, 1], stride=1,
+ scope='conv1')
+ residual = conv2d_ws.conv2d_same(residual, depth_bottleneck, 3, stride,
+ rate=rate*unit_rate, scope='conv2')
+ residual = conv2d_ws.conv2d(residual, depth, [1, 1], stride=1,
+ activation_fn=None, scope='conv3')
+ output = tf.nn.relu(shortcut + residual)
+
+ return slim.utils.collect_named_outputs(outputs_collections, sc.name,
+ output)
+
+
+@slim.add_arg_scope
+def lite_bottleneck(inputs,
+ depth,
+ stride,
+ unit_rate=1,
+ rate=1,
+ outputs_collections=None,
+ scope=None):
+ """Bottleneck residual unit variant with BN after convolutions.
+
+ This is the original residual unit proposed in [1]. See Fig. 1(a) of [2] for
+ its definition. Note that we use here the bottleneck variant which has an
+ extra bottleneck layer.
+
+ When putting together two consecutive ResNet blocks that use this unit, one
+ should use stride = 2 in the last unit of the first block.
+
+ Args:
+ inputs: A tensor of size [batch, height, width, channels].
+ depth: The depth of the ResNet unit output.
+ stride: The ResNet unit's stride. Determines the amount of downsampling of
+ the unit's output compared to its input.
+ unit_rate: An integer, unit rate for atrous convolution.
+ rate: An integer, rate for atrous convolution.
+ outputs_collections: Collection to add the ResNet unit output.
+ scope: Optional variable_scope.
+
+ Returns:
+ The ResNet unit's output.
+ """
+ with tf.variable_scope(scope, 'lite_bottleneck_v1', [inputs]) as sc:
+ depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
+ if depth == depth_in:
+ shortcut = resnet_utils.subsample(inputs, stride, 'shortcut')
+ else:
+ shortcut = conv2d_ws.conv2d(
+ inputs,
+ depth, [1, 1],
+ stride=stride,
+ activation_fn=None,
+ scope='shortcut')
+
+ residual = conv2d_ws.conv2d_same(
+ inputs, depth, 3, 1, rate=rate * unit_rate, scope='conv1')
+ with slim.arg_scope([conv2d_ws.conv2d], activation_fn=None):
+ residual = conv2d_ws.conv2d_same(
+ residual, depth, 3, stride, rate=rate * unit_rate, scope='conv2')
+ output = tf.nn.relu(shortcut + residual)
+
+ return slim.utils.collect_named_outputs(outputs_collections, sc.name,
+ output)
+
+
+def root_block_fn_for_beta_variant(net, depth_multiplier=1.0):
+ """Gets root_block_fn for beta variant.
+
+ ResNet-v1 beta variant modifies the first original 7x7 convolution to three
+ 3x3 convolutions.
+
+ Args:
+ net: A tensor of size [batch, height, width, channels], input to the model.
+ depth_multiplier: Float, scales the number of output channels of each
+ convolution: the first two convolutions produce int(64 * depth_multiplier)
+ channels and the third produces int(128 * depth_multiplier) channels.
+
+ Returns:
+ A tensor after three 3x3 convolutions.
+ """
+ net = conv2d_ws.conv2d_same(
+ net, int(64 * depth_multiplier), 3, stride=2, scope='conv1_1')
+ net = conv2d_ws.conv2d_same(
+ net, int(64 * depth_multiplier), 3, stride=1, scope='conv1_2')
+ net = conv2d_ws.conv2d_same(
+ net, int(128 * depth_multiplier), 3, stride=1, scope='conv1_3')
+
+ return net
+
+
+def resnet_v1_beta(inputs,
+ blocks,
+ num_classes=None,
+ is_training=None,
+ global_pool=True,
+ output_stride=None,
+ root_block_fn=None,
+ reuse=None,
+ scope=None,
+ sync_batch_norm_method='None'):
+ """Generator for v1 ResNet models (beta variant).
+
+ This function generates a family of modified ResNet v1 models. In particular,
+ the first original 7x7 convolution is replaced with three 3x3 convolutions.
+ See the resnet_v1_*() methods for specific model instantiations, obtained by
+ selecting different block instantiations that produce ResNets of various
+ depths.
+
+ The code is modified from slim/nets/resnet_v1.py, and please refer to it for
+ more details.
+
+ Args:
+ inputs: A tensor of size [batch, height_in, width_in, channels].
+ blocks: A list of length equal to the number of ResNet blocks. Each element
+ is a resnet_utils.Block object describing the units in the block.
+ num_classes: Number of predicted classes for classification tasks. If None
+ we return the features before the logit layer.
+ is_training: Enable/disable is_training for batch normalization.
+ global_pool: If True, we perform global average pooling before computing the
+ logits. Set to True for image classification, False for dense prediction.
+ output_stride: If None, then the output will be computed at the nominal
+ network stride. If output_stride is not None, it specifies the requested
+ ratio of input to output spatial resolution.
+ root_block_fn: The function consisting of convolution operations applied to
+ the root input. If root_block_fn is None, use the original setting of
+ ResNet-v1, which is simply one convolution with a 7x7 kernel and stride=2.
+ reuse: whether or not the network and its variables should be reused. To be
+ able to reuse, 'scope' must be given.
+ scope: Optional variable_scope.
+ sync_batch_norm_method: String, sync batchnorm method.
+
+ Returns:
+ net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
+ If global_pool is False, then height_out and width_out are reduced by a
+ factor of output_stride compared to the respective height_in and width_in,
+ else both height_out and width_out equal one. If num_classes is None, then
+ net is the output of the last ResNet block, potentially after global
+ average pooling. If num_classes is not None, net contains the pre-softmax
+ activations.
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+
+ Raises:
+ ValueError: If the target output_stride is not valid.
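+
+ A minimal usage sketch (illustrative only; `images` is assumed to be a
+ [batch, height, width, 3] tensor and `blocks` a list built with
+ resnet_v1_beta_block):
+
+ with slim.arg_scope(resnet_arg_scope()):
+ net, end_points = resnet_v1_beta(
+ images, blocks=blocks, global_pool=False, output_stride=16)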
+ """
+ if root_block_fn is None:
+ root_block_fn = functools.partial(conv2d_ws.conv2d_same,
+ num_outputs=64,
+ kernel_size=7,
+ stride=2,
+ scope='conv1')
+ batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
+ with tf.variable_scope(scope, 'resnet_v1', [inputs], reuse=reuse) as sc:
+ end_points_collection = sc.original_name_scope + '_end_points'
+ with slim.arg_scope([
+ conv2d_ws.conv2d, bottleneck, lite_bottleneck,
+ resnet_utils.stack_blocks_dense
+ ],
+ outputs_collections=end_points_collection):
+ if is_training is not None:
+ arg_scope = slim.arg_scope([batch_norm], is_training=is_training)
+ else:
+ arg_scope = slim.arg_scope([])
+ with arg_scope:
+ net = inputs
+ if output_stride is not None:
+ if output_stride % 4 != 0:
+ raise ValueError('The output_stride needs to be a multiple of 4.')
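+ # The root block and pool1 below each have stride 2, so the resolution is
+ # already reduced by a factor of 4 before the residual blocks; only
+ # output_stride / 4 remains for stack_blocks_dense.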
+ output_stride //= 4
+ net = root_block_fn(net)
+ net = slim.max_pool2d(net, 3, stride=2, padding='SAME', scope='pool1')
+ net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
+
+ if global_pool:
+ # Global average pooling.
+ net = tf.reduce_mean(net, [1, 2], name='pool5', keepdims=True)
+ if num_classes is not None:
+ net = conv2d_ws.conv2d(net, num_classes, [1, 1], activation_fn=None,
+ normalizer_fn=None, scope='logits',
+ use_weight_standardization=False)
+ # Convert end_points_collection into a dictionary of end_points.
+ end_points = slim.utils.convert_collection_to_dict(
+ end_points_collection)
+ if num_classes is not None:
+ end_points['predictions'] = slim.softmax(net, scope='predictions')
+ return net, end_points
+
+
+def resnet_v1_beta_block(scope, base_depth, num_units, stride):
+ """Helper function for creating a resnet_v1 beta variant bottleneck block.
+
+ Args:
+ scope: The scope of the block.
+ base_depth: The depth of the bottleneck layer for each unit.
+ num_units: The number of units in the block.
+ stride: The stride of the block, implemented as a stride in the last unit.
+ All other units have stride=1.
+
+ Returns:
+ A resnet_v1 bottleneck block.
+ """
+ return resnet_utils.Block(scope, bottleneck, [{
+ 'depth': base_depth * 4,
+ 'depth_bottleneck': base_depth,
+ 'stride': 1,
+ 'unit_rate': 1
+ }] * (num_units - 1) + [{
+ 'depth': base_depth * 4,
+ 'depth_bottleneck': base_depth,
+ 'stride': stride,
+ 'unit_rate': 1
+ }])
+
+
+def resnet_v1_small_beta_block(scope, base_depth, num_units, stride):
+ """Helper function for creating a resnet_18 beta variant bottleneck block.
+
+ Args:
+ scope: The scope of the block.
+ base_depth: The output depth of each unit in the block.
+ num_units: The number of units in the block.
+ stride: The stride of the block, implemented as a stride in the last unit.
+ All other units have stride=1.
+
+ Returns:
+ A resnet_18 lite_bottleneck block.
+ """
+ block_args = []
+ for _ in range(num_units - 1):
+ block_args.append({'depth': base_depth, 'stride': 1, 'unit_rate': 1})
+ block_args.append({'depth': base_depth, 'stride': stride, 'unit_rate': 1})
+ return resnet_utils.Block(scope, lite_bottleneck, block_args)
+
+
+def resnet_v1_18(inputs,
+ num_classes=None,
+ is_training=None,
+ global_pool=False,
+ output_stride=None,
+ multi_grid=None,
+ reuse=None,
+ scope='resnet_v1_18',
+ sync_batch_norm_method='None'):
+ """Resnet v1 18.
+
+ Args:
+ inputs: A tensor of size [batch, height_in, width_in, channels].
+ num_classes: Number of predicted classes for classification tasks. If None
+ we return the features before the logit layer.
+ is_training: Enable/disable is_training for batch normalization.
+ global_pool: If True, we perform global average pooling before computing the
+ logits. Set to True for image classification, False for dense prediction.
+ output_stride: If None, then the output will be computed at the nominal
+ network stride. If output_stride is not None, it specifies the requested
+ ratio of input to output spatial resolution.
+ multi_grid: Employ a hierarchy of different atrous rates within network.
+ reuse: whether or not the network and its variables should be reused. To be
+ able to reuse 'scope' must be given.
+ scope: Optional variable_scope.
+ sync_batch_norm_method: String, sync batchnorm method.
+
+ Returns:
+ net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
+ If global_pool is False, then height_out and width_out are reduced by a
+ factor of output_stride compared to the respective height_in and width_in,
+ else both height_out and width_out equal one. If num_classes is None, then
+ net is the output of the last ResNet block, potentially after global
+ average pooling. If num_classes is not None, net contains the pre-softmax
+ activations.
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+
+ Raises:
+ ValueError: if multi_grid is not None and does not have length = 2.
+ """
+ if multi_grid is None:
+ multi_grid = _DEFAULT_MULTI_GRID_RESNET_18
+ else:
+ if len(multi_grid) != 2:
+ raise ValueError('Expect multi_grid to have length 2.')
+
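+ # Build block4 with one lite_bottleneck unit per multi_grid rate, so a
+ # hierarchy of atrous rates is applied within the last block.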
+ block4_args = []
+ for rate in multi_grid:
+ block4_args.append({'depth': 512, 'stride': 1, 'unit_rate': rate})
+
+ blocks = [
+ resnet_v1_small_beta_block(
+ 'block1', base_depth=64, num_units=2, stride=2),
+ resnet_v1_small_beta_block(
+ 'block2', base_depth=128, num_units=2, stride=2),
+ resnet_v1_small_beta_block(
+ 'block3', base_depth=256, num_units=2, stride=2),
+ resnet_utils.Block('block4', lite_bottleneck, block4_args),
+ ]
+ return resnet_v1_beta(
+ inputs,
+ blocks=blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ reuse=reuse,
+ scope=scope,
+ sync_batch_norm_method=sync_batch_norm_method)
+
+
+def resnet_v1_18_beta(inputs,
+ num_classes=None,
+ is_training=None,
+ global_pool=False,
+ output_stride=None,
+ multi_grid=None,
+ root_depth_multiplier=0.25,
+ reuse=None,
+ scope='resnet_v1_18',
+ sync_batch_norm_method='None'):
+ """Resnet v1 18 beta variant.
+
+ This variant modifies the first convolution layer of ResNet-v1-18. In
+ particular, it replaces the original 7x7 convolution with three 3x3
+ convolutions.
+
+ Args:
+ inputs: A tensor of size [batch, height_in, width_in, channels].
+ num_classes: Number of predicted classes for classification tasks. If None
+ we return the features before the logit layer.
+ is_training: Enable/disable is_training for batch normalization.
+ global_pool: If True, we perform global average pooling before computing the
+ logits. Set to True for image classification, False for dense prediction.
+ output_stride: If None, then the output will be computed at the nominal
+ network stride. If output_stride is not None, it specifies the requested
+ ratio of input to output spatial resolution.
+ multi_grid: Employ a hierarchy of different atrous rates within network.
+ root_depth_multiplier: Float, depth multiplier used for the first three
+ convolution layers that replace the 7x7 convolution.
+ reuse: whether or not the network and its variables should be reused. To be
+ able to reuse 'scope' must be given.
+ scope: Optional variable_scope.
+ sync_batch_norm_method: String, sync batchnorm method.
+
+ Returns:
+ net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
+ If global_pool is False, then height_out and width_out are reduced by a
+ factor of output_stride compared to the respective height_in and width_in,
+ else both height_out and width_out equal one. If num_classes is None, then
+ net is the output of the last ResNet block, potentially after global
+ average pooling. If num_classes is not None, net contains the pre-softmax
+ activations.
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+
+ Raises:
+ ValueError: if multi_grid is not None and does not have length = 2.
+ """
+ if multi_grid is None:
+ multi_grid = _DEFAULT_MULTI_GRID_RESNET_18
+ else:
+ if len(multi_grid) != 2:
+ raise ValueError('Expect multi_grid to have length 2.')
+
+ block4_args = []
+ for rate in multi_grid:
+ block4_args.append({'depth': 512, 'stride': 1, 'unit_rate': rate})
+
+ blocks = [
+ resnet_v1_small_beta_block(
+ 'block1', base_depth=64, num_units=2, stride=2),
+ resnet_v1_small_beta_block(
+ 'block2', base_depth=128, num_units=2, stride=2),
+ resnet_v1_small_beta_block(
+ 'block3', base_depth=256, num_units=2, stride=2),
+ resnet_utils.Block('block4', lite_bottleneck, block4_args),
+ ]
+ return resnet_v1_beta(
+ inputs,
+ blocks=blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ root_block_fn=functools.partial(root_block_fn_for_beta_variant,
+ depth_multiplier=root_depth_multiplier),
+ reuse=reuse,
+ scope=scope,
+ sync_batch_norm_method=sync_batch_norm_method)
+
+
+def resnet_v1_50(inputs,
+ num_classes=None,
+ is_training=None,
+ global_pool=False,
+ output_stride=None,
+ multi_grid=None,
+ reuse=None,
+ scope='resnet_v1_50',
+ sync_batch_norm_method='None'):
+ """Resnet v1 50.
+
+ Args:
+ inputs: A tensor of size [batch, height_in, width_in, channels].
+ num_classes: Number of predicted classes for classification tasks. If None
+ we return the features before the logit layer.
+ is_training: Enable/disable is_training for batch normalization.
+ global_pool: If True, we perform global average pooling before computing the
+ logits. Set to True for image classification, False for dense prediction.
+ output_stride: If None, then the output will be computed at the nominal
+ network stride. If output_stride is not None, it specifies the requested
+ ratio of input to output spatial resolution.
+ multi_grid: Employ a hierarchy of different atrous rates within network.
+ reuse: whether or not the network and its variables should be reused. To be
+ able to reuse 'scope' must be given.
+ scope: Optional variable_scope.
+ sync_batch_norm_method: String, sync batchnorm method.
+
+ Returns:
+ net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
+ If global_pool is False, then height_out and width_out are reduced by a
+ factor of output_stride compared to the respective height_in and width_in,
+ else both height_out and width_out equal one. If num_classes is None, then
+ net is the output of the last ResNet block, potentially after global
+ average pooling. If num_classes is not None, net contains the pre-softmax
+ activations.
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+
+ Raises:
+ ValueError: if multi_grid is not None and does not have length = 3.
+ """
+ if multi_grid is None:
+ multi_grid = _DEFAULT_MULTI_GRID
+ else:
+ if len(multi_grid) != 3:
+ raise ValueError('Expect multi_grid to have length 3.')
+
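+ # Standard ResNet-50 unit counts (3, 4, 6) for blocks 1-3; block4 holds three
+ # bottleneck units, one per multi_grid rate.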
+ blocks = [
+ resnet_v1_beta_block(
+ 'block1', base_depth=64, num_units=3, stride=2),
+ resnet_v1_beta_block(
+ 'block2', base_depth=128, num_units=4, stride=2),
+ resnet_v1_beta_block(
+ 'block3', base_depth=256, num_units=6, stride=2),
+ resnet_utils.Block('block4', bottleneck, [
+ {'depth': 2048, 'depth_bottleneck': 512, 'stride': 1,
+ 'unit_rate': rate} for rate in multi_grid]),
+ ]
+ return resnet_v1_beta(
+ inputs,
+ blocks=blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ reuse=reuse,
+ scope=scope,
+ sync_batch_norm_method=sync_batch_norm_method)
+
+
+def resnet_v1_50_beta(inputs,
+ num_classes=None,
+ is_training=None,
+ global_pool=False,
+ output_stride=None,
+ multi_grid=None,
+ reuse=None,
+ scope='resnet_v1_50',
+ sync_batch_norm_method='None'):
+ """Resnet v1 50 beta variant.
+
+ This variant modifies the first convolution layer of ResNet-v1-50. In
+ particular, it replaces the original 7x7 convolution with three 3x3
+ convolutions.
+
+ Args:
+ inputs: A tensor of size [batch, height_in, width_in, channels].
+ num_classes: Number of predicted classes for classification tasks. If None
+ we return the features before the logit layer.
+ is_training: Enable/disable is_training for batch normalization.
+ global_pool: If True, we perform global average pooling before computing the
+ logits. Set to True for image classification, False for dense prediction.
+ output_stride: If None, then the output will be computed at the nominal
+ network stride. If output_stride is not None, it specifies the requested
+ ratio of input to output spatial resolution.
+ multi_grid: Employ a hierarchy of different atrous rates within network.
+ reuse: whether or not the network and its variables should be reused. To be
+ able to reuse 'scope' must be given.
+ scope: Optional variable_scope.
+ sync_batch_norm_method: String, sync batchnorm method.
+
+ Returns:
+ net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
+ If global_pool is False, then height_out and width_out are reduced by a
+ factor of output_stride compared to the respective height_in and width_in,
+ else both height_out and width_out equal one. If num_classes is None, then
+ net is the output of the last ResNet block, potentially after global
+ average pooling. If num_classes is not None, net contains the pre-softmax
+ activations.
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+
+ Raises:
+ ValueError: if multi_grid is not None and does not have length = 3.
+ """
+ if multi_grid is None:
+ multi_grid = _DEFAULT_MULTI_GRID
+ else:
+ if len(multi_grid) != 3:
+ raise ValueError('Expect multi_grid to have length 3.')
+
+ blocks = [
+ resnet_v1_beta_block(
+ 'block1', base_depth=64, num_units=3, stride=2),
+ resnet_v1_beta_block(
+ 'block2', base_depth=128, num_units=4, stride=2),
+ resnet_v1_beta_block(
+ 'block3', base_depth=256, num_units=6, stride=2),
+ resnet_utils.Block('block4', bottleneck, [
+ {'depth': 2048, 'depth_bottleneck': 512, 'stride': 1,
+ 'unit_rate': rate} for rate in multi_grid]),
+ ]
+ return resnet_v1_beta(
+ inputs,
+ blocks=blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ root_block_fn=functools.partial(root_block_fn_for_beta_variant),
+ reuse=reuse,
+ scope=scope,
+ sync_batch_norm_method=sync_batch_norm_method)
+
+
+def resnet_v1_101(inputs,
+ num_classes=None,
+ is_training=None,
+ global_pool=False,
+ output_stride=None,
+ multi_grid=None,
+ reuse=None,
+ scope='resnet_v1_101',
+ sync_batch_norm_method='None'):
+ """Resnet v1 101.
+
+ Args:
+ inputs: A tensor of size [batch, height_in, width_in, channels].
+ num_classes: Number of predicted classes for classification tasks. If None
+ we return the features before the logit layer.
+ is_training: Enable/disable is_training for batch normalization.
+ global_pool: If True, we perform global average pooling before computing the
+ logits. Set to True for image classification, False for dense prediction.
+ output_stride: If None, then the output will be computed at the nominal
+ network stride. If output_stride is not None, it specifies the requested
+ ratio of input to output spatial resolution.
+ multi_grid: Employ a hierarchy of different atrous rates within network.
+ reuse: whether or not the network and its variables should be reused. To be
+ able to reuse 'scope' must be given.
+ scope: Optional variable_scope.
+ sync_batch_norm_method: String, sync batchnorm method.
+
+ Returns:
+ net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
+ If global_pool is False, then height_out and width_out are reduced by a
+ factor of output_stride compared to the respective height_in and width_in,
+ else both height_out and width_out equal one. If num_classes is None, then
+ net is the output of the last ResNet block, potentially after global
+ average pooling. If num_classes is not None, net contains the pre-softmax
+ activations.
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+
+ Raises:
+ ValueError: if multi_grid is not None and does not have length = 3.
+ """
+ if multi_grid is None:
+ multi_grid = _DEFAULT_MULTI_GRID
+ else:
+ if len(multi_grid) != 3:
+ raise ValueError('Expect multi_grid to have length 3.')
+
+ blocks = [
+ resnet_v1_beta_block(
+ 'block1', base_depth=64, num_units=3, stride=2),
+ resnet_v1_beta_block(
+ 'block2', base_depth=128, num_units=4, stride=2),
+ resnet_v1_beta_block(
+ 'block3', base_depth=256, num_units=23, stride=2),
+ resnet_utils.Block('block4', bottleneck, [
+ {'depth': 2048, 'depth_bottleneck': 512, 'stride': 1,
+ 'unit_rate': rate} for rate in multi_grid]),
+ ]
+ return resnet_v1_beta(
+ inputs,
+ blocks=blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ reuse=reuse,
+ scope=scope,
+ sync_batch_norm_method=sync_batch_norm_method)
+
+
+def resnet_v1_101_beta(inputs,
+ num_classes=None,
+ is_training=None,
+ global_pool=False,
+ output_stride=None,
+ multi_grid=None,
+ reuse=None,
+ scope='resnet_v1_101',
+ sync_batch_norm_method='None'):
+ """Resnet v1 101 beta variant.
+
+ This variant modifies the first convolution layer of ResNet-v1-101. In
+ particular, it replaces the original 7x7 convolution with three 3x3
+ convolutions.
+
+ Args:
+ inputs: A tensor of size [batch, height_in, width_in, channels].
+ num_classes: Number of predicted classes for classification tasks. If None
+ we return the features before the logit layer.
+ is_training: Enable/disable is_training for batch normalization.
+ global_pool: If True, we perform global average pooling before computing the
+ logits. Set to True for image classification, False for dense prediction.
+ output_stride: If None, then the output will be computed at the nominal
+ network stride. If output_stride is not None, it specifies the requested
+ ratio of input to output spatial resolution.
+ multi_grid: Employ a hierarchy of different atrous rates within network.
+ reuse: whether or not the network and its variables should be reused. To be
+ able to reuse 'scope' must be given.
+ scope: Optional variable_scope.
+ sync_batch_norm_method: String, sync batchnorm method.
+
+ Returns:
+ net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
+ If global_pool is False, then height_out and width_out are reduced by a
+ factor of output_stride compared to the respective height_in and width_in,
+ else both height_out and width_out equal one. If num_classes is None, then
+ net is the output of the last ResNet block, potentially after global
+ average pooling. If num_classes is not None, net contains the pre-softmax
+ activations.
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+
+ Raises:
+ ValueError: if multi_grid is not None and does not have length = 3.
+ """
+ if multi_grid is None:
+ multi_grid = _DEFAULT_MULTI_GRID
+ else:
+ if len(multi_grid) != 3:
+ raise ValueError('Expect multi_grid to have length 3.')
+
+ blocks = [
+ resnet_v1_beta_block(
+ 'block1', base_depth=64, num_units=3, stride=2),
+ resnet_v1_beta_block(
+ 'block2', base_depth=128, num_units=4, stride=2),
+ resnet_v1_beta_block(
+ 'block3', base_depth=256, num_units=23, stride=2),
+ resnet_utils.Block('block4', bottleneck, [
+ {'depth': 2048, 'depth_bottleneck': 512, 'stride': 1,
+ 'unit_rate': rate} for rate in multi_grid]),
+ ]
+ return resnet_v1_beta(
+ inputs,
+ blocks=blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ root_block_fn=functools.partial(root_block_fn_for_beta_variant),
+ reuse=reuse,
+ scope=scope,
+ sync_batch_norm_method=sync_batch_norm_method)
+
+
+def resnet_arg_scope(weight_decay=0.0001,
+ batch_norm_decay=0.997,
+ batch_norm_epsilon=1e-5,
+ batch_norm_scale=True,
+ activation_fn=tf.nn.relu,
+ use_batch_norm=True,
+ sync_batch_norm_method='None',
+ normalization_method='unspecified',
+ use_weight_standardization=False):
+ """Defines the default ResNet arg scope.
+
+ Args:
+ weight_decay: The weight decay to use for regularizing the model.
+ batch_norm_decay: The moving average decay when estimating layer activation
+ statistics in batch normalization.
+ batch_norm_epsilon: Small constant to prevent division by zero when
+ normalizing activations by their variance in batch normalization.
+ batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the
+ activations in the batch normalization layer.
+ activation_fn: The activation function which is used in ResNet.
+ use_batch_norm: Deprecated in favor of normalization_method.
+ sync_batch_norm_method: String, sync batchnorm method.
+ normalization_method: String, one of `batch`, `none`, or `group`, to use
+ batch normalization, no normalization, or group normalization.
+ use_weight_standardization: Boolean, whether to use weight standardization.
+
+ Returns:
+ An `arg_scope` to use for the resnet models.
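+
+ Typical usage (illustrative sketch; `images` is a [batch, height, width, 3]
+ tensor):
+
+ with slim.arg_scope(resnet_arg_scope(weight_decay=1e-4)):
+ net, end_points = resnet_v1_101(images, output_stride=16)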
+ """
+ batch_norm_params = {
+ 'decay': batch_norm_decay,
+ 'epsilon': batch_norm_epsilon,
+ 'scale': batch_norm_scale,
+ }
+ batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
+ if normalization_method == 'batch':
+ normalizer_fn = batch_norm
+ elif normalization_method == 'none':
+ normalizer_fn = None
+ elif normalization_method == 'group':
+ normalizer_fn = slim.group_norm
+ elif normalization_method == 'unspecified':
+ normalizer_fn = batch_norm if use_batch_norm else None
+ else:
+ raise ValueError('Unrecognized normalization_method %s' %
+ normalization_method)
+
+ with slim.arg_scope([conv2d_ws.conv2d],
+ weights_regularizer=slim.l2_regularizer(weight_decay),
+ weights_initializer=slim.variance_scaling_initializer(),
+ activation_fn=activation_fn,
+ normalizer_fn=normalizer_fn,
+ use_weight_standardization=use_weight_standardization):
+ with slim.arg_scope([batch_norm], **batch_norm_params):
+ # The following implies padding='SAME' for pool1, which makes feature
+ # alignment easier for dense prediction tasks. This is also used in
+ # https://github.com/facebook/fb.resnet.torch. However, the accompanying
+ # code of 'Deep Residual Learning for Image Recognition' uses
+ # padding='VALID' for pool1. You can switch to that choice by setting
+ # slim.arg_scope([slim.max_pool2d], padding='VALID').
+ with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
+ return arg_sc
diff --git a/deeplab/models/research/deeplab/core/resnet_v1_beta_test.py b/deeplab/models/research/deeplab/core/resnet_v1_beta_test.py
new file mode 100644
index 0000000..8b61edc
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/resnet_v1_beta_test.py
@@ -0,0 +1,564 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Tests for resnet_v1_beta module."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import functools
+
+import numpy as np
+import six
+import tensorflow as tf
+from tensorflow.contrib import slim as contrib_slim
+
+from deeplab.core import resnet_v1_beta
+from tensorflow.contrib.slim.nets import resnet_utils
+
+slim = contrib_slim
+
+
+def create_test_input(batch, height, width, channels):
+ """Create test input tensor."""
+ if None in [batch, height, width, channels]:
+ return tf.placeholder(tf.float32, (batch, height, width, channels))
+ else:
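+ # Deterministic ramp input: the value at spatial location (h, w) is h + w,
+ # tiled over the batch and channel dimensions.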
+ return tf.to_float(
+ np.tile(
+ np.reshape(
+ np.reshape(np.arange(height), [height, 1]) +
+ np.reshape(np.arange(width), [1, width]),
+ [1, height, width, 1]),
+ [batch, 1, 1, channels]))
+
+
+class ResnetCompleteNetworkTest(tf.test.TestCase):
+ """Tests with complete small ResNet v1 networks."""
+
+ def _resnet_small_lite_bottleneck(self,
+ inputs,
+ num_classes=None,
+ is_training=True,
+ global_pool=True,
+ output_stride=None,
+ multi_grid=None,
+ reuse=None,
+ scope='resnet_v1_small'):
+ """A shallow and thin ResNet v1 with lite_bottleneck."""
+ if multi_grid is None:
+ multi_grid = [1, 1]
+ else:
+ if len(multi_grid) != 2:
+ raise ValueError('Expect multi_grid to have length 2.')
+ block = resnet_v1_beta.resnet_v1_small_beta_block
+ blocks = [
+ block('block1', base_depth=1, num_units=1, stride=2),
+ block('block2', base_depth=2, num_units=1, stride=2),
+ block('block3', base_depth=4, num_units=1, stride=2),
+ resnet_utils.Block('block4', resnet_v1_beta.lite_bottleneck, [
+ {'depth': 8,
+ 'stride': 1,
+ 'unit_rate': rate} for rate in multi_grid])]
+ return resnet_v1_beta.resnet_v1_beta(
+ inputs,
+ blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ root_block_fn=functools.partial(
+ resnet_v1_beta.root_block_fn_for_beta_variant,
+ depth_multiplier=0.25),
+ reuse=reuse,
+ scope=scope)
+
+ def _resnet_small(self,
+ inputs,
+ num_classes=None,
+ is_training=True,
+ global_pool=True,
+ output_stride=None,
+ multi_grid=None,
+ reuse=None,
+ scope='resnet_v1_small'):
+ """A shallow and thin ResNet v1 for faster tests."""
+ if multi_grid is None:
+ multi_grid = [1, 1, 1]
+ else:
+ if len(multi_grid) != 3:
+ raise ValueError('Expect multi_grid to have length 3.')
+
+ block = resnet_v1_beta.resnet_v1_beta_block
+ blocks = [
+ block('block1', base_depth=1, num_units=1, stride=2),
+ block('block2', base_depth=2, num_units=1, stride=2),
+ block('block3', base_depth=4, num_units=1, stride=2),
+ resnet_utils.Block('block4', resnet_v1_beta.bottleneck, [
+ {'depth': 32, 'depth_bottleneck': 8, 'stride': 1,
+ 'unit_rate': rate} for rate in multi_grid])]
+
+ return resnet_v1_beta.resnet_v1_beta(
+ inputs,
+ blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ root_block_fn=functools.partial(
+ resnet_v1_beta.root_block_fn_for_beta_variant),
+ reuse=reuse,
+ scope=scope)
+
+ def testClassificationEndPointsWithLiteBottleneck(self):
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(2, 224, 224, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ logits, end_points = self._resnet_small_lite_bottleneck(
+ inputs,
+ num_classes,
+ global_pool=global_pool,
+ scope='resnet')
+
+ self.assertTrue(logits.op.name.startswith('resnet/logits'))
+ self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
+ self.assertIn('predictions', end_points)
+ self.assertListEqual(end_points['predictions'].get_shape().as_list(),
+ [2, 1, 1, num_classes])
+
+ def testClassificationEndPointsWithMultigridAndLiteBottleneck(self):
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(2, 224, 224, 3)
+ multi_grid = [1, 2]
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ logits, end_points = self._resnet_small_lite_bottleneck(
+ inputs,
+ num_classes,
+ global_pool=global_pool,
+ multi_grid=multi_grid,
+ scope='resnet')
+
+ self.assertTrue(logits.op.name.startswith('resnet/logits'))
+ self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
+ self.assertIn('predictions', end_points)
+ self.assertListEqual(end_points['predictions'].get_shape().as_list(),
+ [2, 1, 1, num_classes])
+
+ def testClassificationShapesWithLiteBottleneck(self):
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(2, 224, 224, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ _, end_points = self._resnet_small_lite_bottleneck(
+ inputs,
+ num_classes,
+ global_pool=global_pool,
+ scope='resnet')
+ endpoint_to_shape = {
+ 'resnet/conv1_1': [2, 112, 112, 16],
+ 'resnet/conv1_2': [2, 112, 112, 16],
+ 'resnet/conv1_3': [2, 112, 112, 32],
+ 'resnet/block1': [2, 28, 28, 1],
+ 'resnet/block2': [2, 14, 14, 2],
+ 'resnet/block3': [2, 7, 7, 4],
+ 'resnet/block4': [2, 7, 7, 8]}
+ for endpoint, shape in six.iteritems(endpoint_to_shape):
+ self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
+
+ def testFullyConvolutionalEndpointShapesWithLiteBottleneck(self):
+ global_pool = False
+ num_classes = 10
+ inputs = create_test_input(2, 321, 321, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ _, end_points = self._resnet_small_lite_bottleneck(
+ inputs,
+ num_classes,
+ global_pool=global_pool,
+ scope='resnet')
+ endpoint_to_shape = {
+ 'resnet/conv1_1': [2, 161, 161, 16],
+ 'resnet/conv1_2': [2, 161, 161, 16],
+ 'resnet/conv1_3': [2, 161, 161, 32],
+ 'resnet/block1': [2, 41, 41, 1],
+ 'resnet/block2': [2, 21, 21, 2],
+ 'resnet/block3': [2, 11, 11, 4],
+ 'resnet/block4': [2, 11, 11, 8]}
+ for endpoint, shape in six.iteritems(endpoint_to_shape):
+ self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
+
+ def testAtrousFullyConvolutionalEndpointShapesWithLiteBottleneck(self):
+ global_pool = False
+ num_classes = 10
+ output_stride = 8
+ inputs = create_test_input(2, 321, 321, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ _, end_points = self._resnet_small_lite_bottleneck(
+ inputs,
+ num_classes,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ scope='resnet')
+ endpoint_to_shape = {
+ 'resnet/conv1_1': [2, 161, 161, 16],
+ 'resnet/conv1_2': [2, 161, 161, 16],
+ 'resnet/conv1_3': [2, 161, 161, 32],
+ 'resnet/block1': [2, 41, 41, 1],
+ 'resnet/block2': [2, 41, 41, 2],
+ 'resnet/block3': [2, 41, 41, 4],
+ 'resnet/block4': [2, 41, 41, 8]}
+ for endpoint, shape in six.iteritems(endpoint_to_shape):
+ self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
+
+ def testAtrousFullyConvolutionalValuesWithLiteBottleneck(self):
+ """Verify dense feature extraction with atrous convolution."""
+ nominal_stride = 32
+ for output_stride in [4, 8, 16, 32, None]:
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ with tf.Graph().as_default():
+ with self.test_session() as sess:
+ tf.set_random_seed(0)
+ inputs = create_test_input(2, 81, 81, 3)
+ # Dense feature extraction followed by subsampling.
+ output, _ = self._resnet_small_lite_bottleneck(
+ inputs,
+ None,
+ is_training=False,
+ global_pool=False,
+ output_stride=output_stride)
+ if output_stride is None:
+ factor = 1
+ else:
+ factor = nominal_stride // output_stride
+ output = resnet_utils.subsample(output, factor)
+ # Make the two networks use the same weights.
+ tf.get_variable_scope().reuse_variables()
+ # Feature extraction at the nominal network rate.
+ expected, _ = self._resnet_small_lite_bottleneck(
+ inputs,
+ None,
+ is_training=False,
+ global_pool=False)
+ sess.run(tf.global_variables_initializer())
+ self.assertAllClose(output.eval(), expected.eval(),
+ atol=1e-4, rtol=1e-4)
+
+ def testUnknownBatchSizeWithLiteBottleneck(self):
+ batch = 2
+ height, width = 65, 65
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(None, height, width, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ logits, _ = self._resnet_small_lite_bottleneck(
+ inputs,
+ num_classes,
+ global_pool=global_pool,
+ scope='resnet')
+ self.assertTrue(logits.op.name.startswith('resnet/logits'))
+ self.assertListEqual(logits.get_shape().as_list(),
+ [None, 1, 1, num_classes])
+ images = create_test_input(batch, height, width, 3)
+ with self.test_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ output = sess.run(logits, {inputs: images.eval()})
+ self.assertEqual(output.shape, (batch, 1, 1, num_classes))
+
+ def testFullyConvolutionalUnknownHeightWidthWithLiteBottleneck(self):
+ batch = 2
+ height, width = 65, 65
+ global_pool = False
+ inputs = create_test_input(batch, None, None, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ output, _ = self._resnet_small_lite_bottleneck(
+ inputs,
+ None,
+ global_pool=global_pool)
+ self.assertListEqual(output.get_shape().as_list(),
+ [batch, None, None, 8])
+ images = create_test_input(batch, height, width, 3)
+ with self.test_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ output = sess.run(output, {inputs: images.eval()})
+ self.assertEqual(output.shape, (batch, 3, 3, 8))
+
+ def testAtrousFullyConvolutionalUnknownHeightWidthWithLiteBottleneck(self):
+ batch = 2
+ height, width = 65, 65
+ global_pool = False
+ output_stride = 8
+ inputs = create_test_input(batch, None, None, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ output, _ = self._resnet_small_lite_bottleneck(
+ inputs,
+ None,
+ global_pool=global_pool,
+ output_stride=output_stride)
+ self.assertListEqual(output.get_shape().as_list(),
+ [batch, None, None, 8])
+ images = create_test_input(batch, height, width, 3)
+ with self.test_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ output = sess.run(output, {inputs: images.eval()})
+ self.assertEqual(output.shape, (batch, 9, 9, 8))
+
+ def testClassificationEndPoints(self):
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(2, 224, 224, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ logits, end_points = self._resnet_small(inputs,
+ num_classes,
+ global_pool=global_pool,
+ scope='resnet')
+
+ self.assertTrue(logits.op.name.startswith('resnet/logits'))
+ self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
+ self.assertIn('predictions', end_points)
+ self.assertListEqual(end_points['predictions'].get_shape().as_list(),
+ [2, 1, 1, num_classes])
+
+ def testClassificationEndPointsWithWS(self):
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(2, 224, 224, 3)
+ with slim.arg_scope(
+ resnet_v1_beta.resnet_arg_scope(use_weight_standardization=True)):
+ logits, end_points = self._resnet_small(
+ inputs, num_classes, global_pool=global_pool, scope='resnet')
+
+ self.assertTrue(logits.op.name.startswith('resnet/logits'))
+ self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
+ self.assertIn('predictions', end_points)
+ self.assertListEqual(end_points['predictions'].get_shape().as_list(),
+ [2, 1, 1, num_classes])
+
+ def testClassificationEndPointsWithGN(self):
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(2, 224, 224, 3)
+ with slim.arg_scope(
+ resnet_v1_beta.resnet_arg_scope(normalization_method='group')):
+ with slim.arg_scope([slim.group_norm], groups=1):
+ logits, end_points = self._resnet_small(
+ inputs, num_classes, global_pool=global_pool, scope='resnet')
+
+ self.assertTrue(logits.op.name.startswith('resnet/logits'))
+ self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
+ self.assertIn('predictions', end_points)
+ self.assertListEqual(end_points['predictions'].get_shape().as_list(),
+ [2, 1, 1, num_classes])
+
+ def testInvalidGroupsWithGN(self):
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(2, 224, 224, 3)
+ with self.assertRaisesRegexp(ValueError, 'Invalid groups'):
+ with slim.arg_scope(
+ resnet_v1_beta.resnet_arg_scope(normalization_method='group')):
+ with slim.arg_scope([slim.group_norm], groups=32):
+ _, _ = self._resnet_small(
+ inputs, num_classes, global_pool=global_pool, scope='resnet')
+
+ def testClassificationEndPointsWithGNWS(self):
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(2, 224, 224, 3)
+ with slim.arg_scope(
+ resnet_v1_beta.resnet_arg_scope(
+ normalization_method='group', use_weight_standardization=True)):
+ with slim.arg_scope([slim.group_norm], groups=1):
+ logits, end_points = self._resnet_small(
+ inputs, num_classes, global_pool=global_pool, scope='resnet')
+
+ self.assertTrue(logits.op.name.startswith('resnet/logits'))
+ self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
+ self.assertIn('predictions', end_points)
+ self.assertListEqual(end_points['predictions'].get_shape().as_list(),
+ [2, 1, 1, num_classes])
+
+ def testClassificationEndPointsWithMultigrid(self):
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(2, 224, 224, 3)
+ multi_grid = [1, 2, 4]
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ logits, end_points = self._resnet_small(inputs,
+ num_classes,
+ global_pool=global_pool,
+ multi_grid=multi_grid,
+ scope='resnet')
+
+ self.assertTrue(logits.op.name.startswith('resnet/logits'))
+ self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
+ self.assertIn('predictions', end_points)
+ self.assertListEqual(end_points['predictions'].get_shape().as_list(),
+ [2, 1, 1, num_classes])
+
+ def testClassificationShapes(self):
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(2, 224, 224, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ _, end_points = self._resnet_small(inputs,
+ num_classes,
+ global_pool=global_pool,
+ scope='resnet')
+ endpoint_to_shape = {
+ 'resnet/conv1_1': [2, 112, 112, 64],
+ 'resnet/conv1_2': [2, 112, 112, 64],
+ 'resnet/conv1_3': [2, 112, 112, 128],
+ 'resnet/block1': [2, 28, 28, 4],
+ 'resnet/block2': [2, 14, 14, 8],
+ 'resnet/block3': [2, 7, 7, 16],
+ 'resnet/block4': [2, 7, 7, 32]}
+ for endpoint, shape in six.iteritems(endpoint_to_shape):
+ self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
+
+ def testFullyConvolutionalEndpointShapes(self):
+ global_pool = False
+ num_classes = 10
+ inputs = create_test_input(2, 321, 321, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ _, end_points = self._resnet_small(inputs,
+ num_classes,
+ global_pool=global_pool,
+ scope='resnet')
+ endpoint_to_shape = {
+ 'resnet/conv1_1': [2, 161, 161, 64],
+ 'resnet/conv1_2': [2, 161, 161, 64],
+ 'resnet/conv1_3': [2, 161, 161, 128],
+ 'resnet/block1': [2, 41, 41, 4],
+ 'resnet/block2': [2, 21, 21, 8],
+ 'resnet/block3': [2, 11, 11, 16],
+ 'resnet/block4': [2, 11, 11, 32]}
+ for endpoint, shape in six.iteritems(endpoint_to_shape):
+ self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
+
+ def testAtrousFullyConvolutionalEndpointShapes(self):
+ global_pool = False
+ num_classes = 10
+ output_stride = 8
+ inputs = create_test_input(2, 321, 321, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ _, end_points = self._resnet_small(inputs,
+ num_classes,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ scope='resnet')
+ endpoint_to_shape = {
+ 'resnet/conv1_1': [2, 161, 161, 64],
+ 'resnet/conv1_2': [2, 161, 161, 64],
+ 'resnet/conv1_3': [2, 161, 161, 128],
+ 'resnet/block1': [2, 41, 41, 4],
+ 'resnet/block2': [2, 41, 41, 8],
+ 'resnet/block3': [2, 41, 41, 16],
+ 'resnet/block4': [2, 41, 41, 32]}
+ for endpoint, shape in six.iteritems(endpoint_to_shape):
+ self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
+
+ def testAtrousFullyConvolutionalValues(self):
+ """Verify dense feature extraction with atrous convolution."""
+ nominal_stride = 32
+ for output_stride in [4, 8, 16, 32, None]:
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ with tf.Graph().as_default():
+ with self.test_session() as sess:
+ tf.set_random_seed(0)
+ inputs = create_test_input(2, 81, 81, 3)
+ # Dense feature extraction followed by subsampling.
+ output, _ = self._resnet_small(inputs,
+ None,
+ is_training=False,
+ global_pool=False,
+ output_stride=output_stride)
+ if output_stride is None:
+ factor = 1
+ else:
+ factor = nominal_stride // output_stride
+ output = resnet_utils.subsample(output, factor)
+ # Make the two networks use the same weights.
+ tf.get_variable_scope().reuse_variables()
+ # Feature extraction at the nominal network rate.
+ expected, _ = self._resnet_small(inputs,
+ None,
+ is_training=False,
+ global_pool=False)
+ sess.run(tf.global_variables_initializer())
+ self.assertAllClose(output.eval(), expected.eval(),
+ atol=1e-4, rtol=1e-4)
+
+ def testUnknownBatchSize(self):
+ batch = 2
+ height, width = 65, 65
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(None, height, width, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ logits, _ = self._resnet_small(inputs,
+ num_classes,
+ global_pool=global_pool,
+ scope='resnet')
+ self.assertTrue(logits.op.name.startswith('resnet/logits'))
+ self.assertListEqual(logits.get_shape().as_list(),
+ [None, 1, 1, num_classes])
+ images = create_test_input(batch, height, width, 3)
+ with self.test_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ output = sess.run(logits, {inputs: images.eval()})
+ self.assertEqual(output.shape, (batch, 1, 1, num_classes))
+
+ def testFullyConvolutionalUnknownHeightWidth(self):
+ batch = 2
+ height, width = 65, 65
+ global_pool = False
+ inputs = create_test_input(batch, None, None, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ output, _ = self._resnet_small(inputs,
+ None,
+ global_pool=global_pool)
+ self.assertListEqual(output.get_shape().as_list(),
+ [batch, None, None, 32])
+ images = create_test_input(batch, height, width, 3)
+ with self.test_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ output = sess.run(output, {inputs: images.eval()})
+ self.assertEqual(output.shape, (batch, 3, 3, 32))
+
+ def testAtrousFullyConvolutionalUnknownHeightWidth(self):
+ batch = 2
+ height, width = 65, 65
+ global_pool = False
+ output_stride = 8
+ inputs = create_test_input(batch, None, None, 3)
+ with slim.arg_scope(resnet_utils.resnet_arg_scope()):
+ output, _ = self._resnet_small(inputs,
+ None,
+ global_pool=global_pool,
+ output_stride=output_stride)
+ self.assertListEqual(output.get_shape().as_list(),
+ [batch, None, None, 32])
+ images = create_test_input(batch, height, width, 3)
+ with self.test_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ output = sess.run(output, {inputs: images.eval()})
+ self.assertEqual(output.shape, (batch, 9, 9, 32))
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/core/utils.py b/deeplab/models/research/deeplab/core/utils.py
new file mode 100644
index 0000000..4bf3d09
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/utils.py
@@ -0,0 +1,214 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""This script contains utility functions."""
+import tensorflow as tf
+from tensorflow.contrib import framework as contrib_framework
+from tensorflow.contrib import slim as contrib_slim
+
+slim = contrib_slim
+
+
+# Quantized version of sigmoid function.
+q_sigmoid = lambda x: tf.nn.relu6(x + 3) * 0.16667
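+# 0.16667 approximates 1/6, so q_sigmoid(x) = relu6(x + 3) / 6, a
+# piecewise-linear ("hard") approximation of the sigmoid.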
+
+
+def resize_bilinear(images, size, output_dtype=tf.float32):
+ """Returns resized images as output_type.
+
+ Args:
+ images: A tensor of size [batch, height_in, width_in, channels].
+ size: A 1-D int32 Tensor of 2 elements: new_height, new_width. The new size
+ for the images.
+ output_dtype: The destination type.
+ Returns:
+ A tensor of size [batch, height_out, width_out, channels] with dtype
+ output_dtype.
+ """
+ images = tf.image.resize_bilinear(images, size, align_corners=True)
+ return tf.cast(images, dtype=output_dtype)
+
+
+def scale_dimension(dim, scale):
+ """Scales the input dimension.
+
+ Args:
+ dim: Input dimension (a scalar or a scalar Tensor).
+ scale: The amount of scaling applied to the input.
+
+ Returns:
+ Scaled dimension.
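+
+ For example, scale_dimension(321, 0.5) returns 161 and
+ scale_dimension(321, 0.75) returns 241.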
+ """
+ if isinstance(dim, tf.Tensor):
+ return tf.cast((tf.to_float(dim) - 1.0) * scale + 1.0, dtype=tf.int32)
+ else:
+ return int((float(dim) - 1.0) * scale + 1.0)
+
+
+def split_separable_conv2d(inputs,
+ filters,
+ kernel_size=3,
+ rate=1,
+ weight_decay=0.00004,
+ depthwise_weights_initializer_stddev=0.33,
+ pointwise_weights_initializer_stddev=0.06,
+ scope=None):
+ """Splits a separable conv2d into depthwise and pointwise conv2d.
+
+ This operation differs from `tf.layers.separable_conv2d` in that it applies
+ an activation function between the depthwise and pointwise conv2d.
+
+ Args:
+ inputs: Input tensor with shape [batch, height, width, channels].
+ filters: Number of filters in the 1x1 pointwise convolution.
+ kernel_size: A list of length 2: [kernel_height, kernel_width] of the
+ filters. Can be an int if both values are the same.
+ rate: Atrous convolution rate for the depthwise convolution.
+ weight_decay: The weight decay to use for regularizing the model.
+ depthwise_weights_initializer_stddev: The standard deviation of the
+ truncated normal weight initializer for depthwise convolution.
+ pointwise_weights_initializer_stddev: The standard deviation of the
+ truncated normal weight initializer for pointwise convolution.
+ scope: Optional scope for the operation.
+
+ Returns:
+ Computed features after split separable conv2d.
+ """
+ outputs = slim.separable_conv2d(
+ inputs,
+ None,
+ kernel_size=kernel_size,
+ depth_multiplier=1,
+ rate=rate,
+ weights_initializer=tf.truncated_normal_initializer(
+ stddev=depthwise_weights_initializer_stddev),
+ weights_regularizer=None,
+ scope=scope + '_depthwise')
+ return slim.conv2d(
+ outputs,
+ filters,
+ 1,
+ weights_initializer=tf.truncated_normal_initializer(
+ stddev=pointwise_weights_initializer_stddev),
+ weights_regularizer=slim.l2_regularizer(weight_decay),
+ scope=scope + '_pointwise')
+
+
+def get_label_weight_mask(labels, ignore_label, num_classes, label_weights=1.0):
+ """Gets the label weight mask.
+
+ Args:
+ labels: A Tensor of labels with the shape of [-1].
+ ignore_label: Integer, label to ignore.
+ num_classes: Integer, the number of semantic classes.
+ label_weights: A float or a list of weights. If it is a float, it means all
+ the labels have the same weight. If it is a list of weights, then each
+ element in the list represents the weight for the label of its index, for
+ example, label_weights = [0.1, 0.5] means the weight for label 0 is 0.1
+ and the weight for label 1 is 0.5.
+
+ Returns:
+ A Tensor of label weights with the same shape as labels; each element is
+ the weight for the label with the same index in labels, and the element is
+ 0.0 if the label is to be ignored.
+
+ Raises:
+ ValueError: If label_weights is neither a float nor a list, or if
+ label_weights is a list and its length is not equal to num_classes.
+ """
+ if not isinstance(label_weights, (float, list)):
+ raise ValueError(
+ 'The type of label_weights is invalid, it must be a float or a list.')
+
+ if isinstance(label_weights, list) and len(label_weights) != num_classes:
+ raise ValueError(
+ 'Length of label_weights must be equal to num_classes if it is a list, '
+ 'label_weights: %s, num_classes: %d.' % (label_weights, num_classes))
+
+ not_ignore_mask = tf.not_equal(labels, ignore_label)
+ not_ignore_mask = tf.cast(not_ignore_mask, tf.float32)
+ if isinstance(label_weights, float):
+ return not_ignore_mask * label_weights
+
+ label_weights = tf.constant(label_weights, tf.float32)
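+ # For each label, pick out its per-class weight by contracting the one-hot
+ # encoding with the per-class weight vector.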
+ weight_mask = tf.einsum('...y,y->...',
+ tf.one_hot(labels, num_classes, dtype=tf.float32),
+ label_weights)
+ return tf.multiply(not_ignore_mask, weight_mask)
+
+
+def get_batch_norm_fn(sync_batch_norm_method):
+ """Gets batch norm function.
+
+ Currently we only support the following methods:
+ - `None` (no sync batch norm). We use slim.batch_norm in this case.
+
+ Args:
+ sync_batch_norm_method: String, method used to sync batch norm.
+
+ Returns:
+ Batchnorm function.
+
+ Raises:
+ ValueError: If sync_batch_norm_method is not supported.
+ """
+ if sync_batch_norm_method == 'None':
+ return slim.batch_norm
+ else:
+ raise ValueError('Unsupported sync_batch_norm_method.')
+
+
+def get_batch_norm_params(decay=0.9997,
+ epsilon=1e-5,
+ center=True,
+ scale=True,
+ is_training=True,
+ sync_batch_norm_method='None',
+ initialize_gamma_as_zeros=False):
+ """Gets batch norm parameters.
+
+ Args:
+ decay: Float, decay for the moving average.
+ epsilon: Float, value added to variance to avoid dividing by zero.
+ center: Boolean. If True, add an offset of `beta` to the normalized tensor.
+ If False, `beta` is ignored.
+ scale: Boolean. If True, multiply by `gamma`. If False, `gamma` is not used.
+ is_training: Boolean, whether or not the layer is in training mode.
+ sync_batch_norm_method: String, method used to sync batch norm.
+ initialize_gamma_as_zeros: Boolean, whether to initialize `gamma` as zeros.
+
+ Returns:
+ A dictionary for batchnorm parameters.
+
+ Raises:
+ ValueError: If sync_batch_norm_method is not supported.
+ """
+ batch_norm_params = {
+ 'is_training': is_training,
+ 'decay': decay,
+ 'epsilon': epsilon,
+ 'scale': scale,
+ 'center': center,
+ }
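+ # Initializing gamma to zero makes each batch-normalized residual branch
+ # start near zero, so residual units initially act like identity mappings;
+ # this is a common trick to ease optimization.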
+ if initialize_gamma_as_zeros:
+ if sync_batch_norm_method == 'None':
+ # Slim-style gamma initializer.
+ batch_norm_params['param_initializers'] = {
+ 'gamma': tf.zeros_initializer(),
+ }
+ else:
+ raise ValueError('Unsupported sync_batch_norm_method.')
+ return batch_norm_params
diff --git a/deeplab/models/research/deeplab/core/utils_test.py b/deeplab/models/research/deeplab/core/utils_test.py
new file mode 100644
index 0000000..cfdb63e
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/utils_test.py
@@ -0,0 +1,90 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for utils.py."""
+
+import numpy as np
+import tensorflow as tf
+
+from deeplab.core import utils
+
+
+class UtilsTest(tf.test.TestCase):
+
+ def testScaleDimensionOutput(self):
+ self.assertEqual(161, utils.scale_dimension(321, 0.5))
+ self.assertEqual(193, utils.scale_dimension(321, 0.6))
+ self.assertEqual(241, utils.scale_dimension(321, 0.75))
+
+ def testGetLabelWeightMask_withFloatLabelWeights(self):
+ labels = tf.constant([0, 4, 1, 3, 2])
+ ignore_label = 4
+ num_classes = 5
+ label_weights = 0.5
+ expected_label_weight_mask = np.array([0.5, 0.0, 0.5, 0.5, 0.5],
+ dtype=np.float32)
+
+ with self.test_session() as sess:
+ label_weight_mask = utils.get_label_weight_mask(
+ labels, ignore_label, num_classes, label_weights=label_weights)
+ label_weight_mask = sess.run(label_weight_mask)
+ self.assertAllEqual(label_weight_mask, expected_label_weight_mask)
+
+ def testGetLabelWeightMask_withListLabelWeights(self):
+ labels = tf.constant([0, 4, 1, 3, 2])
+ ignore_label = 4
+ num_classes = 5
+ label_weights = [0.0, 0.1, 0.2, 0.3, 0.4]
+ expected_label_weight_mask = np.array([0.0, 0.0, 0.1, 0.3, 0.2],
+ dtype=np.float32)
+
+ with self.test_session() as sess:
+ label_weight_mask = utils.get_label_weight_mask(
+ labels, ignore_label, num_classes, label_weights=label_weights)
+ label_weight_mask = sess.run(label_weight_mask)
+ self.assertAllEqual(label_weight_mask, expected_label_weight_mask)
+
+ def testGetLabelWeightMask_withInvalidLabelWeightsType(self):
+ labels = tf.constant([0, 4, 1, 3, 2])
+ ignore_label = 4
+ num_classes = 5
+
+ self.assertRaisesWithRegexpMatch(
+ ValueError,
+ '^The type of label_weights is invalid, it must be a float or a list',
+ utils.get_label_weight_mask,
+ labels=labels,
+ ignore_label=ignore_label,
+ num_classes=num_classes,
+ label_weights=None)
+
+ def testGetLabelWeightMask_withInvalidLabelWeightsLength(self):
+ labels = tf.constant([0, 4, 1, 3, 2])
+ ignore_label = 4
+ num_classes = 5
+ label_weights = [0.0, 0.1, 0.2]
+
+ self.assertRaisesWithRegexpMatch(
+ ValueError,
+ '^Length of label_weights must be equal to num_classes if it is a list',
+ utils.get_label_weight_mask,
+ labels=labels,
+ ignore_label=ignore_label,
+ num_classes=num_classes,
+ label_weights=label_weights)
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/core/xception.py b/deeplab/models/research/deeplab/core/xception.py
new file mode 100644
index 0000000..f992571
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/xception.py
@@ -0,0 +1,945 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+r"""Xception model.
+
+"Xception: Deep Learning with Depthwise Separable Convolutions"
+François Chollet
+https://arxiv.org/abs/1610.02357
+
+We implement the modified version by Jifeng Dai et al. for their COCO 2017
+detection challenge submission, where the model is made deeper and has aligned
+features for dense prediction tasks. See their slides for details:
+
+"Deformable Convolutional Networks -- COCO Detection and Segmentation Challenge
+2017 Entry"
+Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei and Jifeng Dai
+ICCV 2017 COCO Challenge workshop
+http://presentations.cocodataset.org/COCO17-Detect-MSRA.pdf
+
+We made a few more changes on top of MSRA's modifications:
+1. Fully convolutional: All the max-pooling layers are replaced with separable
+ conv2d with stride = 2. This allows us to use atrous convolution to extract
+ feature maps at any resolution.
+
+2. We support adding ReLU and BatchNorm after depthwise convolution, motivated
+ by the design of MobileNetv1.
+
+"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision
+Applications"
+Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang,
+Tobias Weyand, Marco Andreetto, Hartwig Adam
+https://arxiv.org/abs/1704.04861
+"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import collections
+from six.moves import range
+import tensorflow as tf
+from tensorflow.contrib import slim as contrib_slim
+
+from deeplab.core import utils
+from tensorflow.contrib.slim.nets import resnet_utils
+from nets.mobilenet import conv_blocks as mobilenet_v3_ops
+
+slim = contrib_slim
+
+
+_DEFAULT_MULTI_GRID = [1, 1, 1]
+# The cap for tf.clip_by_value.
+_CLIP_CAP = 6
+
+
+class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
+ """A named tuple describing an Xception block.
+
+ Its parts are:
+ scope: The scope of the block.
+ unit_fn: The Xception unit function which takes as input a tensor and
+ returns another tensor with the output of the Xception unit.
+ args: A list of length equal to the number of units in the block. The list
+ contains one dictionary for each unit in the block to serve as argument to
+ unit_fn.
+ """
+
+
+def fixed_padding(inputs, kernel_size, rate=1):
+ """Pads the input along the spatial dimensions independently of input size.
+
+ Args:
+ inputs: A tensor of size [batch, height_in, width_in, channels].
+ kernel_size: The kernel to be used in the conv2d or max_pool2d operation.
+ Should be a positive integer.
+ rate: An integer, rate for atrous convolution.
+
+ Returns:
+ output: A tensor of size [batch, height_out, width_out, channels] with the
+ input, either intact (if kernel_size == 1) or padded (if kernel_size > 1).
+ """
+ kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
+ pad_total = kernel_size_effective - 1
+ pad_beg = pad_total // 2
+ pad_end = pad_total - pad_beg
+ padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
+ [pad_beg, pad_end], [0, 0]])
+ return padded_inputs
+
+
+@slim.add_arg_scope
+def separable_conv2d_same(inputs,
+ num_outputs,
+ kernel_size,
+ depth_multiplier,
+ stride,
+ rate=1,
+ use_explicit_padding=True,
+ regularize_depthwise=False,
+ scope=None,
+ **kwargs):
+ """Strided 2-D separable convolution with 'SAME' padding.
+
+ If stride > 1 and use_explicit_padding is True, then we do explicit zero-
+ padding, followed by conv2d with 'VALID' padding.
+
+ Note that
+
+ net = separable_conv2d_same(inputs, num_outputs, 3,
+ depth_multiplier=1, stride=stride)
+
+ is equivalent to
+
+ net = slim.separable_conv2d(inputs, num_outputs, 3,
+ depth_multiplier=1, stride=1, padding='SAME')
+ net = resnet_utils.subsample(net, factor=stride)
+
+ whereas
+
+ net = slim.separable_conv2d(inputs, num_outputs, 3, stride=stride,
+ depth_multiplier=1, padding='SAME')
+
+ is different when the input's height or width is even, which is why we add the
+ current function.
+
+ Consequently, if the input feature map has even height or width, setting
+ `use_explicit_padding=False` will result in feature misalignment by one pixel
+ along the corresponding dimension.
+
+ Args:
+ inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
+ num_outputs: An integer, the number of output filters.
+ kernel_size: An int with the kernel_size of the filters.
+ depth_multiplier: The number of depthwise convolution output channels for
+ each input channel. The total number of depthwise convolution output
+ channels will be equal to `num_filters_in * depth_multiplier`.
+ stride: An integer, the output stride.
+ rate: An integer, rate for atrous convolution.
+ use_explicit_padding: If True, use explicit padding to make the model fully
+ compatible with the open source version, otherwise use the native
+ Tensorflow 'SAME' padding.
+    regularize_depthwise: Whether or not to apply L2-norm regularization on
+      the depthwise convolution weights.
+ scope: Scope.
+ **kwargs: additional keyword arguments to pass to slim.conv2d
+
+ Returns:
+ output: A 4-D tensor of size [batch, height_out, width_out, channels] with
+ the convolution output.
+ """
+ def _separable_conv2d(padding):
+ """Wrapper for separable conv2d."""
+ return slim.separable_conv2d(inputs,
+ num_outputs,
+ kernel_size,
+ depth_multiplier=depth_multiplier,
+ stride=stride,
+ rate=rate,
+ padding=padding,
+ scope=scope,
+ **kwargs)
+ def _split_separable_conv2d(padding):
+ """Splits separable conv2d into depthwise and pointwise conv2d."""
+ outputs = slim.separable_conv2d(inputs,
+ None,
+ kernel_size,
+ depth_multiplier=depth_multiplier,
+ stride=stride,
+ rate=rate,
+ padding=padding,
+ scope=scope + '_depthwise',
+ **kwargs)
+ return slim.conv2d(outputs,
+ num_outputs,
+ 1,
+ scope=scope + '_pointwise',
+ **kwargs)
+ if stride == 1 or not use_explicit_padding:
+ if regularize_depthwise:
+ outputs = _separable_conv2d(padding='SAME')
+ else:
+ outputs = _split_separable_conv2d(padding='SAME')
+ else:
+ inputs = fixed_padding(inputs, kernel_size, rate)
+ if regularize_depthwise:
+ outputs = _separable_conv2d(padding='VALID')
+ else:
+ outputs = _split_separable_conv2d(padding='VALID')
+ return outputs
+
+
+@slim.add_arg_scope
+def xception_module(inputs,
+ depth_list,
+ skip_connection_type,
+ stride,
+ kernel_size=3,
+ unit_rate_list=None,
+ rate=1,
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=False,
+ outputs_collections=None,
+ scope=None,
+ use_bounded_activation=False,
+ use_explicit_padding=True,
+ use_squeeze_excite=False,
+ se_pool_size=None):
+ """An Xception module.
+
+  The output of one Xception module is equal to the sum of `residual` and
+  `shortcut`, where `residual` is the feature computed by three separable
+  convolutions. The `shortcut` is the feature computed by a 1x1 convolution
+  with or without striding. In some cases, the `shortcut` path could be a
+  simple identity function or none (i.e., no shortcut).
+
+  Note that we replace the max-pooling operations in the Xception module with
+  another separable convolution with striding, since the atrous rate is not
+  properly supported in the current TensorFlow max-pooling implementation.
+
+ Args:
+ inputs: A tensor of size [batch, height, width, channels].
+ depth_list: A list of three integers specifying the depth values of one
+ Xception module.
+ skip_connection_type: Skip connection type for the residual path. Only
+ supports 'conv', 'sum', or 'none'.
+ stride: The block unit's stride. Determines the amount of downsampling of
+ the units output compared to its input.
+ kernel_size: Integer, convolution kernel size.
+ unit_rate_list: A list of three integers, determining the unit rate for
+ each separable convolution in the xception module.
+ rate: An integer, rate for atrous convolution.
+ activation_fn_in_separable_conv: Includes activation function in the
+ separable convolution or not.
+    regularize_depthwise: Whether or not to apply L2-norm regularization on
+      the depthwise convolution weights.
+ outputs_collections: Collection to add the Xception unit output.
+ scope: Optional variable_scope.
+ use_bounded_activation: Whether or not to use bounded activations. Bounded
+ activations better lend themselves to quantized inference.
+ use_explicit_padding: If True, use explicit padding to make the model fully
+ compatible with the open source version, otherwise use the native
+ Tensorflow 'SAME' padding.
+ use_squeeze_excite: Boolean, use squeeze-and-excitation or not.
+ se_pool_size: None or integer specifying the pooling size used in SE module.
+
+ Returns:
+ The Xception module's output.
+
+ Raises:
+ ValueError: If depth_list and unit_rate_list do not contain three elements,
+ or if stride != 1 for the third separable convolution operation in the
+ residual path, or unsupported skip connection type.
+ """
+ if len(depth_list) != 3:
+ raise ValueError('Expect three elements in depth_list.')
+ if unit_rate_list:
+ if len(unit_rate_list) != 3:
+ raise ValueError('Expect three elements in unit_rate_list.')
+
+ with tf.variable_scope(scope, 'xception_module', [inputs]) as sc:
+ residual = inputs
+
+ def _separable_conv(features, depth, kernel_size, depth_multiplier,
+ regularize_depthwise, rate, stride, scope):
+ """Separable conv block."""
+ if activation_fn_in_separable_conv:
+ activation_fn = tf.nn.relu6 if use_bounded_activation else tf.nn.relu
+ else:
+ if use_bounded_activation:
+ # When use_bounded_activation is True, we clip the feature values and
+ # apply relu6 for activation.
+ activation_fn = lambda x: tf.clip_by_value(x, -_CLIP_CAP, _CLIP_CAP)
+ features = tf.nn.relu6(features)
+ else:
+ # Original network design.
+ activation_fn = None
+ features = tf.nn.relu(features)
+ return separable_conv2d_same(features,
+ depth,
+ kernel_size,
+ depth_multiplier=depth_multiplier,
+ stride=stride,
+ rate=rate,
+ activation_fn=activation_fn,
+ use_explicit_padding=use_explicit_padding,
+ regularize_depthwise=regularize_depthwise,
+ scope=scope)
+ for i in range(3):
+ residual = _separable_conv(residual,
+ depth_list[i],
+ kernel_size=kernel_size,
+ depth_multiplier=1,
+ regularize_depthwise=regularize_depthwise,
+ rate=rate*unit_rate_list[i],
+ stride=stride if i == 2 else 1,
+ scope='separable_conv' + str(i+1))
+ if use_squeeze_excite:
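+      # The gating function relu6(x + 3) * 0.16667 approximates the hard
+      # sigmoid relu6(x + 3) / 6 used for squeeze-and-excitation in
+      # MobileNetV3.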
+ residual = mobilenet_v3_ops.squeeze_excite(
+ input_tensor=residual,
+ squeeze_factor=16,
+ inner_activation_fn=tf.nn.relu,
+ gating_fn=lambda x: tf.nn.relu6(x+3)*0.16667,
+ pool=se_pool_size)
+
+ if skip_connection_type == 'conv':
+ shortcut = slim.conv2d(inputs,
+ depth_list[-1],
+ [1, 1],
+ stride=stride,
+ activation_fn=None,
+ scope='shortcut')
+ if use_bounded_activation:
+ residual = tf.clip_by_value(residual, -_CLIP_CAP, _CLIP_CAP)
+ shortcut = tf.clip_by_value(shortcut, -_CLIP_CAP, _CLIP_CAP)
+ outputs = residual + shortcut
+ if use_bounded_activation:
+ outputs = tf.nn.relu6(outputs)
+ elif skip_connection_type == 'sum':
+ if use_bounded_activation:
+ residual = tf.clip_by_value(residual, -_CLIP_CAP, _CLIP_CAP)
+ inputs = tf.clip_by_value(inputs, -_CLIP_CAP, _CLIP_CAP)
+ outputs = residual + inputs
+ if use_bounded_activation:
+ outputs = tf.nn.relu6(outputs)
+ elif skip_connection_type == 'none':
+ outputs = residual
+ else:
+ raise ValueError('Unsupported skip connection type.')
+
+ return slim.utils.collect_named_outputs(outputs_collections,
+ sc.name,
+ outputs)
+
+
+@slim.add_arg_scope
+def stack_blocks_dense(net,
+ blocks,
+ output_stride=None,
+ outputs_collections=None):
+ """Stacks Xception blocks and controls output feature density.
+
+ First, this function creates scopes for the Xception in the form of
+ 'block_name/unit_1', 'block_name/unit_2', etc.
+
+ Second, this function allows the user to explicitly control the output
+ stride, which is the ratio of the input to output spatial resolution. This
+ is useful for dense prediction tasks such as semantic segmentation or
+ object detection.
+
+ Control of the output feature density is implemented by atrous convolution.
+
+ Args:
+ net: A tensor of size [batch, height, width, channels].
+ blocks: A list of length equal to the number of Xception blocks. Each
+ element is an Xception Block object describing the units in the block.
+ output_stride: If None, then the output will be computed at the nominal
+ network stride. If output_stride is not None, it specifies the requested
+ ratio of input to output spatial resolution, which needs to be equal to
+ the product of unit strides from the start up to some level of Xception.
+ For example, if the Xception employs units with strides 1, 2, 1, 3, 4, 1,
+ then valid values for the output_stride are 1, 2, 6, 24 or None (which
+ is equivalent to output_stride=24).
+ outputs_collections: Collection to add the Xception block outputs.
+
+ Returns:
+ net: Output tensor with stride equal to the specified output_stride.
+
+ Raises:
+ ValueError: If the target output_stride is not valid.
+ """
+ # The current_stride variable keeps track of the effective stride of the
+ # activations. This allows us to invoke atrous convolution whenever applying
+ # the next residual unit would result in the activations having stride larger
+ # than the target output_stride.
+ current_stride = 1
+
+ # The atrous convolution rate parameter.
+ rate = 1
+
+ for block in blocks:
+ with tf.variable_scope(block.scope, 'block', [net]) as sc:
+ for i, unit in enumerate(block.args):
+ if output_stride is not None and current_stride > output_stride:
+ raise ValueError('The target output_stride cannot be reached.')
+ with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
+ # If we have reached the target output_stride, then we need to employ
+ # atrous convolution with stride=1 and multiply the atrous rate by the
+ # current unit's stride for use in subsequent layers.
+ if output_stride is not None and current_stride == output_stride:
+ net = block.unit_fn(net, rate=rate, **dict(unit, stride=1))
+ rate *= unit.get('stride', 1)
+ else:
+ net = block.unit_fn(net, rate=1, **unit)
+ current_stride *= unit.get('stride', 1)
+
+ # Collect activations at the block's end before performing subsampling.
+ net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)
+
+ if output_stride is not None and current_stride != output_stride:
+ raise ValueError('The target output_stride cannot be reached.')
+
+ return net
+
+
+def xception(inputs,
+ blocks,
+ num_classes=None,
+ is_training=True,
+ global_pool=True,
+ keep_prob=0.5,
+ output_stride=None,
+ reuse=None,
+ scope=None,
+ sync_batch_norm_method='None'):
+ """Generator for Xception models.
+
+ This function generates a family of Xception models. See the xception_*()
+ methods for specific model instantiations, obtained by selecting different
+ block instantiations that produce Xception of various depths.
+
+ Args:
+ inputs: A tensor of size [batch, height_in, width_in, channels]. Must be
+ floating point. If a pretrained checkpoint is used, pixel values should be
+ the same as during training (see go/slim-classification-models for
+ specifics).
+ blocks: A list of length equal to the number of Xception blocks. Each
+ element is an Xception Block object describing the units in the block.
+ num_classes: Number of predicted classes for classification tasks.
+ If 0 or None, we return the features before the logit layer.
+ is_training: whether batch_norm layers are in training mode.
+ global_pool: If True, we perform global average pooling before computing the
+ logits. Set to True for image classification, False for dense prediction.
+ keep_prob: Keep probability used in the pre-logits dropout layer.
+ output_stride: If None, then the output will be computed at the nominal
+ network stride. If output_stride is not None, it specifies the requested
+ ratio of input to output spatial resolution.
+ reuse: whether or not the network and its variables should be reused. To be
+ able to reuse 'scope' must be given.
+ scope: Optional variable_scope.
+ sync_batch_norm_method: String, sync batchnorm method. Currently only
+ support `None`.
+
+ Returns:
+ net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
+ If global_pool is False, then height_out and width_out are reduced by a
+ factor of output_stride compared to the respective height_in and width_in,
+ else both height_out and width_out equal one. If num_classes is 0 or None,
+ then net is the output of the last Xception block, potentially after
+ global average pooling. If num_classes is a non-zero integer, net contains
+ the pre-softmax activations.
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+
+ Raises:
+ ValueError: If the target output_stride is not valid.
+ """
+ with tf.variable_scope(
+ scope, 'xception', [inputs], reuse=reuse) as sc:
+ end_points_collection = sc.original_name_scope + 'end_points'
+ batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
+ with slim.arg_scope([slim.conv2d,
+ slim.separable_conv2d,
+ xception_module,
+ stack_blocks_dense],
+ outputs_collections=end_points_collection):
+ with slim.arg_scope([batch_norm], is_training=is_training):
+ net = inputs
+ if output_stride is not None:
+ if output_stride % 2 != 0:
+ raise ValueError('The output_stride needs to be a multiple of 2.')
+ output_stride //= 2
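+          # The root conv1_1 below already downsamples by a factor of 2, so
+          # the stacked blocks only need to provide the remaining
+          # output_stride // 2.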
+ # Root block function operated on inputs.
+ net = resnet_utils.conv2d_same(net, 32, 3, stride=2,
+ scope='entry_flow/conv1_1')
+ net = resnet_utils.conv2d_same(net, 64, 3, stride=1,
+ scope='entry_flow/conv1_2')
+
+ # Extract features for entry_flow, middle_flow, and exit_flow.
+ net = stack_blocks_dense(net, blocks, output_stride)
+
+ # Convert end_points_collection into a dictionary of end_points.
+ end_points = slim.utils.convert_collection_to_dict(
+ end_points_collection, clear_collection=True)
+
+ if global_pool:
+ # Global average pooling.
+ net = tf.reduce_mean(net, [1, 2], name='global_pool', keepdims=True)
+ end_points['global_pool'] = net
+ if num_classes:
+ net = slim.dropout(net, keep_prob=keep_prob, is_training=is_training,
+ scope='prelogits_dropout')
+ net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
+ normalizer_fn=None, scope='logits')
+ end_points[sc.name + '/logits'] = net
+ end_points['predictions'] = slim.softmax(net, scope='predictions')
+ return net, end_points
+
+
+def xception_block(scope,
+ depth_list,
+ skip_connection_type,
+ activation_fn_in_separable_conv,
+ regularize_depthwise,
+ num_units,
+ stride,
+ kernel_size=3,
+ unit_rate_list=None,
+ use_squeeze_excite=False,
+ se_pool_size=None):
+ """Helper function for creating a Xception block.
+
+ Args:
+ scope: The scope of the block.
+ depth_list: The depth of the bottleneck layer for each unit.
+ skip_connection_type: Skip connection type for the residual path. Only
+ supports 'conv', 'sum', or 'none'.
+ activation_fn_in_separable_conv: Includes activation function in the
+ separable convolution or not.
+    regularize_depthwise: Whether or not to apply L2-norm regularization on
+      the depthwise convolution weights.
+ num_units: The number of units in the block.
+ stride: The stride of the block, implemented as a stride in the last unit.
+ All other units have stride=1.
+ kernel_size: Integer, convolution kernel size.
+ unit_rate_list: A list of three integers, determining the unit rate in the
+ corresponding xception block.
+ use_squeeze_excite: Boolean, use squeeze-and-excitation or not.
+ se_pool_size: None or integer specifying the pooling size used in SE module.
+
+ Returns:
+ An Xception block.
+ """
+ if unit_rate_list is None:
+ unit_rate_list = _DEFAULT_MULTI_GRID
+ return Block(scope, xception_module, [{
+ 'depth_list': depth_list,
+ 'skip_connection_type': skip_connection_type,
+ 'activation_fn_in_separable_conv': activation_fn_in_separable_conv,
+ 'regularize_depthwise': regularize_depthwise,
+ 'stride': stride,
+ 'kernel_size': kernel_size,
+ 'unit_rate_list': unit_rate_list,
+ 'use_squeeze_excite': use_squeeze_excite,
+ 'se_pool_size': se_pool_size,
+ }] * num_units)
+
+
+def xception_41(inputs,
+ num_classes=None,
+ is_training=True,
+ global_pool=True,
+ keep_prob=0.5,
+ output_stride=None,
+ regularize_depthwise=False,
+ multi_grid=None,
+ reuse=None,
+ scope='xception_41',
+ sync_batch_norm_method='None'):
+ """Xception-41 model."""
+ blocks = [
+ xception_block('entry_flow/block1',
+ depth_list=[128, 128, 128],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2),
+ xception_block('entry_flow/block2',
+ depth_list=[256, 256, 256],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2),
+ xception_block('entry_flow/block3',
+ depth_list=[728, 728, 728],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2),
+ xception_block('middle_flow/block1',
+ depth_list=[728, 728, 728],
+ skip_connection_type='sum',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=8,
+ stride=1),
+ xception_block('exit_flow/block1',
+ depth_list=[728, 1024, 1024],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2),
+ xception_block('exit_flow/block2',
+ depth_list=[1536, 1536, 2048],
+ skip_connection_type='none',
+ activation_fn_in_separable_conv=True,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=1,
+ unit_rate_list=multi_grid),
+ ]
+ return xception(inputs,
+ blocks=blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ keep_prob=keep_prob,
+ output_stride=output_stride,
+ reuse=reuse,
+ scope=scope,
+ sync_batch_norm_method=sync_batch_norm_method)
+
+
+def xception_65_factory(inputs,
+ num_classes=None,
+ is_training=True,
+ global_pool=True,
+ keep_prob=0.5,
+ output_stride=None,
+ regularize_depthwise=False,
+ kernel_size=3,
+ multi_grid=None,
+ reuse=None,
+ use_squeeze_excite=False,
+ se_pool_size=None,
+ scope='xception_65',
+ sync_batch_norm_method='None'):
+ """Xception-65 model factory."""
+ blocks = [
+ xception_block('entry_flow/block1',
+ depth_list=[128, 128, 128],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2,
+ kernel_size=kernel_size,
+ use_squeeze_excite=False,
+ se_pool_size=se_pool_size),
+ xception_block('entry_flow/block2',
+ depth_list=[256, 256, 256],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2,
+ kernel_size=kernel_size,
+ use_squeeze_excite=False,
+ se_pool_size=se_pool_size),
+ xception_block('entry_flow/block3',
+ depth_list=[728, 728, 728],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2,
+ kernel_size=kernel_size,
+ use_squeeze_excite=use_squeeze_excite,
+ se_pool_size=se_pool_size),
+ xception_block('middle_flow/block1',
+ depth_list=[728, 728, 728],
+ skip_connection_type='sum',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=16,
+ stride=1,
+ kernel_size=kernel_size,
+ use_squeeze_excite=use_squeeze_excite,
+ se_pool_size=se_pool_size),
+ xception_block('exit_flow/block1',
+ depth_list=[728, 1024, 1024],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2,
+ kernel_size=kernel_size,
+ use_squeeze_excite=use_squeeze_excite,
+ se_pool_size=se_pool_size),
+ xception_block('exit_flow/block2',
+ depth_list=[1536, 1536, 2048],
+ skip_connection_type='none',
+ activation_fn_in_separable_conv=True,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=1,
+ kernel_size=kernel_size,
+ unit_rate_list=multi_grid,
+ use_squeeze_excite=False,
+ se_pool_size=se_pool_size),
+ ]
+ return xception(inputs,
+ blocks=blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ keep_prob=keep_prob,
+ output_stride=output_stride,
+ reuse=reuse,
+ scope=scope,
+ sync_batch_norm_method=sync_batch_norm_method)
+
+
+def xception_65(inputs,
+ num_classes=None,
+ is_training=True,
+ global_pool=True,
+ keep_prob=0.5,
+ output_stride=None,
+ regularize_depthwise=False,
+ multi_grid=None,
+ reuse=None,
+ scope='xception_65',
+ sync_batch_norm_method='None'):
+ """Xception-65 model."""
+ return xception_65_factory(
+ inputs=inputs,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ keep_prob=keep_prob,
+ output_stride=output_stride,
+ regularize_depthwise=regularize_depthwise,
+ multi_grid=multi_grid,
+ reuse=reuse,
+ scope=scope,
+ use_squeeze_excite=False,
+ se_pool_size=None,
+ sync_batch_norm_method=sync_batch_norm_method)
+
+
+def xception_71_factory(inputs,
+ num_classes=None,
+ is_training=True,
+ global_pool=True,
+ keep_prob=0.5,
+ output_stride=None,
+ regularize_depthwise=False,
+ kernel_size=3,
+ multi_grid=None,
+ reuse=None,
+ scope='xception_71',
+ use_squeeze_excite=False,
+ se_pool_size=None,
+ sync_batch_norm_method='None'):
+ """Xception-71 model factory."""
+ blocks = [
+ xception_block('entry_flow/block1',
+ depth_list=[128, 128, 128],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2,
+ kernel_size=kernel_size,
+ use_squeeze_excite=False,
+ se_pool_size=se_pool_size),
+ xception_block('entry_flow/block2',
+ depth_list=[256, 256, 256],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=1,
+ kernel_size=kernel_size,
+ use_squeeze_excite=False,
+ se_pool_size=se_pool_size),
+ xception_block('entry_flow/block3',
+ depth_list=[256, 256, 256],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2,
+ kernel_size=kernel_size,
+ use_squeeze_excite=False,
+ se_pool_size=se_pool_size),
+ xception_block('entry_flow/block4',
+ depth_list=[728, 728, 728],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=1,
+ kernel_size=kernel_size,
+ use_squeeze_excite=use_squeeze_excite,
+ se_pool_size=se_pool_size),
+ xception_block('entry_flow/block5',
+ depth_list=[728, 728, 728],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2,
+ kernel_size=kernel_size,
+ use_squeeze_excite=use_squeeze_excite,
+ se_pool_size=se_pool_size),
+ xception_block('middle_flow/block1',
+ depth_list=[728, 728, 728],
+ skip_connection_type='sum',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=16,
+ stride=1,
+ kernel_size=kernel_size,
+ use_squeeze_excite=use_squeeze_excite,
+ se_pool_size=se_pool_size),
+ xception_block('exit_flow/block1',
+ depth_list=[728, 1024, 1024],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2,
+ kernel_size=kernel_size,
+ use_squeeze_excite=use_squeeze_excite,
+ se_pool_size=se_pool_size),
+ xception_block('exit_flow/block2',
+ depth_list=[1536, 1536, 2048],
+ skip_connection_type='none',
+ activation_fn_in_separable_conv=True,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=1,
+ kernel_size=kernel_size,
+ unit_rate_list=multi_grid,
+ use_squeeze_excite=False,
+ se_pool_size=se_pool_size),
+ ]
+ return xception(inputs,
+ blocks=blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ keep_prob=keep_prob,
+ output_stride=output_stride,
+ reuse=reuse,
+ scope=scope,
+ sync_batch_norm_method=sync_batch_norm_method)
+
+
+def xception_71(inputs,
+ num_classes=None,
+ is_training=True,
+ global_pool=True,
+ keep_prob=0.5,
+ output_stride=None,
+ regularize_depthwise=False,
+ multi_grid=None,
+ reuse=None,
+ scope='xception_71',
+ sync_batch_norm_method='None'):
+ """Xception-71 model."""
+ return xception_71_factory(
+ inputs=inputs,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ keep_prob=keep_prob,
+ output_stride=output_stride,
+ regularize_depthwise=regularize_depthwise,
+ multi_grid=multi_grid,
+ reuse=reuse,
+ scope=scope,
+ use_squeeze_excite=False,
+ se_pool_size=None,
+ sync_batch_norm_method=sync_batch_norm_method)
+
+
+def xception_arg_scope(weight_decay=0.00004,
+ batch_norm_decay=0.9997,
+ batch_norm_epsilon=0.001,
+ batch_norm_scale=True,
+ weights_initializer_stddev=0.09,
+ regularize_depthwise=False,
+ use_batch_norm=True,
+ use_bounded_activation=False,
+ sync_batch_norm_method='None'):
+ """Defines the default Xception arg scope.
+
+ Args:
+ weight_decay: The weight decay to use for regularizing the model.
+ batch_norm_decay: The moving average decay when estimating layer activation
+ statistics in batch normalization.
+ batch_norm_epsilon: Small constant to prevent division by zero when
+ normalizing activations by their variance in batch normalization.
+ batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the
+ activations in the batch normalization layer.
+    weights_initializer_stddev: The standard deviation of the truncated normal
+      weight initializer.
+    regularize_depthwise: Whether or not to apply L2-norm regularization on
+      the depthwise convolution weights.
+ use_batch_norm: Whether or not to use batch normalization.
+ use_bounded_activation: Whether or not to use bounded activations. Bounded
+ activations better lend themselves to quantized inference.
+ sync_batch_norm_method: String, sync batchnorm method. Currently only
+ support `None`. Also, it is only effective for Xception.
+
+ Returns:
+ An `arg_scope` to use for the Xception models.
+ """
+ batch_norm_params = {
+ 'decay': batch_norm_decay,
+ 'epsilon': batch_norm_epsilon,
+ 'scale': batch_norm_scale,
+ }
+ if regularize_depthwise:
+ depthwise_regularizer = slim.l2_regularizer(weight_decay)
+ else:
+ depthwise_regularizer = None
+ activation_fn = tf.nn.relu6 if use_bounded_activation else tf.nn.relu
+ batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
+ with slim.arg_scope(
+ [slim.conv2d, slim.separable_conv2d],
+ weights_initializer=tf.truncated_normal_initializer(
+ stddev=weights_initializer_stddev),
+ activation_fn=activation_fn,
+ normalizer_fn=batch_norm if use_batch_norm else None):
+ with slim.arg_scope([batch_norm], **batch_norm_params):
+ with slim.arg_scope(
+ [slim.conv2d],
+ weights_regularizer=slim.l2_regularizer(weight_decay)):
+ with slim.arg_scope(
+ [slim.separable_conv2d],
+ weights_regularizer=depthwise_regularizer):
+ with slim.arg_scope(
+ [xception_module],
+ use_bounded_activation=use_bounded_activation,
+ use_explicit_padding=not use_bounded_activation) as arg_sc:
+ return arg_sc
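
As a quick, non-authoritative usage sketch of the API defined above (assuming
TensorFlow 1.x with `tf.contrib.slim` available and the `deeplab` package on
the Python path), Xception-65 can be built as a dense-prediction backbone with
`output_stride=16` as follows:

```python
import tensorflow as tf
from tensorflow.contrib import slim as contrib_slim

from deeplab.core import xception

slim = contrib_slim

# 513x513 inputs, a size commonly used for semantic segmentation.
images = tf.placeholder(tf.float32, [None, 513, 513, 3])
with slim.arg_scope(xception.xception_arg_scope()):
  # num_classes=None returns the features before the logits layer;
  # global_pool=False keeps the spatial resolution for dense prediction.
  features, end_points = xception.xception_65(
      images,
      num_classes=None,
      is_training=False,
      global_pool=False,
      output_stride=16)
# `features` has stride 16 w.r.t. the input, e.g. shape [None, 33, 33, 2048];
# `end_points` maps scope names such as 'xception_65/entry_flow/block2' to
# intermediate activations.
```
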
diff --git a/deeplab/models/research/deeplab/core/xception_test.py b/deeplab/models/research/deeplab/core/xception_test.py
new file mode 100644
index 0000000..fc338da
--- /dev/null
+++ b/deeplab/models/research/deeplab/core/xception_test.py
@@ -0,0 +1,488 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Tests for xception.py."""
+import numpy as np
+import six
+import tensorflow as tf
+from tensorflow.contrib import slim as contrib_slim
+
+from deeplab.core import xception
+from tensorflow.contrib.slim.nets import resnet_utils
+
+slim = contrib_slim
+
+
+def create_test_input(batch, height, width, channels):
+ """Create test input tensor."""
+ if None in [batch, height, width, channels]:
+ return tf.placeholder(tf.float32, (batch, height, width, channels))
+ else:
+ return tf.cast(
+ np.tile(
+ np.reshape(
+ np.reshape(np.arange(height), [height, 1]) +
+ np.reshape(np.arange(width), [1, width]),
+ [1, height, width, 1]),
+ [batch, 1, 1, channels]),
+ tf.float32)
+
+
+class UtilityFunctionTest(tf.test.TestCase):
+
+ def testSeparableConv2DSameWithInputEvenSize(self):
+ n, n2 = 4, 2
+
+ # Input image.
+ x = create_test_input(1, n, n, 1)
+
+ # Convolution kernel.
+ dw = create_test_input(1, 3, 3, 1)
+ dw = tf.reshape(dw, [3, 3, 1, 1])
+
+ tf.get_variable('Conv/depthwise_weights', initializer=dw)
+ tf.get_variable('Conv/pointwise_weights',
+ initializer=tf.ones([1, 1, 1, 1]))
+ tf.get_variable('Conv/biases', initializer=tf.zeros([1]))
+ tf.get_variable_scope().reuse_variables()
+
+ y1 = slim.separable_conv2d(x, 1, [3, 3], depth_multiplier=1,
+ stride=1, scope='Conv')
+ y1_expected = tf.cast([[14, 28, 43, 26],
+ [28, 48, 66, 37],
+ [43, 66, 84, 46],
+ [26, 37, 46, 22]], tf.float32)
+ y1_expected = tf.reshape(y1_expected, [1, n, n, 1])
+
+ y2 = resnet_utils.subsample(y1, 2)
+ y2_expected = tf.cast([[14, 43],
+ [43, 84]], tf.float32)
+ y2_expected = tf.reshape(y2_expected, [1, n2, n2, 1])
+
+ y3 = xception.separable_conv2d_same(x, 1, 3, depth_multiplier=1,
+ regularize_depthwise=True,
+ stride=2, scope='Conv')
+ y3_expected = y2_expected
+
+ y4 = slim.separable_conv2d(x, 1, [3, 3], depth_multiplier=1,
+ stride=2, scope='Conv')
+ y4_expected = tf.cast([[48, 37],
+ [37, 22]], tf.float32)
+ y4_expected = tf.reshape(y4_expected, [1, n2, n2, 1])
+
+ with self.test_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ self.assertAllClose(y1.eval(), y1_expected.eval())
+ self.assertAllClose(y2.eval(), y2_expected.eval())
+ self.assertAllClose(y3.eval(), y3_expected.eval())
+ self.assertAllClose(y4.eval(), y4_expected.eval())
+
+ def testSeparableConv2DSameWithInputOddSize(self):
+ n, n2 = 5, 3
+
+ # Input image.
+ x = create_test_input(1, n, n, 1)
+
+ # Convolution kernel.
+ dw = create_test_input(1, 3, 3, 1)
+ dw = tf.reshape(dw, [3, 3, 1, 1])
+
+ tf.get_variable('Conv/depthwise_weights', initializer=dw)
+ tf.get_variable('Conv/pointwise_weights',
+ initializer=tf.ones([1, 1, 1, 1]))
+ tf.get_variable('Conv/biases', initializer=tf.zeros([1]))
+ tf.get_variable_scope().reuse_variables()
+
+ y1 = slim.separable_conv2d(x, 1, [3, 3], depth_multiplier=1,
+ stride=1, scope='Conv')
+ y1_expected = tf.cast([[14, 28, 43, 58, 34],
+ [28, 48, 66, 84, 46],
+ [43, 66, 84, 102, 55],
+ [58, 84, 102, 120, 64],
+ [34, 46, 55, 64, 30]], tf.float32)
+ y1_expected = tf.reshape(y1_expected, [1, n, n, 1])
+
+ y2 = resnet_utils.subsample(y1, 2)
+ y2_expected = tf.cast([[14, 43, 34],
+ [43, 84, 55],
+ [34, 55, 30]], tf.float32)
+ y2_expected = tf.reshape(y2_expected, [1, n2, n2, 1])
+
+ y3 = xception.separable_conv2d_same(x, 1, 3, depth_multiplier=1,
+ regularize_depthwise=True,
+ stride=2, scope='Conv')
+ y3_expected = y2_expected
+
+ y4 = slim.separable_conv2d(x, 1, [3, 3], depth_multiplier=1,
+ stride=2, scope='Conv')
+ y4_expected = y2_expected
+
+ with self.test_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ self.assertAllClose(y1.eval(), y1_expected.eval())
+ self.assertAllClose(y2.eval(), y2_expected.eval())
+ self.assertAllClose(y3.eval(), y3_expected.eval())
+ self.assertAllClose(y4.eval(), y4_expected.eval())
+
+
+class XceptionNetworkTest(tf.test.TestCase):
+ """Tests with small Xception network."""
+
+ def _xception_small(self,
+ inputs,
+ num_classes=None,
+ is_training=True,
+ global_pool=True,
+ output_stride=None,
+ regularize_depthwise=True,
+ reuse=None,
+ scope='xception_small'):
+ """A shallow and thin Xception for faster tests."""
+ block = xception.xception_block
+ blocks = [
+ block('entry_flow/block1',
+ depth_list=[1, 1, 1],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2),
+ block('entry_flow/block2',
+ depth_list=[2, 2, 2],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2),
+ block('entry_flow/block3',
+ depth_list=[4, 4, 4],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=1),
+ block('entry_flow/block4',
+ depth_list=[4, 4, 4],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2),
+ block('middle_flow/block1',
+ depth_list=[4, 4, 4],
+ skip_connection_type='sum',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=2,
+ stride=1),
+ block('exit_flow/block1',
+ depth_list=[8, 8, 8],
+ skip_connection_type='conv',
+ activation_fn_in_separable_conv=False,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=2),
+ block('exit_flow/block2',
+ depth_list=[16, 16, 16],
+ skip_connection_type='none',
+ activation_fn_in_separable_conv=True,
+ regularize_depthwise=regularize_depthwise,
+ num_units=1,
+ stride=1),
+ ]
+ return xception.xception(inputs,
+ blocks=blocks,
+ num_classes=num_classes,
+ is_training=is_training,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ reuse=reuse,
+ scope=scope)
+
+ def testClassificationEndPoints(self):
+ global_pool = True
+ num_classes = 3
+ inputs = create_test_input(2, 32, 32, 3)
+ with slim.arg_scope(xception.xception_arg_scope()):
+ logits, end_points = self._xception_small(
+ inputs,
+ num_classes=num_classes,
+ global_pool=global_pool,
+ scope='xception')
+ self.assertTrue(
+ logits.op.name.startswith('xception/logits'))
+ self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
+ self.assertTrue('predictions' in end_points)
+ self.assertListEqual(end_points['predictions'].get_shape().as_list(),
+ [2, 1, 1, num_classes])
+ self.assertTrue('global_pool' in end_points)
+ self.assertListEqual(end_points['global_pool'].get_shape().as_list(),
+ [2, 1, 1, 16])
+
+ def testEndpointNames(self):
+ global_pool = True
+ num_classes = 3
+ inputs = create_test_input(2, 32, 32, 3)
+ with slim.arg_scope(xception.xception_arg_scope()):
+ _, end_points = self._xception_small(
+ inputs,
+ num_classes=num_classes,
+ global_pool=global_pool,
+ scope='xception')
+ expected = [
+ 'xception/entry_flow/conv1_1',
+ 'xception/entry_flow/conv1_2',
+ 'xception/entry_flow/block1/unit_1/xception_module/separable_conv1',
+ 'xception/entry_flow/block1/unit_1/xception_module/separable_conv2',
+ 'xception/entry_flow/block1/unit_1/xception_module/separable_conv3',
+ 'xception/entry_flow/block1/unit_1/xception_module/shortcut',
+ 'xception/entry_flow/block1/unit_1/xception_module',
+ 'xception/entry_flow/block1',
+ 'xception/entry_flow/block2/unit_1/xception_module/separable_conv1',
+ 'xception/entry_flow/block2/unit_1/xception_module/separable_conv2',
+ 'xception/entry_flow/block2/unit_1/xception_module/separable_conv3',
+ 'xception/entry_flow/block2/unit_1/xception_module/shortcut',
+ 'xception/entry_flow/block2/unit_1/xception_module',
+ 'xception/entry_flow/block2',
+ 'xception/entry_flow/block3/unit_1/xception_module/separable_conv1',
+ 'xception/entry_flow/block3/unit_1/xception_module/separable_conv2',
+ 'xception/entry_flow/block3/unit_1/xception_module/separable_conv3',
+ 'xception/entry_flow/block3/unit_1/xception_module/shortcut',
+ 'xception/entry_flow/block3/unit_1/xception_module',
+ 'xception/entry_flow/block3',
+ 'xception/entry_flow/block4/unit_1/xception_module/separable_conv1',
+ 'xception/entry_flow/block4/unit_1/xception_module/separable_conv2',
+ 'xception/entry_flow/block4/unit_1/xception_module/separable_conv3',
+ 'xception/entry_flow/block4/unit_1/xception_module/shortcut',
+ 'xception/entry_flow/block4/unit_1/xception_module',
+ 'xception/entry_flow/block4',
+ 'xception/middle_flow/block1/unit_1/xception_module/separable_conv1',
+ 'xception/middle_flow/block1/unit_1/xception_module/separable_conv2',
+ 'xception/middle_flow/block1/unit_1/xception_module/separable_conv3',
+ 'xception/middle_flow/block1/unit_1/xception_module',
+ 'xception/middle_flow/block1/unit_2/xception_module/separable_conv1',
+ 'xception/middle_flow/block1/unit_2/xception_module/separable_conv2',
+ 'xception/middle_flow/block1/unit_2/xception_module/separable_conv3',
+ 'xception/middle_flow/block1/unit_2/xception_module',
+ 'xception/middle_flow/block1',
+ 'xception/exit_flow/block1/unit_1/xception_module/separable_conv1',
+ 'xception/exit_flow/block1/unit_1/xception_module/separable_conv2',
+ 'xception/exit_flow/block1/unit_1/xception_module/separable_conv3',
+ 'xception/exit_flow/block1/unit_1/xception_module/shortcut',
+ 'xception/exit_flow/block1/unit_1/xception_module',
+ 'xception/exit_flow/block1',
+ 'xception/exit_flow/block2/unit_1/xception_module/separable_conv1',
+ 'xception/exit_flow/block2/unit_1/xception_module/separable_conv2',
+ 'xception/exit_flow/block2/unit_1/xception_module/separable_conv3',
+ 'xception/exit_flow/block2/unit_1/xception_module',
+ 'xception/exit_flow/block2',
+ 'global_pool',
+ 'xception/logits',
+ 'predictions',
+ ]
+ self.assertItemsEqual(list(end_points.keys()), expected)
+
+ def testClassificationShapes(self):
+ global_pool = True
+ num_classes = 3
+ inputs = create_test_input(2, 64, 64, 3)
+ with slim.arg_scope(xception.xception_arg_scope()):
+ _, end_points = self._xception_small(
+ inputs,
+ num_classes,
+ global_pool=global_pool,
+ scope='xception')
+ endpoint_to_shape = {
+ 'xception/entry_flow/conv1_1': [2, 32, 32, 32],
+ 'xception/entry_flow/block1': [2, 16, 16, 1],
+ 'xception/entry_flow/block2': [2, 8, 8, 2],
+ 'xception/entry_flow/block4': [2, 4, 4, 4],
+ 'xception/middle_flow/block1': [2, 4, 4, 4],
+ 'xception/exit_flow/block1': [2, 2, 2, 8],
+ 'xception/exit_flow/block2': [2, 2, 2, 16]}
+ for endpoint, shape in six.iteritems(endpoint_to_shape):
+ self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
+
+ def testFullyConvolutionalEndpointShapes(self):
+ global_pool = False
+ num_classes = 3
+ inputs = create_test_input(2, 65, 65, 3)
+ with slim.arg_scope(xception.xception_arg_scope()):
+ _, end_points = self._xception_small(
+ inputs,
+ num_classes,
+ global_pool=global_pool,
+ scope='xception')
+ endpoint_to_shape = {
+ 'xception/entry_flow/conv1_1': [2, 33, 33, 32],
+ 'xception/entry_flow/block1': [2, 17, 17, 1],
+ 'xception/entry_flow/block2': [2, 9, 9, 2],
+ 'xception/entry_flow/block4': [2, 5, 5, 4],
+ 'xception/middle_flow/block1': [2, 5, 5, 4],
+ 'xception/exit_flow/block1': [2, 3, 3, 8],
+ 'xception/exit_flow/block2': [2, 3, 3, 16]}
+ for endpoint, shape in six.iteritems(endpoint_to_shape):
+ self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
+
+ def testAtrousFullyConvolutionalEndpointShapes(self):
+ global_pool = False
+ num_classes = 3
+ output_stride = 8
+ inputs = create_test_input(2, 65, 65, 3)
+ with slim.arg_scope(xception.xception_arg_scope()):
+ _, end_points = self._xception_small(
+ inputs,
+ num_classes,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ scope='xception')
+ endpoint_to_shape = {
+ 'xception/entry_flow/block1': [2, 17, 17, 1],
+ 'xception/entry_flow/block2': [2, 9, 9, 2],
+ 'xception/entry_flow/block4': [2, 9, 9, 4],
+ 'xception/middle_flow/block1': [2, 9, 9, 4],
+ 'xception/exit_flow/block1': [2, 9, 9, 8],
+ 'xception/exit_flow/block2': [2, 9, 9, 16]}
+ for endpoint, shape in six.iteritems(endpoint_to_shape):
+ self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
+
+ def testAtrousFullyConvolutionalValues(self):
+ """Verify dense feature extraction with atrous convolution."""
+ nominal_stride = 32
+ for output_stride in [4, 8, 16, 32, None]:
+ with slim.arg_scope(xception.xception_arg_scope()):
+ with tf.Graph().as_default():
+ with self.test_session() as sess:
+ tf.set_random_seed(0)
+ inputs = create_test_input(2, 96, 97, 3)
+ # Dense feature extraction followed by subsampling.
+ output, _ = self._xception_small(
+ inputs,
+ None,
+ is_training=False,
+ global_pool=False,
+ output_stride=output_stride)
+ if output_stride is None:
+ factor = 1
+ else:
+ factor = nominal_stride // output_stride
+ output = resnet_utils.subsample(output, factor)
+ # Make the two networks use the same weights.
+ tf.get_variable_scope().reuse_variables()
+ # Feature extraction at the nominal network rate.
+ expected, _ = self._xception_small(
+ inputs,
+ None,
+ is_training=False,
+ global_pool=False)
+ sess.run(tf.global_variables_initializer())
+ self.assertAllClose(output.eval(), expected.eval(),
+ atol=1e-5, rtol=1e-5)
+
+ def testUnknownBatchSize(self):
+ batch = 2
+ height, width = 65, 65
+ global_pool = True
+ num_classes = 10
+ inputs = create_test_input(None, height, width, 3)
+ with slim.arg_scope(xception.xception_arg_scope()):
+ logits, _ = self._xception_small(
+ inputs,
+ num_classes,
+ global_pool=global_pool,
+ scope='xception')
+ self.assertTrue(logits.op.name.startswith('xception/logits'))
+ self.assertListEqual(logits.get_shape().as_list(),
+ [None, 1, 1, num_classes])
+ images = create_test_input(batch, height, width, 3)
+ with self.test_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ output = sess.run(logits, {inputs: images.eval()})
+ self.assertEquals(output.shape, (batch, 1, 1, num_classes))
+
+ def testFullyConvolutionalUnknownHeightWidth(self):
+ batch = 2
+ height, width = 65, 65
+ global_pool = False
+ inputs = create_test_input(batch, None, None, 3)
+ with slim.arg_scope(xception.xception_arg_scope()):
+ output, _ = self._xception_small(
+ inputs,
+ None,
+ global_pool=global_pool)
+ self.assertListEqual(output.get_shape().as_list(),
+ [batch, None, None, 16])
+ images = create_test_input(batch, height, width, 3)
+ with self.test_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ output = sess.run(output, {inputs: images.eval()})
+ self.assertEquals(output.shape, (batch, 3, 3, 16))
+
+ def testAtrousFullyConvolutionalUnknownHeightWidth(self):
+ batch = 2
+ height, width = 65, 65
+ global_pool = False
+ output_stride = 8
+ inputs = create_test_input(batch, None, None, 3)
+ with slim.arg_scope(xception.xception_arg_scope()):
+ output, _ = self._xception_small(
+ inputs,
+ None,
+ global_pool=global_pool,
+ output_stride=output_stride)
+ self.assertListEqual(output.get_shape().as_list(),
+ [batch, None, None, 16])
+ images = create_test_input(batch, height, width, 3)
+ with self.test_session() as sess:
+ sess.run(tf.global_variables_initializer())
+ output = sess.run(output, {inputs: images.eval()})
+ self.assertEquals(output.shape, (batch, 9, 9, 16))
+
+ def testEndpointsReuse(self):
+ inputs = create_test_input(2, 32, 32, 3)
+ with slim.arg_scope(xception.xception_arg_scope()):
+ _, end_points0 = xception.xception_65(
+ inputs,
+ num_classes=10,
+ reuse=False)
+ with slim.arg_scope(xception.xception_arg_scope()):
+ _, end_points1 = xception.xception_65(
+ inputs,
+ num_classes=10,
+ reuse=True)
+ self.assertItemsEqual(list(end_points0.keys()), list(end_points1.keys()))
+
+  def testUseBoundedActivation(self):
+ global_pool = False
+ num_classes = 3
+ output_stride = 16
+ for use_bounded_activation in (True, False):
+ tf.reset_default_graph()
+ inputs = create_test_input(2, 65, 65, 3)
+ with slim.arg_scope(xception.xception_arg_scope(
+ use_bounded_activation=use_bounded_activation)):
+ _, _ = self._xception_small(
+ inputs,
+ num_classes,
+ global_pool=global_pool,
+ output_stride=output_stride,
+ scope='xception')
+ for node in tf.get_default_graph().as_graph_def().node:
+ if node.op.startswith('Relu'):
+ self.assertEqual(node.op == 'Relu6', use_bounded_activation)
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/ImageSets/train.txt b/deeplab/models/research/deeplab/datasets/PQR/dataset/ImageSets/train.txt
new file mode 100644
index 0000000..4cb3d73
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/PQR/dataset/ImageSets/train.txt
@@ -0,0 +1,3 @@
+Heart_O_img_0
+Heart_O_img_1
+Heart_O_img_2
\ No newline at end of file
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/ImageSets/trainval.txt b/deeplab/models/research/deeplab/datasets/PQR/dataset/ImageSets/trainval.txt
new file mode 100644
index 0000000..8c158fb
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/PQR/dataset/ImageSets/trainval.txt
@@ -0,0 +1,5 @@
+Heart_O_img_0
+Heart_O_img_1
+Heart_O_img_2
+Heart_O_img_3
+Heart_O_img_4
\ No newline at end of file
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/ImageSets/val.txt b/deeplab/models/research/deeplab/datasets/PQR/dataset/ImageSets/val.txt
new file mode 100644
index 0000000..c154e99
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/PQR/dataset/ImageSets/val.txt
@@ -0,0 +1,2 @@
+Heart_O_img_3
+Heart_O_img_4
\ No newline at end of file
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_0.jpg b/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_0.jpg
new file mode 100644
index 0000000..fcbc9ab
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_0.jpg differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_1.jpg b/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_1.jpg
new file mode 100644
index 0000000..b35cac0
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_1.jpg differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_2.jpg b/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_2.jpg
new file mode 100644
index 0000000..5febf5e
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_2.jpg differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_3.jpg b/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_3.jpg
new file mode 100644
index 0000000..8cbabc2
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_3.jpg differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_4.jpg b/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_4.jpg
new file mode 100644
index 0000000..1212a0f
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_4.jpg differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_0.jpg b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_0.jpg
new file mode 100644
index 0000000..ca48633
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_0.jpg differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_1.jpg b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_1.jpg
new file mode 100644
index 0000000..0d0f771
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_1.jpg differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_2.jpg b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_2.jpg
new file mode 100644
index 0000000..56d3ed3
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_2.jpg differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_3.jpg b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_3.jpg
new file mode 100644
index 0000000..18dd87a
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_3.jpg differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_4.jpg b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_4.jpg
new file mode 100644
index 0000000..e517088
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClass/Heart_O_img_4.jpg differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_0.png b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_0.png
new file mode 100644
index 0000000..fb1b47a
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_0.png differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_1.png b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_1.png
new file mode 100644
index 0000000..fb1b47a
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_1.png differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_2.png b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_2.png
new file mode 100644
index 0000000..fb1b47a
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_2.png differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_3.png b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_3.png
new file mode 100644
index 0000000..fb1b47a
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_3.png differ
diff --git a/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_4.png b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_4.png
new file mode 100644
index 0000000..fb1b47a
Binary files /dev/null and b/deeplab/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw/Heart_O_img_4.png differ
diff --git a/deeplab/models/research/deeplab/datasets/__init__.py b/deeplab/models/research/deeplab/datasets/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/deeplab/models/research/deeplab/datasets/build_ade20k_data.py b/deeplab/models/research/deeplab/datasets/build_ade20k_data.py
new file mode 100644
index 0000000..fc04ed0
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/build_ade20k_data.py
@@ -0,0 +1,123 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Converts ADE20K data to TFRecord file format with Example protos."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import math
+import os
+import random
+import sys
+import build_data
+from six.moves import range
+import tensorflow as tf
+
+FLAGS = tf.app.flags.FLAGS
+
+tf.app.flags.DEFINE_string(
+ 'train_image_folder',
+ './ADE20K/ADEChallengeData2016/images/training',
+    'Folder containing training images')
+tf.app.flags.DEFINE_string(
+ 'train_image_label_folder',
+ './ADE20K/ADEChallengeData2016/annotations/training',
+    'Folder containing annotations for training images')
+
+tf.app.flags.DEFINE_string(
+ 'val_image_folder',
+ './ADE20K/ADEChallengeData2016/images/validation',
+ 'Folder containing validation images')
+
+tf.app.flags.DEFINE_string(
+ 'val_image_label_folder',
+ './ADE20K/ADEChallengeData2016/annotations/validation',
+ 'Folder containing annotations for validation')
+
+tf.app.flags.DEFINE_string(
+ 'output_dir', './ADE20K/tfrecord',
+    'Path to save the converted TFRecord files of TensorFlow examples.')
+
+_NUM_SHARDS = 4
+
+
+def _convert_dataset(dataset_split, dataset_dir, dataset_label_dir):
+ """Converts the ADE20k dataset into into tfrecord format.
+
+ Args:
+ dataset_split: Dataset split (e.g., train, val).
+ dataset_dir: Dir in which the dataset locates.
+ dataset_label_dir: Dir in which the annotations locates.
+
+ Raises:
+ RuntimeError: If loaded image and label have different shape.
+ """
+
+ img_names = tf.gfile.Glob(os.path.join(dataset_dir, '*.jpg'))
+ random.shuffle(img_names)
+ seg_names = []
+ for f in img_names:
+    # Get the filename without the extension.
+    basename = os.path.basename(f).split('.')[0]
+    # Find the corresponding annotation: the same basename with a .png
+    # extension in dataset_label_dir.
+ seg = os.path.join(dataset_label_dir, basename+'.png')
+ seg_names.append(seg)
+
+ num_images = len(img_names)
+ num_per_shard = int(math.ceil(num_images / _NUM_SHARDS))
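+  # Each of the _NUM_SHARDS shards receives ceil(num_images / _NUM_SHARDS)
+  # examples; the last shard may contain fewer.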
+
+ image_reader = build_data.ImageReader('jpeg', channels=3)
+ label_reader = build_data.ImageReader('png', channels=1)
+
+ for shard_id in range(_NUM_SHARDS):
+ output_filename = os.path.join(
+ FLAGS.output_dir,
+ '%s-%05d-of-%05d.tfrecord' % (dataset_split, shard_id, _NUM_SHARDS))
+ with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
+ start_idx = shard_id * num_per_shard
+ end_idx = min((shard_id + 1) * num_per_shard, num_images)
+ for i in range(start_idx, end_idx):
+ sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
+ i + 1, num_images, shard_id))
+ sys.stdout.flush()
+ # Read the image.
+ image_filename = img_names[i]
+ image_data = tf.gfile.FastGFile(image_filename, 'rb').read()
+ height, width = image_reader.read_image_dims(image_data)
+ # Read the semantic segmentation annotation.
+ seg_filename = seg_names[i]
+ seg_data = tf.gfile.FastGFile(seg_filename, 'rb').read()
+ seg_height, seg_width = label_reader.read_image_dims(seg_data)
+ if height != seg_height or width != seg_width:
+ raise RuntimeError('Shape mismatched between image and label.')
+ # Convert to tf example.
+ example = build_data.image_seg_to_tfexample(
+ image_data, img_names[i], height, width, seg_data)
+ tfrecord_writer.write(example.SerializeToString())
+ sys.stdout.write('\n')
+ sys.stdout.flush()
+
+
+def main(unused_argv):
+ tf.gfile.MakeDirs(FLAGS.output_dir)
+ _convert_dataset(
+ 'train', FLAGS.train_image_folder, FLAGS.train_image_label_folder)
+ _convert_dataset('val', FLAGS.val_image_folder, FLAGS.val_image_label_folder)
+
+
+if __name__ == '__main__':
+ tf.app.run()
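+
+# A minimal sketch for spot-checking the generated shards (not part of the
+# original script); the record path below assumes the default output_dir and
+# the 4-shard naming used above:
+#
+#   record_path = './ADE20K/tfrecord/train-00000-of-00004.tfrecord'
+#   for record in tf.python_io.tf_record_iterator(record_path):
+#     example = tf.train.Example.FromString(record)
+#     print(example.features.feature['image/filename'].bytes_list.value[0],
+#           example.features.feature['image/height'].int64_list.value[0],
+#           example.features.feature['image/width'].int64_list.value[0])
+#     break  # Inspect only the first record.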
diff --git a/deeplab/models/research/deeplab/datasets/build_cityscapes_data.py b/deeplab/models/research/deeplab/datasets/build_cityscapes_data.py
new file mode 100644
index 0000000..53c11e3
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/build_cityscapes_data.py
@@ -0,0 +1,198 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Converts Cityscapes data to TFRecord file format with Example protos.
+
+The Cityscapes dataset is expected to have the following directory structure:
+
+ + cityscapes
+ - build_cityscapes_data.py (current working directory).
+ - build_data.py
+ + cityscapesscripts
+ + annotation
+ + evaluation
+ + helpers
+ + preparation
+ + viewer
+ + gtFine
+ + train
+ + val
+ + test
+ + leftImg8bit
+ + train
+ + val
+ + test
+ + tfrecord
+
+This script converts the data into sharded TFRecord files and saves them in the tfrecord folder.
+
+Note that before running this script, the users should (1) register at the
+Cityscapes dataset website (https://www.cityscapes-dataset.com) to
+download the dataset, and (2) run the script provided by Cityscapes,
+`preparation/createTrainIdLabelImgs.py`, to generate the training ground truth.
+
+Also note that the TensorFlow model will be trained with `TrainId` instead
+of `EvalId`, which is used on the evaluation server. Thus, the users need to
+convert the predicted labels to `EvalId` for evaluation on the server. See
+vis.py for more details.
+
+The Example proto contains the following fields:
+
+ image/encoded: encoded image content.
+ image/filename: image filename.
+ image/format: image file format.
+ image/height: image height.
+ image/width: image width.
+ image/channels: image channels.
+ image/segmentation/class/encoded: encoded semantic segmentation content.
+ image/segmentation/class/format: semantic segmentation file format.
+"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import glob
+import math
+import os.path
+import re
+import sys
+import build_data
+from six.moves import range
+import tensorflow as tf
+
+FLAGS = tf.app.flags.FLAGS
+
+tf.app.flags.DEFINE_string('cityscapes_root',
+ './cityscapes',
+ 'Cityscapes dataset root folder.')
+
+tf.app.flags.DEFINE_string(
+ 'output_dir',
+ './tfrecord',
+ 'Path to save converted SSTable of TensorFlow examples.')
+
+
+_NUM_SHARDS = 10
+
+# A map from data type to folder name that saves the data.
+_FOLDERS_MAP = {
+ 'image': 'leftImg8bit',
+ 'label': 'gtFine',
+}
+
+# A map from data type to filename postfix.
+_POSTFIX_MAP = {
+ 'image': '_leftImg8bit',
+ 'label': '_gtFine_labelTrainIds',
+}
+
+# A map from data type to data format.
+_DATA_FORMAT_MAP = {
+ 'image': 'png',
+ 'label': 'png',
+}
+
+# Image file pattern.
+_IMAGE_FILENAME_RE = re.compile('(.+)' + _POSTFIX_MAP['image'])
+
+
+def _get_files(data, dataset_split):
+ """Gets files for the specified data type and dataset split.
+
+ Args:
+ data: String, desired data ('image' or 'label').
+ dataset_split: String, dataset split ('train_fine', 'val_fine', 'test_fine')
+
+ Returns:
+ A list of sorted file names or None when getting label for
+ test set.
+ """
+ if dataset_split == 'train_fine':
+ split_dir = 'train'
+ elif dataset_split == 'val_fine':
+ split_dir = 'val'
+ elif dataset_split == 'test_fine':
+ split_dir = 'test'
+ else:
+ raise RuntimeError("Split {} is not supported".format(dataset_split))
+ pattern = '*%s.%s' % (_POSTFIX_MAP[data], _DATA_FORMAT_MAP[data])
+ search_files = os.path.join(
+ FLAGS.cityscapes_root, _FOLDERS_MAP[data], split_dir, '*', pattern)
+ filenames = glob.glob(search_files)
+ return sorted(filenames)
+
+
+def _convert_dataset(dataset_split):
+ """Converts the specified dataset split to TFRecord format.
+
+ Args:
+ dataset_split: The dataset split (e.g., train_fine, val_fine).
+
+ Raises:
+ RuntimeError: If loaded image and label have different shape, or if the
+ image file with specified postfix could not be found.
+ """
+ image_files = _get_files('image', dataset_split)
+ label_files = _get_files('label', dataset_split)
+
+ num_images = len(image_files)
+ num_labels = len(label_files)
+ num_per_shard = int(math.ceil(num_images / _NUM_SHARDS))
+
+ if num_images != num_labels:
+ raise RuntimeError("The number of images and labels doesn't match: {} {}".format(num_images, num_labels))
+
+ image_reader = build_data.ImageReader('png', channels=3)
+ label_reader = build_data.ImageReader('png', channels=1)
+
+ for shard_id in range(_NUM_SHARDS):
+ shard_filename = '%s-%05d-of-%05d.tfrecord' % (
+ dataset_split, shard_id, _NUM_SHARDS)
+ output_filename = os.path.join(FLAGS.output_dir, shard_filename)
+ with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
+ start_idx = shard_id * num_per_shard
+ end_idx = min((shard_id + 1) * num_per_shard, num_images)
+ for i in range(start_idx, end_idx):
+ sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
+ i + 1, num_images, shard_id))
+ sys.stdout.flush()
+ # Read the image.
+ image_data = tf.gfile.FastGFile(image_files[i], 'rb').read()
+ height, width = image_reader.read_image_dims(image_data)
+ # Read the semantic segmentation annotation.
+ seg_data = tf.gfile.FastGFile(label_files[i], 'rb').read()
+ seg_height, seg_width = label_reader.read_image_dims(seg_data)
+ if height != seg_height or width != seg_width:
+ raise RuntimeError('Shape mismatched between image and label.')
+ # Convert to tf example.
+ re_match = _IMAGE_FILENAME_RE.search(image_files[i])
+ if re_match is None:
+ raise RuntimeError('Invalid image filename: ' + image_files[i])
+ filename = os.path.basename(re_match.group(1))
+ example = build_data.image_seg_to_tfexample(
+ image_data, filename, height, width, seg_data)
+ tfrecord_writer.write(example.SerializeToString())
+ sys.stdout.write('\n')
+ sys.stdout.flush()
+
+
+def main(unused_argv):
+ # Only support converting 'train_fine', 'val_fine' and 'test_fine' sets for now.
+ for dataset_split in ['train_fine', 'val_fine', 'test_fine']:
+ _convert_dataset(dataset_split)
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/deeplab/models/research/deeplab/datasets/build_data.py b/deeplab/models/research/deeplab/datasets/build_data.py
new file mode 100644
index 0000000..4562867
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/build_data.py
@@ -0,0 +1,161 @@
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Contains common utility functions and classes for building dataset.
+
+This script contains utility functions and classes to convert a dataset to
+TFRecord file format with Example protos.
+
+The Example proto contains the following fields:
+
+ image/encoded: encoded image content.
+ image/filename: image filename.
+ image/format: image file format.
+ image/height: image height.
+ image/width: image width.
+ image/channels: image channels.
+ image/segmentation/class/encoded: encoded semantic segmentation content.
+ image/segmentation/class/format: semantic segmentation file format.
+"""
+import collections
+import six
+import tensorflow as tf
+
+FLAGS = tf.app.flags.FLAGS
+
+tf.app.flags.DEFINE_enum('image_format', 'png', ['jpg', 'jpeg', 'png'],
+ 'Image format.')
+
+tf.app.flags.DEFINE_enum('label_format', 'png', ['png'],
+ 'Segmentation label format.')
+
+# A map from image format to expected data format.
+_IMAGE_FORMAT_MAP = {
+ 'jpg': 'jpeg',
+ 'jpeg': 'jpeg',
+ 'png': 'png',
+}
+
+
+class ImageReader(object):
+ """Helper class that provides TensorFlow image coding utilities."""
+
+ def __init__(self, image_format='jpeg', channels=3):
+ """Class constructor.
+
+ Args:
+ image_format: Image format. Only 'jpeg', 'jpg', or 'png' are supported.
+ channels: Image channels.
+ """
+ with tf.Graph().as_default():
+ self._decode_data = tf.placeholder(dtype=tf.string)
+ self._image_format = image_format
+ self._session = tf.Session()
+ if self._image_format in ('jpeg', 'jpg'):
+ self._decode = tf.image.decode_jpeg(self._decode_data,
+ channels=channels)
+ elif self._image_format == 'png':
+ self._decode = tf.image.decode_png(self._decode_data,
+ channels=channels)
+
+ def read_image_dims(self, image_data):
+ """Reads the image dimensions.
+
+ Args:
+ image_data: string of image data.
+
+ Returns:
+ image_height and image_width.
+ """
+ image = self.decode_image(image_data)
+ return image.shape[:2]
+
+ def decode_image(self, image_data):
+ """Decodes the image data string.
+
+ Args:
+ image_data: string of image data.
+
+ Returns:
+ Decoded image data.
+
+ Raises:
+ ValueError: Value of image channels not supported.
+ """
+ image = self._session.run(self._decode,
+ feed_dict={self._decode_data: image_data})
+ if len(image.shape) != 3 or image.shape[2] not in (1, 3):
+ raise ValueError('The number of image channels is not supported.')
+
+ return image
+
+
+def _int64_list_feature(values):
+ """Returns a TF-Feature of int64_list.
+
+ Args:
+ values: A scalar or list of values.
+
+ Returns:
+ A TF-Feature.
+ """
+ if not isinstance(values, collections.Iterable):
+ values = [values]
+
+ return tf.train.Feature(int64_list=tf.train.Int64List(value=values))
+
+
+def _bytes_list_feature(values):
+ """Returns a TF-Feature of bytes.
+
+ Args:
+ values: A string.
+
+ Returns:
+ A TF-Feature.
+ """
+ def norm2bytes(value):
+ return value.encode() if isinstance(value, str) and six.PY3 else value
+
+ return tf.train.Feature(
+ bytes_list=tf.train.BytesList(value=[norm2bytes(values)]))
+
+
+def image_seg_to_tfexample(image_data, filename, height, width, seg_data):
+ """Converts one image/segmentation pair to tf example.
+
+ Args:
+ image_data: string of image data.
+ filename: image filename.
+ height: image height.
+ width: image width.
+ seg_data: string of semantic segmentation data.
+
+ Returns:
+ tf example of one image/segmentation pair.
+ """
+ return tf.train.Example(features=tf.train.Features(feature={
+ 'image/encoded': _bytes_list_feature(image_data),
+ 'image/filename': _bytes_list_feature(filename),
+ 'image/format': _bytes_list_feature(
+ _IMAGE_FORMAT_MAP[FLAGS.image_format]),
+ 'image/height': _int64_list_feature(height),
+ 'image/width': _int64_list_feature(width),
+ 'image/channels': _int64_list_feature(3),
+ 'image/segmentation/class/encoded': (
+ _bytes_list_feature(seg_data)),
+ 'image/segmentation/class/format': _bytes_list_feature(
+ FLAGS.label_format),
+ }))
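+
+# A minimal usage sketch (not part of the original module), assuming the flags
+# above have been parsed (e.g., inside a script run via tf.app.run()) and using
+# hypothetical file paths:
+#
+#   image_reader = ImageReader('jpeg', channels=3)
+#   label_reader = ImageReader('png', channels=1)
+#   image_data = tf.gfile.GFile('/path/to/image.jpg', 'rb').read()
+#   seg_data = tf.gfile.GFile('/path/to/label.png', 'rb').read()
+#   height, width = image_reader.read_image_dims(image_data)
+#   seg_height, seg_width = label_reader.read_image_dims(seg_data)
+#   example = image_seg_to_tfexample(
+#       image_data, 'image', height, width, seg_data)
+#   with tf.python_io.TFRecordWriter('/path/to/output.tfrecord') as writer:
+#     writer.write(example.SerializeToString())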
diff --git a/deeplab/models/research/deeplab/datasets/build_new_pqr_data.py b/deeplab/models/research/deeplab/datasets/build_new_pqr_data.py
new file mode 100644
index 0000000..defeb35
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/build_new_pqr_data.py
@@ -0,0 +1,102 @@
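+"""Converts the custom PQR dataset to TFRecord file format with Example protos.
+
+This script follows the same structure as build_voc2012_data.py: it reads the
+split lists (*.txt) under FLAGS.list_folder, pairs each listed name with an
+image in FLAGS.image_folder and a raw annotation in
+FLAGS.semantic_segmentation_folder, and writes sharded TFRecord files to
+FLAGS.output_dir.
+"""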
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import math
+import os
+import random
+import sys
+import build_data
+from six.moves import range
+import tensorflow as tf
+
+FLAGS = tf.app.flags.FLAGS
+
+cwd = os.getcwd()
+
+tf.app.flags.DEFINE_string('image_folder',
+ './PQR/JPEGImages',
+ 'Folder containing images.')
+
+tf.app.flags.DEFINE_string(
+ 'semantic_segmentation_folder',
+ './PQR/SegmentationClassRaw',
+ 'Folder containing semantic segmentation annotations.')
+
+tf.app.flags.DEFINE_string(
+ 'list_folder',
+ './PQR/ImageSets',
+ 'Folder containing lists for training and validation')
+
+tf.app.flags.DEFINE_string(
+ 'output_dir',
+ './PQR/tfrecord',
+ 'Path to save converted SSTable of TensorFlow examples.')
+
+_NUM_SHARDS = 4
+
+
+def _convert_dataset(dataset_split):
+ """Converts the specified dataset split to TFRecord format.
+
+ Args:
+ dataset_split: The dataset split (e.g., train, test).
+
+ Raises:
+ RuntimeError: If loaded image and label have different shape.
+ """
+ dataset = os.path.basename(dataset_split)[:-4]
+ sys.stdout.write('Processing ' + dataset)
+ filenames = [x.strip('\n') for x in open(dataset_split, 'r')]
+ num_images = len(filenames)
+ num_per_shard = int(math.ceil(num_images / _NUM_SHARDS))
+
+ image_reader = build_data.ImageReader('jpeg', channels=3)
+ label_reader = build_data.ImageReader('png', channels=1)
+
+ for shard_id in range(_NUM_SHARDS):
+ output_filename = os.path.join(
+ FLAGS.output_dir,
+ '%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, _NUM_SHARDS))
+ with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
+ start_idx = shard_id * num_per_shard
+ end_idx = min((shard_id + 1) * num_per_shard, num_images)
+ for i in range(start_idx, end_idx):
+ sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
+ i + 1, len(filenames), shard_id))
+ sys.stdout.flush()
+ # Read the image.
+ sys.stdout.write(FLAGS.image_folder)
+ image_filename = os.path.join(
+ FLAGS.image_folder, filenames[i] + '.' + FLAGS.image_format)
+ image_data = tf.gfile.GFile(image_filename, 'rb').read()
+ height, width = image_reader.read_image_dims(image_data)
+ # Read the semantic segmentation annotation.
+ seg_filename = os.path.join(
+ FLAGS.semantic_segmentation_folder,
+ filenames[i] + '.' + FLAGS.label_format)
+ seg_data = tf.gfile.GFile(seg_filename, 'rb').read()
+ seg_height, seg_width = label_reader.read_image_dims(seg_data)
+ if height != seg_height or width != seg_width:
+ raise RuntimeError('Shape mismatched between image and label.')
+ # Convert to tf example.
+ example = build_data.image_seg_to_tfexample(
+ image_data, filenames[i], height, width, seg_data)
+ tfrecord_writer.write(example.SerializeToString())
+ sys.stdout.write('\n')
+ sys.stdout.flush()
+
+
+# def main(unused_argv):
+# tf.gfile.MakeDirs(FLAGS.output_dir)
+# _convert_dataset(
+# 'train', FLAGS.train_image_folder, FLAGS.train_image_label_folder)
+# _convert_dataset('val', FLAGS.val_image_folder, FLAGS.val_image_label_folder)
+
+def main(unused_argv):
+ dataset_splits = tf.gfile.Glob(os.path.join(FLAGS.list_folder, '*.txt'))
+ for dataset_split in dataset_splits:
+ _convert_dataset(dataset_split)
+
+
+if __name__ == '__main__':
+ tf.app.run()
\ No newline at end of file
diff --git a/deeplab/models/research/deeplab/datasets/build_voc2012_data.py b/deeplab/models/research/deeplab/datasets/build_voc2012_data.py
new file mode 100644
index 0000000..f0bdecb
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/build_voc2012_data.py
@@ -0,0 +1,146 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Converts PASCAL VOC 2012 data to TFRecord file format with Example protos.
+
+PASCAL VOC 2012 dataset is expected to have the following directory structure:
+
+ + pascal_voc_seg
+ - build_data.py
+ - build_voc2012_data.py (current working directory).
+ + VOCdevkit
+ + VOC2012
+ + JPEGImages
+ + SegmentationClass
+ + ImageSets
+ + Segmentation
+ + tfrecord
+
+Image folder:
+ ./VOCdevkit/VOC2012/JPEGImages
+
+Semantic segmentation annotations:
+ ./VOCdevkit/VOC2012/SegmentationClass
+
+list folder:
+ ./VOCdevkit/VOC2012/ImageSets/Segmentation
+
+This script converts the data into sharded TFRecord files and saves them in the tfrecord folder.
+
+The Example proto contains the following fields:
+
+ image/encoded: encoded image content.
+ image/filename: image filename.
+ image/format: image file format.
+ image/height: image height.
+ image/width: image width.
+ image/channels: image channels.
+ image/segmentation/class/encoded: encoded semantic segmentation content.
+ image/segmentation/class/format: semantic segmentation file format.
+"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import math
+import os.path
+import sys
+import build_data
+from six.moves import range
+import tensorflow as tf
+
+FLAGS = tf.app.flags.FLAGS
+
+tf.app.flags.DEFINE_string('image_folder',
+ './VOCdevkit/VOC2012/JPEGImages',
+ 'Folder containing images.')
+
+tf.app.flags.DEFINE_string(
+ 'semantic_segmentation_folder',
+ './VOCdevkit/VOC2012/SegmentationClassRaw',
+ 'Folder containing semantic segmentation annotations.')
+
+tf.app.flags.DEFINE_string(
+ 'list_folder',
+ './VOCdevkit/VOC2012/ImageSets/Segmentation',
+ 'Folder containing lists for training and validation')
+
+tf.app.flags.DEFINE_string(
+ 'output_dir',
+ './tfrecord',
+ 'Path to save converted SSTable of TensorFlow examples.')
+
+
+_NUM_SHARDS = 4
+
+
+def _convert_dataset(dataset_split):
+ """Converts the specified dataset split to TFRecord format.
+
+ Args:
+ dataset_split: The dataset split (e.g., train, test).
+
+ Raises:
+ RuntimeError: If loaded image and label have different shape.
+ """
+ dataset = os.path.basename(dataset_split)[:-4]
+ sys.stdout.write('Processing ' + dataset)
+ filenames = [x.strip('\n') for x in open(dataset_split, 'r')]
+ num_images = len(filenames)
+ num_per_shard = int(math.ceil(num_images / _NUM_SHARDS))
+
+ image_reader = build_data.ImageReader('jpeg', channels=3)
+ label_reader = build_data.ImageReader('png', channels=1)
+
+ for shard_id in range(_NUM_SHARDS):
+ output_filename = os.path.join(
+ FLAGS.output_dir,
+ '%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, _NUM_SHARDS))
+ with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
+ start_idx = shard_id * num_per_shard
+ end_idx = min((shard_id + 1) * num_per_shard, num_images)
+ for i in range(start_idx, end_idx):
+ sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
+ i + 1, len(filenames), shard_id))
+ sys.stdout.flush()
+ # Read the image.
+ image_filename = os.path.join(
+ FLAGS.image_folder, filenames[i] + '.' + FLAGS.image_format)
+ image_data = tf.gfile.GFile(image_filename, 'rb').read()
+ height, width = image_reader.read_image_dims(image_data)
+ # Read the semantic segmentation annotation.
+ seg_filename = os.path.join(
+ FLAGS.semantic_segmentation_folder,
+ filenames[i] + '.' + FLAGS.label_format)
+ seg_data = tf.gfile.GFile(seg_filename, 'rb').read()
+ seg_height, seg_width = label_reader.read_image_dims(seg_data)
+ if height != seg_height or width != seg_width:
+ raise RuntimeError('Shape mismatched between image and label.')
+ # Convert to tf example.
+ example = build_data.image_seg_to_tfexample(
+ image_data, filenames[i], height, width, seg_data)
+ tfrecord_writer.write(example.SerializeToString())
+ sys.stdout.write('\n')
+ sys.stdout.flush()
+
+
+def main(unused_argv):
+ dataset_splits = tf.gfile.Glob(os.path.join(FLAGS.list_folder, '*.txt'))
+ for dataset_split in dataset_splits:
+ _convert_dataset(dataset_split)
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/deeplab/models/research/deeplab/datasets/convert_cityscapes.sh b/deeplab/models/research/deeplab/datasets/convert_cityscapes.sh
new file mode 100644
index 0000000..ddc39fb
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/convert_cityscapes.sh
@@ -0,0 +1,60 @@
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# Script to preprocess the Cityscapes dataset. Note that (1) the users should
+# register at the Cityscapes dataset website
+# (https://www.cityscapes-dataset.com/downloads/) to download the dataset,
+# and (2) the users should download the utility scripts provided by
+# Cityscapes at https://github.com/mcordts/cityscapesScripts.
+#
+# Usage:
+# bash ./convert_cityscapes.sh
+#
+# The folder structure is assumed to be:
+# + datasets
+# - build_cityscapes_data.py
+# - convert_cityscapes.sh
+# + cityscapes
+# + cityscapesscripts (downloaded scripts)
+# + gtFine
+# + leftImg8bit
+#
+
+# Exit immediately if a command exits with a non-zero status.
+set -e
+
+CURRENT_DIR=$(pwd)
+WORK_DIR="."
+
+# Root path for Cityscapes dataset.
+CITYSCAPES_ROOT="${WORK_DIR}/cityscapes"
+
+export PYTHONPATH="${CITYSCAPES_ROOT}:${PYTHONPATH}"
+
+# Create training labels.
+python "${CITYSCAPES_ROOT}/cityscapesscripts/preparation/createTrainIdLabelImgs.py"
+
+# Build TFRecords of the dataset.
+# First, create output directory for storing TFRecords.
+OUTPUT_DIR="${CITYSCAPES_ROOT}/tfrecord"
+mkdir -p "${OUTPUT_DIR}"
+
+BUILD_SCRIPT="${CURRENT_DIR}/build_cityscapes_data.py"
+
+echo "Converting Cityscapes dataset..."
+python "${BUILD_SCRIPT}" \
+ --cityscapes_root="${CITYSCAPES_ROOT}" \
+ --output_dir="${OUTPUT_DIR}" \
diff --git a/deeplab/models/research/deeplab/datasets/convert_pqr.sh b/deeplab/models/research/deeplab/datasets/convert_pqr.sh
new file mode 100644
index 0000000..845865d
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/convert_pqr.sh
@@ -0,0 +1,30 @@
+CURRENT_DIR=$(pwd)
+# WORK_DIR="./PQR"
+WORK_DIR = "${CURRENT_DIR}/PQR"
+PQR_ROOT="${WORK_DIR}/dataset"
+SEG_FOLDER="${PQR_ROOT}/SegmentationClass"
+SEMANTIC_SEG_FOLDER="${PQR_ROOT}/SegmentationClassRaw"
+
+echo "Removing the color map in ground truth annotations..."
+python3.7 remove_gt_colormap.py \
+ --original_gt_folder="${SEG_FOLDER}" \
+ --output_dir="${SEMANTIC_SEG_FOLDER}"
+
+# Build TFRecords of the dataset.
+OUTPUT_DIR="${WORK_DIR}/tfrecord"
+mkdir -p "${OUTPUT_DIR}"
+
+# IMAGE_FOLDER="${PQR_ROOT}/JPEGImages"
+IMAGE_FOLDER="/Users/mandywoo/Documents/UAV-Forge/image-proc_2020-21/models/research/deeplab/datasets/PQR/dataset/JPEGImages"
+LIST_FOLDER="${PQR_ROOT}/ImageSets"
+
+echo ${IMAGE_FOLDER}
+
+echo "Converting PQR dataset..."
+# python3.7 ./build_data.py
+python3.7 ./build_new_pqr_data.py \
+ --image_folder="${IMAGE_FOLDER}" \
+ --semantic_segmentation_folder="${SEMANTIC_SEG_FOLDER}" \
+ --list_folder="${LIST_FOLDER}" \
+ --image_format="jpg" \
+ --output_dir="${OUTPUT_DIR}"
diff --git a/deeplab/models/research/deeplab/datasets/data_generator.py b/deeplab/models/research/deeplab/datasets/data_generator.py
new file mode 100644
index 0000000..6cc230a
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/data_generator.py
@@ -0,0 +1,361 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Wrapper for providing semantic segmentaion data.
+
+The SegmentationDataset class provides both images and annotations (semantic
+segmentation and/or instance segmentation) for TensorFlow. Currently, we
+support the following datasets:
+
+1. PASCAL VOC 2012 (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/).
+
+PASCAL VOC 2012 semantic segmentation dataset annotates 20 foreground objects
+(e.g., bike, person, and so on) and leaves all the other semantic classes as
+one background class. The dataset contains 1464, 1449, and 1456 annotated
+images for the training, validation, and test sets, respectively.
+
+2. Cityscapes dataset (https://www.cityscapes-dataset.com)
+
+The Cityscapes dataset contains 19 semantic labels (such as road, person, car,
+and so on) for urban street scenes.
+
+3. ADE20K dataset (http://groups.csail.mit.edu/vision/datasets/ADE20K)
+
+The ADE20K dataset contains 150 semantic labels covering both urban street
+scenes and indoor scenes.
+
+References:
+ M. Everingham, S. M. A. Eslami, L. V. Gool, C. K. I. Williams, J. Winn,
+ and A. Zisserman, The pascal visual object classes challenge a retrospective.
+ IJCV, 2014.
+
+ M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson,
+ U. Franke, S. Roth, and B. Schiele, "The cityscapes dataset for semantic urban
+ scene understanding," In Proc. of CVPR, 2016.
+
+ B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, A. Torralba, "Scene Parsing
+ through ADE20K dataset", In Proc. of CVPR, 2017.
+"""
+
+import collections
+import os
+import tensorflow as tf
+from deeplab import common
+from deeplab import input_preprocess
+
+# Named tuple to describe the dataset properties.
+DatasetDescriptor = collections.namedtuple(
+ 'DatasetDescriptor',
+ [
+ 'splits_to_sizes', # Splits of the dataset into training, val and test.
+ 'num_classes', # Number of semantic classes, including the
+ # background class (if exists). For example, there
+ # are 20 foreground classes + 1 background class in
+ # the PASCAL VOC 2012 dataset. Thus, we set
+ # num_classes=21.
+ 'ignore_label', # Ignore label value.
+ ])
+
+_CITYSCAPES_INFORMATION = DatasetDescriptor(
+ splits_to_sizes={'train_fine': 2975,
+ 'train_coarse': 22973,
+ 'trainval_fine': 3475,
+ 'trainval_coarse': 23473,
+ 'val_fine': 500,
+ 'test_fine': 1525},
+ num_classes=19,
+ ignore_label=255,
+)
+
+_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
+ splits_to_sizes={
+ 'train': 1464,
+ 'train_aug': 10582,
+ 'trainval': 2913,
+ 'val': 1449,
+ },
+ num_classes=21,
+ ignore_label=255,
+)
+
+_ADE20K_INFORMATION = DatasetDescriptor(
+ splits_to_sizes={
+ 'train': 20210, # num of samples in images/training
+ 'val': 2000, # num of samples in images/validation
+ },
+ num_classes=151,
+ ignore_label=0,
+)
+
+_PQR_SEG_INFORMATION = DatasetDescriptor(
+ splits_to_sizes={
+ 'train': 3, # Number of files in the train folder.
+ 'trainval': 5,
+ 'val': 2,
+ },
+ num_classes=2, # Number of classes in the dataset.
+ ignore_label=255, # White edges that are ignored (not treated as a class).
+)
+
+_DATASETS_INFORMATION = {
+ 'cityscapes': _CITYSCAPES_INFORMATION,
+ 'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
+ 'ade20k': _ADE20K_INFORMATION,
+ 'pqr': _PQR_SEG_INFORMATION
+}
+
+# Default file pattern of TFRecord of TensorFlow Example.
+_FILE_PATTERN = '%s-*'
+
+
+def get_cityscapes_dataset_name():
+ return 'cityscapes'
+
+
+class Dataset(object):
+ """Represents input dataset for deeplab model."""
+
+ def __init__(self,
+ dataset_name,
+ split_name,
+ dataset_dir,
+ batch_size,
+ crop_size,
+ min_resize_value=None,
+ max_resize_value=None,
+ resize_factor=None,
+ min_scale_factor=1.,
+ max_scale_factor=1.,
+ scale_factor_step_size=0,
+ model_variant=None,
+ num_readers=1,
+ is_training=False,
+ should_shuffle=False,
+ should_repeat=False):
+ """Initializes the dataset.
+
+ Args:
+ dataset_name: Dataset name.
+ split_name: A train/val Split name.
+ dataset_dir: The directory of the dataset sources.
+ batch_size: Batch size.
+ crop_size: The size used to crop the image and label.
+ min_resize_value: Desired size of the smaller image side.
+ max_resize_value: Maximum allowed size of the larger image side.
+ resize_factor: Resized dimensions are multiple of factor plus one.
+ min_scale_factor: Minimum scale factor value.
+ max_scale_factor: Maximum scale factor value.
+ scale_factor_step_size: The step size from min scale factor to max scale
+ factor. The input is randomly scaled based on the value of
+ (min_scale_factor, max_scale_factor, scale_factor_step_size).
+ model_variant: Model variant (string) for choosing how to mean-subtract
+ the images. See feature_extractor.network_map for supported model
+ variants.
+ num_readers: Number of readers for data provider.
+ is_training: Boolean, whether the dataset is used for training.
+ should_shuffle: Boolean, whether to shuffle the input data.
+ should_repeat: Boolean, whether to repeat the input data.
+
+ Raises:
+ ValueError: Dataset name and split name are not supported.
+ """
+ if dataset_name not in _DATASETS_INFORMATION:
+ raise ValueError('The specified dataset is not supported yet.')
+ self.dataset_name = dataset_name
+
+ splits_to_sizes = _DATASETS_INFORMATION[dataset_name].splits_to_sizes
+
+ if split_name not in splits_to_sizes:
+ raise ValueError('data split name %s not recognized' % split_name)
+
+ if model_variant is None:
+ tf.logging.warning('Please specify a model_variant. See '
+ 'feature_extractor.network_map for supported model '
+ 'variants.')
+
+ self.split_name = split_name
+ self.dataset_dir = dataset_dir
+ self.batch_size = batch_size
+ self.crop_size = crop_size
+ self.min_resize_value = min_resize_value
+ self.max_resize_value = max_resize_value
+ self.resize_factor = resize_factor
+ self.min_scale_factor = min_scale_factor
+ self.max_scale_factor = max_scale_factor
+ self.scale_factor_step_size = scale_factor_step_size
+ self.model_variant = model_variant
+ self.num_readers = num_readers
+ self.is_training = is_training
+ self.should_shuffle = should_shuffle
+ self.should_repeat = should_repeat
+
+ self.num_of_classes = _DATASETS_INFORMATION[self.dataset_name].num_classes
+ self.ignore_label = _DATASETS_INFORMATION[self.dataset_name].ignore_label
+
+ def _parse_function(self, example_proto):
+ """Function to parse the example proto.
+
+ Args:
+ example_proto: Proto in the format of tf.Example.
+
+ Returns:
+ A dictionary with parsed image, label, height, width and image name.
+
+ Raises:
+ ValueError: Label is of wrong shape.
+ """
+
+ # Currently only supports jpeg and png.
+ # Need to use this logic because the shape is not known for
+ # tf.image.decode_image and we rely on this info to
+ # extend label if necessary.
+ def _decode_image(content, channels):
+ return tf.cond(
+ tf.image.is_jpeg(content),
+ lambda: tf.image.decode_jpeg(content, channels),
+ lambda: tf.image.decode_png(content, channels))
+
+ features = {
+ 'image/encoded':
+ tf.FixedLenFeature((), tf.string, default_value=''),
+ 'image/filename':
+ tf.FixedLenFeature((), tf.string, default_value=''),
+ 'image/format':
+ tf.FixedLenFeature((), tf.string, default_value='jpeg'),
+ 'image/height':
+ tf.FixedLenFeature((), tf.int64, default_value=0),
+ 'image/width':
+ tf.FixedLenFeature((), tf.int64, default_value=0),
+ 'image/segmentation/class/encoded':
+ tf.FixedLenFeature((), tf.string, default_value=''),
+ 'image/segmentation/class/format':
+ tf.FixedLenFeature((), tf.string, default_value='png'),
+ }
+
+ parsed_features = tf.parse_single_example(example_proto, features)
+
+ image = _decode_image(parsed_features['image/encoded'], channels=3)
+
+ label = None
+ if self.split_name != common.TEST_SET:
+ label = _decode_image(
+ parsed_features['image/segmentation/class/encoded'], channels=1)
+
+ image_name = parsed_features['image/filename']
+ if image_name is None:
+ image_name = tf.constant('')
+
+ sample = {
+ common.IMAGE: image,
+ common.IMAGE_NAME: image_name,
+ common.HEIGHT: parsed_features['image/height'],
+ common.WIDTH: parsed_features['image/width'],
+ }
+
+ if label is not None:
+ if label.get_shape().ndims == 2:
+ label = tf.expand_dims(label, 2)
+ elif label.get_shape().ndims == 3 and label.shape.dims[2] == 1:
+ pass
+ else:
+ raise ValueError('Input label shape must be [height, width], or '
+ '[height, width, 1].')
+
+ label.set_shape([None, None, 1])
+
+ sample[common.LABELS_CLASS] = label
+
+ return sample
+
+ def _preprocess_image(self, sample):
+ """Preprocesses the image and label.
+
+ Args:
+ sample: A sample containing image and label.
+
+ Returns:
+ sample: Sample with preprocessed image and label.
+
+ Raises:
+ ValueError: Ground truth label not provided during training.
+ """
+ image = sample[common.IMAGE]
+ label = sample[common.LABELS_CLASS]
+
+ original_image, image, label = input_preprocess.preprocess_image_and_label(
+ image=image,
+ label=label,
+ crop_height=self.crop_size[0],
+ crop_width=self.crop_size[1],
+ min_resize_value=self.min_resize_value,
+ max_resize_value=self.max_resize_value,
+ resize_factor=self.resize_factor,
+ min_scale_factor=self.min_scale_factor,
+ max_scale_factor=self.max_scale_factor,
+ scale_factor_step_size=self.scale_factor_step_size,
+ ignore_label=self.ignore_label,
+ is_training=self.is_training,
+ model_variant=self.model_variant)
+
+ sample[common.IMAGE] = image
+
+ if not self.is_training:
+ # Original image is only used during visualization.
+ sample[common.ORIGINAL_IMAGE] = original_image
+
+ if label is not None:
+ sample[common.LABEL] = label
+
+ # Remove the common.LABELS_CLASS key from the sample since it is only used
+ # to derive the label and is not used in training and evaluation.
+ sample.pop(common.LABELS_CLASS, None)
+
+ return sample
+
+ def get_one_shot_iterator(self):
+ """Gets an iterator that iterates across the dataset once.
+
+ Returns:
+ An iterator of type tf.data.Iterator.
+ """
+
+ files = self._get_all_files()
+
+ dataset = (
+ tf.data.TFRecordDataset(files, num_parallel_reads=self.num_readers)
+ .map(self._parse_function, num_parallel_calls=self.num_readers)
+ .map(self._preprocess_image, num_parallel_calls=self.num_readers))
+
+ if self.should_shuffle:
+ dataset = dataset.shuffle(buffer_size=100)
+
+ if self.should_repeat:
+ dataset = dataset.repeat() # Repeat forever for training.
+ else:
+ dataset = dataset.repeat(1)
+
+ dataset = dataset.batch(self.batch_size).prefetch(self.batch_size)
+ return dataset.make_one_shot_iterator()
+
+ def _get_all_files(self):
+ """Gets all the files to read data from.
+
+ Returns:
+ A list of input files.
+ """
+ file_pattern = _FILE_PATTERN
+ file_pattern = os.path.join(self.dataset_dir,
+ file_pattern % self.split_name)
+ return tf.gfile.Glob(file_pattern)
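+
+# A minimal usage sketch (not part of the original module); the dataset_dir
+# below is hypothetical and should point at a folder of TFRecords produced by
+# the conversion scripts in this directory:
+#
+#   dataset = Dataset(
+#       dataset_name='pqr',
+#       split_name='train',
+#       dataset_dir='./PQR/tfrecord',
+#       batch_size=1,
+#       crop_size=[513, 513],
+#       model_variant='xception_65',
+#       is_training=True,
+#       should_shuffle=True,
+#       should_repeat=True)
+#   samples = dataset.get_one_shot_iterator().get_next()
+#   # samples is a dict keyed by common.IMAGE, common.LABEL, etc.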
diff --git a/deeplab/models/research/deeplab/datasets/data_generator_test.py b/deeplab/models/research/deeplab/datasets/data_generator_test.py
new file mode 100644
index 0000000..f4425d0
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/data_generator_test.py
@@ -0,0 +1,115 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for deeplab.datasets.data_generator."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import collections
+
+from six.moves import range
+import tensorflow as tf
+
+from deeplab import common
+from deeplab.datasets import data_generator
+
+ImageAttributes = collections.namedtuple(
+ 'ImageAttributes', ['image', 'label', 'height', 'width', 'image_name'])
+
+
+class DatasetTest(tf.test.TestCase):
+
+ # Note: the training dataset cannot be tested directly since it involves a
+ # shuffle operation. With shuffling disabled, the training dataset behaves
+ # the same as the validation dataset, so it is not tested separately.
+ def testPascalVocSegTestData(self):
+ dataset = data_generator.Dataset(
+ dataset_name='pascal_voc_seg',
+ split_name='val',
+ dataset_dir=
+ 'deeplab/testing/pascal_voc_seg',
+ batch_size=1,
+ crop_size=[3, 3], # Use small size for testing.
+ min_resize_value=3,
+ max_resize_value=3,
+ resize_factor=None,
+ min_scale_factor=0.01,
+ max_scale_factor=2.0,
+ scale_factor_step_size=0.25,
+ is_training=False,
+ model_variant='mobilenet_v2')
+
+ self.assertAllEqual(dataset.num_of_classes, 21)
+ self.assertAllEqual(dataset.ignore_label, 255)
+
+ num_of_images = 3
+ with self.test_session() as sess:
+ iterator = dataset.get_one_shot_iterator()
+
+ for i in range(num_of_images):
+ batch = iterator.get_next()
+ batch, = sess.run([batch])
+ image_attributes = _get_attributes_of_image(i)
+ self.assertEqual(batch[common.HEIGHT][0], image_attributes.height)
+ self.assertEqual(batch[common.WIDTH][0], image_attributes.width)
+ self.assertEqual(batch[common.IMAGE_NAME][0],
+ image_attributes.image_name.encode())
+
+ # All data have been read.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError, ''):
+ sess.run([iterator.get_next()])
+
+
+def _get_attributes_of_image(index):
+ """Gets the attributes of the image.
+
+ Args:
+ index: Index of image in all images.
+
+ Returns:
+ Attributes of the image in the format of ImageAttributes.
+
+ Raises:
+ ValueError: If index is of wrong value.
+ """
+ if index == 0:
+ return ImageAttributes(
+ image=None,
+ label=None,
+ height=366,
+ width=500,
+ image_name='2007_000033')
+ elif index == 1:
+ return ImageAttributes(
+ image=None,
+ label=None,
+ height=335,
+ width=500,
+ image_name='2007_000042')
+ elif index == 2:
+ return ImageAttributes(
+ image=None,
+ label=None,
+ height=333,
+ width=500,
+ image_name='2007_000061')
+ else:
+ raise ValueError('Index can only be 0, 1 or 2.')
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/datasets/download_and_convert_ade20k.sh b/deeplab/models/research/deeplab/datasets/download_and_convert_ade20k.sh
new file mode 100644
index 0000000..3614ae4
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/download_and_convert_ade20k.sh
@@ -0,0 +1,80 @@
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# Script to download and preprocess the ADE20K dataset.
+#
+# Usage:
+# bash ./download_and_convert_ade20k.sh
+#
+# The folder structure is assumed to be:
+# + datasets
+# - build_data.py
+# - build_ade20k_data.py
+# - download_and_convert_ade20k.sh
+# + ADE20K
+# + tfrecord
+# + ADEChallengeData2016
+# + annotations
+# + training
+# + validation
+# + images
+# + training
+# + validation
+
+# Exit immediately if a command exits with a non-zero status.
+set -e
+
+CURRENT_DIR=$(pwd)
+WORK_DIR="./ADE20K"
+mkdir -p "${WORK_DIR}"
+cd "${WORK_DIR}"
+
+# Helper function to download and unpack ADE20K dataset.
+download_and_uncompress() {
+ local BASE_URL=${1}
+ local FILENAME=${2}
+
+ if [ ! -f "${FILENAME}" ]; then
+ echo "Downloading ${FILENAME} to ${WORK_DIR}"
+ wget -nd -c "${BASE_URL}/${FILENAME}"
+ fi
+ echo "Uncompressing ${FILENAME}"
+ unzip "${FILENAME}"
+}
+
+# Download the images.
+BASE_URL="http://data.csail.mit.edu/places/ADEchallenge"
+FILENAME="ADEChallengeData2016.zip"
+
+download_and_uncompress "${BASE_URL}" "${FILENAME}"
+
+cd "${CURRENT_DIR}"
+
+# Root path for ADE20K dataset.
+ADE20K_ROOT="${WORK_DIR}/ADEChallengeData2016"
+
+# Build TFRecords of the dataset.
+# First, create output directory for storing TFRecords.
+OUTPUT_DIR="${WORK_DIR}/tfrecord"
+mkdir -p "${OUTPUT_DIR}"
+
+echo "Converting ADE20K dataset..."
+python ./build_ade20k_data.py \
+ --train_image_folder="${ADE20K_ROOT}/images/training/" \
+ --train_image_label_folder="${ADE20K_ROOT}/annotations/training/" \
+ --val_image_folder="${ADE20K_ROOT}/images/validation/" \
+ --val_image_label_folder="${ADE20K_ROOT}/annotations/validation/" \
+ --output_dir="${OUTPUT_DIR}"
diff --git a/deeplab/models/research/deeplab/datasets/download_and_convert_voc2012.sh b/deeplab/models/research/deeplab/datasets/download_and_convert_voc2012.sh
new file mode 100644
index 0000000..607c654
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/download_and_convert_voc2012.sh
@@ -0,0 +1,94 @@
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# Script to download and preprocess the PASCAL VOC 2012 dataset.
+#
+# Usage:
+# bash ./download_and_convert_voc2012.sh
+#
+# The folder structure is assumed to be:
+# + datasets
+# - build_data.py
+# - build_voc2012_data.py
+# - download_and_convert_voc2012.sh
+# - remove_gt_colormap.py
+# + pascal_voc_seg
+# + VOCdevkit
+# + VOC2012
+# + JPEGImages
+# + SegmentationClass
+#
+
+# Exit immediately if a command exits with a non-zero status.
+set -e
+
+CURRENT_DIR=$(pwd)
+WORK_DIR="./pascal_voc_seg"
+SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+mkdir -p "${WORK_DIR}"
+cd "${WORK_DIR}"
+
+# Helper function to download and unpack VOC 2012 dataset.
+download_and_uncompress() {
+ local BASE_URL=${1}
+ local FILENAME=${2}
+
+ if [ ! -f "${FILENAME}" ]; then
+ echo "Downloading ${FILENAME} to ${WORK_DIR}"
+ wget -nd -c "${BASE_URL}/${FILENAME}"
+ fi
+ echo "Uncompressing ${FILENAME}"
+ sudo apt install unzip
+ unzip "${FILENAME}"
+}
+
+# Download the images.
+BASE_URL="https://data.deepai.org/"
+FILENAME="PascalVOC2012.zip"
+
+# download_and_uncompress "${BASE_URL}" "${FILENAME}"
+# wget "${BASE_URL}" "${FILENAME}"
+
+cd "${CURRENT_DIR}"
+
+# Root path for PASCAL VOC 2012 dataset.
+PASCAL_ROOT="${WORK_DIR}/VOC2012"
+
+# Remove the colormap in the ground truth annotations.
+SEG_FOLDER="${PASCAL_ROOT}/SegmentationClass"
+SEMANTIC_SEG_FOLDER="${PASCAL_ROOT}/SegmentationClassRaw"
+
+echo "Removing the color map in ground truth annotations..."
+python3 "${SCRIPT_DIR}/remove_gt_colormap.py" \
+ --original_gt_folder="${SEG_FOLDER}" \
+ --output_dir="${SEMANTIC_SEG_FOLDER}"
+
+# Build TFRecords of the dataset.
+# First, create output directory for storing TFRecords.
+OUTPUT_DIR="${WORK_DIR}/tfrecord"
+mkdir -p "${OUTPUT_DIR}"
+
+IMAGE_FOLDER="${PASCAL_ROOT}/JPEGImages"
+LIST_FOLDER="${PASCAL_ROOT}/ImageSets/Segmentation"
+echo ${IMAGE_FOLDER}
+
+echo "Converting PASCAL VOC 2012 dataset..."
+python3 "${SCRIPT_DIR}/build_voc2012_data.py" \
+ --image_folder="${IMAGE_FOLDER}" \
+ --semantic_segmentation_folder="${SEMANTIC_SEG_FOLDER}" \
+ --list_folder="${LIST_FOLDER}" \
+ --image_format="jpg" \
+ --output_dir="${OUTPUT_DIR}"
diff --git a/deeplab/models/research/deeplab/datasets/label_pqr.py b/deeplab/models/research/deeplab/datasets/label_pqr.py
new file mode 100644
index 0000000..ec90fc6
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/label_pqr.py
@@ -0,0 +1,38 @@
+import tensorflow as tf
+from PIL import Image
+from tqdm import tqdm
+import numpy as np
+
+import os, shutil
+
+# palette (color map) describes the (R, G, B): Label pair
+palette = {(0, 0, 0) : 0 ,
+ (0, 0, 255) : 1}
+
+def convert_from_color_segmentation(arr_3d):
+ arr_2d = np.zeros((arr_3d.shape[0], arr_3d.shape[1]), dtype=np.uint8)
+
+ for c, i in palette.items():
+ m = np.all(arr_3d == np.array(c).reshape(1, 1, 3), axis=2)
+ arr_2d[m] = i
+ return arr_2d
+
+
+label_dir = './PQR/dataset/SegmentationClass/'
+new_label_dir = './PQR/dataset/SegmentationClassRaw/'
+
+if not os.path.isdir(new_label_dir):
+ print("creating folder: ",new_label_dir)
+ os.mkdir(new_label_dir)
+else:
+ print("Folder alread exists. Delete the folder and re-run the code!!!")
+
+
+label_files = os.listdir(label_dir)
+
+for l_f in tqdm(label_files):
+ arr = np.array(Image.open(label_dir + l_f))
+ arr = arr[:,:,0:3]
+ arr_2d = convert_from_color_segmentation(arr)
+ l_f = l_f[:-4] + '.png'
+ Image.fromarray(arr_2d).save(new_label_dir + l_f)
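+
+# A minimal sanity check (a sketch, not part of the original script): every
+# converted mask should only contain the label indices defined in `palette`.
+# for l_f in os.listdir(new_label_dir):
+#     values = np.unique(np.array(Image.open(new_label_dir + l_f)))
+#     assert set(values).issubset({0, 1}), (l_f, values)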
diff --git a/deeplab/models/research/deeplab/datasets/remove_gt_colormap.py b/deeplab/models/research/deeplab/datasets/remove_gt_colormap.py
new file mode 100644
index 0000000..9005700
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/remove_gt_colormap.py
@@ -0,0 +1,83 @@
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Removes the color map from segmentation annotations.
+
+Removes the color map from the ground truth segmentation annotations and saves
+the results to output_dir.
+"""
+import glob
+import os.path
+import numpy as np
+
+from PIL import Image
+
+import tensorflow as tf
+
+FLAGS = tf.compat.v1.flags.FLAGS
+
+tf.compat.v1.flags.DEFINE_string('original_gt_folder',
+ './VOCdevkit/VOC2012/SegmentationClass',
+ 'Original ground truth annotations.')
+
+tf.compat.v1.flags.DEFINE_string('segmentation_format', 'png', 'Segmentation format.')
+
+tf.compat.v1.flags.DEFINE_string('output_dir',
+ './VOCdevkit/VOC2012/SegmentationClassRaw',
+ 'folder to save modified ground truth annotations.')
+
+
+def _remove_colormap(filename):
+ """Removes the color map from the annotation.
+
+ Args:
+ filename: Ground truth annotation filename.
+
+ Returns:
+ Annotation without color map.
+ """
+ return np.array(Image.open(filename))
+
+
+def _save_annotation(annotation, filename):
+ """Saves the annotation as png file.
+
+ Args:
+ annotation: Segmentation annotation.
+ filename: Output filename.
+ """
+ pil_image = Image.fromarray(annotation.astype(dtype=np.uint8))
+ with tf.io.gfile.GFile(filename, mode='w') as f:
+ pil_image.save(f, 'PNG')
+
+
+def main(unused_argv):
+ # Create the output directory if it does not exist.
+ if not tf.io.gfile.isdir(FLAGS.output_dir):
+ tf.io.gfile.makedirs(FLAGS.output_dir)
+
+ annotations = glob.glob(os.path.join(FLAGS.original_gt_folder,
+ '*.' + FLAGS.segmentation_format))
+ for annotation in annotations:
+ raw_annotation = _remove_colormap(annotation)
+ filename = os.path.basename(annotation)[:-4]
+ _save_annotation(raw_annotation,
+ os.path.join(
+ FLAGS.output_dir,
+ filename + '.' + FLAGS.segmentation_format))
+
+
+if __name__ == '__main__':
+ tf.compat.v1.app.run()
diff --git a/deeplab/models/research/deeplab/datasets/test.py b/deeplab/models/research/deeplab/datasets/test.py
new file mode 100644
index 0000000..108b864
--- /dev/null
+++ b/deeplab/models/research/deeplab/datasets/test.py
@@ -0,0 +1,19 @@
+python3.7 ./build_new_pqr_data.py --image_folder="/Users/mandywoo/Documents/UAV-Forge/image-proc_2020-21/models/research/deeplab/datasets/PQR/dataset/JPEGImages" --semantic_segmentation_folder="/Users/mandywoo/Documents/UAV-Forge/image-proc_2020-21/models/research/deeplab/datasets/PQR/dataset/SegmentationClassRaw" --list_folder="/Users/mandywoo/Documents/UAV-Forge/image-proc_2020-21/models/research/deeplab/datasets/PQR/dataset/ImageSets" --image_format="jpg" --output_dir="/Users/mandywoo/Documents/UAV-Forge/image-proc_2020-21/models/research/deeplab/datasets/PQR/tfrecord"
+
+
+python3.7 ./export_model.py \
+ --logtostderr \
+ --checkpoint_path="/Users/mandywoo/Documents/UAV-Forge/image-proc_2020-21/models/research/deeplab/datasets/PQR/exp/train_on_trainval_set/train/model.ckpt-5" \
+ --export_path="/Users/mandywoo/Documents/UAV-Forge/image-proc_2020-21/models/research/deeplab/datasets/PQR/exp/train_on_trainval_set/export/frozen_inference_graph.pb" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --num_classes=2 \
+ --crop_size=448 \
+ --crop_size=448 \
+ --inference_scales=1.0
+
+python deeplab/export_model.py --checkpoint_path=/code/models/research/deeplab/weights_input_level_17/model.ckpt-22000 --export_path=/code/models/research/deeplab/frozen_weights_level_17/frozen_inference_graph.pb --model_variant="xception_65" --atrous_rates=6 --atrous_rates=12 --atrous_rates=18 --output_stride=16 --crop_size=2048 --crop_size=2048 --num_classes=3
\ No newline at end of file
diff --git a/deeplab/models/research/deeplab/deeplab_demo.ipynb b/deeplab/models/research/deeplab/deeplab_demo.ipynb
new file mode 100644
index 0000000..81ccfde
--- /dev/null
+++ b/deeplab/models/research/deeplab/deeplab_demo.ipynb
@@ -0,0 +1,369 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "KFPcBuVFw61h"
+ },
+ "source": [
+ "# Overview\n",
+ "\n",
+ "This colab demonstrates the steps to use the DeepLab model to perform semantic segmentation on a sample input image. Expected outputs are semantic labels overlayed on the sample image.\n",
+ "\n",
+ "### About DeepLab\n",
+ "The models used in this colab perform semantic segmentation. Semantic segmentation models focus on assigning semantic labels, such as sky, person, or car, to multiple objects and stuff in a single image."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "t3ozFsEEP-u_"
+ },
+ "source": [
+ "# Instructions\n",
+ "\u003ch3\u003e\u003ca href=\"https://cloud.google.com/tpu/\"\u003e\u003cimg valign=\"middle\" src=\"https://raw.githubusercontent.com/GoogleCloudPlatform/tensorflow-without-a-phd/master/tensorflow-rl-pong/images/tpu-hexagon.png\" width=\"50\"\u003e\u003c/a\u003e \u0026nbsp;\u0026nbsp;Use a free TPU device\u003c/h3\u003e\n",
+ "\n",
+ " 1. On the main menu, click Runtime and select **Change runtime type**. Set \"TPU\" as the hardware accelerator.\n",
+ " 1. Click Runtime again and select **Runtime \u003e Run All**. You can also run the cells manually with Shift-ENTER."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "7cRiapZ1P3wy"
+ },
+ "source": [
+ "## Import Libraries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 0,
+ "metadata": {
+ "cellView": "code",
+ "colab": {},
+ "colab_type": "code",
+ "id": "kAbdmRmvq0Je"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from io import BytesIO\n",
+ "import tarfile\n",
+ "import tempfile\n",
+ "from six.moves import urllib\n",
+ "\n",
+ "from matplotlib import gridspec\n",
+ "from matplotlib import pyplot as plt\n",
+ "import numpy as np\n",
+ "from PIL import Image\n",
+ "\n",
+ "%tensorflow_version 1.x\n",
+ "import tensorflow as tf"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "p47cYGGOQE1W"
+ },
+ "source": [
+ "## Import helper methods\n",
+ "These methods help us perform the following tasks:\n",
+ "* Load the latest version of the pretrained DeepLab model\n",
+ "* Load the colormap from the PASCAL VOC dataset\n",
+ "* Adds colors to various labels, such as \"pink\" for people, \"green\" for bicycle and more\n",
+ "* Visualize an image, and add an overlay of colors on various regions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 0,
+ "metadata": {
+ "cellView": "code",
+ "colab": {},
+ "colab_type": "code",
+ "id": "vN0kU6NJ1Ye5"
+ },
+ "outputs": [],
+ "source": [
+ "class DeepLabModel(object):\n",
+ " \"\"\"Class to load deeplab model and run inference.\"\"\"\n",
+ "\n",
+ " INPUT_TENSOR_NAME = 'ImageTensor:0'\n",
+ " OUTPUT_TENSOR_NAME = 'SemanticPredictions:0'\n",
+ " INPUT_SIZE = 513\n",
+ " FROZEN_GRAPH_NAME = 'frozen_inference_graph'\n",
+ "\n",
+ " def __init__(self, tarball_path):\n",
+ " \"\"\"Creates and loads pretrained deeplab model.\"\"\"\n",
+ " self.graph = tf.Graph()\n",
+ "\n",
+ " graph_def = None\n",
+ " # Extract frozen graph from tar archive.\n",
+ " tar_file = tarfile.open(tarball_path)\n",
+ " for tar_info in tar_file.getmembers():\n",
+ " if self.FROZEN_GRAPH_NAME in os.path.basename(tar_info.name):\n",
+ " file_handle = tar_file.extractfile(tar_info)\n",
+ " graph_def = tf.GraphDef.FromString(file_handle.read())\n",
+ " break\n",
+ "\n",
+ " tar_file.close()\n",
+ "\n",
+ " if graph_def is None:\n",
+ " raise RuntimeError('Cannot find inference graph in tar archive.')\n",
+ "\n",
+ " with self.graph.as_default():\n",
+ " tf.import_graph_def(graph_def, name='')\n",
+ "\n",
+ " self.sess = tf.Session(graph=self.graph)\n",
+ "\n",
+ " def run(self, image):\n",
+ " \"\"\"Runs inference on a single image.\n",
+ "\n",
+ " Args:\n",
+ " image: A PIL.Image object, raw input image.\n",
+ "\n",
+ " Returns:\n",
+ " resized_image: RGB image resized from original input image.\n",
+ " seg_map: Segmentation map of `resized_image`.\n",
+ " \"\"\"\n",
+ " width, height = image.size\n",
+ " resize_ratio = 1.0 * self.INPUT_SIZE / max(width, height)\n",
+ " target_size = (int(resize_ratio * width), int(resize_ratio * height))\n",
+ " resized_image = image.convert('RGB').resize(target_size, Image.ANTIALIAS)\n",
+ " batch_seg_map = self.sess.run(\n",
+ " self.OUTPUT_TENSOR_NAME,\n",
+ " feed_dict={self.INPUT_TENSOR_NAME: [np.asarray(resized_image)]})\n",
+ " seg_map = batch_seg_map[0]\n",
+ " return resized_image, seg_map\n",
+ "\n",
+ "\n",
+ "def create_pascal_label_colormap():\n",
+ " \"\"\"Creates a label colormap used in PASCAL VOC segmentation benchmark.\n",
+ "\n",
+ " Returns:\n",
+ " A Colormap for visualizing segmentation results.\n",
+ " \"\"\"\n",
+ " colormap = np.zeros((256, 3), dtype=int)\n",
+ " ind = np.arange(256, dtype=int)\n",
+ "\n",
+ " for shift in reversed(range(8)):\n",
+ " for channel in range(3):\n",
+ " colormap[:, channel] |= ((ind \u003e\u003e channel) \u0026 1) \u003c\u003c shift\n",
+ " ind \u003e\u003e= 3\n",
+ "\n",
+ " return colormap\n",
+ "\n",
+ "\n",
+ "def label_to_color_image(label):\n",
+ " \"\"\"Adds color defined by the dataset colormap to the label.\n",
+ "\n",
+ " Args:\n",
+ " label: A 2D array with integer type, storing the segmentation label.\n",
+ "\n",
+ " Returns:\n",
+ " result: A 2D array with floating type. The element of the array\n",
+ " is the color indexed by the corresponding element in the input label\n",
+ " to the PASCAL color map.\n",
+ "\n",
+ " Raises:\n",
+ " ValueError: If label is not of rank 2 or its value is larger than color\n",
+ " map maximum entry.\n",
+ " \"\"\"\n",
+ " if label.ndim != 2:\n",
+ " raise ValueError('Expect 2-D input label')\n",
+ "\n",
+ " colormap = create_pascal_label_colormap()\n",
+ "\n",
+ " if np.max(label) \u003e= len(colormap):\n",
+ " raise ValueError('label value too large.')\n",
+ "\n",
+ " return colormap[label]\n",
+ "\n",
+ "\n",
+ "def vis_segmentation(image, seg_map):\n",
+ " \"\"\"Visualizes input image, segmentation map and overlay view.\"\"\"\n",
+ " plt.figure(figsize=(15, 5))\n",
+ " grid_spec = gridspec.GridSpec(1, 4, width_ratios=[6, 6, 6, 1])\n",
+ "\n",
+ " plt.subplot(grid_spec[0])\n",
+ " plt.imshow(image)\n",
+ " plt.axis('off')\n",
+ " plt.title('input image')\n",
+ "\n",
+ " plt.subplot(grid_spec[1])\n",
+ " seg_image = label_to_color_image(seg_map).astype(np.uint8)\n",
+ " plt.imshow(seg_image)\n",
+ " plt.axis('off')\n",
+ " plt.title('segmentation map')\n",
+ "\n",
+ " plt.subplot(grid_spec[2])\n",
+ " plt.imshow(image)\n",
+ " plt.imshow(seg_image, alpha=0.7)\n",
+ " plt.axis('off')\n",
+ " plt.title('segmentation overlay')\n",
+ "\n",
+ " unique_labels = np.unique(seg_map)\n",
+ " ax = plt.subplot(grid_spec[3])\n",
+ " plt.imshow(\n",
+ " FULL_COLOR_MAP[unique_labels].astype(np.uint8), interpolation='nearest')\n",
+ " ax.yaxis.tick_right()\n",
+ " plt.yticks(range(len(unique_labels)), LABEL_NAMES[unique_labels])\n",
+ " plt.xticks([], [])\n",
+ " ax.tick_params(width=0.0)\n",
+ " plt.grid('off')\n",
+ " plt.show()\n",
+ "\n",
+ "\n",
+ "LABEL_NAMES = np.asarray([\n",
+ " 'background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',\n",
+ " 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',\n",
+ " 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tv'\n",
+ "])\n",
+ "\n",
+ "FULL_LABEL_MAP = np.arange(len(LABEL_NAMES)).reshape(len(LABEL_NAMES), 1)\n",
+ "FULL_COLOR_MAP = label_to_color_image(FULL_LABEL_MAP)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "nGcZzNkASG9A"
+ },
+ "source": [
+ "## Select a pretrained model\n",
+ "We have trained the DeepLab model using various backbone networks. Select one from the MODEL_NAME list."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 0,
+ "metadata": {
+ "colab": {},
+ "colab_type": "code",
+ "id": "c4oXKmnjw6i_"
+ },
+ "outputs": [],
+ "source": [
+ "MODEL_NAME = 'mobilenetv2_coco_voctrainaug' # @param ['mobilenetv2_coco_voctrainaug', 'mobilenetv2_coco_voctrainval', 'xception_coco_voctrainaug', 'xception_coco_voctrainval']\n",
+ "\n",
+ "_DOWNLOAD_URL_PREFIX = 'http://download.tensorflow.org/models/'\n",
+ "_MODEL_URLS = {\n",
+ " 'mobilenetv2_coco_voctrainaug':\n",
+ " 'deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz',\n",
+ " 'mobilenetv2_coco_voctrainval':\n",
+ " 'deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz',\n",
+ " 'xception_coco_voctrainaug':\n",
+ " 'deeplabv3_pascal_train_aug_2018_01_04.tar.gz',\n",
+ " 'xception_coco_voctrainval':\n",
+ " 'deeplabv3_pascal_trainval_2018_01_04.tar.gz',\n",
+ "}\n",
+ "_TARBALL_NAME = 'deeplab_model.tar.gz'\n",
+ "\n",
+ "model_dir = tempfile.mkdtemp()\n",
+ "tf.gfile.MakeDirs(model_dir)\n",
+ "\n",
+ "download_path = os.path.join(model_dir, _TARBALL_NAME)\n",
+ "print('downloading model, this might take a while...')\n",
+ "urllib.request.urlretrieve(_DOWNLOAD_URL_PREFIX + _MODEL_URLS[MODEL_NAME],\n",
+ " download_path)\n",
+ "print('download completed! loading DeepLab model...')\n",
+ "\n",
+ "MODEL = DeepLabModel(download_path)\n",
+ "print('model loaded successfully!')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "SZst78N-4OKO"
+ },
+ "source": [
+ "## Run on sample images\n",
+ "\n",
+ "Select one of sample images (leave `IMAGE_URL` empty) or feed any internet image\n",
+ "url for inference.\n",
+ "\n",
+ "Note that this colab uses single scale inference for fast computation,\n",
+ "so the results may slightly differ from the visualizations in the\n",
+ "[README](https://github.com/tensorflow/models/blob/master/research/deeplab/README.md) file,\n",
+ "which uses multi-scale and left-right flipped inputs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 0,
+ "metadata": {
+ "cellView": "form",
+ "colab": {},
+ "colab_type": "code",
+ "id": "edGukUHXyymr"
+ },
+ "outputs": [],
+ "source": [
+ "\n",
+ "SAMPLE_IMAGE = 'image1' # @param ['image1', 'image2', 'image3']\n",
+ "IMAGE_URL = '' #@param {type:\"string\"}\n",
+ "\n",
+ "_SAMPLE_URL = ('https://github.com/tensorflow/models/blob/master/research/'\n",
+ " 'deeplab/g3doc/img/%s.jpg?raw=true')\n",
+ "\n",
+ "\n",
+ "def run_visualization(url):\n",
+ " \"\"\"Inferences DeepLab model and visualizes result.\"\"\"\n",
+ " try:\n",
+ " f = urllib.request.urlopen(url)\n",
+ " jpeg_str = f.read()\n",
+ " original_im = Image.open(BytesIO(jpeg_str))\n",
+ " except IOError:\n",
+ " print('Cannot retrieve image. Please check url: ' + url)\n",
+ " return\n",
+ "\n",
+ " print('running deeplab on image %s...' % url)\n",
+ " resized_im, seg_map = MODEL.run(original_im)\n",
+ "\n",
+ " vis_segmentation(resized_im, seg_map)\n",
+ "\n",
+ "\n",
+ "image_url = IMAGE_URL or _SAMPLE_URL % SAMPLE_IMAGE\n",
+ "run_visualization(image_url)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "aUbVoHScTJYe"
+ },
+ "source": [
+ "## What's next\n",
+ "\n",
+ "* Learn about [Cloud TPUs](https://cloud.google.com/tpu/docs) that Google designed and optimized specifically to speed up and scale up ML workloads for training and inference and to enable ML engineers and researchers to iterate more quickly.\n",
+ "* Explore the range of [Cloud TPU tutorials and Colabs](https://cloud.google.com/tpu/docs/tutorials) to find other examples that can be used when implementing your ML project.\n",
+ "* For more information on running the DeepLab model on Cloud TPUs, see the [DeepLab tutorial](https://cloud.google.com/tpu/docs/tutorials/deeplab).\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "collapsed_sections": [],
+ "name": "DeepLab Demo.ipynb",
+ "provenance": [],
+ "toc_visible": true,
+ "version": "0.3.2"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/deeplab/models/research/deeplab/deprecated/__init__.py b/deeplab/models/research/deeplab/deprecated/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/deeplab/models/research/deeplab/deprecated/segmentation_dataset.py b/deeplab/models/research/deeplab/deprecated/segmentation_dataset.py
new file mode 100644
index 0000000..4a1de09
--- /dev/null
+++ b/deeplab/models/research/deeplab/deprecated/segmentation_dataset.py
@@ -0,0 +1,210 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Provides data from semantic segmentation datasets.
+
+The SegmentationDataset class provides both images and annotations (semantic
+segmentation and/or instance segmentation) for TensorFlow. Currently, we
+support the following datasets:
+
+1. PASCAL VOC 2012 (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/).
+
+PASCAL VOC 2012 semantic segmentation dataset annotates 20 foreground objects
+(e.g., bike, person, and so on) and leaves all the other semantic classes as
+one background class. The dataset contains 1464, 1449, and 1456 annotated
+images for the training, validation and test sets, respectively.
+
+2. Cityscapes dataset (https://www.cityscapes-dataset.com)
+
+The Cityscapes dataset contains 19 semantic labels (such as road, person, car,
+and so on) for urban street scenes.
+
+3. ADE20K dataset (http://groups.csail.mit.edu/vision/datasets/ADE20K)
+
+The ADE20K dataset contains 150 semantic labels for both urban street scenes
+and indoor scenes.
+
+References:
+ M. Everingham, S. M. A. Eslami, L. V. Gool, C. K. I. Williams, J. Winn,
+ and A. Zisserman, The pascal visual object classes challenge a retrospective.
+ IJCV, 2014.
+
+ M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson,
+ U. Franke, S. Roth, and B. Schiele, "The cityscapes dataset for semantic urban
+ scene understanding," In Proc. of CVPR, 2016.
+
+ B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, A. Torralba, "Scene Parsing
+ through ADE20K dataset", In Proc. of CVPR, 2017.
+"""
+import collections
+import os.path
+import tensorflow as tf
+from tensorflow.contrib import slim as contrib_slim
+
+slim = contrib_slim
+
+dataset = slim.dataset
+
+tfexample_decoder = slim.tfexample_decoder
+
+
+_ITEMS_TO_DESCRIPTIONS = {
+ 'image': 'A color image of varying height and width.',
+ 'labels_class': ('A semantic segmentation label whose size matches image. '
+ 'Its values range from 0 (background) to num_classes.'),
+}
+
+# Named tuple to describe the dataset properties.
+DatasetDescriptor = collections.namedtuple(
+ 'DatasetDescriptor',
+ ['splits_to_sizes', # Splits of the dataset into training, val, and test.
+ 'num_classes', # Number of semantic classes, including the background
+ # class (if exists). For example, there are 20
+ # foreground classes + 1 background class in the PASCAL
+ # VOC 2012 dataset. Thus, we set num_classes=21.
+ 'ignore_label', # Ignore label value.
+ ]
+)
+
+_CITYSCAPES_INFORMATION = DatasetDescriptor(
+ splits_to_sizes={
+ 'train_fine': 2975,
+ 'val_fine': 500,
+ },
+ num_classes=19,
+ ignore_label=255,
+)
+
+_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
+ splits_to_sizes={
+ 'train': 1464,
+ 'train_aug': 10582,
+ 'trainval': 2913,
+ 'val': 1449,
+ },
+ num_classes=21,
+ ignore_label=255,
+)
+
+# The numbers of samples per split (e.g., 'train'/'val') have to be hard coded.
+# You need to figure them out for your own training/validation sets.
+_ADE20K_INFORMATION = DatasetDescriptor(
+ splits_to_sizes={
+ 'train': 20210, # num of samples in images/training
+ 'val': 2000, # num of samples in images/validation
+ },
+ num_classes=151,
+ ignore_label=0,
+)
+
+_PQR_INFORMATION = DatasetDescriptor(
+ splits_to_sizes={
+ 'train': 3,
+ 'val': 2,
+ 'trainval': 5,
+ },
+ num_classes=2,
+ ignore_label=255,
+)
+
+_DATASETS_INFORMATION = {
+ 'cityscapes': _CITYSCAPES_INFORMATION,
+ 'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
+ 'ade20k': _ADE20K_INFORMATION,
+ 'pqr': _PQR_INFORMATION,
+}
+
+# Default file pattern of TFRecord of TensorFlow Example.
+_FILE_PATTERN = '%s-*'
+
+
+def get_cityscapes_dataset_name():
+ return 'cityscapes'
+
+
+def get_dataset(dataset_name, split_name, dataset_dir):
+ """Gets an instance of slim Dataset.
+
+ Args:
+ dataset_name: Dataset name.
+ split_name: A train/val Split name.
+ dataset_dir: The directory of the dataset sources.
+
+ Returns:
+ An instance of slim Dataset.
+
+ Raises:
+ ValueError: if the dataset_name or split_name is not recognized.
+ """
+ if dataset_name not in _DATASETS_INFORMATION:
+ raise ValueError('The specified dataset is not supported yet.')
+
+ splits_to_sizes = _DATASETS_INFORMATION[dataset_name].splits_to_sizes
+
+ if split_name not in splits_to_sizes:
+ raise ValueError('data split name %s not recognized' % split_name)
+
+ # Prepare the variables for different datasets.
+ num_classes = _DATASETS_INFORMATION[dataset_name].num_classes
+ ignore_label = _DATASETS_INFORMATION[dataset_name].ignore_label
+
+ file_pattern = _FILE_PATTERN
+ file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
+
+ # Specify how the TF-Examples are decoded.
+ keys_to_features = {
+ 'image/encoded': tf.FixedLenFeature(
+ (), tf.string, default_value=''),
+ 'image/filename': tf.FixedLenFeature(
+ (), tf.string, default_value=''),
+ 'image/format': tf.FixedLenFeature(
+ (), tf.string, default_value='jpeg'),
+ 'image/height': tf.FixedLenFeature(
+ (), tf.int64, default_value=0),
+ 'image/width': tf.FixedLenFeature(
+ (), tf.int64, default_value=0),
+ 'image/segmentation/class/encoded': tf.FixedLenFeature(
+ (), tf.string, default_value=''),
+ 'image/segmentation/class/format': tf.FixedLenFeature(
+ (), tf.string, default_value='png'),
+ }
+ items_to_handlers = {
+ 'image': tfexample_decoder.Image(
+ image_key='image/encoded',
+ format_key='image/format',
+ channels=3),
+ 'image_name': tfexample_decoder.Tensor('image/filename'),
+ 'height': tfexample_decoder.Tensor('image/height'),
+ 'width': tfexample_decoder.Tensor('image/width'),
+ 'labels_class': tfexample_decoder.Image(
+ image_key='image/segmentation/class/encoded',
+ format_key='image/segmentation/class/format',
+ channels=1),
+ }
+
+ decoder = tfexample_decoder.TFExampleDecoder(
+ keys_to_features, items_to_handlers)
+
+ return dataset.Dataset(
+ data_sources=file_pattern,
+ reader=tf.TFRecordReader,
+ decoder=decoder,
+ num_samples=splits_to_sizes[split_name],
+ items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
+ ignore_label=ignore_label,
+ num_classes=num_classes,
+ name=dataset_name,
+ multi_label=True)
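+
+
+# Example usage (a sketch): assuming the 'pqr' splits above have been converted
+# to TFRecord files named like 'train-*' under DATASET_DIR, a slim Dataset can
+# be obtained with:
+#
+# pqr_train = get_dataset('pqr', 'train', dataset_dir=DATASET_DIR)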
diff --git a/deeplab/models/research/deeplab/eval.py b/deeplab/models/research/deeplab/eval.py
new file mode 100644
index 0000000..4f5fb8b
--- /dev/null
+++ b/deeplab/models/research/deeplab/eval.py
@@ -0,0 +1,227 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Evaluation script for the DeepLab model.
+
+See model.py for more details and usage.
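+
+Example invocation (a sketch; the paths and values below are placeholders):
+
+ python deeplab/eval.py \
+ --eval_logdir=/tmp/deeplab/eval \
+ --checkpoint_dir=/tmp/deeplab/train \
+ --dataset=pascal_voc_seg \
+ --eval_split=val \
+ --eval_crop_size=513,513 \
+ --dataset_dir=/path/to/tfrecord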
+"""
+
+import numpy as np
+import six
+import tensorflow as tf
+from tensorflow.contrib import metrics as contrib_metrics
+from tensorflow.contrib import quantize as contrib_quantize
+from tensorflow.contrib import tfprof as contrib_tfprof
+from tensorflow.contrib import training as contrib_training
+from deeplab import common
+from deeplab import model
+from deeplab.datasets import data_generator
+
+flags = tf.app.flags
+FLAGS = flags.FLAGS
+
+flags.DEFINE_string('master', '', 'BNS name of the tensorflow server')
+
+# Settings for log directories.
+
+flags.DEFINE_string('eval_logdir', None, 'Where to write the event logs.')
+
+flags.DEFINE_string('checkpoint_dir', None, 'Directory of model checkpoints.')
+
+# Settings for evaluating the model.
+
+flags.DEFINE_integer('eval_batch_size', 1,
+ 'The number of images in each batch during evaluation.')
+
+flags.DEFINE_list('eval_crop_size', '513,513',
+ 'Image crop size [height, width] for evaluation.')
+
+flags.DEFINE_integer('eval_interval_secs', 60 * 5,
+ 'How often (in seconds) to run evaluation.')
+
+# For `xception_65`, use atrous_rates = [12, 24, 36] if output_stride = 8, or
+# rates = [6, 12, 18] if output_stride = 16. For `mobilenet_v2`, use None. Note
+# one could use different atrous_rates/output_stride during training/evaluation.
+flags.DEFINE_multi_integer('atrous_rates', None,
+ 'Atrous rates for atrous spatial pyramid pooling.')
+
+flags.DEFINE_integer('output_stride', 16,
+ 'The ratio of input to output spatial resolution.')
+
+# Change to [0.5, 0.75, 1.0, 1.25, 1.5, 1.75] for multi-scale test.
+flags.DEFINE_multi_float('eval_scales', [1.0],
+ 'The scales to resize images for evaluation.')
+
+# Change to True for adding flipped images during test.
+flags.DEFINE_bool('add_flipped_images', False,
+ 'Add flipped images for evaluation or not.')
+
+flags.DEFINE_integer(
+ 'quantize_delay_step', -1,
+ 'Steps to start quantized training. If < 0, will not quantize model.')
+
+# Dataset settings.
+
+flags.DEFINE_string('dataset', 'pascal_voc_seg',
+ 'Name of the segmentation dataset.')
+
+flags.DEFINE_string('eval_split', 'val',
+ 'Which split of the dataset is used for evaluation.')
+
+flags.DEFINE_string('dataset_dir', None, 'Where the dataset resides.')
+
+flags.DEFINE_integer('max_number_of_evaluations', 0,
+ 'Maximum number of eval iterations. Will loop '
+ 'indefinitely upon nonpositive values.')
+
+
+def main(unused_argv):
+ tf.logging.set_verbosity(tf.logging.INFO)
+
+ dataset = data_generator.Dataset(
+ dataset_name=FLAGS.dataset,
+ split_name=FLAGS.eval_split,
+ dataset_dir=FLAGS.dataset_dir,
+ batch_size=FLAGS.eval_batch_size,
+ crop_size=[int(sz) for sz in FLAGS.eval_crop_size],
+ min_resize_value=FLAGS.min_resize_value,
+ max_resize_value=FLAGS.max_resize_value,
+ resize_factor=FLAGS.resize_factor,
+ model_variant=FLAGS.model_variant,
+ num_readers=2,
+ is_training=False,
+ should_shuffle=False,
+ should_repeat=False)
+
+ tf.gfile.MakeDirs(FLAGS.eval_logdir)
+ tf.logging.info('Evaluating on %s set', FLAGS.eval_split)
+
+ with tf.Graph().as_default():
+ samples = dataset.get_one_shot_iterator().get_next()
+
+ model_options = common.ModelOptions(
+ outputs_to_num_classes={common.OUTPUT_TYPE: dataset.num_of_classes},
+ crop_size=[int(sz) for sz in FLAGS.eval_crop_size],
+ atrous_rates=FLAGS.atrous_rates,
+ output_stride=FLAGS.output_stride)
+
+ # Set shape in order for tf.contrib.tfprof.model_analyzer to work properly.
+ samples[common.IMAGE].set_shape(
+ [FLAGS.eval_batch_size,
+ int(FLAGS.eval_crop_size[0]),
+ int(FLAGS.eval_crop_size[1]),
+ 3])
+ if tuple(FLAGS.eval_scales) == (1.0,):
+ tf.logging.info('Performing single-scale test.')
+ predictions = model.predict_labels(samples[common.IMAGE], model_options,
+ image_pyramid=FLAGS.image_pyramid)
+ else:
+ tf.logging.info('Performing multi-scale test.')
+ if FLAGS.quantize_delay_step >= 0:
+ raise ValueError(
+ 'Quantize mode is not supported with multi-scale test.')
+
+ predictions = model.predict_labels_multi_scale(
+ samples[common.IMAGE],
+ model_options=model_options,
+ eval_scales=FLAGS.eval_scales,
+ add_flipped_images=FLAGS.add_flipped_images)
+ predictions = predictions[common.OUTPUT_TYPE]
+ predictions = tf.reshape(predictions, shape=[-1])
+ labels = tf.reshape(samples[common.LABEL], shape=[-1])
+ weights = tf.to_float(tf.not_equal(labels, dataset.ignore_label))
+
+ # Set ignore_label regions to label 0, because metrics.mean_iou requires
+ # range of labels = [0, dataset.num_classes). Note the ignore_label regions
+ # are not evaluated since the corresponding regions contain weights = 0.
+ labels = tf.where(
+ tf.equal(labels, dataset.ignore_label), tf.zeros_like(labels), labels)
+
+ predictions_tag = 'miou'
+ for eval_scale in FLAGS.eval_scales:
+ predictions_tag += '_' + str(eval_scale)
+ if FLAGS.add_flipped_images:
+ predictions_tag += '_flipped'
+
+ # Define the evaluation metric.
+ metric_map = {}
+ num_classes = dataset.num_of_classes
+ metric_map['eval/%s_overall' % predictions_tag] = tf.metrics.mean_iou(
+ labels=labels, predictions=predictions, num_classes=num_classes,
+ weights=weights)
+ # IoU for each class.
+ one_hot_predictions = tf.one_hot(predictions, num_classes)
+ one_hot_predictions = tf.reshape(one_hot_predictions, [-1, num_classes])
+ one_hot_labels = tf.one_hot(labels, num_classes)
+ one_hot_labels = tf.reshape(one_hot_labels, [-1, num_classes])
+ for c in range(num_classes):
+ predictions_tag_c = '%s_class_%d' % (predictions_tag, c)
+ tp, tp_op = tf.metrics.true_positives(
+ labels=one_hot_labels[:, c], predictions=one_hot_predictions[:, c],
+ weights=weights)
+ fp, fp_op = tf.metrics.false_positives(
+ labels=one_hot_labels[:, c], predictions=one_hot_predictions[:, c],
+ weights=weights)
+ fn, fn_op = tf.metrics.false_negatives(
+ labels=one_hot_labels[:, c], predictions=one_hot_predictions[:, c],
+ weights=weights)
+ tp_fp_fn_op = tf.group(tp_op, fp_op, fn_op)
+ iou = tf.where(tf.greater(tp + fn, 0.0),
+ tp / (tp + fn + fp),
+ tf.constant(np.NaN))
+ metric_map['eval/%s' % predictions_tag_c] = (iou, tp_fp_fn_op)
+
+ (metrics_to_values,
+ metrics_to_updates) = contrib_metrics.aggregate_metric_map(metric_map)
+
+ summary_ops = []
+ for metric_name, metric_value in six.iteritems(metrics_to_values):
+ op = tf.summary.scalar(metric_name, metric_value)
+ op = tf.Print(op, [metric_value], metric_name)
+ summary_ops.append(op)
+
+ summary_op = tf.summary.merge(summary_ops)
+ summary_hook = contrib_training.SummaryAtEndHook(
+ log_dir=FLAGS.eval_logdir, summary_op=summary_op)
+ hooks = [summary_hook]
+
+ num_eval_iters = None
+ if FLAGS.max_number_of_evaluations > 0:
+ num_eval_iters = FLAGS.max_number_of_evaluations
+
+ if FLAGS.quantize_delay_step >= 0:
+ contrib_quantize.create_eval_graph()
+
+ contrib_tfprof.model_analyzer.print_model_analysis(
+ tf.get_default_graph(),
+ tfprof_options=contrib_tfprof.model_analyzer
+ .TRAINABLE_VARS_PARAMS_STAT_OPTIONS)
+ contrib_tfprof.model_analyzer.print_model_analysis(
+ tf.get_default_graph(),
+ tfprof_options=contrib_tfprof.model_analyzer.FLOAT_OPS_OPTIONS)
+ contrib_training.evaluate_repeatedly(
+ checkpoint_dir=FLAGS.checkpoint_dir,
+ master=FLAGS.master,
+ eval_ops=list(metrics_to_updates.values()),
+ max_number_of_evaluations=num_eval_iters,
+ hooks=hooks,
+ eval_interval_secs=FLAGS.eval_interval_secs)
+
+
+if __name__ == '__main__':
+ flags.mark_flag_as_required('checkpoint_dir')
+ flags.mark_flag_as_required('eval_logdir')
+ flags.mark_flag_as_required('dataset_dir')
+ tf.app.run()
diff --git a/deeplab/models/research/deeplab/evaluation/README.md b/deeplab/models/research/deeplab/evaluation/README.md
new file mode 100644
index 0000000..6925538
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/README.md
@@ -0,0 +1,311 @@
+# Evaluation Metrics for Whole Image Parsing
+
+Whole Image Parsing [1], also known as Panoptic Segmentation [2], generalizes
+the tasks of semantic segmentation for "stuff" classes and instance
+segmentation for "thing" classes, assigning both semantic and instance labels
+to every pixel in an image.
+
+Previous works evaluate the parsing result with separate metrics (e.g., one for
+the semantic segmentation result and one for the object detection result).
+Recently, Kirillov et al. proposed the unified instance-based Panoptic Quality
+(PQ) metric [2], which has since been adopted by several benchmarks [3, 4].
+
+However, we notice that the instance-based PQ metric often places
+disproportionate emphasis on small instance parsing, as well as on "thing" over
+"stuff" classes. To remedy these effects, we propose an alternative
+region-based Parsing Covering (PC) metric [5], which adapts the Covering
+metric [6], previously used for class-agnostic segmentation quality
+evaluation, to the task of image parsing.
+
+Here, we provide implementation of both PQ and PC for evaluating the parsing
+results. We briefly explain both metrics below for reference.
+
+## Panoptic Quality (PQ)
+
+Given a groundtruth segmentation S and a predicted segmentation S', PQ is
+defined as follows:
+
+$$
+\text{PQ} = \frac{\sum_{(R, R') \in \text{TP}} \text{IoU}(R, R')}{|\text{TP}| + \frac{1}{2}|\text{FP}| + \frac{1}{2}|\text{FN}|}
+$$
+
+where R and R' are groundtruth regions and predicted regions respectively,
+and |TP|, |FP|, and |FN| are the numbers of true positives, false positives,
+and false negatives. The matching is determined by a threshold of 0.5
+Intersection-Over-Union (IOU).
+
+PQ treats all regions of the same "stuff" class as one instance, and the
+size of instances is not considered. For example, instances with 10 × 10
+pixels contribute equally to the metric as instances with 1000 × 1000 pixels.
+Therefore, PQ is sensitive to false positives with small regions, and some
+heuristics, such as removing those small regions, could improve the performance
+(as also pointed out in the open-sourced evaluation code from [2]).
+Thus, we argue that PQ is suitable in applications where one cares equally for
+the parsing quality of instances irrespective of their sizes.
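+
+For reference, below is a minimal NumPy sketch of the per-class PQ computation,
+assuming the IoU-based matching (with the 0.5 threshold) has already been done
+and only the matched IoUs and the |FP| and |FN| counts are given. The function
+name and arguments are illustrative and not part of the released code.
+
+```python
+import numpy as np
+
+
+def panoptic_quality(matched_ious, num_fp, num_fn):
+  """Computes per-class PQ from matched IoUs and FP/FN counts (a sketch)."""
+  matched_ious = np.asarray(matched_ious, dtype=np.float64)
+  num_tp = matched_ious.size
+  denominator = num_tp + 0.5 * num_fp + 0.5 * num_fn
+  if denominator == 0:
+    return 0.0
+  # Sum of matched IoUs divided by |TP| + 0.5 |FP| + 0.5 |FN|.
+  return float(matched_ious.sum() / denominator)
+
+
+# Two matched regions with IoUs 0.9 and 0.6, one false positive, one false
+# negative: PQ = (0.9 + 0.6) / (2 + 0.5 + 0.5) = 0.5.
+print(panoptic_quality([0.9, 0.6], num_fp=1, num_fn=1))
+```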
+
+## Parsing Covering (PC)
+
+We notice that there are applications where one pays more attention to large
+objects, e.g., autonomous driving (where nearby objects are more important
+than far away ones). Motivated by this, we propose to also evaluate the
+quality of image parsing results by extending the existing Covering metric [6],
+which accounts for instance sizes. Specifically, our proposed metric, Parsing
+Covering (PC), is defined as follows:
+
+$$
+\text{Cov}_i = \frac{1}{N_i} \sum_{R \in S_i} |R| \cdot \max_{R' \in S_i'} \text{IoU}(R, R'),
+\qquad N_i = \sum_{R \in S_i} |R|
+$$
+
+$$
+\text{PC} = \frac{1}{C} \sum_{i=1}^{C} \text{Cov}_i
+$$
+
+where S_i and S_i' are the groundtruth segmentation and predicted
+segmentation for the i-th semantic class respectively, and N_i is the total
+number of pixels of groundtruth regions from S_i. The Covering for class i,
+Cov_i, is computed in the same way as the original Covering metric except
+that only groundtruth regions from S_i and predicted regions from S_i' are
+considered. PC is then obtained by computing the average of Cov_i over C
+semantic classes.
+
+A notable difference between PQ and the proposed PC is that no matching is
+involved in PC, and hence there is no matching threshold. In an attempt to
+treat "thing" and "stuff" classes equally, the segmentation of "stuff" classes
+still receives a partial PC score if the segmentation is only partially
+correct. For example, if one out of three equally-sized trees is perfectly
+segmented, the model will get the same partial PC score regardless of whether
+"tree" is considered "stuff" or "thing".
+
+## Tutorial
+
+To evaluate the parsing results with PQ and PC, we provide two options:
+
+1. Python off-line evaluation with results saved in the [COCO format](http://cocodataset.org/#format-results).
+2. TensorFlow on-line evaluation.
+
+Below, we explain each option in detail.
+
+#### 1. Python off-line evaluation with results saved in COCO format
+
+[COCO result format](http://cocodataset.org/#format-results) has been
+adopted by several benchmarks [3, 4]. Therefore, we provide a convenient
+function, `eval_coco_format`, to evaluate the results saved in COCO format
+in terms of PC and re-implemented PQ.
+
+Before using the provided function, the users need to download the official COCO
+panoptic segmentation task API. Please see [installation](../g3doc/installation.md#add-libraries-to-pythonpath)
+for reference.
+
+Once the official COCO panoptic segmentation task API is downloaded, the
+users should be able to run `eval_coco_format.py` to evaluate the parsing
+results in terms of both PC and reimplemented PQ.
+
+To be concrete, let's take a look at the function, `eval_coco_format` in
+`eval_coco_format.py`:
+
+```python
+eval_coco_format(gt_json_file,
+ pred_json_file,
+ gt_folder=None,
+ pred_folder=None,
+ metric='pq',
+ num_categories=201,
+ ignored_label=0,
+ max_instances_per_category=256,
+ intersection_offset=None,
+ normalize_by_image_size=True,
+ num_workers=0,
+ print_digits=3)
+```
+where
+
+1. `gt_json_file`: Path to a JSON file giving ground-truth annotations in COCO
+format.
+2. `pred_json_file`: Path to a JSON file for the predictions to evaluate.
+3. `gt_folder`: Folder containing panoptic-format ID images to match
+ground-truth annotations to image regions.
+4. `pred_folder`: Path to a folder containing ID images for predictions.
+5. `metric`: Name of a metric to compute. Set to `pq` or `pc` for evaluation in
+PQ or PC, respectively.
+6. `num_categories`: The number of segmentation categories (or "classes") in the
+dataset.
+7. `ignored_label`: A category id that is ignored in evaluation, e.g. the "void"
+label in COCO panoptic segmentation dataset.
+8. `max_instances_per_category`: The maximum number of instances for each
+category to ensure unique instance labels.
+9. `intersection_offset`: The maximum number of unique labels.
+10. `normalize_by_image_size`: Whether to normalize groundtruth instance region
+areas by image size when using PC.
+11. `num_workers`: If set to a positive number, will spawn child processes to
+compute parts of the metric in parallel by splitting the images between the
+workers. If set to -1, will use the value of multiprocessing.cpu_count().
+12. `print_digits`: Number of significant digits to print in summary of computed
+metrics.
+
+The input arguments have default values set for the COCO panoptic segmentation
+dataset. Thus, users only need to provide the `gt_json_file` and the
+`pred_json_file` (following the COCO format) to run the evaluation on COCO with
+PQ. If users want to evaluate the results on other datasets, they may need
+to change the default values.
+
+As an example, the interested users could take a look at the provided unit
+test, `test_compare_pq_with_reference_eval`, in `eval_coco_format_test.py`.
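+
+A minimal off-line evaluation call might look like the following sketch, where
+the JSON files and image folders are placeholders that should point to your own
+COCO-format groundtruth and predictions:
+
+```python
+from deeplab.evaluation import eval_coco_format
+
+# Placeholder paths; replace with your own COCO-format files and folders.
+results = eval_coco_format.eval_coco_format(
+    gt_json_file='/path/to/coco_gt.json',
+    pred_json_file='/path/to/coco_pred.json',
+    gt_folder='/path/to/coco_gt',
+    pred_folder='/path/to/coco_pred',
+    metric='pq')
+```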
+
+#### 2. TensorFlow on-line evaluation
+
+Users may also want to run the TensorFlow on-line evaluation, similar to the
+[tf.contrib.metrics.streaming_mean_iou](https://www.tensorflow.org/api_docs/python/tf/contrib/metrics/streaming_mean_iou).
+
+Below, we provide a code snippet that shows how to use the provided
+`streaming_panoptic_quality` and `streaming_parsing_covering`.
+
+```python
+metric_map = {}
+metric_map['panoptic_quality'] = streaming_metrics.streaming_panoptic_quality(
+ category_label,
+ instance_label,
+ category_prediction,
+ instance_prediction,
+ num_classes=201,
+ max_instances_per_category=256,
+ ignored_label=0,
+ offset=256*256)
+metric_map['parsing_covering'] = streaming_metrics.streaming_parsing_covering(
+ category_label,
+ instance_label,
+ category_prediction,
+ instance_prediction,
+ num_classes=201,
+ max_instances_per_category=256,
+ ignored_label=0,
+ offset=256*256,
+ normalize_by_image_size=True)
+metrics_to_values, metrics_to_updates = slim.metrics.aggregate_metric_map(
+ metric_map)
+```
+where `metric_map` is a dictionary storing the streamed results of PQ and PC.
+
+The `category_label` and the `instance_label` are the semantic segmentation and
+instance segmentation groundtruth, respectively. That is, in the panoptic
+segmentation format:
+panoptic_label = category_label * max_instances_per_category + instance_label.
+Similarly, the `category_prediction` and the `instance_prediction` are the
+predicted semantic segmentation and instance segmentation, respectively.
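+
+As a concrete illustration of this label packing, the following NumPy sketch
+encodes and then decodes a tiny panoptic label map (the array values are made
+up):
+
+```python
+import numpy as np
+
+max_instances_per_category = 256
+
+# Hypothetical per-pixel labels: category 17 with instance ids 1 and 2.
+category_label = np.array([[17, 17]], dtype=np.int32)
+instance_label = np.array([[1, 2]], dtype=np.int32)
+
+# Pack both label maps into a single panoptic label map.
+panoptic_label = category_label * max_instances_per_category + instance_label
+
+# Recover the original maps from the packed representation.
+assert np.array_equal(panoptic_label // max_instances_per_category,
+                      category_label)
+assert np.array_equal(panoptic_label % max_instances_per_category,
+                      instance_label)
+```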
+
+Below, we provide a code snippet that shows how to summarize the results with
+tf.summary.
+
+```python
+summary_ops = []
+for metric_name, metric_value in metrics_to_values.iteritems():
+ if metric_name == 'panoptic_quality':
+ [pq, sq, rq, total_tp, total_fn, total_fp] = tf.unstack(
+ metric_value, 6, axis=0)
+ panoptic_metrics = {
+ # Panoptic quality.
+ 'pq': pq,
+ # Segmentation quality.
+ 'sq': sq,
+ # Recognition quality.
+ 'rq': rq,
+ # Total true positives.
+ 'total_tp': total_tp,
+ # Total false negatives.
+ 'total_fn': total_fn,
+ # Total false positives.
+ 'total_fp': total_fp,
+ }
+ # Find the valid classes that will be used for evaluation. We will
+ # ignore the `ignore_label` class and other classes which have (tp + fn
+ # + fp) equal to 0.
+ valid_classes = tf.logical_and(
+ tf.not_equal(tf.range(0, num_classes), void_label),
+ tf.not_equal(total_tp + total_fn + total_fp, 0))
+ for target_metric, target_value in panoptic_metrics.iteritems():
+ output_metric_name = '{}_{}'.format(metric_name, target_metric)
+ op = tf.summary.scalar(
+ output_metric_name,
+ tf.reduce_mean(tf.boolean_mask(target_value, valid_classes)))
+ op = tf.Print(op, [target_value], output_metric_name + '_classwise: ',
+ summarize=num_classes)
+ op = tf.Print(
+ op,
+ [tf.reduce_mean(tf.boolean_mask(target_value, valid_classes))],
+ output_metric_name + '_mean: ',
+ summarize=1)
+ summary_ops.append(op)
+ elif metric_name == 'parsing_covering':
+ [per_class_covering,
+ total_per_class_weighted_ious,
+ total_per_class_gt_areas] = tf.unstack(metric_value, 3, axis=0)
+ # Find the valid classes that will be used for evaluation. We will
+ # ignore the `void_label` class and other classes which have
+ # total_per_class_weighted_ious + total_per_class_gt_areas equal to 0.
+ valid_classes = tf.logical_and(
+ tf.not_equal(tf.range(0, num_classes), void_label),
+ tf.not_equal(
+ total_per_class_weighted_ious + total_per_class_gt_areas, 0))
+ op = tf.summary.scalar(
+ metric_name,
+ tf.reduce_mean(tf.boolean_mask(per_class_covering, valid_classes)))
+ op = tf.Print(op, [per_class_covering], metric_name + '_classwise: ',
+ summarize=num_classes)
+ op = tf.Print(
+ op,
+ [tf.reduce_mean(
+ tf.boolean_mask(per_class_covering, valid_classes))],
+ metric_name + '_mean: ',
+ summarize=1)
+ summary_ops.append(op)
+ else:
+ raise ValueError('The metric_name "%s" is not supported.' % metric_name)
+```
+
+Afterwards, users could use the following code to run the evaluation in
+TensorFlow. For reference, `eval.py` provides a simple example of running the
+streaming evaluation of mIoU for semantic segmentation.
+
+```python
+metric_values = slim.evaluation.evaluation_loop(
+ master=FLAGS.master,
+ checkpoint_dir=FLAGS.checkpoint_dir,
+ logdir=FLAGS.eval_logdir,
+ num_evals=num_batches,
+ eval_op=metrics_to_updates.values(),
+ final_op=metrics_to_values.values(),
+ summary_op=tf.summary.merge(summary_ops),
+ max_number_of_evaluations=FLAGS.max_number_of_evaluations,
+ eval_interval_secs=FLAGS.eval_interval_secs)
+```
+
+
+### References
+
+1. **Image Parsing: Unifying Segmentation, Detection, and Recognition**
+ Zhuowen Tu, Xiangrong Chen, Alan L. Yuille, and Song-Chun Zhu
+ IJCV, 2005.
+
+2. **Panoptic Segmentation**
+ Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother and Piotr
+ Dollár
+ arXiv:1801.00868, 2018.
+
+3. **Microsoft COCO: Common Objects in Context**
+ Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross
+ Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick,
+ Piotr Dollar
+ In the Proc. of ECCV, 2014.
+
+4. **The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes**
+ Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulò, and Peter Kontschieder
+ In the Proc. of ICCV, 2017.
+
+5. **DeeperLab: Single-Shot Image Parser**
+ Tien-Ju Yang, Maxwell D. Collins, Yukun Zhu, Jyh-Jing Hwang, Ting Liu,
+ Xiao Zhang, Vivienne Sze, George Papandreou, Liang-Chieh Chen
+ arXiv: 1902.05093, 2019.
+
+6. **Contour Detection and Hierarchical Image Segmentation**
+ Pablo Arbelaez, Michael Maire, Charless Fowlkes, and Jitendra Malik
+ PAMI, 2011.
diff --git a/deeplab/models/research/deeplab/evaluation/__init__.py b/deeplab/models/research/deeplab/evaluation/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/deeplab/models/research/deeplab/evaluation/base_metric.py b/deeplab/models/research/deeplab/evaluation/base_metric.py
new file mode 100644
index 0000000..ee7606e
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/base_metric.py
@@ -0,0 +1,191 @@
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Defines the top-level interface for evaluating segmentations."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import abc
+import numpy as np
+import six
+
+
+_EPSILON = 1e-10
+
+
+def realdiv_maybe_zero(x, y):
+ """Element-wise x / y where y may contain zeros, for those returns 0 too."""
+ return np.where(
+ np.less(np.abs(y), _EPSILON), np.zeros_like(x), np.divide(x, y))
+
+
+@six.add_metaclass(abc.ABCMeta)
+class SegmentationMetric(object):
+ """Abstract base class for computers of segmentation metrics.
+
+ Subclasses will implement both:
+ 1. Comparing the predicted segmentation for an image with the groundtruth.
+ 2. Computing the final metric over a set of images.
+ These are often done as separate steps, due to the need to accumulate
+ intermediate values other than the metric itself across images, computing the
+ actual metric value only on these accumulations after all the images have been
+ compared.
+
+ A simple usage would be:
+
+ metric = MetricImplementation(...)
+ for <image>, <groundtruth> in evaluation_set:
+ <prediction> = run_segmentation(<image>)
+ metric.compare_and_accumulate(<prediction>, <groundtruth>)
+ print(metric.result())
+
+ """
+
+ def __init__(self, num_categories, ignored_label, max_instances_per_category,
+ offset):
+ """Base initialization for SegmentationMetric.
+
+ Args:
+ num_categories: The number of segmentation categories (or "classes") in
+ the dataset.
+ ignored_label: A category id that is ignored in evaluation, e.g. the void
+ label as defined in COCO panoptic segmentation dataset.
+ max_instances_per_category: The maximum number of instances for each
+ category. Used in ensuring unique instance labels.
+ offset: The maximum number of unique labels. This is used, by multiplying
+ the ground-truth labels, to generate unique ids for individual regions
+ of overlap between groundtruth and predicted segments.
+ """
+ self.num_categories = num_categories
+ self.ignored_label = ignored_label
+ self.max_instances_per_category = max_instances_per_category
+ self.offset = offset
+ self.reset()
+
+ def _naively_combine_labels(self, category_array, instance_array):
+ """Naively creates a combined label array from categories and instances."""
+ return (category_array.astype(np.uint32) * self.max_instances_per_category +
+ instance_array.astype(np.uint32))
+
+ @abc.abstractmethod
+ def compare_and_accumulate(
+ self, groundtruth_category_array, groundtruth_instance_array,
+ predicted_category_array, predicted_instance_array):
+ """Compares predicted segmentation with groundtruth, accumulates its metric.
+
+ It is not assumed that instance ids are unique across different categories.
+ See for example combine_semantic_and_instance_predictions.py in official
+ PanopticAPI evaluation code for issues to consider when fusing category
+ and instance labels.
+
+ Instance ids of the ignored category have the meaning that id 0 is "void"
+ and remaining ones are crowd instances.
+
+ Args:
+ groundtruth_category_array: A 2D numpy uint16 array of groundtruth
+ per-pixel category labels.
+ groundtruth_instance_array: A 2D numpy uint16 array of groundtruth
+ instance labels.
+ predicted_category_array: A 2D numpy uint16 array of predicted per-pixel
+ category labels.
+ predicted_instance_array: A 2D numpy uint16 array of predicted instance
+ labels.
+
+ Returns:
+ The value of the metric over all comparisons done so far, including this
+ one, as a float scalar.
+ """
+ raise NotImplementedError('Must be implemented in subclasses.')
+
+ @abc.abstractmethod
+ def result(self):
+ """Computes the metric over all comparisons done so far."""
+ raise NotImplementedError('Must be implemented in subclasses.')
+
+ @abc.abstractmethod
+ def detailed_results(self, is_thing=None):
+ """Computes and returns the detailed final metric results.
+
+ Args:
+ is_thing: A boolean array of length `num_categories`. The entry
+ `is_thing[category_id]` is True iff that category is a "thing" category
+ instead of "stuff."
+
+ Returns:
+ A dictionary with a breakdown of metrics and/or metric factors by things,
+ stuff, and all categories.
+ """
+ raise NotImplementedError('Not implemented in subclasses.')
+
+ @abc.abstractmethod
+ def result_per_category(self):
+ """For supported metrics, return individual per-category metric values.
+
+ Returns:
+ A numpy array of shape `[self.num_categories]`, where index `i` is the
+ metrics value over only that category.
+ """
+ raise NotImplementedError('Not implemented in subclass.')
+
+ def print_detailed_results(self, is_thing=None, print_digits=3):
+ """Prints out a detailed breakdown of metric results.
+
+ Args:
+ is_thing: A boolean array of length num_categories.
+ `is_thing[category_id]` will say whether that category is a "thing"
+ rather than "stuff."
+ print_digits: Number of significant digits to print in computed metrics.
+ """
+ raise NotImplementedError('Not implemented in subclass.')
+
+ @abc.abstractmethod
+ def merge(self, other_instance):
+ """Combines the accumulated results of another instance into self.
+
+ The following two cases should put `metric_a` into an equivalent state.
+
+ Case 1 (with merge):
+
+ metric_a = MetricsSubclass(...)
+ metric_a.compare_and_accumulate(<comparison 1>)
+ metric_a.compare_and_accumulate(<comparison 2>)
+
+ metric_b = MetricsSubclass(...)
+ metric_b.compare_and_accumulate(<comparison 3>)
+ metric_b.compare_and_accumulate(<comparison 4>)
+
+ metric_a.merge(metric_b)
+
+ Case 2 (without merge):
+
+ metric_a = MetricsSubclass(...)
+ metric_a.compare_and_accumulate(<comparison 1>)
+ metric_a.compare_and_accumulate(<comparison 2>)
+ metric_a.compare_and_accumulate(<comparison 3>)
+ metric_a.compare_and_accumulate(<comparison 4>)
+
+ Args:
+ other_instance: Another compatible instance of the same metric subclass.
+ """
+ raise NotImplementedError('Not implemented in subclass.')
+
+ @abc.abstractmethod
+ def reset(self):
+ """Resets the accumulation to the metric class's state at initialization.
+
+ Note that this function will be called in SegmentationMetric.__init__.
+ """
+ raise NotImplementedError('Must be implemented in subclasses.')
diff --git a/deeplab/models/research/deeplab/evaluation/eval_coco_format.py b/deeplab/models/research/deeplab/evaluation/eval_coco_format.py
new file mode 100644
index 0000000..1a26446
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/eval_coco_format.py
@@ -0,0 +1,338 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Computes evaluation metrics on groundtruth and predictions in COCO format.
+
+The Common Objects in Context (COCO) dataset defines a format for specifying
+combined semantic and instance segmentations as "panoptic" segmentations. This
+is done with the combination of JSON and image files as specified at:
+http://cocodataset.org/#format-results
+where the JSON file specifies the overall structure of the result,
+including the categories for each annotation, and the images specify the image
+region for each annotation in that image by its ID.
+
+This script computes additional metrics such as Parsing Covering on datasets and
+predictions in this format. An implementation of Panoptic Quality is also
+provided for convenience.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import collections
+import json
+import multiprocessing
+import os
+
+from absl import app
+from absl import flags
+from absl import logging
+import numpy as np
+from PIL import Image
+import utils as panopticapi_utils
+import six
+
+from deeplab.evaluation import panoptic_quality
+from deeplab.evaluation import parsing_covering
+
+FLAGS = flags.FLAGS
+
+flags.DEFINE_string(
+ 'gt_json_file', None,
+ 'Path to a JSON file giving ground-truth annotations in COCO format.')
+flags.DEFINE_string('pred_json_file', None,
+ 'Path to a JSON file for the predictions to evaluate.')
+flags.DEFINE_string(
+ 'gt_folder', None,
+ 'Folder containing panoptic-format ID images to match ground-truth '
+ 'annotations to image regions.')
+flags.DEFINE_string('pred_folder', None,
+ 'Folder containing ID images for predictions.')
+flags.DEFINE_enum(
+ 'metric', 'pq', ['pq', 'pc'], 'Shorthand name of a metric to compute. '
+ 'Supported values are:\n'
+ 'Panoptic Quality (pq)\n'
+ 'Parsing Covering (pc)')
+flags.DEFINE_integer(
+ 'num_categories', 201,
+ 'The number of segmentation categories (or "classes") in the dataset.')
+flags.DEFINE_integer(
+ 'ignored_label', 0,
+ 'A category id that is ignored in evaluation, e.g. the void label as '
+ 'defined in COCO panoptic segmentation dataset.')
+flags.DEFINE_integer(
+ 'max_instances_per_category', 256,
+ 'The maximum number of instances for each category. Used in ensuring '
+ 'unique instance labels.')
+flags.DEFINE_integer('intersection_offset', None,
+ 'The maximum number of unique labels.')
+flags.DEFINE_bool(
+ 'normalize_by_image_size', True,
+ 'Whether to normalize groundtruth instance region areas by image size. If '
+ 'True, groundtruth instance areas and weighted IoUs will be divided by the '
+ 'size of the corresponding image before accumulated across the dataset. '
+ 'Only used for Parsing Covering (pc) evaluation.')
+flags.DEFINE_integer(
+ 'num_workers', 0, 'If set to a positive number, will spawn child processes '
+ 'to compute parts of the metric in parallel by splitting '
+ 'the images between the workers. If set to -1, will use '
+ 'the value of multiprocessing.cpu_count().')
+flags.DEFINE_integer('print_digits', 3,
+ 'Number of significant digits to print in metrics.')
+
+
+def _build_metric(metric,
+ num_categories,
+ ignored_label,
+ max_instances_per_category,
+ intersection_offset=None,
+ normalize_by_image_size=True):
+ """Creates a metric aggregator objet of the given name."""
+ if metric == 'pq':
+ logging.warning('One should check Panoptic Quality results against the '
+ 'official COCO API code. Small numerical differences '
+ '(< 0.1%) can be magnified by rounding.')
+ return panoptic_quality.PanopticQuality(num_categories, ignored_label,
+ max_instances_per_category,
+ intersection_offset)
+ elif metric == 'pc':
+ return parsing_covering.ParsingCovering(
+ num_categories, ignored_label, max_instances_per_category,
+ intersection_offset, normalize_by_image_size)
+ else:
+ raise ValueError('No implementation for metric "%s"' % metric)
+
+
+def _matched_annotations(gt_json, pred_json):
+ """Yields a set of (groundtruth, prediction) image annotation pairs.."""
+ image_id_to_pred_ann = {
+ annotation['image_id']: annotation
+ for annotation in pred_json['annotations']
+ }
+ for gt_ann in gt_json['annotations']:
+ image_id = gt_ann['image_id']
+ pred_ann = image_id_to_pred_ann[image_id]
+ yield gt_ann, pred_ann
+
+
+def _open_panoptic_id_image(image_path):
+ """Loads a COCO-format panoptic ID image from file."""
+ return panopticapi_utils.rgb2id(
+ np.array(Image.open(image_path), dtype=np.uint32))
+
+
+def _split_panoptic(ann_json, id_array, ignored_label, allow_crowds):
+ """Given the COCO JSON and ID map, splits into categories and instances."""
+ category = np.zeros(id_array.shape, np.uint16)
+ instance = np.zeros(id_array.shape, np.uint16)
+ next_instance_id = collections.defaultdict(int)
+ # Skip instance label 0 for ignored label. That is reserved for void.
+ next_instance_id[ignored_label] = 1
+ for segment_info in ann_json['segments_info']:
+ if allow_crowds and segment_info['iscrowd']:
+ category_id = ignored_label
+ else:
+ category_id = segment_info['category_id']
+ mask = np.equal(id_array, segment_info['id'])
+ category[mask] = category_id
+ instance[mask] = next_instance_id[category_id]
+ next_instance_id[category_id] += 1
+ return category, instance
+
+
+def _category_and_instance_from_annotation(ann_json, folder, ignored_label,
+ allow_crowds):
+ """Given the COCO JSON annotations, finds maps of categories and instances."""
+ panoptic_id_image = _open_panoptic_id_image(
+ os.path.join(folder, ann_json['file_name']))
+ return _split_panoptic(ann_json, panoptic_id_image, ignored_label,
+ allow_crowds)
+
+
+def _compute_metric(metric_aggregator, gt_folder, pred_folder,
+ annotation_pairs):
+ """Iterates over matched annotation pairs and computes a metric over them."""
+ for gt_ann, pred_ann in annotation_pairs:
+ # We only expect "iscrowd" to appear in the ground-truth, and not in model
+ # output. In predicted JSON it is simply ignored, as done in official code.
+ gt_category, gt_instance = _category_and_instance_from_annotation(
+ gt_ann, gt_folder, metric_aggregator.ignored_label, True)
+ pred_category, pred_instance = _category_and_instance_from_annotation(
+ pred_ann, pred_folder, metric_aggregator.ignored_label, False)
+
+ metric_aggregator.compare_and_accumulate(gt_category, gt_instance,
+ pred_category, pred_instance)
+ return metric_aggregator
+
+
+def _iterate_work_queue(work_queue):
+ """Creates an iterable that retrieves items from a queue until one is None."""
+ task = work_queue.get(block=True)
+ while task is not None:
+ yield task
+ task = work_queue.get(block=True)
+
+
+def _run_metrics_worker(metric_aggregator, gt_folder, pred_folder, work_queue,
+ result_queue):
+ result = _compute_metric(metric_aggregator, gt_folder, pred_folder,
+ _iterate_work_queue(work_queue))
+ result_queue.put(result, block=True)
+
+
+def _is_thing_array(categories_json, ignored_label):
+ """is_thing[category_id] is a bool on if category is "thing" or "stuff"."""
+ is_thing_dict = {}
+ for category_json in categories_json:
+ is_thing_dict[category_json['id']] = bool(category_json['isthing'])
+
+ # Check our assumption that the category ids are consecutive.
+ # Usually metrics should be able to handle this case, but adding a warning
+ # here.
+ max_category_id = max(six.iterkeys(is_thing_dict))
+ if len(is_thing_dict) != max_category_id + 1:
+ seen_ids = six.viewkeys(is_thing_dict)
+ all_ids = set(six.moves.range(max_category_id + 1))
+ unseen_ids = all_ids.difference(seen_ids)
+ if unseen_ids != {ignored_label}:
+ logging.warning(
+ 'Nonconsecutive category ids or no category JSON specified for ids: '
+ '%s', unseen_ids)
+
+ is_thing_array = np.zeros(max_category_id + 1)
+ for category_id, is_thing in six.iteritems(is_thing_dict):
+ is_thing_array[category_id] = is_thing
+
+ return is_thing_array
+
+
+def eval_coco_format(gt_json_file,
+ pred_json_file,
+ gt_folder=None,
+ pred_folder=None,
+ metric='pq',
+ num_categories=201,
+ ignored_label=0,
+ max_instances_per_category=256,
+ intersection_offset=None,
+ normalize_by_image_size=True,
+ num_workers=0,
+ print_digits=3):
+ """Top-level code to compute metrics on a COCO-format result.
+
+ Note that the default values are set for COCO panoptic segmentation dataset,
+ and thus the users may want to change it for their own dataset evaluation.
+
+ Args:
+ gt_json_file: Path to a JSON file giving ground-truth annotations in COCO
+ format.
+ pred_json_file: Path to a JSON file for the predictions to evaluate.
+ gt_folder: Folder containing panoptic-format ID images to match ground-truth
+ annotations to image regions.
+ pred_folder: Folder containing ID images for predictions.
+ metric: Name of a metric to compute.
+ num_categories: The number of segmentation categories (or "classes") in the
+ dataset.
+ ignored_label: A category id that is ignored in evaluation, e.g. the "void"
+ label as defined in the COCO panoptic segmentation dataset.
+ max_instances_per_category: The maximum number of instances for each
+ category. Used in ensuring unique instance labels.
+ intersection_offset: The maximum number of unique labels.
+ normalize_by_image_size: Whether to normalize groundtruth instance region
+ areas by image size. If True, groundtruth instance areas and weighted IoUs
+ will be divided by the size of the corresponding image before accumulated
+ across the dataset. Only used for Parsing Covering (pc) evaluation.
+ num_workers: If set to a positive number, will spawn child processes to
+ compute parts of the metric in parallel by splitting the images between
+ the workers. If set to -1, will use the value of
+ multiprocessing.cpu_count().
+ print_digits: Number of significant digits to print in summary of computed
+ metrics.
+
+ Returns:
+ The computed result of the metric as a float scalar.
+ """
+ with open(gt_json_file, 'r') as gt_json_fo:
+ gt_json = json.load(gt_json_fo)
+ with open(pred_json_file, 'r') as pred_json_fo:
+ pred_json = json.load(pred_json_fo)
+ if gt_folder is None:
+ gt_folder = gt_json_file.replace('.json', '')
+ if pred_folder is None:
+ pred_folder = pred_json_file.replace('.json', '')
+ if intersection_offset is None:
+ intersection_offset = (num_categories + 1) * max_instances_per_category
+
+ metric_aggregator = _build_metric(
+ metric, num_categories, ignored_label, max_instances_per_category,
+ intersection_offset, normalize_by_image_size)
+
+ if num_workers == -1:
+ logging.info('Attempting to get the CPU count to set # workers.')
+ num_workers = multiprocessing.cpu_count()
+
+ if num_workers > 0:
+ logging.info('Computing metric in parallel with %d workers.', num_workers)
+ work_queue = multiprocessing.Queue()
+ result_queue = multiprocessing.Queue()
+ workers = []
+ worker_args = (metric_aggregator, gt_folder, pred_folder, work_queue,
+ result_queue)
+ for _ in six.moves.range(num_workers):
+ workers.append(
+ multiprocessing.Process(target=_run_metrics_worker, args=worker_args))
+ for worker in workers:
+ worker.start()
+ for ann_pair in _matched_annotations(gt_json, pred_json):
+ work_queue.put(ann_pair, block=True)
+
+ # Will cause each worker to return a result and terminate upon receiving a
+ # None task.
+ for _ in six.moves.range(num_workers):
+ work_queue.put(None, block=True)
+
+ # Retrieve results.
+ for _ in six.moves.range(num_workers):
+ metric_aggregator.merge(result_queue.get(block=True))
+
+ for worker in workers:
+ worker.join()
+ else:
+ logging.info('Computing metric in a single process.')
+ annotation_pairs = _matched_annotations(gt_json, pred_json)
+ _compute_metric(metric_aggregator, gt_folder, pred_folder, annotation_pairs)
+
+ is_thing = _is_thing_array(gt_json['categories'], ignored_label)
+ metric_aggregator.print_detailed_results(
+ is_thing=is_thing, print_digits=print_digits)
+ return metric_aggregator.detailed_results(is_thing=is_thing)
+
+
+def main(argv):
+ if len(argv) > 1:
+ raise app.UsageError('Too many command-line arguments.')
+
+ eval_coco_format(FLAGS.gt_json_file, FLAGS.pred_json_file, FLAGS.gt_folder,
+ FLAGS.pred_folder, FLAGS.metric, FLAGS.num_categories,
+ FLAGS.ignored_label, FLAGS.max_instances_per_category,
+ FLAGS.intersection_offset, FLAGS.normalize_by_image_size,
+ FLAGS.num_workers, FLAGS.print_digits)
+
+
+if __name__ == '__main__':
+ flags.mark_flags_as_required(
+ ['gt_json_file', 'gt_folder', 'pred_json_file', 'pred_folder'])
+ app.run(main)
diff --git a/deeplab/models/research/deeplab/evaluation/eval_coco_format_test.py b/deeplab/models/research/deeplab/evaluation/eval_coco_format_test.py
new file mode 100644
index 0000000..d9093ff
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/eval_coco_format_test.py
@@ -0,0 +1,140 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for eval_coco_format script."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+
+from absl import flags
+from absl.testing import absltest
+import evaluation as panopticapi_eval
+
+from deeplab.evaluation import eval_coco_format
+
+_TEST_DIR = 'deeplab/evaluation/testdata'
+
+FLAGS = flags.FLAGS
+
+
+class EvalCocoFormatTest(absltest.TestCase):
+
+ def test_compare_pq_with_reference_eval(self):
+ sample_data_dir = os.path.join(_TEST_DIR)
+ gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
+ gt_folder = os.path.join(sample_data_dir, 'coco_gt')
+ pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
+ pred_folder = os.path.join(sample_data_dir, 'coco_pred')
+
+ panopticapi_results = panopticapi_eval.pq_compute(
+ gt_json_file, pred_json_file, gt_folder, pred_folder)
+ deeplab_results = eval_coco_format.eval_coco_format(
+ gt_json_file,
+ pred_json_file,
+ gt_folder,
+ pred_folder,
+ metric='pq',
+ num_categories=7,
+ ignored_label=0,
+ max_instances_per_category=256,
+ intersection_offset=(256 * 256))
+ self.assertCountEqual(
+ list(deeplab_results.keys()), ['All', 'Things', 'Stuff'])
+ for cat_group in ['All', 'Things', 'Stuff']:
+ self.assertCountEqual(deeplab_results[cat_group], ['pq', 'sq', 'rq', 'n'])
+ for metric in ['pq', 'sq', 'rq', 'n']:
+ self.assertAlmostEqual(deeplab_results[cat_group][metric],
+ panopticapi_results[cat_group][metric])
+
+ def test_compare_pc_with_golden_value(self):
+ sample_data_dir = os.path.join(_TEST_DIR)
+ gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
+ gt_folder = os.path.join(sample_data_dir, 'coco_gt')
+ pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
+ pred_folder = os.path.join(sample_data_dir, 'coco_pred')
+
+ deeplab_results = eval_coco_format.eval_coco_format(
+ gt_json_file,
+ pred_json_file,
+ gt_folder,
+ pred_folder,
+ metric='pc',
+ num_categories=7,
+ ignored_label=0,
+ max_instances_per_category=256,
+ intersection_offset=(256 * 256),
+ normalize_by_image_size=False)
+ self.assertCountEqual(
+ list(deeplab_results.keys()), ['All', 'Things', 'Stuff'])
+ for cat_group in ['All', 'Things', 'Stuff']:
+ self.assertCountEqual(deeplab_results[cat_group], ['pc', 'n'])
+ self.assertAlmostEqual(deeplab_results['All']['pc'], 0.68210561)
+ self.assertEqual(deeplab_results['All']['n'], 6)
+ self.assertAlmostEqual(deeplab_results['Things']['pc'], 0.5890529)
+ self.assertEqual(deeplab_results['Things']['n'], 4)
+ self.assertAlmostEqual(deeplab_results['Stuff']['pc'], 0.86821097)
+ self.assertEqual(deeplab_results['Stuff']['n'], 2)
+
+ def test_compare_pc_with_golden_value_normalize_by_size(self):
+ sample_data_dir = os.path.join(_TEST_DIR)
+ gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
+ gt_folder = os.path.join(sample_data_dir, 'coco_gt')
+ pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
+ pred_folder = os.path.join(sample_data_dir, 'coco_pred')
+
+ deeplab_results = eval_coco_format.eval_coco_format(
+ gt_json_file,
+ pred_json_file,
+ gt_folder,
+ pred_folder,
+ metric='pc',
+ num_categories=7,
+ ignored_label=0,
+ max_instances_per_category=256,
+ intersection_offset=(256 * 256),
+ normalize_by_image_size=True)
+ self.assertCountEqual(
+ list(deeplab_results.keys()), ['All', 'Things', 'Stuff'])
+ self.assertAlmostEqual(deeplab_results['All']['pc'], 0.68214908840)
+
+ def test_pc_with_multiple_workers(self):
+ sample_data_dir = os.path.join(_TEST_DIR)
+ gt_json_file = os.path.join(sample_data_dir, 'coco_gt.json')
+ gt_folder = os.path.join(sample_data_dir, 'coco_gt')
+ pred_json_file = os.path.join(sample_data_dir, 'coco_pred.json')
+ pred_folder = os.path.join(sample_data_dir, 'coco_pred')
+
+ deeplab_results = eval_coco_format.eval_coco_format(
+ gt_json_file,
+ pred_json_file,
+ gt_folder,
+ pred_folder,
+ metric='pc',
+ num_categories=7,
+ ignored_label=0,
+ max_instances_per_category=256,
+ intersection_offset=(256 * 256),
+ num_workers=3,
+ normalize_by_image_size=False)
+ self.assertCountEqual(
+ list(deeplab_results.keys()), ['All', 'Things', 'Stuff'])
+ self.assertAlmostEqual(deeplab_results['All']['pc'], 0.68210561668)
+
+
+if __name__ == '__main__':
+ absltest.main()
diff --git a/deeplab/models/research/deeplab/evaluation/g3doc/img/equation_pc.png b/deeplab/models/research/deeplab/evaluation/g3doc/img/equation_pc.png
new file mode 100644
index 0000000..90f15e7
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/g3doc/img/equation_pc.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/g3doc/img/equation_pq.png b/deeplab/models/research/deeplab/evaluation/g3doc/img/equation_pq.png
new file mode 100644
index 0000000..13a4393
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/g3doc/img/equation_pq.png differ
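
For readers viewing this diff as text, the two equation images added above cannot be rendered inline. Written out from the metric code added in this change (and the papers it cites), the definitions they are assumed to illustrate are:

```latex
% Panoptic Quality (Kirillov et al.), as computed in panoptic_quality.py:
% SQ averages the IoU of matched (TP) segment pairs, RQ is an F1-style
% detection score, and PQ is their product.
PQ = \underbrace{\frac{\sum_{(g,p) \in TP} \mathrm{IoU}(g,p)}{|TP|}}_{SQ}
     \times
     \underbrace{\frac{|TP|}{|TP| + \tfrac{1}{2}|FP| + \tfrac{1}{2}|FN|}}_{RQ}

% Parsing Covering (Yang et al.), as computed in parsing_covering.py:
% S_c and S'_c are the groundtruth and predicted regions of class c.
SC(c) = \frac{\sum_{R \in S_c} |R| \cdot \max_{R' \in S'_c} \mathrm{IoU}(R, R')}
             {\sum_{R \in S_c} |R|},
\qquad
PC = \frac{1}{C} \sum_{c=1}^{C} SC(c)
```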
diff --git a/deeplab/models/research/deeplab/evaluation/panoptic_quality.py b/deeplab/models/research/deeplab/evaluation/panoptic_quality.py
new file mode 100644
index 0000000..f7d0f3f
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/panoptic_quality.py
@@ -0,0 +1,259 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Implementation of the Panoptic Quality metric.
+
+Panoptic Quality is an instance-based metric for evaluating the task of
+image parsing, aka panoptic segmentation.
+
+Please see the paper for details:
+"Panoptic Segmentation", Alexander Kirillov, Kaiming He, Ross Girshick,
+Carsten Rother and Piotr Dollar. arXiv:1801.00868, 2018.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import collections
+import numpy as np
+import prettytable
+import six
+
+from deeplab.evaluation import base_metric
+
+
+def _ids_to_counts(id_array):
+ """Given a numpy array, a mapping from each unique entry to its count."""
+ ids, counts = np.unique(id_array, return_counts=True)
+ return dict(six.moves.zip(ids, counts))
+
+
+class PanopticQuality(base_metric.SegmentationMetric):
+ """Metric class for Panoptic Quality.
+
+ "Panoptic Segmentation" by Alexander Kirillov, Kaiming He, Ross Girshick,
+ Carsten Rother, Piotr Dollar.
+ https://arxiv.org/abs/1801.00868
+ """
+
+ def compare_and_accumulate(
+ self, groundtruth_category_array, groundtruth_instance_array,
+ predicted_category_array, predicted_instance_array):
+ """See base class."""
+ # First, combine the category and instance labels so that every unique
+ # value for (category, instance) is assigned a unique integer label.
+ pred_segment_id = self._naively_combine_labels(predicted_category_array,
+ predicted_instance_array)
+ gt_segment_id = self._naively_combine_labels(groundtruth_category_array,
+ groundtruth_instance_array)
+
+ # Pre-calculate areas for all groundtruth and predicted segments.
+ gt_segment_areas = _ids_to_counts(gt_segment_id)
+ pred_segment_areas = _ids_to_counts(pred_segment_id)
+
+ # We assume there is only one void segment and it has instance id = 0.
+ void_segment_id = self.ignored_label * self.max_instances_per_category
+
+ # There may be other ignored groundtruth segments with instance id > 0; find
+ # those ids using the unique segment ids extracted with the area computation
+ # above.
+ ignored_segment_ids = {
+ gt_segment_id for gt_segment_id in six.iterkeys(gt_segment_areas)
+ if (gt_segment_id //
+ self.max_instances_per_category) == self.ignored_label
+ }
+
+ # Next, combine the groundtruth and predicted labels. Dividing up the pixels
+ # based on which groundtruth segment and which predicted segment they belong
+ # to, this will assign a different 32-bit integer label to each choice
+ # of (groundtruth segment, predicted segment), encoded as
+ # gt_segment_id * offset + pred_segment_id.
+ intersection_id_array = (
+ gt_segment_id.astype(np.uint32) * self.offset +
+ pred_segment_id.astype(np.uint32))
+
+ # For every combination of (groundtruth segment, predicted segment) with a
+ # non-empty intersection, this counts the number of pixels in that
+ # intersection.
+ intersection_areas = _ids_to_counts(intersection_id_array)
+
+ # Helper function that computes the area of the overlap between a predicted
+ # segment and the ground-truth void/ignored segment.
+ def prediction_void_overlap(pred_segment_id):
+ void_intersection_id = void_segment_id * self.offset + pred_segment_id
+ return intersection_areas.get(void_intersection_id, 0)
+
+ # Compute overall ignored overlap.
+ def prediction_ignored_overlap(pred_segment_id):
+ total_ignored_overlap = 0
+ for ignored_segment_id in ignored_segment_ids:
+ intersection_id = ignored_segment_id * self.offset + pred_segment_id
+ total_ignored_overlap += intersection_areas.get(intersection_id, 0)
+ return total_ignored_overlap
+
+ # Sets tracking which groundtruth/predicted segments have been matched with
+ # an overlapping predicted/groundtruth segment, respectively.
+ gt_matched = set()
+ pred_matched = set()
+
+ # Calculate IoU per pair of intersecting segments of the same category.
+ for intersection_id, intersection_area in six.iteritems(intersection_areas):
+ gt_segment_id = intersection_id // self.offset
+ pred_segment_id = intersection_id % self.offset
+
+ gt_category = gt_segment_id // self.max_instances_per_category
+ pred_category = pred_segment_id // self.max_instances_per_category
+ if gt_category != pred_category:
+ continue
+
+ # Union between the groundtruth and predicted segments being compared does
+ # not include the portion of the predicted segment that consists of
+ # groundtruth "void" pixels.
+ union = (
+ gt_segment_areas[gt_segment_id] +
+ pred_segment_areas[pred_segment_id] - intersection_area -
+ prediction_void_overlap(pred_segment_id))
+ iou = intersection_area / union
+ if iou > 0.5:
+ self.tp_per_class[gt_category] += 1
+ self.iou_per_class[gt_category] += iou
+ gt_matched.add(gt_segment_id)
+ pred_matched.add(pred_segment_id)
+
+ # Count false negatives for each category.
+ for gt_segment_id in six.iterkeys(gt_segment_areas):
+ if gt_segment_id in gt_matched:
+ continue
+ category = gt_segment_id // self.max_instances_per_category
+ # Failing to detect a void segment is not a false negative.
+ if category == self.ignored_label:
+ continue
+ self.fn_per_class[category] += 1
+
+ # Count false positives for each category.
+ for pred_segment_id in six.iterkeys(pred_segment_areas):
+ if pred_segment_id in pred_matched:
+ continue
+ # A false positive is not penalized if it is mostly ignored in the
+ # groundtruth.
+ if (prediction_ignored_overlap(pred_segment_id) /
+ pred_segment_areas[pred_segment_id]) > 0.5:
+ continue
+ category = pred_segment_id // self.max_instances_per_category
+ self.fp_per_class[category] += 1
+
+ return self.result()
+
+ def _valid_categories(self):
+ """Categories with a "valid" value for the metric, have > 0 instances.
+
+ We will ignore the `ignore_label` class and other classes which have
+ `tp + fn + fp = 0`.
+
+ Returns:
+ Boolean array of shape `[num_categories]`.
+ """
+ valid_categories = np.not_equal(
+ self.tp_per_class + self.fn_per_class + self.fp_per_class, 0)
+ if self.ignored_label >= 0 and self.ignored_label < self.num_categories:
+ valid_categories[self.ignored_label] = False
+ return valid_categories
+
+ def detailed_results(self, is_thing=None):
+ """See base class."""
+ valid_categories = self._valid_categories()
+
+ # If known, break down which categories are valid _and_ things/stuff.
+ category_sets = collections.OrderedDict()
+ category_sets['All'] = valid_categories
+ if is_thing is not None:
+ category_sets['Things'] = np.logical_and(valid_categories, is_thing)
+ category_sets['Stuff'] = np.logical_and(valid_categories,
+ np.logical_not(is_thing))
+
+ # Compute individual per-class metrics that constitute factors of PQ.
+ sq = base_metric.realdiv_maybe_zero(self.iou_per_class, self.tp_per_class)
+ rq = base_metric.realdiv_maybe_zero(
+ self.tp_per_class,
+ self.tp_per_class + 0.5 * self.fn_per_class + 0.5 * self.fp_per_class)
+ pq = np.multiply(sq, rq)
+
+ # Assemble detailed results dictionary.
+ results = {}
+ for category_set_name, in_category_set in six.iteritems(category_sets):
+ if np.any(in_category_set):
+ results[category_set_name] = {
+ 'pq': np.mean(pq[in_category_set]),
+ 'sq': np.mean(sq[in_category_set]),
+ 'rq': np.mean(rq[in_category_set]),
+ # The number of categories in this subset.
+ 'n': np.sum(in_category_set.astype(np.int32)),
+ }
+ else:
+ results[category_set_name] = {'pq': 0, 'sq': 0, 'rq': 0, 'n': 0}
+
+ return results
+
+ def result_per_category(self):
+ """See base class."""
+ sq = base_metric.realdiv_maybe_zero(self.iou_per_class, self.tp_per_class)
+ rq = base_metric.realdiv_maybe_zero(
+ self.tp_per_class,
+ self.tp_per_class + 0.5 * self.fn_per_class + 0.5 * self.fp_per_class)
+ return np.multiply(sq, rq)
+
+ def print_detailed_results(self, is_thing=None, print_digits=3):
+ """See base class."""
+ results = self.detailed_results(is_thing=is_thing)
+
+ tab = prettytable.PrettyTable()
+
+ tab.add_column('', [], align='l')
+ for fieldname in ['PQ', 'SQ', 'RQ', 'N']:
+ tab.add_column(fieldname, [], align='r')
+
+ for category_set, subset_results in six.iteritems(results):
+ data_cols = [
+ round(subset_results[col_key], print_digits) * 100
+ for col_key in ['pq', 'sq', 'rq']
+ ]
+ data_cols += [subset_results['n']]
+ tab.add_row([category_set] + data_cols)
+
+ print(tab)
+
+ def result(self):
+ """See base class."""
+ pq_per_class = self.result_per_category()
+ valid_categories = self._valid_categories()
+ if not np.any(valid_categories):
+ return 0.
+ return np.mean(pq_per_class[valid_categories])
+
+ def merge(self, other_instance):
+ """See base class."""
+ self.iou_per_class += other_instance.iou_per_class
+ self.tp_per_class += other_instance.tp_per_class
+ self.fn_per_class += other_instance.fn_per_class
+ self.fp_per_class += other_instance.fp_per_class
+
+ def reset(self):
+ """See base class."""
+ self.iou_per_class = np.zeros(self.num_categories, dtype=np.float64)
+ self.tp_per_class = np.zeros(self.num_categories, dtype=np.float64)
+ self.fn_per_class = np.zeros(self.num_categories, dtype=np.float64)
+ self.fp_per_class = np.zeros(self.num_categories, dtype=np.float64)
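
A minimal sketch (not part of the diff) of exercising `PanopticQuality` directly on tiny label maps, mirroring `test_perfect_match` in the test file below. Segment ids are formed as `category * max_instances_per_category + instance`, and intersections as `gt_segment_id * offset + pred_segment_id`, which is why `offset` must be at least `num_categories * max_instances_per_category`.

```python
# A tiny perfect-match example: every pixel is category 0, instance 1, so the
# single groundtruth segment matches the single predicted segment with IoU 1.
import numpy as np
from deeplab.evaluation import panoptic_quality

categories = np.zeros([6, 6], np.uint16)
instances = np.ones([6, 6], np.uint16)

pq = panoptic_quality.PanopticQuality(
    num_categories=1,
    ignored_label=2,               # no pixel carries the ignored label here
    max_instances_per_category=16,
    offset=16)
print(pq.compare_and_accumulate(categories, instances, categories, instances))
# 1.0: SQ = 1 (mean matched IoU) and RQ = 1 (no FPs or FNs), so PQ = 1.
```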
diff --git a/deeplab/models/research/deeplab/evaluation/panoptic_quality_test.py b/deeplab/models/research/deeplab/evaluation/panoptic_quality_test.py
new file mode 100644
index 0000000..00c88c2
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/panoptic_quality_test.py
@@ -0,0 +1,336 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for Panoptic Quality metric."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+
+from absl.testing import absltest
+import numpy as np
+import six
+
+from deeplab.evaluation import panoptic_quality
+from deeplab.evaluation import test_utils
+
+# See the definition of the color names at:
+# https://en.wikipedia.org/wiki/Web_colors.
+_CLASS_COLOR_MAP = {
+ (0, 0, 0): 0,
+ (0, 0, 255): 1, # Person (blue).
+ (255, 0, 0): 2, # Bear (red).
+ (0, 255, 0): 3, # Tree (lime).
+ (255, 0, 255): 4, # Bird (fuchsia).
+ (0, 255, 255): 5, # Sky (aqua).
+ (255, 255, 0): 6, # Cat (yellow).
+}
+
+
+class PanopticQualityTest(absltest.TestCase):
+
+ def test_perfect_match(self):
+ categories = np.zeros([6, 6], np.uint16)
+ instances = np.array([
+ [1, 1, 1, 1, 1, 1],
+ [1, 2, 2, 2, 2, 1],
+ [1, 2, 2, 2, 2, 1],
+ [1, 2, 2, 2, 2, 1],
+ [1, 2, 2, 1, 1, 1],
+ [1, 2, 1, 1, 1, 1],
+ ],
+ dtype=np.uint16)
+
+ pq = panoptic_quality.PanopticQuality(
+ num_categories=1,
+ ignored_label=2,
+ max_instances_per_category=16,
+ offset=16)
+ pq.compare_and_accumulate(categories, instances, categories, instances)
+ np.testing.assert_array_equal(pq.iou_per_class, [2.0])
+ np.testing.assert_array_equal(pq.tp_per_class, [2])
+ np.testing.assert_array_equal(pq.fn_per_class, [0])
+ np.testing.assert_array_equal(pq.fp_per_class, [0])
+ np.testing.assert_array_equal(pq.result_per_category(), [1.0])
+ self.assertEqual(pq.result(), 1.0)
+
+ def test_totally_wrong(self):
+ det_categories = np.array([
+ [0, 0, 0, 0, 0, 0],
+ [0, 1, 0, 0, 1, 0],
+ [0, 1, 1, 1, 1, 0],
+ [0, 1, 1, 1, 1, 0],
+ [0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0],
+ ],
+ dtype=np.uint16)
+ gt_categories = 1 - det_categories
+ instances = np.zeros([6, 6], np.uint16)
+
+ pq = panoptic_quality.PanopticQuality(
+ num_categories=2,
+ ignored_label=2,
+ max_instances_per_category=1,
+ offset=16)
+ pq.compare_and_accumulate(gt_categories, instances, det_categories,
+ instances)
+ np.testing.assert_array_equal(pq.iou_per_class, [0.0, 0.0])
+ np.testing.assert_array_equal(pq.tp_per_class, [0, 0])
+ np.testing.assert_array_equal(pq.fn_per_class, [1, 1])
+ np.testing.assert_array_equal(pq.fp_per_class, [1, 1])
+ np.testing.assert_array_equal(pq.result_per_category(), [0.0, 0.0])
+ self.assertEqual(pq.result(), 0.0)
+
+ def test_matches_by_iou(self):
+ good_det_labels = np.array(
+ [
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 2, 2, 2, 2, 1],
+ [1, 2, 2, 2, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ ],
+ dtype=np.uint16)
+ gt_labels = np.array(
+ [
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 2, 2, 2, 1],
+ [1, 2, 2, 2, 2, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ ],
+ dtype=np.uint16)
+
+ pq = panoptic_quality.PanopticQuality(
+ num_categories=1,
+ ignored_label=2,
+ max_instances_per_category=16,
+ offset=16)
+ pq.compare_and_accumulate(
+ np.zeros_like(gt_labels), gt_labels, np.zeros_like(good_det_labels),
+ good_det_labels)
+
+ # iou(1, 1) = 28/30
+ # iou(2, 2) = 6/8
+ np.testing.assert_array_almost_equal(pq.iou_per_class, [28 / 30 + 6 / 8])
+ np.testing.assert_array_equal(pq.tp_per_class, [2])
+ np.testing.assert_array_equal(pq.fn_per_class, [0])
+ np.testing.assert_array_equal(pq.fp_per_class, [0])
+ self.assertAlmostEqual(pq.result(), (28 / 30 + 6 / 8) / 2)
+
+ bad_det_labels = np.array(
+ [
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 2, 2, 1],
+ [1, 1, 1, 2, 2, 1],
+ [1, 1, 1, 2, 2, 1],
+ [1, 1, 1, 1, 1, 1],
+ ],
+ dtype=np.uint16)
+
+ pq.reset()
+ pq.compare_and_accumulate(
+ np.zeros_like(gt_labels), gt_labels, np.zeros_like(bad_det_labels),
+ bad_det_labels)
+
+ # iou(1, 1) = 27/32
+ np.testing.assert_array_almost_equal(pq.iou_per_class, [27 / 32])
+ np.testing.assert_array_equal(pq.tp_per_class, [1])
+ np.testing.assert_array_equal(pq.fn_per_class, [1])
+ np.testing.assert_array_equal(pq.fp_per_class, [1])
+ self.assertAlmostEqual(pq.result(), (27 / 32) * (1 / 2))
+
+ def test_wrong_instances(self):
+ categories = np.array([
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 2, 2, 1, 2, 2],
+ [1, 2, 2, 1, 2, 2],
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ ],
+ dtype=np.uint16)
+ predicted_instances = np.array([
+ [0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 1, 1],
+ [0, 0, 0, 0, 1, 1],
+ [0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0],
+ ],
+ dtype=np.uint16)
+ groundtruth_instances = np.zeros([6, 6], dtype=np.uint16)
+
+ pq = panoptic_quality.PanopticQuality(
+ num_categories=3,
+ ignored_label=0,
+ max_instances_per_category=10,
+ offset=100)
+ pq.compare_and_accumulate(categories, groundtruth_instances, categories,
+ predicted_instances)
+
+ np.testing.assert_array_equal(pq.iou_per_class, [0.0, 1.0, 0.0])
+ np.testing.assert_array_equal(pq.tp_per_class, [0, 1, 0])
+ np.testing.assert_array_equal(pq.fn_per_class, [0, 0, 1])
+ np.testing.assert_array_equal(pq.fp_per_class, [0, 0, 2])
+ np.testing.assert_array_equal(pq.result_per_category(), [0, 1, 0])
+ self.assertAlmostEqual(pq.result(), 0.5)
+
+ def test_instance_order_is_arbitrary(self):
+ categories = np.array([
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 2, 2, 1, 2, 2],
+ [1, 2, 2, 1, 2, 2],
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ ],
+ dtype=np.uint16)
+ predicted_instances = np.array([
+ [0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 1, 1],
+ [0, 0, 0, 0, 1, 1],
+ [0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0],
+ ],
+ dtype=np.uint16)
+ groundtruth_instances = np.array([
+ [0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0],
+ [0, 1, 1, 0, 0, 0],
+ [0, 1, 1, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0],
+ ],
+ dtype=np.uint16)
+
+ pq = panoptic_quality.PanopticQuality(
+ num_categories=3,
+ ignored_label=0,
+ max_instances_per_category=10,
+ offset=100)
+ pq.compare_and_accumulate(categories, groundtruth_instances, categories,
+ predicted_instances)
+
+ np.testing.assert_array_equal(pq.iou_per_class, [0.0, 1.0, 2.0])
+ np.testing.assert_array_equal(pq.tp_per_class, [0, 1, 2])
+ np.testing.assert_array_equal(pq.fn_per_class, [0, 0, 0])
+ np.testing.assert_array_equal(pq.fp_per_class, [0, 0, 0])
+ np.testing.assert_array_equal(pq.result_per_category(), [0, 1, 1])
+ self.assertAlmostEqual(pq.result(), 1.0)
+
+ def test_matches_expected(self):
+ pred_classes = test_utils.read_segmentation_with_rgb_color_map(
+ 'team_pred_class.png', _CLASS_COLOR_MAP)
+ pred_instances = test_utils.read_test_image(
+ 'team_pred_instance.png', mode='L')
+
+ instance_class_map = {
+ 0: 0,
+ 47: 1,
+ 97: 1,
+ 133: 1,
+ 150: 1,
+ 174: 1,
+ 198: 2,
+ 215: 1,
+ 244: 1,
+ 255: 1,
+ }
+ gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
+ 'team_gt_instance.png', instance_class_map)
+
+ pq = panoptic_quality.PanopticQuality(
+ num_categories=3,
+ ignored_label=0,
+ max_instances_per_category=256,
+ offset=256 * 256)
+ pq.compare_and_accumulate(gt_classes, gt_instances, pred_classes,
+ pred_instances)
+ np.testing.assert_array_almost_equal(
+ pq.iou_per_class, [2.06104, 5.26827, 0.54069], decimal=4)
+ np.testing.assert_array_equal(pq.tp_per_class, [1, 7, 1])
+ np.testing.assert_array_equal(pq.fn_per_class, [0, 1, 0])
+ np.testing.assert_array_equal(pq.fp_per_class, [0, 0, 0])
+ np.testing.assert_array_almost_equal(pq.result_per_category(),
+ [2.061038, 0.702436, 0.54069])
+ self.assertAlmostEqual(pq.result(), 0.62156287)
+
+ def test_merge_accumulates_all_across_instances(self):
+ categories = np.zeros([6, 6], np.uint16)
+ good_det_labels = np.array([
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 2, 2, 2, 2, 1],
+ [1, 2, 2, 2, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ ],
+ dtype=np.uint16)
+ gt_labels = np.array([
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 2, 2, 2, 1],
+ [1, 2, 2, 2, 2, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ ],
+ dtype=np.uint16)
+
+ good_pq = panoptic_quality.PanopticQuality(
+ num_categories=1,
+ ignored_label=2,
+ max_instances_per_category=16,
+ offset=16)
+ for _ in six.moves.range(2):
+ good_pq.compare_and_accumulate(categories, gt_labels, categories,
+ good_det_labels)
+
+ bad_det_labels = np.array([
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 1, 1, 1],
+ [1, 1, 1, 2, 2, 1],
+ [1, 1, 1, 2, 2, 1],
+ [1, 1, 1, 2, 2, 1],
+ [1, 1, 1, 1, 1, 1],
+ ],
+ dtype=np.uint16)
+
+ bad_pq = panoptic_quality.PanopticQuality(
+ num_categories=1,
+ ignored_label=2,
+ max_instances_per_category=16,
+ offset=16)
+ for _ in six.moves.range(2):
+ bad_pq.compare_and_accumulate(categories, gt_labels, categories,
+ bad_det_labels)
+
+ good_pq.merge(bad_pq)
+
+ np.testing.assert_array_almost_equal(
+ good_pq.iou_per_class, [2 * (28 / 30 + 6 / 8) + 2 * (27 / 32)])
+ np.testing.assert_array_equal(good_pq.tp_per_class, [2 * 2 + 2])
+ np.testing.assert_array_equal(good_pq.fn_per_class, [2])
+ np.testing.assert_array_equal(good_pq.fp_per_class, [2])
+ self.assertAlmostEqual(good_pq.result(), 0.63177083)
+
+
+if __name__ == '__main__':
+ absltest.main()
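
As a quick sanity check on the golden value in `test_matches_by_iou` above (a worked step, not part of the test file): the two matched segments have IoUs 28/30 and 6/8 with no false positives or negatives, so

```latex
SQ = \frac{28/30 + 6/8}{2}, \qquad
RQ = \frac{2}{2 + \tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 0} = 1, \qquad
PQ = SQ \cdot RQ = \frac{28/30 + 6/8}{2} \approx 0.8417,
```

which is exactly the value asserted by `assertAlmostEqual(pq.result(), (28 / 30 + 6 / 8) / 2)`.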
diff --git a/deeplab/models/research/deeplab/evaluation/parsing_covering.py b/deeplab/models/research/deeplab/evaluation/parsing_covering.py
new file mode 100644
index 0000000..a40e55f
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/parsing_covering.py
@@ -0,0 +1,246 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Implementation of the Parsing Covering metric.
+
+Parsing Covering is a region-based metric for evaluating the task of
+image parsing, aka panoptic segmentation.
+
+Please see the paper for details:
+"DeeperLab: Single-Shot Image Parser", Tien-Ju Yang, Maxwell D. Collins,
+Yukun Zhu, Jyh-Jing Hwang, Ting Liu, Xiao Zhang, Vivienne Sze,
+George Papandreou, Liang-Chieh Chen. arXiv: 1902.05093, 2019.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import collections
+
+import numpy as np
+import prettytable
+import six
+
+from deeplab.evaluation import base_metric
+
+
+class ParsingCovering(base_metric.SegmentationMetric):
+ r"""Metric class for Parsing Covering.
+
+ Computes segmentation covering metric introduced in (Arbelaez, et al., 2010)
+ with extension to handle multi-class semantic labels (a.k.a. parsing
+ covering). Specifically, segmentation covering (SC) is defined in Eq. (8) in
+ (Arbelaez et al., 2010) as:
+
+ SC(c) = \sum_{R\in S}(|R| * \max_{R'\in S'}O(R,R')) / \sum_{R\in S}|R|,
+
+ where S are the groundtruth instance regions and S' are the predicted
+ instance regions. The parsing covering is simply:
+
+ PC = \sum_{c=1}^{C}SC(c) / C,
+
+ where C is the number of classes.
+ """
+
+ def __init__(self,
+ num_categories,
+ ignored_label,
+ max_instances_per_category,
+ offset,
+ normalize_by_image_size=True):
+ """Initialization for ParsingCovering.
+
+ Args:
+ num_categories: The number of segmentation categories (or "classes") in the
+ dataset.
+ ignored_label: A category id that is ignored in evaluation, e.g. the void
+ label as defined in COCO panoptic segmentation dataset.
+ max_instances_per_category: The maximum number of instances for each
+ category. Used in ensuring unique instance labels.
+ offset: The maximum number of unique labels. This is used, by multiplying
+ the ground-truth labels, to generate unique ids for individual regions
+ of overlap between groundtruth and predicted segments.
+ normalize_by_image_size: Whether to normalize groundtruth instance region
+ areas by image size. If True, groundtruth instance areas and weighted
+ IoUs will be divided by the size of the corresponding image before
+ accumulated across the dataset.
+ """
+ super(ParsingCovering, self).__init__(num_categories, ignored_label,
+ max_instances_per_category, offset)
+ self.normalize_by_image_size = normalize_by_image_size
+
+ def compare_and_accumulate(
+ self, groundtruth_category_array, groundtruth_instance_array,
+ predicted_category_array, predicted_instance_array):
+ """See base class."""
+ # Allocate intermediate data structures.
+ max_ious = np.zeros([self.num_categories, self.max_instances_per_category],
+ dtype=np.float64)
+ gt_areas = np.zeros([self.num_categories, self.max_instances_per_category],
+ dtype=np.float64)
+ pred_areas = np.zeros(
+ [self.num_categories, self.max_instances_per_category],
+ dtype=np.float64)
+ # This is a dictionary in the format:
+ # {(category, gt_instance): [(pred_instance, intersection_area)]}.
+ intersections = collections.defaultdict(list)
+
+ # First, combine the category and instance labels so that every unique
+ # value for (category, instance) is assigned a unique integer label.
+ pred_segment_id = self._naively_combine_labels(predicted_category_array,
+ predicted_instance_array)
+ gt_segment_id = self._naively_combine_labels(groundtruth_category_array,
+ groundtruth_instance_array)
+
+ # Next, combine the groundtruth and predicted labels. Dividing up the pixels
+ # based on which groundtruth segment and which predicted segment they belong
+ # to, this will assign a different 32-bit integer label to each choice
+ # of (groundtruth segment, predicted segment), encoded as
+ # gt_segment_id * offset + pred_segment_id.
+ intersection_id_array = (
+ gt_segment_id.astype(np.uint32) * self.offset +
+ pred_segment_id.astype(np.uint32))
+
+ # For every combination of (groundtruth segment, predicted segment) with a
+ # non-empty intersection, this counts the number of pixels in that
+ # intersection.
+ intersection_ids, intersection_areas = np.unique(
+ intersection_id_array, return_counts=True)
+
+ # Find areas of all groundtruth and predicted instances, as well as of their
+ # intersections.
+ for intersection_id, intersection_area in six.moves.zip(
+ intersection_ids, intersection_areas):
+ gt_segment_id = intersection_id // self.offset
+ gt_category = gt_segment_id // self.max_instances_per_category
+ if gt_category == self.ignored_label:
+ continue
+ gt_instance = gt_segment_id % self.max_instances_per_category
+ gt_areas[gt_category, gt_instance] += intersection_area
+
+ pred_segment_id = intersection_id % self.offset
+ pred_category = pred_segment_id // self.max_instances_per_category
+ pred_instance = pred_segment_id % self.max_instances_per_category
+ pred_areas[pred_category, pred_instance] += intersection_area
+ if pred_category != gt_category:
+ continue
+
+ intersections[gt_category, gt_instance].append((pred_instance,
+ intersection_area))
+
+ # Find maximum IoU for every groundtruth instance.
+ for gt_label, instance_intersections in six.iteritems(intersections):
+ category, gt_instance = gt_label
+ gt_area = gt_areas[category, gt_instance]
+ ious = []
+ for pred_instance, intersection_area in instance_intersections:
+ pred_area = pred_areas[category, pred_instance]
+ union = gt_area + pred_area - intersection_area
+ ious.append(intersection_area / union)
+ max_ious[category, gt_instance] = max(ious)
+
+ # Normalize groundtruth instance areas by image size if necessary.
+ if self.normalize_by_image_size:
+ gt_areas /= groundtruth_category_array.size
+
+ # Compute per-class weighted IoUs and areas summed over all groundtruth
+ # instances.
+ self.weighted_iou_per_class += np.sum(max_ious * gt_areas, axis=-1)
+ self.gt_area_per_class += np.sum(gt_areas, axis=-1)
+
+ return self.result()
+
+ def result_per_category(self):
+ """See base class."""
+ return base_metric.realdiv_maybe_zero(self.weighted_iou_per_class,
+ self.gt_area_per_class)
+
+ def _valid_categories(self):
+ """Categories with a "valid" value for the metric, have > 0 instances.
+
+ We will ignore the `ignore_label` class and other classes which have
+ groundtruth area of 0.
+
+ Returns:
+ Boolean array of shape `[num_categories]`.
+ """
+ valid_categories = np.not_equal(self.gt_area_per_class, 0)
+ if self.ignored_label >= 0 and self.ignored_label < self.num_categories:
+ valid_categories[self.ignored_label] = False
+ return valid_categories
+
+ def detailed_results(self, is_thing=None):
+ """See base class."""
+ valid_categories = self._valid_categories()
+
+ # If known, break down which categories are valid _and_ things/stuff.
+ category_sets = collections.OrderedDict()
+ category_sets['All'] = valid_categories
+ if is_thing is not None:
+ category_sets['Things'] = np.logical_and(valid_categories, is_thing)
+ category_sets['Stuff'] = np.logical_and(valid_categories,
+ np.logical_not(is_thing))
+
+ covering_per_class = self.result_per_category()
+ results = {}
+ for category_set_name, in_category_set in six.iteritems(category_sets):
+ if np.any(in_category_set):
+ results[category_set_name] = {
+ 'pc': np.mean(covering_per_class[in_category_set]),
+ # The number of valid categories in this subset.
+ 'n': np.sum(in_category_set.astype(np.int32)),
+ }
+ else:
+ results[category_set_name] = {'pc': 0, 'n': 0}
+
+ return results
+
+ def print_detailed_results(self, is_thing=None, print_digits=3):
+ """See base class."""
+ results = self.detailed_results(is_thing=is_thing)
+
+ tab = prettytable.PrettyTable()
+
+ tab.add_column('', [], align='l')
+ for fieldname in ['PC', 'N']:
+ tab.add_column(fieldname, [], align='r')
+
+ for category_set, subset_results in six.iteritems(results):
+ data_cols = [
+ round(subset_results['pc'], print_digits) * 100, subset_results['n']
+ ]
+ tab.add_row([category_set] + data_cols)
+
+ print(tab)
+
+ def result(self):
+ """See base class."""
+ covering_per_class = self.result_per_category()
+ valid_categories = self._valid_categories()
+ if not np.any(valid_categories):
+ return 0.
+ return np.mean(covering_per_class[valid_categories])
+
+ def merge(self, other_instance):
+ """See base class."""
+ self.weighted_iou_per_class += other_instance.weighted_iou_per_class
+ self.gt_area_per_class += other_instance.gt_area_per_class
+
+ def reset(self):
+ """See base class."""
+ self.weighted_iou_per_class = np.zeros(
+ self.num_categories, dtype=np.float64)
+ self.gt_area_per_class = np.zeros(self.num_categories, dtype=np.float64)
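
A minimal sketch (not part of the diff) of calling `ParsingCovering` directly; the array shapes and parameter values here are illustrative, chosen in the same spirit as the unit tests below.

```python
# A tiny perfect-coverage example: one category-1 region predicted exactly.
import numpy as np
from deeplab.evaluation import parsing_covering

categories = np.ones([4, 4], np.uint16)   # every pixel is category 1
instances = np.zeros([4, 4], np.uint16)   # a single instance per category

pc = parsing_covering.ParsingCovering(
    num_categories=2,                 # category 0 is the ignored/void label
    ignored_label=0,
    max_instances_per_category=16,
    offset=32,                        # at least num_categories * max_instances_per_category
    normalize_by_image_size=False)
print(pc.compare_and_accumulate(categories, instances, categories, instances))
# 1.0: the groundtruth region is fully covered by the matching prediction.
```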
diff --git a/deeplab/models/research/deeplab/evaluation/parsing_covering_test.py b/deeplab/models/research/deeplab/evaluation/parsing_covering_test.py
new file mode 100644
index 0000000..124d1b3
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/parsing_covering_test.py
@@ -0,0 +1,173 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for Parsing Covering metric."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+
+
+from absl.testing import absltest
+import numpy as np
+
+from deeplab.evaluation import parsing_covering
+from deeplab.evaluation import test_utils
+
+# See the definition of the color names at:
+# https://en.wikipedia.org/wiki/Web_colors.
+_CLASS_COLOR_MAP = {
+ (0, 0, 0): 0,
+ (0, 0, 255): 1, # Person (blue).
+ (255, 0, 0): 2, # Bear (red).
+ (0, 255, 0): 3, # Tree (lime).
+ (255, 0, 255): 4, # Bird (fuchsia).
+ (0, 255, 255): 5, # Sky (aqua).
+ (255, 255, 0): 6, # Cat (yellow).
+}
+
+
+class ParsingCoveringTest(absltest.TestCase):
+
+ def test_perfect_match(self):
+ categories = np.zeros([6, 6], np.uint16)
+ instances = np.array([
+ [2, 2, 2, 2, 2, 2],
+ [2, 4, 4, 4, 4, 2],
+ [2, 4, 4, 4, 4, 2],
+ [2, 4, 4, 4, 4, 2],
+ [2, 4, 4, 2, 2, 2],
+ [2, 4, 2, 2, 2, 2],
+ ],
+ dtype=np.uint16)
+
+ pc = parsing_covering.ParsingCovering(
+ num_categories=3,
+ ignored_label=2,
+ max_instances_per_category=2,
+ offset=16,
+ normalize_by_image_size=False)
+ pc.compare_and_accumulate(categories, instances, categories, instances)
+ np.testing.assert_array_equal(pc.weighted_iou_per_class, [0.0, 21.0, 0.0])
+ np.testing.assert_array_equal(pc.gt_area_per_class, [0.0, 21.0, 0.0])
+ np.testing.assert_array_equal(pc.result_per_category(), [0.0, 1.0, 0.0])
+ self.assertEqual(pc.result(), 1.0)
+
+ def test_totally_wrong(self):
+ categories = np.zeros([6, 6], np.uint16)
+ gt_instances = np.array([
+ [0, 0, 0, 0, 0, 0],
+ [0, 1, 0, 0, 1, 0],
+ [0, 1, 1, 1, 1, 0],
+ [0, 1, 1, 1, 1, 0],
+ [0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0],
+ ],
+ dtype=np.uint16)
+ pred_instances = 1 - gt_instances
+
+ pc = parsing_covering.ParsingCovering(
+ num_categories=2,
+ ignored_label=0,
+ max_instances_per_category=1,
+ offset=16,
+ normalize_by_image_size=False)
+ pc.compare_and_accumulate(categories, gt_instances, categories,
+ pred_instances)
+ np.testing.assert_array_equal(pc.weighted_iou_per_class, [0.0, 0.0])
+ np.testing.assert_array_equal(pc.gt_area_per_class, [0.0, 10.0])
+ np.testing.assert_array_equal(pc.result_per_category(), [0.0, 0.0])
+ self.assertEqual(pc.result(), 0.0)
+
+ def test_matches_expected(self):
+ pred_classes = test_utils.read_segmentation_with_rgb_color_map(
+ 'team_pred_class.png', _CLASS_COLOR_MAP)
+ pred_instances = test_utils.read_test_image(
+ 'team_pred_instance.png', mode='L')
+
+ instance_class_map = {
+ 0: 0,
+ 47: 1,
+ 97: 1,
+ 133: 1,
+ 150: 1,
+ 174: 1,
+ 198: 2,
+ 215: 1,
+ 244: 1,
+ 255: 1,
+ }
+ gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
+ 'team_gt_instance.png', instance_class_map)
+
+ pc = parsing_covering.ParsingCovering(
+ num_categories=3,
+ ignored_label=0,
+ max_instances_per_category=256,
+ offset=256 * 256,
+ normalize_by_image_size=False)
+ pc.compare_and_accumulate(gt_classes, gt_instances, pred_classes,
+ pred_instances)
+ np.testing.assert_array_almost_equal(
+ pc.weighted_iou_per_class, [0.0, 39864.14634, 3136], decimal=4)
+ np.testing.assert_array_equal(pc.gt_area_per_class, [0.0, 56870, 5800])
+ np.testing.assert_array_almost_equal(
+ pc.result_per_category(), [0.0, 0.70097, 0.54069], decimal=4)
+ self.assertAlmostEqual(pc.result(), 0.6208296732)
+
+ def test_matches_expected_normalize_by_size(self):
+ pred_classes = test_utils.read_segmentation_with_rgb_color_map(
+ 'team_pred_class.png', _CLASS_COLOR_MAP)
+ pred_instances = test_utils.read_test_image(
+ 'team_pred_instance.png', mode='L')
+
+ instance_class_map = {
+ 0: 0,
+ 47: 1,
+ 97: 1,
+ 133: 1,
+ 150: 1,
+ 174: 1,
+ 198: 2,
+ 215: 1,
+ 244: 1,
+ 255: 1,
+ }
+ gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
+ 'team_gt_instance.png', instance_class_map)
+
+ pc = parsing_covering.ParsingCovering(
+ num_categories=3,
+ ignored_label=0,
+ max_instances_per_category=256,
+ offset=256 * 256,
+ normalize_by_image_size=True)
+ pc.compare_and_accumulate(gt_classes, gt_instances, pred_classes,
+ pred_instances)
+ np.testing.assert_array_almost_equal(
+ pc.weighted_iou_per_class, [0.0, 0.5002088756, 0.03935002196],
+ decimal=4)
+ np.testing.assert_array_almost_equal(
+ pc.gt_area_per_class, [0.0, 0.7135955832, 0.07277746408], decimal=4)
+ # Note that the per-category and overall PCs are identical to those without
+ # normalization in the previous test, because we only have a single image.
+ np.testing.assert_array_almost_equal(
+ pc.result_per_category(), [0.0, 0.70097, 0.54069], decimal=4)
+ self.assertAlmostEqual(pc.result(), 0.6208296732)
+
+
+if __name__ == '__main__':
+ absltest.main()
diff --git a/deeplab/models/research/deeplab/evaluation/streaming_metrics.py b/deeplab/models/research/deeplab/evaluation/streaming_metrics.py
new file mode 100644
index 0000000..8313792
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/streaming_metrics.py
@@ -0,0 +1,240 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Code to compute segmentation in a "streaming" pattern in Tensorflow.
+
+These aggregate the metric over examples of the evaluation set. Each example is
+assumed to be fed in as part of a stream, and the metric implementation accumulates
+across them.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf
+
+from deeplab.evaluation import panoptic_quality
+from deeplab.evaluation import parsing_covering
+
+_EPSILON = 1e-10
+
+
+def _realdiv_maybe_zero(x, y):
+ """Support tf.realdiv(x, y) where y may contain zeros."""
+ return tf.where(tf.less(y, _EPSILON), tf.zeros_like(x), tf.realdiv(x, y))
+
+
+def _running_total(value, shape, name=None):
+ """Maintains a running total of tensor `value` between calls."""
+ with tf.variable_scope(name, 'running_total', [value]):
+ total_var = tf.get_variable(
+ 'total',
+ shape,
+ value.dtype,
+ initializer=tf.zeros_initializer(),
+ trainable=False,
+ collections=[
+ tf.GraphKeys.LOCAL_VARIABLES, tf.GraphKeys.METRIC_VARIABLES
+ ])
+ updated_total = tf.assign_add(total_var, value, use_locking=True)
+
+ return total_var, updated_total
+
+
+def _panoptic_quality_helper(
+ groundtruth_category_array, groundtruth_instance_array,
+ predicted_category_array, predicted_instance_array, num_classes,
+ max_instances_per_category, ignored_label, offset):
+ """Helper function to compute panoptic quality."""
+ pq = panoptic_quality.PanopticQuality(num_classes, ignored_label,
+ max_instances_per_category, offset)
+ pq.compare_and_accumulate(groundtruth_category_array,
+ groundtruth_instance_array,
+ predicted_category_array, predicted_instance_array)
+ return pq.iou_per_class, pq.tp_per_class, pq.fn_per_class, pq.fp_per_class
+
+
+def streaming_panoptic_quality(groundtruth_categories,
+ groundtruth_instances,
+ predicted_categories,
+ predicted_instances,
+ num_classes,
+ max_instances_per_category,
+ ignored_label,
+ offset,
+ name=None):
+ """Aggregates the panoptic metric across calls with different input tensors.
+
+ See tf.metrics.* functions for comparable functionality and usage.
+
+ Args:
+ groundtruth_categories: A 2D uint16 tensor of groundtruth category labels.
+ groundtruth_instances: A 2D uint16 tensor of groundtruth instance labels.
+ predicted_categories: A 2D uint16 tensor of predicted category labels.
+ predicted_instances: A 2D uint16 tensor of predicted instance labels.
+ num_classes: Number of classes in the dataset as an integer.
+ max_instances_per_category: The maximum number of instances for each class
+ as an integer or integer tensor.
+ ignored_label: The class id to be ignored in evaluation as an integer or
+ integer tensor.
+ offset: The maximum number of unique labels as an integer or integer tensor.
+ name: An optional variable_scope name.
+
+ Returns:
+ qualities: A tensor of shape `[6, num_classes]`, where (1) panoptic quality,
+ (2) segmentation quality, (3) recognition quality, (4) total_tp,
+ (5) total_fn and (6) total_fp are saved in the respective rows.
+ update_ops: List of operations that update the running overall panoptic
+ quality.
+
+ Raises:
+ RuntimeError: If eager execution is enabled.
+ """
+ if tf.executing_eagerly():
+ raise RuntimeError('Cannot aggregate when eager execution is enabled.')
+
+ input_args = [
+ tf.convert_to_tensor(groundtruth_categories, tf.uint16),
+ tf.convert_to_tensor(groundtruth_instances, tf.uint16),
+ tf.convert_to_tensor(predicted_categories, tf.uint16),
+ tf.convert_to_tensor(predicted_instances, tf.uint16),
+ tf.convert_to_tensor(num_classes, tf.int32),
+ tf.convert_to_tensor(max_instances_per_category, tf.int32),
+ tf.convert_to_tensor(ignored_label, tf.int32),
+ tf.convert_to_tensor(offset, tf.int32),
+ ]
+ return_types = [
+ tf.float64,
+ tf.float64,
+ tf.float64,
+ tf.float64,
+ ]
+ with tf.variable_scope(name, 'streaming_panoptic_quality', input_args):
+ panoptic_results = tf.py_func(
+ _panoptic_quality_helper, input_args, return_types, stateful=False)
+ iou, tp, fn, fp = tuple(panoptic_results)
+
+ total_iou, updated_iou = _running_total(
+ iou, [num_classes], name='iou_total')
+ total_tp, updated_tp = _running_total(tp, [num_classes], name='tp_total')
+ total_fn, updated_fn = _running_total(fn, [num_classes], name='fn_total')
+ total_fp, updated_fp = _running_total(fp, [num_classes], name='fp_total')
+ update_ops = [updated_iou, updated_tp, updated_fn, updated_fp]
+
+ sq = _realdiv_maybe_zero(total_iou, total_tp)
+ rq = _realdiv_maybe_zero(total_tp,
+ total_tp + 0.5 * total_fn + 0.5 * total_fp)
+ pq = tf.multiply(sq, rq)
+ qualities = tf.stack([pq, sq, rq, total_tp, total_fn, total_fp], axis=0)
+ return qualities, update_ops
+
+
+def _parsing_covering_helper(
+ groundtruth_category_array, groundtruth_instance_array,
+ predicted_category_array, predicted_instance_array, num_classes,
+ max_instances_per_category, ignored_label, offset, normalize_by_image_size):
+ """Helper function to compute parsing covering."""
+ pc = parsing_covering.ParsingCovering(num_classes, ignored_label,
+ max_instances_per_category, offset,
+ normalize_by_image_size)
+ pc.compare_and_accumulate(groundtruth_category_array,
+ groundtruth_instance_array,
+ predicted_category_array, predicted_instance_array)
+ return pc.weighted_iou_per_class, pc.gt_area_per_class
+
+
+def streaming_parsing_covering(groundtruth_categories,
+ groundtruth_instances,
+ predicted_categories,
+ predicted_instances,
+ num_classes,
+ max_instances_per_category,
+ ignored_label,
+ offset,
+ normalize_by_image_size=True,
+ name=None):
+ """Aggregates the covering across calls with different input tensors.
+
+ See tf.metrics.* functions for comparable functionality and usage.
+
+ Args:
+ groundtruth_categories: A 2D uint16 tensor of groundtruth category labels.
+ groundtruth_instances: A 2D uint16 tensor of groundtruth instance labels.
+ predicted_categories: A 2D uint16 tensor of predicted category labels.
+ predicted_instances: A 2D uint16 tensor of predicted instance labels.
+ num_classes: Number of classes in the dataset as an integer.
+ max_instances_per_category: The maximum number of instances for each class
+ as an integer or integer tensor.
+ ignored_label: The class id to be ignored in evaluation as an integer or
+ integer tensor.
+ offset: The maximum number of unique labels as an integer or integer tensor.
+ normalize_by_image_size: Whether to normalize groundtruth region areas by
+ image size. If True, groundtruth instance areas and weighted IoUs will be
+ divided by the size of the corresponding image before accumulated across
+ the dataset.
+ name: An optional variable_scope name.
+
+ Returns:
+ coverings: A tensor of shape `[3, num_classes]`, where (1) per class
+ coverings, (2) per class sum of weighted IoUs, and (3) per class sum of
+ groundtruth region areas are saved in the respective rows.
+ update_ops: List of operations that update the running overall parsing
+ covering.
+
+ Raises:
+ RuntimeError: If eager execution is enabled.
+ """
+ if tf.executing_eagerly():
+ raise RuntimeError('Cannot aggregate when eager execution is enabled.')
+
+ input_args = [
+ tf.convert_to_tensor(groundtruth_categories, tf.uint16),
+ tf.convert_to_tensor(groundtruth_instances, tf.uint16),
+ tf.convert_to_tensor(predicted_categories, tf.uint16),
+ tf.convert_to_tensor(predicted_instances, tf.uint16),
+ tf.convert_to_tensor(num_classes, tf.int32),
+ tf.convert_to_tensor(max_instances_per_category, tf.int32),
+ tf.convert_to_tensor(ignored_label, tf.int32),
+ tf.convert_to_tensor(offset, tf.int32),
+ tf.convert_to_tensor(normalize_by_image_size, tf.bool),
+ ]
+ return_types = [
+ tf.float64,
+ tf.float64,
+ ]
+ with tf.variable_scope(name, 'streaming_parsing_covering', input_args):
+ covering_results = tf.py_func(
+ _parsing_covering_helper, input_args, return_types, stateful=False)
+ weighted_iou_per_class, gt_area_per_class = tuple(covering_results)
+
+ total_weighted_iou_per_class, updated_weighted_iou_per_class = (
+ _running_total(
+ weighted_iou_per_class, [num_classes],
+ name='weighted_iou_per_class_total'))
+ total_gt_area_per_class, updated_gt_area_per_class = _running_total(
+ gt_area_per_class, [num_classes], name='gt_area_per_class_total')
+
+ covering_per_class = _realdiv_maybe_zero(total_weighted_iou_per_class,
+ total_gt_area_per_class)
+ coverings = tf.stack([
+ covering_per_class,
+ total_weighted_iou_per_class,
+ total_gt_area_per_class,
+ ],
+ axis=0)
+ update_ops = [updated_weighted_iou_per_class, updated_gt_area_per_class]
+
+ return coverings, update_ops
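
A minimal sketch (not part of the diff) of wiring `streaming_panoptic_quality` into a TF1 graph-mode loop, following the same pattern as the tests below. The single synthetic example stands in for real per-image label maps; this will not run under eager execution, since the function raises `RuntimeError` in that case.

```python
import numpy as np
import tensorflow as tf

from deeplab.evaluation import streaming_metrics

# One synthetic example; in practice these would be real 2D uint16 label maps.
examples = [{
    'gt_class': np.ones([6, 6], np.uint16),
    'gt_instance': np.zeros([6, 6], np.uint16),
    'pred_class': np.ones([6, 6], np.uint16),
    'pred_instance': np.zeros([6, 6], np.uint16),
}]

gt_class = tf.placeholder(tf.uint16)
gt_instance = tf.placeholder(tf.uint16)
pred_class = tf.placeholder(tf.uint16)
pred_instance = tf.placeholder(tf.uint16)

qualities, update_ops = streaming_metrics.streaming_panoptic_quality(
    gt_class, gt_instance, pred_class, pred_instance,
    num_classes=3, max_instances_per_category=256,
    ignored_label=0, offset=256 * 256)
pq, sq, rq, total_tp, total_fn, total_fp = tf.unstack(qualities, 6, axis=0)

with tf.Session() as sess:
  # The running totals live in the local/metric variable collections.
  sess.run(tf.local_variables_initializer())
  for example in examples:
    sess.run(update_ops, feed_dict={
        gt_class: example['gt_class'],
        gt_instance: example['gt_instance'],
        pred_class: example['pred_class'],
        pred_instance: example['pred_instance'],
    })
  print(sess.run(pq))  # per-class PQ accumulated over the stream
```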
diff --git a/deeplab/models/research/deeplab/evaluation/streaming_metrics_test.py b/deeplab/models/research/deeplab/evaluation/streaming_metrics_test.py
new file mode 100644
index 0000000..656007e
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/streaming_metrics_test.py
@@ -0,0 +1,549 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for segmentation "streaming" metrics."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import collections
+
+
+
+import numpy as np
+import six
+import tensorflow as tf
+
+from deeplab.evaluation import streaming_metrics
+from deeplab.evaluation import test_utils
+
+# See the definition of the color names at:
+# https://en.wikipedia.org/wiki/Web_colors.
+_CLASS_COLOR_MAP = {
+ (0, 0, 0): 0,
+ (0, 0, 255): 1, # Person (blue).
+ (255, 0, 0): 2, # Bear (red).
+ (0, 255, 0): 3, # Tree (lime).
+ (255, 0, 255): 4, # Bird (fuchsia).
+ (0, 255, 255): 5, # Sky (aqua).
+ (255, 255, 0): 6, # Cat (yellow).
+}
+
+
+class StreamingPanopticQualityTest(tf.test.TestCase):
+
+ def test_streaming_metric_on_single_image(self):
+ offset = 256 * 256
+
+ instance_class_map = {
+ 0: 0,
+ 47: 1,
+ 97: 1,
+ 133: 1,
+ 150: 1,
+ 174: 1,
+ 198: 2,
+ 215: 1,
+ 244: 1,
+ 255: 1,
+ }
+ gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
+ 'team_gt_instance.png', instance_class_map)
+
+ pred_classes = test_utils.read_segmentation_with_rgb_color_map(
+ 'team_pred_class.png', _CLASS_COLOR_MAP)
+ pred_instances = test_utils.read_test_image(
+ 'team_pred_instance.png', mode='L')
+
+ gt_class_tensor = tf.placeholder(tf.uint16)
+ gt_instance_tensor = tf.placeholder(tf.uint16)
+ pred_class_tensor = tf.placeholder(tf.uint16)
+ pred_instance_tensor = tf.placeholder(tf.uint16)
+ qualities, update_pq = streaming_metrics.streaming_panoptic_quality(
+ gt_class_tensor,
+ gt_instance_tensor,
+ pred_class_tensor,
+ pred_instance_tensor,
+ num_classes=3,
+ max_instances_per_category=256,
+ ignored_label=0,
+ offset=offset)
+ pq, sq, rq, total_tp, total_fn, total_fp = tf.unstack(qualities, 6, axis=0)
+ feed_dict = {
+ gt_class_tensor: gt_classes,
+ gt_instance_tensor: gt_instances,
+ pred_class_tensor: pred_classes,
+ pred_instance_tensor: pred_instances
+ }
+
+ with self.session() as sess:
+ sess.run(tf.local_variables_initializer())
+ sess.run(update_pq, feed_dict=feed_dict)
+ (result_pq, result_sq, result_rq, result_total_tp, result_total_fn,
+ result_total_fp) = sess.run([pq, sq, rq, total_tp, total_fn, total_fp],
+ feed_dict=feed_dict)
+ np.testing.assert_array_almost_equal(
+ result_pq, [2.06104, 0.7024, 0.54069], decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_sq, [2.06104, 0.7526, 0.54069], decimal=4)
+ np.testing.assert_array_almost_equal(result_rq, [1., 0.9333, 1.], decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_total_tp, [1., 7., 1.], decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_total_fn, [0., 1., 0.], decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_total_fp, [0., 0., 0.], decimal=4)
+
+ def test_streaming_metric_on_multiple_images(self):
+ num_classes = 7
+ offset = 256 * 256
+
+ bird_gt_instance_class_map = {
+ 92: 5,
+ 176: 3,
+ 255: 4,
+ }
+ cat_gt_instance_class_map = {
+ 0: 0,
+ 255: 6,
+ }
+ team_gt_instance_class_map = {
+ 0: 0,
+ 47: 1,
+ 97: 1,
+ 133: 1,
+ 150: 1,
+ 174: 1,
+ 198: 2,
+ 215: 1,
+ 244: 1,
+ 255: 1,
+ }
+ test_image = collections.namedtuple(
+ 'TestImage',
+ ['gt_class_map', 'gt_path', 'pred_inst_path', 'pred_class_path'])
+ test_images = [
+ test_image(bird_gt_instance_class_map, 'bird_gt.png',
+ 'bird_pred_instance.png', 'bird_pred_class.png'),
+ test_image(cat_gt_instance_class_map, 'cat_gt.png',
+ 'cat_pred_instance.png', 'cat_pred_class.png'),
+ test_image(team_gt_instance_class_map, 'team_gt_instance.png',
+ 'team_pred_instance.png', 'team_pred_class.png'),
+ ]
+
+ gt_classes = []
+ gt_instances = []
+ pred_classes = []
+ pred_instances = []
+ for test_image in test_images:
+ (image_gt_instances,
+ image_gt_classes) = test_utils.panoptic_segmentation_with_class_map(
+ test_image.gt_path, test_image.gt_class_map)
+ gt_classes.append(image_gt_classes)
+ gt_instances.append(image_gt_instances)
+
+ pred_classes.append(
+ test_utils.read_segmentation_with_rgb_color_map(
+ test_image.pred_class_path, _CLASS_COLOR_MAP))
+ pred_instances.append(
+ test_utils.read_test_image(test_image.pred_inst_path, mode='L'))
+
+ gt_class_tensor = tf.placeholder(tf.uint16)
+ gt_instance_tensor = tf.placeholder(tf.uint16)
+ pred_class_tensor = tf.placeholder(tf.uint16)
+ pred_instance_tensor = tf.placeholder(tf.uint16)
+ qualities, update_pq = streaming_metrics.streaming_panoptic_quality(
+ gt_class_tensor,
+ gt_instance_tensor,
+ pred_class_tensor,
+ pred_instance_tensor,
+ num_classes=num_classes,
+ max_instances_per_category=256,
+ ignored_label=0,
+ offset=offset)
+ pq, sq, rq, total_tp, total_fn, total_fp = tf.unstack(qualities, 6, axis=0)
+ with self.session() as sess:
+ sess.run(tf.local_variables_initializer())
+ for pred_class, pred_instance, gt_class, gt_instance in six.moves.zip(
+ pred_classes, pred_instances, gt_classes, gt_instances):
+ sess.run(
+ update_pq,
+ feed_dict={
+ gt_class_tensor: gt_class,
+ gt_instance_tensor: gt_instance,
+ pred_class_tensor: pred_class,
+ pred_instance_tensor: pred_instance
+ })
+ (result_pq, result_sq, result_rq, result_total_tp, result_total_fn,
+ result_total_fp) = sess.run(
+ [pq, sq, rq, total_tp, total_fn, total_fp],
+ feed_dict={
+ gt_class_tensor: 0,
+ gt_instance_tensor: 0,
+ pred_class_tensor: 0,
+ pred_instance_tensor: 0
+ })
+ np.testing.assert_array_almost_equal(
+ result_pq,
+ [4.3107, 0.7024, 0.54069, 0.745353, 0.85768, 0.99107, 0.77410],
+ decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_sq, [5.3883, 0.7526, 0.5407, 0.7454, 0.8577, 0.9911, 0.7741],
+ decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_rq, [0.8, 0.9333, 1., 1., 1., 1., 1.], decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_total_tp, [2., 7., 1., 1., 1., 1., 1.], decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_total_fn, [0., 1., 0., 0., 0., 0., 0.], decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_total_fp, [1., 0., 0., 0., 0., 0., 0.], decimal=4)
+
+
+class StreamingParsingCoveringTest(tf.test.TestCase):
+
+ def test_streaming_metric_on_single_image(self):
+ offset = 256 * 256
+
+ instance_class_map = {
+ 0: 0,
+ 47: 1,
+ 97: 1,
+ 133: 1,
+ 150: 1,
+ 174: 1,
+ 198: 2,
+ 215: 1,
+ 244: 1,
+ 255: 1,
+ }
+ gt_instances, gt_classes = test_utils.panoptic_segmentation_with_class_map(
+ 'team_gt_instance.png', instance_class_map)
+
+ pred_classes = test_utils.read_segmentation_with_rgb_color_map(
+ 'team_pred_class.png', _CLASS_COLOR_MAP)
+ pred_instances = test_utils.read_test_image(
+ 'team_pred_instance.png', mode='L')
+
+ gt_class_tensor = tf.placeholder(tf.uint16)
+ gt_instance_tensor = tf.placeholder(tf.uint16)
+ pred_class_tensor = tf.placeholder(tf.uint16)
+ pred_instance_tensor = tf.placeholder(tf.uint16)
+ coverings, update_ops = streaming_metrics.streaming_parsing_covering(
+ gt_class_tensor,
+ gt_instance_tensor,
+ pred_class_tensor,
+ pred_instance_tensor,
+ num_classes=3,
+ max_instances_per_category=256,
+ ignored_label=0,
+ offset=offset,
+ normalize_by_image_size=False)
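+ # `coverings` stacks three per-class vectors: the covering score, the sum of
+ # IoUs weighted by ground-truth region size, and the total ground-truth area
+ # (covering = weighted IoU sum / ground-truth area).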
+ (per_class_coverings, per_class_weighted_ious, per_class_gt_areas) = (
+ tf.unstack(coverings, num=3, axis=0))
+ feed_dict = {
+ gt_class_tensor: gt_classes,
+ gt_instance_tensor: gt_instances,
+ pred_class_tensor: pred_classes,
+ pred_instance_tensor: pred_instances
+ }
+
+ with self.session() as sess:
+ sess.run(tf.local_variables_initializer())
+ sess.run(update_ops, feed_dict=feed_dict)
+ (result_per_class_coverings, result_per_class_weighted_ious,
+ result_per_class_gt_areas) = (
+ sess.run([
+ per_class_coverings,
+ per_class_weighted_ious,
+ per_class_gt_areas,
+ ],
+ feed_dict=feed_dict))
+
+ np.testing.assert_array_almost_equal(
+ result_per_class_coverings, [0.0, 0.7009696912, 0.5406896552],
+ decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_per_class_weighted_ious, [0.0, 39864.14634, 3136], decimal=4)
+ np.testing.assert_array_equal(result_per_class_gt_areas, [0, 56870, 5800])
+
+ def test_streaming_metric_on_multiple_images(self):
+ """Tests streaming parsing covering metric."""
+ num_classes = 7
+ offset = 256 * 256
+
+ bird_gt_instance_class_map = {
+ 92: 5,
+ 176: 3,
+ 255: 4,
+ }
+ cat_gt_instance_class_map = {
+ 0: 0,
+ 255: 6,
+ }
+ team_gt_instance_class_map = {
+ 0: 0,
+ 47: 1,
+ 97: 1,
+ 133: 1,
+ 150: 1,
+ 174: 1,
+ 198: 2,
+ 215: 1,
+ 244: 1,
+ 255: 1,
+ }
+ test_image = collections.namedtuple(
+ 'TestImage',
+ ['gt_class_map', 'gt_path', 'pred_inst_path', 'pred_class_path'])
+ test_images = [
+ test_image(bird_gt_instance_class_map, 'bird_gt.png',
+ 'bird_pred_instance.png', 'bird_pred_class.png'),
+ test_image(cat_gt_instance_class_map, 'cat_gt.png',
+ 'cat_pred_instance.png', 'cat_pred_class.png'),
+ test_image(team_gt_instance_class_map, 'team_gt_instance.png',
+ 'team_pred_instance.png', 'team_pred_class.png'),
+ ]
+
+ gt_classes = []
+ gt_instances = []
+ pred_classes = []
+ pred_instances = []
+ for test_image in test_images:
+ (image_gt_instances,
+ image_gt_classes) = test_utils.panoptic_segmentation_with_class_map(
+ test_image.gt_path, test_image.gt_class_map)
+ gt_classes.append(image_gt_classes)
+ gt_instances.append(image_gt_instances)
+
+ pred_instances.append(
+ test_utils.read_test_image(test_image.pred_inst_path, mode='L'))
+ pred_classes.append(
+ test_utils.read_segmentation_with_rgb_color_map(
+ test_image.pred_class_path, _CLASS_COLOR_MAP))
+
+ gt_class_tensor = tf.placeholder(tf.uint16)
+ gt_instance_tensor = tf.placeholder(tf.uint16)
+ pred_class_tensor = tf.placeholder(tf.uint16)
+ pred_instance_tensor = tf.placeholder(tf.uint16)
+ coverings, update_ops = streaming_metrics.streaming_parsing_covering(
+ gt_class_tensor,
+ gt_instance_tensor,
+ pred_class_tensor,
+ pred_instance_tensor,
+ num_classes=num_classes,
+ max_instances_per_category=256,
+ ignored_label=0,
+ offset=offset,
+ normalize_by_image_size=False)
+ (per_class_coverings, per_class_weighted_ious, per_class_gt_areas) = (
+ tf.unstack(coverings, num=3, axis=0))
+
+ with self.session() as sess:
+ sess.run(tf.local_variables_initializer())
+ for pred_class, pred_instance, gt_class, gt_instance in six.moves.zip(
+ pred_classes, pred_instances, gt_classes, gt_instances):
+ sess.run(
+ update_ops,
+ feed_dict={
+ gt_class_tensor: gt_class,
+ gt_instance_tensor: gt_instance,
+ pred_class_tensor: pred_class,
+ pred_instance_tensor: pred_instance
+ })
+ (result_per_class_coverings, result_per_class_weighted_ious,
+ result_per_class_gt_areas) = (
+ sess.run(
+ [
+ per_class_coverings,
+ per_class_weighted_ious,
+ per_class_gt_areas,
+ ],
+ feed_dict={
+ gt_class_tensor: 0,
+ gt_instance_tensor: 0,
+ pred_class_tensor: 0,
+ pred_instance_tensor: 0
+ }))
+
+ np.testing.assert_array_almost_equal(
+ result_per_class_coverings, [
+ 0.0,
+ 0.7009696912,
+ 0.5406896552,
+ 0.7453531599,
+ 0.8576779026,
+ 0.9910687881,
+ 0.7741046032,
+ ],
+ decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_per_class_weighted_ious, [
+ 0.0,
+ 39864.14634,
+ 3136,
+ 1177.657993,
+ 2498.41573,
+ 33366.31289,
+ 26671,
+ ],
+ decimal=4)
+ np.testing.assert_array_equal(result_per_class_gt_areas, [
+ 0.0,
+ 56870,
+ 5800,
+ 1580,
+ 2913,
+ 33667,
+ 34454,
+ ])
+
+ def test_streaming_metric_on_multiple_images_normalize_by_size(self):
+ """Tests streaming parsing covering metric with image size normalization."""
+ num_classes = 7
+ offset = 256 * 256
+
+ bird_gt_instance_class_map = {
+ 92: 5,
+ 176: 3,
+ 255: 4,
+ }
+ cat_gt_instance_class_map = {
+ 0: 0,
+ 255: 6,
+ }
+ team_gt_instance_class_map = {
+ 0: 0,
+ 47: 1,
+ 97: 1,
+ 133: 1,
+ 150: 1,
+ 174: 1,
+ 198: 2,
+ 215: 1,
+ 244: 1,
+ 255: 1,
+ }
+ test_image = collections.namedtuple(
+ 'TestImage',
+ ['gt_class_map', 'gt_path', 'pred_inst_path', 'pred_class_path'])
+ test_images = [
+ test_image(bird_gt_instance_class_map, 'bird_gt.png',
+ 'bird_pred_instance.png', 'bird_pred_class.png'),
+ test_image(cat_gt_instance_class_map, 'cat_gt.png',
+ 'cat_pred_instance.png', 'cat_pred_class.png'),
+ test_image(team_gt_instance_class_map, 'team_gt_instance.png',
+ 'team_pred_instance.png', 'team_pred_class.png'),
+ ]
+
+ gt_classes = []
+ gt_instances = []
+ pred_classes = []
+ pred_instances = []
+ for test_image in test_images:
+ (image_gt_instances,
+ image_gt_classes) = test_utils.panoptic_segmentation_with_class_map(
+ test_image.gt_path, test_image.gt_class_map)
+ gt_classes.append(image_gt_classes)
+ gt_instances.append(image_gt_instances)
+
+ pred_instances.append(
+ test_utils.read_test_image(test_image.pred_inst_path, mode='L'))
+ pred_classes.append(
+ test_utils.read_segmentation_with_rgb_color_map(
+ test_image.pred_class_path, _CLASS_COLOR_MAP))
+
+ gt_class_tensor = tf.placeholder(tf.uint16)
+ gt_instance_tensor = tf.placeholder(tf.uint16)
+ pred_class_tensor = tf.placeholder(tf.uint16)
+ pred_instance_tensor = tf.placeholder(tf.uint16)
+ coverings, update_ops = streaming_metrics.streaming_parsing_covering(
+ gt_class_tensor,
+ gt_instance_tensor,
+ pred_class_tensor,
+ pred_instance_tensor,
+ num_classes=num_classes,
+ max_instances_per_category=256,
+ ignored_label=0,
+ offset=offset,
+ normalize_by_image_size=True)
+ (per_class_coverings, per_class_weighted_ious, per_class_gt_areas) = (
+ tf.unstack(coverings, num=3, axis=0))
+
+ with self.session() as sess:
+ sess.run(tf.local_variables_initializer())
+ for pred_class, pred_instance, gt_class, gt_instance in six.moves.zip(
+ pred_classes, pred_instances, gt_classes, gt_instances):
+ sess.run(
+ update_ops,
+ feed_dict={
+ gt_class_tensor: gt_class,
+ gt_instance_tensor: gt_instance,
+ pred_class_tensor: pred_class,
+ pred_instance_tensor: pred_instance
+ })
+ (result_per_class_coverings, result_per_class_weighted_ious,
+ result_per_class_gt_areas) = (
+ sess.run(
+ [
+ per_class_coverings,
+ per_class_weighted_ious,
+ per_class_gt_areas,
+ ],
+ feed_dict={
+ gt_class_tensor: 0,
+ gt_instance_tensor: 0,
+ pred_class_tensor: 0,
+ pred_instance_tensor: 0
+ }))
+
+ np.testing.assert_array_almost_equal(
+ result_per_class_coverings, [
+ 0.0,
+ 0.7009696912,
+ 0.5406896552,
+ 0.7453531599,
+ 0.8576779026,
+ 0.9910687881,
+ 0.7741046032,
+ ],
+ decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_per_class_weighted_ious, [
+ 0.0,
+ 0.5002088756,
+ 0.03935002196,
+ 0.03086105851,
+ 0.06547211033,
+ 0.8743792686,
+ 0.2549565051,
+ ],
+ decimal=4)
+ np.testing.assert_array_almost_equal(
+ result_per_class_gt_areas, [
+ 0.0,
+ 0.7135955832,
+ 0.07277746408,
+ 0.04140461216,
+ 0.07633647799,
+ 0.8822589099,
+ 0.3293566581,
+ ],
+ decimal=4)
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/evaluation/test_utils.py b/deeplab/models/research/deeplab/evaluation/test_utils.py
new file mode 100644
index 0000000..9ad4f55
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/test_utils.py
@@ -0,0 +1,119 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Utility functions to set up unit tests on Panoptic Segmentation code."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+
+
+
+from absl import flags
+import numpy as np
+import scipy.misc
+import six
+from six.moves import map
+
+FLAGS = flags.FLAGS
+
+_TEST_DIR = 'deeplab/evaluation/testdata'
+
+
+def read_test_image(testdata_path, *args, **kwargs):
+ """Loads a test image.
+
+ Args:
+ testdata_path: Image path relative to panoptic_segmentation/testdata as a
+ string.
+ *args: Additional positional arguments passed to `imread`.
+ **kwargs: Additional keyword arguments passed to `imread`.
+
+ Returns:
+ The image, as a numpy array.
+ """
+ image_path = os.path.join(_TEST_DIR, testdata_path)
+ return scipy.misc.imread(image_path, *args, **kwargs)
+
+
+def read_segmentation_with_rgb_color_map(image_testdata_path,
+ rgb_to_semantic_label,
+ output_dtype=None):
+ """Reads a test segmentation as an image and a map from colors to labels.
+
+ Args:
+ image_testdata_path: Image path relative to panoptic_segmentation/testdata
+ as a string.
+ rgb_to_semantic_label: Mapping from RGB colors to integer labels as a
+ dictionary.
+ output_dtype: Type of the output labels. If None, defaults to the type of
+ the provided color map.
+
+ Returns:
+ A 2D numpy array of labels.
+
+ Raises:
+ ValueError: On an incomplete `rgb_to_semantic_label`.
+ """
+ rgb_image = read_test_image(image_testdata_path, mode='RGB')
+ if len(rgb_image.shape) != 3 or rgb_image.shape[2] != 3:
+ raise AssertionError(
+ 'Expected RGB image, actual shape is %s' % rgb_image.shape)
+
+ num_pixels = rgb_image.shape[0] * rgb_image.shape[1]
+ unique_colors = np.unique(np.reshape(rgb_image, [num_pixels, 3]), axis=0)
+ if not set(map(tuple, unique_colors)).issubset(
+ six.viewkeys(rgb_to_semantic_label)):
+ raise ValueError('RGB image has colors not in color map.')
+
+ output_dtype = output_dtype or type(
+ next(six.itervalues(rgb_to_semantic_label)))
+ output_labels = np.empty(rgb_image.shape[:2], dtype=output_dtype)
+ for rgb_color, int_label in six.iteritems(rgb_to_semantic_label):
+ color_array = np.array(rgb_color, ndmin=3)
+ output_labels[np.all(rgb_image == color_array, axis=2)] = int_label
+ return output_labels
+
+
+def panoptic_segmentation_with_class_map(instance_testdata_path,
+ instance_label_to_semantic_label):
+ """Reads in a panoptic segmentation with an instance map and a map to classes.
+
+ Args:
+ instance_testdata_path: Path to a grayscale instance map, given as a string
+ and relative to panoptic_segmentation/testdata.
+ instance_label_to_semantic_label: A map from instance labels to class
+ labels.
+
+ Returns:
+ A tuple `(instance_labels, class_labels)` of numpy arrays.
+
+ Raises:
+ ValueError: On a mismatched set of instances in the
+ `instance_label_to_semantic_label`.
+ """
+ instance_labels = read_test_image(instance_testdata_path, mode='L')
+ if set(np.unique(instance_labels)) != set(
+ six.iterkeys(instance_label_to_semantic_label)):
+ raise ValueError('Provided class map does not match present instance ids.')
+
+ class_labels = np.empty_like(instance_labels)
+ for instance_id, class_id in six.iteritems(instance_label_to_semantic_label):
+ class_labels[instance_labels == instance_id] = class_id
+
+ return instance_labels, class_labels
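+
+
+# Example usage of the helpers above (an illustrative sketch only; the maps
+# mirror the values used in the accompanying tests):
+#
+#   instances, classes = panoptic_segmentation_with_class_map(
+#       'team_gt_instance.png',
+#       {0: 0, 47: 1, 97: 1, 133: 1, 150: 1, 174: 1, 198: 23, 215: 1, 244: 1,
+#        255: 1})
+#   semantic = read_segmentation_with_rgb_color_map(
+#       'team_pred_class.png', {(0, 0, 0): 0, (0, 0, 255): 1, (255, 0, 0): 23})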
diff --git a/deeplab/models/research/deeplab/evaluation/test_utils_test.py b/deeplab/models/research/deeplab/evaluation/test_utils_test.py
new file mode 100644
index 0000000..9e9bed3
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/test_utils_test.py
@@ -0,0 +1,74 @@
+# Lint as: python2, python3
+# Copyright 2019 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for test_utils."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+
+
+from absl.testing import absltest
+import numpy as np
+
+from deeplab.evaluation import test_utils
+
+
+class TestUtilsTest(absltest.TestCase):
+
+ def test_read_test_image(self):
+ image_array = test_utils.read_test_image('team_pred_class.png')
+ self.assertSequenceEqual(image_array.shape, (231, 345, 4))
+
+ def test_reads_segmentation_with_color_map(self):
+ rgb_to_semantic_label = {(0, 0, 0): 0, (0, 0, 255): 1, (255, 0, 0): 23}
+ labels = test_utils.read_segmentation_with_rgb_color_map(
+ 'team_pred_class.png', rgb_to_semantic_label)
+
+ input_image = test_utils.read_test_image('team_pred_class.png')
+ np.testing.assert_array_equal(
+ labels == 0,
+ np.logical_and(input_image[:, :, 0] == 0, input_image[:, :, 2] == 0))
+ np.testing.assert_array_equal(labels == 1, input_image[:, :, 2] == 255)
+ np.testing.assert_array_equal(labels == 23, input_image[:, :, 0] == 255)
+
+ def test_reads_gt_segmentation(self):
+ instance_label_to_semantic_label = {
+ 0: 0,
+ 47: 1,
+ 97: 1,
+ 133: 1,
+ 150: 1,
+ 174: 1,
+ 198: 23,
+ 215: 1,
+ 244: 1,
+ 255: 1,
+ }
+ instances, classes = test_utils.panoptic_segmentation_with_class_map(
+ 'team_gt_instance.png', instance_label_to_semantic_label)
+
+ expected_label_shape = (231, 345)
+ self.assertSequenceEqual(instances.shape, expected_label_shape)
+ self.assertSequenceEqual(classes.shape, expected_label_shape)
+ np.testing.assert_array_equal(instances == 0, classes == 0)
+ np.testing.assert_array_equal(instances == 198, classes == 23)
+ np.testing.assert_array_equal(
+ np.logical_and(instances != 0, instances != 198), classes == 1)
+
+
+if __name__ == '__main__':
+ absltest.main()
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/README.md b/deeplab/models/research/deeplab/evaluation/testdata/README.md
new file mode 100644
index 0000000..711b476
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/testdata/README.md
@@ -0,0 +1,14 @@
+# Segmentation Evaluation Test Data
+
+## Source Images
+
+* [team_input.png](team_input.png) \
+ Source:
+ https://ai.googleblog.com/2018/03/semantic-image-segmentation-with.html
+* [cat_input.jpg](cat_input.jpg) \
+ Source: https://www.flickr.com/photos/magdalena_b/4995858743
+* [bird_input.jpg](bird_input.jpg) \
+ Source: https://www.flickr.com/photos/chivinskia/40619099560
+* [congress_input.jpg](congress_input.jpg) \
+ Source:
+ https://cao.house.gov/sites/cao.house.gov/files/documents/SAR-Jan-Jun-2016.pdf
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/bird_gt.png b/deeplab/models/research/deeplab/evaluation/testdata/bird_gt.png
new file mode 100644
index 0000000..05d8549
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/bird_gt.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/bird_pred_class.png b/deeplab/models/research/deeplab/evaluation/testdata/bird_pred_class.png
new file mode 100644
index 0000000..07351bf
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/bird_pred_class.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/bird_pred_instance.png b/deeplab/models/research/deeplab/evaluation/testdata/bird_pred_instance.png
new file mode 100644
index 0000000..faa1371
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/bird_pred_instance.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/cat_gt.png b/deeplab/models/research/deeplab/evaluation/testdata/cat_gt.png
new file mode 100644
index 0000000..41f6011
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/cat_gt.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/cat_pred_class.png b/deeplab/models/research/deeplab/evaluation/testdata/cat_pred_class.png
new file mode 100644
index 0000000..3728c68
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/cat_pred_class.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/cat_pred_instance.png b/deeplab/models/research/deeplab/evaluation/testdata/cat_pred_instance.png
new file mode 100644
index 0000000..ebd9ba4
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/cat_pred_instance.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/coco_gt.json b/deeplab/models/research/deeplab/evaluation/testdata/coco_gt.json
new file mode 100644
index 0000000..5f79bf1
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/testdata/coco_gt.json
@@ -0,0 +1,214 @@
+{
+ "info": {
+ "description": "Test COCO-format dataset",
+ "url": "https://github.com/tensorflow/models/tree/master/research/deeplab",
+ "version": "1.0",
+ "year": 2019
+ },
+ "images": [
+ {
+ "id": 1,
+ "file_name": "bird.jpg",
+ "height": 159,
+ "width": 240,
+ "flickr_url": "https://www.flickr.com/photos/chivinskia/40619099560"
+ },
+ {
+ "id": 2,
+ "file_name": "cat.jpg",
+ "height": 330,
+ "width": 317,
+ "flickr_url": "https://www.flickr.com/photos/magdalena_b/4995858743"
+ },
+ {
+ "id": 3,
+ "file_name": "team.jpg",
+ "height": 231,
+ "width": 345
+ },
+ {
+ "id": 4,
+ "file_name": "congress.jpg",
+ "height": 267,
+ "width": 525
+ }
+ ],
+ "annotations": [
+ {
+ "image_id": 1,
+ "file_name": "bird.png",
+ "segments_info": [
+ {
+ "id": 255,
+ "area": 2913,
+ "category_id": 4,
+ "iscrowd": 0
+ },
+ {
+ "id": 2586368,
+ "area": 1580,
+ "category_id": 3,
+ "iscrowd": 0
+ },
+ {
+ "id": 16770360,
+ "area": 33667,
+ "category_id": 5,
+ "iscrowd": 0
+ }
+ ]
+ },
+ {
+ "image_id": 2,
+ "file_name": "cat.png",
+ "segments_info": [
+ {
+ "id": 16711691,
+ "area": 34454,
+ "category_id": 6,
+ "iscrowd": 0
+ }
+ ]
+ },
+ {
+ "image_id": 3,
+ "file_name": "team.png",
+ "segments_info": [
+ {
+ "id": 129,
+ "area": 5443,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 255,
+ "area": 3574,
+ "category_id": 2,
+ "iscrowd": 0
+ },
+ {
+ "id": 47615,
+ "area": 11483,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 65532,
+ "area": 7080,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 8585107,
+ "area": 11363,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 9011200,
+ "area": 7158,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 12858027,
+ "area": 6419,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 16053492,
+ "area": 4350,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 16711680,
+ "area": 5800,
+ "category_id": 1,
+ "iscrowd": 0
+ }
+ ]
+ },
+ {
+ "image_id": 4,
+ "file_name": "congress.png",
+ "segments_info": [
+ {
+ "id": 255,
+ "area": 243,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 65315,
+ "area": 553,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 65516,
+ "area": 652,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 9895680,
+ "area": 82774,
+ "category_id": 1,
+ "iscrowd": 1
+ },
+ {
+ "id": 16711739,
+ "area": 137,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 16711868,
+ "area": 179,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 16762624,
+ "area": 2742,
+ "category_id": 1,
+ "iscrowd": 0
+ }
+ ]
+ }
+ ],
+ "categories": [
+ {
+ "id": 1,
+ "name": "person",
+ "isthing": 1
+ },
+ {
+ "id": 2,
+ "name": "umbrella",
+ "isthing": 1
+ },
+ {
+ "id": 3,
+ "name": "tree-merged",
+ "isthing": 0
+ },
+ {
+ "id": 4,
+ "name": "bird",
+ "isthing": 1
+ },
+ {
+ "id": 5,
+ "name": "sky",
+ "isthing": 0
+ },
+ {
+ "id": 6,
+ "name": "cat",
+ "isthing": 1
+ }
+ ]
+}
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/bird.png b/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/bird.png
new file mode 100644
index 0000000..9ef4ad9
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/bird.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/cat.png b/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/cat.png
new file mode 100644
index 0000000..cb02530
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/cat.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/congress.png b/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/congress.png
new file mode 100644
index 0000000..a56b98d
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/congress.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/team.png b/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/team.png
new file mode 100644
index 0000000..bde358d
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/coco_gt/team.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/coco_pred.json b/deeplab/models/research/deeplab/evaluation/testdata/coco_pred.json
new file mode 100644
index 0000000..4aead17
--- /dev/null
+++ b/deeplab/models/research/deeplab/evaluation/testdata/coco_pred.json
@@ -0,0 +1,208 @@
+{
+ "info": {
+ "description": "Test COCO-format dataset",
+ "url": "https://github.com/tensorflow/models/tree/master/research/deeplab",
+ "version": "1.0",
+ "year": 2019
+ },
+ "images": [
+ {
+ "id": 1,
+ "file_name": "bird.jpg",
+ "height": 159,
+ "width": 240,
+ "flickr_url": "https://www.flickr.com/photos/chivinskia/40619099560"
+ },
+ {
+ "id": 2,
+ "file_name": "cat.jpg",
+ "height": 330,
+ "width": 317,
+ "flickr_url": "https://www.flickr.com/photos/magdalena_b/4995858743"
+ },
+ {
+ "id": 3,
+ "file_name": "team.jpg",
+ "height": 231,
+ "width": 345
+ },
+ {
+ "id": 4,
+ "file_name": "congress.jpg",
+ "height": 267,
+ "width": 525
+ }
+ ],
+ "annotations": [
+ {
+ "image_id": 1,
+ "file_name": "bird.png",
+ "segments_info": [
+ {
+ "id": 55551,
+ "area": 3039,
+ "category_id": 4,
+ "iscrowd": 0
+ },
+ {
+ "id": 16216831,
+ "area": 33659,
+ "category_id": 5,
+ "iscrowd": 0
+ },
+ {
+ "id": 16760832,
+ "area": 1237,
+ "category_id": 3,
+ "iscrowd": 0
+ }
+ ]
+ },
+ {
+ "image_id": 2,
+ "file_name": "cat.png",
+ "segments_info": [
+ {
+ "id": 36493,
+ "area": 26910,
+ "category_id": 6,
+ "iscrowd": 0
+ }
+ ]
+ },
+ {
+ "image_id": 3,
+ "file_name": "team.png",
+ "segments_info": [
+ {
+ "id": 0,
+ "area": 22164,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 129,
+ "area": 3418,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 255,
+ "area": 12827,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 740608,
+ "area": 8606,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 2555695,
+ "area": 7636,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 2883541,
+ "area": 6844,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 14408667,
+ "area": 4766,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 16711820,
+ "area": 4767,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 16768768,
+ "area": 8667,
+ "category_id": 1,
+ "iscrowd": 0
+ }
+ ]
+ },
+ {
+ "image_id": 4,
+ "file_name": "congress.png",
+ "segments_info": [
+ {
+ "id": 255,
+ "area": 2599,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 37375,
+ "area": 386,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 62207,
+ "area": 384,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 5177088,
+ "area": 260,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 16711691,
+ "area": 1011,
+ "category_id": 1,
+ "iscrowd": 0
+ },
+ {
+ "id": 16774912,
+ "area": 803,
+ "category_id": 1,
+ "iscrowd": 0
+ }
+ ]
+ }
+ ],
+ "categories": [
+ {
+ "id": 1,
+ "name": "person",
+ "isthing": 1
+ },
+ {
+ "id": 2,
+ "name": "umbrella",
+ "isthing": 1
+ },
+ {
+ "id": 3,
+ "name": "tree-merged",
+ "isthing": 0
+ },
+ {
+ "id": 4,
+ "name": "bird",
+ "isthing": 1
+ },
+ {
+ "id": 5,
+ "name": "sky",
+ "isthing": 0
+ },
+ {
+ "id": 6,
+ "name": "cat",
+ "isthing": 1
+ }
+ ]
+}
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/bird.png b/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/bird.png
new file mode 100644
index 0000000..c9b4cbc
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/bird.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/cat.png b/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/cat.png
new file mode 100644
index 0000000..3245832
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/cat.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/congress.png b/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/congress.png
new file mode 100644
index 0000000..fc7bb06
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/congress.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/team.png b/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/team.png
new file mode 100644
index 0000000..7300bf4
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/coco_pred/team.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/team_gt_instance.png b/deeplab/models/research/deeplab/evaluation/testdata/team_gt_instance.png
new file mode 100644
index 0000000..97abb55
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/team_gt_instance.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/team_pred_class.png b/deeplab/models/research/deeplab/evaluation/testdata/team_pred_class.png
new file mode 100644
index 0000000..2ed78de
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/team_pred_class.png differ
diff --git a/deeplab/models/research/deeplab/evaluation/testdata/team_pred_instance.png b/deeplab/models/research/deeplab/evaluation/testdata/team_pred_instance.png
new file mode 100644
index 0000000..264606a
Binary files /dev/null and b/deeplab/models/research/deeplab/evaluation/testdata/team_pred_instance.png differ
diff --git a/deeplab/models/research/deeplab/export_model.py b/deeplab/models/research/deeplab/export_model.py
new file mode 100644
index 0000000..b7307b5
--- /dev/null
+++ b/deeplab/models/research/deeplab/export_model.py
@@ -0,0 +1,201 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Exports trained model to TensorFlow frozen graph."""
+
+import os
+import tensorflow as tf
+
+from tensorflow.contrib import quantize as contrib_quantize
+from tensorflow.python.tools import freeze_graph
+from deeplab import common
+from deeplab import input_preprocess
+from deeplab import model
+
+slim = tf.contrib.slim
+flags = tf.app.flags
+
+FLAGS = flags.FLAGS
+
+flags.DEFINE_string('checkpoint_path', None, 'Checkpoint path')
+
+flags.DEFINE_string('export_path', None,
+ 'Path to output Tensorflow frozen graph.')
+
+flags.DEFINE_integer('num_classes', 21, 'Number of classes.')
+
+flags.DEFINE_multi_integer('crop_size', [513, 513],
+ 'Crop size [height, width].')
+
+# For `xception_65`, use atrous_rates = [12, 24, 36] if output_stride = 8, or
+# rates = [6, 12, 18] if output_stride = 16. For `mobilenet_v2`, use None. Note
+# one could use different atrous_rates/output_stride during training/evaluation.
+flags.DEFINE_multi_integer('atrous_rates', None,
+ 'Atrous rates for atrous spatial pyramid pooling.')
+
+flags.DEFINE_integer('output_stride', 8,
+ 'The ratio of input to output spatial resolution.')
+
+# Change to [0.5, 0.75, 1.0, 1.25, 1.5, 1.75] for multi-scale inference.
+flags.DEFINE_multi_float('inference_scales', [1.0],
+ 'The scales to resize images for inference.')
+
+flags.DEFINE_bool('add_flipped_images', False,
+ 'Add flipped images during inference or not.')
+
+flags.DEFINE_integer(
+ 'quantize_delay_step', -1,
+ 'Steps to start quantized training. If < 0, will not quantize model.')
+
+flags.DEFINE_bool('save_inference_graph', False,
+ 'Save inference graph in text proto.')
+
+# Input name of the exported model.
+_INPUT_NAME = 'ImageTensor'
+
+# Output name of the exported predictions.
+_OUTPUT_NAME = 'SemanticPredictions'
+_RAW_OUTPUT_NAME = 'RawSemanticPredictions'
+
+# Output name of the exported probabilities.
+_OUTPUT_PROB_NAME = 'SemanticProbabilities'
+_RAW_OUTPUT_PROB_NAME = 'RawSemanticProbabilities'
+
+
+def _create_input_tensors():
+ """Creates and prepares input tensors for DeepLab model.
+
+ This method creates a 4-D uint8 image tensor 'ImageTensor' with shape
+ [1, None, None, 3]. The actual input tensor name to use during inference is
+ 'ImageTensor:0'.
+
+ Returns:
+ image: Preprocessed 4-D float32 tensor with shape [1, crop_height,
+ crop_width, 3].
+ original_image_size: Original image shape tensor [height, width].
+ resized_image_size: Resized image shape tensor [height, width].
+ """
+ # input_preprocess takes 4-D image tensor as input.
+ input_image = tf.placeholder(tf.uint8, [1, None, None, 3], name=_INPUT_NAME)
+ original_image_size = tf.shape(input_image)[1:3]
+
+ # Squeeze the dimension in axis=0 since `preprocess_image_and_label` assumes
+ # image to be 3-D.
+ image = tf.squeeze(input_image, axis=0)
+ resized_image, image, _ = input_preprocess.preprocess_image_and_label(
+ image,
+ label=None,
+ crop_height=FLAGS.crop_size[0],
+ crop_width=FLAGS.crop_size[1],
+ min_resize_value=FLAGS.min_resize_value,
+ max_resize_value=FLAGS.max_resize_value,
+ resize_factor=FLAGS.resize_factor,
+ is_training=False,
+ model_variant=FLAGS.model_variant)
+ resized_image_size = tf.shape(resized_image)[:2]
+
+ # Expand the dimension in axis=0, since the following operations assume the
+ # image to be 4-D.
+ image = tf.expand_dims(image, 0)
+
+ return image, original_image_size, resized_image_size
+
+
+def main(unused_argv):
+ tf.logging.set_verbosity(tf.logging.INFO)
+ tf.logging.info('Prepare to export model to: %s', FLAGS.export_path)
+
+ with tf.Graph().as_default():
+ image, image_size, resized_image_size = _create_input_tensors()
+
+ model_options = common.ModelOptions(
+ outputs_to_num_classes={common.OUTPUT_TYPE: FLAGS.num_classes},
+ crop_size=FLAGS.crop_size,
+ atrous_rates=FLAGS.atrous_rates,
+ output_stride=FLAGS.output_stride)
+
+ if tuple(FLAGS.inference_scales) == (1.0,):
+ tf.logging.info('Exported model performs single-scale inference.')
+ predictions = model.predict_labels(
+ image,
+ model_options=model_options,
+ image_pyramid=FLAGS.image_pyramid)
+ else:
+ tf.logging.info('Exported model performs multi-scale inference.')
+ if FLAGS.quantize_delay_step >= 0:
+ raise ValueError(
+ 'Quantize mode is not supported with multi-scale test.')
+ predictions = model.predict_labels_multi_scale(
+ image,
+ model_options=model_options,
+ eval_scales=FLAGS.inference_scales,
+ add_flipped_images=FLAGS.add_flipped_images)
+ raw_predictions = tf.identity(
+ tf.cast(predictions[common.OUTPUT_TYPE], tf.float32),
+ _RAW_OUTPUT_NAME)
+ raw_probabilities = tf.identity(
+ predictions[common.OUTPUT_TYPE + model.PROB_SUFFIX],
+ _RAW_OUTPUT_PROB_NAME)
+
+ # Crop the valid regions from the predictions.
+ semantic_predictions = raw_predictions[
+ :, :resized_image_size[0], :resized_image_size[1]]
+ semantic_probabilities = raw_probabilities[
+ :, :resized_image_size[0], :resized_image_size[1]]
+
+ # Resize back the prediction to the original image size.
+ def _resize_label(label, label_size):
+ # Expand dimension of label to [1, height, width, 1] for resize operation.
+ label = tf.expand_dims(label, 3)
+ resized_label = tf.image.resize_images(
+ label,
+ label_size,
+ method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
+ align_corners=True)
+ return tf.cast(tf.squeeze(resized_label, 3), tf.int32)
+ semantic_predictions = _resize_label(semantic_predictions, image_size)
+ semantic_predictions = tf.identity(semantic_predictions, name=_OUTPUT_NAME)
+
+ semantic_probabilities = tf.image.resize_bilinear(
+ semantic_probabilities, image_size, align_corners=True,
+ name=_OUTPUT_PROB_NAME)
+
+ if FLAGS.quantize_delay_step >= 0:
+ contrib_quantize.create_eval_graph()
+
+ saver = tf.train.Saver(tf.all_variables())
+
+ dirname = os.path.dirname(FLAGS.export_path)
+ tf.gfile.MakeDirs(dirname)
+ graph_def = tf.get_default_graph().as_graph_def(add_shapes=True)
+ freeze_graph.freeze_graph_with_def_protos(
+ graph_def,
+ saver.as_saver_def(),
+ FLAGS.checkpoint_path,
+ _OUTPUT_NAME + ',' + _OUTPUT_PROB_NAME,
+ restore_op_name=None,
+ filename_tensor_name=None,
+ output_graph=FLAGS.export_path,
+ clear_devices=True,
+ initializer_nodes=None)
+
+ if FLAGS.save_inference_graph:
+ tf.train.write_graph(graph_def, dirname, 'inference_graph.pbtxt')
+
+
+if __name__ == '__main__':
+ flags.mark_flag_as_required('checkpoint_path')
+ flags.mark_flag_as_required('export_path')
+ tf.app.run()
diff --git a/deeplab/models/research/deeplab/g3doc/ade20k.md b/deeplab/models/research/deeplab/g3doc/ade20k.md
new file mode 100644
index 0000000..9505ab2
--- /dev/null
+++ b/deeplab/models/research/deeplab/g3doc/ade20k.md
@@ -0,0 +1,107 @@
+# Running DeepLab on ADE20K Semantic Segmentation Dataset
+
+This page walks through the steps required to run DeepLab on the ADE20K dataset on a
+local machine.
+
+## Download dataset and convert to TFRecord
+
+We have prepared the script (under the folder `datasets`) to download and
+convert the ADE20K semantic segmentation dataset to TFRecord.
+
+```bash
+# From the tensorflow/models/research/deeplab/datasets directory.
+bash download_and_convert_ade20k.sh
+```
+
+The converted dataset will be saved at ./deeplab/datasets/ADE20K/tfrecord.
+
+## Recommended Directory Structure for Training and Evaluation
+
+```
++ datasets
+ - build_data.py
+ - build_ade20k_data.py
+ - download_and_convert_ade20k.sh
+ + ADE20K
+ + tfrecord
+ + exp
+ + train_on_train_set
+ + train
+ + eval
+ + vis
+ + ADEChallengeData2016
+ + annotations
+ + training
+ + validation
+ + images
+ + training
+ + validation
+```
+
+where the folder `train_on_train_set` stores the train/eval/vis events and
+results (when training DeepLab on the ADE20K train set).
+
+## Running the train/eval/vis jobs
+
+A local training job using `xception_65` can be run with the following command:
+
+```bash
+# From tensorflow/models/research/
+python deeplab/train.py \
+ --logtostderr \
+ --training_number_of_steps=150000 \
+ --train_split="train" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --train_crop_size="513,513" \
+ --train_batch_size=4 \
+ --min_resize_value=513 \
+ --max_resize_value=513 \
+ --resize_factor=16 \
+ --dataset="ade20k" \
+ --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
+ --train_logdir=${PATH_TO_TRAIN_DIR}\
+ --dataset_dir=${PATH_TO_DATASET}
+```
+
+where ${PATH\_TO\_INITIAL\_CHECKPOINT} is the path to the initial checkpoint,
+${PATH\_TO\_TRAIN\_DIR} is the directory to which training checkpoints and
+events will be written (it is recommended to set it to the
+`train_on_train_set/train` directory above), and ${PATH\_TO\_DATASET} is the
+directory in which the ADE20K dataset resides (the `tfrecord` directory above).
+
+**Note that for train.py:**
+
+1. In order to fine-tune the BN layers, one needs to use a large batch size (>
+ 12) and set fine_tune_batch_norm = True. Here, we simply use a small batch
+ size during training for the purpose of demonstration. If you have limited
+ GPU memory at hand, please fine-tune from our provided checkpoints, whose
+ batch norm parameters have been trained, and use a smaller learning rate
+ with fine_tune_batch_norm = False.
+
+2. Users should tune `min_resize_value` and `max_resize_value` to get
+ better results. Note that `resize_factor` has to be equal to `output_stride`.
+
+3. Users should change atrous_rates from [6, 12, 18] to [12, 24, 36] if
+ setting output_stride=8 (see the sketch after this list).
+
+4. Users could skip the flag `decoder_output_stride` if they do not want
+ to use the decoder structure.
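+
+As a rough illustration of notes 2 and 3, the following Python sketch (the
+helper name is made up for illustration and is not part of the codebase) shows
+how the atrous rates scale with the chosen `output_stride`:
+
+```python
+def atrous_rates_for_output_stride(output_stride, base_rates=(6, 12, 18)):
+  """Scales the default atrous rates, which are defined for output_stride=16."""
+  return [rate * 16 // output_stride for rate in base_rates]
+
+
+# Matches note 3 above, and the [3, 6, 9] suggestion in the FAQ for
+# output_stride=32.
+assert atrous_rates_for_output_stride(16) == [6, 12, 18]
+assert atrous_rates_for_output_stride(8) == [12, 24, 36]
+assert atrous_rates_for_output_stride(32) == [3, 6, 9]
+
+# Note 2 above: resize_factor must equal the chosen output_stride, e.g.,
+# --resize_factor=16 goes with --output_stride=16 as in the command above.
+```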
+
+## Running Tensorboard
+
+Progress for training and evaluation jobs can be inspected using Tensorboard. If
+using the recommended directory structure, Tensorboard can be run using the
+following command:
+
+```bash
+tensorboard --logdir=${PATH_TO_LOG_DIRECTORY}
+```
+
+where `${PATH_TO_LOG_DIRECTORY}` points to the directory that contains the
+train, eval, and vis directories (e.g., the folder `train_on_train_set` in the
+above example). Please note it may take Tensorboard a couple of minutes to
+populate with data.
diff --git a/deeplab/models/research/deeplab/g3doc/cityscapes.md b/deeplab/models/research/deeplab/g3doc/cityscapes.md
new file mode 100644
index 0000000..5a660aa
--- /dev/null
+++ b/deeplab/models/research/deeplab/g3doc/cityscapes.md
@@ -0,0 +1,159 @@
+# Running DeepLab on Cityscapes Semantic Segmentation Dataset
+
+This page walks through the steps required to run DeepLab on Cityscapes on a
+local machine.
+
+## Download dataset and convert to TFRecord
+
+We have prepared the script (under the folder `datasets`) to convert the
+Cityscapes dataset to TFRecord. Users are required to download the dataset
+beforehand by registering on the
+[website](https://www.cityscapes-dataset.com/).
+
+```bash
+# From the tensorflow/models/research/deeplab/datasets directory.
+sh convert_cityscapes.sh
+```
+
+The converted dataset will be saved at ./deeplab/datasets/cityscapes/tfrecord.
+
+## Recommended Directory Structure for Training and Evaluation
+
+```
++ datasets
+ + cityscapes
+ + leftImg8bit
+ + gtFine
+ + tfrecord
+ + exp
+ + train_on_train_set
+ + train
+ + eval
+ + vis
+```
+
+where the folder `train_on_train_set` stores the train/eval/vis events and
+results (when training DeepLab on the Cityscapes train set).
+
+## Running the train/eval/vis jobs
+
+A local training job using `xception_65` can be run with the following command:
+
+```bash
+# From tensorflow/models/research/
+python deeplab/train.py \
+ --logtostderr \
+ --training_number_of_steps=90000 \
+ --train_split="train_fine" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --train_crop_size="769,769" \
+ --train_batch_size=1 \
+ --dataset="cityscapes" \
+ --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
+ --train_logdir=${PATH_TO_TRAIN_DIR} \
+ --dataset_dir=${PATH_TO_DATASET}
+```
+
+where ${PATH_TO_INITIAL_CHECKPOINT} is the path to the initial checkpoint
+(usually an ImageNet pretrained checkpoint), ${PATH_TO_TRAIN_DIR} is the
+directory to which training checkpoints and events will be written, and
+${PATH_TO_DATASET} is the directory in which the Cityscapes dataset resides.
+
+**Note that for {train,eval,vis}.py**:
+
+1. In order to reproduce our results, one needs to use a large batch size (> 8)
+ and set fine_tune_batch_norm = True. Here, we simply use a small batch size
+ during training for the purpose of demonstration. If you have limited
+ GPU memory at hand, please fine-tune from our provided checkpoints, whose
+ batch norm parameters have been trained, and use a smaller learning rate with
+ fine_tune_batch_norm = False.
+
+2. Users should change atrous_rates from [6, 12, 18] to [12, 24, 36] if
+ setting output_stride=8.
+
+3. Users could skip the flag `decoder_output_stride` if they do not want
+ to use the decoder structure.
+
+4. Change and add the following flags in order to use the provided dense
+ prediction cell. Note that decoder_output_stride needs to be set if you want
+ to use the provided checkpoints, which include the decoder module.
+
+```bash
+--model_variant="xception_71"
+--dense_prediction_cell_json="deeplab/core/dense_prediction_cell_branch5_top1_cityscapes.json"
+--decoder_output_stride=4
+```
+
+A local evaluation job using `xception_65` can be run with the following
+command:
+
+```bash
+# From tensorflow/models/research/
+python deeplab/eval.py \
+ --logtostderr \
+ --eval_split="val_fine" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --eval_crop_size="1025,2049" \
+ --dataset="cityscapes" \
+ --checkpoint_dir=${PATH_TO_CHECKPOINT} \
+ --eval_logdir=${PATH_TO_EVAL_DIR} \
+ --dataset_dir=${PATH_TO_DATASET}
+```
+
+where ${PATH_TO_CHECKPOINT} is the path to the trained checkpoint (i.e., the
+path to train_logdir), ${PATH_TO_EVAL_DIR} is the directory to which evaluation
+events will be written, and ${PATH_TO_DATASET} is the directory in which the
+Cityscapes dataset resides.
+
+A local visualization job using `xception_65` can be run with the following
+command:
+
+```bash
+# From tensorflow/models/research/
+python deeplab/vis.py \
+ --logtostderr \
+ --vis_split="val_fine" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --vis_crop_size="1025,2049" \
+ --dataset="cityscapes" \
+ --colormap_type="cityscapes" \
+ --checkpoint_dir=${PATH_TO_CHECKPOINT} \
+ --vis_logdir=${PATH_TO_VIS_DIR} \
+ --dataset_dir=${PATH_TO_DATASET}
+```
+
+where ${PATH_TO_CHECKPOINT} is the path to the trained checkpoint (i.e., the
+path to train_logdir), ${PATH_TO_VIS_DIR} is the directory to which the
+visualization results will be written, and ${PATH_TO_DATASET} is the directory
+in which the Cityscapes dataset resides. Note that if you would like to save
+the segmentation results for the evaluation server, set
+also_save_raw_predictions = True.
+
+## Running Tensorboard
+
+Progress for training and evaluation jobs can be inspected using Tensorboard. If
+using the recommended directory structure, Tensorboard can be run using the
+following command:
+
+```bash
+tensorboard --logdir=${PATH_TO_LOG_DIRECTORY}
+```
+
+where `${PATH_TO_LOG_DIRECTORY}` points to the directory that contains the
+train, eval, and vis directories (e.g., the folder `train_on_train_set` in the
+above example). Please note it may take Tensorboard a couple of minutes to populate
+with data.
diff --git a/deeplab/models/research/deeplab/g3doc/export_model.md b/deeplab/models/research/deeplab/g3doc/export_model.md
new file mode 100644
index 0000000..c41649e
--- /dev/null
+++ b/deeplab/models/research/deeplab/g3doc/export_model.md
@@ -0,0 +1,23 @@
+# Export trained deeplab model to frozen inference graph
+
+After model training finishes, you can export it to a frozen TensorFlow
+inference graph proto. Your trained model checkpoint usually includes the
+following files:
+
+* model.ckpt-${CHECKPOINT_NUMBER}.data-00000-of-00001
+* model.ckpt-${CHECKPOINT_NUMBER}.index
+* model.ckpt-${CHECKPOINT_NUMBER}.meta
+
+After you have identified a candidate checkpoint to export, you can run the
+following command to export it to a frozen graph:
+
+```bash
+# From tensorflow/models/research/
+# Assume all checkpoint files share the same path prefix `${CHECKPOINT_PATH}`.
+python deeplab/export_model.py \
+ --checkpoint_path=${CHECKPOINT_PATH} \
+ --export_path=${OUTPUT_DIR}/frozen_inference_graph.pb
+```
+
+Please also add the other model-specific flags that you used for training, such
+as `model_variant`, `add_image_level_feature`, etc.
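+
+For reference, a minimal inference sketch against the exported graph might look
+as follows. It assumes the TensorFlow 1.x API and the default tensor names
+(`ImageTensor:0` and `SemanticPredictions:0`) defined in `export_model.py`; the
+file paths are placeholders.
+
+```python
+import numpy as np
+from PIL import Image
+import tensorflow as tf  # TensorFlow 1.x API.
+
+graph_def = tf.GraphDef()
+with tf.gfile.GFile('/path/to/frozen_inference_graph.pb', 'rb') as f:
+  graph_def.ParseFromString(f.read())
+
+with tf.Graph().as_default() as graph:
+  tf.import_graph_def(graph_def, name='')
+
+# The exported graph expects a uint8 batch holding a single RGB image and
+# returns per-pixel semantic labels at the original resolution.
+image = np.asarray(Image.open('/path/to/image.jpg').convert('RGB'))
+with tf.Session(graph=graph) as sess:
+  seg_map = sess.run('SemanticPredictions:0',
+                     feed_dict={'ImageTensor:0': image[None, ...]})[0]
+print('Predicted labels:', np.unique(seg_map))
+```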
diff --git a/deeplab/models/research/deeplab/g3doc/faq.md b/deeplab/models/research/deeplab/g3doc/faq.md
new file mode 100644
index 0000000..26ff4b3
--- /dev/null
+++ b/deeplab/models/research/deeplab/g3doc/faq.md
@@ -0,0 +1,87 @@
+# FAQ
+___
+Q1: What if I want to use other network backbones, such as ResNet [1], instead of only the provided ones (e.g., Xception)?
+
+A: The users could modify the provided core/feature_extractor.py to support more network backbones.
+___
+Q2: What if I want to train the model on other datasets?
+
+A: The users could modify the provided dataset/build_{cityscapes,voc2012}_data.py and dataset/segmentation_dataset.py to build their own dataset.
+___
+Q3: Where can I download the PASCAL VOC augmented training set?
+
+A: The PASCAL VOC augmented training set is provided by Bharath Hariharan et al. [2] Please refer to their [website](http://home.bharathh.info/pubs/codes/SBD/download.html) for details and consider citing their paper if using the dataset.
+___
+Q4: Why does the implementation not include DenseCRF [3]?
+
+A: We have not tried this. Interested users could take a look at Philipp Krähenbühl's [website](http://graphics.stanford.edu/projects/densecrf/) and [paper](https://arxiv.org/abs/1210.5644) for details.
+___
+Q5: What if I want to train the model and fine-tune the batch normalization parameters?
+
+A: If you only have limited resources at hand, we would suggest you simply
+fine-tune from our provided checkpoint, whose batch-norm parameters have been
+trained (i.e., train with a smaller learning rate, set
+`fine_tune_batch_norm = false`, and employ longer training iterations since the
+learning rate is small). If you really would like to train the batch-norm
+parameters yourself, we would suggest:
+
+1. Set `output_stride = 16` or maybe even `32` (remember to change the flag
+`atrous_rates` accordingly, e.g., `atrous_rates = [3, 6, 9]` for
+`output_stride = 32`).
+
+2. Use as many GPUs as possible (change the flag `num_clones` in train.py) and
+set `train_batch_size` as large as possible.
+
+3. Adjust the `train_crop_size` in train.py. Maybe set it to be smaller, e.g.,
+513x513 (or even 321x321), so that you could use a larger batch size.
+
+4. Use a smaller network backbone, such as MobileNet-v2.
+
+___
+Q6: How can I train the model asynchronously?
+
+A: In train.py, users can set `num_replicas` (the number of machines for training) and `num_ps_tasks` (we usually set `num_ps_tasks` = `num_replicas` / 2). See slim.deployment.model_deploy for more details.
+___
+Q7: I could not reproduce the performance even with the provided checkpoints.
+
+A: Please try running
+
+```bash
+# Run the simple test with Xception_65 as network backbone.
+sh local_test.sh
+```
+
+or
+
+```bash
+# Run the simple test with MobileNet-v2 as network backbone.
+sh local_test_mobilenetv2.sh
+```
+
+First, make sure you can reproduce the results with our provided setting.
+After that, start making your changes one at a time to help debug.
+___
+Q8: What value of `eval_crop_size` should I use?
+
+A: Our model uses whole-image inference, meaning that we need to set `eval_crop_size` equal to `output_stride` * k + 1, where k is an integer chosen so that the resulting `eval_crop_size` is slightly larger than the largest
+image dimension in the dataset. For example, we have `eval_crop_size` = 513x513 for the PASCAL dataset, whose largest image dimension is 512. Similarly, we set `eval_crop_size` = 1025x2049 for Cityscapes images, whose
+dimensions are all equal to 1024x2048. A small sketch of this rule is given below.
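+
+A minimal sketch of that rule (the helper below is hypothetical and not part of
+the codebase):
+
+```python
+def eval_crop_size(largest_image_dim, output_stride):
+  """Smallest value of output_stride * k + 1 covering the largest dimension."""
+  k = -(-largest_image_dim // output_stride)  # Ceiling division.
+  return output_stride * k + 1
+
+# PASCAL VOC (largest image dimension 512) and Cityscapes (1024x2048 images):
+assert eval_crop_size(512, 16) == 513
+assert (eval_crop_size(1024, 16), eval_crop_size(2048, 16)) == (1025, 2049)
+```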
+___
+Q9: Why is multi-GPU training slow?
+
+A: Please try to use more threads to pre-process the inputs. For example, change [num_readers = 4](https://github.com/tensorflow/models/blob/master/research/deeplab/train.py#L457).
+___
+
+
+## References
+
+1. **Deep Residual Learning for Image Recognition**
+ Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
+ [[link]](https://arxiv.org/abs/1512.03385), In CVPR, 2016.
+
+2. **Semantic Contours from Inverse Detectors**
+ Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji, Jitendra Malik
+ [[link]](http://home.bharathh.info/pubs/codes/SBD/download.html), In ICCV, 2011.
+
+3. **Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials**
+ Philipp Krähenbühl, Vladlen Koltun
+ [[link]](http://graphics.stanford.edu/projects/densecrf/), In NIPS, 2011.
diff --git a/deeplab/models/research/deeplab/g3doc/img/image1.jpg b/deeplab/models/research/deeplab/g3doc/img/image1.jpg
new file mode 100644
index 0000000..939b6f9
Binary files /dev/null and b/deeplab/models/research/deeplab/g3doc/img/image1.jpg differ
diff --git a/deeplab/models/research/deeplab/g3doc/img/image2.jpg b/deeplab/models/research/deeplab/g3doc/img/image2.jpg
new file mode 100644
index 0000000..5ec1b8a
Binary files /dev/null and b/deeplab/models/research/deeplab/g3doc/img/image2.jpg differ
diff --git a/deeplab/models/research/deeplab/g3doc/img/image3.jpg b/deeplab/models/research/deeplab/g3doc/img/image3.jpg
new file mode 100644
index 0000000..d788e3d
Binary files /dev/null and b/deeplab/models/research/deeplab/g3doc/img/image3.jpg differ
diff --git a/deeplab/models/research/deeplab/g3doc/img/image_info.txt b/deeplab/models/research/deeplab/g3doc/img/image_info.txt
new file mode 100644
index 0000000..583d113
--- /dev/null
+++ b/deeplab/models/research/deeplab/g3doc/img/image_info.txt
@@ -0,0 +1,13 @@
+Image provenance:
+
+image1.jpg: Philippe Put,
+ https://www.flickr.com/photos/34547181@N00/14499172124
+
+image2.jpg: Peretz Partensky
+ https://www.flickr.com/photos/ifl/3926001309
+
+image3.jpg: Peter Harrison
+ https://www.flickr.com/photos/devcentre/392585679
+
+
+vis[1-3].png: Showing original image together with DeepLab segmentation map.
diff --git a/deeplab/models/research/deeplab/g3doc/img/vis1.png b/deeplab/models/research/deeplab/g3doc/img/vis1.png
new file mode 100644
index 0000000..41b8ecd
Binary files /dev/null and b/deeplab/models/research/deeplab/g3doc/img/vis1.png differ
diff --git a/deeplab/models/research/deeplab/g3doc/img/vis2.png b/deeplab/models/research/deeplab/g3doc/img/vis2.png
new file mode 100644
index 0000000..7fa7a4c
Binary files /dev/null and b/deeplab/models/research/deeplab/g3doc/img/vis2.png differ
diff --git a/deeplab/models/research/deeplab/g3doc/img/vis3.png b/deeplab/models/research/deeplab/g3doc/img/vis3.png
new file mode 100644
index 0000000..813b634
Binary files /dev/null and b/deeplab/models/research/deeplab/g3doc/img/vis3.png differ
diff --git a/deeplab/models/research/deeplab/g3doc/installation.md b/deeplab/models/research/deeplab/g3doc/installation.md
new file mode 100644
index 0000000..591a1f8
--- /dev/null
+++ b/deeplab/models/research/deeplab/g3doc/installation.md
@@ -0,0 +1,73 @@
+# Installation
+
+## Dependencies
+
+DeepLab depends on the following libraries:
+
+* Numpy
+* Pillow 1.0
+* tf Slim (which is included in the "tensorflow/models/research/" checkout)
+* Jupyter notebook
+* Matplotlib
+* Tensorflow
+
+For detailed steps to install Tensorflow, follow the [Tensorflow installation
+instructions](https://www.tensorflow.org/install/). A typical user can install
+Tensorflow using one of the following commands:
+
+```bash
+# For CPU
+pip install tensorflow
+# For GPU
+pip install tensorflow-gpu
+```
+
+The remaining libraries can be installed on Ubuntu 14.04 via apt-get and pip:
+
+```bash
+sudo apt-get install python-pil python-numpy
+pip install --user jupyter
+pip install --user matplotlib
+pip install --user PrettyTable
+```
+
+## Add Libraries to PYTHONPATH
+
+When running locally, the tensorflow/models/research/ directory should be
+appended to PYTHONPATH. This can be done by running the following from
+tensorflow/models/research/:
+
+```bash
+# From tensorflow/models/research/
+export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
+
+# [Optional] for panoptic evaluation, you might need panopticapi:
+# https://github.com/cocodataset/panopticapi
+# Please clone it to a local directory ${PANOPTICAPI_DIR}
+touch ${PANOPTICAPI_DIR}/panopticapi/__init__.py
+export PYTHONPATH=$PYTHONPATH:${PANOPTICAPI_DIR}/panopticapi
+```
+
+Note: This command needs to run from every new terminal you start. If you wish
+to avoid running this manually, you can add it as a new line to the end of your
+~/.bashrc file.
+
+## Testing the Installation
+
+You can test whether you have successfully installed the Tensorflow DeepLab
+implementation by running the following commands:
+
+Quick test by running model_test.py:
+
+```bash
+# From tensorflow/models/research/
+python deeplab/model_test.py
+```
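+
+If `model_test.py` fails with import errors, the following minimal Python check
+(a sketch that only verifies the packages on PYTHONPATH resolve; it is not part
+of the codebase) can help narrow the problem down:
+
+```python
+# Run from tensorflow/models/research/ after exporting PYTHONPATH as above.
+import tensorflow as tf
+from deeplab import common
+from deeplab import model
+
+print('Tensorflow version:', tf.__version__)
+print('DeepLab model module loaded from:', model.__file__)
+```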
+
+Quickly run the whole code on the PASCAL VOC 2012 dataset:
+
+```bash
+# From tensorflow/models/research/deeplab
+bash local_test.sh
+```
+
diff --git a/deeplab/models/research/deeplab/g3doc/model_zoo.md b/deeplab/models/research/deeplab/g3doc/model_zoo.md
new file mode 100644
index 0000000..76972dc
--- /dev/null
+++ b/deeplab/models/research/deeplab/g3doc/model_zoo.md
@@ -0,0 +1,254 @@
+# TensorFlow DeepLab Model Zoo
+
+We provide deeplab models pretrained on several datasets, including (1) PASCAL
+VOC 2012, (2) Cityscapes, and (3) ADE20K, for reproducing our results, as well
+as some checkpoints that are only pretrained on ImageNet for training your own
+models.
+
+## DeepLab models trained on PASCAL VOC 2012
+
+The un-tar'ed directory includes:
+
+* a frozen inference graph (`frozen_inference_graph.pb`). All frozen inference
+ graphs by default use output stride of 8, a single eval scale of 1.0 and
+ no left-right flips, unless otherwise specified. MobileNet-v2 based models
+ do not include the decoder module.
+
+* a checkpoint (`model.ckpt.data-00000-of-00001`, `model.ckpt.index`)
+
+### Model details
+
+We provide several checkpoints that have been pretrained on the VOC 2012
+train_aug set or the train_aug + trainval set. In the former case, one could
+train a model with a smaller batch size and freeze batch normalization when
+limited GPU memory is available, since we have already fine-tuned the batch
+normalization for you. In the latter case, one could directly evaluate the
+checkpoints on the VOC 2012 test set or use them for demo. Note that
+*MobileNet-v2* based models do not employ the ASPP and decoder modules, in
+order to keep computation fast.
+
+Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder
+--------------------------- | :--------------: | :-----------------: | :---: | :-----:
+mobilenetv2_dm05_coco_voc_trainaug | MobileNet-v2 Depth-Multiplier = 0.5 | ImageNet<br>MS-COCO<br>VOC 2012 train_aug set | N/A | N/A
+mobilenetv2_dm05_coco_voc_trainval | MobileNet-v2 Depth-Multiplier = 0.5 | ImageNet<br>MS-COCO<br>VOC 2012 train_aug + trainval sets | N/A | N/A
+mobilenetv2_coco_voc_trainaug | MobileNet-v2 | ImageNet<br>MS-COCO<br>VOC 2012 train_aug set | N/A | N/A
+mobilenetv2_coco_voc_trainval | MobileNet-v2 | ImageNet<br>MS-COCO<br>VOC 2012 train_aug + trainval sets | N/A | N/A
+xception65_coco_voc_trainaug | Xception_65 | ImageNet<br>MS-COCO<br>VOC 2012 train_aug set | [6,12,18] for OS=16<br>[12,24,36] for OS=8 | OS = 4
+xception65_coco_voc_trainval | Xception_65 | ImageNet<br>MS-COCO<br>VOC 2012 train_aug + trainval sets | [6,12,18] for OS=16<br>[12,24,36] for OS=8 | OS = 4
+
+In the table, **OS** denotes output stride.
+
+Checkpoint name | Eval OS | Eval scales | Left-right Flip | Multiply-Adds | Runtime (sec) | PASCAL mIOU | File Size
+------------------------------------------------------------------------------------------------------------------------ | :-------: | :------------------------: | :-------------: | :------------------: | :------------: | :----------------------------: | :-------:
+[mobilenetv2_dm05_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainaug_2018_10_01.tar.gz) | 16 | [1.0] | No | 0.88B | - | 70.19% (val) | 7.6MB
+[mobilenetv2_dm05_coco_voc_trainval](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainval_2018_10_01.tar.gz) | 8 | [1.0] | No | 2.84B | - | 71.83% (test) | 7.6MB
+[mobilenetv2_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz) | 16<br>8 | [1.0]<br>[0.5:0.25:1.75] | No<br>Yes | 2.75B<br>152.59B | 0.1<br>26.9 | 75.32% (val)<br>77.33% (val) | 23MB
+[mobilenetv2_coco_voc_trainval](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz) | 8 | [0.5:0.25:1.75] | Yes | 152.59B | 26.9 | 80.25% (**test**) | 23MB
+[xception65_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_pascal_train_aug_2018_01_04.tar.gz) | 16<br>8 | [1.0]<br>[0.5:0.25:1.75] | No<br>Yes | 54.17B<br>3055.35B | 0.7<br>223.2 | 82.20% (val)<br>83.58% (val) | 439MB
+[xception65_coco_voc_trainval](http://download.tensorflow.org/models/deeplabv3_pascal_trainval_2018_01_04.tar.gz) | 8 | [0.5:0.25:1.75] | Yes | 3055.35B | 223.2 | 87.80% (**test**) | 439MB
+
+In the table, we report both the computational complexity (in terms of
+Multiply-Adds and CPU runtime) and the segmentation performance (in terms of
+mIOU) on the PASCAL VOC val or test set. The reported runtime is measured by
+tfprof on a workstation with a CPU E5-1650 v3 @ 3.50GHz and 32GB memory. Note
+that applying multi-scale inputs and left-right flips increases the
+segmentation performance but also significantly increases the computational
+cost, and thus may not be suitable for real-time applications.
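+
+In the Eval scales column, the bracketed start:step:stop notation (e.g.,
+[0.5:0.25:1.75]) denotes the set of image scales used for multi-scale
+evaluation. As a small illustrative sketch of how such a range expands (NumPy
+is used here only for the illustration):
+
+```python
+# Illustration only: expand the [start:step:stop] eval-scale notation used in
+# the tables above, e.g. [0.5:0.25:1.75] -> 0.5, 0.75, 1.0, 1.25, 1.5, 1.75.
+import numpy as np
+
+start, step, stop = 0.5, 0.25, 1.75
+eval_scales = np.arange(start, stop + step / 2, step)
+print(eval_scales.tolist())  # [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
+```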
+
+## DeepLab models trained on Cityscapes
+
+### Model details
+
+We provide several checkpoints that have been pretrained on the Cityscapes
+train_fine set. Note that the *MobileNet-v2* based model has been pretrained on
+the MS-COCO dataset and does not employ ASPP and decoder modules, for faster
+computation.
+
+Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder
+------------------------------------- | :--------------: | :-------------------------------------: | :----------------------------------------------: | :-----:
+mobilenetv2_coco_cityscapes_trainfine | MobileNet-v2 | ImageNet<br>MS-COCO<br>Cityscapes train_fine set | N/A | N/A
+mobilenetv3_large_cityscapes_trainfine | MobileNet-v3 Large | Cityscapes train_fine set (No ImageNet) | N/A | OS = 8
+mobilenetv3_small_cityscapes_trainfine | MobileNet-v3 Small | Cityscapes train_fine set (No ImageNet) | N/A | OS = 8
+xception65_cityscapes_trainfine | Xception_65 | ImageNet<br>Cityscapes train_fine set | [6, 12, 18] for OS=16<br>[12, 24, 36] for OS=8 | OS = 4
+xception71_dpc_cityscapes_trainfine | Xception_71 | ImageNet<br>MS-COCO<br>Cityscapes train_fine set | Dense Prediction Cell | OS = 4
+xception71_dpc_cityscapes_trainval | Xception_71 | ImageNet<br>MS-COCO<br>Cityscapes trainval_fine and coarse set | Dense Prediction Cell | OS = 4
+
+In the table, **OS** denotes output stride.
+
+Note that for MobileNet-v3 models, we use the following additional
+command-line flags:
+
+```
+--model_variant={ mobilenet_v3_large_seg | mobilenet_v3_small_seg }
+--image_pooling_crop_size=769,769
+--image_pooling_stride=4,5
+--add_image_level_feature=1
+--aspp_convs_filters=128
+--aspp_with_concat_projection=0
+--aspp_with_squeeze_and_excitation=1
+--decoder_use_sum_merge=1
+--decoder_filters=19
+--decoder_output_is_logits=1
+--image_se_uses_qsigmoid=1
+--decoder_output_stride=8
+--output_stride=32
+```
+
+Checkpoint name | Eval OS | Eval scales | Left-right Flip | Multiply-Adds | Runtime (sec) | Cityscapes mIOU | File Size
+-------------------------------------------------------------------------------------------------------------------------------- | :-------: | :-------------------------: | :-------------: | :-------------------: | :------------: | :----------------------------: | :-------:
+[mobilenetv2_coco_cityscapes_trainfine](http://download.tensorflow.org/models/deeplabv3_mnv2_cityscapes_train_2018_02_05.tar.gz) | 16<br>8 | [1.0]<br>[0.75:0.25:1.25] | No<br>Yes | 21.27B<br>433.24B | 0.8<br>51.12 | 70.71% (val)<br>73.57% (val) | 23MB
+[mobilenetv3_large_cityscapes_trainfine](http://download.tensorflow.org/models/deeplab_mnv3_large_cityscapes_trainfine_2019_11_15.tar.gz) | 32 | [1.0] | No | 15.95B | 0.6 | 72.41% (val) | 17MB
+[mobilenetv3_small_cityscapes_trainfine](http://download.tensorflow.org/models/deeplab_mnv3_small_cityscapes_trainfine_2019_11_15.tar.gz) | 32 | [1.0] | No | 4.63B | 0.4 | 68.99% (val) | 5MB
+[xception65_cityscapes_trainfine](http://download.tensorflow.org/models/deeplabv3_cityscapes_train_2018_02_06.tar.gz) | 16<br>8 | [1.0]<br>[0.75:0.25:1.25] | No<br>Yes | 418.64B<br>8677.92B | 5.0<br>422.8 | 78.79% (val)<br>80.42% (val) | 439MB
+[xception71_dpc_cityscapes_trainfine](http://download.tensorflow.org/models/deeplab_cityscapes_xception71_trainfine_2018_09_08.tar.gz) | 16 | [1.0] | No | 502.07B | - | 80.31% (val) | 445MB
+[xception71_dpc_cityscapes_trainval](http://download.tensorflow.org/models/deeplab_cityscapes_xception71_trainvalfine_2018_09_08.tar.gz) | 8 | [0.75:0.25:2] | Yes | - | - | 82.66% (**test**) | 446MB
+
+### EdgeTPU-DeepLab models on Cityscapes
+
+EdgeTPU is Google's machine learning accelerator architecture for edge devices
+(available in Coral devices and the Pixel 4's Neural Core). Leveraging neural
+architecture search (NAS, also known as AutoML) algorithms,
+[EdgeTPU-Mobilenet](https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet)
+has been released, yielding higher hardware utilization, lower latency, and
+better accuracy than Mobilenet-v2/v3. We use EdgeTPU-Mobilenet as the backbone
+and provide checkpoints that have been pretrained on the Cityscapes train_fine
+set. We name them EdgeTPU-DeepLab models.
+
+Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder
+-------------------- | :----------------: | :----------------: | :--: | :-----:
+EdgeTPU-DeepLab | EdgeMobilenet-1.0 | ImageNet | N/A | N/A
+EdgeTPU-DeepLab-slim | EdgeMobilenet-0.75 | ImageNet | N/A | N/A
+
+For EdgeTPU-DeepLab-slim, the backbone feature extractor has depth multiplier =
+0.75 and aspp_convs_filters = 128. We employ neither ASPP nor decoder modules,
+to further reduce the latency. We use the same train/eval flags as for the
+MobileNet-v2 DeepLab model; the flags changed for the EdgeTPU-DeepLab model are
+listed below.
+
+```
+--decoder_output_stride=''
+--aspp_convs_filters=256
+--model_variant=mobilenet_edgetpu
+```
+
+For EdgeTPU-DeepLab-slim, also include the following flags.
+
+```
+--depth_multiplier=0.75
+--aspp_convs_filters=128
+```
+
+Checkpoint name | Eval OS | Eval scales | Cityscapes mIOU | Multiply-Adds | Simulator latency on Pixel 4 EdgeTPU
+---------------------------------------------------------------------------------------------------- | :--------: | :---------: | :--------------------------: | :------------: | :----------------------------------:
+[EdgeTPU-DeepLab](http://download.tensorflow.org/models/edgetpu-deeplab_2020_03_09.tar.gz) | 32<br>16 | [1.0] | 70.6% (val)<br>74.1% (val) | 5.6B<br>7.1B | 13.8 ms<br>17.5 ms
+[EdgeTPU-DeepLab-slim](http://download.tensorflow.org/models/edgetpu-deeplab-slim_2020_03_09.tar.gz) | 32<br>16 | [1.0] | 70.0% (val)<br>73.2% (val) | 3.5B<br>4.3B | 9.9 ms<br>13.2 ms
+
+## DeepLab models trained on ADE20K
+
+### Model details
+
+We provide some checkpoints that have been pretrained on the ADE20K training
+set. Note that these models have only been pretrained on ImageNet, following
+the dataset rules.
+
+Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder | Input size
+------------------------------------- | :--------------: | :-------------------------------------: | :----------------------------------------------: | :-----: | :-----:
+mobilenetv2_ade20k_train | MobileNet-v2 | ImageNet<br>ADE20K training set | N/A | OS = 4 | 257x257
+xception65_ade20k_train | Xception_65 | ImageNet<br>ADE20K training set | [6, 12, 18] for OS=16<br>[12, 24, 36] for OS=8 | OS = 4 | 513x513
+
+The input dimensions of ADE20K images vary widely. We resize inputs so that the
+longest side is 257 for MobileNet-v2 (faster inference) and 513 for Xception_65
+(better performance). Note that we also include the decoder module in the
+MobileNet-v2 checkpoint.
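+
+A rough sketch of this resizing convention (scale the input so that its longest
+side matches the model's input size, preserving the aspect ratio), assuming a
+PIL image as input and shown for illustration only:
+
+```python
+# Sketch of the resize rule above: scale the input so its longest side equals
+# the model's input size (257 for the MobileNet-v2 checkpoint, 513 for
+# Xception_65), preserving the aspect ratio.
+from PIL import Image
+
+def resize_for_model(image, input_size=513):
+  width, height = image.size
+  ratio = float(input_size) / max(width, height)
+  target_size = (int(ratio * width), int(ratio * height))
+  return image.convert('RGB').resize(target_size, Image.LANCZOS)
+```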
+
+Checkpoint name | Eval OS | Eval scales | Left-right Flip | mIOU | Pixel-wise Accuracy | File Size
+------------------------------------- | :-------: | :-------------------------: | :-------------: | :-------------------: | :-------------------: | :-------:
+[mobilenetv2_ade20k_train](http://download.tensorflow.org/models/deeplabv3_mnv2_ade20k_train_2018_12_03.tar.gz) | 16 | [1.0] | No | 32.04% (val) | 75.41% (val) | 24.8MB
+[xception65_ade20k_train](http://download.tensorflow.org/models/deeplabv3_xception_ade20k_train_2018_05_29.tar.gz) | 8 | [0.5:0.25:1.75] | Yes | 45.65% (val) | 82.52% (val) | 439MB
+
+
+## Checkpoints pretrained on ImageNet
+
+Un-tar'ed directory includes:
+
+* model checkpoint (`model.ckpt.data-00000-of-00001`, `model.ckpt.index`).
+
+### Model details
+
+We also provide some checkpoints that are pretrained on ImageNet and/or COCO
+(as suffixed in the model name) so that one could use them for training their
+own models.
+
+* mobilenet_v2: We refer the interested users to the TensorFlow open source
+ [MobileNet-V2](https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet)
+ for details.
+
+* xception_{41,65,71}: We adapt the original Xception model to the task of
+ semantic segmentation with the following changes: (1) more layers, (2) all
+ max pooling operations are replaced by strided (atrous) separable
+ convolutions, and (3) extra batch-norm and ReLU after each 3x3 depthwise
+ convolution are added. We provide three Xception model variants with
+ different network depths.
+
+* resnet_v1_{50,101}_beta: We modify the original ResNet-101 [10], similar to
+  PSPNet [11], by replacing the first 7x7 convolution with three 3x3
+  convolutions (a rough sketch of this modified stem is shown after this
+  list). See resnet_v1_beta.py for more details.
+
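+A sketch of the modified resnet_v1_beta stem described above, written with
+Keras layers purely for illustration: the released implementation lives in
+`resnet_v1_beta.py` and additionally uses batch normalization, and the
+64/64/128 filter counts follow the PSPNet-style stem and are an assumption
+here rather than a statement about the released checkpoints.
+
+```python
+# Illustration only: the "beta" ResNet variants replace the single 7x7/2 stem
+# convolution with three 3x3 convolutions (assumed 64, 64, 128 filters,
+# PSPNet-style [11]), followed by the usual 3x3/2 max pool.
+import tensorflow as tf
+
+def resnet_beta_stem(inputs):
+  x = tf.keras.layers.Conv2D(64, 3, strides=2, padding='same',
+                             activation='relu')(inputs)
+  x = tf.keras.layers.Conv2D(64, 3, strides=1, padding='same',
+                             activation='relu')(x)
+  x = tf.keras.layers.Conv2D(128, 3, strides=1, padding='same',
+                             activation='relu')(x)
+  return tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x)
+```
+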
+Model name | File Size
+-------------------------------------------------------------------------------------- | :-------:
+[xception_41_imagenet](http://download.tensorflow.org/models/xception_41_2018_05_09.tar.gz) | 288MB
+[xception_65_imagenet](http://download.tensorflow.org/models/deeplabv3_xception_2018_01_04.tar.gz) | 447MB
+[xception_65_imagenet_coco](http://download.tensorflow.org/models/xception_65_coco_pretrained_2018_10_02.tar.gz) | 292MB
+[xception_71_imagenet](http://download.tensorflow.org/models/xception_71_2018_05_09.tar.gz) | 474MB
+[resnet_v1_50_beta_imagenet](http://download.tensorflow.org/models/resnet_v1_50_2018_05_04.tar.gz) | 274MB
+[resnet_v1_101_beta_imagenet](http://download.tensorflow.org/models/resnet_v1_101_2018_05_04.tar.gz) | 477MB
+
+## References
+
+1. **Mobilenets: Efficient convolutional neural networks for mobile vision applications**
+ Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam
+ [[link]](https://arxiv.org/abs/1704.04861). arXiv:1704.04861, 2017.
+
+2. **Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation**
+ Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
+ [[link]](https://arxiv.org/abs/1801.04381). arXiv:1801.04381, 2018.
+
+3. **Xception: Deep Learning with Depthwise Separable Convolutions**
+ François Chollet
+ [[link]](https://arxiv.org/abs/1610.02357). In the Proc. of CVPR, 2017.
+
+4. **Deformable Convolutional Networks -- COCO Detection and Segmentation Challenge 2017 Entry**
+ Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, Jifeng Dai
+ [[link]](http://presentations.cocodataset.org/COCO17-Detect-MSRA.pdf). ICCV COCO Challenge
+ Workshop, 2017.
+
+5. **The Pascal Visual Object Classes Challenge: A Retrospective**
+ Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John M. Winn, Andrew Zisserman
+ [[link]](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/). IJCV, 2014.
+
+6. **Semantic Contours from Inverse Detectors**
+ Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji, Jitendra Malik
+ [[link]](http://home.bharathh.info/pubs/codes/SBD/download.html). In the Proc. of ICCV, 2011.
+
+7. **The Cityscapes Dataset for Semantic Urban Scene Understanding**
+ Cordts, Marius, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele.
+ [[link]](https://www.cityscapes-dataset.com/). In the Proc. of CVPR, 2016.
+
+8. **Microsoft COCO: Common Objects in Context**
+ Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar
+ [[link]](http://cocodataset.org/). In the Proc. of ECCV, 2014.
+
+9. **ImageNet Large Scale Visual Recognition Challenge**
+ Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei
+ [[link]](http://www.image-net.org/). IJCV, 2015.
+
+10. **Deep Residual Learning for Image Recognition**
+ Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
+ [[link]](https://arxiv.org/abs/1512.03385). CVPR, 2016.
+
+11. **Pyramid Scene Parsing Network**
+ Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
+ [[link]](https://arxiv.org/abs/1612.01105). In CVPR, 2017.
+
+12. **Scene Parsing through ADE20K Dataset**
+ Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba
+ [[link]](http://groups.csail.mit.edu/vision/datasets/ADE20K/). In CVPR,
+ 2017.
+
+13. **Searching for MobileNetV3**
+ Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam
+ [[link]](https://arxiv.org/abs/1905.02244). In ICCV, 2019.
diff --git a/deeplab/models/research/deeplab/g3doc/pascal.md b/deeplab/models/research/deeplab/g3doc/pascal.md
new file mode 100644
index 0000000..f4bc84e
--- /dev/null
+++ b/deeplab/models/research/deeplab/g3doc/pascal.md
@@ -0,0 +1,161 @@
+# Running DeepLab on PASCAL VOC 2012 Semantic Segmentation Dataset
+
+This page walks through the steps required to run DeepLab on PASCAL VOC 2012 on
+a local machine.
+
+## Download dataset and convert to TFRecord
+
+We have prepared the script (under the folder `datasets`) to download and
+convert the PASCAL VOC 2012 semantic segmentation dataset to TFRecord.
+
+```bash
+# From the tensorflow/models/research/deeplab/datasets directory.
+sh download_and_convert_voc2012.sh
+```
+
+The converted dataset will be saved at
+`./deeplab/datasets/pascal_voc_seg/tfrecord`.
+
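+As a quick sanity check that the conversion produced data, you can count the
+examples in the generated shards. The sketch below assumes the default
+`*.tfrecord` shard naming and the TF1-style API used elsewhere in this repo:
+
+```python
+# Sanity check (sketch): count the examples in the converted TFRecord shards.
+# Assumes the default *.tfrecord shard naming under the directory above.
+import glob
+import tensorflow as tf
+
+pattern = './deeplab/datasets/pascal_voc_seg/tfrecord/*.tfrecord'
+for shard in sorted(glob.glob(pattern)):
+  count = sum(1 for _ in tf.python_io.tf_record_iterator(shard))
+  print('%s: %d examples' % (shard, count))
+```
+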
+## Recommended Directory Structure for Training and Evaluation
+
+```
++ datasets
+ + pascal_voc_seg
+ + VOCdevkit
+ + VOC2012
+ + JPEGImages
+ + SegmentationClass
+ + tfrecord
+ + exp
+ + train_on_train_set
+ + train
+ + eval
+ + vis
+```
+
+where the folder `train_on_train_set` stores the train/eval/vis events and
+results (when training DeepLab on the PASCAL VOC 2012 train set).
+
+## Running the train/eval/vis jobs
+
+A local training job using `xception_65` can be run with the following command:
+
+```bash
+# From tensorflow/models/research/
+python deeplab/train.py \
+ --logtostderr \
+ --training_number_of_steps=30000 \
+ --train_split="train" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --train_crop_size="513,513" \
+ --train_batch_size=1 \
+ --dataset="pascal_voc_seg" \
+ --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
+ --train_logdir=${PATH_TO_TRAIN_DIR} \
+ --dataset_dir=${PATH_TO_DATASET}
+```
+
+where ${PATH_TO_INITIAL_CHECKPOINT} is the path to the initial checkpoint
+(usually an ImageNet pretrained checkpoint), ${PATH_TO_TRAIN_DIR} is the
+directory to which training checkpoints and events will be written, and
+${PATH_TO_DATASET} is the directory in which the PASCAL VOC 2012 dataset
+resides.
+
+**Note that for {train,eval,vis}.py:**
+
+1. In order to reproduce our results, one needs to use a large batch size
+   (> 12) and set fine_tune_batch_norm = True. Here, we simply use a small
+   batch size during training for the purpose of demonstration. If you have
+   limited GPU memory, please fine-tune from our provided checkpoints, whose
+   batch norm parameters have already been trained, and use a smaller learning
+   rate with fine_tune_batch_norm = False.
+
+2. Change atrous_rates from [6, 12, 18] to [12, 24, 36] if setting
+   output_stride=8.
+
+3. You can omit the flag `decoder_output_stride` if you do not want to use the
+   decoder structure.
+
+A local evaluation job using `xception_65` can be run with the following
+command:
+
+```bash
+# From tensorflow/models/research/
+python deeplab/eval.py \
+ --logtostderr \
+ --eval_split="val" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --eval_crop_size="513,513" \
+ --dataset="pascal_voc_seg" \
+ --checkpoint_dir=${PATH_TO_CHECKPOINT} \
+ --eval_logdir=${PATH_TO_EVAL_DIR} \
+ --dataset_dir=${PATH_TO_DATASET}
+```
+
+where ${PATH_TO_CHECKPOINT} is the path to the trained checkpoint (i.e., the
+path to train_logdir), ${PATH_TO_EVAL_DIR} is the directory to which evaluation
+events will be written, and ${PATH_TO_DATASET} is the directory in which the
+PASCAL VOC 2012 dataset resides.
+
+A local visualization job using `xception_65` can be run with the following
+command:
+
+```bash
+# From tensorflow/models/research/
+python deeplab/vis.py \
+ --logtostderr \
+ --vis_split="val" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --vis_crop_size="513,513" \
+ --dataset="pascal_voc_seg" \
+ --checkpoint_dir=${PATH_TO_CHECKPOINT} \
+ --vis_logdir=${PATH_TO_VIS_DIR} \
+ --dataset_dir=${PATH_TO_DATASET}
+```
+
+where ${PATH_TO_CHECKPOINT} is the path to the trained checkpoint (i.e., the
+path to train_logdir), ${PATH_TO_VIS_DIR} is the directory to which the
+segmentation results will be written, and ${PATH_TO_DATASET} is the directory
+in which the PASCAL VOC 2012 dataset resides. Note that if you would like to
+save the segmentation results for the evaluation server, set
+also_save_raw_predictions = True.
+
+## Running Tensorboard
+
+Progress for training and evaluation jobs can be inspected using Tensorboard. If
+using the recommended directory structure, Tensorboard can be run using the
+following command:
+
+```bash
+tensorboard --logdir=${PATH_TO_LOG_DIRECTORY}
+```
+
+where `${PATH_TO_LOG_DIRECTORY}` points to the directory that contains the
+train, eval, and vis directories (e.g., the folder `train_on_train_set` in the
+above example). Please note that it may take Tensorboard a couple of minutes to
+populate with data.
+
+## Example
+
+We provide a script to run {train,eval,vis,export_model}.py on the PASCAL VOC
+2012 dataset as an example. See the code in local_test.sh for details.
+
+```bash
+# From tensorflow/models/research/deeplab
+sh local_test.sh
+```
diff --git a/deeplab/models/research/deeplab/g3doc/quantize.md b/deeplab/models/research/deeplab/g3doc/quantize.md
new file mode 100644
index 0000000..65dbdd7
--- /dev/null
+++ b/deeplab/models/research/deeplab/g3doc/quantize.md
@@ -0,0 +1,103 @@
+# Quantize DeepLab model for faster on-device inference
+
+This page describes the steps required to quantize a DeepLab model and convert
+it to TFLite for on-device inference. The main steps include:
+
+1. Quantization-aware training
+1. Exporting model
+1. Converting to TFLite FlatBuffer
+
+We provide details for each step below.
+
+## Quantization-aware training
+
+DeepLab supports two approaches to quantize your model.
+
+1. **[Recommended]** Train a non-quantized model until convergence, then
+   fine-tune the trained float model with quantization using a small learning
+   rate (on PASCAL we use 3e-5). This fine-tuning step usually takes 2k to 5k
+   steps to converge.
+
+1. Train a DeepLab float model with delayed quantization. Usually we delay
+   quantization until the last few thousand steps of training.
+
+In the current implementation, quantization is only supported with 1)
+`num_clones=1` for training and 2) single-scale inference for evaluation,
+visualization and model export. To get the best performance for the quantized
+model, we strongly recommend training the float model with a larger
+`num_clones` and then fine-tuning the model with a single clone.
+
+The command line below quantizes a DeepLab model trained on the PASCAL VOC
+dataset using fine-tuning:
+
+```
+# From tensorflow/models/research/
+python deeplab/train.py \
+ --logtostderr \
+ --training_number_of_steps=3000 \
+ --train_split="train" \
+ --model_variant="mobilenet_v2" \
+ --output_stride=16 \
+ --train_crop_size="513,513" \
+ --train_batch_size=8 \
+ --base_learning_rate=3e-5 \
+ --dataset="pascal_voc_seg" \
+ --quantize_delay_step=0 \
+ --tf_initial_checkpoint=${PATH_TO_TRAINED_FLOAT_MODEL} \
+ --train_logdir=${PATH_TO_TRAIN_DIR} \
+ --dataset_dir=${PATH_TO_DATASET}
+```
+
+## Converting to TFLite FlatBuffer
+
+First, use the following command line to export your trained model.
+
+```
+# From tensorflow/models/research/
+python deeplab/export_model.py \
+ --checkpoint_path=${CHECKPOINT_PATH} \
+ --quantize_delay_step=0 \
+ --export_path=${OUTPUT_DIR}/frozen_inference_graph.pb
+
+```
+
+The command line below shows how to convert the exported GraphDef to a TFLite
+model.
+
+```
+# From tensorflow/models/research/
+python deeplab/convert_to_tflite.py \
+ --quantized_graph_def_path=${OUTPUT_DIR}/frozen_inference_graph.pb \
+ --input_tensor_name=MobilenetV2/MobilenetV2/input:0 \
+ --output_tflite_path=${OUTPUT_DIR}/frozen_inference_graph.tflite \
+ --test_image_path=${PATH_TO_TEST_IMAGE}
+```
+
+**[Important]** Note that the converted model expects a 513x513 RGB input and
+does not include preprocessing (resizing and padding the input image) or
+post-processing (cropping the padded region and resizing to the original input
+size). These steps can be implemented outside of the TFLite model.
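+
+The sketch below illustrates one way to implement these pre- and
+post-processing steps around the TFLite interpreter. It is not part of the
+released code; the exact input dtype/normalization and output layout depend on
+how the model was exported, so treat it as a starting point only.
+
+```python
+# Sketch only: resize-and-pad preprocessing and crop-and-resize
+# post-processing around a TFLite interpreter, matching the 513x513 input
+# described above. The exact input dtype/normalization and output layout
+# depend on how the model was exported.
+import numpy as np
+from PIL import Image
+import tensorflow as tf
+
+INPUT_SIZE = 513
+
+def run_tflite(tflite_path, image_path):
+  image = Image.open(image_path).convert('RGB')
+  width, height = image.size
+
+  # Pre-processing: scale so the longest side is INPUT_SIZE, then pad the
+  # bottom/right to a full INPUT_SIZE x INPUT_SIZE canvas.
+  ratio = float(INPUT_SIZE) / max(width, height)
+  resized = image.resize((int(ratio * width), int(ratio * height)))
+  resized_w, resized_h = resized.size
+  padded = np.zeros((INPUT_SIZE, INPUT_SIZE, 3), dtype=np.uint8)
+  padded[:resized_h, :resized_w, :] = np.asarray(resized)
+
+  interpreter = tf.lite.Interpreter(model_path=tflite_path)
+  interpreter.allocate_tensors()
+  input_details = interpreter.get_input_details()[0]
+  output_details = interpreter.get_output_details()[0]
+  interpreter.set_tensor(
+      input_details['index'],
+      np.expand_dims(padded, 0).astype(input_details['dtype']))
+  interpreter.invoke()
+  seg_map = interpreter.get_tensor(output_details['index'])[0]
+
+  # Post-processing: crop the padded region and resize the label map back to
+  # the original image size with nearest-neighbor interpolation.
+  seg_map = seg_map[:resized_h, :resized_w]
+  seg_map = Image.fromarray(seg_map.astype(np.uint8)).resize(
+      (width, height), Image.NEAREST)
+  return np.asarray(seg_map)
+```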
+
+## Quantized model on PASCAL VOC
+
+We provide float and quantized checkpoints that have been pretrained on the VOC
+2012 train_aug set, using a MobileNet-v2 backbone with different depth
+multipliers. Quantized models usually see about a 1% drop in mIoU.
+
+For quantized (8-bit) models, the un-tar'ed directory includes:
+
+* a frozen inference graph (frozen_inference_graph.pb)
+
+* a checkpoint (model.ckpt.data*, model.ckpt.index)
+
+* a converted TFlite FlatBuffer file (frozen_inference_graph.tflite)
+
+Checkpoint name | Eval OS | Eval scales | Left-right Flip | Multiply-Adds | Quantize | PASCAL mIOU | Folder Size | TFLite File Size
+-------------------------------------------------------------------------------------------------------------------------------------------- | :-----: | :---------: | :-------------: | :-----------: | :------: | :----------: | :-------: | :-------:
+[mobilenetv2_dm05_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainaug_2018_10_01.tar.gz) | 16 | [1.0] | No | 0.88B | No | 70.19% (val) | 7.6MB | N/A
+[mobilenetv2_dm05_coco_voc_trainaug_8bit](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_train_aug_8bit_2019_04_26.tar.gz) | 16 | [1.0] | No | 0.88B | Yes | 69.65% (val) | 8.2MB | 751.1KB
+[mobilenetv2_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz) | 16 | [1.0] | No | 2.75B | No | 75.32% (val) | 23MB | N/A
+[mobilenetv2_coco_voc_trainaug_8bit](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_8bit_2019_04_26.tar.gz) | 16 | [1.0] | No | 2.75B | Yes | 74.26% (val) | 24MB | 2.2MB
+
+Note that you might need the nightly build of TensorFlow (see
+[here](https://www.tensorflow.org/install) for install instructions) to convert
+the above quantized models to TFLite.
diff --git a/deeplab/models/research/deeplab/inference.py b/deeplab/models/research/deeplab/inference.py
new file mode 100644
index 0000000..543cc23
--- /dev/null
+++ b/deeplab/models/research/deeplab/inference.py
@@ -0,0 +1,172 @@
+import numpy as np
+from matplotlib import gridspec
+from matplotlib import pyplot as plt
+from PIL import Image
+
+import tensorflow as tf
+
+
+
+class DeepLabModel(object):
+ """Class to load deeplab model and run inference."""
+
+ INPUT_TENSOR_NAME = 'ImageTensor:0'
+ OUTPUT_TENSOR_NAME = 'SemanticPredictions:0'
+ INPUT_SIZE = 448
+ FROZEN_GRAPH_NAME = 'frozen_inference_graph'
+
+  def __init__(self, frozen_graph_path):
+    """Creates and loads a pretrained DeepLab model from a frozen graph."""
+    # Load the frozen inference graph (.pb) directly from disk and import it
+    # into a fresh graph for this model instance.
+    graph_def = tf.GraphDef()
+    with tf.gfile.GFile(frozen_graph_path, 'rb') as f:
+      graph_def.ParseFromString(f.read())
+
+    self.graph = tf.Graph()
+    with self.graph.as_default():
+      tf.import_graph_def(graph_def, name='')
+
+    self.sess = tf.Session(graph=self.graph)
+
+ def run(self, image):
+ """Runs inference on a single image.
+
+ Args:
+ image: A PIL.Image object, raw input image.
+
+ Returns:
+ resized_image: RGB image resized from original input image.
+ seg_map: Segmentation map of `resized_image`.
+ """
+ width, height = image.size
+ resize_ratio = 1.0 * self.INPUT_SIZE / max(width, height)
+ target_size = (int(resize_ratio * width), int(resize_ratio * height))
+ resized_image = image.convert('RGB').resize(target_size, Image.ANTIALIAS)
+ batch_seg_map = self.sess.run(
+ self.OUTPUT_TENSOR_NAME,
+ feed_dict={self.INPUT_TENSOR_NAME: [np.asarray(resized_image)]})
+ seg_map = batch_seg_map[0]
+ return resized_image, seg_map
+
+
+def create_pascal_label_colormap():
+ """Creates a label colormap used in PASCAL VOC segmentation benchmark.
+
+ Returns:
+ A Colormap for visualizing segmentation results.
+ """
+ colormap = np.zeros((256, 3), dtype=int)
+ ind = np.arange(256, dtype=int)
+
+ for shift in reversed(range(8)):
+ for channel in range(3):
+ colormap[:, channel] |= ((ind >> channel) & 1) << shift
+ ind >>= 3
+
+ return colormap
+
+
+def label_to_color_image(label):
+ """Adds color defined by the dataset colormap to the label.
+
+ Args:
+ label: A 2D array with integer type, storing the segmentation label.
+
+ Returns:
+ result: A 2D array with floating type. The element of the array
+ is the color indexed by the corresponding element in the input label
+ to the PASCAL color map.
+
+ Raises:
+ ValueError: If label is not of rank 2 or its value is larger than color
+ map maximum entry.
+ """
+ if label.ndim != 2:
+ raise ValueError('Expect 2-D input label')
+
+ colormap = create_pascal_label_colormap()
+
+ if np.max(label) >= len(colormap):
+ raise ValueError('label value too large.')
+
+ return colormap[label]
+
+
+def vis_segmentation(image, seg_map):
+ """Visualizes input image, segmentation map and overlay view."""
+ plt.figure(figsize=(15, 5))
+ grid_spec = gridspec.GridSpec(1, 4, width_ratios=[6, 6, 6, 1])
+
+ plt.subplot(grid_spec[0])
+ plt.imshow(image)
+ plt.axis('off')
+ plt.title('input image')
+
+ plt.subplot(grid_spec[1])
+ seg_image = label_to_color_image(seg_map).astype(np.uint8)
+ plt.imshow(seg_image)
+ plt.axis('off')
+ plt.title('segmentation map')
+
+ plt.subplot(grid_spec[2])
+ plt.imshow(image)
+ plt.imshow(seg_image, alpha=0.7)
+ plt.axis('off')
+ plt.title('segmentation overlay')
+
+ unique_labels = np.unique(seg_map)
+ print(unique_labels)
+ ax = plt.subplot(grid_spec[3])
+ plt.imshow(
+ FULL_COLOR_MAP[unique_labels].astype(np.uint8), interpolation='nearest')
+ ax.yaxis.tick_right()
+ plt.yticks(range(len(unique_labels)), LABEL_NAMES[unique_labels])
+ plt.xticks([], [])
+ ax.tick_params(width=0.0)
+  plt.grid(False)
+ plt.show()
+
+
+LABEL_NAMES = np.asarray([
+ 'Background', 'Heart_O'
+])
+
+FULL_LABEL_MAP = np.arange(len(LABEL_NAMES)).reshape(len(LABEL_NAMES), 1)
+FULL_COLOR_MAP = label_to_color_image(FULL_LABEL_MAP)
+
+model_path = '/Users/mandywoo/Documents/UAV-Forge/image-proc_2020-21/models/research/deeplab/datasets/PQR/exp/train_on_trainval_set/export/frozen_inference_graph.pb'
+MODEL = DeepLabModel(model_path)
+
+
+img = Image.open('/Users/mandywoo/Documents/UAV-Forge/image-proc_2020-21/models/research/deeplab/datasets/PQR/dataset/JPEGImages/Heart_O_img_0.jpg')
+resized_image, seg_map = MODEL.run(img)
+vis_segmentation(resized_image, seg_map)
\ No newline at end of file
diff --git a/deeplab/models/research/deeplab/input_preprocess.py b/deeplab/models/research/deeplab/input_preprocess.py
new file mode 100644
index 0000000..9ca8bce
--- /dev/null
+++ b/deeplab/models/research/deeplab/input_preprocess.py
@@ -0,0 +1,139 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Prepares the data used for DeepLab training/evaluation."""
+import tensorflow as tf
+from deeplab.core import feature_extractor
+from deeplab.core import preprocess_utils
+
+
+# The probability of flipping the images and labels
+# left-right during training
+_PROB_OF_FLIP = 0.5
+
+
+def preprocess_image_and_label(image,
+ label,
+ crop_height,
+ crop_width,
+ min_resize_value=None,
+ max_resize_value=None,
+ resize_factor=None,
+ min_scale_factor=1.,
+ max_scale_factor=1.,
+ scale_factor_step_size=0,
+ ignore_label=255,
+ is_training=True,
+ model_variant=None):
+ """Preprocesses the image and label.
+
+ Args:
+ image: Input image.
+ label: Ground truth annotation label.
+ crop_height: The height value used to crop the image and label.
+ crop_width: The width value used to crop the image and label.
+ min_resize_value: Desired size of the smaller image side.
+ max_resize_value: Maximum allowed size of the larger image side.
+ resize_factor: Resized dimensions are multiple of factor plus one.
+ min_scale_factor: Minimum scale factor value.
+ max_scale_factor: Maximum scale factor value.
+ scale_factor_step_size: The step size from min scale factor to max scale
+ factor. The input is randomly scaled based on the value of
+ (min_scale_factor, max_scale_factor, scale_factor_step_size).
+ ignore_label: The label value which will be ignored for training and
+ evaluation.
+ is_training: If the preprocessing is used for training or not.
+ model_variant: Model variant (string) for choosing how to mean-subtract the
+ images. See feature_extractor.network_map for supported model variants.
+
+ Returns:
+ original_image: Original image (could be resized).
+ processed_image: Preprocessed image.
+ label: Preprocessed ground truth segmentation label.
+
+ Raises:
+ ValueError: Ground truth label not provided during training.
+ """
+ if is_training and label is None:
+ raise ValueError('During training, label must be provided.')
+ if model_variant is None:
+ tf.logging.warning('Default mean-subtraction is performed. Please specify '
+ 'a model_variant. See feature_extractor.network_map for '
+ 'supported model variants.')
+
+ # Keep reference to original image.
+ original_image = image
+
+ processed_image = tf.cast(image, tf.float32)
+
+ if label is not None:
+ label = tf.cast(label, tf.int32)
+
+ # Resize image and label to the desired range.
+ if min_resize_value or max_resize_value:
+ [processed_image, label] = (
+ preprocess_utils.resize_to_range(
+ image=processed_image,
+ label=label,
+ min_size=min_resize_value,
+ max_size=max_resize_value,
+ factor=resize_factor,
+ align_corners=True))
+ # The `original_image` becomes the resized image.
+ original_image = tf.identity(processed_image)
+
+ # Data augmentation by randomly scaling the inputs.
+ if is_training:
+ scale = preprocess_utils.get_random_scale(
+ min_scale_factor, max_scale_factor, scale_factor_step_size)
+ processed_image, label = preprocess_utils.randomly_scale_image_and_label(
+ processed_image, label, scale)
+ processed_image.set_shape([None, None, 3])
+
+ # Pad image and label to have dimensions >= [crop_height, crop_width]
+ image_shape = tf.shape(processed_image)
+ image_height = image_shape[0]
+ image_width = image_shape[1]
+
+ target_height = image_height + tf.maximum(crop_height - image_height, 0)
+ target_width = image_width + tf.maximum(crop_width - image_width, 0)
+
+ # Pad image with mean pixel value.
+ mean_pixel = tf.reshape(
+ feature_extractor.mean_pixel(model_variant), [1, 1, 3])
+ processed_image = preprocess_utils.pad_to_bounding_box(
+ processed_image, 0, 0, target_height, target_width, mean_pixel)
+
+ if label is not None:
+ label = preprocess_utils.pad_to_bounding_box(
+ label, 0, 0, target_height, target_width, ignore_label)
+
+ # Randomly crop the image and label.
+ if is_training and label is not None:
+ processed_image, label = preprocess_utils.random_crop(
+ [processed_image, label], crop_height, crop_width)
+
+ processed_image.set_shape([crop_height, crop_width, 3])
+
+ if label is not None:
+ label.set_shape([crop_height, crop_width, 1])
+
+ if is_training:
+ # Randomly left-right flip the image and label.
+ processed_image, label, _ = preprocess_utils.flip_dim(
+ [processed_image, label], _PROB_OF_FLIP, dim=1)
+
+ return original_image, processed_image, label
diff --git a/deeplab/models/research/deeplab/local_test.sh b/deeplab/models/research/deeplab/local_test.sh
new file mode 100644
index 0000000..e568ead
--- /dev/null
+++ b/deeplab/models/research/deeplab/local_test.sh
@@ -0,0 +1,148 @@
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# This script is used to run local test on PASCAL VOC 2012. Users could also
+# modify from this script for their use case.
+#
+# Usage:
+# # From the tensorflow/models/research/deeplab directory.
+# bash ./local_test.sh
+#
+#
+
+# Exit immediately if a command exits with a non-zero status.
+set -e
+
+# Move one-level up to tensorflow/models/research directory.
+cd ..
+
+# Update PYTHONPATH.
+export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
+# set PYTHONPATH=$PYTHONPATH:`../tensorflow/models/research`:`../tensorflow/models/research/slim`
+
+# Set up the working environment.
+CURRENT_DIR=$(pwd)
+WORK_DIR="${CURRENT_DIR}/deeplab"
+
+# Run model_test first to make sure the PYTHONPATH is correctly set.
+python3.7 "${WORK_DIR}"/model_test.py
+
+# Go to datasets folder and download PASCAL VOC 2012 segmentation dataset.
+DATASET_DIR="datasets"
+cd "${WORK_DIR}/${DATASET_DIR}"
+bash download_and_convert_voc2012.sh
+
+# Go back to original directory.
+cd "${CURRENT_DIR}"
+
+# Set up the working directories.
+PASCAL_FOLDER="pascal_voc_seg"
+EXP_FOLDER="exp/train_on_trainval_set"
+INIT_FOLDER="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/init_models"
+TRAIN_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/train"
+EVAL_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/eval"
+VIS_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/vis"
+EXPORT_DIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/export"
+mkdir -p "${INIT_FOLDER}"
+mkdir -p "${TRAIN_LOGDIR}"
+mkdir -p "${EVAL_LOGDIR}"
+mkdir -p "${VIS_LOGDIR}"
+mkdir -p "${EXPORT_DIR}"
+
+# Copy locally the trained checkpoint as the initial checkpoint.
+TF_INIT_ROOT="http://download.tensorflow.org/models"
+TF_INIT_CKPT="deeplabv3_pascal_train_aug_2018_01_04.tar.gz"
+cd "${INIT_FOLDER}"
+wget -nd -c "${TF_INIT_ROOT}/${TF_INIT_CKPT}"
+tar -xf "${TF_INIT_CKPT}"
+cd "${CURRENT_DIR}"
+
+PASCAL_DATASET="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/tfrecord"
+
+# Train 10 iterations.
+NUM_ITERATIONS=10
+python3.7 "${WORK_DIR}"/train.py \
+ --logtostderr \
+ --train_split="trainval" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --train_crop_size="513,513" \
+ --train_batch_size=4 \
+ --training_number_of_steps="${NUM_ITERATIONS}" \
+ --fine_tune_batch_norm=true \
+ --tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" \
+ --train_logdir="${TRAIN_LOGDIR}" \
+ --dataset_dir="${PASCAL_DATASET}"
+
+# Run evaluation. This performs eval over the full val split (1449 images) and
+# will take a while.
+# Using the provided checkpoint, one should expect mIOU=82.20%.
+python3.7 "${WORK_DIR}"/eval.py \
+ --logtostderr \
+ --eval_split="val" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --eval_crop_size="513,513" \
+ --checkpoint_dir="${TRAIN_LOGDIR}" \
+ --eval_logdir="${EVAL_LOGDIR}" \
+ --dataset_dir="${PASCAL_DATASET}" \
+ --max_number_of_evaluations=1
+
+# Visualize the results.
+python3.7 "${WORK_DIR}"/vis.py \
+ --logtostderr \
+ --vis_split="val" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --vis_crop_size="513,513" \
+ --checkpoint_dir="${TRAIN_LOGDIR}" \
+ --vis_logdir="${VIS_LOGDIR}" \
+ --dataset_dir="${PASCAL_DATASET}" \
+ --max_number_of_iterations=1
+
+# Export the trained checkpoint.
+CKPT_PATH="${TRAIN_LOGDIR}/model.ckpt-${NUM_ITERATIONS}"
+EXPORT_PATH="${EXPORT_DIR}/frozen_inference_graph.pb"
+
+python3.7 "${WORK_DIR}"/export_model.py \
+ --logtostderr \
+ --checkpoint_path="${CKPT_PATH}" \
+ --export_path="${EXPORT_PATH}" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --num_classes=21 \
+ --crop_size=513 \
+ --crop_size=513 \
+ --inference_scales=1.0
+
+# Run inference with the exported checkpoint.
+# Please refer to the provided deeplab_demo.ipynb for an example.
diff --git a/deeplab/models/research/deeplab/local_test_mobilenetv2.sh b/deeplab/models/research/deeplab/local_test_mobilenetv2.sh
new file mode 100644
index 0000000..c38646f
--- /dev/null
+++ b/deeplab/models/research/deeplab/local_test_mobilenetv2.sh
@@ -0,0 +1,129 @@
+#!/bin/bash
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#
+# This script is used to run local test on PASCAL VOC 2012 using MobileNet-v2.
+# Users could also modify from this script for their use case.
+#
+# Usage:
+# # From the tensorflow/models/research/deeplab directory.
+# sh ./local_test_mobilenetv2.sh
+#
+#
+
+# Exit immediately if a command exits with a non-zero status.
+set -e
+
+# Move one-level up to tensorflow/models/research directory.
+cd ..
+
+# Update PYTHONPATH.
+export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
+
+# Set up the working environment.
+CURRENT_DIR=$(pwd)
+WORK_DIR="${CURRENT_DIR}/deeplab"
+
+# Run model_test first to make sure the PYTHONPATH is correctly set.
+python "${WORK_DIR}"/model_test.py -v
+
+# Go to datasets folder and download PASCAL VOC 2012 segmentation dataset.
+DATASET_DIR="datasets"
+cd "${WORK_DIR}/${DATASET_DIR}"
+sh download_and_convert_voc2012.sh
+
+# Go back to original directory.
+cd "${CURRENT_DIR}"
+
+# Set up the working directories.
+PASCAL_FOLDER="pascal_voc_seg"
+EXP_FOLDER="exp/train_on_trainval_set_mobilenetv2"
+INIT_FOLDER="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/init_models"
+TRAIN_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/train"
+EVAL_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/eval"
+VIS_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/vis"
+EXPORT_DIR="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/${EXP_FOLDER}/export"
+mkdir -p "${INIT_FOLDER}"
+mkdir -p "${TRAIN_LOGDIR}"
+mkdir -p "${EVAL_LOGDIR}"
+mkdir -p "${VIS_LOGDIR}"
+mkdir -p "${EXPORT_DIR}"
+
+# Copy locally the trained checkpoint as the initial checkpoint.
+TF_INIT_ROOT="http://download.tensorflow.org/models"
+CKPT_NAME="deeplabv3_mnv2_pascal_train_aug"
+TF_INIT_CKPT="${CKPT_NAME}_2018_01_29.tar.gz"
+cd "${INIT_FOLDER}"
+wget -nd -c "${TF_INIT_ROOT}/${TF_INIT_CKPT}"
+tar -xf "${TF_INIT_CKPT}"
+cd "${CURRENT_DIR}"
+
+PASCAL_DATASET="${WORK_DIR}/${DATASET_DIR}/${PASCAL_FOLDER}/tfrecord"
+
+# Train 10 iterations.
+NUM_ITERATIONS=10
+python "${WORK_DIR}"/train.py \
+ --logtostderr \
+ --train_split="trainval" \
+ --model_variant="mobilenet_v2" \
+ --output_stride=16 \
+ --train_crop_size="513,513" \
+ --train_batch_size=4 \
+ --training_number_of_steps="${NUM_ITERATIONS}" \
+ --fine_tune_batch_norm=true \
+ --tf_initial_checkpoint="${INIT_FOLDER}/${CKPT_NAME}/model.ckpt-30000" \
+ --train_logdir="${TRAIN_LOGDIR}" \
+ --dataset_dir="${PASCAL_DATASET}"
+
+# Run evaluation. This performs eval over the full val split (1449 images) and
+# will take a while.
+# Using the provided checkpoint, one should expect mIOU=75.34%.
+python "${WORK_DIR}"/eval.py \
+ --logtostderr \
+ --eval_split="val" \
+ --model_variant="mobilenet_v2" \
+ --eval_crop_size="513,513" \
+ --checkpoint_dir="${TRAIN_LOGDIR}" \
+ --eval_logdir="${EVAL_LOGDIR}" \
+ --dataset_dir="${PASCAL_DATASET}" \
+ --max_number_of_evaluations=1
+
+# Visualize the results.
+python "${WORK_DIR}"/vis.py \
+ --logtostderr \
+ --vis_split="val" \
+ --model_variant="mobilenet_v2" \
+ --vis_crop_size="513,513" \
+ --checkpoint_dir="${TRAIN_LOGDIR}" \
+ --vis_logdir="${VIS_LOGDIR}" \
+ --dataset_dir="${PASCAL_DATASET}" \
+ --max_number_of_iterations=1
+
+# Export the trained checkpoint.
+CKPT_PATH="${TRAIN_LOGDIR}/model.ckpt-${NUM_ITERATIONS}"
+EXPORT_PATH="${EXPORT_DIR}/frozen_inference_graph.pb"
+
+python "${WORK_DIR}"/export_model.py \
+ --logtostderr \
+ --checkpoint_path="${CKPT_PATH}" \
+ --export_path="${EXPORT_PATH}" \
+ --model_variant="mobilenet_v2" \
+ --num_classes=21 \
+ --crop_size=513 \
+ --crop_size=513 \
+ --inference_scales=1.0
+
+# Run inference with the exported checkpoint.
+# Please refer to the provided deeplab_demo.ipynb for an example.
diff --git a/deeplab/models/research/deeplab/model.py b/deeplab/models/research/deeplab/model.py
new file mode 100644
index 0000000..311aaa1
--- /dev/null
+++ b/deeplab/models/research/deeplab/model.py
@@ -0,0 +1,911 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+r"""Provides DeepLab model definition and helper functions.
+
+DeepLab is a deep learning system for semantic image segmentation with
+the following features:
+
+(1) Atrous convolution to explicitly control the resolution at which
+feature responses are computed within Deep Convolutional Neural Networks.
+
+(2) Atrous spatial pyramid pooling (ASPP) to robustly segment objects at
+multiple scales with filters at multiple sampling rates and effective
+fields-of-views.
+
+(3) ASPP module augmented with image-level feature and batch normalization.
+
+(4) A simple yet effective decoder module to recover the object boundaries.
+
+See the following papers for more details:
+
+"Encoder-Decoder with Atrous Separable Convolution for Semantic Image
+Segmentation"
+Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam.
+(https://arxiv.org/abs/1802.02611)
+
+"Rethinking Atrous Convolution for Semantic Image Segmentation,"
+Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam
+(https://arxiv.org/abs/1706.05587)
+
+"DeepLab: Semantic Image Segmentation with Deep Convolutional Nets,
+Atrous Convolution, and Fully Connected CRFs",
+Liang-Chieh Chen*, George Papandreou*, Iasonas Kokkinos, Kevin Murphy,
+Alan L Yuille (* equal contribution)
+(https://arxiv.org/abs/1606.00915)
+
+"Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected
+CRFs"
+Liang-Chieh Chen*, George Papandreou*, Iasonas Kokkinos, Kevin Murphy,
+Alan L. Yuille (* equal contribution)
+(https://arxiv.org/abs/1412.7062)
+"""
+import tensorflow as tf
+from tensorflow.contrib import slim as contrib_slim
+from deeplab.core import dense_prediction_cell
+from deeplab.core import feature_extractor
+from deeplab.core import utils
+
+slim = contrib_slim
+
+LOGITS_SCOPE_NAME = 'logits'
+MERGED_LOGITS_SCOPE = 'merged_logits'
+IMAGE_POOLING_SCOPE = 'image_pooling'
+ASPP_SCOPE = 'aspp'
+CONCAT_PROJECTION_SCOPE = 'concat_projection'
+DECODER_SCOPE = 'decoder'
+META_ARCHITECTURE_SCOPE = 'meta_architecture'
+
+PROB_SUFFIX = '_prob'
+
+_resize_bilinear = utils.resize_bilinear
+scale_dimension = utils.scale_dimension
+split_separable_conv2d = utils.split_separable_conv2d
+
+
+def get_extra_layer_scopes(last_layers_contain_logits_only=False):
+ """Gets the scopes for extra layers.
+
+ Args:
+ last_layers_contain_logits_only: Boolean, True if only consider logits as
+ the last layer (i.e., exclude ASPP module, decoder module and so on)
+
+ Returns:
+ A list of scopes for extra layers.
+ """
+ if last_layers_contain_logits_only:
+ return [LOGITS_SCOPE_NAME]
+ else:
+ return [
+ LOGITS_SCOPE_NAME,
+ IMAGE_POOLING_SCOPE,
+ ASPP_SCOPE,
+ CONCAT_PROJECTION_SCOPE,
+ DECODER_SCOPE,
+ META_ARCHITECTURE_SCOPE,
+ ]
+
+
+def predict_labels_multi_scale(images,
+ model_options,
+ eval_scales=(1.0,),
+ add_flipped_images=False):
+ """Predicts segmentation labels.
+
+ Args:
+ images: A tensor of size [batch, height, width, channels].
+ model_options: A ModelOptions instance to configure models.
+ eval_scales: The scales to resize images for evaluation.
+ add_flipped_images: Add flipped images for evaluation or not.
+
+ Returns:
+ A dictionary with keys specifying the output_type (e.g., semantic
+ prediction) and values storing Tensors representing predictions (argmax
+ over channels). Each prediction has size [batch, height, width].
+ """
+ outputs_to_predictions = {
+ output: []
+ for output in model_options.outputs_to_num_classes
+ }
+
+ for i, image_scale in enumerate(eval_scales):
+ with tf.variable_scope(tf.get_variable_scope(), reuse=True if i else None):
+ outputs_to_scales_to_logits = multi_scale_logits(
+ images,
+ model_options=model_options,
+ image_pyramid=[image_scale],
+ is_training=False,
+ fine_tune_batch_norm=False)
+
+ if add_flipped_images:
+ with tf.variable_scope(tf.get_variable_scope(), reuse=True):
+ outputs_to_scales_to_logits_reversed = multi_scale_logits(
+ tf.reverse_v2(images, [2]),
+ model_options=model_options,
+ image_pyramid=[image_scale],
+ is_training=False,
+ fine_tune_batch_norm=False)
+
+ for output in sorted(outputs_to_scales_to_logits):
+ scales_to_logits = outputs_to_scales_to_logits[output]
+ logits = _resize_bilinear(
+ scales_to_logits[MERGED_LOGITS_SCOPE],
+ tf.shape(images)[1:3],
+ scales_to_logits[MERGED_LOGITS_SCOPE].dtype)
+ outputs_to_predictions[output].append(
+ tf.expand_dims(tf.nn.softmax(logits), 4))
+
+ if add_flipped_images:
+ scales_to_logits_reversed = (
+ outputs_to_scales_to_logits_reversed[output])
+ logits_reversed = _resize_bilinear(
+ tf.reverse_v2(scales_to_logits_reversed[MERGED_LOGITS_SCOPE], [2]),
+ tf.shape(images)[1:3],
+ scales_to_logits_reversed[MERGED_LOGITS_SCOPE].dtype)
+ outputs_to_predictions[output].append(
+ tf.expand_dims(tf.nn.softmax(logits_reversed), 4))
+
+ for output in sorted(outputs_to_predictions):
+ predictions = outputs_to_predictions[output]
+ # Compute average prediction across different scales and flipped images.
+ predictions = tf.reduce_mean(tf.concat(predictions, 4), axis=4)
+ outputs_to_predictions[output] = tf.argmax(predictions, 3)
+ outputs_to_predictions[output + PROB_SUFFIX] = tf.nn.softmax(predictions)
+
+ return outputs_to_predictions
+
+
+def predict_labels(images, model_options, image_pyramid=None):
+ """Predicts segmentation labels.
+
+ Args:
+ images: A tensor of size [batch, height, width, channels].
+ model_options: A ModelOptions instance to configure models.
+ image_pyramid: Input image scales for multi-scale feature extraction.
+
+ Returns:
+ A dictionary with keys specifying the output_type (e.g., semantic
+ prediction) and values storing Tensors representing predictions (argmax
+ over channels). Each prediction has size [batch, height, width].
+ """
+ outputs_to_scales_to_logits = multi_scale_logits(
+ images,
+ model_options=model_options,
+ image_pyramid=image_pyramid,
+ is_training=False,
+ fine_tune_batch_norm=False)
+
+ predictions = {}
+ for output in sorted(outputs_to_scales_to_logits):
+ scales_to_logits = outputs_to_scales_to_logits[output]
+ logits = scales_to_logits[MERGED_LOGITS_SCOPE]
+ # There are two ways to obtain the final prediction results: (1) bilinear
+ # upsampling the logits followed by argmax, or (2) argmax followed by
+ # nearest neighbor upsampling. The second option may introduce the "blocking
+ # effect" but is computationally efficient.
+ if model_options.prediction_with_upsampled_logits:
+ logits = _resize_bilinear(logits,
+ tf.shape(images)[1:3],
+ scales_to_logits[MERGED_LOGITS_SCOPE].dtype)
+ predictions[output] = tf.argmax(logits, 3)
+ predictions[output + PROB_SUFFIX] = tf.nn.softmax(logits)
+ else:
+ argmax_results = tf.argmax(logits, 3)
+ argmax_results = tf.image.resize_nearest_neighbor(
+ tf.expand_dims(argmax_results, 3),
+ tf.shape(images)[1:3],
+ align_corners=True,
+ name='resize_prediction')
+ predictions[output] = tf.squeeze(argmax_results, 3)
+ predictions[output + PROB_SUFFIX] = tf.image.resize_bilinear(
+ tf.nn.softmax(logits),
+ tf.shape(images)[1:3],
+ align_corners=True,
+ name='resize_prob')
+ return predictions
+
+
+def multi_scale_logits(images,
+ model_options,
+ image_pyramid,
+ weight_decay=0.0001,
+ is_training=False,
+ fine_tune_batch_norm=False,
+ nas_training_hyper_parameters=None):
+ """Gets the logits for multi-scale inputs.
+
+ The returned logits are all downsampled (due to max-pooling layers)
+ for both training and evaluation.
+
+ Args:
+ images: A tensor of size [batch, height, width, channels].
+ model_options: A ModelOptions instance to configure models.
+ image_pyramid: Input image scales for multi-scale feature extraction.
+ weight_decay: The weight decay for model variables.
+ is_training: Is training or not.
+ fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
+ nas_training_hyper_parameters: A dictionary storing hyper-parameters for
+ training nas models. Its keys are:
+ - `drop_path_keep_prob`: Probability to keep each path in the cell when
+ training.
+ - `total_training_steps`: Total training steps to help drop path
+ probability calculation.
+
+ Returns:
+ outputs_to_scales_to_logits: A map of maps from output_type (e.g.,
+ semantic prediction) to a dictionary of multi-scale logits names to
+ logits. For each output_type, the dictionary has keys which
+ correspond to the scales and values which correspond to the logits.
+ For example, if `scales` equals [1.0, 1.5], then the keys would
+ include 'merged_logits', 'logits_1.00' and 'logits_1.50'.
+
+ Raises:
+ ValueError: If model_options doesn't specify crop_size and its
+ add_image_level_feature = True, since add_image_level_feature requires
+ crop_size information.
+ """
+ # Setup default values.
+ if not image_pyramid:
+ image_pyramid = [1.0]
+ crop_height = (
+ model_options.crop_size[0]
+ if model_options.crop_size else tf.shape(images)[1])
+ crop_width = (
+ model_options.crop_size[1]
+ if model_options.crop_size else tf.shape(images)[2])
+ if model_options.image_pooling_crop_size:
+ image_pooling_crop_height = model_options.image_pooling_crop_size[0]
+ image_pooling_crop_width = model_options.image_pooling_crop_size[1]
+
+ # Compute the height, width for the output logits.
+ if model_options.decoder_output_stride:
+ logits_output_stride = min(model_options.decoder_output_stride)
+ else:
+ logits_output_stride = model_options.output_stride
+
+ logits_height = scale_dimension(
+ crop_height,
+ max(1.0, max(image_pyramid)) / logits_output_stride)
+ logits_width = scale_dimension(
+ crop_width,
+ max(1.0, max(image_pyramid)) / logits_output_stride)
+
+ # Compute the logits for each scale in the image pyramid.
+ outputs_to_scales_to_logits = {
+ k: {}
+ for k in model_options.outputs_to_num_classes
+ }
+
+ num_channels = images.get_shape().as_list()[-1]
+
+ for image_scale in image_pyramid:
+ if image_scale != 1.0:
+ scaled_height = scale_dimension(crop_height, image_scale)
+ scaled_width = scale_dimension(crop_width, image_scale)
+ scaled_crop_size = [scaled_height, scaled_width]
+ scaled_images = _resize_bilinear(images, scaled_crop_size, images.dtype)
+ if model_options.crop_size:
+ scaled_images.set_shape(
+ [None, scaled_height, scaled_width, num_channels])
+ # Adjust image_pooling_crop_size accordingly.
+ scaled_image_pooling_crop_size = None
+ if model_options.image_pooling_crop_size:
+ scaled_image_pooling_crop_size = [
+ scale_dimension(image_pooling_crop_height, image_scale),
+ scale_dimension(image_pooling_crop_width, image_scale)]
+ else:
+ scaled_crop_size = model_options.crop_size
+ scaled_images = images
+ scaled_image_pooling_crop_size = model_options.image_pooling_crop_size
+
+ updated_options = model_options._replace(
+ crop_size=scaled_crop_size,
+ image_pooling_crop_size=scaled_image_pooling_crop_size)
+ outputs_to_logits = _get_logits(
+ scaled_images,
+ updated_options,
+ weight_decay=weight_decay,
+ reuse=tf.AUTO_REUSE,
+ is_training=is_training,
+ fine_tune_batch_norm=fine_tune_batch_norm,
+ nas_training_hyper_parameters=nas_training_hyper_parameters)
+
+ # Resize the logits to have the same dimension before merging.
+ for output in sorted(outputs_to_logits):
+ outputs_to_logits[output] = _resize_bilinear(
+ outputs_to_logits[output], [logits_height, logits_width],
+ outputs_to_logits[output].dtype)
+
+ # Return when only one input scale.
+ if len(image_pyramid) == 1:
+ for output in sorted(model_options.outputs_to_num_classes):
+ outputs_to_scales_to_logits[output][
+ MERGED_LOGITS_SCOPE] = outputs_to_logits[output]
+ return outputs_to_scales_to_logits
+
+ # Save logits to the output map.
+ for output in sorted(model_options.outputs_to_num_classes):
+ outputs_to_scales_to_logits[output][
+ 'logits_%.2f' % image_scale] = outputs_to_logits[output]
+
+ # Merge the logits from all the multi-scale inputs.
+ for output in sorted(model_options.outputs_to_num_classes):
+ # Concatenate the multi-scale logits for each output type.
+ all_logits = [
+ tf.expand_dims(logits, axis=4)
+ for logits in outputs_to_scales_to_logits[output].values()
+ ]
+ all_logits = tf.concat(all_logits, 4)
+ merge_fn = (
+ tf.reduce_max
+ if model_options.merge_method == 'max' else tf.reduce_mean)
+ outputs_to_scales_to_logits[output][MERGED_LOGITS_SCOPE] = merge_fn(
+ all_logits, axis=4)
+
+ return outputs_to_scales_to_logits
+
+
+def extract_features(images,
+ model_options,
+ weight_decay=0.0001,
+ reuse=None,
+ is_training=False,
+ fine_tune_batch_norm=False,
+ nas_training_hyper_parameters=None):
+ """Extracts features by the particular model_variant.
+
+ Args:
+ images: A tensor of size [batch, height, width, channels].
+ model_options: A ModelOptions instance to configure models.
+ weight_decay: The weight decay for model variables.
+ reuse: Reuse the model variables or not.
+ is_training: Is training or not.
+ fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
+ nas_training_hyper_parameters: A dictionary storing hyper-parameters for
+ training nas models. Its keys are:
+ - `drop_path_keep_prob`: Probability to keep each path in the cell when
+ training.
+ - `total_training_steps`: Total training steps to help drop path
+ probability calculation.
+
+ Returns:
+ concat_logits: A tensor of size [batch, feature_height, feature_width,
+ feature_channels], where feature_height/feature_width are determined by
+ the images height/width and output_stride.
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+ """
+ features, end_points = feature_extractor.extract_features(
+ images,
+ output_stride=model_options.output_stride,
+ multi_grid=model_options.multi_grid,
+ model_variant=model_options.model_variant,
+ depth_multiplier=model_options.depth_multiplier,
+ divisible_by=model_options.divisible_by,
+ weight_decay=weight_decay,
+ reuse=reuse,
+ is_training=is_training,
+ preprocessed_images_dtype=model_options.preprocessed_images_dtype,
+ fine_tune_batch_norm=fine_tune_batch_norm,
+ nas_architecture_options=model_options.nas_architecture_options,
+ nas_training_hyper_parameters=nas_training_hyper_parameters,
+ use_bounded_activation=model_options.use_bounded_activation)
+
+ if not model_options.aspp_with_batch_norm:
+ return features, end_points
+ else:
+ if model_options.dense_prediction_cell_config is not None:
+ tf.logging.info('Using dense prediction cell config.')
+ dense_prediction_layer = dense_prediction_cell.DensePredictionCell(
+ config=model_options.dense_prediction_cell_config,
+ hparams={
+ 'conv_rate_multiplier': 16 // model_options.output_stride,
+ })
+ concat_logits = dense_prediction_layer.build_cell(
+ features,
+ output_stride=model_options.output_stride,
+ crop_size=model_options.crop_size,
+ image_pooling_crop_size=model_options.image_pooling_crop_size,
+ weight_decay=weight_decay,
+ reuse=reuse,
+ is_training=is_training,
+ fine_tune_batch_norm=fine_tune_batch_norm)
+ return concat_logits, end_points
+ else:
+ # The following code employs the DeepLabv3 ASPP module. Note that we
+ # could express the ASPP module as one particular dense prediction
+ # cell architecture. We do not do so, but keep the code below for
+ # backward compatibility.
+ batch_norm_params = utils.get_batch_norm_params(
+ decay=0.9997,
+ epsilon=1e-5,
+ scale=True,
+ is_training=(is_training and fine_tune_batch_norm),
+ sync_batch_norm_method=model_options.sync_batch_norm_method)
+ batch_norm = utils.get_batch_norm_fn(
+ model_options.sync_batch_norm_method)
+ activation_fn = (
+ tf.nn.relu6 if model_options.use_bounded_activation else tf.nn.relu)
+ with slim.arg_scope(
+ [slim.conv2d, slim.separable_conv2d],
+ weights_regularizer=slim.l2_regularizer(weight_decay),
+ activation_fn=activation_fn,
+ normalizer_fn=batch_norm,
+ padding='SAME',
+ stride=1,
+ reuse=reuse):
+ with slim.arg_scope([batch_norm], **batch_norm_params):
+ depth = model_options.aspp_convs_filters
+ branch_logits = []
+
+ if model_options.add_image_level_feature:
+ if model_options.crop_size is not None:
+ image_pooling_crop_size = model_options.image_pooling_crop_size
+ # If image_pooling_crop_size is not specified, use crop_size.
+ if image_pooling_crop_size is None:
+ image_pooling_crop_size = model_options.crop_size
+ pool_height = scale_dimension(
+ image_pooling_crop_size[0],
+ 1. / model_options.output_stride)
+ pool_width = scale_dimension(
+ image_pooling_crop_size[1],
+ 1. / model_options.output_stride)
+ image_feature = slim.avg_pool2d(
+ features, [pool_height, pool_width],
+ model_options.image_pooling_stride, padding='VALID')
+ resize_height = scale_dimension(
+ model_options.crop_size[0],
+ 1. / model_options.output_stride)
+ resize_width = scale_dimension(
+ model_options.crop_size[1],
+ 1. / model_options.output_stride)
+ else:
+ # If crop_size is None, we simply do global pooling.
+ pool_height = tf.shape(features)[1]
+ pool_width = tf.shape(features)[2]
+ image_feature = tf.reduce_mean(
+ features, axis=[1, 2], keepdims=True)
+ resize_height = pool_height
+ resize_width = pool_width
+ image_feature_activation_fn = tf.nn.relu
+ image_feature_normalizer_fn = batch_norm
+ if model_options.aspp_with_squeeze_and_excitation:
+ image_feature_activation_fn = tf.nn.sigmoid
+ if model_options.image_se_uses_qsigmoid:
+ image_feature_activation_fn = utils.q_sigmoid
+ image_feature_normalizer_fn = None
+ image_feature = slim.conv2d(
+ image_feature, depth, 1,
+ activation_fn=image_feature_activation_fn,
+ normalizer_fn=image_feature_normalizer_fn,
+ scope=IMAGE_POOLING_SCOPE)
+ image_feature = _resize_bilinear(
+ image_feature,
+ [resize_height, resize_width],
+ image_feature.dtype)
+ # Set shape for resize_height/resize_width if they are not Tensor.
+ if isinstance(resize_height, tf.Tensor):
+ resize_height = None
+ if isinstance(resize_width, tf.Tensor):
+ resize_width = None
+ image_feature.set_shape([None, resize_height, resize_width, depth])
+ if not model_options.aspp_with_squeeze_and_excitation:
+ branch_logits.append(image_feature)
+
+ # Employ a 1x1 convolution.
+ branch_logits.append(slim.conv2d(features, depth, 1,
+ scope=ASPP_SCOPE + str(0)))
+
+ if model_options.atrous_rates:
+ # Employ 3x3 convolutions with different atrous rates.
+ for i, rate in enumerate(model_options.atrous_rates, 1):
+ scope = ASPP_SCOPE + str(i)
+ if model_options.aspp_with_separable_conv:
+ aspp_features = split_separable_conv2d(
+ features,
+ filters=depth,
+ rate=rate,
+ weight_decay=weight_decay,
+ scope=scope)
+ else:
+ aspp_features = slim.conv2d(
+ features, depth, 3, rate=rate, scope=scope)
+ branch_logits.append(aspp_features)
+
+ # Merge branch logits.
+ concat_logits = tf.concat(branch_logits, 3)
+ if model_options.aspp_with_concat_projection:
+ concat_logits = slim.conv2d(
+ concat_logits, depth, 1, scope=CONCAT_PROJECTION_SCOPE)
+ concat_logits = slim.dropout(
+ concat_logits,
+ keep_prob=0.9,
+ is_training=is_training,
+ scope=CONCAT_PROJECTION_SCOPE + '_dropout')
+ if (model_options.add_image_level_feature and
+ model_options.aspp_with_squeeze_and_excitation):
+ concat_logits *= image_feature
+
+ return concat_logits, end_points
+
+
+def _get_logits(images,
+ model_options,
+ weight_decay=0.0001,
+ reuse=None,
+ is_training=False,
+ fine_tune_batch_norm=False,
+ nas_training_hyper_parameters=None):
+ """Gets the logits by atrous/image spatial pyramid pooling.
+
+ Args:
+ images: A tensor of size [batch, height, width, channels].
+ model_options: A ModelOptions instance to configure models.
+ weight_decay: The weight decay for model variables.
+ reuse: Reuse the model variables or not.
+ is_training: Is training or not.
+ fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
+ nas_training_hyper_parameters: A dictionary storing hyper-parameters for
+ training nas models. Its keys are:
+ - `drop_path_keep_prob`: Probability to keep each path in the cell when
+ training.
+ - `total_training_steps`: Total training steps to help drop path
+ probability calculation.
+
+ Returns:
+ outputs_to_logits: A map from output_type to logits.
+ """
+ features, end_points = extract_features(
+ images,
+ model_options,
+ weight_decay=weight_decay,
+ reuse=reuse,
+ is_training=is_training,
+ fine_tune_batch_norm=fine_tune_batch_norm,
+ nas_training_hyper_parameters=nas_training_hyper_parameters)
+
+ if model_options.decoder_output_stride:
+ crop_size = model_options.crop_size
+ if crop_size is None:
+ crop_size = [tf.shape(images)[1], tf.shape(images)[2]]
+ features = refine_by_decoder(
+ features,
+ end_points,
+ crop_size=crop_size,
+ decoder_output_stride=model_options.decoder_output_stride,
+ decoder_use_separable_conv=model_options.decoder_use_separable_conv,
+ decoder_use_sum_merge=model_options.decoder_use_sum_merge,
+ decoder_filters=model_options.decoder_filters,
+ decoder_output_is_logits=model_options.decoder_output_is_logits,
+ model_variant=model_options.model_variant,
+ weight_decay=weight_decay,
+ reuse=reuse,
+ is_training=is_training,
+ fine_tune_batch_norm=fine_tune_batch_norm,
+ use_bounded_activation=model_options.use_bounded_activation)
+
+ outputs_to_logits = {}
+ for output in sorted(model_options.outputs_to_num_classes):
+ if model_options.decoder_output_is_logits:
+ outputs_to_logits[output] = tf.identity(features,
+ name=output)
+ else:
+ outputs_to_logits[output] = get_branch_logits(
+ features,
+ model_options.outputs_to_num_classes[output],
+ model_options.atrous_rates,
+ aspp_with_batch_norm=model_options.aspp_with_batch_norm,
+ kernel_size=model_options.logits_kernel_size,
+ weight_decay=weight_decay,
+ reuse=reuse,
+ scope_suffix=output)
+
+ return outputs_to_logits
+
+
+def refine_by_decoder(features,
+ end_points,
+ crop_size=None,
+ decoder_output_stride=None,
+ decoder_use_separable_conv=False,
+ decoder_use_sum_merge=False,
+ decoder_filters=256,
+ decoder_output_is_logits=False,
+ model_variant=None,
+ weight_decay=0.0001,
+ reuse=None,
+ is_training=False,
+ fine_tune_batch_norm=False,
+ use_bounded_activation=False,
+ sync_batch_norm_method='None'):
+ """Adds the decoder to obtain sharper segmentation results.
+
+ Args:
+ features: A tensor of size [batch, features_height, features_width,
+ features_channels].
+ end_points: A dictionary from components of the network to the corresponding
+ activation.
+ crop_size: A tuple [crop_height, crop_width] specifying whole patch crop
+ size.
+ decoder_output_stride: A list of integers specifying the output stride of
+ low-level features used in the decoder module.
+ decoder_use_separable_conv: Employ separable convolution for decoder or not.
+ decoder_use_sum_merge: Boolean, decoder uses simple sum merge or not.
+ decoder_filters: Integer, decoder filter size.
+ decoder_output_is_logits: Boolean, using decoder output as logits or not.
+ model_variant: Model variant for feature extraction.
+ weight_decay: The weight decay for model variables.
+ reuse: Reuse the model variables or not.
+ is_training: Is training or not.
+ fine_tune_batch_norm: Fine-tune the batch norm parameters or not.
+ use_bounded_activation: Whether or not to use bounded activations. Bounded
+ activations better lend themselves to quantized inference.
+ sync_batch_norm_method: String, method used to sync batch norm. Currently
+ only support `None` (no sync batch norm) and `tpu` (use tpu code to
+ sync batch norm).
+
+ Returns:
+ Decoder output with size [batch, decoder_height, decoder_width,
+ decoder_channels].
+
+ Raises:
+ ValueError: If crop_size is None.
+ """
+ if crop_size is None:
+ raise ValueError('crop_size must be provided when using decoder.')
+ batch_norm_params = utils.get_batch_norm_params(
+ decay=0.9997,
+ epsilon=1e-5,
+ scale=True,
+ is_training=(is_training and fine_tune_batch_norm),
+ sync_batch_norm_method=sync_batch_norm_method)
+ batch_norm = utils.get_batch_norm_fn(sync_batch_norm_method)
+ decoder_depth = decoder_filters
+ projected_filters = 48
+ if decoder_use_sum_merge:
+ # When using sum merge, the projected filters must be equal to decoder
+ # filters.
+ projected_filters = decoder_filters
+ if decoder_output_is_logits:
+ # Overwrite the setting when decoder output is logits.
+ activation_fn = None
+ normalizer_fn = None
+ conv2d_kernel = 1
+ # Use original conv instead of separable conv.
+ decoder_use_separable_conv = False
+ else:
+ # Default setting when decoder output is not logits.
+ activation_fn = tf.nn.relu6 if use_bounded_activation else tf.nn.relu
+ normalizer_fn = batch_norm
+ conv2d_kernel = 3
+ with slim.arg_scope(
+ [slim.conv2d, slim.separable_conv2d],
+ weights_regularizer=slim.l2_regularizer(weight_decay),
+ activation_fn=activation_fn,
+ normalizer_fn=normalizer_fn,
+ padding='SAME',
+ stride=1,
+ reuse=reuse):
+ with slim.arg_scope([batch_norm], **batch_norm_params):
+ with tf.variable_scope(DECODER_SCOPE, DECODER_SCOPE, [features]):
+ decoder_features = features
+ decoder_stage = 0
+ scope_suffix = ''
+ for output_stride in decoder_output_stride:
+ feature_list = feature_extractor.networks_to_feature_maps[
+ model_variant][
+ feature_extractor.DECODER_END_POINTS][output_stride]
+ # If there is only one decoder stage, we do not change the scope name,
+ # for backward compatibility.
+ if decoder_stage:
+ scope_suffix = '_{}'.format(decoder_stage)
+ for i, name in enumerate(feature_list):
+ decoder_features_list = [decoder_features]
+ # MobileNet and NAS variants use different naming convention.
+ if ('mobilenet' in model_variant or
+ model_variant.startswith('mnas') or
+ model_variant.startswith('nas')):
+ feature_name = name
+ else:
+ feature_name = '{}/{}'.format(
+ feature_extractor.name_scope[model_variant], name)
+ decoder_features_list.append(
+ slim.conv2d(
+ end_points[feature_name],
+ projected_filters,
+ 1,
+ scope='feature_projection' + str(i) + scope_suffix))
+ # Determine the output size.
+ decoder_height = scale_dimension(crop_size[0], 1.0 / output_stride)
+ decoder_width = scale_dimension(crop_size[1], 1.0 / output_stride)
+ # Resize to decoder_height/decoder_width.
+ for j, feature in enumerate(decoder_features_list):
+ decoder_features_list[j] = _resize_bilinear(
+ feature, [decoder_height, decoder_width], feature.dtype)
+ h = (None if isinstance(decoder_height, tf.Tensor)
+ else decoder_height)
+ w = (None if isinstance(decoder_width, tf.Tensor)
+ else decoder_width)
+ decoder_features_list[j].set_shape([None, h, w, None])
+ if decoder_use_sum_merge:
+ decoder_features = _decoder_with_sum_merge(
+ decoder_features_list,
+ decoder_depth,
+ conv2d_kernel=conv2d_kernel,
+ decoder_use_separable_conv=decoder_use_separable_conv,
+ weight_decay=weight_decay,
+ scope_suffix=scope_suffix)
+ else:
+ if not decoder_use_separable_conv:
+ scope_suffix = str(i) + scope_suffix
+ decoder_features = _decoder_with_concat_merge(
+ decoder_features_list,
+ decoder_depth,
+ decoder_use_separable_conv=decoder_use_separable_conv,
+ weight_decay=weight_decay,
+ scope_suffix=scope_suffix)
+ decoder_stage += 1
+ return decoder_features
+
+
+def _decoder_with_sum_merge(decoder_features_list,
+ decoder_depth,
+ conv2d_kernel=3,
+ decoder_use_separable_conv=True,
+ weight_decay=0.0001,
+ scope_suffix=''):
+ """Decoder with sum to merge features.
+
+ Args:
+ decoder_features_list: A list of decoder features.
+ decoder_depth: Integer, the filters used in the convolution.
+ conv2d_kernel: Integer, the convolution kernel size.
+ decoder_use_separable_conv: Boolean, use separable conv or not.
+ weight_decay: Weight decay for the model variables.
+ scope_suffix: String, used in the scope suffix.
+
+ Returns:
+ decoder features merged with sum.
+
+ Raises:
+ RuntimeError: If decoder_features_list does not have length 2.
+ """
+ if len(decoder_features_list) != 2:
+ raise RuntimeError('Expected decoder_features_list to have length 2.')
+ # Only apply one convolution when decoder use sum merge.
+ if decoder_use_separable_conv:
+ decoder_features = split_separable_conv2d(
+ decoder_features_list[0],
+ filters=decoder_depth,
+ rate=1,
+ weight_decay=weight_decay,
+ scope='decoder_split_sep_conv0'+scope_suffix) + decoder_features_list[1]
+ else:
+ decoder_features = slim.conv2d(
+ decoder_features_list[0],
+ decoder_depth,
+ conv2d_kernel,
+ scope='decoder_conv0'+scope_suffix) + decoder_features_list[1]
+ return decoder_features
+
+
+def _decoder_with_concat_merge(decoder_features_list,
+ decoder_depth,
+ decoder_use_separable_conv=True,
+ weight_decay=0.0001,
+ scope_suffix=''):
+ """Decoder with concatenation to merge features.
+
+ This decoder method applies two convolutions to smooth the features obtained
+ by concatenating the input decoder_features_list.
+
+ This decoder module is proposed in the DeepLabv3+ paper.
+
+ Args:
+ decoder_features_list: A list of decoder features.
+ decoder_depth: Integer, the filters used in the convolution.
+ decoder_use_separable_conv: Boolean, use separable conv or not.
+ weight_decay: Weight decay for the model variables.
+ scope_suffix: String, used in the scope suffix.
+
+ Returns:
+ decoder features merged with concatenation.
+ """
+ if decoder_use_separable_conv:
+ decoder_features = split_separable_conv2d(
+ tf.concat(decoder_features_list, 3),
+ filters=decoder_depth,
+ rate=1,
+ weight_decay=weight_decay,
+ scope='decoder_conv0'+scope_suffix)
+ decoder_features = split_separable_conv2d(
+ decoder_features,
+ filters=decoder_depth,
+ rate=1,
+ weight_decay=weight_decay,
+ scope='decoder_conv1'+scope_suffix)
+ else:
+ num_convs = 2
+ decoder_features = slim.repeat(
+ tf.concat(decoder_features_list, 3),
+ num_convs,
+ slim.conv2d,
+ decoder_depth,
+ 3,
+ scope='decoder_conv'+scope_suffix)
+ return decoder_features
+
+
+def get_branch_logits(features,
+ num_classes,
+ atrous_rates=None,
+ aspp_with_batch_norm=False,
+ kernel_size=1,
+ weight_decay=0.0001,
+ reuse=None,
+ scope_suffix=''):
+ """Gets the logits from each model's branch.
+
+ The underlying model is branched out in the last layer when atrous
+ spatial pyramid pooling is employed, and all branches are sum-merged
+ to form the final logits.
+
+ Args:
+ features: A float tensor of shape [batch, height, width, channels].
+ num_classes: Number of classes to predict.
+ atrous_rates: A list of atrous convolution rates for last layer.
+ aspp_with_batch_norm: Use batch normalization layers for ASPP.
+ kernel_size: Kernel size for convolution.
+ weight_decay: Weight decay for the model variables.
+ reuse: Reuse model variables or not.
+ scope_suffix: Scope suffix for the model variables.
+
+ Returns:
+ Merged logits with shape [batch, height, width, num_classes].
+
+ Raises:
+ ValueError: Upon invalid input kernel_size value.
+ """
+ # When using batch normalization with ASPP, ASPP has been applied before
+ # in extract_features, and thus we simply apply 1x1 convolution here.
+ if aspp_with_batch_norm or atrous_rates is None:
+ if kernel_size != 1:
+ raise ValueError('Kernel size must be 1 when atrous_rates is None or '
+ 'using aspp_with_batch_norm. Got %d.' % kernel_size)
+ atrous_rates = [1]
+
+ with slim.arg_scope(
+ [slim.conv2d],
+ weights_regularizer=slim.l2_regularizer(weight_decay),
+ weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
+ reuse=reuse):
+ with tf.variable_scope(LOGITS_SCOPE_NAME, LOGITS_SCOPE_NAME, [features]):
+ branch_logits = []
+ for i, rate in enumerate(atrous_rates):
+ scope = scope_suffix
+ if i:
+ scope += '_%d' % i
+
+ branch_logits.append(
+ slim.conv2d(
+ features,
+ num_classes,
+ kernel_size=kernel_size,
+ rate=rate,
+ activation_fn=None,
+ normalizer_fn=None,
+ scope=scope))
+
+ return tf.add_n(branch_logits)
diff --git a/deeplab/models/research/deeplab/model_test.py b/deeplab/models/research/deeplab/model_test.py
new file mode 100644
index 0000000..d8413d7
--- /dev/null
+++ b/deeplab/models/research/deeplab/model_test.py
@@ -0,0 +1,148 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Tests for DeepLab model and some helper functions."""
+
+import tensorflow as tf
+
+from deeplab import common
+from deeplab import model
+
+
+class DeeplabModelTest(tf.test.TestCase):
+
+ def testWrongDeepLabVariant(self):
+ model_options = common.ModelOptions([])._replace(
+ model_variant='no_such_variant')
+ with self.assertRaises(ValueError):
+ model._get_logits(images=[], model_options=model_options)
+
+ def testBuildDeepLabv2(self):
+ batch_size = 2
+ crop_size = [41, 41]
+
+ # Test with two image_pyramids.
+ image_pyramids = [[1], [0.5, 1]]
+
+ # Test two model variants.
+ model_variants = ['xception_65', 'mobilenet_v2']
+
+ # Test with two output_types.
+ outputs_to_num_classes = {'semantic': 3,
+ 'direction': 2}
+
+ expected_endpoints = [['merged_logits'],
+ ['merged_logits',
+ 'logits_0.50',
+ 'logits_1.00']]
+ expected_num_logits = [1, 3]
+
+ for model_variant in model_variants:
+ model_options = common.ModelOptions(outputs_to_num_classes)._replace(
+ add_image_level_feature=False,
+ aspp_with_batch_norm=False,
+ aspp_with_separable_conv=False,
+ model_variant=model_variant)
+
+ for i, image_pyramid in enumerate(image_pyramids):
+ g = tf.Graph()
+ with g.as_default():
+ with self.test_session(graph=g):
+ inputs = tf.random_uniform(
+ (batch_size, crop_size[0], crop_size[1], 3))
+ outputs_to_scales_to_logits = model.multi_scale_logits(
+ inputs, model_options, image_pyramid=image_pyramid)
+
+ # Check computed results for each output type.
+ for output in outputs_to_num_classes:
+ scales_to_logits = outputs_to_scales_to_logits[output]
+ self.assertListEqual(sorted(scales_to_logits.keys()),
+ sorted(expected_endpoints[i]))
+
+ # Expected number of logits = len(image_pyramid) + 1, since one
+ # extra entry holds the logits merged from all the scales.
+ self.assertEqual(len(scales_to_logits), expected_num_logits[i])
+
+ def testForwardpassDeepLabv3plus(self):
+ crop_size = [33, 33]
+ outputs_to_num_classes = {'semantic': 3}
+
+ model_options = common.ModelOptions(
+ outputs_to_num_classes,
+ crop_size,
+ output_stride=16
+ )._replace(
+ add_image_level_feature=True,
+ aspp_with_batch_norm=True,
+ logits_kernel_size=1,
+ decoder_output_stride=[4],
+ model_variant='mobilenet_v2') # Employ MobileNetv2 for fast test.
+
+ g = tf.Graph()
+ with g.as_default():
+ with self.test_session(graph=g) as sess:
+ inputs = tf.random_uniform(
+ (1, crop_size[0], crop_size[1], 3))
+ outputs_to_scales_to_logits = model.multi_scale_logits(
+ inputs,
+ model_options,
+ image_pyramid=[1.0])
+
+ sess.run(tf.global_variables_initializer())
+ outputs_to_scales_to_logits = sess.run(outputs_to_scales_to_logits)
+
+ # Check computed results for each output type.
+ for output in outputs_to_num_classes:
+ scales_to_logits = outputs_to_scales_to_logits[output]
+ # Expect only one output.
+ self.assertEqual(len(scales_to_logits), 1)
+ for logits in scales_to_logits.values():
+ self.assertTrue(logits.any())
+
+ def testBuildDeepLabWithDensePredictionCell(self):
+ batch_size = 1
+ crop_size = [33, 33]
+ outputs_to_num_classes = {'semantic': 2}
+ expected_endpoints = ['merged_logits']
+ dense_prediction_cell_config = [
+ {'kernel': 3, 'rate': [1, 6], 'op': 'conv', 'input': -1},
+ {'kernel': 3, 'rate': [18, 15], 'op': 'conv', 'input': 0},
+ ]
+ model_options = common.ModelOptions(
+ outputs_to_num_classes,
+ crop_size,
+ output_stride=16)._replace(
+ aspp_with_batch_norm=True,
+ model_variant='mobilenet_v2',
+ dense_prediction_cell_config=dense_prediction_cell_config)
+ g = tf.Graph()
+ with g.as_default():
+ with self.test_session(graph=g):
+ inputs = tf.random_uniform(
+ (batch_size, crop_size[0], crop_size[1], 3))
+ outputs_to_scales_to_model_results = model.multi_scale_logits(
+ inputs,
+ model_options,
+ image_pyramid=[1.0])
+ for output in outputs_to_num_classes:
+ scales_to_model_results = outputs_to_scales_to_model_results[output]
+ self.assertListEqual(
+ list(scales_to_model_results), expected_endpoints)
+ self.assertEqual(len(scales_to_model_results), 1)
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/setup.txt b/deeplab/models/research/deeplab/setup.txt
new file mode 100644
index 0000000..06a68e3
--- /dev/null
+++ b/deeplab/models/research/deeplab/setup.txt
@@ -0,0 +1,78 @@
+python version: 2.7 or 3.7 (the commands below use python3.7 and pip3)
+tensorflow version: 1.14.0
+
+git clone "https://github.com/tensorflow/models.git"
+
+Folder Structure
++models
+ +deeplab
+ +datasets
+ +PQR (our custom dataset, make this dir)
+ +dataset
+ +ImageSets
+ - train.txt
+ - trainval.txt
+ - val.txt
+ +JPEGImages
+ - img_1.jpg
+ - img_2.jpg
+ +SegmentationClass
+ - img_1.jpg (notice same name as images in JPEGImages)
+ - img_2.jpg
+ +SegmentationClassRaw (don't make this directory, label_pqr.py will make it)
+ +exp
+ +train_on_trainval_set
+ +init_models
+ +deeplabv3_pascal_train_aug (download the tar.gz file linked under Directory
+ Explanations, put it in the init_models directory and extract it; this
+ directory should then appear)
+ +tfrecord
+
+Directory Explanations
+-ImageSets
+-> make 3 txt files (see the example after this section):
+ - train.txt (one training image name per line, without the file extension)
+ - val.txt (one validation image name per line, without the file extension)
+ - trainval.txt (names of both the training and validation images, one per line, without the file extension)
+-JPEGImages
+-> all images
+-SegmentationClass
+-> segmentation masks with color
+-SegmentationClassRaw
+-> segmentation masks as single-channel indexed labels; run label_pqr.py to
+ generate these (a sketch of such a conversion script follows this file)
+-exp
+-> all training/evaluation/visualization files will go in here
+-deeplabv3_pascal_train_aug: pretrained weights
+-> download the archive below, put it in init_models, and unpack it there
+-> http://download.tensorflow.org/models/deeplabv3_pascal_train_aug_2018_01_04.tar.gz
+
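+Example (illustrative split of the two sample images shown above): if img_1.jpg
+is used for training and img_2.jpg for validation, the ImageSets files contain:
+ train.txt:    img_1
+ val.txt:      img_2
+ trainval.txt: img_1
+               img_2
+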
+1: pip3 install --user gast==0.2.2
+2: pip3 install tf_slim
+3: From tensorflow/models/research/
+export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
+4: From tensorflow/models/research/deeplab/datasets
+python3.7 label_pqr.py
+5: From tensorflow/models/research/deeplab/datasets
+python3.7 ./build_new_pqr_data.py --image_folder="./PQR/dataset/JPEGImages" --semantic_segmentation_folder="./PQR/dataset/SegmentationClassRaw" --list_folder="./PQR/dataset/ImageSets" --image_format="jpg" --output_dir="./PQR/tfrecord"
+- note: replace relative paths like "./PQR/dataset/JPEGImages" with absolute paths if the script cannot find them
+6: From tensorflow/models/research/deeplab
+bash train-pqr.sh
+7: From tensorflow/models/research/deeplab
+python3.7 inference.py
+- note: change the class labels and image path at the very bottom of this script for now
+
+Notes:
+- to add/change classes
+ - go to train-pqr.sh and change --num_classes=2
+ - go to models/research/deeplab/datasets/data_generator.py and change:
+ _PQR_SEG_INFORMATION = DatasetDescriptor(
+ splits_to_sizes={
+ 'train': 3, # number of images in the train split (train.txt)
+ 'trainval': 5,
+ 'val': 2,
+ },
+ num_classes=2, # number of classes in your dataset
+ ignore_label=255, # label value (e.g. white object borders) that is ignored, not treated as a class
+ )
+ - go to models/research/deeplab/inference.py and update the class names at the very bottom
+ - go to datasets/label_pqr.py and add color:class pairs to palette
+ - put the image-name lists in ImageSets and the corresponding images/masks in JPEGImages and SegmentationClass
\ No newline at end of file
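label_pqr.py is referenced throughout setup.txt but is not included in this patch. The block below is only a minimal sketch of what such a color-mask-to-index conversion script typically looks like; the palette entries and paths are illustrative assumptions and must be adapted to the actual color:class pairs and layout of your SegmentationClass masks.

```python
# Illustrative sketch -- not the actual label_pqr.py from this patch.
import os

import numpy as np
from PIL import Image

# Hypothetical color (R, G, B) -> class index mapping; add one entry per class.
palette = {
    (0, 0, 0): 0,        # background
    (255, 255, 255): 1,  # single foreground class
}


def convert_mask(color_mask_path, raw_mask_path):
  """Replaces every palette color in a color mask with its class index."""
  color = np.array(Image.open(color_mask_path).convert('RGB'))
  raw = np.zeros(color.shape[:2], dtype=np.uint8)
  for rgb, index in palette.items():
    raw[np.all(color == rgb, axis=-1)] = index
  Image.fromarray(raw).save(raw_mask_path)


if __name__ == '__main__':
  # Paths assume the script is run from the deeplab/datasets directory.
  src_dir = 'PQR/dataset/SegmentationClass'
  dst_dir = 'PQR/dataset/SegmentationClassRaw'
  if not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)
  for name in os.listdir(src_dir):
    out_name = os.path.splitext(name)[0] + '.png'
    convert_mask(os.path.join(src_dir, name), os.path.join(dst_dir, out_name))
```

Writing the raw masks as PNG avoids lossy JPEG compression corrupting the label indices; build_new_pqr_data.py then picks them up from SegmentationClassRaw as configured in step 5 above.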
diff --git a/deeplab/models/research/deeplab/testing/info.md b/deeplab/models/research/deeplab/testing/info.md
new file mode 100644
index 0000000..b84d2ad
--- /dev/null
+++ b/deeplab/models/research/deeplab/testing/info.md
@@ -0,0 +1,6 @@
+This directory contains testing data.
+
+# pascal_voc_seg
+This folder contains data specific to the pascal_voc_seg dataset. val-00000-of-00001.tfrecord contains
+three randomly generated images with the format defined in
+tensorflow/models/research/deeplab/datasets/build_voc2012_data.py.
diff --git a/deeplab/models/research/deeplab/testing/pascal_voc_seg/val-00000-of-00001.tfrecord b/deeplab/models/research/deeplab/testing/pascal_voc_seg/val-00000-of-00001.tfrecord
new file mode 100644
index 0000000..e81455b
Binary files /dev/null and b/deeplab/models/research/deeplab/testing/pascal_voc_seg/val-00000-of-00001.tfrecord differ
diff --git a/deeplab/models/research/deeplab/train-pqr.sh b/deeplab/models/research/deeplab/train-pqr.sh
new file mode 100644
index 0000000..2326370
--- /dev/null
+++ b/deeplab/models/research/deeplab/train-pqr.sh
@@ -0,0 +1,95 @@
+cd ..
+# Set up the working environment.
+CURRENT_DIR=$(pwd)
+WORK_DIR="${CURRENT_DIR}/deeplab"
+DATASET_DIR="datasets"
+
+# Set up the working directories.
+PQR_FOLDER="PQR"
+EXP_FOLDER="exp/train_on_trainval_set"
+# INIT_FOLDER="${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/${EXP_FOLDER}/init_models"
+INIT_FOLDER="${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/init_models"
+TRAIN_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/${EXP_FOLDER}/train"
+DATASET="${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/tfrecord"
+EVAL_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/${EXP_FOLDER}/eval"
+VIS_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/${EXP_FOLDER}/vis"
+EXPORT_DIR="${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/${EXP_FOLDER}/export"
+
+mkdir -p "${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/exp"
+mkdir -p "${TRAIN_LOGDIR}"
+mkdir -p "${EVAL_LOGDIR}"
+mkdir -p "${VIS_LOGDIR}"
+mkdir -p "${EXPORT_DIR}"
+
+NUM_ITERATIONS=5
+python3.7 "${WORK_DIR}"/train.py \
+ --logtostderr \
+ --train_split="trainval" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --train_crop_size="448,448" \
+ --train_batch_size=4 \
+ --training_number_of_steps="${NUM_ITERATIONS}" \
+ --fine_tune_batch_norm=true \
+ --tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" \
+ --train_logdir="${TRAIN_LOGDIR}" \
+ --dataset_dir="${DATASET}" \
+ --dataset="pqr" \
+ --initialize_last_layer=False
+
+python3.7 "${WORK_DIR}"/eval.py \
+ --logtostderr \
+ --eval_split="val" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --eval_crop_size="448,448" \
+ --checkpoint_dir="${TRAIN_LOGDIR}" \
+ --eval_logdir="${EVAL_LOGDIR}" \
+ --dataset_dir="${PQR_DATASET}" \
+ --max_number_of_evaluations=1
+ --dataset="pqr"
+
+# Visualize the results.
+python3.7 "${WORK_DIR}"/vis.py \
+ --logtostderr \
+ --vis_split="val" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --vis_crop_size="448,448" \
+ --checkpoint_dir="${TRAIN_LOGDIR}" \
+ --vis_logdir="${VIS_LOGDIR}" \
+ --dataset_dir="${PQR_DATASET}" \
+ --max_number_of_iterations=1
+ --dataset="pqr"
+
+# Export the trained checkpoint.
+CKPT_PATH="${TRAIN_LOGDIR}/model.ckpt-${NUM_ITERATIONS}"
+EXPORT_PATH="${EXPORT_DIR}/frozen_inference_graph.pb"
+
+python3.7 "${WORK_DIR}"/export_model.py \
+ --logtostderr \
+ --checkpoint_path="${CKPT_PATH}" \
+ --export_path="${EXPORT_PATH}" \
+ --model_variant="xception_65" \
+ --atrous_rates=6 \
+ --atrous_rates=12 \
+ --atrous_rates=18 \
+ --output_stride=16 \
+ --decoder_output_stride=4 \
+ --num_classes=2 \
+ --crop_size=448 \
+ --crop_size=448 \
+ --inference_scales=1.0
+
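The inference.py script mentioned in setup.txt is not part of this patch. As a rough sketch (not the author's script), the exported frozen_inference_graph.pb produced by the last step above can be exercised with the TF 1.x API as follows. The graph path follows the EXPORT_DIR used in train-pqr.sh and assumes the snippet is run from the deeplab directory, and 'ImageTensor:0' / 'SemanticPredictions:0' are the default tensor names written by DeepLab's export_model.py.

```python
# Rough sketch of running the exported frozen graph on one image (TF 1.x).
import numpy as np
import tensorflow as tf
from PIL import Image

GRAPH_PATH = ('datasets/PQR/exp/train_on_trainval_set/export/'
              'frozen_inference_graph.pb')
IMAGE_PATH = 'datasets/PQR/dataset/JPEGImages/img_1.jpg'

graph = tf.Graph()
with tf.gfile.GFile(GRAPH_PATH, 'rb') as f:
  graph_def = tf.GraphDef()
  graph_def.ParseFromString(f.read())
with graph.as_default():
  tf.import_graph_def(graph_def, name='')

image = np.asarray(Image.open(IMAGE_PATH).convert('RGB'))
with tf.Session(graph=graph) as sess:
  # The exported graph expects a uint8 batch of shape [1, height, width, 3].
  seg_map = sess.run(
      'SemanticPredictions:0',
      feed_dict={'ImageTensor:0': image[np.newaxis, ...]})[0]
print('Predicted class ids:', np.unique(seg_map))
```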
diff --git a/deeplab/models/research/deeplab/train.py b/deeplab/models/research/deeplab/train.py
new file mode 100644
index 0000000..fbe060d
--- /dev/null
+++ b/deeplab/models/research/deeplab/train.py
@@ -0,0 +1,464 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Training script for the DeepLab model.
+
+See model.py for more details and usage.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import six
+import tensorflow as tf
+from tensorflow.contrib import quantize as contrib_quantize
+from tensorflow.contrib import tfprof as contrib_tfprof
+from deeplab import common
+from deeplab import model
+from deeplab.datasets import data_generator
+from deeplab.utils import train_utils
+from deployment import model_deploy
+
+slim = tf.contrib.slim
+flags = tf.app.flags
+FLAGS = flags.FLAGS
+
+# Settings for multi-GPUs/multi-replicas training.
+
+flags.DEFINE_integer('num_clones', 1, 'Number of clones to deploy.')
+
+flags.DEFINE_boolean('clone_on_cpu', False, 'Use CPUs to deploy clones.')
+
+flags.DEFINE_integer('num_replicas', 1, 'Number of worker replicas.')
+
+flags.DEFINE_integer('startup_delay_steps', 15,
+ 'Number of training steps between replicas startup.')
+
+flags.DEFINE_integer(
+ 'num_ps_tasks', 0,
+ 'The number of parameter servers. If the value is 0, then '
+ 'the parameters are handled locally by the worker.')
+
+flags.DEFINE_string('master', '', 'BNS name of the tensorflow server')
+
+flags.DEFINE_integer('task', 0, 'The task ID.')
+
+# Settings for logging.
+
+flags.DEFINE_string('train_logdir', None,
+ 'Where the checkpoint and logs are stored.')
+
+flags.DEFINE_integer('log_steps', 10,
+ 'Display logging information at every log_steps.')
+
+flags.DEFINE_integer('save_interval_secs', 1200,
+ 'How often, in seconds, we save the model to disk.')
+
+flags.DEFINE_integer('save_summaries_secs', 600,
+ 'How often, in seconds, we compute the summaries.')
+
+flags.DEFINE_boolean(
+ 'save_summaries_images', False,
+ 'Save sample inputs, labels, and semantic predictions as '
+ 'images to summary.')
+
+# Settings for profiling.
+
+flags.DEFINE_string('profile_logdir', None,
+ 'Where the profile files are stored.')
+
+# Settings for training strategy.
+
+flags.DEFINE_enum('optimizer', 'momentum', ['momentum', 'adam'],
+ 'Which optimizer to use.')
+
+
+# Momentum optimizer flags
+
+flags.DEFINE_enum('learning_policy', 'poly', ['poly', 'step'],
+ 'Learning rate policy for training.')
+
+# Use 0.007 when training on PASCAL augmented training set, train_aug. When
+# fine-tuning on PASCAL trainval set, use learning rate=0.0001.
+flags.DEFINE_float('base_learning_rate', .0001,
+ 'The base learning rate for model training.')
+
+flags.DEFINE_float('decay_steps', 0.0,
+ 'Decay steps for polynomial learning rate schedule.')
+
+flags.DEFINE_float('end_learning_rate', 0.0,
+ 'End learning rate for polynomial learning rate schedule.')
+
+flags.DEFINE_float('learning_rate_decay_factor', 0.1,
+ 'The rate to decay the base learning rate.')
+
+flags.DEFINE_integer('learning_rate_decay_step', 2000,
+ 'Decay the base learning rate at a fixed step.')
+
+flags.DEFINE_float('learning_power', 0.9,
+ 'The power value used in the poly learning policy.')
+
+flags.DEFINE_integer('training_number_of_steps', 30000,
+ 'The number of steps used for training')
+
+flags.DEFINE_float('momentum', 0.9, 'The momentum value to use')
+
+# Adam optimizer flags
+flags.DEFINE_float('adam_learning_rate', 0.001,
+ 'Learning rate for the adam optimizer.')
+flags.DEFINE_float('adam_epsilon', 1e-08, 'Adam optimizer epsilon.')
+
+# When fine_tune_batch_norm=True, use a batch size of at least 12
+# (more than 16 is better). Otherwise, one could use a smaller batch
+# size and set fine_tune_batch_norm=False.
+flags.DEFINE_integer('train_batch_size', 8,
+ 'The number of images in each batch during training.')
+
+# For weight_decay, use 0.00004 for MobileNet-V2 or Xception model variants.
+# Use 0.0001 for ResNet model variants.
+flags.DEFINE_float('weight_decay', 0.00004,
+ 'The value of the weight decay for training.')
+
+flags.DEFINE_list('train_crop_size', '513,513',
+ 'Image crop size [height, width] during training.')
+
+flags.DEFINE_float(
+ 'last_layer_gradient_multiplier', 1.0,
+ 'The gradient multiplier for last layers, which is used to '
+ 'boost the gradient of last layers if the value > 1.')
+
+flags.DEFINE_boolean('upsample_logits', True,
+ 'Upsample logits during training.')
+
+# Hyper-parameters for NAS training strategy.
+
+flags.DEFINE_float(
+ 'drop_path_keep_prob', 1.0,
+ 'Probability to keep each path in the NAS cell when training.')
+
+# Settings for fine-tuning the network.
+
+flags.DEFINE_string('tf_initial_checkpoint', None,
+ 'The initial checkpoint in tensorflow format.')
+
+# Set to False if one does not want to re-use the trained classifier weights.
+flags.DEFINE_boolean('initialize_last_layer', True,
+ 'Initialize the last layer.')
+
+flags.DEFINE_boolean('last_layers_contain_logits_only', False,
+ 'Only consider logits as last layers or not.')
+
+flags.DEFINE_integer('slow_start_step', 0,
+ 'Training model with small learning rate for few steps.')
+
+flags.DEFINE_float('slow_start_learning_rate', 1e-4,
+ 'Learning rate employed during slow start.')
+
+# Set to True if one wants to fine-tune the batch norm parameters in DeepLabv3.
+# Set to False and use small batch size to save GPU memory.
+flags.DEFINE_boolean('fine_tune_batch_norm', True,
+ 'Fine tune the batch norm parameters or not.')
+
+flags.DEFINE_float('min_scale_factor', 0.5,
+ 'Minimum scale factor for data augmentation.')
+
+flags.DEFINE_float('max_scale_factor', 2.,
+ 'Maximum scale factor for data augmentation.')
+
+flags.DEFINE_float('scale_factor_step_size', 0.25,
+ 'Scale factor step size for data augmentation.')
+
+# For `xception_65`, use atrous_rates = [12, 24, 36] if output_stride = 8, or
+# rates = [6, 12, 18] if output_stride = 16. For `mobilenet_v2`, use None. Note
+# one could use different atrous_rates/output_stride during training/evaluation.
+flags.DEFINE_multi_integer('atrous_rates', None,
+ 'Atrous rates for atrous spatial pyramid pooling.')
+
+flags.DEFINE_integer('output_stride', 16,
+ 'The ratio of input to output spatial resolution.')
+
+# Hard example mining related flags.
+flags.DEFINE_integer(
+ 'hard_example_mining_step', 0,
+ 'The training step in which exact hard example mining kicks off. Note we '
+ 'gradually reduce the mining percent to the specified '
+ 'top_k_percent_pixels. For example, if hard_example_mining_step=100K and '
+ 'top_k_percent_pixels=0.25, then mining percent will gradually reduce from '
+ '100% to 25% until 100K steps after which we only mine top 25% pixels.')
+
+flags.DEFINE_float(
+ 'top_k_percent_pixels', 1.0,
+ 'The top k percent pixels (in terms of the loss values) used to compute '
+ 'loss during training. This is useful for hard pixel mining.')
+
+# Quantization setting.
+flags.DEFINE_integer(
+ 'quantize_delay_step', -1,
+ 'Steps to start quantized training. If < 0, will not quantize model.')
+
+# Dataset settings.
+flags.DEFINE_string('dataset', 'pascal_voc_seg',
+ 'Name of the segmentation dataset.')
+
+flags.DEFINE_string('train_split', 'train',
+ 'Which split of the dataset to be used for training')
+
+flags.DEFINE_string('dataset_dir', None, 'Where the dataset resides.')
+
+
+def _build_deeplab(iterator, outputs_to_num_classes, ignore_label):
+ """Builds a clone of DeepLab.
+
+ Args:
+ iterator: An iterator of type tf.data.Iterator for images and labels.
+ outputs_to_num_classes: A map from output type to the number of classes. For
+ example, for the task of semantic segmentation with 21 semantic classes,
+ we would have outputs_to_num_classes['semantic'] = 21.
+ ignore_label: Ignore label.
+ """
+ samples = iterator.get_next()
+
+ # Add name to input and label nodes so we can add to summary.
+ samples[common.IMAGE] = tf.identity(samples[common.IMAGE], name=common.IMAGE)
+ samples[common.LABEL] = tf.identity(samples[common.LABEL], name=common.LABEL)
+
+ model_options = common.ModelOptions(
+ outputs_to_num_classes=outputs_to_num_classes,
+ crop_size=[int(sz) for sz in FLAGS.train_crop_size],
+ atrous_rates=FLAGS.atrous_rates,
+ output_stride=FLAGS.output_stride)
+
+ outputs_to_scales_to_logits = model.multi_scale_logits(
+ samples[common.IMAGE],
+ model_options=model_options,
+ image_pyramid=FLAGS.image_pyramid,
+ weight_decay=FLAGS.weight_decay,
+ is_training=True,
+ fine_tune_batch_norm=FLAGS.fine_tune_batch_norm,
+ nas_training_hyper_parameters={
+ 'drop_path_keep_prob': FLAGS.drop_path_keep_prob,
+ 'total_training_steps': FLAGS.training_number_of_steps,
+ })
+
+ # Add name to graph node so we can add to summary.
+ output_type_dict = outputs_to_scales_to_logits[common.OUTPUT_TYPE]
+ output_type_dict[model.MERGED_LOGITS_SCOPE] = tf.identity(
+ output_type_dict[model.MERGED_LOGITS_SCOPE], name=common.OUTPUT_TYPE)
+
+ for output, num_classes in six.iteritems(outputs_to_num_classes):
+ train_utils.add_softmax_cross_entropy_loss_for_each_scale(
+ outputs_to_scales_to_logits[output],
+ samples[common.LABEL],
+ num_classes,
+ ignore_label,
+ loss_weight=model_options.label_weights,
+ upsample_logits=FLAGS.upsample_logits,
+ hard_example_mining_step=FLAGS.hard_example_mining_step,
+ top_k_percent_pixels=FLAGS.top_k_percent_pixels,
+ scope=output)
+
+
+def main(unused_argv):
+ tf.logging.set_verbosity(tf.logging.INFO)
+ # Set up deployment (i.e., multi-GPUs and/or multi-replicas).
+ config = model_deploy.DeploymentConfig(
+ num_clones=FLAGS.num_clones,
+ clone_on_cpu=FLAGS.clone_on_cpu,
+ replica_id=FLAGS.task,
+ num_replicas=FLAGS.num_replicas,
+ num_ps_tasks=FLAGS.num_ps_tasks)
+
+ # Split the batch across GPUs.
+ assert FLAGS.train_batch_size % config.num_clones == 0, (
+ 'Training batch size not divisible by number of clones (GPUs).')
+
+ clone_batch_size = FLAGS.train_batch_size // config.num_clones
+
+ tf.gfile.MakeDirs(FLAGS.train_logdir)
+ tf.logging.info('Training on %s set', FLAGS.train_split)
+
+ with tf.Graph().as_default() as graph:
+ with tf.device(config.inputs_device()):
+ dataset = data_generator.Dataset(
+ dataset_name=FLAGS.dataset,
+ split_name=FLAGS.train_split,
+ dataset_dir=FLAGS.dataset_dir,
+ batch_size=clone_batch_size,
+ crop_size=[int(sz) for sz in FLAGS.train_crop_size],
+ min_resize_value=FLAGS.min_resize_value,
+ max_resize_value=FLAGS.max_resize_value,
+ resize_factor=FLAGS.resize_factor,
+ min_scale_factor=FLAGS.min_scale_factor,
+ max_scale_factor=FLAGS.max_scale_factor,
+ scale_factor_step_size=FLAGS.scale_factor_step_size,
+ model_variant=FLAGS.model_variant,
+ num_readers=4,
+ is_training=True,
+ should_shuffle=True,
+ should_repeat=True)
+
+ # Create the global step on the device storing the variables.
+ with tf.device(config.variables_device()):
+ global_step = tf.train.get_or_create_global_step()
+
+ # Define the model and create clones.
+ model_fn = _build_deeplab
+ model_args = (dataset.get_one_shot_iterator(), {
+ common.OUTPUT_TYPE: dataset.num_of_classes
+ }, dataset.ignore_label)
+ clones = model_deploy.create_clones(config, model_fn, args=model_args)
+
+ # Gather update_ops from the first clone. These contain, for example,
+ # the updates for the batch_norm variables created by model_fn.
+ first_clone_scope = config.clone_scope(0)
+ update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, first_clone_scope)
+
+ # Gather initial summaries.
+ summaries = set(tf.get_collection(tf.GraphKeys.SUMMARIES))
+
+ # Add summaries for model variables.
+ for model_var in tf.model_variables():
+ summaries.add(tf.summary.histogram(model_var.op.name, model_var))
+
+ # Add summaries for images, labels, semantic predictions
+ if FLAGS.save_summaries_images:
+ summary_image = graph.get_tensor_by_name(
+ ('%s/%s:0' % (first_clone_scope, common.IMAGE)).strip('/'))
+ summaries.add(
+ tf.summary.image('samples/%s' % common.IMAGE, summary_image))
+
+ first_clone_label = graph.get_tensor_by_name(
+ ('%s/%s:0' % (first_clone_scope, common.LABEL)).strip('/'))
+ # Scale up summary image pixel values for better visualization.
+ pixel_scaling = max(1, 255 // dataset.num_of_classes)
+ summary_label = tf.cast(first_clone_label * pixel_scaling, tf.uint8)
+ summaries.add(
+ tf.summary.image('samples/%s' % common.LABEL, summary_label))
+
+ first_clone_output = graph.get_tensor_by_name(
+ ('%s/%s:0' % (first_clone_scope, common.OUTPUT_TYPE)).strip('/'))
+ predictions = tf.expand_dims(tf.argmax(first_clone_output, 3), -1)
+
+ summary_predictions = tf.cast(predictions * pixel_scaling, tf.uint8)
+ summaries.add(
+ tf.summary.image(
+ 'samples/%s' % common.OUTPUT_TYPE, summary_predictions))
+
+ # Add summaries for losses.
+ for loss in tf.get_collection(tf.GraphKeys.LOSSES, first_clone_scope):
+ summaries.add(tf.summary.scalar('losses/%s' % loss.op.name, loss))
+
+ # Build the optimizer based on the device specification.
+ with tf.device(config.optimizer_device()):
+ learning_rate = train_utils.get_model_learning_rate(
+ FLAGS.learning_policy,
+ FLAGS.base_learning_rate,
+ FLAGS.learning_rate_decay_step,
+ FLAGS.learning_rate_decay_factor,
+ FLAGS.training_number_of_steps,
+ FLAGS.learning_power,
+ FLAGS.slow_start_step,
+ FLAGS.slow_start_learning_rate,
+ decay_steps=FLAGS.decay_steps,
+ end_learning_rate=FLAGS.end_learning_rate)
+
+ summaries.add(tf.summary.scalar('learning_rate', learning_rate))
+
+ if FLAGS.optimizer == 'momentum':
+ optimizer = tf.train.MomentumOptimizer(learning_rate, FLAGS.momentum)
+ elif FLAGS.optimizer == 'adam':
+ optimizer = tf.train.AdamOptimizer(
+ learning_rate=FLAGS.adam_learning_rate, epsilon=FLAGS.adam_epsilon)
+ else:
+ raise ValueError('Unknown optimizer')
+
+ if FLAGS.quantize_delay_step >= 0:
+ if FLAGS.num_clones > 1:
+ raise ValueError('Quantization doesn\'t support multi-clone yet.')
+ contrib_quantize.create_training_graph(
+ quant_delay=FLAGS.quantize_delay_step)
+
+ startup_delay_steps = FLAGS.task * FLAGS.startup_delay_steps
+
+ with tf.device(config.variables_device()):
+ total_loss, grads_and_vars = model_deploy.optimize_clones(
+ clones, optimizer)
+ total_loss = tf.check_numerics(total_loss, 'Loss is inf or nan.')
+ summaries.add(tf.summary.scalar('total_loss', total_loss))
+
+ # Modify the gradients for biases and last layer variables.
+ last_layers = model.get_extra_layer_scopes(
+ FLAGS.last_layers_contain_logits_only)
+ grad_mult = train_utils.get_model_gradient_multipliers(
+ last_layers, FLAGS.last_layer_gradient_multiplier)
+ if grad_mult:
+ grads_and_vars = slim.learning.multiply_gradients(
+ grads_and_vars, grad_mult)
+
+ # Create gradient update op.
+ grad_updates = optimizer.apply_gradients(
+ grads_and_vars, global_step=global_step)
+ update_ops.append(grad_updates)
+ update_op = tf.group(*update_ops)
+ with tf.control_dependencies([update_op]):
+ train_tensor = tf.identity(total_loss, name='train_op')
+
+ # Add the summaries from the first clone. These contain the summaries
+ # created by model_fn and either optimize_clones() or _gather_clone_loss().
+ summaries |= set(
+ tf.get_collection(tf.GraphKeys.SUMMARIES, first_clone_scope))
+
+ # Merge all summaries together.
+ summary_op = tf.summary.merge(list(summaries))
+
+ # Soft placement allows placing on CPU ops without GPU implementation.
+ session_config = tf.ConfigProto(
+ allow_soft_placement=True, log_device_placement=False)
+
+ # Start the training.
+ profile_dir = FLAGS.profile_logdir
+ if profile_dir is not None:
+ tf.gfile.MakeDirs(profile_dir)
+
+ with contrib_tfprof.ProfileContext(
+ enabled=profile_dir is not None, profile_dir=profile_dir):
+ init_fn = None
+ if FLAGS.tf_initial_checkpoint:
+ init_fn = train_utils.get_model_init_fn(
+ FLAGS.train_logdir,
+ FLAGS.tf_initial_checkpoint,
+ FLAGS.initialize_last_layer,
+ last_layers,
+ ignore_missing_vars=True)
+
+ slim.learning.train(
+ train_tensor,
+ logdir=FLAGS.train_logdir,
+ log_every_n_steps=FLAGS.log_steps,
+ master=FLAGS.master,
+ number_of_steps=FLAGS.training_number_of_steps,
+ is_chief=(FLAGS.task == 0),
+ session_config=session_config,
+ startup_delay_steps=startup_delay_steps,
+ init_fn=init_fn,
+ summary_op=summary_op,
+ save_summaries_secs=FLAGS.save_summaries_secs,
+ save_interval_secs=FLAGS.save_interval_secs)
+
+
+if __name__ == '__main__':
+ flags.mark_flag_as_required('train_logdir')
+ flags.mark_flag_as_required('dataset_dir')
+ tf.app.run()
diff --git a/deeplab/models/research/deeplab/utils/__init__.py b/deeplab/models/research/deeplab/utils/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/deeplab/models/research/deeplab/utils/get_dataset_colormap.py b/deeplab/models/research/deeplab/utils/get_dataset_colormap.py
new file mode 100644
index 0000000..c0502e3
--- /dev/null
+++ b/deeplab/models/research/deeplab/utils/get_dataset_colormap.py
@@ -0,0 +1,416 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Visualizes the segmentation results via specified color map.
+
+Visualizes the semantic segmentation results by the color map
+defined by the different datasets. Supported colormaps are:
+
+* ADE20K (http://groups.csail.mit.edu/vision/datasets/ADE20K/).
+
+* Cityscapes dataset (https://www.cityscapes-dataset.com).
+
+* Mapillary Vistas (https://research.mapillary.com).
+
+* PASCAL VOC 2012 (http://host.robots.ox.ac.uk/pascal/VOC/).
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import numpy as np
+from six.moves import range
+
+# Dataset names.
+_ADE20K = 'ade20k'
+_CITYSCAPES = 'cityscapes'
+_MAPILLARY_VISTAS = 'mapillary_vistas'
+_PASCAL = 'pascal'
+
+# Max number of entries in the colormap for each dataset.
+_DATASET_MAX_ENTRIES = {
+ _ADE20K: 151,
+ _CITYSCAPES: 256,
+ _MAPILLARY_VISTAS: 66,
+ _PASCAL: 512,
+}
+
+
+def create_ade20k_label_colormap():
+ """Creates a label colormap used in ADE20K segmentation benchmark.
+
+ Returns:
+ A colormap for visualizing segmentation results.
+ """
+ return np.asarray([
+ [0, 0, 0],
+ [120, 120, 120],
+ [180, 120, 120],
+ [6, 230, 230],
+ [80, 50, 50],
+ [4, 200, 3],
+ [120, 120, 80],
+ [140, 140, 140],
+ [204, 5, 255],
+ [230, 230, 230],
+ [4, 250, 7],
+ [224, 5, 255],
+ [235, 255, 7],
+ [150, 5, 61],
+ [120, 120, 70],
+ [8, 255, 51],
+ [255, 6, 82],
+ [143, 255, 140],
+ [204, 255, 4],
+ [255, 51, 7],
+ [204, 70, 3],
+ [0, 102, 200],
+ [61, 230, 250],
+ [255, 6, 51],
+ [11, 102, 255],
+ [255, 7, 71],
+ [255, 9, 224],
+ [9, 7, 230],
+ [220, 220, 220],
+ [255, 9, 92],
+ [112, 9, 255],
+ [8, 255, 214],
+ [7, 255, 224],
+ [255, 184, 6],
+ [10, 255, 71],
+ [255, 41, 10],
+ [7, 255, 255],
+ [224, 255, 8],
+ [102, 8, 255],
+ [255, 61, 6],
+ [255, 194, 7],
+ [255, 122, 8],
+ [0, 255, 20],
+ [255, 8, 41],
+ [255, 5, 153],
+ [6, 51, 255],
+ [235, 12, 255],
+ [160, 150, 20],
+ [0, 163, 255],
+ [140, 140, 140],
+ [250, 10, 15],
+ [20, 255, 0],
+ [31, 255, 0],
+ [255, 31, 0],
+ [255, 224, 0],
+ [153, 255, 0],
+ [0, 0, 255],
+ [255, 71, 0],
+ [0, 235, 255],
+ [0, 173, 255],
+ [31, 0, 255],
+ [11, 200, 200],
+ [255, 82, 0],
+ [0, 255, 245],
+ [0, 61, 255],
+ [0, 255, 112],
+ [0, 255, 133],
+ [255, 0, 0],
+ [255, 163, 0],
+ [255, 102, 0],
+ [194, 255, 0],
+ [0, 143, 255],
+ [51, 255, 0],
+ [0, 82, 255],
+ [0, 255, 41],
+ [0, 255, 173],
+ [10, 0, 255],
+ [173, 255, 0],
+ [0, 255, 153],
+ [255, 92, 0],
+ [255, 0, 255],
+ [255, 0, 245],
+ [255, 0, 102],
+ [255, 173, 0],
+ [255, 0, 20],
+ [255, 184, 184],
+ [0, 31, 255],
+ [0, 255, 61],
+ [0, 71, 255],
+ [255, 0, 204],
+ [0, 255, 194],
+ [0, 255, 82],
+ [0, 10, 255],
+ [0, 112, 255],
+ [51, 0, 255],
+ [0, 194, 255],
+ [0, 122, 255],
+ [0, 255, 163],
+ [255, 153, 0],
+ [0, 255, 10],
+ [255, 112, 0],
+ [143, 255, 0],
+ [82, 0, 255],
+ [163, 255, 0],
+ [255, 235, 0],
+ [8, 184, 170],
+ [133, 0, 255],
+ [0, 255, 92],
+ [184, 0, 255],
+ [255, 0, 31],
+ [0, 184, 255],
+ [0, 214, 255],
+ [255, 0, 112],
+ [92, 255, 0],
+ [0, 224, 255],
+ [112, 224, 255],
+ [70, 184, 160],
+ [163, 0, 255],
+ [153, 0, 255],
+ [71, 255, 0],
+ [255, 0, 163],
+ [255, 204, 0],
+ [255, 0, 143],
+ [0, 255, 235],
+ [133, 255, 0],
+ [255, 0, 235],
+ [245, 0, 255],
+ [255, 0, 122],
+ [255, 245, 0],
+ [10, 190, 212],
+ [214, 255, 0],
+ [0, 204, 255],
+ [20, 0, 255],
+ [255, 255, 0],
+ [0, 153, 255],
+ [0, 41, 255],
+ [0, 255, 204],
+ [41, 0, 255],
+ [41, 255, 0],
+ [173, 0, 255],
+ [0, 245, 255],
+ [71, 0, 255],
+ [122, 0, 255],
+ [0, 255, 184],
+ [0, 92, 255],
+ [184, 255, 0],
+ [0, 133, 255],
+ [255, 214, 0],
+ [25, 194, 194],
+ [102, 255, 0],
+ [92, 0, 255],
+ ])
+
+
+def create_cityscapes_label_colormap():
+ """Creates a label colormap used in CITYSCAPES segmentation benchmark.
+
+ Returns:
+ A colormap for visualizing segmentation results.
+ """
+ colormap = np.zeros((256, 3), dtype=np.uint8)
+ colormap[0] = [128, 64, 128]
+ colormap[1] = [244, 35, 232]
+ colormap[2] = [70, 70, 70]
+ colormap[3] = [102, 102, 156]
+ colormap[4] = [190, 153, 153]
+ colormap[5] = [153, 153, 153]
+ colormap[6] = [250, 170, 30]
+ colormap[7] = [220, 220, 0]
+ colormap[8] = [107, 142, 35]
+ colormap[9] = [152, 251, 152]
+ colormap[10] = [70, 130, 180]
+ colormap[11] = [220, 20, 60]
+ colormap[12] = [255, 0, 0]
+ colormap[13] = [0, 0, 142]
+ colormap[14] = [0, 0, 70]
+ colormap[15] = [0, 60, 100]
+ colormap[16] = [0, 80, 100]
+ colormap[17] = [0, 0, 230]
+ colormap[18] = [119, 11, 32]
+ return colormap
+
+
+def create_mapillary_vistas_label_colormap():
+ """Creates a label colormap used in Mapillary Vistas segmentation benchmark.
+
+ Returns:
+ A colormap for visualizing segmentation results.
+ """
+ return np.asarray([
+ [165, 42, 42],
+ [0, 192, 0],
+ [196, 196, 196],
+ [190, 153, 153],
+ [180, 165, 180],
+ [102, 102, 156],
+ [102, 102, 156],
+ [128, 64, 255],
+ [140, 140, 200],
+ [170, 170, 170],
+ [250, 170, 160],
+ [96, 96, 96],
+ [230, 150, 140],
+ [128, 64, 128],
+ [110, 110, 110],
+ [244, 35, 232],
+ [150, 100, 100],
+ [70, 70, 70],
+ [150, 120, 90],
+ [220, 20, 60],
+ [255, 0, 0],
+ [255, 0, 0],
+ [255, 0, 0],
+ [200, 128, 128],
+ [255, 255, 255],
+ [64, 170, 64],
+ [128, 64, 64],
+ [70, 130, 180],
+ [255, 255, 255],
+ [152, 251, 152],
+ [107, 142, 35],
+ [0, 170, 30],
+ [255, 255, 128],
+ [250, 0, 30],
+ [0, 0, 0],
+ [220, 220, 220],
+ [170, 170, 170],
+ [222, 40, 40],
+ [100, 170, 30],
+ [40, 40, 40],
+ [33, 33, 33],
+ [170, 170, 170],
+ [0, 0, 142],
+ [170, 170, 170],
+ [210, 170, 100],
+ [153, 153, 153],
+ [128, 128, 128],
+ [0, 0, 142],
+ [250, 170, 30],
+ [192, 192, 192],
+ [220, 220, 0],
+ [180, 165, 180],
+ [119, 11, 32],
+ [0, 0, 142],
+ [0, 60, 100],
+ [0, 0, 142],
+ [0, 0, 90],
+ [0, 0, 230],
+ [0, 80, 100],
+ [128, 64, 64],
+ [0, 0, 110],
+ [0, 0, 70],
+ [0, 0, 192],
+ [32, 32, 32],
+ [0, 0, 0],
+ [0, 0, 0],
+ ])
+
+
+def create_pascal_label_colormap():
+ """Creates a label colormap used in PASCAL VOC segmentation benchmark.
+
+ Returns:
+ A colormap for visualizing segmentation results.
+ """
+ colormap = np.zeros((_DATASET_MAX_ENTRIES[_PASCAL], 3), dtype=int)
+ ind = np.arange(_DATASET_MAX_ENTRIES[_PASCAL], dtype=int)
+
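+ # The VOC colormap packs each label index into RGB bit-by-bit: on every
+ # pass the three lowest bits of `ind` go to the R, G and B channels
+ # respectively, written into progressively less significant bit positions
+ # (shift 7 down to 0).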
+ for shift in reversed(list(range(8))):
+ for channel in range(3):
+ colormap[:, channel] |= bit_get(ind, channel) << shift
+ ind >>= 3
+
+ return colormap
+
+
+def get_ade20k_name():
+ return _ADE20K
+
+
+def get_cityscapes_name():
+ return _CITYSCAPES
+
+
+def get_mapillary_vistas_name():
+ return _MAPILLARY_VISTAS
+
+
+def get_pascal_name():
+ return _PASCAL
+
+
+def bit_get(val, idx):
+ """Gets the bit value.
+
+ Args:
+ val: Input value, int or numpy int array.
+ idx: Which bit of the input val.
+
+ Returns:
+ The "idx"-th bit of input val.
+ """
+ return (val >> idx) & 1
+
+
+def create_label_colormap(dataset=_PASCAL):
+ """Creates a label colormap for the specified dataset.
+
+ Args:
+ dataset: The colormap used in the dataset.
+
+ Returns:
+ A numpy array of the dataset colormap.
+
+ Raises:
+ ValueError: If the dataset is not supported.
+ """
+ if dataset == _ADE20K:
+ return create_ade20k_label_colormap()
+ elif dataset == _CITYSCAPES:
+ return create_cityscapes_label_colormap()
+ elif dataset == _MAPILLARY_VISTAS:
+ return create_mapillary_vistas_label_colormap()
+ elif dataset == _PASCAL:
+ return create_pascal_label_colormap()
+ else:
+ raise ValueError('Unsupported dataset.')
+
+
+def label_to_color_image(label, dataset=_PASCAL):
+ """Adds color defined by the dataset colormap to the label.
+
+ Args:
+ label: A 2D array with integer type, storing the segmentation label.
+ dataset: The colormap used in the dataset.
+
+ Returns:
+ result: A color image of shape [height, width, 3], where each pixel is
+ the color indexed by the corresponding element in the input label
+ according to the dataset colormap.
+
+ Raises:
+ ValueError: If label is not of rank 2 or its value is larger than color
+ map maximum entry.
+ """
+ if label.ndim != 2:
+ raise ValueError('Expect 2-D input label. Got {}'.format(label.shape))
+
+ if np.max(label) >= _DATASET_MAX_ENTRIES[dataset]:
+ raise ValueError(
+ 'label value too large: {} >= {}.'.format(
+ np.max(label), _DATASET_MAX_ENTRIES[dataset]))
+
+ colormap = create_label_colormap(dataset)
+ return colormap[label]
+
+
+def get_dataset_colormap_max_entries(dataset):
+ return _DATASET_MAX_ENTRIES[dataset]
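The PASCAL colormap above is built by interleaving the bits of each label index across the three color channels, and `label_to_color_image` then applies the map with a single fancy-indexing step. Below is a minimal stand-alone sketch of both ideas; it uses only NumPy, and the name `pascal_colormap` is ours rather than part of this patch. The printed entries agree with the values checked in the unit test that follows (e.g. index 5 maps to [128, 0, 128]).

```python
import numpy as np

def pascal_colormap(num_entries=256):
    """Reproduces the bit-interleaving trick from create_pascal_label_colormap."""
    colormap = np.zeros((num_entries, 3), dtype=int)
    ind = np.arange(num_entries, dtype=int)
    for shift in reversed(range(8)):
        for channel in range(3):
            # Take the channel-th bit of every index and move it to position `shift`.
            colormap[:, channel] |= ((ind >> channel) & 1) << shift
        ind >>= 3
    return colormap

cmap = pascal_colormap()
print(cmap[1])   # [128   0   0]
print(cmap[5])   # [128   0 128]

# label_to_color_image is essentially one indexing operation: colormap[label].
label = np.array([[0, 1], [5, 5]])
colored = cmap[label]      # shape (2, 2, 3)
print(colored[1, 0])       # [128   0 128]
```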
diff --git a/deeplab/models/research/deeplab/utils/get_dataset_colormap_test.py b/deeplab/models/research/deeplab/utils/get_dataset_colormap_test.py
new file mode 100644
index 0000000..89adb2c
--- /dev/null
+++ b/deeplab/models/research/deeplab/utils/get_dataset_colormap_test.py
@@ -0,0 +1,97 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Tests for get_dataset_colormap.py."""
+
+import numpy as np
+import tensorflow as tf
+
+from deeplab.utils import get_dataset_colormap
+
+
+class VisualizationUtilTest(tf.test.TestCase):
+
+ def testBitGet(self):
+ """Test that if the returned bit value is correct."""
+ self.assertEqual(1, get_dataset_colormap.bit_get(9, 0))
+ self.assertEqual(0, get_dataset_colormap.bit_get(9, 1))
+ self.assertEqual(0, get_dataset_colormap.bit_get(9, 2))
+ self.assertEqual(1, get_dataset_colormap.bit_get(9, 3))
+
+ def testPASCALLabelColorMapValue(self):
+ """Test the getd color map value."""
+ colormap = get_dataset_colormap.create_pascal_label_colormap()
+
+ # Only test a few sampled entries in the color map.
+ self.assertTrue(np.array_equal([128., 0., 128.], colormap[5, :]))
+ self.assertTrue(np.array_equal([128., 192., 128.], colormap[23, :]))
+ self.assertTrue(np.array_equal([128., 0., 192.], colormap[37, :]))
+ self.assertTrue(np.array_equal([224., 192., 192.], colormap[127, :]))
+ self.assertTrue(np.array_equal([192., 160., 192.], colormap[175, :]))
+
+ def testLabelToPASCALColorImage(self):
+ """Test the value of the converted label value."""
+ label = np.array([[0, 16, 16], [52, 7, 52]])
+ expected_result = np.array([
+ [[0, 0, 0], [0, 64, 0], [0, 64, 0]],
+ [[0, 64, 192], [128, 128, 128], [0, 64, 192]]
+ ])
+ colored_label = get_dataset_colormap.label_to_color_image(
+ label, get_dataset_colormap.get_pascal_name())
+ self.assertTrue(np.array_equal(expected_result, colored_label))
+
+ def testUnExpectedLabelValueForLabelToPASCALColorImage(self):
+ """Raise ValueError when input value exceeds range."""
+ label = np.array([[120], [600]])
+ with self.assertRaises(ValueError):
+ get_dataset_colormap.label_to_color_image(
+ label, get_dataset_colormap.get_pascal_name())
+
+ def testUnExpectedLabelDimensionForLabelToPASCALColorImage(self):
+ """Raise ValueError if input dimension is not correct."""
+ label = np.array([120])
+ with self.assertRaises(ValueError):
+ get_dataset_colormap.label_to_color_image(
+ label, get_dataset_colormap.get_pascal_name())
+
+ def testGetColormapForUnsupportedDataset(self):
+ with self.assertRaises(ValueError):
+ get_dataset_colormap.create_label_colormap('unsupported_dataset')
+
+ def testUnExpectedLabelDimensionForLabelToADE20KColorImage(self):
+ label = np.array([250])
+ with self.assertRaises(ValueError):
+ get_dataset_colormap.label_to_color_image(
+ label, get_dataset_colormap.get_ade20k_name())
+
+ def testFirstColorInADE20KColorMap(self):
+ label = np.array([[1, 3], [10, 20]])
+ expected_result = np.array([
+ [[120, 120, 120], [6, 230, 230]],
+ [[4, 250, 7], [204, 70, 3]]
+ ])
+ colored_label = get_dataset_colormap.label_to_color_image(
+ label, get_dataset_colormap.get_ade20k_name())
+ self.assertTrue(np.array_equal(colored_label, expected_result))
+
+ def testMapillaryVistasColorMapValue(self):
+ colormap = get_dataset_colormap.create_mapillary_vistas_label_colormap()
+ self.assertTrue(np.array_equal([190, 153, 153], colormap[3, :]))
+ self.assertTrue(np.array_equal([102, 102, 156], colormap[6, :]))
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/deeplab/models/research/deeplab/utils/save_annotation.py b/deeplab/models/research/deeplab/utils/save_annotation.py
new file mode 100644
index 0000000..2444df7
--- /dev/null
+++ b/deeplab/models/research/deeplab/utils/save_annotation.py
@@ -0,0 +1,66 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Saves an annotation as one png image.
+
+This script saves an annotation as one png image, and has the option to add a
+colormap to the png image for better visualization.
+"""
+
+import numpy as np
+import PIL.Image as img
+import tensorflow as tf
+
+from deeplab.utils import get_dataset_colormap
+
+
+def save_annotation(label,
+ save_dir,
+ filename,
+ add_colormap=True,
+ normalize_to_unit_values=False,
+ scale_values=False,
+ colormap_type=get_dataset_colormap.get_pascal_name()):
+ """Saves the given label to image on disk.
+
+ Args:
+ label: The numpy array to be saved. The data will be converted
+ to uint8 and saved as png image.
+ save_dir: String, the directory to which the results will be saved.
+ filename: String, the image filename.
+ add_colormap: Boolean, add color map to the label or not.
+ normalize_to_unit_values: Boolean, normalize the input values to [0, 1].
+ scale_values: Boolean, scale the input values to [0, 255] for visualization.
+ colormap_type: String, colormap type for visualization.
+ """
+ # Add colormap for visualizing the prediction.
+ if add_colormap:
+ colored_label = get_dataset_colormap.label_to_color_image(
+ label, colormap_type)
+ else:
+ colored_label = label
+ if normalize_to_unit_values:
+ min_value = np.amin(colored_label)
+ max_value = np.amax(colored_label)
+ range_value = max_value - min_value
+ if range_value != 0:
+ colored_label = (colored_label - min_value) / range_value
+
+ if scale_values:
+ colored_label = 255. * colored_label
+
+ pil_image = img.fromarray(colored_label.astype(dtype=np.uint8))
+ with tf.gfile.Open('%s/%s.png' % (save_dir, filename), mode='w') as f:
+ pil_image.save(f, 'PNG')
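As a rough, stand-alone illustration of the `save_annotation` flow above, the sketch below applies a colormap, optionally normalizes or scales the values, and writes the PNG with PIL directly instead of through `tf.gfile`. The helper name `save_label_png`, the toy colormap, and the output path are invented for the example.

```python
import numpy as np
from PIL import Image

def save_label_png(label, path, colormap=None,
                   normalize_to_unit_values=False, scale_values=False):
    """Minimal sketch of save_annotation without the TensorFlow file API."""
    colored = colormap[label] if colormap is not None else label.astype(np.float64)
    if normalize_to_unit_values:
        value_range = colored.max() - colored.min()
        if value_range != 0:
            colored = (colored - colored.min()) / value_range
    if scale_values:
        colored = 255. * colored
    Image.fromarray(colored.astype(np.uint8)).save(path, 'PNG')

# Save a 2x2 dummy prediction with a toy 3-entry colormap.
toy_cmap = np.array([[0, 0, 0], [128, 0, 0], [0, 128, 0]])
save_label_png(np.array([[0, 1], [2, 1]]), '/tmp/pred.png', colormap=toy_cmap)
```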
diff --git a/deeplab/models/research/deeplab/utils/train_utils.py b/deeplab/models/research/deeplab/utils/train_utils.py
new file mode 100644
index 0000000..14bbd6e
--- /dev/null
+++ b/deeplab/models/research/deeplab/utils/train_utils.py
@@ -0,0 +1,372 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Utility functions for training."""
+
+import six
+import tensorflow as tf
+from tensorflow.contrib import framework as contrib_framework
+
+from deeplab.core import preprocess_utils
+from deeplab.core import utils
+
+
+def _div_maybe_zero(total_loss, num_present):
+ """Normalizes the total loss with the number of present pixels."""
+ return tf.to_float(num_present > 0) * tf.math.divide(
+ total_loss,
+ tf.maximum(1e-5, num_present))
+
+
+def add_softmax_cross_entropy_loss_for_each_scale(scales_to_logits,
+ labels,
+ num_classes,
+ ignore_label,
+ loss_weight=1.0,
+ upsample_logits=True,
+ hard_example_mining_step=0,
+ top_k_percent_pixels=1.0,
+ gt_is_matting_map=False,
+ scope=None):
+ """Adds softmax cross entropy loss for logits of each scale.
+
+ Args:
+ scales_to_logits: A map from logits names for different scales to logits.
+ The logits have shape [batch, logits_height, logits_width, num_classes].
+ labels: Groundtruth labels with shape [batch, image_height, image_width, 1].
+ num_classes: Integer, number of target classes.
+ ignore_label: Integer, label to ignore.
+ loss_weight: A float or a list of loss weights. If it is a float, it means
+ all the labels have the same weight. If it is a list of weights, then each
+ element in the list represents the weight for the label of its index, for
+ example, loss_weight = [0.1, 0.5] means the weight for label 0 is 0.1 and
+ the weight for label 1 is 0.5.
+ upsample_logits: Boolean, upsample logits or not.
+ hard_example_mining_step: An integer, the training step in which the hard
+ example mining kicks off. Note that we gradually reduce the mining
+ percent to the top_k_percent_pixels. For example, if
+ hard_example_mining_step = 100K and top_k_percent_pixels = 0.25, then
+ mining percent will gradually reduce from 100% to 25% until 100K steps
+ after which we only mine top 25% pixels.
+ top_k_percent_pixels: A float, the value lies in [0.0, 1.0]. When its value
+ < 1.0, only compute the loss for the top k percent pixels (e.g., the top
+ 20% pixels). This is useful for hard pixel mining.
+ gt_is_matting_map: If true, the groundtruth is a matting map of confidence
+ score. If false, the groundtruth is an integer valued class mask.
+ scope: String, the scope for the loss.
+
+ Raises:
+ ValueError: Label or logits is None, or groundtruth is matting map while
+ label is not floating value.
+ """
+ if labels is None:
+ raise ValueError('No label for softmax cross entropy loss.')
+
+ # If input groundtruth is a matting map of confidence, check if the input
+ # labels are floating point values.
+ if gt_is_matting_map and not labels.dtype.is_floating:
+ raise ValueError('Labels must be floats if groundtruth is a matting map.')
+
+ for scale, logits in six.iteritems(scales_to_logits):
+ loss_scope = None
+ if scope:
+ loss_scope = '%s_%s' % (scope, scale)
+
+ if upsample_logits:
+ # Label is not downsampled, and instead we upsample logits.
+ logits = tf.image.resize_bilinear(
+ logits,
+ preprocess_utils.resolve_shape(labels, 4)[1:3],
+ align_corners=True)
+ scaled_labels = labels
+ else:
+ # Label is downsampled to the same size as logits.
+ # When gt_is_matting_map = true, label downsampling with nearest neighbor
+ method may introduce artifacts. However, to keep ignore_label from
+ # being interpolated with other labels, we still perform nearest neighbor
+ # interpolation.
+ # TODO(huizhongc): Change to bilinear interpolation by processing padded
+ # and non-padded label separately.
+ if gt_is_matting_map:
+ tf.logging.warning(
+ 'Label downsampling with nearest neighbor may introduce artifacts.')
+
+ scaled_labels = tf.image.resize_nearest_neighbor(
+ labels,
+ preprocess_utils.resolve_shape(logits, 4)[1:3],
+ align_corners=True)
+
+ scaled_labels = tf.reshape(scaled_labels, shape=[-1])
+ weights = utils.get_label_weight_mask(
+ scaled_labels, ignore_label, num_classes, label_weights=loss_weight)
+ # Dimension of keep_mask is equal to the total number of pixels.
+ keep_mask = tf.cast(
+ tf.not_equal(scaled_labels, ignore_label), dtype=tf.float32)
+
+ train_labels = None
+ logits = tf.reshape(logits, shape=[-1, num_classes])
+
+ if gt_is_matting_map:
+ # When the groundtruth is integer label mask, we can assign class
+ # dependent label weights to the loss. When the groundtruth is image
+ # matting confidence, we do not apply class-dependent label weight (i.e.,
+ # label_weight = 1.0).
+ if loss_weight != 1.0:
+ raise ValueError(
+ 'loss_weight must equal 1 if the groundtruth is a matting map.')
+
+ # Assign label value 0 to ignore pixels. The exact label value of ignore
+ # pixel does not matter, because those ignore_value pixel losses will be
+ # multiplied to 0 weight.
+ train_labels = scaled_labels * keep_mask
+
+ train_labels = tf.expand_dims(train_labels, 1)
+ train_labels = tf.concat([1 - train_labels, train_labels], axis=1)
+ else:
+ train_labels = tf.one_hot(
+ scaled_labels, num_classes, on_value=1.0, off_value=0.0)
+
+ default_loss_scope = ('softmax_all_pixel_loss'
+ if top_k_percent_pixels == 1.0 else
+ 'softmax_hard_example_mining')
+ with tf.name_scope(loss_scope, default_loss_scope,
+ [logits, train_labels, weights]):
+ # Compute the loss for all pixels.
+ pixel_losses = tf.nn.softmax_cross_entropy_with_logits_v2(
+ labels=tf.stop_gradient(
+ train_labels, name='train_labels_stop_gradient'),
+ logits=logits,
+ name='pixel_losses')
+ weighted_pixel_losses = tf.multiply(pixel_losses, weights)
+
+ if top_k_percent_pixels == 1.0:
+ total_loss = tf.reduce_sum(weighted_pixel_losses)
+ num_present = tf.reduce_sum(keep_mask)
+ loss = _div_maybe_zero(total_loss, num_present)
+ tf.losses.add_loss(loss)
+ else:
+ num_pixels = tf.to_float(tf.shape(logits)[0])
+ # Compute the top_k_percent pixels based on current training step.
+ if hard_example_mining_step == 0:
+ # Directly focus on the top_k pixels.
+ top_k_pixels = tf.to_int32(top_k_percent_pixels * num_pixels)
+ else:
+ # Gradually reduce the mining percent to top_k_percent_pixels.
+ global_step = tf.to_float(tf.train.get_or_create_global_step())
+ ratio = tf.minimum(1.0, global_step / hard_example_mining_step)
+ top_k_pixels = tf.to_int32(
+ (ratio * top_k_percent_pixels + (1.0 - ratio)) * num_pixels)
+ top_k_losses, _ = tf.nn.top_k(weighted_pixel_losses,
+ k=top_k_pixels,
+ sorted=True,
+ name='top_k_percent_pixels')
+ total_loss = tf.reduce_sum(top_k_losses)
+ num_present = tf.reduce_sum(
+ tf.to_float(tf.not_equal(top_k_losses, 0.0)))
+ loss = _div_maybe_zero(total_loss, num_present)
+ tf.losses.add_loss(loss)
+
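The hard-example-mining branch above linearly anneals the fraction of mined pixels from 100% down to `top_k_percent_pixels` over `hard_example_mining_step` steps. A small pure-Python sketch of just that schedule (the helper name is ours):

```python
def mined_pixel_fraction(global_step, hard_example_mining_step, top_k_percent_pixels):
    """Fraction of pixels kept by the hard-example-mining schedule."""
    if hard_example_mining_step == 0:
        return top_k_percent_pixels
    ratio = min(1.0, global_step / float(hard_example_mining_step))
    return ratio * top_k_percent_pixels + (1.0 - ratio)

# With hard_example_mining_step=100000 and top_k_percent_pixels=0.25, the mined
# fraction decays linearly from all pixels to the hardest 25%.
for step in (0, 50000, 100000, 200000):
    print(step, mined_pixel_fraction(step, 100000, 0.25))
# 0 -> 1.0, 50000 -> 0.625, 100000 -> 0.25, 200000 -> 0.25
```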
+
+def get_model_init_fn(train_logdir,
+ tf_initial_checkpoint,
+ initialize_last_layer,
+ last_layers,
+ ignore_missing_vars=False):
+ """Gets the function initializing model variables from a checkpoint.
+
+ Args:
+ train_logdir: Log directory for training.
+ tf_initial_checkpoint: TensorFlow checkpoint for initialization.
+ initialize_last_layer: Initialize last layer or not.
+ last_layers: Last layers of the model.
+ ignore_missing_vars: Ignore missing variables in the checkpoint.
+
+ Returns:
+ Initialization function.
+ """
+ if tf_initial_checkpoint is None:
+ tf.logging.info('Not initializing the model from a checkpoint.')
+ return None
+
+ if tf.train.latest_checkpoint(train_logdir):
+ tf.logging.info('Ignoring initialization; other checkpoint exists')
+ return None
+
+ tf.logging.info('Initializing model from path: %s', tf_initial_checkpoint)
+
+ # Variables that will not be restored.
+ exclude_list = ['global_step']
+ if not initialize_last_layer:
+ exclude_list.extend(last_layers)
+
+ variables_to_restore = contrib_framework.get_variables_to_restore(
+ exclude=exclude_list)
+
+ if variables_to_restore:
+ init_op, init_feed_dict = contrib_framework.assign_from_checkpoint(
+ tf_initial_checkpoint,
+ variables_to_restore,
+ ignore_missing_vars=ignore_missing_vars)
+ global_step = tf.train.get_or_create_global_step()
+
+ def restore_fn(sess):
+ sess.run(init_op, init_feed_dict)
+ sess.run([global_step])
+
+ return restore_fn
+
+ return None
+
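The restore logic in `get_model_init_fn` comes down to building an exclude list and restoring every other variable. The sketch below mimics that filtering on a plain list of hypothetical variable names; the real code delegates the matching to `contrib_framework.get_variables_to_restore`, which treats the entries as scope patterns, so the prefix matching here is only an approximation.

```python
# Hypothetical variable names, for illustration only.
all_vars = ['global_step',
            'xception_65/entry_flow/conv1_1/weights',
            'logits/semantic/weights',
            'logits/semantic/biases']
last_layers = ['logits']
initialize_last_layer = False

exclude_list = ['global_step']
if not initialize_last_layer:
    exclude_list.extend(last_layers)

# Approximate the scope-based exclusion with prefix matching.
variables_to_restore = [
    v for v in all_vars
    if not any(v.startswith(prefix) for prefix in exclude_list)]
print(variables_to_restore)   # only the backbone weights remain
```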
+
+def get_model_gradient_multipliers(last_layers, last_layer_gradient_multiplier):
+ """Gets the gradient multipliers.
+
+ The gradient multipliers will adjust the learning rates for model
+ variables. For the task of semantic segmentation, the models are
+ usually fine-tuned from the models trained on the task of image
+ classification. To fine-tune the models, we usually set larger (e.g.,
+ 10 times larger) learning rate for the parameters of last layer.
+
+ Args:
+ last_layers: Scopes of last layers.
+ last_layer_gradient_multiplier: The gradient multiplier for last layers.
+
+ Returns:
+ The gradient multiplier map with variables as key, and multipliers as value.
+ """
+ gradient_multipliers = {}
+
+ for var in tf.model_variables():
+ # Double the learning rate for biases.
+ if 'biases' in var.op.name:
+ gradient_multipliers[var.op.name] = 2.
+
+ # Use larger learning rate for last layer variables.
+ for layer in last_layers:
+ if layer in var.op.name and 'biases' in var.op.name:
+ gradient_multipliers[var.op.name] = 2 * last_layer_gradient_multiplier
+ break
+ elif layer in var.op.name:
+ gradient_multipliers[var.op.name] = last_layer_gradient_multiplier
+ break
+
+ return gradient_multipliers
+
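To make the multiplier rules in `get_model_gradient_multipliers` concrete, here is the same logic run over a plain list of hypothetical variable names: biases get 2x, last-layer weights get the last-layer multiplier, and last-layer biases get twice that.

```python
def gradient_multipliers(var_names, last_layers, last_layer_gradient_multiplier):
    """Mirrors get_model_gradient_multipliers on plain variable-name strings."""
    multipliers = {}
    for name in var_names:
        if 'biases' in name:
            multipliers[name] = 2.
        for layer in last_layers:
            if layer in name and 'biases' in name:
                multipliers[name] = 2 * last_layer_gradient_multiplier
                break
            elif layer in name:
                multipliers[name] = last_layer_gradient_multiplier
                break
    return multipliers

# Hypothetical variable names for illustration only.
names = ['xception_65/conv1/weights',
         'logits/semantic/weights',
         'logits/semantic/biases']
print(gradient_multipliers(names, last_layers=['logits'],
                           last_layer_gradient_multiplier=10))
# {'logits/semantic/weights': 10, 'logits/semantic/biases': 20}
```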
+
+def get_model_learning_rate(learning_policy,
+ base_learning_rate,
+ learning_rate_decay_step,
+ learning_rate_decay_factor,
+ training_number_of_steps,
+ learning_power,
+ slow_start_step,
+ slow_start_learning_rate,
+ slow_start_burnin_type='none',
+ decay_steps=0.0,
+ end_learning_rate=0.0,
+ boundaries=None,
+ boundary_learning_rates=None):
+ """Gets model's learning rate.
+
+ Computes the model's learning rate for different learning policies.
+ Supported policies are "step", "poly", "cosine", and "multi_steps".
+ (1) The learning policy for "step" is computed as follows:
+ current_learning_rate = base_learning_rate *
+ learning_rate_decay_factor ^ (global_step / learning_rate_decay_step)
+ See tf.train.exponential_decay for details.
+ (2) The learning policy for "poly" is computed as follows:
+ current_learning_rate = base_learning_rate *
+ (1 - global_step / training_number_of_steps) ^ learning_power
+
+ Args:
+ learning_policy: Learning rate policy for training.
+ base_learning_rate: The base learning rate for model training.
+ learning_rate_decay_step: Decay the base learning rate at a fixed step.
+ learning_rate_decay_factor: The rate to decay the base learning rate.
+ training_number_of_steps: Number of steps for training.
+ learning_power: Power used for 'poly' learning policy.
+ slow_start_step: Number of initial training steps during which the model
+ uses a small learning rate.
+ slow_start_learning_rate: The learning rate employed during slow start.
+ slow_start_burnin_type: The burnin type for the slow start stage. Can be
+ `none` which means no burnin or `linear` which means the learning rate
+ increases linearly from slow_start_learning_rate and reaches
+ base_learning_rate after slow_start_step steps.
+ decay_steps: Float, `decay_steps` for polynomial learning rate.
+ end_learning_rate: Float, `end_learning_rate` for polynomial learning rate.
+ boundaries: A list of `Tensor`s or `int`s or `float`s with strictly
+ increasing entries.
+ boundary_learning_rates: A list of `Tensor`s or `float`s or `int`s that
+ specifies the values for the intervals defined by `boundaries`. It should
+ have one more element than `boundaries`, and all elements should have the
+ same type.
+
+ Returns:
+ Learning rate for the specified learning policy.
+
+ Raises:
+ ValueError: If learning policy or slow start burnin type is not recognized.
+ ValueError: If `boundaries` and `boundary_learning_rates` are not set for
+ multi_steps learning rate decay.
+ """
+ global_step = tf.train.get_or_create_global_step()
+ adjusted_global_step = tf.maximum(global_step - slow_start_step, 0)
+ if decay_steps == 0.0:
+ tf.logging.info('Setting decay_steps to total training steps.')
+ decay_steps = training_number_of_steps - slow_start_step
+ if learning_policy == 'step':
+ learning_rate = tf.train.exponential_decay(
+ base_learning_rate,
+ adjusted_global_step,
+ learning_rate_decay_step,
+ learning_rate_decay_factor,
+ staircase=True)
+ elif learning_policy == 'poly':
+ learning_rate = tf.train.polynomial_decay(
+ base_learning_rate,
+ adjusted_global_step,
+ decay_steps=decay_steps,
+ end_learning_rate=end_learning_rate,
+ power=learning_power)
+ elif learning_policy == 'cosine':
+ learning_rate = tf.train.cosine_decay(
+ base_learning_rate,
+ adjusted_global_step,
+ training_number_of_steps - slow_start_step)
+ elif learning_policy == 'multi_steps':
+ if boundaries is None or boundary_learning_rates is None:
+ raise ValueError('Must set `boundaries` and `boundary_learning_rates` '
+ 'for multi_steps learning rate decay.')
+ learning_rate = tf.train.piecewise_constant_decay(
+ adjusted_global_step,
+ boundaries,
+ boundary_learning_rates)
+ else:
+ raise ValueError('Unknown learning policy.')
+
+ adjusted_slow_start_learning_rate = slow_start_learning_rate
+ if slow_start_burnin_type == 'linear':
+ # Do linear burnin. Increase linearly from slow_start_learning_rate and
+ # reach base_learning_rate once global_step >= slow_start_step.
+ adjusted_slow_start_learning_rate = (
+ slow_start_learning_rate +
+ (base_learning_rate - slow_start_learning_rate) *
+ tf.to_float(global_step) / slow_start_step)
+ elif slow_start_burnin_type != 'none':
+ raise ValueError('Unknown burnin type.')
+
+ # Employ small learning rate at the first few steps for warm start.
+ return tf.where(global_step < slow_start_step,
+ adjusted_slow_start_learning_rate, learning_rate)
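The "step" and "poly" formulas described in the docstring of `get_model_learning_rate` are easy to check numerically. The sketch below assumes `slow_start_step = 0` so the warm-start branch is inactive; the helper names are ours.

```python
def poly_learning_rate(base_lr, step, decay_steps, power=0.9, end_lr=0.0):
    """'poly' policy, matching tf.train.polynomial_decay with these arguments."""
    step = min(step, decay_steps)
    return (base_lr - end_lr) * (1 - float(step) / decay_steps) ** power + end_lr

def step_learning_rate(base_lr, step, decay_step, decay_factor):
    """'step' policy: staircased exponential decay."""
    return base_lr * decay_factor ** (step // decay_step)

# Example: base_learning_rate=0.007 with the poly policy over 30k steps.
for s in (0, 15000, 30000):
    print(s, round(poly_learning_rate(0.007, s, 30000), 6))
# 0 -> 0.007, 15000 -> 0.003751, 30000 -> 0.0

print(step_learning_rate(0.007, 4000, decay_step=2000, decay_factor=0.1))
# 0.007 * 0.1 ** 2, i.e. about 7e-05
```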
diff --git a/deeplab/models/research/deeplab/vis.py b/deeplab/models/research/deeplab/vis.py
new file mode 100644
index 0000000..20808d3
--- /dev/null
+++ b/deeplab/models/research/deeplab/vis.py
@@ -0,0 +1,327 @@
+# Lint as: python2, python3
+# Copyright 2018 The TensorFlow Authors All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+"""Segmentation results visualization on a given set of images.
+
+See model.py for more details and usage.
+"""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import os.path
+import time
+import numpy as np
+from six.moves import range
+import tensorflow as tf
+from tensorflow.contrib import quantize as contrib_quantize
+from tensorflow.contrib import training as contrib_training
+from deeplab import common
+from deeplab import model
+from deeplab.datasets import data_generator
+from deeplab.utils import save_annotation
+
+flags = tf.app.flags
+
+FLAGS = flags.FLAGS
+
+flags.DEFINE_string('master', '', 'BNS name of the tensorflow server')
+
+# Settings for log directories.
+
+flags.DEFINE_string('vis_logdir', None, 'Where to write the event logs.')
+
+flags.DEFINE_string('checkpoint_dir', None, 'Directory of model checkpoints.')
+
+# Settings for visualizing the model.
+
+flags.DEFINE_integer('vis_batch_size', 1,
+ 'The number of images in each batch during visualization.')
+
+flags.DEFINE_list('vis_crop_size', '513,513',
+ 'Crop size [height, width] for visualization.')
+
+flags.DEFINE_integer('eval_interval_secs', 60 * 5,
+ 'How often (in seconds) to run evaluation.')
+
+# For `xception_65`, use atrous_rates = [12, 24, 36] if output_stride = 8, or
+# rates = [6, 12, 18] if output_stride = 16. For `mobilenet_v2`, use None. Note
+# one could use different atrous_rates/output_stride during training/evaluation.
+flags.DEFINE_multi_integer('atrous_rates', None,
+ 'Atrous rates for atrous spatial pyramid pooling.')
+
+flags.DEFINE_integer('output_stride', 16,
+ 'The ratio of input to output spatial resolution.')
+
+# Change to [0.5, 0.75, 1.0, 1.25, 1.5, 1.75] for multi-scale test.
+flags.DEFINE_multi_float('eval_scales', [1.0],
+ 'The scales to resize images for evaluation.')
+
+# Change to True for adding flipped images during test.
+flags.DEFINE_bool('add_flipped_images', False,
+ 'Add flipped images for evaluation or not.')
+
+flags.DEFINE_integer(
+ 'quantize_delay_step', -1,
+ 'Steps to start quantized training. If < 0, will not quantize model.')
+
+# Dataset settings.
+
+flags.DEFINE_string('dataset', 'pascal_voc_seg',
+ 'Name of the segmentation dataset.')
+
+flags.DEFINE_string('vis_split', 'val',
+ 'Which split of the dataset is used for visualizing results.')
+
+flags.DEFINE_string('dataset_dir', None, 'Where the dataset resides.')
+
+flags.DEFINE_enum('colormap_type', 'pascal', ['pascal', 'cityscapes', 'ade20k'],
+ 'Visualization colormap type.')
+
+flags.DEFINE_boolean('also_save_raw_predictions', False,
+ 'Also save raw predictions.')
+
+flags.DEFINE_integer('max_number_of_iterations', 0,
+ 'Maximum number of visualization iterations. Will loop '
+ 'indefinitely upon nonpositive values.')
+
+# The folder where semantic segmentation predictions are saved.
+_SEMANTIC_PREDICTION_SAVE_FOLDER = 'segmentation_results'
+
+# The folder where raw semantic segmentation predictions are saved.
+_RAW_SEMANTIC_PREDICTION_SAVE_FOLDER = 'raw_segmentation_results'
+
+# The format to save image.
+_IMAGE_FORMAT = '%06d_image'
+
+# The format to save prediction
+_PREDICTION_FORMAT = '%06d_prediction'
+
+# To evaluate Cityscapes results on the evaluation server, the labels used
+# during training should be mapped to the labels for evaluation.
+_CITYSCAPES_TRAIN_ID_TO_EVAL_ID = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22,
+ 23, 24, 25, 26, 27, 28, 31, 32, 33]
+
+
+def _convert_train_id_to_eval_id(prediction, train_id_to_eval_id):
+ """Converts the predicted label for evaluation.
+
+ There are cases where the training labels are not equal to the evaluation
+ labels. This function performs the conversion so that we can
+ evaluate the results on the evaluation server.
+
+ Args:
+ prediction: Semantic segmentation prediction.
+ train_id_to_eval_id: A list mapping from train id to evaluation id.
+
+ Returns:
+ Semantic segmentation prediction whose labels have been changed.
+ """
+ converted_prediction = prediction.copy()
+ for train_id, eval_id in enumerate(train_id_to_eval_id):
+ converted_prediction[prediction == train_id] = eval_id
+
+ return converted_prediction
+
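The Cityscapes mapping is just an index lookup from train id to the id expected by the evaluation server. A quick NumPy check using the `_CITYSCAPES_TRAIN_ID_TO_EVAL_ID` table defined above:

```python
import numpy as np

train_id_to_eval_id = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22,
                       23, 24, 25, 26, 27, 28, 31, 32, 33]

prediction = np.array([[0, 1], [13, 18]])          # train ids
converted = prediction.copy()
for train_id, eval_id in enumerate(train_id_to_eval_id):
    converted[prediction == train_id] = eval_id
print(converted)
# [[ 7  8]
#  [26 33]]

# Equivalent vectorized form: np.asarray(train_id_to_eval_id)[prediction]
```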
+
+def _process_batch(sess, original_images, semantic_predictions, image_names,
+ image_heights, image_widths, image_id_offset, save_dir,
+ raw_save_dir, train_id_to_eval_id=None):
+ """Evaluates one single batch qualitatively.
+
+ Args:
+ sess: TensorFlow session.
+ original_images: One batch of original images.
+ semantic_predictions: One batch of semantic segmentation predictions.
+ image_names: Image names.
+ image_heights: Image heights.
+ image_widths: Image widths.
+ image_id_offset: Image id offset for indexing images.
+ save_dir: The directory where the predictions will be saved.
+ raw_save_dir: The directory where the raw predictions will be saved.
+ train_id_to_eval_id: A list mapping from train id to eval id.
+ """
+ (original_images,
+ semantic_predictions,
+ image_names,
+ image_heights,
+ image_widths) = sess.run([original_images, semantic_predictions,
+ image_names, image_heights, image_widths])
+
+ num_image = semantic_predictions.shape[0]
+ for i in range(num_image):
+ image_height = np.squeeze(image_heights[i])
+ image_width = np.squeeze(image_widths[i])
+ original_image = np.squeeze(original_images[i])
+ semantic_prediction = np.squeeze(semantic_predictions[i])
+ crop_semantic_prediction = semantic_prediction[:image_height, :image_width]
+
+ # Save image.
+ save_annotation.save_annotation(
+ original_image, save_dir, _IMAGE_FORMAT % (image_id_offset + i),
+ add_colormap=False)
+
+ # Save prediction.
+ save_annotation.save_annotation(
+ crop_semantic_prediction, save_dir,
+ _PREDICTION_FORMAT % (image_id_offset + i), add_colormap=True,
+ colormap_type=FLAGS.colormap_type)
+
+ if FLAGS.also_save_raw_predictions:
+ image_filename = os.path.basename(image_names[i])
+
+ if train_id_to_eval_id is not None:
+ crop_semantic_prediction = _convert_train_id_to_eval_id(
+ crop_semantic_prediction,
+ train_id_to_eval_id)
+ save_annotation.save_annotation(
+ crop_semantic_prediction, raw_save_dir, image_filename,
+ add_colormap=False)
+
+
+def main(unused_argv):
+ tf.logging.set_verbosity(tf.logging.INFO)
+
+ # Get dataset-dependent information.
+ dataset = data_generator.Dataset(
+ dataset_name=FLAGS.dataset,
+ split_name=FLAGS.vis_split,
+ dataset_dir=FLAGS.dataset_dir,
+ batch_size=FLAGS.vis_batch_size,
+ crop_size=[int(sz) for sz in FLAGS.vis_crop_size],
+ min_resize_value=FLAGS.min_resize_value,
+ max_resize_value=FLAGS.max_resize_value,
+ resize_factor=FLAGS.resize_factor,
+ model_variant=FLAGS.model_variant,
+ is_training=False,
+ should_shuffle=False,
+ should_repeat=False)
+
+ train_id_to_eval_id = None
+ if dataset.dataset_name == data_generator.get_cityscapes_dataset_name():
+ tf.logging.info('Cityscapes requires converting train_id to eval_id.')
+ train_id_to_eval_id = _CITYSCAPES_TRAIN_ID_TO_EVAL_ID
+
+ # Prepare for visualization.
+ tf.gfile.MakeDirs(FLAGS.vis_logdir)
+ save_dir = os.path.join(FLAGS.vis_logdir, _SEMANTIC_PREDICTION_SAVE_FOLDER)
+ tf.gfile.MakeDirs(save_dir)
+ raw_save_dir = os.path.join(
+ FLAGS.vis_logdir, _RAW_SEMANTIC_PREDICTION_SAVE_FOLDER)
+ tf.gfile.MakeDirs(raw_save_dir)
+
+ tf.logging.info('Visualizing on %s set', FLAGS.vis_split)
+
+ with tf.Graph().as_default():
+ samples = dataset.get_one_shot_iterator().get_next()
+
+ model_options = common.ModelOptions(
+ outputs_to_num_classes={common.OUTPUT_TYPE: dataset.num_of_classes},
+ crop_size=[int(sz) for sz in FLAGS.vis_crop_size],
+ atrous_rates=FLAGS.atrous_rates,
+ output_stride=FLAGS.output_stride)
+
+ if tuple(FLAGS.eval_scales) == (1.0,):
+ tf.logging.info('Performing single-scale test.')
+ predictions = model.predict_labels(
+ samples[common.IMAGE],
+ model_options=model_options,
+ image_pyramid=FLAGS.image_pyramid)
+ else:
+ tf.logging.info('Performing multi-scale test.')
+ if FLAGS.quantize_delay_step >= 0:
+ raise ValueError(
+ 'Quantize mode is not supported with multi-scale test.')
+ predictions = model.predict_labels_multi_scale(
+ samples[common.IMAGE],
+ model_options=model_options,
+ eval_scales=FLAGS.eval_scales,
+ add_flipped_images=FLAGS.add_flipped_images)
+ predictions = predictions[common.OUTPUT_TYPE]
+
+ if FLAGS.min_resize_value and FLAGS.max_resize_value:
+ # Only support batch_size = 1, since we assume the dimensions of original
+ # image after tf.squeeze is [height, width, 3].
+ assert FLAGS.vis_batch_size == 1
+
+ # Reverse the resizing and padding operations performed in preprocessing.
+ # First, we slice the valid regions (i.e., remove padded region) and then
+ # we resize the predictions back.
+ original_image = tf.squeeze(samples[common.ORIGINAL_IMAGE])
+ original_image_shape = tf.shape(original_image)
+ predictions = tf.slice(
+ predictions,
+ [0, 0, 0],
+ [1, original_image_shape[0], original_image_shape[1]])
+ resized_shape = tf.to_int32([tf.squeeze(samples[common.HEIGHT]),
+ tf.squeeze(samples[common.WIDTH])])
+ predictions = tf.squeeze(
+ tf.image.resize_images(tf.expand_dims(predictions, 3),
+ resized_shape,
+ method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
+ align_corners=True), 3)
+
+ tf.train.get_or_create_global_step()
+ if FLAGS.quantize_delay_step >= 0:
+ contrib_quantize.create_eval_graph()
+
+ num_iteration = 0
+ max_num_iteration = FLAGS.max_number_of_iterations
+
+ checkpoints_iterator = contrib_training.checkpoints_iterator(
+ FLAGS.checkpoint_dir, min_interval_secs=FLAGS.eval_interval_secs)
+ for checkpoint_path in checkpoints_iterator:
+ num_iteration += 1
+ tf.logging.info(
+ 'Starting visualization at ' + time.strftime('%Y-%m-%d-%H:%M:%S',
+ time.gmtime()))
+ tf.logging.info('Visualizing with model %s', checkpoint_path)
+
+ scaffold = tf.train.Scaffold(init_op=tf.global_variables_initializer())
+ session_creator = tf.train.ChiefSessionCreator(
+ scaffold=scaffold,
+ master=FLAGS.master,
+ checkpoint_filename_with_path=checkpoint_path)
+ with tf.train.MonitoredSession(
+ session_creator=session_creator, hooks=None) as sess:
+ batch = 0
+ image_id_offset = 0
+
+ while not sess.should_stop():
+ tf.logging.info('Visualizing batch %d', batch + 1)
+ _process_batch(sess=sess,
+ original_images=samples[common.ORIGINAL_IMAGE],
+ semantic_predictions=predictions,
+ image_names=samples[common.IMAGE_NAME],
+ image_heights=samples[common.HEIGHT],
+ image_widths=samples[common.WIDTH],
+ image_id_offset=image_id_offset,
+ save_dir=save_dir,
+ raw_save_dir=raw_save_dir,
+ train_id_to_eval_id=train_id_to_eval_id)
+ image_id_offset += FLAGS.vis_batch_size
+ batch += 1
+
+ tf.logging.info(
+ 'Finished visualization at ' + time.strftime('%Y-%m-%d-%H:%M:%S',
+ time.gmtime()))
+ if max_num_iteration > 0 and num_iteration >= max_num_iteration:
+ break
+
+if __name__ == '__main__':
+ flags.mark_flag_as_required('checkpoint_dir')
+ flags.mark_flag_as_required('vis_logdir')
+ flags.mark_flag_as_required('dataset_dir')
+ tf.app.run()
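The reverse-preprocessing step in `main` (slice away the padded border, then resize the prediction back to the original resolution with nearest-neighbor interpolation) has a simple NumPy/PIL analogue. All sizes below are invented for illustration; the real code does this with `tf.slice` and `tf.image.resize_images(..., align_corners=True)`.

```python
import numpy as np
from PIL import Image

# Suppose the network saw a 513x513 padded crop, the valid (unpadded) region is
# 400x300, and the original image was 800x600 (height x width).
padded_prediction = np.random.randint(0, 19, size=(513, 513), dtype=np.uint8)

valid = padded_prediction[:400, :300]        # drop the padded border
restored = np.array(
    Image.fromarray(valid).resize((600, 800), resample=Image.NEAREST))
print(restored.shape)                        # (800, 600)
```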
diff --git a/shapes/Circle.mtl b/shapes/Circle.mtl
new file mode 100644
index 0000000..3b4b004
--- /dev/null
+++ b/shapes/Circle.mtl
@@ -0,0 +1,12 @@
+# Blender MTL File: 'None'
+# Material Count: 1
+
+newmtl SVGMat.007
+Ns 323.999994
+Ka 1.000000 1.000000 1.000000
+Kd 1.000000 1.000000 1.000000
+Ks 0.500000 0.500000 0.500000
+Ke 0.000000 0.000000 0.000000
+Ni 1.000000
+d 1.000000
+illum 2
diff --git a/shapes/Circle.obj b/shapes/Circle.obj
new file mode 100644
index 0000000..47d0969
--- /dev/null
+++ b/shapes/Circle.obj
@@ -0,0 +1,125 @@
+# Blender v2.81 (sub 16) OBJ File: ''
+# www.blender.org
+mtllib Circle.mtl
+o Circle
+v -0.001828 0.000000 -0.013312
+v 0.000883 0.000000 -0.013405
+v -0.000004 0.000000 -0.013434
+v 0.001762 0.000000 -0.013318
+v 0.002630 0.000000 -0.013174
+v -0.003577 0.000000 -0.012954
+v 0.003484 0.000000 -0.012974
+v 0.004321 0.000000 -0.012719
+v -0.005236 0.000000 -0.012378
+v 0.005139 0.000000 -0.012411
+v 0.005936 0.000000 -0.012051
+v -0.006788 0.000000 -0.011600
+v 0.006708 0.000000 -0.011639
+v 0.007453 0.000000 -0.011177
+v -0.008218 0.000000 -0.010634
+v 0.008168 0.000000 -0.010665
+v 0.008852 0.000000 -0.010105
+v -0.009509 0.000000 -0.009498
+v 0.009500 0.000000 -0.009498
+v 0.010107 0.000000 -0.008850
+v -0.010645 0.000000 -0.008208
+v 0.010667 0.000000 -0.008167
+v -0.011610 0.000000 -0.006779
+v 0.011179 0.000000 -0.007452
+v 0.011641 0.000000 -0.006707
+v -0.012389 0.000000 -0.005227
+v 0.012053 0.000000 -0.005935
+v 0.012414 0.000000 -0.005139
+v -0.012965 0.000000 -0.003568
+v 0.012722 0.000000 -0.004321
+v 0.012976 0.000000 -0.003484
+v -0.013323 0.000000 -0.001820
+v 0.013176 0.000000 -0.002630
+v 0.013320 0.000000 -0.001762
+v -0.013445 -0.000000 0.000004
+v 0.013408 0.000000 -0.000884
+v 0.013437 -0.000000 0.000004
+v -0.013323 -0.000000 0.001827
+v 0.013314 -0.000000 0.001827
+v -0.012965 -0.000000 0.003576
+v 0.012957 -0.000000 0.003576
+v -0.012389 -0.000000 0.005235
+v 0.012381 -0.000000 0.005235
+v -0.011610 -0.000000 0.006786
+v 0.011602 -0.000000 0.006786
+v -0.010645 -0.000000 0.008216
+v 0.010636 -0.000000 0.008216
+v -0.009509 -0.000000 0.009506
+v 0.009500 -0.000000 0.009506
+v -0.008218 -0.000000 0.010642
+v 0.008209 -0.000000 0.010642
+v -0.006788 -0.000000 0.011607
+v 0.006780 -0.000000 0.011607
+v -0.005236 -0.000000 0.012386
+v 0.005228 -0.000000 0.012386
+v -0.003577 -0.000000 0.012962
+v 0.003569 -0.000000 0.012962
+v -0.001828 -0.000000 0.013320
+v 0.001820 -0.000000 0.013320
+v -0.000004 -0.000000 0.013442
+vn -0.0000 1.0000 0.0000
+usemtl SVGMat.007
+s 1
+f 1//1 2//1 3//1
+f 1//1 4//1 2//1
+f 1//1 5//1 4//1
+f 6//1 5//1 1//1
+f 6//1 7//1 5//1
+f 6//1 8//1 7//1
+f 9//1 8//1 6//1
+f 9//1 10//1 8//1
+f 9//1 11//1 10//1
+f 12//1 11//1 9//1
+f 12//1 13//1 11//1
+f 12//1 14//1 13//1
+f 15//1 14//1 12//1
+f 15//1 16//1 14//1
+f 15//1 17//1 16//1
+f 18//1 17//1 15//1
+f 18//1 19//1 17//1
+f 18//1 20//1 19//1
+f 21//1 20//1 18//1
+f 21//1 22//1 20//1
+f 23//1 22//1 21//1
+f 23//1 24//1 22//1
+f 23//1 25//1 24//1
+f 26//1 25//1 23//1
+f 26//1 27//1 25//1
+f 26//1 28//1 27//1
+f 29//1 28//1 26//1
+f 29//1 30//1 28//1
+f 29//1 31//1 30//1
+f 32//1 31//1 29//1
+f 32//1 33//1 31//1
+f 32//1 34//1 33//1
+f 35//1 34//1 32//1
+f 35//1 36//1 34//1
+f 35//1 37//1 36//1
+f 38//1 37//1 35//1
+f 38//1 39//1 37//1
+f 40//1 39//1 38//1
+f 40//1 41//1 39//1
+f 42//1 41//1 40//1
+f 42//1 43//1 41//1
+f 44//1 43//1 42//1
+f 44//1 45//1 43//1
+f 46//1 45//1 44//1
+f 46//1 47//1 45//1
+f 48//1 47//1 46//1
+f 48//1 49//1 47//1
+f 50//1 49//1 48//1
+f 50//1 51//1 49//1
+f 52//1 51//1 50//1
+f 52//1 53//1 51//1
+f 54//1 53//1 52//1
+f 54//1 55//1 53//1
+f 56//1 55//1 54//1
+f 56//1 57//1 55//1
+f 58//1 57//1 56//1
+f 58//1 59//1 57//1
+f 60//1 59//1 58//1
diff --git a/shapes/Half_Circle.mtl b/shapes/Half_Circle.mtl
new file mode 100644
index 0000000..ca15ca0
--- /dev/null
+++ b/shapes/Half_Circle.mtl
@@ -0,0 +1,12 @@
+# Blender MTL File: 'None'
+# Material Count: 1
+
+newmtl SVGMat.001
+Ns 323.999994
+Ka 1.000000 1.000000 1.000000
+Kd 1.000000 1.000000 1.000000
+Ks 0.500000 0.500000 0.500000
+Ke 0.000000 0.000000 0.000000
+Ni 1.000000
+d 1.000000
+illum 2
diff --git a/shapes/Half_Circle.obj b/shapes/Half_Circle.obj
new file mode 100644
index 0000000..3234aca
--- /dev/null
+++ b/shapes/Half_Circle.obj
@@ -0,0 +1,80 @@
+# Blender v2.81 (sub 16) OBJ File: ''
+# www.blender.org
+mtllib Half_Circle.mtl
+o Half_Circle
+v -0.001822 0.000000 -0.006587
+v 0.001822 0.000000 -0.006587
+v 0.000000 0.000000 -0.006710
+v -0.003570 0.000000 -0.006231
+v 0.003570 0.000000 -0.006231
+v -0.005227 0.000000 -0.005656
+v 0.005227 0.000000 -0.005656
+v -0.006778 0.000000 -0.004879
+v 0.006778 0.000000 -0.004879
+v -0.008207 0.000000 -0.003914
+v 0.008207 0.000000 -0.003914
+v -0.009497 0.000000 -0.002781
+v 0.009498 0.000000 -0.002781
+v -0.010634 0.000000 -0.001492
+v 0.010634 0.000000 -0.001492
+v -0.011600 0.000000 -0.000065
+v 0.011600 0.000000 -0.000065
+v -0.012379 -0.000000 0.001484
+v 0.012379 -0.000000 0.001485
+v -0.012957 -0.000000 0.003141
+v 0.012957 -0.000000 0.003141
+v -0.013316 -0.000000 0.004889
+v 0.013316 -0.000000 0.004889
+v -0.013441 -0.000000 0.006710
+v 0.013441 -0.000000 0.006710
+v -0.012912 -0.000000 0.006710
+v -0.011450 -0.000000 0.006710
+v -0.009241 -0.000000 0.006710
+v -0.006472 -0.000000 0.006710
+v -0.003329 -0.000000 0.006710
+v 0.000000 -0.000000 0.006710
+v 0.003329 -0.000000 0.006710
+v 0.006472 -0.000000 0.006710
+v 0.009241 -0.000000 0.006710
+v 0.011450 -0.000000 0.006710
+v 0.012912 -0.000000 0.006710
+vn 0.0000 1.0000 0.0000
+vn 0.0000 1.0000 -0.0001
+vn 0.0000 0.0000 -1.0000
+vn 0.0000 0.0000 1.0000
+usemtl SVGMat.001
+s 1
+f 1//1 2//1 3//1
+f 4//1 2//1 1//1
+f 4//1 5//1 2//1
+f 6//1 5//1 4//1
+f 6//1 7//1 5//1
+f 8//1 7//1 6//1
+f 8//1 9//1 7//1
+f 10//1 9//1 8//1
+f 10//1 11//1 9//1
+f 12//1 11//1 10//1
+f 12//1 13//1 11//1
+f 14//1 13//1 12//1
+f 14//1 15//1 13//1
+f 16//1 15//1 14//1
+f 16//1 17//1 15//1
+f 18//1 17//1 16//1
+f 18//1 19//1 17//1
+f 20//1 19//1 18//1
+f 20//1 21//1 19//1
+f 22//1 21//1 20//1
+f 22//1 23//1 21//1
+f 24//1 23//1 22//1
+f 24//1 25//2 23//1
+f 26//1 25//2 24//1
+f 27//3 25//4 26//3
+f 28//4 25//4 27//3
+f 29//3 25//4 28//4
+f 30//4 25//4 29//3
+f 31//4 25//4 30//4
+f 32//4 25//4 31//4
+f 33//4 25//4 32//4
+f 34//4 25//4 33//4
+f 35//4 25//4 34//4
+f 36//4 25//4 35//4
diff --git a/shapes/Heart.mtl b/shapes/Heart.mtl
new file mode 100644
index 0000000..17a6972
--- /dev/null
+++ b/shapes/Heart.mtl
@@ -0,0 +1,12 @@
+# Blender MTL File: 'None'
+# Material Count: 1
+
+newmtl SVGMat.012
+Ns 323.999994
+Ka 1.000000 1.000000 1.000000
+Kd 1.000000 1.000000 1.000000
+Ks 0.500000 0.500000 0.500000
+Ke 0.000000 0.000000 0.000000
+Ni 1.000000
+d 1.000000
+illum 2
diff --git a/shapes/Heart.obj b/shapes/Heart.obj
new file mode 100644
index 0000000..853dce4
--- /dev/null
+++ b/shapes/Heart.obj
@@ -0,0 +1,53 @@
+# Blender v2.81 (sub 16) OBJ File: ''
+# www.blender.org
+mtllib Heart.mtl
+o Heart
+v 0.003850 0.000000 -0.012279
+v 0.008587 0.000000 -0.013150
+v 0.006222 0.000000 -0.013228
+v -0.008587 0.000000 -0.013150
+v -0.003850 0.000000 -0.012279
+v -0.006222 0.000000 -0.013228
+v 0.010720 0.000000 -0.012140
+v -0.010720 0.000000 -0.012140
+v 0.001701 0.000000 -0.010211
+v -0.001701 0.000000 -0.010211
+v 0.012391 0.000000 -0.010289
+v -0.012391 0.000000 -0.010289
+v 0.013374 0.000000 -0.007691
+v -0.013374 0.000000 -0.007691
+v 0.000000 0.000000 -0.006929
+v 0.013441 0.000000 -0.004440
+v -0.013441 0.000000 -0.004440
+v 0.012365 0.000000 -0.000630
+v -0.012365 0.000000 -0.000630
+v -0.009917 -0.000000 0.003647
+v 0.009917 -0.000000 0.003647
+v -0.005872 -0.000000 0.008298
+v 0.005872 -0.000000 0.008298
+v 0.000000 -0.000000 0.013228
+vn 0.0000 1.0000 0.0000
+usemtl SVGMat.012
+s 1
+f 1//1 2//1 3//1
+f 4//1 5//1 6//1
+f 1//1 7//1 2//1
+f 8//1 5//1 4//1
+f 9//1 7//1 1//1
+f 8//1 10//1 5//1
+f 9//1 11//1 7//1
+f 12//1 10//1 8//1
+f 9//1 13//1 11//1
+f 14//1 10//1 12//1
+f 15//1 13//1 9//1
+f 14//1 15//1 10//1
+f 15//1 16//1 13//1
+f 17//1 15//1 14//1
+f 17//1 16//1 15//1
+f 17//1 18//1 16//1
+f 19//1 18//1 17//1
+f 20//1 18//1 19//1
+f 20//1 21//1 18//1
+f 22//1 21//1 20//1
+f 22//1 23//1 21//1
+f 24//1 23//1 22//1
diff --git a/shapes/Plus.mtl b/shapes/Plus.mtl
new file mode 100644
index 0000000..de372bb
--- /dev/null
+++ b/shapes/Plus.mtl
@@ -0,0 +1,12 @@
+# Blender MTL File: 'None'
+# Material Count: 1
+
+newmtl SVGMat.016
+Ns 323.999994
+Ka 1.000000 1.000000 1.000000
+Kd 1.000000 1.000000 1.000000
+Ks 0.500000 0.500000 0.500000
+Ke 0.000000 0.000000 0.000000
+Ni 1.000000
+d 1.000000
+illum 2
diff --git a/shapes/Plus.obj b/shapes/Plus.obj
new file mode 100644
index 0000000..cd74ed2
--- /dev/null
+++ b/shapes/Plus.obj
@@ -0,0 +1,31 @@
+# Blender v2.81 (sub 16) OBJ File: ''
+# www.blender.org
+mtllib Plus.mtl
+o Plus
+v -0.006324 0.000000 -0.006322
+v 0.006324 0.000000 -0.013438
+v -0.006324 0.000000 -0.013438
+v 0.006324 0.000000 -0.006322
+v -0.013441 -0.000000 0.006322
+v -0.013441 0.000000 -0.006322
+v 0.013441 0.000000 -0.006322
+v 0.013441 -0.000000 0.006322
+v -0.006324 -0.000000 0.006322
+v -0.006324 -0.000000 0.013438
+v 0.006324 -0.000000 0.006322
+v 0.006324 -0.000000 0.013438
+vn 0.0000 1.0000 0.0000
+vn 0.0000 0.0000 1.0000
+vn 0.0000 0.0000 -1.0000
+usemtl SVGMat.016
+s 1
+f 1//1 2//1 3//1
+f 1//1 4//1 2//1
+f 5//1 1//1 6//1
+f 5//1 4//1 1//1
+f 5//1 7//1 4//1
+f 5//1 8//1 7//1
+f 9//2 8//2 5//2
+f 10//1 11//1 9//1
+f 11//3 8//3 9//3
+f 10//1 12//1 11//1
diff --git a/shapes/shapes-1.svg b/shapes/shapes-1.svg
new file mode 100644
index 0000000..a71b967
--- /dev/null
+++ b/shapes/shapes-1.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-10.svg b/shapes/shapes-10.svg
new file mode 100644
index 0000000..cbb67ba
--- /dev/null
+++ b/shapes/shapes-10.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-11.svg b/shapes/shapes-11.svg
new file mode 100644
index 0000000..d7d48b8
--- /dev/null
+++ b/shapes/shapes-11.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-12.svg b/shapes/shapes-12.svg
new file mode 100644
index 0000000..5c90a1e
--- /dev/null
+++ b/shapes/shapes-12.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-13.svg b/shapes/shapes-13.svg
new file mode 100644
index 0000000..184c393
--- /dev/null
+++ b/shapes/shapes-13.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-14.svg b/shapes/shapes-14.svg
new file mode 100644
index 0000000..139f84c
--- /dev/null
+++ b/shapes/shapes-14.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-15.svg b/shapes/shapes-15.svg
new file mode 100644
index 0000000..ad7ec40
--- /dev/null
+++ b/shapes/shapes-15.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-2.svg b/shapes/shapes-2.svg
new file mode 100644
index 0000000..13f3383
--- /dev/null
+++ b/shapes/shapes-2.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-3.svg b/shapes/shapes-3.svg
new file mode 100644
index 0000000..8bf9f49
--- /dev/null
+++ b/shapes/shapes-3.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-4.svg b/shapes/shapes-4.svg
new file mode 100644
index 0000000..218208a
--- /dev/null
+++ b/shapes/shapes-4.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-5.svg b/shapes/shapes-5.svg
new file mode 100644
index 0000000..8877ee2
--- /dev/null
+++ b/shapes/shapes-5.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-6.svg b/shapes/shapes-6.svg
new file mode 100644
index 0000000..5400483
--- /dev/null
+++ b/shapes/shapes-6.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-7.svg b/shapes/shapes-7.svg
new file mode 100644
index 0000000..1a0a256
--- /dev/null
+++ b/shapes/shapes-7.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-8.svg b/shapes/shapes-8.svg
new file mode 100644
index 0000000..882b3a5
--- /dev/null
+++ b/shapes/shapes-8.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/shapes-9.svg b/shapes/shapes-9.svg
new file mode 100644
index 0000000..00df825
--- /dev/null
+++ b/shapes/shapes-9.svg
@@ -0,0 +1,32 @@
+
+
+
diff --git a/shapes/square.mtl b/shapes/square.mtl
new file mode 100644
index 0000000..2a9c57b
--- /dev/null
+++ b/shapes/square.mtl
@@ -0,0 +1,12 @@
+# Blender MTL File: 'None'
+# Material Count: 1
+
+newmtl SVGMat.090
+Ns 323.999994
+Ka 1.000000 1.000000 1.000000
+Kd 1.000000 1.000000 1.000000
+Ks 0.500000 0.500000 0.500000
+Ke 0.000000 0.000000 0.000000
+Ni 1.000000
+d 1.000000
+illum 2
diff --git a/shapes/square.obj b/shapes/square.obj
new file mode 100644
index 0000000..9cf0df6
--- /dev/null
+++ b/shapes/square.obj
@@ -0,0 +1,13 @@
+# Blender v2.81 (sub 16) OBJ File: ''
+# www.blender.org
+mtllib square.mtl
+o Square
+v -0.152400 -0.000000 0.152366
+v 0.152400 0.000000 -0.152366
+v -0.152400 0.000000 -0.152366
+v 0.152400 -0.000000 0.152366
+vn 0.0000 1.0000 0.0000
+usemtl SVGMat.090
+s 1
+f 1//1 2//1 3//1
+f 1//1 4//1 2//1
diff --git a/shapes/triangle.mtl b/shapes/triangle.mtl
new file mode 100644
index 0000000..13962f2
--- /dev/null
+++ b/shapes/triangle.mtl
@@ -0,0 +1,12 @@
+# Blender MTL File: 'None'
+# Material Count: 1
+
+newmtl SVGMat.089
+Ns 323.999994
+Ka 1.000000 1.000000 1.000000
+Kd 1.000000 1.000000 1.000000
+Ks 0.500000 0.500000 0.500000
+Ke 0.000000 0.000000 0.000000
+Ni 1.000000
+d 1.000000
+illum 2
diff --git a/shapes/triangle.obj b/shapes/triangle.obj
new file mode 100644
index 0000000..4adfa27
--- /dev/null
+++ b/shapes/triangle.obj
@@ -0,0 +1,11 @@
+# Blender v2.81 (sub 16) OBJ File: ''
+# www.blender.org
+mtllib triangle.mtl
+o Triangle
+v -0.152400 -0.000000 0.101585
+v 0.152400 -0.000000 0.101585
+v 0.000000 0.000000 -0.203170
+vn 0.0000 1.0000 0.0000
+usemtl SVGMat.089
+s 1
+f 1//1 2//1 3//1