
Commit 1ce3d25

committed Aug 30, 2020
add the training script
1 parent 6c43621 commit 1ce3d25

15 files changed (+733 −135 lines)
 

‎InstColorization.ipynb

+87-130
Large diffs are not rendered by default.

‎README.md

+3
@@ -70,6 +70,9 @@ All the colorized results will be saved in the `results` folder.
 
 * Note: all input images are converted to the L channel for colorization in [test_fusion.py's L51](test_fusion.py#L51)
 
+## Training the Model
+Please follow this [tutorial](README_TRAIN.md) to train the colorization model.
+
 ## License
 This work is licensed under the MIT License. See [LICENSE](LICENSE) for details.
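The note above simply means the network only sees the lightness channel at test time; conceptually this is the Lab conversion sketched below. The image path is only an illustrative placeholder, and `test_fusion.py` performs the equivalent step on its own inputs.

```python
import numpy as np
from PIL import Image
from skimage import color

# Illustrative only: take the L (lightness) channel of an RGB image,
# which is what the colorization model receives as input.
img = np.asarray(Image.open('example/your_image.jpg').convert('RGB'))  # hypothetical path
l_channel = color.rgb2lab(img)[:, :, 0]
print(l_channel.shape, l_channel.min(), l_channel.max())  # HxW, values roughly in [0, 100]
```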

‎README_TRAIN.md

+112
@@ -0,0 +1,112 @@
# [CVPR 2020] Instance-aware Image Colorization
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ericsujw/InstColorization/blob/master/InstColorization.ipynb)

### [[Paper](https://arxiv.org/abs/2005.10825)] [[Project Website](https://ericsujw.github.io/InstColorization/)] [[Google Colab](https://colab.research.google.com/github/ericsujw/InstColorization/blob/master/InstColorization.ipynb)]

<p align='center'>
<img src='imgs/teaser.png' width=1000>
</p>

Image colorization is inherently an ill-posed problem with multi-modal uncertainty. Previous methods leverage deep neural networks to map input grayscale images directly to plausible color outputs. Although these learning-based methods have shown impressive performance, they usually fail on input images that contain multiple objects. The leading cause is that existing models perform learning and colorization on the entire image. In the absence of a clear figure-ground separation, these models cannot effectively locate and learn meaningful object-level semantics. In this paper, we propose a method for achieving instance-aware colorization. Our network architecture leverages an off-the-shelf object detector to obtain cropped object images and uses an instance colorization network to extract object-level features. We use a similar network to extract the full-image features and apply a fusion module to fuse object-level and image-level features to predict the final colors. Both colorization networks and fusion modules are learned from a large-scale dataset. Experimental results show that our work outperforms existing methods on different quality metrics and achieves state-of-the-art performance on image colorization.

**Instance-aware Image Colorization**
<br/>
[Jheng-Wei Su](https://github.com/ericsujw),
[Hung-Kuo Chu](https://cgv.cs.nthu.edu.tw/hkchu/), and
[Jia-Bin Huang](https://filebox.ece.vt.edu/~jbhuang/)
<br/>
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

## Prerequisites
* [CUDA 10.1](https://developer.nvidia.com/cuda-10.1-download-archive-update2)
* Python3
* Pytorch >= 1.5
* Detectron2
* OpenCV-Python
* Pillow/scikit-image
* Please refer to [env.yml](env.yml) for the detailed dependencies.

## Getting Started
1. Clone this repo:
```sh
git clone https://github.com/ericsujw/InstColorization
cd InstColorization
```
2. Install [conda](https://www.anaconda.com/).
3. Install all the dependencies:
```sh
conda env create --file env.yml
```
4. Switch to the conda environment:
```sh
conda activate instacolorization
```
5. Install the remaining dependencies:
```sh
sh scripts/install.sh
```

## Dataset Preparation
### COCOStuff
1. Download and unzip the COCOStuff training set:
```sh
sh scripts/prepare_cocostuff.sh
```
2. The COCOStuff training set will now be placed in [train_data](train_data).

### Your Own Dataset
1. If you want to train on your own dataset, change the dataset path in [scripts/prepare_train_box.sh's L1](scripts/prepare_train_box.sh#L1) and in [scripts/train.sh's L1](scripts/train.sh#L1).

## Pretrained Model
1. Download the pretrained model from [Google Drive](https://drive.google.com/open?id=1Xb-DKAA9ibCVLqm8teKd1MWk6imjwTBh):
```sh
sh scripts/download_model.sh
```
2. The pretrained models will now be placed in [checkpoints](checkpoints).

## Instance Prediction
Run the command below to predict the bounding boxes of all the images in the `${DATASET_DIR}` folder.
```sh
sh scripts/prepare_train_box.sh
```
All the prediction results will be saved in the `${DATASET_DIR}_bbox` folder, one `.npz` file of boxes and scores per image.
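As a quick sanity check you can load one of the saved predictions back; the paths below are only illustrative examples of the `<image id>.npz` files that `inference_bbox.py` writes.

```python
import numpy as np
from os.path import join

# Hypothetical file name; inference_bbox.py writes one <image id>.npz per image into ${DATASET_DIR}_bbox.
pred = np.load(join('train_data/train2017_bbox', '000000000009.npz'))
boxes, scores = pred['bbox'], pred['scores']
print(boxes.shape, scores.shape)  # (N, 4) boxes in (x1, y1, x2, y2) order and N detection scores
```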
## Training the Instance-aware Image Colorization Model
Simply run the following command to start the training pipeline:
```sh
sh scripts/train.sh
```
To view training results and loss plots, run `visdom -port 8098` and click the URL http://localhost:8098.

This is a three-stage training process:
1. We first train the full-image colorization branch, starting from the [siggraph_retrained pretrained weights](https://github.com/richzhang/colorization-pytorch).
2. We then use the full-image branch's weights to initialize the instance colorization branch.
3. Finally, we train the fusion module.
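Under the hood, the `--stage` flag set by `scripts/train.sh` selects which dataset class is used for each of these stages; condensed from `train.py` added in this commit:

```python
from options.train_options import TrainOptions
from fusion_dataset import (Training_Full_Dataset,
                            Training_Instance_Dataset,
                            Training_Fusion_Dataset)

opt = TrainOptions().parse()
if opt.stage == 'full':
    dataset = Training_Full_Dataset(opt)        # stage 1: whole-image colorization branch
elif opt.stage == 'instance':
    dataset = Training_Instance_Dataset(opt)    # stage 2: crops taken from the predicted boxes
elif opt.stage == 'fusion':
    dataset = Training_Fusion_Dataset(opt)      # stage 3: full image plus its cropped instances
```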
## Testing the Instance-aware Image Colorization Model
1. The trained model's weights will be placed in [checkpoints/coco_mask](checkpoints/coco_mask).
2. Change the checkpoint path in [test_fusion.py's L38](test_fusion.py#L38) from `coco_finetuned_mask_256_ffs` to `coco_mask`.
3. Run the command below to colorize all the images in the `example` folder based on the weights placed in `coco_mask`.

```
python test_fusion.py --name test_fusion --sample_p 1.0 --model fusion --fineSize 256 --test_img_dir example --results_img_dir results
```
All the colorized results will be saved in the `results` folder.

## License
This work is licensed under the MIT License. See [LICENSE](LICENSE) for details.

## Citation
If you find our code/models useful, please consider citing our paper:
```
@inproceedings{Su-CVPR-2020,
  author = {Su, Jheng-Wei and Chu, Hung-Kuo and Huang, Jia-Bin},
  title = {Instance-aware Image Colorization},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020}
}
```

## Acknowledgments
Our code borrows heavily from the amazing [colorization-pytorch](https://github.com/richzhang/colorization-pytorch) repository.

‎download.py

+24-3
@@ -1,5 +1,8 @@
 #taken from this StackOverflow answer: https://stackoverflow.com/a/39225039
 import requests
+from os.path import join, isdir
+import os
+from argparse import ArgumentParser
 
 def download_file_from_google_drive(id, destination):
     URL = "https://docs.google.com/uc?export=download"
@@ -30,6 +33,24 @@ def save_response_content(response, destination):
             if chunk: # filter out keep-alive new chunks
                 f.write(chunk)
 
-file_id = '1Xb-DKAA9ibCVLqm8teKd1MWk6imjwTBh'
-destination = 'checkpoints.zip'
-download_file_from_google_drive(file_id, destination)
+
+parser = ArgumentParser()
+parser.add_argument("--mode", type=str, default='pretrained-weight', help='pretrained-weight / cocostuff')
+parser.add_argument("--dataset_dir", type=str, default='data', help='training dataset path')
+args = parser.parse_args()
+
+if args.mode == 'pretrained-weight':
+    file_id = '1Xb-DKAA9ibCVLqm8teKd1MWk6imjwTBh'
+    destination = 'checkpoints.zip'
+    download_file_from_google_drive(file_id, destination)
+
+elif args.mode == 'cocostuff':
+    print('download cocostuff training dataset')
+    url = "http://images.cocodataset.org/zips/train2017.zip"
+    response = requests.get(url, stream = True)
+    if isdir(join(args.dataset_dir, "cocostuff")) is False:
+        os.makedirs(join(args.dataset_dir, "cocostuff"))
+    save_response_content(response, join(args.dataset_dir, "cocostuff", "train.zip"))
+else:
+    print('Error Mode!')

‎fusion_dataset.py

+122
@@ -1,5 +1,6 @@
 from os import listdir
 from os.path import isfile, join
+from random import sample
 
 import numpy as np
 import torch
@@ -54,5 +55,126 @@ def __getitem__(self, index):
             output['empty_box'] = True
         return output
 
+    def __len__(self):
+        return len(self.IMAGE_ID_LIST)
+
+
+class Training_Full_Dataset(Data.Dataset):
+    '''
+    Training on COCOStuff dataset. [train2017.zip]
+
+    Download the training set from https://github.com/nightrome/cocostuff
+    '''
+    def __init__(self, opt):
+        self.IMAGE_DIR = opt.train_img_dir
+        self.transforms = transforms.Compose([transforms.Resize((opt.fineSize, opt.fineSize), interpolation=2),
+                                              transforms.ToTensor()])
+        self.IMAGE_ID_LIST = [f for f in listdir(self.IMAGE_DIR) if isfile(join(self.IMAGE_DIR, f))]
+
+    def __getitem__(self, index):
+        output_image_path = join(self.IMAGE_DIR, self.IMAGE_ID_LIST[index])
+        rgb_img, gray_img = gen_gray_color_pil(output_image_path)
+        output = {}
+        output['rgb_img'] = self.transforms(rgb_img)
+        output['gray_img'] = self.transforms(gray_img)
+        return output
+
+    def __len__(self):
+        return len(self.IMAGE_ID_LIST)
+
+
+class Training_Instance_Dataset(Data.Dataset):
+    '''
+    Training on COCOStuff dataset. [train2017.zip]
+
+    Download the training set from https://github.com/nightrome/cocostuff
+
+    Make sure you've predicted all the images' bounding boxes using inference_bbox.py
+
+    It would be better if you can filter out the images which don't have any box.
+    '''
+    def __init__(self, opt):
+        self.PRED_BBOX_DIR = '{0}_bbox'.format(opt.train_img_dir)
+        self.IMAGE_DIR = opt.train_img_dir
+        self.IMAGE_ID_LIST = [f for f in listdir(self.IMAGE_DIR) if isfile(join(self.IMAGE_DIR, f))]
+        self.transforms = transforms.Compose([
+            transforms.Resize((opt.fineSize, opt.fineSize), interpolation=2),
+            transforms.ToTensor()
+        ])
+
+    def __getitem__(self, index):
+        pred_info_path = join(self.PRED_BBOX_DIR, self.IMAGE_ID_LIST[index].split('.')[0] + '.npz')
+        output_image_path = join(self.IMAGE_DIR, self.IMAGE_ID_LIST[index])
+        pred_bbox = gen_maskrcnn_bbox_fromPred(pred_info_path)
+
+        rgb_img, gray_img = gen_gray_color_pil(output_image_path)
+
+        index_list = range(len(pred_bbox))
+        index_list = sample(index_list, 1)
+        startx, starty, endx, endy = pred_bbox[index_list[0]]
+        output = {}
+        output['rgb_img'] = self.transforms(rgb_img.crop((startx, starty, endx, endy)))
+        output['gray_img'] = self.transforms(gray_img.crop((startx, starty, endx, endy)))
+        return output
+
+    def __len__(self):
+        return len(self.IMAGE_ID_LIST)
+
+
+class Training_Fusion_Dataset(Data.Dataset):
+    '''
+    Training on COCOStuff dataset. [train2017.zip]
+
+    Download the training set from https://github.com/nightrome/cocostuff
+
+    Make sure you've predicted all the images' bounding boxes using inference_bbox.py
+
+    It would be better if you can filter out the images which don't have any box.
+    '''
+    def __init__(self, opt, box_num=8):
+        self.PRED_BBOX_DIR = '{0}_bbox'.format(opt.train_img_dir)
+        self.IMAGE_DIR = opt.train_img_dir
+        self.IMAGE_ID_LIST = [f for f in listdir(self.IMAGE_DIR) if isfile(join(self.IMAGE_DIR, f))]
+
+        self.transforms = transforms.Compose([transforms.Resize((opt.fineSize, opt.fineSize), interpolation=2),
+                                              transforms.ToTensor()])
+        self.final_size = opt.fineSize
+        self.box_num = box_num
+
+    def __getitem__(self, index):
+        pred_info_path = join(self.PRED_BBOX_DIR, self.IMAGE_ID_LIST[index].split('.')[0] + '.npz')
+        output_image_path = join(self.IMAGE_DIR, self.IMAGE_ID_LIST[index])
+        pred_bbox = gen_maskrcnn_bbox_fromPred(pred_info_path, self.box_num)
+
+        full_rgb_list = []
+        full_gray_list = []
+        rgb_img, gray_image = gen_gray_color_pil(output_image_path)
+        full_rgb_list.append(self.transforms(rgb_img))
+        full_gray_list.append(self.transforms(gray_image))
+
+        cropped_rgb_list = []
+        cropped_gray_list = []
+        index_list = range(len(pred_bbox))
+        box_info, box_info_2x, box_info_4x, box_info_8x = np.zeros((4, len(index_list), 6))
+        for i in range(len(index_list)):
+            startx, starty, endx, endy = pred_bbox[i]
+            box_info[i] = np.array(get_box_info(pred_bbox[i], rgb_img.size, self.final_size))
+            box_info_2x[i] = np.array(get_box_info(pred_bbox[i], rgb_img.size, self.final_size // 2))
+            box_info_4x[i] = np.array(get_box_info(pred_bbox[i], rgb_img.size, self.final_size // 4))
+            box_info_8x[i] = np.array(get_box_info(pred_bbox[i], rgb_img.size, self.final_size // 8))
+            cropped_rgb_list.append(self.transforms(rgb_img.crop((startx, starty, endx, endy))))
+            cropped_gray_list.append(self.transforms(gray_image.crop((startx, starty, endx, endy))))
+        output = {}
+        output['cropped_rgb'] = torch.stack(cropped_rgb_list)
+        output['cropped_gray'] = torch.stack(cropped_gray_list)
+        output['full_rgb'] = torch.stack(full_rgb_list)
+        output['full_gray'] = torch.stack(full_gray_list)
+        output['box_info'] = torch.from_numpy(box_info).type(torch.long)
+        output['box_info_2x'] = torch.from_numpy(box_info_2x).type(torch.long)
+        output['box_info_4x'] = torch.from_numpy(box_info_4x).type(torch.long)
+        output['box_info_8x'] = torch.from_numpy(box_info_8x).type(torch.long)
+        output['file_id'] = self.IMAGE_ID_LIST[index]
+        return output
+
     def __len__(self):
         return len(self.IMAGE_ID_LIST)
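For reference, a minimal sketch of how the fusion-stage dataset above is consumed, mirroring `train.py` (which uses a batch size of 1 for this stage). The exact tensor shapes depend on `--fineSize` and on how many boxes `gen_maskrcnn_bbox_fromPred` returns for the image.

```python
import torch
from options.train_options import TrainOptions
from fusion_dataset import Training_Fusion_Dataset

opt = TrainOptions().parse()  # e.g. --stage fusion --train_img_dir train_data/train2017
dataset = Training_Fusion_Dataset(opt)
loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True, num_workers=8)

sample = next(iter(loader))
print(sample['full_rgb'].shape)     # [1, 1, 3, fineSize, fineSize]
print(sample['cropped_rgb'].shape)  # [1, num_boxes, 3, fineSize, fineSize], num_boxes <= box_num (8)
print(sample['box_info'].shape)     # [1, num_boxes, 6]
```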

‎image_util.py

+13
@@ -3,6 +3,19 @@
 from skimage import color
 import torch
 
+def gen_gray_color_pil(color_img_path):
+    '''
+    return: RGB and GRAY pillow image object
+    '''
+    rgb_img = Image.open(color_img_path)
+    if len(np.asarray(rgb_img).shape) == 2:
+        rgb_img = np.stack([np.asarray(rgb_img), np.asarray(rgb_img), np.asarray(rgb_img)], 2)
+        rgb_img = Image.fromarray(rgb_img)
+    gray_img = np.round(color.rgb2gray(np.asarray(rgb_img)) * 255.0).astype(np.uint8)
+    gray_img = np.stack([gray_img, gray_img, gray_img], -1)
+    gray_img = Image.fromarray(gray_img)
+    return rgb_img, gray_img
+
 def read_to_pil(img_path):
     '''
     return: pillow image object HxWx3
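A quick sketch of what the new helper returns; the image path is only an illustrative placeholder.

```python
import numpy as np
from image_util import gen_gray_color_pil

# Hypothetical path; grayscale inputs are stacked to 3 channels before conversion.
rgb_img, gray_img = gen_gray_color_pil('example/your_image.jpg')
print(np.asarray(rgb_img).shape, np.asarray(gray_img).shape)  # both (H, W, 3) PIL images
```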

‎inference_bbox.py

+7-1
@@ -17,6 +17,7 @@
 from detectron2.config import get_cfg
 
 import torch
+from tqdm import tqdm
 
 cfg = get_cfg()
 cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml"))
@@ -26,6 +27,7 @@
 
 parser = ArgumentParser()
 parser.add_argument("--test_img_dir", type=str, default='example', help='testing images folder')
+parser.add_argument('--filter_no_obj', action='store_true')
 args = parser.parse_args()
 
 input_dir = args.test_img_dir
@@ -35,7 +37,7 @@
     print('Create path: {0}'.format(output_npz_dir))
     os.makedirs(output_npz_dir)
 
-for image_path in image_list:
+for image_path in tqdm(image_list):
     img = cv2.imread(join(input_dir, image_path))
     lab_image = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
     l_channel, a_channel, b_channel = cv2.split(lab_image)
@@ -44,4 +46,8 @@
     save_path = join(output_npz_dir, image_path.split('.')[0])
     pred_bbox = outputs["instances"].pred_boxes.to(torch.device('cpu')).tensor.numpy()
     pred_scores = outputs["instances"].scores.cpu().data.numpy()
+    if args.filter_no_obj is True and pred_bbox.shape[0] == 0:
+        print('delete {0}'.format(image_path))
+        os.remove(join(input_dir, image_path))
+        continue
     np.savez(save_path, bbox = pred_bbox, scores = pred_scores)

‎models/base_model.py

-1
@@ -41,7 +41,6 @@ def setup(self, opt, parser=None):
 
         if not self.isTrain or opt.load_model:
             self.load_networks(opt.which_epoch)
-            # self.print_networks(opt.verbose)
 
     # make models eval mode during test time
     def eval(self):

‎models/networks.py

+40
@@ -3,6 +3,7 @@
 from torch.nn import init
 import functools
 import torch.nn.functional as F
+from torch.optim import lr_scheduler
 
 
 def get_norm_layer(norm_type='instance'):
@@ -17,6 +18,21 @@ def get_norm_layer(norm_type='instance'):
     return norm_layer
 
 
+def get_scheduler(optimizer, opt):
+    if opt.lr_policy == 'lambda':
+        def lambda_rule(epoch):
+            lr_l = 1.0 - max(0, epoch + 1 + opt.epoch_count - opt.niter) / float(opt.niter_decay + 1)
+            return lr_l
+        scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda_rule)
+    elif opt.lr_policy == 'step':
+        scheduler = lr_scheduler.StepLR(optimizer, step_size=opt.lr_decay_iters, gamma=0.1)
+    elif opt.lr_policy == 'plateau':
+        scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.2, threshold=0.01, patience=5)
+    else:
+        return NotImplementedError('learning rate policy [%s] is not implemented', opt.lr_policy)
+    return scheduler
+
+
 def init_weights(net, init_type='xavier', gain=0.02):
     def init_func(m):
         classname = m.__class__.__name__
@@ -65,6 +81,30 @@ def define_G(input_nc, output_nc, ngf, which_model_netG, norm='batch', use_dropo
     return init_net(netG, init_type, gpu_ids)
 
 
+class HuberLoss(nn.Module):
+    def __init__(self, delta=.01):
+        super(HuberLoss, self).__init__()
+        self.delta = delta
+
+    def __call__(self, in0, in1):
+        mask = torch.zeros_like(in0)
+        mann = torch.abs(in0 - in1)
+        eucl = .5 * (mann**2)
+        mask[...] = mann < self.delta
+
+        # loss = eucl*mask + self.delta*(mann-.5*self.delta)*(1-mask)
+        loss = eucl * mask / self.delta + (mann - .5 * self.delta) * (1 - mask)
+        return torch.sum(loss, dim=1, keepdim=True)
+
+
+class L1Loss(nn.Module):
+    def __init__(self):
+        super(L1Loss, self).__init__()
+
+    def __call__(self, in0, in1):
+        return torch.sum(torch.abs(in0 - in1), dim=1, keepdim=True)
+
+
 class SIGGRAPHGenerator(nn.Module):
     def __init__(self, input_nc, output_nc, norm_layer=nn.BatchNorm2d, use_tanh=True, classification=True):
         super(SIGGRAPHGenerator, self).__init__()
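As a reference for the new `get_scheduler` helper, a minimal sketch with a hand-built options namespace; the field values below are only placeholders, since in practice `train.py` supplies them through the training options.

```python
import torch
from argparse import Namespace
from models.networks import get_scheduler

net = torch.nn.Linear(2, 2)
optimizer = torch.optim.Adam(net.parameters(), lr=0.0005)

# Placeholder option values for illustration only.
opt = Namespace(lr_policy='lambda', epoch_count=1, niter=100, niter_decay=50)
scheduler = get_scheduler(optimizer, opt)

for epoch in range(opt.niter + opt.niter_decay):
    optimizer.step()
    scheduler.step()  # keeps the lr constant for niter epochs, then decays it linearly over niter_decay epochs
```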

‎models/train_model.py

+182
@@ -0,0 +1,182 @@
import os

import torch
from collections import OrderedDict
from util.image_pool import ImagePool
from util import util
from .base_model import BaseModel
from . import networks
import numpy as np
from skimage import io
from skimage import img_as_ubyte

import matplotlib.pyplot as plt
import math
from matplotlib import colors


class TrainModel(BaseModel):
    def name(self):
        return 'TrainModel'

    @staticmethod
    def modify_commandline_options(parser, is_train=True):
        return parser

    def initialize(self, opt):
        BaseModel.initialize(self, opt)
        self.loss_names = ['G', 'L1']
        # load/define networks
        num_in = opt.input_nc + opt.output_nc + 1
        self.optimizers = []
        if opt.stage == 'full' or opt.stage == 'instance':
            self.model_names = ['G']
            self.netG = networks.define_G(num_in, opt.output_nc, opt.ngf,
                                          'siggraph', opt.norm, not opt.no_dropout, opt.init_type, self.gpu_ids,
                                          use_tanh=True, classification=opt.classification)
            self.optimizer_G = torch.optim.Adam(self.netG.parameters(),
                                                lr=opt.lr, betas=(opt.beta1, 0.999))
            self.optimizers.append(self.optimizer_G)
        elif opt.stage == 'fusion':
            self.model_names = ['G', 'GF', 'GComp']
            self.netG = networks.define_G(num_in, opt.output_nc, opt.ngf,
                                          'instance', opt.norm, not opt.no_dropout, opt.init_type, self.gpu_ids,
                                          use_tanh=True, classification=False)
            self.netG.eval()

            self.netGF = networks.define_G(num_in, opt.output_nc, opt.ngf,
                                           'fusion', opt.norm, not opt.no_dropout, opt.init_type, self.gpu_ids,
                                           use_tanh=True, classification=False)
            self.netGF.eval()

            self.netGComp = networks.define_G(num_in, opt.output_nc, opt.ngf,
                                              'siggraph', opt.norm, not opt.no_dropout, opt.init_type, self.gpu_ids,
                                              use_tanh=True, classification=opt.classification)
            self.netGComp.eval()
            self.optimizer_G = torch.optim.Adam(list(self.netGF.module.weight_layer.parameters()) +
                                                list(self.netGF.module.weight_layer2.parameters()) +
                                                list(self.netGF.module.weight_layer3.parameters()) +
                                                list(self.netGF.module.weight_layer4.parameters()) +
                                                list(self.netGF.module.weight_layer5.parameters()) +
                                                list(self.netGF.module.weight_layer6.parameters()) +
                                                list(self.netGF.module.weight_layer7.parameters()) +
                                                list(self.netGF.module.weight_layer8_1.parameters()) +
                                                list(self.netGF.module.weight_layer8_2.parameters()) +
                                                list(self.netGF.module.weight_layer9_1.parameters()) +
                                                list(self.netGF.module.weight_layer9_2.parameters()) +
                                                list(self.netGF.module.weight_layer10_1.parameters()) +
                                                list(self.netGF.module.weight_layer10_2.parameters()) +
                                                list(self.netGF.module.model10.parameters()) +
                                                list(self.netGF.module.model_out.parameters()),
                                                lr=opt.lr, betas=(opt.beta1, 0.999))
            self.optimizers.append(self.optimizer_G)
        else:
            print('Error Stage!')
            exit()
        self.criterionL1 = networks.HuberLoss(delta=1. / opt.ab_norm)
        # self.criterionL1 = networks.L1Loss()

        # initialize average loss values
        self.avg_losses = OrderedDict()
        self.avg_loss_alpha = opt.avg_loss_alpha
        self.error_cnt = 0
        for loss_name in self.loss_names:
            self.avg_losses[loss_name] = 0

    def set_input(self, input):
        AtoB = self.opt.which_direction == 'AtoB'
        self.real_A = input['A' if AtoB else 'B'].to(self.device)
        self.real_B = input['B' if AtoB else 'A'].to(self.device)
        self.hint_B = input['hint_B'].to(self.device)

        self.mask_B = input['mask_B'].to(self.device)
        self.mask_B_nc = self.mask_B + self.opt.mask_cent

        self.real_B_enc = util.encode_ab_ind(self.real_B[:, :, ::4, ::4], self.opt)

    def set_fusion_input(self, input, box_info):
        AtoB = self.opt.which_direction == 'AtoB'
        self.full_real_A = input['A' if AtoB else 'B'].to(self.device)
        self.full_real_B = input['B' if AtoB else 'A'].to(self.device)

        self.full_hint_B = input['hint_B'].to(self.device)
        self.full_mask_B = input['mask_B'].to(self.device)

        self.full_mask_B_nc = self.full_mask_B + self.opt.mask_cent
        self.full_real_B_enc = util.encode_ab_ind(self.full_real_B[:, :, ::4, ::4], self.opt)
        self.box_info_list = box_info

    def forward(self):
        if self.opt.stage == 'full' or self.opt.stage == 'instance':
            (_, self.fake_B_reg) = self.netG(self.real_A, self.hint_B, self.mask_B)
        elif self.opt.stage == 'fusion':
            (_, self.comp_B_reg) = self.netGComp(self.full_real_A, self.full_hint_B, self.full_mask_B)
            (_, feature_map) = self.netG(self.real_A, self.hint_B, self.mask_B)
            self.fake_B_reg = self.netGF(self.full_real_A, self.full_hint_B, self.full_mask_B, feature_map, self.box_info_list)
        else:
            print('Error! Wrong stage selection!')
            exit()

    def optimize_parameters(self):
        self.forward()
        self.optimizer_G.zero_grad()
        if self.opt.stage == 'full' or self.opt.stage == 'instance':
            self.loss_L1 = torch.mean(self.criterionL1(self.fake_B_reg.type(torch.cuda.FloatTensor),
                                                       self.real_B.type(torch.cuda.FloatTensor)))
            self.loss_G = 10 * torch.mean(self.criterionL1(self.fake_B_reg.type(torch.cuda.FloatTensor),
                                                           self.real_B.type(torch.cuda.FloatTensor)))
        elif self.opt.stage == 'fusion':
            self.loss_L1 = torch.mean(self.criterionL1(self.fake_B_reg.type(torch.cuda.FloatTensor),
                                                       self.full_real_B.type(torch.cuda.FloatTensor)))
            self.loss_G = 10 * torch.mean(self.criterionL1(self.fake_B_reg.type(torch.cuda.FloatTensor),
                                                           self.full_real_B.type(torch.cuda.FloatTensor)))
        else:
            print('Error! Wrong stage selection!')
            exit()
        self.loss_G.backward()
        self.optimizer_G.step()

    def get_current_visuals(self):
        from collections import OrderedDict
        visual_ret = OrderedDict()
        if self.opt.stage == 'full' or self.opt.stage == 'instance':
            visual_ret['gray'] = util.lab2rgb(torch.cat((self.real_A.type(torch.cuda.FloatTensor), torch.zeros_like(self.real_B).type(torch.cuda.FloatTensor)), dim=1), self.opt)
            visual_ret['real'] = util.lab2rgb(torch.cat((self.real_A.type(torch.cuda.FloatTensor), self.real_B.type(torch.cuda.FloatTensor)), dim=1), self.opt)
            visual_ret['fake_reg'] = util.lab2rgb(torch.cat((self.real_A.type(torch.cuda.FloatTensor), self.fake_B_reg.type(torch.cuda.FloatTensor)), dim=1), self.opt)

            visual_ret['hint'] = util.lab2rgb(torch.cat((self.real_A.type(torch.cuda.FloatTensor), self.hint_B.type(torch.cuda.FloatTensor)), dim=1), self.opt)
            visual_ret['real_ab'] = util.lab2rgb(torch.cat((torch.zeros_like(self.real_A.type(torch.cuda.FloatTensor)), self.real_B.type(torch.cuda.FloatTensor)), dim=1), self.opt)
            visual_ret['fake_ab_reg'] = util.lab2rgb(torch.cat((torch.zeros_like(self.real_A.type(torch.cuda.FloatTensor)), self.fake_B_reg.type(torch.cuda.FloatTensor)), dim=1), self.opt)

        elif self.opt.stage == 'fusion':
            visual_ret['gray'] = util.lab2rgb(torch.cat((self.full_real_A.type(torch.cuda.FloatTensor), torch.zeros_like(self.full_real_B).type(torch.cuda.FloatTensor)), dim=1), self.opt)
            visual_ret['real'] = util.lab2rgb(torch.cat((self.full_real_A.type(torch.cuda.FloatTensor), self.full_real_B.type(torch.cuda.FloatTensor)), dim=1), self.opt)
            visual_ret['comp_reg'] = util.lab2rgb(torch.cat((self.full_real_A.type(torch.cuda.FloatTensor), self.comp_B_reg.type(torch.cuda.FloatTensor)), dim=1), self.opt)
            visual_ret['fake_reg'] = util.lab2rgb(torch.cat((self.full_real_A.type(torch.cuda.FloatTensor), self.fake_B_reg.type(torch.cuda.FloatTensor)), dim=1), self.opt)

            self.instance_mask = torch.nn.functional.interpolate(torch.zeros([1, 1, 176, 176]), size=visual_ret['gray'].shape[2:], mode='bilinear').type(torch.cuda.FloatTensor)
            visual_ret['box_mask'] = torch.cat((self.instance_mask, self.instance_mask, self.instance_mask), 1)
            visual_ret['real_ab'] = util.lab2rgb(torch.cat((torch.zeros_like(self.full_real_A.type(torch.cuda.FloatTensor)), self.full_real_B.type(torch.cuda.FloatTensor)), dim=1), self.opt)
            visual_ret['comp_ab_reg'] = util.lab2rgb(torch.cat((torch.zeros_like(self.full_real_A.type(torch.cuda.FloatTensor)), self.comp_B_reg.type(torch.cuda.FloatTensor)), dim=1), self.opt)
            visual_ret['fake_ab_reg'] = util.lab2rgb(torch.cat((torch.zeros_like(self.full_real_A.type(torch.cuda.FloatTensor)), self.fake_B_reg.type(torch.cuda.FloatTensor)), dim=1), self.opt)
        else:
            print('Error! Wrong stage selection!')
            exit()
        return visual_ret

    # return training losses/errors. train.py will print out these errors as debugging information
    def get_current_losses(self):
        self.error_cnt += 1
        errors_ret = OrderedDict()
        for name in self.loss_names:
            if isinstance(name, str):
                # float(...) works for both scalar tensor and float number
                self.avg_losses[name] = float(getattr(self, 'loss_' + name)) + self.avg_loss_alpha * self.avg_losses[name]
                errors_ret[name] = (1 - self.avg_loss_alpha) / (1 - self.avg_loss_alpha**self.error_cnt) * self.avg_losses[name]
        return errors_ret

    def save_fusion_epoch(self, epoch):
        path = '{0}/{1}_net_GF.pth'.format(os.path.join(self.opt.checkpoints_dir, self.opt.name), epoch)
        latest_path = '{0}/latest_net_GF.pth'.format(os.path.join(self.opt.checkpoints_dir, self.opt.name))
        torch.save(self.netGF.state_dict(), path)
        torch.save(self.netGF.state_dict(), latest_path)

‎options/train_options.py

+4
@@ -4,6 +4,10 @@
 class TrainOptions(BaseOptions):
     def initialize(self, parser):
         BaseOptions.initialize(self, parser)
+        parser.add_argument('--stage', type=str, default='full', help='only full, instance or fusion')
+        parser.add_argument('--train_img_dir', type=str, default='train_data/train2017', help='training images folder')
+        parser.add_argument('--model', type=str, default='train', help='only train_model need to be used')
+        parser.add_argument('--name', type=str, default='coco_mask', help='name of the experiment. It decides where to store samples and models')
         parser.add_argument('--display_freq', type=int, default=2000, help='frequency of showing training results on screen')
         parser.add_argument('--display_ncols', type=int, default=5, help='if positive, display all images in a single visdom web panel with certain number of images per row.')
         parser.add_argument('--update_html_freq', type=int, default=10000, help='frequency of saving training results to html')

‎scripts/prepare_cocostuff.sh

+5
@@ -0,0 +1,5 @@
DATASET_DIR="train_data"

python download.py --mode cocostuff --dataset_dir $DATASET_DIR
echo "Finish download."
unzip "$DATASET_DIR/cocostuff/train.zip" -d "$DATASET_DIR"

‎scripts/prepare_train_box.sh

+3
@@ -0,0 +1,3 @@
DATASET_DIR=train_data/train2017

python inference_bbox.py --test_img_dir $DATASET_DIR --filter_no_obj

‎scripts/train.sh

+18
@@ -0,0 +1,18 @@
DATASET_DIR=train_data/train2017

# Stage 1: Training Full Image Colorization
mkdir ./checkpoints/coco_full
cp ./checkpoints/siggraph_retrained/latest_net_G.pth ./checkpoints/coco_full/
python train.py --stage full --name coco_full --sample_p 1.0 --niter 100 --niter_decay 50 --load_model --lr 0.0005 --model train --fineSize 256 --batch_size 16 --display_ncols 3 --display_freq 1600 --print_freq 1600 --train_img_dir $DATASET_DIR

# Stage 2: Training Instance Image Colorization
mkdir ./checkpoints/coco_instance
cp ./checkpoints/coco_full/latest_net_G.pth ./checkpoints/coco_instance/
python train.py --stage instance --name coco_instance --sample_p 1.0 --niter 100 --niter_decay 50 --load_model --lr 0.0005 --model train --fineSize 256 --batch_size 16 --display_ncols 3 --display_freq 1600 --print_freq 1600 --train_img_dir $DATASET_DIR

# Stage 3: Training Fusion Module
mkdir ./checkpoints/coco_mask
cp ./checkpoints/coco_full/latest_net_G.pth ./checkpoints/coco_mask/latest_net_GF.pth
cp ./checkpoints/coco_instance/latest_net_G.pth ./checkpoints/coco_mask/latest_net_G.pth
cp ./checkpoints/coco_full/latest_net_G.pth ./checkpoints/coco_mask/latest_net_GComp.pth
python train.py --stage fusion --name coco_mask --sample_p 1.0 --niter 10 --niter_decay 20 --lr 0.00005 --model train --load_model --display_ncols 4 --fineSize 256 --batch_size 1 --display_freq 500 --print_freq 500 --train_img_dir $DATASET_DIR

‎train.py

+113
@@ -0,0 +1,113 @@
import time
from options.train_options import TrainOptions
from models import create_model
from util.visualizer import Visualizer

import torch
import torchvision
import torchvision.transforms as transforms
from tqdm import trange, tqdm

from fusion_dataset import *
from util import util
import os

if __name__ == '__main__':
    opt = TrainOptions().parse()
    if opt.stage == 'full':
        dataset = Training_Full_Dataset(opt)
    elif opt.stage == 'instance':
        dataset = Training_Instance_Dataset(opt)
    elif opt.stage == 'fusion':
        dataset = Training_Fusion_Dataset(opt)
    else:
        print('Error! Wrong stage selection!')
        exit()
    dataset_loader = torch.utils.data.DataLoader(dataset, batch_size=opt.batch_size, shuffle=True, num_workers=8)

    dataset_size = len(dataset)
    print('#training images = %d' % dataset_size)

    model = create_model(opt)
    model.setup(opt)

    opt.display_port = 8098
    visualizer = Visualizer(opt)
    total_steps = 0

    if opt.stage == 'full' or opt.stage == 'instance':
        for epoch in trange(opt.epoch_count, opt.niter + opt.niter_decay, desc='epoch', dynamic_ncols=True):
            epoch_iter = 0

            for data_raw in tqdm(dataset_loader, desc='batch', dynamic_ncols=True, leave=False):
                total_steps += opt.batch_size
                epoch_iter += opt.batch_size

                data_raw['rgb_img'] = [data_raw['rgb_img']]
                data_raw['gray_img'] = [data_raw['gray_img']]

                input_data = util.get_colorization_data(data_raw['gray_img'], opt, p=1.0, ab_thresh=0)
                gt_data = util.get_colorization_data(data_raw['rgb_img'], opt, p=1.0, ab_thresh=10.0)
                if gt_data is None:
                    continue
                if(gt_data['B'].shape[0] < opt.batch_size):
                    continue
                input_data['B'] = gt_data['B']
                input_data['hint_B'] = gt_data['hint_B']
                input_data['mask_B'] = gt_data['mask_B']

                visualizer.reset()
                model.set_input(input_data)
                model.optimize_parameters()

                if total_steps % opt.display_freq == 0:
                    save_result = total_steps % opt.update_html_freq == 0
                    visualizer.display_current_results(model.get_current_visuals(), epoch, save_result)

                if total_steps % opt.print_freq == 0:
                    losses = model.get_current_losses()
                    if opt.display_id > 0:
                        visualizer.plot_current_losses(epoch, float(epoch_iter) / dataset_size, opt, losses)

            if epoch % opt.save_epoch_freq == 0:
                model.save_networks('latest')
                model.save_networks(epoch)
            model.update_learning_rate()
    elif opt.stage == 'fusion':
        for epoch in trange(opt.epoch_count, opt.niter + opt.niter_decay, desc='epoch', dynamic_ncols=True):
            epoch_iter = 0

            for data_raw in tqdm(dataset_loader, desc='batch', dynamic_ncols=True, leave=False):
                total_steps += opt.batch_size
                epoch_iter += opt.batch_size
                box_info = data_raw['box_info'][0]
                box_info_2x = data_raw['box_info_2x'][0]
                box_info_4x = data_raw['box_info_4x'][0]
                box_info_8x = data_raw['box_info_8x'][0]
                cropped_input_data = util.get_colorization_data(data_raw['cropped_gray'], opt, p=1.0, ab_thresh=0)
                cropped_gt_data = util.get_colorization_data(data_raw['cropped_rgb'], opt, p=1.0, ab_thresh=10.0)
                full_input_data = util.get_colorization_data(data_raw['full_gray'], opt, p=1.0, ab_thresh=0)
                full_gt_data = util.get_colorization_data(data_raw['full_rgb'], opt, p=1.0, ab_thresh=10.0)
                if cropped_gt_data is None or full_gt_data is None:
                    continue
                cropped_input_data['B'] = cropped_gt_data['B']
                full_input_data['B'] = full_gt_data['B']
                visualizer.reset()
                model.set_input(cropped_input_data)
                model.set_fusion_input(full_input_data, [box_info, box_info_2x, box_info_4x, box_info_8x])
                model.optimize_parameters()

                if total_steps % opt.display_freq == 0:
                    save_result = total_steps % opt.update_html_freq == 0
                    visualizer.display_current_results(model.get_current_visuals(), epoch, save_result)

                if total_steps % opt.print_freq == 0:
                    losses = model.get_current_losses()
                    if opt.display_id > 0:
                        visualizer.plot_current_losses(epoch, float(epoch_iter) / dataset_size, opt, losses)
            if epoch % opt.save_epoch_freq == 0:
                model.save_fusion_epoch(epoch)
            model.update_learning_rate()
    else:
        print('Error! Wrong stage selection!')
        exit()
