Learning Shared Knowledge for Deep Lifelong Learning using Deconvolutional Networks

This repository contains the code for the Deconvolutional Factorized CNN (DF-CNN), proposed in the IJCAI 2019 paper "Learning Shared Knowledge for Deep Lifelong Learning using Deconvolutional Networks" by Seungwon Lee, James Stokes, and Eric Eaton.

Version and Dependencies

This code is compatible with both Python 2.7 and Python 3.5. It requires numpy, tensorflow, and scikit-image. Pre-processed data is stored and loaded as .pkl files; please be aware that a pickle file generated by Python 3 is NOT compatible with Python 2. Currently, MATLAB is required to load the summary of experiment results because it is saved as a .mat file.
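If you need pickle files that both Python versions can read, one workaround (a minimal sketch, not part of the repository) is to write them with pickle protocol 2, the highest protocol that Python 2.7 understands; the file name and contents below are placeholders:

    import pickle

    # Placeholder for the pre-processed arrays; the real data and file name differ.
    data = {"train": [0, 1, 2], "valid": [3], "test": [4]}

    # Protocol 2 is the highest pickle protocol Python 2.7 can read;
    # the Python 3 default produces files that Python 2 cannot load.
    with open("Data/example_preprocessed.pkl", "wb") as f:
        pickle.dump(data, f, protocol=2)

    with open("Data/example_preprocessed.pkl", "rb") as f:
        restored = pickle.load(f)
    print(restored)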

Data

  • MNIST (MTL)

    • MNIST has 10 classes, so we split it into 5 binary classification tasks (0 vs 1, 2 vs 3, and so on) for the heterogeneous task distribution and into 10 one-vs-all classification tasks for the homogeneous task distribution (an illustrative sketch of this split appears after this list).
    • Because baseline models already reach high accuracy, we used only 3%, 5%, 7%, 10%, and 30% of the provided data for the training/validation set per task. The test set has 1800 and 2000 instances per task for the heterogeneous and homogeneous task distributions, respectively.
    • We did not use any data augmentation, but we rescaled pixel values to the range 0~1.
  • CIFAR-10 (MTL)

    • CIFAR-10 has 10 classes, so we split it into 5 binary classification tasks (0 vs 1, 2 vs 3, and so on) for the heterogeneous task distribution and into 10 one-vs-all classification tasks for the homogeneous task distribution.
    • We trained and tested models with training/validation sets consisting of 4%, 10%, 30%, 50%, and 70% of the provided training data.
    • The test set has 2000 instances per task.
    • We did not use any data augmentation, but we normalized the images.
  • CIFAR-100 (Lifelong)

    • Similar to CIFAR-10, but with 100 classes.
    • Each task is a 10-class classification task, and there are 10 tasks for lifelong learning with a heterogeneous task distribution (the sub-tasks use disjoint sets of image classes).
    • We trained models using only 4% of the available dataset.
    • We normalized the images.
  • Office-Home (Lifelong)

    • We used images from the Product and Real-World domains.
    • Each task is a 13-class classification task, and the image classes of the sub-tasks are chosen randomly without repetition (keeping classes from the Product domain distinct from those of the Real-World domain).
    • Images are rescaled to 128x128 and pixel values are rescaled to the range 0~1, but the images are not normalized or augmented.
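The following is a minimal sketch of the heterogeneous MNIST split and 0~1 rescaling described above, assuming the raw images and labels are already loaded as numpy arrays; it is illustrative and not the repository's data loader:

    import numpy as np

    def make_binary_tasks(images, labels, num_tasks=5):
        # Split a 10-class dataset into num_tasks binary tasks: (0 vs 1), (2 vs 3), ...
        tasks = []
        for t in range(num_tasks):
            neg, pos = 2 * t, 2 * t + 1                     # the two original classes of task t
            mask = (labels == neg) | (labels == pos)
            x = images[mask].astype(np.float32) / 255.0     # rescale pixel values to [0, 1]
            y = (labels[mask] == pos).astype(np.int64)      # relabel: neg -> 0, pos -> 1
            tasks.append((x, y))
        return tasks

    # Hypothetical usage with random stand-in data of MNIST's shape.
    images = np.random.randint(0, 256, size=(1000, 28, 28))
    labels = np.random.randint(0, 10, size=1000)
    tasks = make_binary_tasks(images, labels)
    print([x.shape for x, _ in tasks])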

Proposed Model

  • DF-CNN model (Deconvolutional_Factorized_CNN model in the code); an illustrative sketch of the filter-generation idea appears after this list.

  • Ablated model 1: DF-CNN.direct model (Deconvolutional_Factorized_CNN_Direct model in the code)

  • Ablated model 2: DF-CNN.tc2 model (Deconvolutional_Factorized_CNN_tc2 model in the code)
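The paper describes the full architecture; as a rough, illustrative sketch of the idea the name suggests (generating a task's convolution filters by applying a task-specific transposed convolution, i.e. deconvolution, to a small shared knowledge tensor), the toy code below may help. Every shape and the final reshape are assumptions chosen only so the example runs; this is not the repository's implementation:

    import tensorflow as tf

    # Small knowledge tensor shared across tasks for one layer (shape is an assumption).
    kb = tf.Variable(tf.random.normal([1, 3, 3, 8]))             # [batch=1, h, w, channels]

    # Task-specific deconvolution kernel: [h, w, out_channels, in_channels] for conv2d_transpose.
    task_deconv = tf.Variable(tf.random.normal([3, 3, 16, 8]))

    # Expand the shared tensor with a transposed convolution ...
    expanded = tf.nn.conv2d_transpose(
        kb, task_deconv, output_shape=[1, 5, 5, 16], strides=1, padding="VALID")

    # ... and reshape it into a 5x5 filter bank with 1 input and 16 output channels.
    # (The actual model applies further task-specific operations; this is only the generation step.)
    filters = tf.reshape(expanded, [5, 5, 1, 16])

    # Apply the generated filters to a stand-in batch of single-channel images.
    x = tf.random.normal([4, 28, 28, 1])
    y = tf.nn.conv2d(x, filters, strides=1, padding="SAME")
    print(y.shape)                                               # (4, 28, 28, 16)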

Baseline Model

  • Single Task model

    • Constructs as many independent models as there are tasks and trains them independently.
  • Single Neural Net model

    • Constructs a single neural network and treats the data of all tasks as if they came from a single task.
  • Hard-parameter Shared model

    • The networks for all tasks share their convolutional layers and have independent fully-connected output layers (a minimal sketch appears after this list).
  • Tensor Factorization model

    • Factorizes the parameters of each layer into a product of several tensors and shares all but one of them across tasks. (Details in Yang, Yongxin, and Timothy Hospedales. "Deep multi-task representation learning: A tensor factorisation approach." arXiv preprint arXiv:1605.06391 (2016).)
    • We used the Tucker decomposition because it worked better than the alternatives.
  • Dynamically Expandable Network model

    • Extends the hard-parameter shared model by selectively retraining some neurons, adding new neurons, and splitting neurons into disjoint groups for different sets of tasks, depending on the given data.
    • The code (cnn_den_model.py) is almost identical to the code provided by the authors.
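A minimal sketch of the hard-parameter sharing idea, written with the modern tf.keras API rather than the repository's TF 1.x code; the layer sizes, input shape, and number of tasks are arbitrary assumptions:

    import tensorflow as tf

    num_tasks = 5

    # Convolutional trunk shared by every task (the "hard-shared" parameters).
    shared_trunk = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
    ])

    # One independent fully-connected head per task.
    inputs = tf.keras.Input(shape=(32, 32, 3))
    features = shared_trunk(inputs)
    heads = [tf.keras.layers.Dense(2, name="task_%d_head" % t)(features)
             for t in range(num_tasks)]

    model = tf.keras.Model(inputs=inputs, outputs=heads)
    model.summary()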

How to use the code

  • Prepare data:

    • Download the raw MNIST, CIFAR-10, CIFAR-100, and/or Office-Home datasets and place them in the ./Data directory.
  • Run main_train_cl.py with the following arguments:

    • gpu : index of the GPU to use
    • data_type : name of the dataset and number of tasks (e.g. MNIST5/MNIST10/CIFAR10_5/CIFAR10_10/CIFAR100_10/CIFAR100_20/OfficeHome)
    • data_percent : the percentage of the original dataset to be used for training. Please check utils/utils_env_cl.py prior to use.
    • model_type : type of architecture
    • lifelong : flag to train in the lifelong learning setting (only one task is available at each update)
    • save_mat_name : the name of the .mat file storing all information computed during training (a sketch for inspecting this file in Python appears after the example commands)
  • Example commands:

    • python main_train_cl.py --gpu 0 --data_type MNIST10 --data_percent 3 --model_type STL --save_mat_name MNIST10_result_3p_STL.mat
    • python main_train_cl.py --gpu 1 --data_type OfficeHome --model_type DFCNN --lifelong --save_mat_name OfficeHome_result_DFCNN_lifelong.mat
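For a quick look at the saved summary without MATLAB, the .mat file can usually also be read with scipy, assuming it was not written in MATLAB's v7.3/HDF5 format; the variable names inside depend on the run, so this sketch only lists them:

    from scipy.io import loadmat

    # Hypothetical result file produced by the first example command above.
    summary = loadmat("MNIST10_result_3p_STL.mat")

    # Print every stored variable and its shape, skipping MATLAB header entries.
    for key, value in summary.items():
        if not key.startswith("__"):
            print(key, getattr(value, "shape", type(value)))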
