Colab Implementation: ConVIRT model

Contrastive VIsual Representation Learning from Text

Deep neural networks learn their parameters from large amounts of data in order to perform a specific task. In practice, however, labeled data is often scarce. If your data contains paired images and text, contrastive learning can help with this problem.

Contrastive learning is a self-supervised learning method. It does not require manual labels; instead, it learns parameters from the unlabeled data itself. It aims to learn an encoder that maps similar data to similar encodings and dissimilar data to encodings that are as different as possible. Typical contrastive learning is based on comparisons between two images. However, if we have paired image and text data, contrastive learning can also be applied between images and text.
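As a rough illustration of the idea, the sketch below computes a bidirectional contrastive (InfoNCE-style) loss over a small batch of image and text vectors using only the Python standard library. It is not the code used in this repository. The function names are hypothetical, and the assumption that `temperature` and `alpha_weight` play the same role as the identically named keys in config.yaml (with `alpha_weight` applied to the image-to-text direction) is ours.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def diagonal_cross_entropy(sim, temperature):
    """Mean cross-entropy where the positive for row i is column i."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        total += log_sum_exp - logits[i]
    return total / len(sim)


def bidirectional_contrastive_loss(image_vecs, text_vecs,
                                   temperature=0.1, alpha_weight=0.75):
    """Weighted sum of image-to-text and text-to-image losses.

    Assumption: alpha_weight weights the image-to-text direction.
    """
    sim = [[cosine_similarity(v, u) for u in text_vecs] for v in image_vecs]
    sim_transposed = [list(column) for column in zip(*sim)]
    image_to_text = diagonal_cross_entropy(sim, temperature)
    text_to_image = diagonal_cross_entropy(sim_transposed, temperature)
    return alpha_weight * image_to_text + (1 - alpha_weight) * text_to_image


# Toy encodings: pair i of images matches pair i of matched_text.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[0.9, 0.1], [0.1, 0.9]]
shuffled_text = [[0.1, 0.9], [0.9, 0.1]]

loss_matched = bidirectional_contrastive_loss(images, matched_text)
loss_shuffled = bidirectional_contrastive_loss(images, shuffled_text)
print(loss_matched, loss_shuffled)
```

When the encodings of true pairs are close and those of mismatched pairs are far apart, the loss is small; swapping the text vectors makes it much larger, which is the signal the encoders are trained on.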

With this repository, we can run various paired image-text contrastive learning tasks on Google Colab and train pre-trained models for transfer learning when labeled data is scarce. Starting from such a pre-trained model, you can train a well-performing model with less labeled data.

Usage

1. Data Preparation

Before starting training, we need to download the training data and arrange it so that it can be read in pairs.

There are two examples of data preparation:

After preparation, there should be a CSV file that contains the image path and the text file path for each image-text pair. (Alternatively, the text content can be saved directly in the CSV file.)
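To make the expected CSV layout concrete, here is a minimal sketch of reading such a file into (image path, text) pairs. The helper `read_pairs` is hypothetical and not part of this repository; its parameter names mirror the dataset keys in the example config.yaml, and it assumes the CSV has a header row and that paths in the CSV are relative to the root directories.

```python
import csv
import tempfile
from pathlib import Path


def read_pairs(csv_file, img_root_dir, text_root_dir,
               img_path_col=0, text_col=1, text_from_files=True):
    """Return a list of (image_path, text) tuples, one per CSV row."""
    pairs = []
    with open(csv_file, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader)  # assumption: the first row is a header
        for row in reader:
            image_path = Path(img_root_dir) / row[img_path_col]
            if text_from_files:
                # The column holds a path relative to text_root_dir.
                text_path = Path(text_root_dir) / row[text_col]
                text = text_path.read_text(encoding="utf-8")
            else:
                # The column holds the text itself.
                text = row[text_col]
            pairs.append((image_path, text.strip()))
    return pairs


# Demo with the text stored directly in the CSV (text_from_files=False).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "pairs.csv"
    csv_path.write_text(
        "image,text\nscans/0001.png,No acute findings.\n", encoding="utf-8"
    )
    demo_pairs = read_pairs(
        csv_path,
        img_root_dir="/your/root/images/directory",
        text_root_dir="",
        text_from_files=False,
    )
    print(demo_pairs)
```

Checking a few rows this way before training is a cheap way to catch wrong column indices or missing text files.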

2. Define Configuration

In config.yaml, we need to define the training hyperparameters, the data paths, and the base models. Here is an example:

batch_size: 32
epochs: 1000
eval_every_n_epochs: 5
fine_tune_from: Jan16_02-27-36_edu-GPU-Linux
log_every_n_steps: 2
learning_rate: 1e-4
weight_decay: 1e-6
fp16_precision: True
truncation: True

model:
  out_dim: 512
  res_base_model: "resnet50"
  bert_base_model: 'emilyalsentzer/Bio_ClinicalBERT'
  freeze_layers: [0,1,2,3,4,5]
  do_lower_case: False
  
dataset:
  s: 1
  input_shape: (224,224,3)
  num_workers: 4
  valid_size: 0.1
  csv_file: 'path/for/CSV_containing_paths_for_images_and_text.csv' 
  text_from_files: True # If 'True' the text input will be read from .txt files, if 'False' it will be loaded directly from the CSV file
  img_root_dir: '/your/root/images/directory'
  text_root_dir: '/your/root/text/directory' # The root directory for the text files if "text_from_files" is True
  img_path_col: 0 # index for the image path column in the CSV dataframe.
  text_col: 1 # index for the text column in the CSV dataframe. If text_from_files is 'True' it should contain the relative path for the files from the 'text_root_dir', if text_from_files is 'False' this column should contain the respective input text in its own cells.

loss:
  temperature: 0.1
  use_cosine_similarity: True
  alpha_weight: 0.75

The models used [res_base_model, bert_base_model] refer to the models provided by transformers.
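As an illustration of one of the dataset settings above, `valid_size: 0.1` is most naturally read as the fraction of pairs held out for validation. The sketch below shows one common way such a split can be made; the function `split_indices` and the fixed seed are hypothetical, and the repository's actual splitting logic may differ.

```python
import random


def split_indices(num_samples, valid_size=0.1, seed=0):
    """Shuffle sample indices and hold out a valid_size fraction."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    num_valid = int(valid_size * num_samples)
    valid_indices = indices[:num_valid]
    train_indices = indices[num_valid:]
    return train_indices, valid_indices


train_idx, valid_idx = split_indices(100, valid_size=0.1)
print(len(train_idx), len(valid_idx))
```

With 100 pairs and `valid_size: 0.1`, 10 pairs would be evaluated every `eval_every_n_epochs` epochs and the remaining 90 used for training.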

3. Training

To train in Colab, please open Setup.ipynb and follow the instructions inside.

After running python run.py in the notebook, you can open another notebook, tensorboard.ipynb, to monitor the training process.

4. After Training

At the end of training, the final model and the corresponding config.yaml will be saved to ./runs/. Please use this model for transfer learning.

Others

This repository is a Colab implementation of the architecture described in the ConVIRT paper: Contrastive Learning of Medical Visual Representations from Paired Images and Text. The authors of the paper are Yuhao Zhang, Hang Jiang, Yasuhide Miura, Christopher D. Manning, and Curtis P. Langlotz.

This repository was originally modified from https://github.com/sthalles/SimCLR.

References: