# Image Caption Dataset Formats

Many of the invoke-training training methods require datasets consisting of image-caption pairs. This document explains the supported dataset formats and how to prepare them.

invoke-training uses Hugging Face Datasets to handle datasets. Image-caption datasets can either be downloaded from the Hugging Face Hub or loaded from a directory using the ImageFolder format. More details are included in the following sections.

## Hugging Face Hub Datasets

The easiest way to get started with invoke-training is to use a publicly available dataset on the Hugging Face Hub. You can filter for the Text-to-Image task to find relevant datasets that contain both an image column and a caption column. `lambdalabs/pokemon-blip-captions` is a popular choice if you're not sure where to start.
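If you want to preview a Hub dataset before using it for training, you can load it directly with the Hugging Face `datasets` library. Here's a minimal sketch (just for inspection; it is not part of the training workflow):

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub and look at one example.
ds = load_dataset("lambdalabs/pokemon-blip-captions", split="train")

example = ds[0]
print(example["image"].size)  # the image, decoded as a PIL image
print(example["text"])        # its caption
```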

Once you've selected a dataset from the Hugging Face Hub, you can use it in training by setting the following parameters in your training config:

- `dataset_name`
- `image_column`
- `caption_column`

Here's a sample configuration:

```yaml
dataset:
  dataset_name: lambdalabs/pokemon-blip-captions
  image_column: image
  caption_column: text

# ...
```

See `data_config.py` for full documentation of the `ImageCaptionDataLoaderConfig`.

## ImageFolder Datasets

If you want to create a custom dataset, you will most likely want to use the ImageFolder dataset format.

An ImageFolder dataset should have the following directory structure:

```
my_custom_dataset/
├── metadata.jsonl
└── train/
    ├── 0001.png
    ├── 0002.png
    ├── 0003.png
    └── ...
```

The contents of `metadata.jsonl` should be:

{"file_name": "train/0001.png", "text": "This is a caption describing image 0001."}
{"file_name": "train/0002.png", "text": "This is a caption describing image 0002."}
{"file_name": "train/0003.png", "text": "This is a caption describing image 0003."}

To use a custom ImageFolder dataset, set the following parameters in your training config:

- `dataset_dir`
- `caption_column`

Here's a sample configuration:

```yaml
dataset:
  dataset_dir: /data/my_custom_dataset
  caption_column: text

# ...
```

See `data_config.py` for full documentation of the `ImageCaptionDataLoaderConfig`.
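Before kicking off a training run, you can sanity-check a custom dataset by loading it with the generic Hugging Face `imagefolder` loader. A quick sketch, assuming the directory layout above (the root-level `metadata.jsonl` should be picked up, since its `file_name` paths are relative to the dataset root):

```python
from datasets import load_dataset

# Sanity check: load the dataset with the generic imagefolder loader
# and confirm that the images and captions are found.
ds = load_dataset("imagefolder", data_dir="/data/my_custom_dataset")
print(ds)                      # should show an `image` and a `text` column
print(ds["train"][0]["text"])  # the first caption
```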