How to Install Datasets

$DATA denotes the location where datasets are installed, e.g.

$DATA/
    office31/
    office_home/
    visda17/
    ...

Domain Adaptation

Domain Generalization

Semi-Supervised Learning

Domain Adaptation

Office-31

Download link: https://people.eecs.berkeley.edu/~jhoffman/domainadapt/#datasets_code.

File structure:

office31/
    amazon/
        back_pack/
        bike/
        ...
    dslr/
        back_pack/
        bike/
        ...
    webcam/
        back_pack/
        bike/
        ...
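
Because each domain folder uses the one-class-per-subfolder layout shown above, a quick way to sanity-check the installation is to load each domain with torchvision's ImageFolder. The snippet below is only a verification sketch and assumes $DATA is exported in your shell; it is not part of the Dassl codebase.

import os
from torchvision.datasets import ImageFolder

data_root = os.environ.get("DATA", "./data")  # assumed dataset root
for domain in ["amazon", "dslr", "webcam"]:
    ds = ImageFolder(os.path.join(data_root, "office31", domain))
    print(domain, len(ds), "images,", len(ds.classes), "classes")  # expect 31 classes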

Office-Home

Download link: http://hemanthdv.org/OfficeHome-Dataset/.

File structure:

office_home/
    art/
    clipart/
    product/
    real_world/

VisDA17

Download link: http://ai.bu.edu/visda-2017/.

The dataset can also be downloaded using our script at datasets/da/visda17.sh. Run the following command in your terminal under Dassl.pytorch/datasets/da,

sh visda17.sh $DATA

Once the download is finished, the file structure will look like

visda17/
    train/
    test/
    validation/
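
As a quick sanity check after downloading, you can count the images under each split. This is only an illustrative sketch and assumes $DATA is set in your environment.

import os
from pathlib import Path

data_root = Path(os.environ.get("DATA", "./data"))  # assumed dataset root
for split in ["train", "validation", "test"]:
    split_dir = data_root / "visda17" / split
    n_images = sum(1 for p in split_dir.rglob("*") if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
    print(split, n_images, "images")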

CIFAR10-STL10

Run the following command in your terminal under Dassl.pytorch/datasets/da,

python cifar_stl.py $DATA/cifar_stl

This will create a folder named cifar_stl under $DATA. The file structure will look like

cifar_stl/
    cifar/
        train/
        test/
    stl/
        train/
        test/
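
If you are curious about what kind of conversion the script performs, the sketch below illustrates the general idea for the CIFAR half only: it downloads CIFAR-10 via torchvision and writes the images into an image-folder layout. This is not the repository's cifar_stl.py; the cache directory, file names and output layout are assumptions made for illustration.

import os
from torchvision.datasets import CIFAR10

def export_cifar(out_root, split, train):
    # Download CIFAR-10 and write each image as a PNG under out_root/split/<class_name>/
    ds = CIFAR10(root="./torchvision_cache", train=train, download=True)
    for idx in range(len(ds)):
        img, label = ds[idx]  # PIL image and integer label
        class_dir = os.path.join(out_root, split, ds.classes[label])
        os.makedirs(class_dir, exist_ok=True)
        img.save(os.path.join(class_dir, f"{idx:05d}.png"))

export_cifar("cifar_stl/cifar", "train", train=True)
export_cifar("cifar_stl/cifar", "test", train=False)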

Digit-5

Create a folder $DATA/digit5 and download the dataset from here into this folder. This should give you

digit5/
    Digit-Five/

Then, run the following command in your terminal under Dassl.pytorch/datasets/da,

python digit5.py $DATA/digit5

This will extract the data and organize the file structure as

digit5/
    Digit-Five/
    mnist/
    mnist_m/
    usps/
    svhn/
    syn/
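
Once the script finishes, you can quickly confirm that all five domain folders exist. The check below is just a convenience sketch and assumes $DATA is exported in your shell.

import os
from pathlib import Path

digit5_root = Path(os.environ.get("DATA", "./data")) / "digit5"  # assumed location
for domain in ["mnist", "mnist_m", "usps", "svhn", "syn"]:
    print(domain, "ok" if (digit5_root / domain).is_dir() else "missing")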

DomainNet

Download link: http://ai.bu.edu/M3SDA/. (Please download the cleaned version of the split files.)

File structure:

domainnet/
    clipart/
    infograph/
    painting/
    quickdraw/
    real/
    sketch/
    splits/
        clipart_train.txt
        clipart_test.txt
        ...
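
Each file under splits/ lists one image per line. If you need to parse the splits yourself (e.g. for a custom loader), a minimal reader could look like the sketch below; it assumes every line contains a relative image path followed by an integer label.

import os

def read_split(split_file, image_root):
    # Each line is assumed to be "<relative/image/path> <integer label>"
    items = []
    with open(split_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            impath, label = line.split()
            items.append((os.path.join(image_root, impath), int(label)))
    return items

items = read_split("domainnet/splits/clipart_train.txt", "domainnet")
print(len(items), "training images for clipart")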

miniDomainNet

You need to download the DomainNet dataset first. miniDomainNet's split files can be downloaded from this google drive. After extracting the zip file, you should have the folder $DATA/domainnet/splits_mini/.

Domain Generalization

PACS

Download link: google drive.

File structure:

pacs/
    images/
    splits/

You do not need to download this dataset manually: once you run tools/train.py, the code detects whether the dataset exists and automatically downloads it to $DATA if it is missing. This applies to PACS, Office-Home and Digits-DG.

Office-Home-DG

Download link: google drive.

File structure:

office_home_dg/
    art/
    clipart/
    product/
    real_world/

Digits-DG

Download link: google drive.

File structure:

digits_dg/
    mnist/
    mnist_m/
    svhn/
    syn/

Digit-Single

Follow the steps for Digit-5 to organize the dataset.

Semi-Supervised Learning

CIFAR10/100 and SVHN

Run the following command in your terminal under Dassl.pytorch/datasets/ssl,

python cifar10_cifar100_svhn.py $DATA

This will create three folders under $DATA, i.e.

ssl_cifar10/
    train/
    test/
ssl_cifar100/
    train/
    test/
ssl_svhn/
    train/
    test/

STL10

Run the following command in your terminal under Dassl.pytorch/datasets/ssl,

python stl10.py $DATA/stl10

This will create a folder named stl10 under $DATA and extract the data into three folders, i.e. train, test and unlabeled. Then, download the "Binary files" archive from http://ai.stanford.edu/~acoates/stl10/ and extract it under stl10.

The file structure will look like

stl10/
    train/
    test/
    unlabeled/
    stl10_binary/
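
If you prefer not to download the binary archive manually, torchvision can fetch it for you: constructing torchvision.datasets.STL10 with download=True downloads and extracts the data into a stl10_binary folder under the given root. The snippet below is optional and assumes $DATA is set in your environment.

import os
from torchvision.datasets import STL10

root = os.path.join(os.environ.get("DATA", "./data"), "stl10")  # assumed dataset root
for split in ["train", "test", "unlabeled"]:
    ds = STL10(root=root, split=split, download=True)  # creates root/stl10_binary/ if missing
    print(split, len(ds), "images")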