Skip to content

Latest commit

 

History

History
66 lines (51 loc) · 2.03 KB

File metadata and controls

66 lines (51 loc) · 2.03 KB

Datasets

In this package you will find scripts to process or generate the datasets from the paper:

Loading and saving

We work either with dense or sparse numpy arrays. The module multi_categorical_gans.datasets.formats presents some functions to operate with both data formats in an abstract way.

Train and test split

Examples of how to split a dataset into 90% train and 10% test:

python multi_categorical_gans/datasets/train_test_split.py \
    data/synthetic/fixed_2/synthetic.features.npz \
    0.9 \
    data/synthetic/fixed_2/synthetic-train.features.npz \
    data/synthetic/fixed_2/synthetic-test.features.npz
python multi_categorical_gans/datasets/train_test_split.py \
    data/synthetic/fixed_10/synthetic.features.npz \
    0.9 \
    data/synthetic/fixed_10/synthetic-train.features.npz \
    data/synthetic/fixed_10/synthetic-test.features.npz
python multi_categorical_gans/datasets/train_test_split.py \
    data/synthetic/mix_small/synthetic.features.npz \
    0.9 \
    data/synthetic/mix_small/synthetic-train.features.npz \
    data/synthetic/mix_small/synthetic-test.features.npz
python multi_categorical_gans/datasets/train_test_split.py \
    data/synthetic/mix_big/synthetic.features.npz \
    0.9 \
    data/synthetic/mix_big/synthetic-train.features.npz \
    data/synthetic/mix_big/synthetic-test.features.npz
python multi_categorical_gans/datasets/train_test_split.py \
    data/uscensus/USCensus1990.features.npz \
    0.9 \
    data/uscensus/USCensus1990-train.features.npz \
    data/uscensus/USCensus1990-test.features.npz

For more information about the split run:

python multi_categorical_gans/datasets/train_test_split.py -h

The dataset wrapper

The class multi_categorical_gans.datasets.dataset.Dataset can wrap a dense numpy array to provide simple operations for training, like split(proportion) (useful for validation) or batch_iterator(batch_size, shuffle=True).