Datasets

In this package you will find scripts to process or generate the datasets from the paper:

Synthetic data generation
US Census 1990

Loading and saving

We work with either dense or sparse numpy arrays. The module multi_categorical_gans.datasets.formats provides functions that operate on both data formats through a common interface.
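
As a rough illustration of what handling both formats through one interface can look like, here is a minimal sketch. It is not the code from multi_categorical_gans.datasets.formats: the names LOADERS, SAVERS, save_features and load_features are hypothetical, and it assumes that sparse data is stored as a scipy.sparse matrix.

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Hypothetical dispatch tables keyed by format name.
# These names are illustrative and are not taken from the package.
LOADERS = {
    "dense": np.load,
    "sparse": scipy.sparse.load_npz,
}
SAVERS = {
    "dense": np.save,
    "sparse": scipy.sparse.save_npz,
}


def save_features(path, features, data_format):
    """Save a feature matrix using the saver registered for data_format."""
    SAVERS[data_format](path, features)


def load_features(path, data_format):
    """Load a feature matrix using the loader registered for data_format."""
    return LOADERS[data_format](path)


dense = np.array([[1, 0, 0], [0, 0, 1]], dtype=np.float32)
sparse = scipy.sparse.csr_matrix(dense)

with tempfile.TemporaryDirectory() as tmp:
    # np.save appends ".npy" when the path lacks it, so the dense demo
    # file uses that extension explicitly.
    dense_path = os.path.join(tmp, "demo.features.npy")
    sparse_path = os.path.join(tmp, "demo.features.npz")

    save_features(dense_path, dense, "dense")
    save_features(sparse_path, sparse, "sparse")

    dense_loaded = load_features(dense_path, "dense")
    sparse_loaded = load_features(sparse_path, "sparse")
```

With this kind of dispatch, calling code only needs to carry a format name alongside the file path.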

Train and test split

Examples of how to split a dataset into 90% train and 10% test:

python multi_categorical_gans/datasets/train_test_split.py \
    data/synthetic/fixed_2/synthetic.features.npz \
    0.9 \
    data/synthetic/fixed_2/synthetic-train.features.npz \
    data/synthetic/fixed_2/synthetic-test.features.npz

python multi_categorical_gans/datasets/train_test_split.py \
    data/synthetic/fixed_10/synthetic.features.npz \
    0.9 \
    data/synthetic/fixed_10/synthetic-train.features.npz \
    data/synthetic/fixed_10/synthetic-test.features.npz

python multi_categorical_gans/datasets/train_test_split.py \
    data/synthetic/mix_small/synthetic.features.npz \
    0.9 \
    data/synthetic/mix_small/synthetic-train.features.npz \
    data/synthetic/mix_small/synthetic-test.features.npz

python multi_categorical_gans/datasets/train_test_split.py \
    data/synthetic/mix_big/synthetic.features.npz \
    0.9 \
    data/synthetic/mix_big/synthetic-train.features.npz \
    data/synthetic/mix_big/synthetic-test.features.npz

python multi_categorical_gans/datasets/train_test_split.py \
    data/uscensus/USCensus1990.features.npz \
    0.9 \
    data/uscensus/USCensus1990-train.features.npz \
    data/uscensus/USCensus1990-test.features.npz

For more information about the split options, run:

python multi_categorical_gans/datasets/train_test_split.py -h
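
To make the 0.9 argument concrete, the following sketch shows one way a row-wise split by proportion can be done. It is an assumption about the general technique, not the script's actual implementation: the function split_rows and its seed parameter are hypothetical, and whether the real script shuffles rows is not stated here.

```python
import numpy as np


def split_rows(features, train_proportion, seed=None):
    """Shuffle row indices and split them by proportion (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    num_rows = features.shape[0]
    permutation = rng.permutation(num_rows)
    train_size = int(round(num_rows * train_proportion))
    train = features[permutation[:train_size]]
    test = features[permutation[train_size:]]
    return train, test


# Toy data: 20 rows, 2 features.
features = np.arange(40).reshape(20, 2)
train, test = split_rows(features, 0.9, seed=0)
```

With 20 rows and a proportion of 0.9, train holds 18 rows and test holds the remaining 2.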

The dataset wrapper

The class multi_categorical_gans.datasets.dataset.Dataset can wrap a dense numpy array to provide simple operations for training, like split(proportion) (useful for validation) or batch_iterator(batch_size, shuffle=True).
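
The sketch below illustrates the kind of wrapper described above. It is not the package's Dataset class: ArrayDataset is a hypothetical stand-in, and it assumes that split(proportion) returns two wrappers, with the first holding the given proportion of rows, and that batch_iterator yields consecutive slices of a (possibly shuffled) index order.

```python
import numpy as np


class ArrayDataset:
    """Hypothetical stand-in for a dataset wrapper around a dense array."""

    def __init__(self, features, seed=None):
        self.features = features
        self.rng = np.random.default_rng(seed)

    def split(self, proportion):
        # Assumption: the first part receives `proportion` of the rows.
        limit = int(round(len(self.features) * proportion))
        return ArrayDataset(self.features[:limit]), ArrayDataset(self.features[limit:])

    def batch_iterator(self, batch_size, shuffle=True):
        indices = np.arange(len(self.features))
        if shuffle:
            self.rng.shuffle(indices)
        # The last batch may be smaller than batch_size.
        for start in range(0, len(indices), batch_size):
            yield self.features[indices[start:start + batch_size]]


data = ArrayDataset(np.arange(20).reshape(10, 2), seed=0)
train, validation = data.split(0.8)
batch_shapes = [batch.shape for batch in train.batch_iterator(3)]
```

Here train wraps 8 rows and validation wraps 2, so iterating train with a batch size of 3 yields two full batches and one batch of 2 rows.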
