Update datasets README.
rcamino committed Jul 9, 2018
1 parent 0e6c87b commit 9f5e566
Showing 2 changed files with 30 additions and 1 deletion.
29 changes: 29 additions & 0 deletions multi_categorical_gans/datasets/README.md
@@ -1,5 +1,34 @@
# Datasets

In this package you will find scripts to process or generate the datasets from the paper:

- [Synthetic data generation](synthetic/)
- [US Census 1990](uscensus/)

## Loading and saving

We work with either dense or sparse numpy arrays. The module `multi_categorical_gans.datasets.formats` provides
functions that operate on both data formats through a common interface.
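
As a rough illustration of what such a format-agnostic interface could look like, here is a minimal sketch. It is not the actual contents of `formats`: the names `save_features` and `load_features`, the `"dense"`/`"sparse"` format strings, and the use of `scipy.sparse` for the sparse case are all assumptions made for this example.

```python
import os
import tempfile

import numpy as np
import scipy.sparse


def save_features(path, features):
    """Hypothetical helper: save a dense array or a sparse matrix to disk."""
    if scipy.sparse.issparse(features):
        # Sparse matrices are stored as .npz by scipy.
        scipy.sparse.save_npz(path, features.tocsr())
    else:
        # Dense arrays are stored as .npy by numpy.
        np.save(path, features)


def load_features(path, data_format):
    """Hypothetical helper: load features according to the declared format."""
    if data_format == "sparse":
        return scipy.sparse.load_npz(path)
    if data_format == "dense":
        return np.load(path)
    raise ValueError("Unknown data format: {}".format(data_format))


dense = np.eye(3, dtype=np.float32)
sparse = scipy.sparse.csr_matrix(dense)

with tempfile.TemporaryDirectory() as directory:
    dense_path = os.path.join(directory, "example.features.npy")
    sparse_path = os.path.join(directory, "example.features.npz")

    save_features(dense_path, dense)
    save_features(sparse_path, sparse)

    dense_loaded = load_features(dense_path, "dense")
    sparse_loaded = load_features(sparse_path, "sparse")
```

Callers only pass a format name, so the rest of the code does not need to know which library handles the file.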

## Train and test split

Example of how to split a dataset into 90% train and 10% test:

```bash
python multi_categorical_gans/datasets/train_test_split.py \
data/uscensus/USCensus1990.features.npz \
--percent 90 \
data/uscensus/USCensus1990-train.features.npz \
data/uscensus/USCensus1990-test.features.npz
```
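
Conceptually, a percentage split like the one above can be sketched as follows. This is only an illustration under the assumption that rows are shuffled before being divided; the function name, the `seed` parameter and the rounding rule are hypothetical and may differ from what `train_test_split.py` actually does.

```python
import numpy as np


def train_test_split(features, percent, seed=None):
    """Hypothetical sketch: shuffle the rows and keep `percent`% of them for training."""
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(features.shape[0])
    train_size = int(round(features.shape[0] * percent / 100.0))
    train = features[permutation[:train_size]]
    test = features[permutation[train_size:]]
    return train, test


# 20 samples with 2 features each, split 90% / 10%.
features = np.arange(40).reshape(20, 2)
train, test = train_test_split(features, percent=90, seed=0)
```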

For more information about the split, run:

```bash
python multi_categorical_gans/datasets/train_test_split.py -h
```

## The dataset wrapper

The class `multi_categorical_gans.datasets.dataset.Dataset` can wrap a dense numpy array to provide simple operations
for training, like `split(proportion)` (useful for validation) or `batch_iterator(batch_size, shuffle=True)`.
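
To make the two operations concrete, here is a minimal sketch of a wrapper with the same method names. It is not the code of the `Dataset` class: the class name `ArrayDataset`, the `seed` argument, and the assumption that `proportion` is the fraction of rows kept in the first returned part are all choices made for this example.

```python
import numpy as np


class ArrayDataset:
    """Hypothetical wrapper around a dense numpy array of shape (samples, features)."""

    def __init__(self, features, seed=None):
        self.features = features
        self.rng = np.random.default_rng(seed)

    def split(self, proportion):
        # Assumption: the first part receives `proportion` of the rows.
        limit = int(len(self.features) * proportion)
        return ArrayDataset(self.features[:limit]), ArrayDataset(self.features[limit:])

    def batch_iterator(self, batch_size, shuffle=True):
        indices = np.arange(len(self.features))
        if shuffle:
            self.rng.shuffle(indices)
        # The last batch may be smaller than batch_size.
        for start in range(0, len(indices), batch_size):
            yield self.features[indices[start:start + batch_size]]


data = ArrayDataset(np.arange(20).reshape(10, 2), seed=0)
train, validation = data.split(0.8)
batch_sizes = [len(batch) for batch in train.batch_iterator(3)]
```

With 10 samples and a proportion of 0.8, the training part has 8 rows, which a batch size of 3 divides into batches of 3, 3 and 2 rows.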
2 changes: 1 addition & 1 deletion multi_categorical_gans/datasets/synthetic/README.md
@@ -43,7 +43,7 @@ To generate a dataset similar to the one called `FIXED 2` in the paper:
python multi_categorical_gans/datasets/synthetic/generate.py 10000 9 \
data/synthetic/fixed_2/metadata.json \
data/synthetic/fixed_2/synthetic.features.npz \
-    -min_variable_size=2 --max_variable_size=2
+    --min_variable_size=2 --max_variable_size=2
```

To generate a dataset similar to the one called `FIXED 10` in the paper: