diff --git a/multi_categorical_gans/datasets/README.md b/multi_categorical_gans/datasets/README.md
index 36d5de5..609ce52 100644
--- a/multi_categorical_gans/datasets/README.md
+++ b/multi_categorical_gans/datasets/README.md
@@ -1,5 +1,34 @@
 # Datasets
+
 In this package you will find scripts to process or generate the datasets from the paper:
 
 - [Synthetic data generation](synthetic/)
 - [US Census 1990](uscensus/)
+
+## Loading and saving
+
+We work with either dense or sparse NumPy arrays. The module `multi_categorical_gans.datasets.formats` provides
+functions for working with both data formats in a uniform way.
+
+## Train and test split
+
+To split a dataset into 90% train and 10% test:
+
+```bash
+python multi_categorical_gans/datasets/train_test_split.py \
+    data/uscensus/USCensus1990.features.npz \
+    --percent 90 \
+    data/uscensus/USCensus1990-train.features.npz \
+    data/uscensus/USCensus1990-test.features.npz
+```
+
+For more information about the split, run:
+
+```bash
+python multi_categorical_gans/datasets/train_test_split.py -h
+```
+
+## The dataset wrapper
+
+The class `multi_categorical_gans.datasets.dataset.Dataset` wraps a dense NumPy array and provides simple operations
+for training, such as `split(proportion)` (useful for validation) and `batch_iterator(batch_size, shuffle=True)`.
\ No newline at end of file
diff --git a/multi_categorical_gans/datasets/synthetic/README.md b/multi_categorical_gans/datasets/synthetic/README.md
index d2a943b..8f6bb07 100644
--- a/multi_categorical_gans/datasets/synthetic/README.md
+++ b/multi_categorical_gans/datasets/synthetic/README.md
@@ -43,7 +43,7 @@ To generate a dataset similar to the one called `FIXED 2` in the paper:
 python multi_categorical_gans/datasets/synthetic/generate.py 10000 9 \
     data/synthetic/fixed_2/metadata.json \
     data/synthetic/fixed_2/synthetic.features.npz \
-    -min_variable_size=2 --max_variable_size=2
+    --min_variable_size=2 --max_variable_size=2
 ```
 
 To generate a dataset similar to the one called `FIXED 10` in the paper: