This repo helps Google Cloud users run Scenic, an awesome JAX-based computer vision research repo.
A few small modifications make Scenic friendlier to Google Cloud users. Scenic loads datasets from the default TFDS dataset directory, but for Google Cloud users (or other GPU/TPU users) the dataset location may differ; for instance, Google Cloud users usually store their datasets in a Google Cloud Storage bucket. Therefore, we should edit the dataset loading functions in Scenic. We use MNIST and ImageNet as examples here.
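For context, TFDS looks up and prepares datasets under a data_dir; when it is not set, TFDS falls back to its default directory (typically ~/tensorflow_datasets), which is exactly what we want to avoid on Cloud. Below is a minimal sketch (not Scenic code) of the end goal, using a hypothetical bucket path gs://YOUR_BUCKET_NAME/tensorflow_datasets:
import tensorflow_datasets as tfds

# Hypothetical bucket path -- replace with your own GCS location.
data_dir = 'gs://YOUR_BUCKET_NAME/tensorflow_datasets'
builder = tfds.builder('mnist', data_dir=data_dir)
builder.download_and_prepare()  # Prepares (or finds) the data under data_dir.
train_ds = builder.as_dataset(split='train')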
Before anything else, please fork the Scenic repo. Then you can edit your forked Scenic repo.
First, edit the get_dataset() function in dataset_lib/mnist_dataset.py:
- remove this line of code (this step is not always needed, since only some of Scenic's XXX_dataset.py files contain it):
del dataset_configs
- pass dataset_configs.data_dir to dataset_utils.load_split_from_tfds:
train_ds, train_ds_info = dataset_utils.load_split_from_tfds(
    'mnist',
    batch_size,
    split='train',
    data_dir=dataset_configs.data_dir,  # Added by us.
    preprocess_example=preprocess_ex,
    shuffle_seed=shuffle_seed)
- do the same for eval_ds:
eval_ds, _ = dataset_utils.load_split_from_tfds(
    'mnist',
    eval_batch_size,
    split='test',
    data_dir=dataset_configs.data_dir,  # Added by us.
    preprocess_example=preprocess_ex)
- add data_dir in the config file; we assume we will use scenic/projects/baselines/configs/mnist/mnist_config.py for training later:
# Dataset.
config.dataset_name = 'mnist'
config.dataset_configs = ml_collections.ConfigDict()
config.dataset_configs.data_dir = 'YOUR_DATA_DIR' # Added by us.
config.data_dtype_str = 'float32'
The code for MNIST is ready now.
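Optionally, if you want the edited loader to keep working with configs that do not define data_dir, you can read the field defensively instead of accessing it directly. A small sketch, assuming dataset_configs is an ml_collections.ConfigDict as in Scenic:
# Fall back to the TFDS default directory when the config has no data_dir.
data_dir = dataset_configs.get('data_dir', None) if dataset_configs else None
train_ds, train_ds_info = dataset_utils.load_split_from_tfds(
    'mnist',
    batch_size,
    split='train',
    data_dir=data_dir,
    preprocess_example=preprocess_ex,
    shuffle_seed=shuffle_seed)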
Similarly, edit the get_dataset() function in dataset_lib/imagenet_dataset.py to pass data_dir into the dataset builder.
- pass dataset_configs.data_dir to imagenet_load_split for train_ds:
train_ds = imagenet_load_split(
    batch_size,
    train=True,
    onehot_labels=onehot_labels,
    dtype=dtype,
    shuffle_seed=shuffle_seed,
    data_augmentations=data_augmentations,
    data_dir=dataset_configs.data_dir)  # Added by us.
- pass dataset_configs.data_dir to imagenet_load_split for eval_ds:
eval_ds = imagenet_load_split(
    eval_batch_size,
    train=False,
    onehot_labels=onehot_labels,
    dtype=dtype,
    data_dir=dataset_configs.data_dir)  # Added by us.
- edit the imagenet_load_split function in dataset_lib/imagenet_dataset.py to accept a data_dir argument:
def imagenet_load_split(batch_size,
                        train,
                        onehot_labels,
                        dtype=tf.float32,
                        image_size=IMAGE_SIZE,
                        prefetch_buffer_size=10,
                        shuffle_seed=None,
                        data_augmentations=None,
                        data_dir=None):  # Added by us.
- edit the code inside imagenet_load_split:
# Replace dataset_builder = tfds.builder('imagenet2012:5.*.*') with:
dataset_builder = tfds.builder('imagenet2012:5.*.*', data_dir=data_dir)
- edit the config file, i.e. projects/baselines/configs/imagenet/imagenet_vit_config.py
config.dataset_configs = ml_collections.ConfigDict()
config.dataset_configs.data_dir = 'gs://YOUR_BUCKET_NAME/imagenet'  # Added by us.
Okay, the code is ready for both MNIST and ImageNet now. If you want to try other datasets, just follow these two examples; the core idea is to pass data_dir through to tfds.builder manually, as sketched below.
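As a generic illustration (not code taken from Scenic), the pattern for any other TFDS dataset looks roughly like this; the dataset name cifar10 and the helper build_dataset are placeholders:
import tensorflow_datasets as tfds

def build_dataset(dataset_configs, split='train'):
  # data_dir comes from config.dataset_configs.data_dir in your config file.
  data_dir = dataset_configs.get('data_dir', None)
  builder = tfds.builder('cifar10', data_dir=data_dir)  # Placeholder dataset name.
  builder.download_and_prepare()
  return builder.as_dataset(split=split)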
Note that all the commands in this document should be run in the command line of the TPU VM instance unless otherwise stated.
Please make sure you set your gcloud configs first:
- Create a GCP project.
- Install gcloud.
- Associate your Google Account (Gmail account) with your GCP project by running:
export GCP_PROJECT=<GCP PROJECT ID>
gcloud auth login
gcloud auth application-default login
gcloud config set project $GCP_PROJECT
- Create a staging bucket if you do not already have one. Bucket locations are regions, so we use europe-west4 for the bucket here and europe-west4-a as the TPU zone later:
export GOOGLE_CLOUD_BUCKET_NAME=<GOOGLE_CLOUD_BUCKET_NAME>
export ZONE=europe-west4-a
gsutil mb -l europe-west4 gs://$GOOGLE_CLOUD_BUCKET_NAME
Then, set up the TPU VM:
- Create a Cloud TPU VM instance following this instruction. We recommend that you develop your workflow on a single v3-8 TPU (i.e., --accelerator-type=v3-8) and scale up to pod slices once the pipeline is ready. In this README, we focus on using a single v3-8 TPU. See here to learn more about TPU architectures.
- With Cloud TPU VMs, you ssh directly into the host machine of the TPU VM. You can install packages, run your code, etc. on the host machine. Once the TPU instance is created, ssh into it with
export TPU_NAME=v3-8
gcloud alpha compute tpus tpu-vm ssh ${TPU_NAME} --zone=${ZONE}
where TPU_NAME and ZONE are the name and the zone used above.
- Install the dependencies. One convenient way is to install t5x first; that covers most of the environment Scenic needs.
git clone --branch=main https://github.com/google-research/t5x
cd t5x
python3 -m pip install -e '.[tpu]' -f \
  https://storage.googleapis.com/jax-releases/libtpu_releases.html
- Install Scenic and run:
git clone --branch=main https://github.com/YOUR_GITHUB_NAME/scenic.git
cd scenic
pip install .
export WORK_DIR=gs://${GOOGLE_CLOUD_BUCKET_NAME}/scenic/mnist
python3 scenic/main.py \
  --config=scenic/projects/baselines/configs/mnist/mnist_config.py \
  --workdir=$WORK_DIR
Sometimes, we may conduct larger-scale experiments with more TPU chips (e.g. v3-128) for larger datasets like ImageNet. In this case, we can run the code this way:
- Combine the commands from the "Run your code on the TPU VM" steps above into a single file, run_scenic.sh. We provide an example in this repo.
- Run
export TPU_NAME=v3-128
gcloud compute tpus tpu-vm scp run_scenic.sh $TPU_NAME: --worker=all --zone=$ZONE
gcloud alpha compute tpus tpu-vm ssh $TPU_NAME --zone=$ZONE --worker=all --command "bash run_scenic.sh"
Thank you for your interest in this repo. If you run into any issues when running Scenic on your own cluster, especially on Google Cloud Platform, please feel free to open an issue on this repo (preferred) or ping Fuzhao via email. I would be happy to help.