diff --git a/README.md b/README.md index 099e5185..e2698296 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,7 @@ request. * Jupyter images for different versions of TensorFlow * [TFServing](https://github.com/kubeflow/kubeflow/blob/master/user_guide.md#serve-a-model-using-tensorflow-serving) Docker images and K8s templates - [kubernetes](kubernetes) - Templates for running distributed TensorFlow on - Kubernetes. + Kubernetes. For the most upto-date examples, please also refer to the [distribution strategy](distribution_strategy) folder. - [marathon](marathon) - Templates for running distributed TensorFlow using Marathon, deployed on top of Mesos. - [hadoop](hadoop) - TFRecord file InputFormat/OutputFormat for Hadoop MapReduce diff --git a/distribution_strategy/multi_worker_mirrored_strategy/README.md b/distribution_strategy/multi_worker_mirrored_strategy/README.md new file mode 100644 index 00000000..1c68d154 --- /dev/null +++ b/distribution_strategy/multi_worker_mirrored_strategy/README.md @@ -0,0 +1,227 @@ + +# MultiWorkerMirrored Training Strategy with examples + +The steps below are meant to train models using [MultiWorkerMirrored Strategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy) using the tensorflow 2.x API on the Kubernetes platform. + +Reference programs such as [keras_mnist.py](examples/keras_mnist.py) and +[custom_training_mnist.py](examples/custom_training_mnist.py) and [keras_resnet_cifar.py](examples/keras_resnet_cifar.py) are available in the examples directory. + +The Kubernetes manifest templates and other cluster specific configuration is available in the [kubernetes](kubernetes) directory + +## Prerequisites + +1. (Optional) It is recommended that you have a Google Cloud project. Either create a new project or use an existing one. Install + [gcloud commandline tools](https://cloud.google.com/functions/docs/quickstart) + on your system, login, set project and zone, etc. + +2. [Jinja templates](http://jinja.pocoo.org/) must be installed. + +3. A Kubernetes cluster running Kubernetes 1.15 or above must be available. To create a test +cluster on the local machine, [follow steps here](https://kubernetes.io/docs/tutorials/kubernetes-basics/create-cluster/). Kubernetes clusters can also be created on all major cloud providers. For instance, +here are instructions to [create GKE clusters](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-regional-cluster). Make sure that you have atleast 12 G of RAM between all nodes in the clusters. This should also install the `kubectl` tool on your system + +4. Set context for `kubectl` so that `kubectl` knows which cluster to use: + + ```bash + kubectl config use-context + ``` + +5. Install [Docker](https://docs.docker.com/get-docker/) for your system, while also creating an account that you can associate with your container images. + +6. For the mnist examples, for model storage and checkpointing, a [persistent-volume-claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) needs to be available to mount onto the chief worker pod. The steps below include the yaml to create a persistent-volume-claim for GKE backed by GCEPersistentDisk. + +### Additional prerequisites for resnet56 example + +1. Create a + [service account](https://cloud.google.com/compute/docs/access/service-accounts) + and download its key file in JSON format. 
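+   The JSON key itself can be downloaded with `gcloud` once the account has been
+   created and granted the role below (the account name and project ID in this
+   sketch are placeholders for your own values):
+
+   ```bash
+   gcloud iam service-accounts keys create key.json \
+     --iam-account=<service-account-name>@<project-id>.iam.gserviceaccount.com
+   ```
+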
Assign Storage Admin role for + [Google Cloud Storage](https://cloud.google.com/storage/) to this service account: + + ```bash + gcloud iam service-accounts create --display-name="" + ``` + + ```bash + gcloud projects add-iam-policy-binding \ + --member="serviceAccount:@.iam.gserviceaccount.com" \ + --role="roles/storage.admin" + ``` +2. Create a Kubernetes secret from the JSON key file of your service account: + + ```bash + kubectl create secret generic credential --from-file=key.json= + ``` + +3. For GPU based training, ensure your kubernetes cluster has a node-pool with gpu enabled. + The steps to achieve this on GKE are available [here](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus) + +## Steps to train mnist examples + +1. Follow the instructions for building and pushing the Docker image to a docker registry + in the [Docker README](examples/README.md). + +2. Copy the template file `MultiWorkerMirroredTemplate.yaml.jinja`: + + ```sh + cp kubernetes/MultiWorkerMirroredTemplate.yaml.jinja myjob.template.jinja + ``` + +3. Edit the `myjob.template.jinja` file to edit job parameters. + 1. `script` - which training program needs to be run. This should be either + `keras_mnist.py` or `custom_training_mnist.py` or `your_own_training_example.py` + + 2. `name` - the prefix attached to all the Kubernetes jobs created + + 3. `worker_replicas` - number of parallel worker processes that train the example + + 4. `port` - the port used by tensorflow worker processes to communicate with each other + + 5. `checkpoint_pvc_name` - name of the persistent-volume-claim that will contain the checkpointed model. + + 6. `model_checkpoint_dir` - mount location for inspecting the trained model in the volume inspector pod. Meant to be set if Volume inspector pod is mounted. + + 7. `image` - name of the docker image created in step 2 that needs to be loaded onto the cluster + + 8. `deploy` - set to True when the manifest is actually expected to be deployed + + 9. `create_pvc_checkpoint` - Creates a ReadWriteOnce persistent volume claim to checkpoint the model if needed. The name of the claim `checkpoint_pvc_name` should also be specified. + + 10. `create_volume_inspector` - Create a pod to inspect the contents of the volume after the training job is complete. If this is `True`, `deploy` cannot be `True` since the checkpoint volume can be mounted as read-write by a single node. Inspection cannot happen when training is happenning. + +4. Run the job: + 1. Create a namespace to run your training jobs + + ```sh + kubectl create namespace + ``` + + 2. [Optional: If Persistent volume does not already exist on cluster] First set `deploy` to `False`, `create_pvc_checkpoint` to `True` and set the name of `checkpoint_pvc_name` appropriately in the .jinja file. Then run + + ```sh + python ../../render_template.py myjob.template.jinja | kubectl apply -n -f - + ``` + + This will create a persistent volume claim where you can checkpoint your image. In GKE, this claim will auto-create a GCE persistent disk resource to back up the claim. + + 3. Set `deploy` to `True`, `create_pvc_checkpoint` to `False`, with all parameters specified in step 4 and then run + + ```sh + python ../../render_template.py myjob.template.jinja | kubectl apply -n -f - + ``` + + This will create the Kubernetes jobs on the clusters. Each Job has a single service-endpoint and a single pod that runs the training image. 
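+      Each worker pod receives its view of the cluster through the `TF_CONFIG`
+      environment variable injected by the rendered manifest. As an illustration,
+      with the template's default values (`name` set to `tf-learning`, `port` 5000
+      and two workers), worker 0 would see roughly:
+
+      ```sh
+      TF_CONFIG='{"cluster": {"worker": ["tf-learning-worker-0:5000", "tf-learning-worker-1:5000"]}, "task": {"type": "worker", "index": 0}}'
+      ```
+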
You can track the running jobs in the cluster by running + + ```sh + kubectl get jobs -n + kubectl describe jobs -n + ``` + + In order to inspect the trainining logs that are running in the jobs, run + + ```sh + # Shows all the running pods + kubectl get pods -n + kubectl logs -n -p + ``` + + 4. Once the jobs are finished (based on the logs/output of kubectl get jobs), + the trained model can be inspected by a volume inspector pod. Set `deploy` to `False` + and `create_volume_inspector` to True. Also set `model_checkpoint_dir` to indicate location where trained model will be mounted. Then run + + ```sh + python ../../render_template.py myjob.template.jinja | kubectl apply -n -f - + ``` + + This will create the volume inspector pod. Then, access the pod through ssh + + ```sh + kubectl get pods -n + kubectl -n exec --stdin --tty -- /bin/sh + ``` + + The contents of the trained model are available for inspection at `model_checkpoint_dir`. + +## Steps to train resnet examples + +1. Follow the instructions for building and pushing the Docker image using `Dockerfile.gpu` to a docker registry + in the [Docker README](examples/README.md). + +2. Copy the template file `EnhancedMultiWorkerMirroredTemplate.yaml.jinja` + + ```sh + cp kubernetes/EnhancedMultiWorkerMirroredTemplate.yaml.jinja myjob.template.jinja + ``` +3. Create three buckets for model data, checkpoints and training logs using either GCP web UI or gsutil tool (included with the gcloud tool you have installed above): + + ```bash + gsutil mb gs:// + ``` + You will use these bucket names to modify `data_dir`, `log_dir` and `model_dir` in step #4. + + +4. Download CIFAR-10 data and place them in your data_dir bucket. Head to the [ResNet in TensorFlow](https://github.com/tensorflow/models/tree/r1.13.0/official/resnet#cifar-10) directory to obtain CIFAR-10 data. Alternatively, you can use this [direct link](https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz) to download and extract the data yourself as well. + + ```bash + python cifar10_download_and_extract.py + ``` + + Upload the contents of cifar-10-batches-bin directory to your `data_dir` bucket. + + ```bash + gsutil -m cp cifar-10-batches-bin/* gs:/// + ``` + +5. Edit the `myjob.template.jinja` file to edit job parameters. + 1. `script` - which training program needs to be run. This should be either + `keras_resnet_cifar.py` or `your_own_training_example.py` + + 2. `name` - the prefix attached to all the Kubernetes jobs created + + 3. `worker_replicas` - number of parallel worker processes that train the example + + 4. `port` - the port used by tensorflow worker processes to communicate with each other. + + 5. `model_dir` - the GCP bucket path that stores the model checkoints `gs://model_dir/` + + 6. `image` - name of the docker image created in step 2 that needs to be loaded onto the cluster + + 7. `log_dir` - the GCP bucket path that where the logs are stored `gs://log_dir/` + + 8. `data_dir` - the GCP bucket path for the Cifar-10 dataset `gs://data_dir/` + + 9. `gcp_credential_secret` - the name of secret created in the kubernetes cluster that contains the service Account credentials + + 10. `batch_size` - the global batch size used for training + + 11. `num_train_epoch` - the number of training epochs + +4. Run the job: + 1. Create a namespace to run your training jobs + + ```sh + kubectl create namespace + ``` + + 2. 
Deploy the training workloads in the cluster + + ```sh + python ../../render_template.py myjob.template.jinja | kubectl apply -n -f - + ``` + + This will create the Kubernetes jobs on the clusters. Each Job has a single service-endpoint and a single pod that runs the training image. You can track the running jobs in the cluster by running + + ```sh + kubectl get jobs -n + kubectl describe jobs -n + ``` + + By default, this also deploys tensorboard on the cluster. + + ```sh + kubectl get services -n | grep tensorboard + ``` + + Note the external-ip corresponding to the service and the previously configured `port` in the yaml + The tensorboard service should be accessible through the web at `http://tensorboard-external-ip:port` + + 3. The final model should be available in the GCP bucket corresponding to `model_dir` configured in the yaml diff --git a/distribution_strategy/multi_worker_mirrored_strategy/examples/Dockerfile b/distribution_strategy/multi_worker_mirrored_strategy/examples/Dockerfile new file mode 100644 index 00000000..36aa8034 --- /dev/null +++ b/distribution_strategy/multi_worker_mirrored_strategy/examples/Dockerfile @@ -0,0 +1,13 @@ +FROM tensorflow/tensorflow:nightly + +# Keeps Python from generating .pyc files in the container +ENV PYTHONDONTWRITEBYTECODE=1 + +# Turns off buffering for easier container logging +ENV PYTHONUNBUFFERED=1 + +WORKDIR /app + +COPY . /app/ + +ENTRYPOINT ["python", "keras_mnist.py"] diff --git a/distribution_strategy/multi_worker_mirrored_strategy/examples/Dockerfile.gpu b/distribution_strategy/multi_worker_mirrored_strategy/examples/Dockerfile.gpu new file mode 100644 index 00000000..0ebb5928 --- /dev/null +++ b/distribution_strategy/multi_worker_mirrored_strategy/examples/Dockerfile.gpu @@ -0,0 +1,30 @@ +FROM tensorflow/tensorflow:2.3.1-gpu-jupyter + +RUN apt-get install -y python3 && \ + apt install python3-pip + +RUN pip3 install absl-py && \ + pip3 install portpicker + +# Install git +RUN apt-get update && \ + apt-get install -y git && \ + apt-get install -y vim + +WORKDIR /app + +RUN git clone --single-branch --branch benchmark https://github.com/tensorflow/models.git && \ + mv models tensorflow_models && \ + git clone https://github.com/tensorflow/model-optimization.git && \ + mv model-optimization tensorflow_model_optimization + +# Keeps Python from generating .pyc files in the container +ENV PYTHONDONTWRITEBYTECODE=1 +# Turns off buffering for easier container logging +ENV PYTHONUNBUFFERED=1 + +COPY . 
/app/
+
+ENV PYTHONPATH "${PYTHONPATH}:/:/app/tensorflow_models"
+
+CMD ["python", "resnet_cifar_multiworker_strategy_keras.py"]
\ No newline at end of file
diff --git a/distribution_strategy/multi_worker_mirrored_strategy/examples/README.md b/distribution_strategy/multi_worker_mirrored_strategy/examples/README.md
new file mode 100644
index 00000000..4b5f5682
--- /dev/null
+++ b/distribution_strategy/multi_worker_mirrored_strategy/examples/README.md
@@ -0,0 +1,62 @@
+# TensorFlow Docker Images
+
+This directory contains examples of MultiWorkerMirrored training along with the Dockerfiles used to build them.
+
+- [Dockerfile](Dockerfile) contains all dependencies required to build a container image using docker with the training examples
+- [Dockerfile.gpu](Dockerfile.gpu) contains all dependencies required to build a container image using docker with gpu and the tensorflow model garden
+- [keras_mnist.py](keras_mnist.py) demonstrates how to train an MNIST classifier using
+  [tf.distribute.MultiWorkerMirroredStrategy and Keras Tensorflow 2.0 API](https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras).
+- [custom_training_mnist.py](custom_training_mnist.py) demonstrates how to train a fashion MNIST classifier using
+  [tf.distribute.MultiWorkerMirroredStrategy and Tensorflow 2.0 Custom Training Loop APIs](https://www.tensorflow.org/tutorials/distribute/custom_training).
+- [keras_resnet_cifar.py](keras_resnet_cifar.py) demonstrates how to train the resnet56 model on the Cifar-10 dataset using
+  [tf.distribute.MultiWorkerMirroredStrategy and Keras Tensorflow 2.0 API](https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras).
+## Best Practices
+
+- Always pin the TensorFlow version with the Docker image tag. This ensures that
+  TensorFlow updates don't adversely impact your training program for future
+  runs.
+- When creating an image, specify version tags (see below). If you make code
+  changes, increment the version. Cluster managers will not pull an updated
+  Docker image if they have it cached. Also, versions ensure that you have
+  a single copy of the code running for each job.
+
+## Building the Docker Files
+
+Ensure that Docker is installed on your system.
+
+First, pick an image name for the job. When running on a cluster manager, you
+will want to push your images to a container registry. Note that both the
+[Google Container Registry](https://cloud.google.com/container-registry/)
+and the [Amazon EC2 Container Registry](https://aws.amazon.com/ecr/) require
+special paths. We append `:v1` to version our images. Versioning images is
+strongly recommended for reasons described in the best practices section.
+
+```sh
+docker build -t :v1 -f Dockerfile .
+# Use gcloud docker push instead if on Google Container Registry.
+docker push :v1
+```
+
+If you make any updates to the code, increment the version and rerun the above
+commands with the new version.
+
+## Running the keras_mnist.py example
+
+The [keras_mnist.py](keras_mnist.py) example demonstrates how to train an MNIST classifier using
+[tf.distribute.MultiWorkerMirroredStrategy and Keras Tensorflow 2.0 API](https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras).
+The final model is saved to disk by the chief worker process. The disk is assumed to be mounted onto the running container by the cluster manager.
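+
+For a quick local sanity check of the image outside the cluster (illustrative only; the
+image tag, mount path and single-worker `TF_CONFIG` below are placeholder values, and the
+container needs network access to download the MNIST data), you can run something like:
+
+```sh
+docker run \
+  -e TF_CONFIG='{"cluster": {"worker": ["localhost:5000"]}, "task": {"type": "worker", "index": 0}}' \
+  -v /tmp/pvcmnt:/pvcmnt \
+  <your-image-name>:v1
+```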
+It assumes that the cluster configuration is passed in through the `TF_CONFIG` environment variable when deployed in the cluster + +## Running the custom_training_mnist.py example + +The [custom_training_mnist.py](mnist.py) example demonstrates how to train a fashion MNIST classifier using +[tf.distribute.MultiWorkerMirroredStrategy and Tensorflow 2.0 Custom Training Loop APIs](https://www.tensorflow.org/tutorials/distribute/custom_training). +The final model is saved to disk by the chief worker process. The disk is assumed to be mounted onto the running container by the cluster manager. +It assumes that the cluster configuration is passed in through the `TF_CONFIG` environment variable when deployed in the cluster. + +## Running the keras_resnet_cifar.py example + +The [keras_resnet_cifar.py](keras_resnet_cifar.py) example demonstrates how to train a Resnet56 model on the cifar-10 dataset using +[tf.distribute.MultiWorkerMirroredStrategy and Keras Tensorflow 2.0 API](https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras). +The final model is saved to the GCP storage bucket. +It assumes that the cluster configuration is passed in through the `TF_CONFIG` environment variable when deployed in the cluster. diff --git a/distribution_strategy/multi_worker_mirrored_strategy/examples/custom_training_mnist.py b/distribution_strategy/multi_worker_mirrored_strategy/examples/custom_training_mnist.py new file mode 100644 index 00000000..7a0c2e05 --- /dev/null +++ b/distribution_strategy/multi_worker_mirrored_strategy/examples/custom_training_mnist.py @@ -0,0 +1,168 @@ +# ============================================================================== +# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +# This code serves as an example of using Tensorflow 2.x to build and train a CNN model on the +# Fashion MNIST dataset using the tf.distribute.MultiWorkerMirroredStrategy described here +# https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy +# using a custom training loop. This code is very similar to the example provided here +# https://www.tensorflow.org/tutorials/distribute/custom_training +# Assumptions: +# 1) The code assumes that the cluster configuration needed for the TF distribute strategy is available through the +# TF_CONFIG environment variable. See the link provided above for details +# 2) The model is checkpointed and saved in /pvcmnt by the chief worker process. + +import tensorflow as tf +import numpy as np +import os + +# Used to run example using CPU only. Untested on GPU +os.environ["CUDA_VISIBLE_DEVICES"] = "-1" +MAIN_MODEL_PATH = '/pvcmnt' + +EPOCHS = 10 +GLOBAL_BATCH_SIZE = 128 + +def _is_chief(task_type, task_id): + # If `task_type` is None, this may be operating as single worker, which works + # effectively as chief. 
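+  # In this example the chief is the replica that writes checkpoints to the shared
+  # volume (MAIN_MODEL_PATH); the other workers write to a throwaway local temp
+  # directory instead (see write_filepath below).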
+ return task_type is None or task_type == 'chief' or ( + task_type == 'worker' and task_id == 0) + +def _get_temp_dir(task_id): + base_dirpath = 'workertemp_' + str(task_id) + temp_dir = os.path.join("/tmp", base_dirpath) + os.makedirs(temp_dir) + return temp_dir + +def write_filepath(strategy): + task_type, task_id = strategy.cluster_resolver.task_type, strategy.cluster_resolver.task_id + if not _is_chief(task_type, task_id): + checkpoint_dir = _get_temp_dir(task_id) + else: + base_dirpath = 'workertemp_' + str(task_id) + checkpoint_dir = os.path.join(MAIN_MODEL_PATH, base_dirpath) + if not os.path.exists(checkpoint_dir): + os.makedirs(checkpoint_dir) + return checkpoint_dir + +def create_model(): + model = tf.keras.Sequential([ + tf.keras.layers.Conv2D(32, 3, activation='relu'), + tf.keras.layers.MaxPooling2D(), + tf.keras.layers.Conv2D(64, 3, activation='relu'), + tf.keras.layers.MaxPooling2D(), + tf.keras.layers.Flatten(), + tf.keras.layers.Dense(64, activation='relu'), + tf.keras.layers.Dense(10) + ]) + return model + +def get_dist_data_set(strategy, batch_size): + fashion_mnist = tf.keras.datasets.fashion_mnist + (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data() + # Adding a dimension to the array -> new shape == (28, 28, 1) + # We are doing this because the first layer in our model is a convolutional + # layer and it requires a 4D input (batch_size, height, width, channels). + # batch_size dimension will be added later on. + train_images = train_images[..., None] + test_images = test_images[..., None] + # Getting the images in [0, 1] range. + train_images = train_images / np.float32(255) + test_images = test_images / np.float32(255) + train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels)).shuffle(60000).batch(batch_size) + test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(batch_size) + train_dist_dataset = strategy.experimental_distribute_dataset(train_dataset) + test_dist_dataset = strategy.experimental_distribute_dataset(test_dataset) + return train_dist_dataset, test_dist_dataset + +def main(): + global GLOBAL_BATCH_SIZE + strategy = tf.distribute.MultiWorkerMirroredStrategy() + train_dist_dataset, test_dist_dataset = get_dist_data_set(strategy, GLOBAL_BATCH_SIZE) + checkpoint_pfx = write_filepath(strategy) + with strategy.scope(): + model = create_model() + optimizer = tf.keras.optimizers.Adam() + checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model) + loss_object = tf.keras.losses.SparseCategoricalCrossentropy( + from_logits=True, + reduction=tf.keras.losses.Reduction.NONE) + test_loss = tf.keras.metrics.Mean(name='test_loss') + train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy( + name='train_accuracy') + test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy( + name='test_accuracy') + + def compute_loss(labels, predictions): + per_example_loss = loss_object(labels, predictions) + return tf.nn.compute_average_loss(per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE) + + def test_step(inputs): + images, labels = inputs + predictions = model(images, training=False) + t_loss = loss_object(labels, predictions) + test_loss.update_state(t_loss) + test_accuracy.update_state(labels, predictions) + + def train_step(inputs): + images, labels = inputs + with tf.GradientTape() as tape: + predictions = model(images, training=True) + loss = compute_loss(labels, predictions) + gradients = tape.gradient(loss, model.trainable_variables) + 
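+    # Each replica computes gradients on its own shard of the global batch; because the
+    # variables and optimizer were created under strategy.scope(), apply_gradients
+    # aggregates (sums) the per-replica gradients across all workers before applying
+    # them, so together with tf.nn.compute_average_loss above every replica ends up
+    # applying the same update computed over the global batch.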
optimizer.apply_gradients(zip(gradients, model.trainable_variables)) + train_accuracy.update_state(labels, predictions) + return loss + + # `run` replicates the provided computation and runs it + # with the distributed input. + @tf.function + def distributed_train_step(dataset_inputs): + per_replica_losses = strategy.run(train_step, args=(dataset_inputs,)) + return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses, + axis=None) + + @tf.function + def distributed_test_step(dataset_inputs): + return strategy.run(test_step, args=(dataset_inputs,)) + + for epoch in range(EPOCHS): + # TRAIN LOOP + total_loss = 0.0 + num_batches = 0 + for x in train_dist_dataset: + total_loss += distributed_train_step(x) + num_batches += 1 + train_loss = total_loss / num_batches + + # TEST LOOP + for x in test_dist_dataset: + distributed_test_step(x) + if epoch % 2 == 0: + checkpoint.save(checkpoint_pfx) + + template = ("Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, " + "Test Accuracy: {}") + print (template.format(epoch+1, train_loss, + train_accuracy.result()*100, test_loss.result(), + test_accuracy.result()*100)) + + test_loss.reset_states() + train_accuracy.reset_states() + test_accuracy.reset_states() + +if __name__=="__main__": + main() diff --git a/distribution_strategy/multi_worker_mirrored_strategy/examples/keras_mnist.py b/distribution_strategy/multi_worker_mirrored_strategy/examples/keras_mnist.py new file mode 100644 index 00000000..41882c74 --- /dev/null +++ b/distribution_strategy/multi_worker_mirrored_strategy/examples/keras_mnist.py @@ -0,0 +1,110 @@ +# ============================================================================== +# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +# This code serves as an example of using Tensorflow 2.x Keras API to build and train a CNN model on the +# MNIST dataset using the tf.distribute.MultiWorkerMirroredStrategy described here +# https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy. +# This code is very similar to the example provided here +# https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras +# Assumptions: +# 1) The code assumes that the cluster configuration needed for the TF distribute strategy is available through the +# TF_CONFIG environment variable. See the link provided above for details +# 2) The model is checkpointed and saved in /pvcmnt by the chief worker process. + + +from __future__ import print_function + +import math +import os +import tensorflow as tf +import numpy as np +import json + +# Used to run example using CPU only. Untested on GPU +os.environ["CUDA_VISIBLE_DEVICES"] = "-1" + +# Model save directory +MAIN_MODEL_PATH = '/pvcmnt' + +GLOBAL_BATCH_SIZE = 128 + +def _is_chief(task_type, task_id): + # If `task_type` is None, this may be operating as single worker, which works + # effectively as chief. 
+ return task_type is None or task_type == 'chief' or ( + task_type == 'worker' and task_id == 0) + +def _get_temp_dir(task_id): + base_dirpath = 'workertemp_' + str(task_id) + temp_dir = os.path.join("/tmp", base_dirpath) + os.makedirs(temp_dir) + return temp_dir + +def write_filepath(strategy): + task_type, task_id = strategy.cluster_resolver.task_type, strategy.cluster_resolver.task_id + if not _is_chief(task_type, task_id): + checkpoint_dir = _get_temp_dir(task_id) + else: + base_dirpath = 'workertemp_' + str(task_id) + checkpoint_dir = os.path.join(MAIN_MODEL_PATH, base_dirpath) + if not os.path.exists(checkpoint_dir): + os.makedirs(checkpoint_dir) + return checkpoint_dir + +def mnist_dataset(batch_size): + (x_train, y_train), _ = tf.keras.datasets.mnist.load_data() + # The `x` arrays are in uint8 and have values in the range [0, 255]. + # You need to convert them to float32 with values in the range [0, 1] + x_train = x_train / np.float32(255) + y_train = y_train.astype(np.int64) + train_dataset = tf.data.Dataset.from_tensor_slices( + (x_train, y_train)).shuffle(60000).repeat().batch(batch_size) + return train_dataset + +def build_and_compile_cnn_model(): + model = tf.keras.Sequential([ + tf.keras.Input(shape=(28, 28)), + tf.keras.layers.Reshape(target_shape=(28, 28, 1)), + tf.keras.layers.Conv2D(32, 3, activation='relu'), + tf.keras.layers.Flatten(), + tf.keras.layers.Dense(128, activation='relu'), + tf.keras.layers.Dense(10) + ]) + model.compile( + loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), + optimizer=tf.keras.optimizers.SGD(learning_rate=0.001), + metrics=['accuracy']) + return model + +def main(): + tf_config = json.loads(os.environ['TF_CONFIG']) + num_workers = len(tf_config['cluster']['worker']) + strategy = tf.distribute.MultiWorkerMirroredStrategy() + + multi_worker_dataset = mnist_dataset(GLOBAL_BATCH_SIZE) + + # missing needs to be fixed + # multi_worker_dataset = strategy.distribute_datasets_from_function(mnist_dataset(global_batch_size)) + + callbacks = [tf.keras.callbacks.experimental.BackupAndRestore(backup_dir=write_filepath(strategy))] + with strategy.scope(): + multi_worker_model = build_and_compile_cnn_model() + multi_worker_model.fit(multi_worker_dataset, epochs=10, steps_per_epoch=70, + callbacks=callbacks) + multi_worker_model.save(filepath=write_filepath(strategy)) + +if __name__=="__main__": + main() diff --git a/distribution_strategy/multi_worker_mirrored_strategy/examples/keras_resnet_cifar.py b/distribution_strategy/multi_worker_mirrored_strategy/examples/keras_resnet_cifar.py new file mode 100644 index 00000000..ab0f0318 --- /dev/null +++ b/distribution_strategy/multi_worker_mirrored_strategy/examples/keras_resnet_cifar.py @@ -0,0 +1,373 @@ +# Copyright 2018 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Runs a ResNet model on the Cifar-10 dataset.""" + +# This code serves as an example of using Tensorflow 2.0 Keras API to build and train a Resnet50 model on +# the Cifar 10 dataset using the tf.distribute.MultiWorkerMirroredStrategy described here +# https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy. +# This code is largely borrowed from +# https://github.com/tensorflow/models/blob/benchmark/official/benchmark/models/resnet_cifar_model.py +# with some minor tweaks to allow for training using GPU +# Assumptions: +# 1) The code assumes that the cluster configuration needed for the TF distribute strategy is available through the +# TF_CONFIG environment variable. See the link provided above for details +# 2) The libraries required to test this model are packaged into ./Dockerfile.gpu. Please refer to it + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +# Import libraries +from absl import app +from absl import flags +from absl import logging +import numpy as np +import tensorflow as tf +from tensorflow_models.official.benchmark.models import cifar_preprocessing +from tensorflow_models.official.benchmark.models import resnet_cifar_model +from tensorflow_models.official.benchmark.models import synthetic_util +from tensorflow_models.official.common import distribute_utils +from tensorflow_models.official.utils.flags import core as flags_core +#from tensorflow_models.official.utils.misc import keras_utils +from tensorflow_models.official.vision.image_classification.resnet import common +import multiprocessing +import os + +MAIN_MODEL_PATH = '/pvcmnt' + +# remove: duplicate function from keras_utils +def set_session_config(enable_xla=False): + """Sets the session config.""" + if enable_xla: + tf.config.optimizer.set_jit(True) + +# remove: duplicate function from keras_utils +def set_gpu_thread_mode_and_count(gpu_thread_mode, datasets_num_private_threads, + num_gpus, per_gpu_thread_count): + """Set GPU thread mode and count, and adjust dataset threads count.""" + cpu_count = multiprocessing.cpu_count() + logging.info('Logical CPU cores: %s', cpu_count) + + # Allocate private thread pool for each GPU to schedule and launch kernels + per_gpu_thread_count = per_gpu_thread_count or 2 + os.environ['TF_GPU_THREAD_MODE'] = gpu_thread_mode + os.environ['TF_GPU_THREAD_COUNT'] = str(per_gpu_thread_count) + logging.info('TF_GPU_THREAD_COUNT: %s', os.environ['TF_GPU_THREAD_COUNT']) + logging.info('TF_GPU_THREAD_MODE: %s', os.environ['TF_GPU_THREAD_MODE']) + + # Limit data preprocessing threadpool to CPU cores minus number of total GPU + # private threads and memory copy threads. + total_gpu_thread_count = per_gpu_thread_count * num_gpus + num_runtime_threads = num_gpus + if not datasets_num_private_threads: + datasets_num_private_threads = min( + cpu_count - total_gpu_thread_count - num_runtime_threads, num_gpus * 8) + logging.info('Set datasets_num_private_threads to %s', + datasets_num_private_threads) + +def _is_chief(task_type, task_id): + # If `task_type` is None, this may be operating as single worker, which works + # effectively as chief. 
+ return task_type is None or task_type == 'chief' or ( + task_type == 'worker' and task_id == 0) + +def _get_temp_dir(task_id): + base_dirpath = 'workertemp_' + str(task_id) + temp_dir = os.path.join("/tmp", base_dirpath) + os.makedirs(temp_dir) + return temp_dir + +def write_filepath(strategy): + task_type, task_id = strategy.cluster_resolver.task_type, strategy.cluster_resolver.task_id + if not _is_chief(task_type, task_id): + checkpoint_dir = _get_temp_dir(task_id) + else: + base_dirpath = 'workertemp_' + str(task_id) + checkpoint_dir = os.path.join(MAIN_MODEL_PATH, base_dirpath) + if not os.path.exists(checkpoint_dir): + os.makedirs(checkpoint_dir) + return checkpoint_dir + + + +LR_SCHEDULE = [ # (multiplier, epoch to start) tuples + (0.1, 91), (0.01, 136), (0.001, 182) +] + + +def learning_rate_schedule(current_epoch, + current_batch, + batches_per_epoch, + batch_size): + """Handles linear scaling rule and LR decay. + Scale learning rate at epoch boundaries provided in LR_SCHEDULE by the + provided scaling factor. + Args: + current_epoch: integer, current epoch indexed from 0. + current_batch: integer, current batch in the current epoch, indexed from 0. + batches_per_epoch: integer, number of steps in an epoch. + batch_size: integer, total batch sized. + Returns: + Adjusted learning rate. + """ + del current_batch, batches_per_epoch # not used + initial_learning_rate = common.BASE_LEARNING_RATE * batch_size / 128 + learning_rate = initial_learning_rate + for mult, start_epoch in LR_SCHEDULE: + if current_epoch >= start_epoch: + learning_rate = initial_learning_rate * mult + else: + break + return learning_rate + + +class LearningRateBatchScheduler(tf.keras.callbacks.Callback): + """Callback to update learning rate on every batch (not epoch boundaries). + N.B. Only support Keras optimizers, not TF optimizers. + Attributes: + schedule: a function that takes an epoch index and a batch index as input + (both integer, indexed from 0) and returns a new learning rate as + output (float). + """ + + def __init__(self, schedule, batch_size, steps_per_epoch): + super(LearningRateBatchScheduler, self).__init__() + self.schedule = schedule + self.steps_per_epoch = steps_per_epoch + self.batch_size = batch_size + self.epochs = -1 + self.prev_lr = -1 + + def on_epoch_begin(self, epoch, logs=None): + if not hasattr(self.model.optimizer, 'learning_rate'): + raise ValueError('Optimizer must have a "learning_rate" attribute.') + self.epochs += 1 + + def on_batch_begin(self, batch, logs=None): + """Executes before step begins.""" + lr = self.schedule(self.epochs, + batch, + self.steps_per_epoch, + self.batch_size) + if not isinstance(lr, (float, np.float32, np.float64)): + raise ValueError('The output of the "schedule" function should be float.') + if lr != self.prev_lr: + self.model.optimizer.learning_rate = lr # lr should be a float here + self.prev_lr = lr + logging.debug( + 'Epoch %05d Batch %05d: LearningRateBatchScheduler ' + 'change learning rate to %s.', self.epochs, batch, lr) + + +def run(flags_obj): + """Run ResNet Cifar-10 training and eval loop using native Keras APIs. + Args: + flags_obj: An object containing parsed flag values. + Raises: + ValueError: If fp16 is passed as it is not currently supported. + Returns: + Dictionary of training and eval stats. 
+ """ + #keras_utils.set_session_config( + # enable_xla=flags_obj.enable_xla) + set_session_config(enable_xla=True) + + # Execute flag override logic for better model performance + """ + if flags_obj.tf_gpu_thread_mode: + keras_utils.set_gpu_thread_mode_and_count( + per_gpu_thread_count=flags_obj.per_gpu_thread_count, + gpu_thread_mode=flags_obj.tf_gpu_thread_mode, + num_gpus=flags_obj.num_gpus, + datasets_num_private_threads=flags_obj.datasets_num_private_threads) + """ + if flags_obj.tf_gpu_thread_mode: + set_gpu_thread_mode_and_count( + per_gpu_thread_count=flags_obj.per_gpu_thread_count, + gpu_thread_mode=flags_obj.tf_gpu_thread_mode, + num_gpus=flags_obj.num_gpus, + datasets_num_private_threads=flags_obj.datasets_num_private_threads) + + common.set_cudnn_batchnorm_mode() + + dtype = flags_core.get_tf_dtype(flags_obj) + if dtype == 'fp16': + raise ValueError('dtype fp16 is not supported in Keras. Use the default ' + 'value(fp32).') + + data_format = flags_obj.data_format + if data_format is None: + data_format = ('channels_first' if tf.config.list_physical_devices('GPU') + else 'channels_last') + tf.keras.backend.set_image_data_format(data_format) + + """ + strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=flags_obj.distribution_strategy, + num_gpus=flags_obj.num_gpus, + all_reduce_alg=flags_obj.all_reduce_alg, + num_packs=flags_obj.num_packs) + """ + strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy() + + if strategy: + # flags_obj.enable_get_next_as_optional controls whether enabling + # get_next_as_optional behavior in DistributedIterator. If true, last + # partial batch can be supported. + strategy.extended.experimental_enable_get_next_as_optional = ( + flags_obj.enable_get_next_as_optional + ) + + strategy_scope = distribute_utils.get_strategy_scope(strategy) + + if flags_obj.use_synthetic_data: + synthetic_util.set_up_synthetic_data() + input_fn = common.get_synth_input_fn( + height=cifar_preprocessing.HEIGHT, + width=cifar_preprocessing.WIDTH, + num_channels=cifar_preprocessing.NUM_CHANNELS, + num_classes=cifar_preprocessing.NUM_CLASSES, + dtype=flags_core.get_tf_dtype(flags_obj), + drop_remainder=True) + else: + synthetic_util.undo_set_up_synthetic_data() + input_fn = cifar_preprocessing.input_fn + + train_input_dataset = input_fn( + is_training=True, + data_dir=flags_obj.data_dir, + batch_size=flags_obj.batch_size, + parse_record_fn=cifar_preprocessing.parse_record, + datasets_num_private_threads=flags_obj.datasets_num_private_threads, + dtype=dtype, + # Setting drop_remainder to avoid the partial batch logic in normalization + # layer, which triggers tf.where and leads to extra memory copy of input + # sizes between host and GPU. 
+ drop_remainder=(not flags_obj.enable_get_next_as_optional)) + + eval_input_dataset = None + if not flags_obj.skip_eval: + eval_input_dataset = input_fn( + is_training=False, + data_dir=flags_obj.data_dir, + batch_size=flags_obj.batch_size, + parse_record_fn=cifar_preprocessing.parse_record) + + steps_per_epoch = ( + cifar_preprocessing.NUM_IMAGES['train'] // flags_obj.batch_size) + lr_schedule = 0.1 + if flags_obj.use_tensor_lr: + initial_learning_rate = common.BASE_LEARNING_RATE * flags_obj.batch_size / 128 + lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay( + boundaries=list(p[1] * steps_per_epoch for p in LR_SCHEDULE), + values=[initial_learning_rate] + + list(p[0] * initial_learning_rate for p in LR_SCHEDULE)) + + with strategy_scope: + optimizer = common.get_optimizer(lr_schedule) + model = resnet_cifar_model.resnet56(classes=cifar_preprocessing.NUM_CLASSES) + model.compile( + loss='sparse_categorical_crossentropy', + optimizer=optimizer, + metrics=(['sparse_categorical_accuracy'] + if flags_obj.report_accuracy_metrics else None), + run_eagerly=flags_obj.run_eagerly) + + train_epochs = flags_obj.train_epochs + + callbacks = common.get_callbacks() + + if not flags_obj.use_tensor_lr: + lr_callback = LearningRateBatchScheduler( + schedule=learning_rate_schedule, + batch_size=flags_obj.batch_size, + steps_per_epoch=steps_per_epoch) + callbacks.append(lr_callback) + + tensorboard_callback = tf.keras.callbacks.TensorBoard( + log_dir="gs://shankgan-tf-exp-train-log-dir/") + callbacks.append(tensorboard_callback) + + # if mutliple epochs, ignore the train_steps flag. + if train_epochs <= 1 and flags_obj.train_steps: + steps_per_epoch = min(flags_obj.train_steps, steps_per_epoch) + train_epochs = 1 + + num_eval_steps = (cifar_preprocessing.NUM_IMAGES['validation'] // + flags_obj.batch_size) + + validation_data = eval_input_dataset + if flags_obj.skip_eval: + if flags_obj.set_learning_phase_to_train: + # TODO(haoyuzhang): Understand slowdown of setting learning phase when + # not using distribution strategy. + tf.keras.backend.set_learning_phase(1) + num_eval_steps = None + validation_data = None + + if not strategy and flags_obj.explicit_gpu_placement: + # TODO(b/135607227): Add device scope automatically in Keras training loop + # when not using distribition strategy. 
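+    # Entering the device context manually pins the Keras fit/evaluate calls below to
+    # GPU:0; the matching __exit__ after evaluation closes the scope again.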
+ no_dist_strat_device = tf.device('/device:GPU:0') + no_dist_strat_device.__enter__() + + logging.info("Beginning to fit the model.....") + history = model.fit(train_input_dataset, + epochs=train_epochs, + steps_per_epoch=steps_per_epoch, + callbacks=callbacks, + validation_steps=num_eval_steps, + validation_data=validation_data, + validation_freq=flags_obj.epochs_between_evals, + verbose=2) + eval_output = None + if not flags_obj.skip_eval: + eval_output = model.evaluate(eval_input_dataset, + steps=num_eval_steps, + verbose=2) + + if not strategy and flags_obj.explicit_gpu_placement: + no_dist_strat_device.__exit__() + + stats = common.build_stats(history, eval_output, callbacks) + return stats + + +def define_cifar_flags(): + + common.define_keras_flags() + data_dir = os.getenv("DATA_DIR") + model_dir = os.getenv("MODEL_DIR") + batch_size = int(os.getenv("BATCH_SIZE", default=512)) + num_train_epoch = int(os.getenv("NUM_TRAIN_EPOCH", default=100)) + + if not data_dir or not model_dir: + raise Exception("Data directory and Model Directory need to be specified!") + + flags_core.set_defaults(data_dir=data_dir, + model_dir=model_dir, + train_epochs=num_train_epoch, + epochs_between_evals=20, + batch_size=batch_size, + use_synthetic_data=False) # Changed the batch size + +def main(_): + return run(flags.FLAGS) + + +if __name__ == '__main__': + logging.set_verbosity(logging.INFO) + define_cifar_flags() + app.run(main) \ No newline at end of file diff --git a/distribution_strategy/multi_worker_mirrored_strategy/kubernetes/EnhancedMultiWorkerMirroredTemplate.j2 b/distribution_strategy/multi_worker_mirrored_strategy/kubernetes/EnhancedMultiWorkerMirroredTemplate.j2 new file mode 100644 index 00000000..8ea5e5ab --- /dev/null +++ b/distribution_strategy/multi_worker_mirrored_strategy/kubernetes/EnhancedMultiWorkerMirroredTemplate.j2 @@ -0,0 +1,142 @@ +{%- set name = "" -%} +{%- set image = "" -%} +{%- set worker_replicas = 2 -%} +{%- set script = "" -%} +{%- set gcp_credential_secret = "" %} +{%- set log_dir = "" %} +{%- set data_dir = "" %} +{%- set model_dir = "" %} +{%- set batch_size = 256 %} +{%- set num_train_epoch = 100 %} +{%- set port = 5000 -%} +{%- set run_tensorboard = true %} + + +{%- macro worker_hosts() -%} + {%- for i in range(worker_replicas) -%} + {%- if not loop.first -%},{%- endif -%} + "{{ name }}-worker-{{ i }}:{{ port }}" + {%- endfor -%} +{%- endmacro -%} + +{%- for i in range(worker_replicas) -%} +kind: Service +apiVersion: v1 +metadata: + name: {{ name }}-worker-{{ i }} +spec: + selector: + name: {{ name }} + job: worker + task: "{{ i }}" + ports: + - port: {{ port }} +--- +kind: Job +apiVersion: batch/v1 +metadata: + name: {{ name }}-worker-{{ i }} +spec: + ttlSecondsAfterFinished: 600 + template: + metadata: + labels: + name: {{ name }} + job: worker + task: "{{ i }}" + spec: + restartPolicy: Never + containers: + - name: tensorflow + image: {{ image }} + ports: + - containerPort: {{ port }} + command: + - "python" + - "{{ script }}" + env: + - name: TF_CONFIG + value: '{"cluster": {"worker": [{{ worker_hosts() }}]}, "task": {"type": "worker", "index": {{ i }}}}' + - name: GOOGLE_APPLICATION_CREDENTIALS + value: "/var/secrets/google/key.json" + - name: DATA_DIR + value: "{{ data_dir }}" + - name: MODEL_DIR + value: "{{ model_dir }}" + - name: NUM_TRAIN_EPOCH + value: "{{ num_train_epoch }}" + - name: BATCH_SIZE + value: "{{ batch_size }}" + ports: + - containerPort: {{ port }} + resources: + limits: + nvidia.com/gpu: 1 + volumeMounts: + - name: credential + mountPath: 
/var/secrets/google + volumes: + - name: credential + secret: + secretName: {{ gcp_credential_secret }} +--- +{% endfor %} + +{% if run_tensorboard %} +kind: Service +apiVersion: v1 +metadata: + name: resnet-tensorboard-0 +spec: + type: LoadBalancer + selector: + name: resnet + job: tensorboard + task: "0" + ports: + - port: {{ port }} +--- +kind: Deployment +apiVersion: apps/v1 +metadata: + name: resnet-tensorboard-0 +spec: + replicas: 1 + selector: + matchLabels: + name: resnet + job: tensorboard + task: "0" + template: + metadata: + labels: + name: resnet + job: tensorboard + task: "0" + spec: + containers: + - name: tensorflow + image: tensorflow/tensorflow + env: + - name: GOOGLE_APPLICATION_CREDENTIALS + value: "/var/secrets/google/key.json" + ports: + - containerPort: {{ port }} + command: + - "tensorboard" + args: + - '--logdir= {{ log_dir }}' + - "--port={{ port }}" + - "--host=0.0.0.0" + volumeMounts: + - name: credential + mountPath: /var/secrets/google + volumes: + - name: credential + secret: + secretName: {{ gcp_credential_secret }} +--- +{% endif %} + + + diff --git a/distribution_strategy/multi_worker_mirrored_strategy/kubernetes/MultiWorkerMirroredTemplate.jinja b/distribution_strategy/multi_worker_mirrored_strategy/kubernetes/MultiWorkerMirroredTemplate.jinja new file mode 100644 index 00000000..e4a7799c --- /dev/null +++ b/distribution_strategy/multi_worker_mirrored_strategy/kubernetes/MultiWorkerMirroredTemplate.jinja @@ -0,0 +1,111 @@ +{%- set name = "tf-learning" -%} +{%- set image = "image-name" -%} +{%- set worker_replicas = 2 -%} +{%- set script = "keras_mnist.py" -%} +{%- set model_checkpoint_dir = "/pvcmnt" -%} +{%- set checkpoint_pvc_name = "pvc-demo" -%} +{%- set port = 5000 -%} +{%- set create_pvc_checkpoint = True -%} +{%- set create_volume_inspector = True -%} +{%- set deploy = False -%} + + +{% if deploy %} + +{%- macro worker_hosts() -%} + {%- for i in range(worker_replicas) -%} + {%- if not loop.first -%},{%- endif -%} + "{{ name }}-worker-{{ i }}:{{ port }}" + {%- endfor -%} +{%- endmacro -%} + +{%- for i in range(worker_replicas) -%} +kind: Service +apiVersion: v1 +metadata: + name: {{ name }}-worker-{{ i }} +spec: + selector: + name: {{ name }} + job: worker + task: "{{ i }}" + ports: + - port: {{ port }} +--- +kind: Job +apiVersion: batch/v1 +metadata: + name: {{ name }}-worker-{{ i }} +spec: + ttlSecondsAfterFinished: 600 + template: + metadata: + labels: + name: {{ name }} + job: worker + task: "{{ i }}" + spec: + restartPolicy: Never + containers: + - name: tensorflow + image: {{ image }} + ports: + - containerPort: {{ port }} + command: + - "python" + - "{{ script }}" + env: + - name: TF_CONFIG + value: '{"cluster": {"worker": [{{ worker_hosts() }}]}, "task": {"type": "worker", "index": {{ i }}}}' +{% if i == 0 %} + volumeMounts: + - mountPath: /pvcmnt + name: pvc-mount + volumes: + - name: pvc-mount + persistentVolumeClaim: + claimName: {{ checkpoint_pvc_name }} +{% endif %}--- +{% endfor %} + +{% endif %} +{% if create_pvc_checkpoint %} +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: {{ checkpoint_pvc_name }} +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 10G +--- +{% endif %} +{% if create_volume_inspector %} +kind: Pod +apiVersion: v1 +metadata: + name: volume-inspector +spec: + volumes: + - name: volume-to-inspect + persistentVolumeClaim: + claimName: {{ checkpoint_pvc_name }} + containers: + - name: debugger + image: busybox + command: ['sleep', '3600'] + volumeMounts: + - mountPath: {{ 
model_checkpoint_dir }} + name: volume-to-inspect + resources: + limits: + memory: 512Mi +--- +{% endif %} + + + + + diff --git a/kubernetes/README.md b/kubernetes/README.md index 7c5af8d5..1aefd277 100644 --- a/kubernetes/README.md +++ b/kubernetes/README.md @@ -1,9 +1,11 @@ # Running Distributed TensorFlow on Kubernetes This directory contains a template for running distributed TensorFlow on -Kubernetes. +Kubernetes. For newer examples, refer to the [distribution strategy](../distribution_strategy) -## Prerequisites +## Steps to train [mnist.py](../docker/mnist.py) + +### Prerequisites 1. You must be running Kubernetes 1.3 or above. If you are running an earlier version, the DNS addon must be enabled. See the @@ -12,7 +14,7 @@ Kubernetes. 2. [Jinja templates](http://jinja.pocoo.org/) must be installed. -## Steps to Run the job +### Steps to Run the job 1. Follow the instructions for creating the training program in the parent [README](../README.md). @@ -43,7 +45,7 @@ write to Google Cloud Storage. See the Google Cloud Storage section below. python render_template.py myjob.template.jinja | kubectl delete -f - ``` -## Google Cloud Storage +### Google Cloud Storage To support reading and writing to Google Cloud Storage, you need to set up a [Kubernetes secret](http://kubernetes.io/docs/user-guide/secrets/) with the @@ -62,4 +64,4 @@ credentials. 3. In your template, set `credential_secret_name` to `"credential"` (as specified above) and `credential_secret_key` to the `"[json_filename]"` in - the template. + the template. \ No newline at end of file