[2.5] Fix TF examples (NVIDIA#3038)
* Fix TF examples

* Fix link

* Update some texts

* Undo changes
YuanTingHsieh committed Nov 27, 2024
1 parent da6f808 commit bad662d
Showing 5 changed files with 23 additions and 72 deletions.
15 changes: 5 additions & 10 deletions examples/advanced/job_api/tf/README.md
@@ -7,9 +7,8 @@ All examples in this folder are based on using [TensorFlow](https://tensorflow.o

## Simulated Federated Learning with CIFAR10 Using Tensorflow

-This example shows `Tensorflow`-based classic Federated Learning
-algorithms, namely FedAvg and FedOpt on CIFAR10
-dataset. This example is analogous to [the example using `Pytorch`
+This example demonstrates TensorFlow-based federated learning algorithms on the CIFAR-10 dataset.
+This example is analogous to [the example using `Pytorch`
backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim)
on the same dataset, where same experiments
were conducted and analyzed. You should expect the same
@@ -21,7 +20,7 @@ client-side training logics (details in file
and the new
[`FedJob`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/api.py)
APIs were used to programmatically set up an
-`nvflare` job to be exported or ran by simulator (details in file
+NVFlare job to be exported or ran by simulator (details in file
[`tf_fl_script_runner_cifar10.py`](tf_fl_script_runner_cifar10.py)),
alleviating the need of writing job config files, simplifying
development process.
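
To make the FedJob workflow concrete, the following minimal sketch shows what such a job script can look like. It is illustrative only, assuming the `FedJob`, `FedAvg`, and `ScriptRunner` APIs of NVFlare 2.5; the actual `tf_fl_script_runner_cifar10.py` configures additional details (initial TF model, per-algorithm options, experiment tracking).

```python
# Minimal FedJob-style job script (illustrative sketch, not the real
# tf_fl_script_runner_cifar10.py).
from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner

n_clients = 8
num_rounds = 50

job = FedJob(name="cifar10_tf_fedavg")

# Server side: a FedAvg controller drives the federated rounds.
job.to(FedAvg(num_clients=n_clients, num_rounds=num_rounds), "server")

# Client side: each site runs the Client API training script.
for i in range(n_clients):
    runner = ScriptRunner(
        script="src/cifar10_tf_fl_alpha_split.py",
        script_args="--batch_size 64 --epochs 4",
    )
    job.to(runner, f"site-{i + 1}")

# Export a job config folder, or run the job directly in the simulator.
job.export_job("/tmp/nvflare/jobs/job_config")
job.simulator_run("/tmp/nvflare/jobs/workdir", gpu="0")
```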
@@ -65,12 +64,8 @@ script.
> `export TF_FORCE_GPU_ALLOW_GROWTH=true && export
> TF_GPU_ALLOCATOR=cuda_malloc_async`

-The set-up of all experiments in this example are kept the same as
-[the example using `Pytorch`
-backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim). Refer
-to the `Pytorch` example for more details. Similar to the Pytorch
-example, we here also use Dirichelet sampling on CIFAR10 data labels
-to simulate data heterogeneity among data splits for different client
+We use Dirichelet sampling (implementation from FedMA (https://github.com/IBM/FedMA)) on
+CIFAR10 data labels to simulate data heterogeneity among data splits for different client
sites, controlled by an alpha value, ranging from 0 (not including 0)
to 1. A high alpha value indicates less data heterogeneity, i.e., an
alpha value equal to 1.0 would result in homogeneous data distribution
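
For intuition, below is a generic sketch of Dirichlet label splitting; it is a simplified stand-in, not the FedMA implementation the example actually uses.

```python
# Generic Dirichlet label split (simplified; not the FedMA implementation):
# each class's samples are divided among clients according to proportions
# drawn from Dirichlet(alpha), so a smaller alpha gives more skewed splits.
import numpy as np


def dirichlet_split(labels: np.ndarray, n_clients: int, alpha: float, seed: int = 0):
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        cls_idx = np.where(labels == cls)[0]
        rng.shuffle(cls_idx)
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        # Cut points that divide this class's samples by the drawn proportions.
        cuts = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client_id, part in enumerate(np.split(cls_idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices


# Example: 8 clients with a highly heterogeneous split (alpha=0.1).
labels = np.random.randint(0, 10, size=50_000)  # stand-in for CIFAR-10 labels
splits = dirichlet_split(labels, n_clients=8, alpha=0.1)
print([len(s) for s in splits])
```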
12 changes: 6 additions & 6 deletions examples/advanced/job_api/tf/run_jobs.sh
@@ -25,7 +25,7 @@ GPU_INDX=0
WORKSPACE=/tmp

# Run centralized training job
-python ./tf_fl_script_executor_cifar10.py \
+python ./tf_fl_script_runner_cifar10.py \
--algo centralized \
--n_clients 1 \
--num_rounds 25 \
@@ -39,7 +39,7 @@ python ./tf_fl_script_executor_cifar10.py \
# Run FedAvg with different alpha values
for alpha in 1.0 0.5 0.3 0.1; do

-python ./tf_fl_script_executor_cifar10.py \
+python ./tf_fl_script_runner_cifar10.py \
--algo fedavg \
--n_clients 8 \
--num_rounds 50 \
@@ -53,7 +53,7 @@ done


# Run FedOpt job
-python ./tf_fl_script_executor_cifar10.py \
+python ./tf_fl_script_runner_cifar10.py \
--algo fedopt \
--n_clients 8 \
--num_rounds 50 \
@@ -65,7 +65,7 @@ python ./tf_fl_script_executor_cifar10.py \


# Run FedProx job.
-python ./tf_fl_script_executor_cifar10.py \
+python ./tf_fl_script_runner_cifar10.py \
--algo fedprox \
--n_clients 8 \
--num_rounds 50 \
@@ -77,11 +77,11 @@ python ./tf_fl_script_executor_cifar10.py \


# Run scaffold job
-python ./tf_fl_script_executor_cifar10.py \
+python ./tf_fl_script_runner_cifar10.py \
--algo scaffold \
--n_clients 8 \
--num_rounds 50 \
--batch_size 64 \
--epochs 4 \
--alpha 0.1 \
--gpu $GPU_INDX
29 changes: 10 additions & 19 deletions examples/getting_started/tf/README.md
@@ -1,26 +1,21 @@
# Getting Started with NVFlare (TensorFlow)
[![TensorFlow Logo](https://upload.wikimedia.org/wikipedia/commons/a/ab/TensorFlow_logo.svg)](https://tensorflow.org/)

-We provide several examples to quickly get you started using NVFlare's Job API.
+We provide several examples to help you quickly get started with NVFlare.
All examples in this folder are based on using [TensorFlow](https://tensorflow.org/) as the model training framework.

## Simulated Federated Learning with CIFAR10 Using Tensorflow

-This example shows `Tensorflow`-based classic Federated Learning
-algorithms, namely FedAvg and FedOpt on CIFAR10
-dataset. This example is analogous to [the example using `Pytorch`
-backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim)
-on the same dataset, where same experiments
-were conducted and analyzed. You should expect the same
-experimental results when comparing this example with the `Pytorch` one.
+This example demonstrates TensorFlow-based federated learning algorithms,
+FedAvg and FedOpt, on the CIFAR-10 dataset.

In this example, the latest Client APIs were used to implement
client-side training logics (details in file
[`cifar10_tf_fl_alpha_split.py`](src/cifar10_tf_fl_alpha_split.py)),
and the new
[`FedJob`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/job_config/api.py)
APIs were used to programmatically set up an
-`nvflare` job to be exported or ran by simulator (details in file
+NVFlare job to be exported or ran by simulator (details in file
[`tf_fl_script_runner_cifar10.py`](tf_fl_script_runner_cifar10.py)),
alleviating the need of writing job config files, simplifying
development process.
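
For reference, the Client API pattern used by the training script boils down to a receive/train/send loop. The sketch below is a simplified illustration with a placeholder model and dummy data, not the actual `cifar10_tf_fl_alpha_split.py`.

```python
# Simplified Client API loop (illustrative; the real cifar10_tf_fl_alpha_split.py
# adds the CIFAR-10 pipeline, alpha-based data splits, and TensorBoard logging).
import nvflare.client as flare
import tensorflow as tf


def build_model() -> tf.keras.Model:
    # Placeholder stand-in for the example's CNN.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="sgd",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model


model = build_model()
flare.init()  # register this process with the NVFlare client runtime

while flare.is_running():
    input_model = flare.receive()  # global weights sent by the server
    model.set_weights(list(input_model.params.values()))

    # Local training; dummy data here instead of this site's CIFAR-10 split.
    x = tf.random.uniform((64, 32, 32, 3))
    y = tf.random.uniform((64,), maxval=10, dtype=tf.int32)
    history = model.fit(x, y, epochs=1, verbose=0)

    flare.send(
        flare.FLModel(
            params={str(i): w for i, w in enumerate(model.get_weights())},
            metrics={"accuracy": float(history.history["accuracy"][-1])},
        )
    )
```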
@@ -64,12 +59,8 @@ script.
> `export TF_FORCE_GPU_ALLOW_GROWTH=true && export
> TF_GPU_ALLOCATOR=cuda_malloc_async`

-The set-up of all experiments in this example are kept the same as
-[the example using `Pytorch`
-backend](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/cifar10/cifar10-sim). Refer
-to the `Pytorch` example for more details. Similar to the Pytorch
-example, we here also use Dirichelet sampling on CIFAR10 data labels
-to simulate data heterogeneity among data splits for different client
+We use Dirichelet sampling (implementation from FedMA (https://github.com/IBM/FedMA)) on
+CIFAR10 data labels to simulate data heterogeneity among data splits for different client
sites, controlled by an alpha value, ranging from 0 (not including 0)
to 1. A high alpha value indicates less data heterogeneity, i.e., an
alpha value equal to 1.0 would result in homogeneous data distribution
@@ -111,11 +102,11 @@ for alpha in 1.0 0.5 0.3 0.1; do
done
```

-## 2. Results
+## 3. Results

Now let's compare experimental results.

-### 2.1 Centralized training vs. FedAvg for homogeneous split
+### 3.1 Centralized training vs. FedAvg for homogeneous split
Let's first compare FedAvg with homogeneous data split
(i.e. `alpha=1.0`) and centralized training. As can be seen from the
figure and table below, FedAvg can achieve similar performance to
@@ -129,7 +120,7 @@ no difference in data distributions among different clients.

![Central vs. FedAvg](./figs/fedavg-vs-centralized.png)

-### 2.2 Impact of client data heterogeneity
+### 3.2 Impact of client data heterogeneity

Here we compare the impact of data heterogeneity by varying the
`alpha` value, where lower values cause higher heterogeneity. As can
@@ -145,7 +136,7 @@ as data heterogeneity becomes higher.

![Impact of client data
heterogeneity](./figs/fedavg-diff-alphas.png)

> [!NOTE]
> More examples can be found at https://nvidia.github.io/NVFlare.
35 changes: 0 additions & 35 deletions examples/getting_started/tf/run_jobs.sh
@@ -50,38 +50,3 @@ for alpha in 1.0 0.5 0.3 0.1; do
--workspace $WORKSPACE

done


# Run FedOpt job
python ./tf_fl_script_runner_cifar10.py \
--algo fedopt \
--n_clients 8 \
--num_rounds 50 \
--batch_size 64 \
--epochs 4 \
--alpha 0.1 \
--gpu $GPU_INDX \
--workspace $WORKSPACE


# Run FedProx job.
python ./tf_fl_script_runner_cifar10.py \
--algo fedprox \
--n_clients 8 \
--num_rounds 50 \
--batch_size 64 \
--epochs 4 \
--fedprox_mu 1e-5 \
--alpha 0.1 \
--gpu $GPU_INDX


# Run scaffold job
python ./tf_fl_script_runner_cifar10.py \
--algo scaffold \
--n_clients 8 \
--num_rounds 50 \
--batch_size 64 \
--epochs 4 \
--alpha 0.1 \
--gpu $GPU_INDX
4 changes: 2 additions & 2 deletions examples/hello-world/hello-tf/README.md
@@ -4,7 +4,7 @@ Example of using [NVIDIA FLARE](https://nvflare.readthedocs.io/en/main/index.htm
using federated averaging ([FedAvg](https://arxiv.org/abs/1602.05629))
and [TensorFlow](https://tensorflow.org/) as the deep learning training framework.

-> **_NOTE:_** This example uses the [MNIST](http://yann.lecun.com/exdb/mnist/) handwritten digits dataset and will load its data within the trainer code.
+> **_NOTE:_** This example uses the [MNIST](https://www.tensorflow.org/datasets/catalog/mnist) handwritten digits dataset and will load its data within the trainer code.
See the [Hello TensorFlow](https://nvflare.readthedocs.io/en/main/examples/hello_tf_job_api.html#hello-tf-job-api) example documentation page for details on this
example.
@@ -48,7 +48,7 @@ In scenarios where multiple clients are involved, you have to prevent TensorFlow
by setting the following flags.

```bash
-TF_FORCE_GPU_ALLOW_GROWTH=true TF_GPU_ALLOCATOR=cuda_malloc_async
+TF_FORCE_GPU_ALLOW_GROWTH=true TF_GPU_ALLOCATOR=cuda_malloc_async python3 fedavg_script_runner_tf.py
```
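
Assuming the flags only need to be in the process environment before TensorFlow initializes the GPU, they can also be set from Python at the top of the job script so that launched client processes inherit them; this is an illustrative alternative to the shell prefix above.

```python
import os

# Set the flags before TensorFlow creates its GPU context; placing this at the
# very top of the job script (e.g. fedavg_script_runner_tf.py) lets the client
# processes it launches inherit the same environment.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

import tensorflow as tf  # import only after the flags are in place
```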

If you possess more GPUs than clients, a good strategy is to run one client on each GPU.
