[2.6] Update flwr example (#3580)

YuanTingHsieh · gslama12 · chesterxgchen · web-flow · commit 8e837013db4a · 2025-08-05T19:08:23.000-07:00
Update flwr example

### Description

Update flwr example and cherry-pick #3495 and #3550

### Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.

---------

Co-authored-by: Georg Slamanig <148696483+gslama12@users.noreply.github.com>
Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
diff --git a/docs/publications_and_talks.rst b/docs/publications_and_talks.rst
@@ -32,7 +32,7 @@ Publications: 2022
 * **2022-10** `Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation <https://arxiv.org/abs/2203.06338>`__ (`ECCV 2022 <https://eccv2022.ecva.net/>`__)
 * **2022-10** `Joint Multi Organ and Tumor Segmentation from Partial Labels Using Federated Learning <https://link.springer.com/chapter/10.1007/978-3-031-18523-6_6>`__ (`DeCaF @ MICCAI 2022 <https://decaf-workshop.github.io/decaf-2022/>`__)
 * **2022-10** `Split-U-Net: Preventing Data Leakage in Split Learning for Collaborative Multi-modal Brain Tumor Segmentation <https://arxiv.org/abs/2208.10553>`__ (`DeCaF @ MICCAI 2022 <https://decaf-workshop.github.io/decaf-2022/>`__)
-* **2022-06** `Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation <https://openaccess.thecvf.com/content/CVPR2022/papers/Xu_Closing_the_Generalization_Gap_of_Cross-Silo_Federated_Medical_Image_Segmentation_CVPR_2022_paper.pdf>`__ (`CVPR 2022 <https://cvpr2022.thecvf.com/>`__)
+* **2022-06** `Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation <https://openaccess.thecvf.com/content/CVPR2022/papers/Xu_Closing_the_Generalization_Gap_of_Cross-Silo_Federated_Medical_Image_Segmentation_CVPR_2022_paper.pdf>`__ (CVPR 2022)
 * **2022-02** `Do Gradient Inversion Attacks Make Federated Learning Unsafe? <https://arxiv.org/abs/2202.06924>`__ (Preprint)
 
 Publications: 2021
diff --git a/examples/hello-world/hello-flower/README.md b/examples/hello-world/hello-flower/README.md
@@ -1,12 +1,13 @@
 # Flower App (PyTorch) in NVIDIA FLARE
 
-In this example, we run 2 Flower clients and Flower Server in parallel using NVFlare's simulator.
+In this example, we run 2 clients and 1 server using NVFlare's simulator.
 
 ## Preconditions
 
-To run Flower code in NVFlare, we created a job, including an app with the following custom folder content 
+Following Flower's [quickstart-pytorch](https://github.com/adap/flower/tree/main/examples/quickstart-pytorch) example, we prepare the following Flower app:
+
 ```bash
-$ tree jobs/hello-flwr-pt/app/custom
+$ tree flwr-pt
 
 ├── flwr_pt
 │   ├── client.py   # <-- contains `ClientApp`
@@ -15,38 +16,42 @@ $ tree jobs/hello-flwr-pt/app/custom
 │   └── task.py     # <-- task-specific code (model, data)
 └── pyproject.toml  # <-- Flower project file
 ```
-Note, this code is adapted from Flower's [app-pytorch](https://github.com/adap/flower/tree/main/examples/app-pytorch) example.
+
+To run the app inside NVFlare, add the following sections to `pyproject.toml`:
+```
+[tool.flwr.app.config]
+num-server-rounds = 3
+
+[tool.flwr.federations]
+default = "local-simulation"
+
+[tool.flwr.federations.local-simulation]
+options.num-supernodes = 2
+address = "127.0.0.1:9093"
+insecure = true
+```
+
+You can adjust `num-server-rounds` as needed.
+The number `options.num-supernodes` should match the number of NVFlare clients defined in [job.py](./job.py), e.g., `job.simulator_run(args.workdir, gpu="0", n_clients=2)`.
 
 ## 1. Install dependencies
 If you haven't already, we recommend creating a virtual environment.
 ```bash
 python3 -m venv nvflare_flwr
 source nvflare_flwr/bin/activate
+pip install -r ./requirements.txt
 ```
-We recommend installing an older version of NumPy as torch/torchvision doesn't support NumPy 2 at this time.
-```bash
-pip install numpy==1.26.4
-```
-## 2.1 Run a simulation
 
-To run flwr-pt job with NVFlare, we first need to install its dependencies.
-```bash
-pip install ./flwr-pt/
-```
+## 2.1 Run flwr-pt with NVFlare simulation
 
-Next, we run 2 Flower clients and Flower Server in parallel using NVFlare's simulator.
+We run 2 Flower clients and the Flower server in parallel using NVFlare's simulator.
 ```bash
 python job.py --job_name "flwr-pt" --content_dir "./flwr-pt"
 ```
 
-## 2.2 Run a simulation with TensorBoard streaming
-
-To run flwr-pt_tb_streaming job with NVFlare, we first need to install its dependencies.
-```bash
-pip install ./flwr-pt-tb/
-```
+## 2.2 Run flwr-pt with NVFlare simulation and NVFlare's TensorBoard streaming
 
-Next, we run 2 Flower clients and Flower Server in parallel using NVFlare while streaming 
+We run 2 Flower clients and Flower Server in parallel using NVFlare while streaming 
 the TensorBoard metrics to the server at each iteration using NVFlare's metric streaming.
 
 ```bash
@@ -59,16 +64,19 @@ tensorboard --logdir /tmp/nvflare/hello-flower
 ```
 ![tensorboard training curve](./train.png)
 
-## Notes
-Make sure your `pyproject.toml` files in the Flower apps contain an "address" field. This needs to be present as the `--federation-config` option of the `flwr run` command tries to override the `“address”` field.
-Your `pyproject.toml` should include a section similar to this:
+
+## 3. Run with real deployment
+
+First, read the real-world deployment guide: https://nvflare.readthedocs.io/en/main/real_world_fl/overview.html
+
+Second, export the corresponding NVFlare job:
+```bash
+python job.py --job_name "flwr-pt" --content_dir "./flwr-pt" --export_job --export_dir "./jobs"
 ```
-[tool.flwr.federations]
-default = "xxx"
 
-[tool.flwr.federations.xxx]
-options.num-supernodes = 2
-address = "127.0.0.1:9093"
-insecure = false
+An NVFlare job will be generated in the `./jobs` folder.
+
+Then copy it into the admin console's transfer folder and run:
+```bash
+submit_job flwr-pt
 ```
-The number `options.num-supernodes` should match the number of NVFlare clients defined in [job.py](./job.py), e.g., `job.simulator_run(args.workdir, gpu="0", n_clients=2)`.
diff --git a/examples/hello-world/hello-flower/flwr-pt-tb/flwr_pt_tb/client.py b/examples/hello-world/hello-flower/flwr-pt-tb/flwr_pt_tb/client.py
@@ -15,7 +15,7 @@
 
 from flwr.client import ClientApp, NumPyClient
 from flwr.common import Context
-from flwr.common.record import MetricsRecord, RecordSet
+from flwr.common.record import MetricRecord, RecordDict
 
 from .task import DEVICE, Net, get_weights, load_data, set_weights, test, train
 
@@ -36,16 +36,16 @@ def __init__(self, context: Context):
         self.writer = SummaryWriter()
         self.flwr_context = context
 
-        if "step" not in context.state.metrics_records:
+        if "step" not in context.state.metric_records:
             self.set_step(0)
 
     def set_step(self, step: int):
-        record = RecordSet()
-        record["step"] = MetricsRecord({"step": step})
+        record = RecordDict()
+        record["step"] = MetricRecord({"step": step})
         self.flwr_context.state = record
 
     def get_step(self):
-        return int(self.flwr_context.state.metrics_records["step"]["step"])
+        return int(self.flwr_context.state.metric_records["step"]["step"])
 
     def fit(self, parameters, config):
         step = self.get_step()
diff --git a/examples/hello-world/hello-flower/flwr-pt-tb/flwr_pt_tb/server.py b/examples/hello-world/hello-flower/flwr-pt-tb/flwr_pt_tb/server.py
@@ -53,16 +53,16 @@ def weighted_average(metrics: List[Tuple[int, Metrics]]) -> Metrics:
     initial_parameters=parameters,
 )
 
-# Define config
-config = ServerConfig(num_rounds=3)
-
 
 # Flower ServerApp
 def server_fn(context: Context):
-    return ServerAppComponents(
-        strategy=strategy,
-        config=config,
-    )
+    # Read from config
+    num_rounds = context.run_config["num-server-rounds"]
+
+    # Define config
+    config = ServerConfig(num_rounds=num_rounds)
+    return ServerAppComponents(strategy=strategy, config=config)
 
 
+# Create ServerApp
 app = ServerApp(server_fn=server_fn)
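
The `server.py` change above reads the round count from `context.run_config["num-server-rounds"]` instead of hard-coding it. A minimal sketch of that lookup with light validation, using a plain dict as a stand-in for the run config so it runs without Flower installed. `read_num_rounds` and `DEFAULT_NUM_ROUNDS` are hypothetical names, and the fallback default is an assumption; the committed code has no fallback and would raise `KeyError` if the key were missing.

```python
from typing import Any, Mapping

DEFAULT_NUM_ROUNDS = 3  # hypothetical fallback, not part of the commit


def read_num_rounds(run_config: Mapping[str, Any]) -> int:
    """Read `num-server-rounds` from a run-config-like mapping.

    Falls back to DEFAULT_NUM_ROUNDS when the key is absent and rejects
    values below 1.
    """
    num_rounds = int(run_config.get("num-server-rounds", DEFAULT_NUM_ROUNDS))
    if num_rounds < 1:
        raise ValueError(f"num-server-rounds must be >= 1, got {num_rounds}")
    return num_rounds


if __name__ == "__main__":
    # A plain dict stands in for `context.run_config` here.
    print(read_num_rounds({"num-server-rounds": 5}))
    print(read_num_rounds({}))
```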
diff --git a/examples/hello-world/hello-flower/flwr-pt-tb/pyproject.toml b/examples/hello-world/hello-flower/flwr-pt-tb/pyproject.toml
@@ -2,6 +2,13 @@
 requires = ["hatchling"]
 build-backend = "hatchling.build"
 
+# Tested with:
+#   flwr==1.20.0
+#   nvflare==2.6.1
+#   torch==2.7.1
+#   torchvision==0.22.1
+#   tensorboard==2.20.0
+
 [project]
 name = "flwr_pt_tb"
 version = "1.0.0"
@@ -34,4 +41,4 @@ default = "local-simulation"
 [tool.flwr.federations.local-simulation]
 options.num-supernodes = 2
 address = "127.0.0.1:9093"
-insecure = true
+insecure = true
diff --git a/examples/hello-world/hello-flower/flwr-pt/pyproject.toml b/examples/hello-world/hello-flower/flwr-pt/pyproject.toml
@@ -2,6 +2,13 @@
 requires = ["hatchling"]
 build-backend = "hatchling.build"
 
+# Tested with:
+#   flwr==1.20.0
+#   nvflare==2.6.1
+#   torch==2.7.1
+#   torchvision==0.22.1
+#   tensorboard==2.20.0
+
 [project]
 name = "flwr_pt"
 version = "1.0.0"
@@ -34,4 +41,4 @@ default = "local-simulation"
 [tool.flwr.federations.local-simulation]
 options.num-supernodes = 2
 address = "127.0.0.1:9093"
-insecure = true
+insecure = true
diff --git a/examples/hello-world/hello-flower/job.py b/examples/hello-world/hello-flower/job.py
@@ -26,6 +26,7 @@ def main():
     parser.add_argument("--content_dir", type=str, required=True)
     parser.add_argument("--stream_metrics", action="store_true")
     parser.add_argument("--use_client_api", action="store_true")
+    parser.add_argument("--export_job", action="store_true")
     parser.add_argument("--export_dir", type=str, default="jobs")
     parser.add_argument("--workdir", type=str, default="/tmp/nvflare/hello-flower")
     args = parser.parse_args()
@@ -36,15 +37,20 @@ def main():
         # only external client api works with the current flower integration
         env = {CLIENT_API_TYPE_KEY: ClientAPIType.EX_PROCESS_API.value}
 
+    num_of_clients = 2
+
     job = FlowerPyTorchJob(
         name=args.job_name,
         flower_content=args.content_dir,
         stream_metrics=args.stream_metrics,
+        min_clients=num_of_clients,
         extra_env=env,
     )
 
-    job.export_job(args.export_dir)
-    job.simulator_run(os.path.join(args.workdir, job.name), gpu="0", n_clients=2)
+    if args.export_job:
+        job.export_job(args.export_dir)
+    else:
+        job.simulator_run(os.path.join(args.workdir, job.name), gpu="0", n_clients=num_of_clients)
 
 
 if __name__ == "__main__":
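
The `job.py` change above makes export and simulation mutually exclusive through the new `--export_job` flag. A minimal sketch of that control flow, assuming a hypothetical `FakeJob` stand-in that only records calls, so it runs without NVFlare installed. The flag names and defaults mirror the diff; `FakeJob`, `build_parser`, and `run` are illustrative names, and the `gpu` argument is omitted for brevity.

```python
import argparse
import os


class FakeJob:
    """Hypothetical stand-in for FlowerPyTorchJob; records calls only."""

    def __init__(self, name: str):
        self.name = name
        self.calls = []

    def export_job(self, export_dir: str) -> None:
        self.calls.append(("export", export_dir))

    def simulator_run(self, workdir: str, n_clients: int) -> None:
        self.calls.append(("simulate", workdir, n_clients))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--export_job", action="store_true")
    parser.add_argument("--export_dir", type=str, default="jobs")
    parser.add_argument("--workdir", type=str, default="/tmp/nvflare/hello-flower")
    return parser


def run(argv, job: FakeJob, num_of_clients: int = 2):
    """Export the job when --export_job is set, otherwise simulate it."""
    args = build_parser().parse_args(argv)
    if args.export_job:
        job.export_job(args.export_dir)
    else:
        job.simulator_run(os.path.join(args.workdir, job.name), n_clients=num_of_clients)
    return job.calls


if __name__ == "__main__":
    print(run(["--export_job", "--export_dir", "./jobs"], FakeJob("flwr-pt")))
    print(run([], FakeJob("flwr-pt")))
```

Before this commit the script always did both steps; with the flag, a plain invocation only simulates and an export run leaves the simulator untouched.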
diff --git a/examples/hello-world/hello-flower/requirements.txt b/examples/hello-world/hello-flower/requirements.txt
@@ -0,0 +1,5 @@
+flwr[simulation]>=1.16,<2.0
+nvflare>=2.6.0
+torch
+torchvision
+tensorboard
diff --git a/nvflare/app_opt/flower/flower_job.py b/nvflare/app_opt/flower/flower_job.py
@@ -94,6 +94,8 @@ def __init__(
         self.to_clients(obj=flower_content)
 
         if not stream_metrics:
+            conf = ExternalConfigurator(component_ids=[])
+            self.to_clients(conf, "client_api_config_preparer")
             return
 
         # add required components for metrics streaming
diff --git a/research/fed-sm/README.md b/research/fed-sm/README.md
@@ -3,7 +3,7 @@
 This directory contains the code for the personalized federated learning algorithm FedSM described in
 
 ### Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation ([arXiv:2203.10144](https://arxiv.org/abs/2203.10144))
-Accepted to [CVPR2022](https://cvpr2022.thecvf.com/).
+Accepted to CVPR2022.
 
 ###### Abstract:
 
@@ -88,3 +88,5 @@ BibTeX
   year={2022}
 }
 ```
+
+