Commit 0bb9f6e

site, docs, example updates (#2894)
1 parent d4e914c commit 0bb9f6e

12 files changed: +272 -87 lines changed

docs/programming_guide/controllers/model_controller.rst

+48-4
@@ -7,13 +7,13 @@ ModelController API
 The FLARE :mod:`ModelController<nvflare.app_common.workflows.model_controller>` API provides an easy way for users to write and customize FLModel-based controller workflows.

 * Highly flexible with a simple API (run routine and basic communication and utility functions)
-* :ref:`fl_model`for the communication data structure, everything else is pure Python
+* :ref:`fl_model` for the communication data structure, everything else is pure Python
 * Option to support pre-existing components and FLARE-specific functionalities

 .. note::

     The ModelController API is a high-level API meant to simplify writing workflows.
-    If users prefer or need the full flexibility of the Controller with all the capabilites of FLARE functions, refer to the :ref:`controllers`.
+    If users prefer or need the full flexibility of the Controller with all the capabilities of FLARE functions, refer to the :ref:`controllers`.


 Core Concepts
@@ -70,7 +70,7 @@ Here is an example of the FedAvg workflow using the :class:`BaseFedAvg<nvflare.a
                 results, aggregate_fn=self.aggregate_fn
             )  # using default aggregate_fn with `WeightedAggregationHelper`. Can overwrite self.aggregate_fn with signature Callable[List[FLModel], FLModel]

-            # update global model with agggregation results
+            # update global model with aggregation results
             model = self.update_model(model, aggregate_results)

             # save model (by default uses persistor, can provide custom method)
@@ -119,7 +119,7 @@ The :ref:`fl_model` is standardized data structure object that is sent along wit

 The :ref:`fl_model` object can be any type of data depending on the specific task.
 For example, in the "train" and "validate" tasks we send the model parameters along with the task so the target clients can train and validate the model.
-However in many other tasks that do not involve sending the model (e.g. "submit_model"), the :ref:`fl_model` can contain any type of data (e.g. metadata, metrics etc.) or may be not be needed at all.
+However in many other tasks that do not involve sending the model (e.g. "submit_model"), the :ref:`fl_model` can contain any type of data (e.g. metadata, metrics etc.) or may not be needed at all.


 send_model_and_wait
@@ -142,6 +142,50 @@ A callback with the signature ``Callable[[FLModel], None]`` can be passed in, wh
 The task is standing until either ``min_responses`` have been received, or ``timeout`` time has passed.
 Since this call is asynchronous, the Controller :func:`get_num_standing_tasks<nvflare.apis.impl.controller.Controller.get_num_standing_tasks>` method can be used to get the number of standing tasks for synchronization purposes.

+For example, in the :github_nvflare_link:`CrossSiteEval <app_common/workflows/cross_site_eval.py>` workflow, the tasks are asynchronously sent with :func:`send_model<nvflare.app_common.workflows.model_controller.ModelController.send_model>` to get each client's model.
+Then through a callback, the clients' models are sent to the other clients for validation.
+Finally, the workflow waits for all standing tasks to complete with :func:`get_num_standing_tasks<nvflare.apis.impl.controller.Controller.get_num_standing_tasks>`.
+Below is an example of how these functions can be used. For more details, view the implementation of :github_nvflare_link:`CrossSiteEval <app_common/workflows/cross_site_eval.py>`.
+
+
+.. code-block:: python
+
+    class CrossSiteEval(ModelController):
+        ...
+        def run(self) -> None:
+            ...
+            # Create submit_model task and broadcast to all participating clients
+            self.send_model(
+                task_name=AppConstants.TASK_SUBMIT_MODEL,
+                data=data,
+                targets=self._participating_clients,
+                timeout=self._submit_model_timeout,
+                callback=self._receive_local_model_cb,
+            )
+            ...
+            # Wait for all standing tasks to complete, since we used non-blocking `send_model()`
+            while self.get_num_standing_tasks():
+                if self.abort_signal.triggered:
+                    self.info("Abort signal triggered. Finishing cross site validation.")
+                    return
+                self.debug("Checking standing tasks to see if cross site validation finished.")
+                time.sleep(self._task_check_period)
+
+            self.save_results()
+            self.info("Stop Cross-Site Evaluation.")
+
+        def _receive_local_model_cb(self, model: FLModel):
+            # Send this model to all clients to validate
+            model.meta[AppConstants.MODEL_OWNER] = model_name
+            self.send_model(
+                task_name=AppConstants.TASK_VALIDATION,
+                data=model,
+                targets=self._participating_clients,
+                timeout=self._validation_timeout,
+                callback=self._receive_val_result_cb,
+            )
+            ...
+
 
 Saving & Loading
 ================
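
The blocking counterpart, ``send_model_and_wait``, covers the common FedAvg-style round loop described earlier on this page. Below is a minimal illustrative sketch, assuming the ``BaseFedAvg`` helpers shown above (``load_model``, ``sample_clients``, ``aggregate``, ``update_model``, ``save_model``); it is not part of this commit's diff:

.. code-block:: python

    class SimpleFedAvg(BaseFedAvg):
        def run(self) -> None:
            # load the initial global model (by default from the persistor)
            model = self.load_model()

            for self.current_round in range(self.num_rounds):
                # select the participating clients for this round
                clients = self.sample_clients(self.num_clients)

                # blocking call: returns once min_responses or timeout is reached
                results = self.send_model_and_wait(data=model, targets=clients)

                # aggregate the returned FLModels, then fold them into the global model
                aggregate_results = self.aggregate(results)
                model = self.update_model(model, aggregate_results)

                # save model (by default uses the persistor)
                self.save_model(model)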

docs/release_notes/flare_250.rst

+2-2
@@ -9,7 +9,7 @@ scientists' experience working with FLARE. The new API covers client, server and

 Model Controller API
 --------------------
-The new Model Controller API greatly simplifies the experience of developing new federated learning workflows. Users can simply subclass
+The new :ref:`model_controller` greatly simplifies the experience of developing new federated learning workflows. Users can simply subclass
 the ModelController to develop new workflows. The new API doesn't require users to know the details of NVFlare constructs except for FLModel
 class, where it is simply a data structure that contains model weights, optimization parameters and metadata.

@@ -104,7 +104,7 @@ federated stats will be very helpful.

 FedAvg Early Stopping Example
 ------------------------------
-The `FedAvg Early Stopping example <https://github.com/NVIDIA/NVFlare/pull/2648>`_ tries to demonstrate that with the new server-side model
+The :github_nvflare_link:`FedAvg Early Stopping example <examples/hello-world/hello-fedavg>` tries to demonstrate that with the new server-side model
 controller API, it is very easy to change the control conditions and adjust workflows with a few lines of python code.

 Tensorflow Algorithms & Examples

examples/advanced/job_api/pt/src/cifar10_lightning_fl.py

+1-4
@@ -71,10 +71,7 @@ def predict_dataloader(self):
 def main():
     model = LitNet()
     cifar10_dm = CIFAR10DataModule()
-    if torch.cuda.is_available():
-        trainer = Trainer(max_epochs=1, accelerator="gpu", devices=1 if torch.cuda.is_available() else None)
-    else:
-        trainer = Trainer(max_epochs=1, devices=None)
+    trainer = Trainer(max_epochs=1, devices=1, accelerator="gpu" if torch.cuda.is_available() else "cpu")

     # (2) patch the lightning trainer
     flare.patch(trainer)
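
The replacement line keeps an explicit CUDA check; Lightning can also make this choice itself. An alternative sketch (assuming the example's ``Trainer`` import from ``pytorch_lightning``), equivalent to the updated line above:

    # "auto" resolves to GPU when torch.cuda.is_available(), otherwise CPU,
    # matching the explicit conditional in the updated line above
    trainer = Trainer(max_epochs=1, accelerator="auto", devices=1)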

examples/getting_started/pt/nvflare_lightning_getting_started.ipynb

+40-10
@@ -333,7 +333,7 @@
 "from nvflare.job_config.script_runner import ScriptRunner\n",
 "from nvflare.app_common.workflows.fedavg import FedAvg\n",
 "\n",
-"job = FedJob(name=\"cifar10_fedavg_lightning\")"
+"job = FedJob(name=\"cifar10_lightning_fedavg\")"
 ]
 },
 {
@@ -412,16 +412,46 @@
 "That completes the components that need to be defined on the server."
 ]
 },
+{
+"cell_type": "markdown",
+"id": "32686782",
+"metadata": {},
+"source": [
+"#### OPTIONAL: Define a FedAvgJob\n",
+"\n",
+"Alternatively, we can replace steps 2-7 and instead use the `FedAvgJob`.\n",
+"The `FedAvgJob` automatically configures the `FedAvg` server controller, along with the other components for model persistence and model selection."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "02fde3ae",
+"metadata": {},
+"outputs": [],
+"source": [
+"from nvflare.app_opt.pt.job_config.fed_avg import FedAvgJob\n",
+"\n",
+"n_clients = 2\n",
+"\n",
+"# Create FedAvg Job with initial model\n",
+"job = FedAvgJob(\n",
+"    name=\"cifar10_lightning_fedavg\",\n",
+"    num_rounds=2,\n",
+"    n_clients=n_clients,\n",
+"    initial_model=LitNet(),\n",
+")"
+]
+},
 {
 "cell_type": "markdown",
 "id": "548966c2-90bf-47ad-91d2-5c6c22c3c4f0",
 "metadata": {},
 "source": [
-"#### 5. Add clients\n",
+"#### 6. Add client ScriptRunners\n",
 "Next, we can use the `ScriptRunner` and send it to each of the clients to run our training script.\n",
 "\n",
-"Note that our script could have additional input arguments, such as batch size or data path, but we don't use them here for simplicity.\n",
-"We can also specify, which GPU should be used to run this client, which is helpful for simulated environments."
+"Note that our script could have additional input arguments, such as batch size or data path, but we don't use them here for simplicity."
 ]
 },
 {
@@ -432,10 +432,10 @@
 "outputs": [],
 "source": [
 "for i in range(n_clients):\n",
-"    executor = ScriptRunner(\n",
+"    runner = ScriptRunner(\n",
 "        script=\"src/cifar10_lightning_fl.py\", script_args=\"\" # f\"--batch_size 32 --data_path /tmp/data/site-{i}\"\n",
 "    )\n",
-"    job.to(executor, f\"site-{i+1}\")"
+"    job.to(runner, f\"site-{i+1}\")"
 ]
 },
 {
@@ -445,7 +445,7 @@
 "source": [
 "That's it!\n",
 "\n",
-"#### 6. Optionally export the job\n",
+"#### 7. Optionally export the job\n",
 "Now, we could export the job and submit it to a real NVFlare deployment using the [Admin client](https://nvflare.readthedocs.io/en/main/real_world_fl/operation.html) or [FLARE API](https://nvflare.readthedocs.io/en/main/real_world_fl/flare_api.html)."
 ]
 },
@@ -464,8 +464,8 @@
 "id": "9ac3f0a8-06bb-4bea-89d3-4a5fc5b76c63",
 "metadata": {},
 "source": [
-"#### 7. Run FL Simulation\n",
-"Finally, we can run our FedJob in simulation using NVFlare's [simulator](https://nvflare.readthedocs.io/en/main/user_guide/nvflare_cli/fl_simulator.html) under the hood. The results will be saved in the specified `workdir`."
+"#### 8. Run FL Simulation\n",
+"Finally, we can run our FedJob in simulation using NVFlare's [simulator](https://nvflare.readthedocs.io/en/main/user_guide/nvflare_cli/fl_simulator.html) under the hood. We can also specify which GPU should be used to run this client, which is helpful for simulated environments. The results will be saved in the specified `workdir`."
 ]
 },
 {
@@ -495,7 +495,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"! nvflare simulator -w /tmp/nvflare/jobs/workdir -n 2 -t 2 -gpu 0 /tmp/nvflare/jobs/job_config/cifar10_fedavg_lightning"
+"! nvflare simulator -w /tmp/nvflare/jobs/workdir -n 2 -t 2 -gpu 0 /tmp/nvflare/jobs/job_config/cifar10_lightning_fedavg"
 ]
 }
 ],
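
Taken together, the optional FedAvgJob path added to this notebook reduces to a short script. An illustrative sketch, assuming the notebook's LitNet LightningModule and src/cifar10_lightning_fl.py, plus FedJob's simulator_run helper:

    from nvflare.app_opt.pt.job_config.fed_avg import FedAvgJob
    from nvflare.job_config.script_runner import ScriptRunner

    n_clients = 2

    # FedAvgJob wires up the FedAvg controller plus the model
    # persistence and model selection components automatically
    job = FedAvgJob(
        name="cifar10_lightning_fedavg",
        num_rounds=2,
        n_clients=n_clients,
        initial_model=LitNet(),  # the LightningModule defined in the notebook
    )

    # one ScriptRunner per simulated site
    for i in range(n_clients):
        job.to(ScriptRunner(script="src/cifar10_lightning_fl.py"), f"site-{i+1}")

    # run in the FL simulator; results land in the given workspace
    job.simulator_run("/tmp/nvflare/jobs/workdir", gpu="0")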

examples/getting_started/pt/nvflare_pt_getting_started.ipynb

+52-22
@@ -275,7 +275,7 @@
 "from nvflare.job_config.script_runner import ScriptRunner\n",
 "from nvflare.app_common.workflows.fedavg import FedAvg\n",
 "\n",
-"job = FedJob(name=\"cifar10_fedavg\")"
+"job = FedJob(name=\"cifar10_pt_fedavg\")"
 ]
 },
 {
@@ -378,51 +378,81 @@
 },
 {
 "cell_type": "markdown",
-"id": "548966c2-90bf-47ad-91d2-5c6c22c3c4f0",
+"id": "6059b304",
 "metadata": {},
 "source": [
-"#### 7. Add clients\n",
-"Next, we can use the `ScriptRunner` and send it to each of the clients to run our training script.\n",
-"\n",
-"Note that our script could have additional input arguments, such as batch size or data path, but we don't use them here for simplicity.\n",
-"We can also specify, which GPU should be used to run this client, which is helpful for simulated environments."
+"#### 7. Add TB Event\n",
+"Add tensorboard logging to clients"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "ad5d36fe-9ae5-43c3-80bc-2cdc66bf7a7e",
+"id": "51d8bcda",
 "metadata": {},
 "outputs": [],
 "source": [
+"from nvflare.app_common.widgets.convert_to_fed_event import ConvertToFedEvent\n",
+"\n",
 "for i in range(n_clients):\n",
-"    executor = ScriptRunner(\n",
-"        script=\"src/cifar10_fl.py\", script_args=\"\" # f\"--batch_size 32 --data_path /tmp/data/site-{i}\"\n",
-"    )\n",
-"    job.to(id=\"event_to_fed\", obj=executor, target=f\"site-{i+1}\")"
+"    component = ConvertToFedEvent(events_to_convert=[\"analytix_log_stats\"], fed_event_prefix=\"fed.\")\n",
+"    job.to(id=\"event_to_fed\", obj=component, target=f\"site-{i+1}\")"
 ]
 },
 {
 "cell_type": "markdown",
-"id": "a56abcd6-4e97-4a60-8894-2760f8815a03",
+"id": "7c95e3f6",
 "metadata": {},
 "source": [
-"#### 8. Add TB Event\n",
-"Add tensorboard logging to clients"
+"#### OPTIONAL: Define a FedAvgJob\n",
+"\n",
+"Alternatively, we can replace steps 2-7 and instead use the `FedAvgJob`.\n",
+"The `FedAvgJob` automatically configures the `FedAvg` server controller, along with the other components for model persistence, model selection, and TensorBoard streaming.\n"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "a8a733e0-c0a9-4c36-b49d-16b20c2df7f6",
+"id": "c4dfc3e7",
 "metadata": {},
 "outputs": [],
 "source": [
-"from nvflare.app_common.widgets.convert_to_fed_event import ConvertToFedEvent\n",
+"from nvflare.app_opt.pt.job_config.fed_avg import FedAvgJob\n",
+"\n",
+"n_clients = 2\n",
 "\n",
+"# Create FedAvg Job with initial model\n",
+"job = FedAvgJob(\n",
+"    name=\"cifar10_pt_fedavg\",\n",
+"    num_rounds=2,\n",
+"    n_clients=n_clients,\n",
+"    initial_model=Net(),\n",
+")"
+]
+},
+{
+"cell_type": "markdown",
+"id": "548966c2-90bf-47ad-91d2-5c6c22c3c4f0",
+"metadata": {},
+"source": [
+"#### 8. Add client ScriptRunners\n",
+"Next, we can use the `ScriptRunner` and send it to each of the clients to run our training script.\n",
+"\n",
+"Note that our script could have additional input arguments, such as batch size or data path, but we don't use them here for simplicity."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "ad5d36fe-9ae5-43c3-80bc-2cdc66bf7a7e",
+"metadata": {},
+"outputs": [],
+"source": [
 "for i in range(n_clients):\n",
-"    component = ConvertToFedEvent(events_to_convert=[\"analytix_log_stats\"], fed_event_prefix=\"fed.\")\n",
-"    job.to(component, f\"site-{i+1}\")"
+"    runner = ScriptRunner(\n",
+"        script=\"src/cifar10_fl.py\", script_args=\"\" # f\"--batch_size 32 --data_path /tmp/data/site-{i}\"\n",
+"    )\n",
+"    job.to(runner, f\"site-{i+1}\")"
 ]
 },
 {
@@ -432,7 +432,7 @@
 "source": [
 "That's it!\n",
 "\n",
-"#### 9 Optionally export the job\n",
+"#### 9. Optionally export the job\n",
 "Now, we could export the job and submit it to a real NVFlare deployment using the [Admin client](https://nvflare.readthedocs.io/en/main/real_world_fl/operation.html) or [FLARE API](https://nvflare.readthedocs.io/en/main/real_world_fl/flare_api.html)."
 ]
 },
@@ -452,7 +452,7 @@
 "metadata": {},
 "source": [
 "#### 10. Run FL Simulation\n",
-"Finally, we can run our FedJob in simulation using NVFlare's [simulator](https://nvflare.readthedocs.io/en/main/user_guide/nvflare_cli/fl_simulator.html) under the hood. The results will be saved in the specified `workdir`."
+"Finally, we can run our FedJob in simulation using NVFlare's [simulator](https://nvflare.readthedocs.io/en/main/user_guide/nvflare_cli/fl_simulator.html) under the hood. We can also specify which GPU should be used to run this client, which is helpful for simulated environments. The results will be saved in the specified `workdir`."
 ]
 },
 {
@@ -482,7 +482,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"! nvflare simulator -w /tmp/nvflare/jobs/workdir -n 2 -t 2 -gpu 0 /tmp/nvflare/jobs/job_config/cifar10_fedavg"
+"! nvflare simulator -w /tmp/nvflare/jobs/workdir -n 2 -t 2 -gpu 0 /tmp/nvflare/jobs/job_config/cifar10_pt_fedavg"
 ]
 },
 {
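
On the client side, the "analytix_log_stats" events that ConvertToFedEvent forwards to the server are produced by the training script's metric logging. A sketch of that counterpart, assuming NVFlare's Client API tracking writer (nvflare.client.tracking.SummaryWriter); the exact training loop is illustrative:

    import nvflare.client as flare
    from nvflare.client.tracking import SummaryWriter

    flare.init()
    writer = SummaryWriter()

    for step in range(100):
        loss = 1.0 / (step + 1)  # placeholder for a real training loss
        # emitted as an "analytix_log_stats" event, which ConvertToFedEvent
        # relays to the server with the "fed." prefix for TensorBoard streaming
        writer.add_scalar("train_loss", loss, step)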

examples/getting_started/pt/src/cifar10_lightning_fl.py

+1-4
@@ -71,10 +71,7 @@ def predict_dataloader(self):
 def main():
     model = LitNet()
     cifar10_dm = CIFAR10DataModule()
-    if torch.cuda.is_available():
-        trainer = Trainer(max_epochs=1, accelerator="gpu", devices=1 if torch.cuda.is_available() else None)
-    else:
-        trainer = Trainer(max_epochs=1, devices=None)
+    trainer = Trainer(max_epochs=1, devices=1, accelerator="gpu" if torch.cuda.is_available() else "cpu")

     # (2) patch the lightning trainer
     flare.patch(trainer)
