diff --git a/benchmark/configs/android/mnn.yml b/benchmark/configs/android/mnn.yml
new file mode 100644
index 00000000..363cedcc
--- /dev/null
+++ b/benchmark/configs/android/mnn.yml
@@ -0,0 +1,32 @@
+# Configuration file for running the MNN backend on Android
+
+# ========== Cluster configuration ==========
+# IP address of the parameter server (needs 1 GPU process)
+ps_ip: localhost
+
+exp_path: $FEDSCALE_HOME/fedscale/cloud
+
+aggregator_entry: aggregation/aggregator_mnn.py
+
+auth:
+    ssh_user: ""
+    ssh_private_key: ~/.ssh/id_rsa
+
+# Commands to run (in order) before FedScale starts
+setup_commands:
+    - source $HOME/anaconda3/bin/activate fedscale
+
+# ========== Additional job configuration ==========
+# Default parameters are specified in config_parser.py, where each parameter is described in more detail
+
+job_conf:
+    - job_name: android-mnn                # Generate logs under this folder: log_path/job_name/time_stamp
+    - log_path: $FEDSCALE_HOME/benchmark   # Path of log files
+    - experiment_mode: mobile
+    - num_participants: 1                  # Number of participants per round; we use K=100 in our paper, and larger K will be slower
+    - model: linear                        # The model must be defined in aggregator_mnn.py
+    - learning_rate: 0.01
+    - batch_size: 32
+    - input_shape: 32 32 3
+    - num_classes: 10
+    - test_bsz: 16
diff --git a/benchmark/configs/android/tflite.yml b/benchmark/configs/android/tflite.yml
new file mode 100644
index 00000000..a853c2c3
--- /dev/null
+++ b/benchmark/configs/android/tflite.yml
@@ -0,0 +1,33 @@
+# Configuration file for running the TFLite (TensorFlow) backend on Android
+
+# ========== Cluster configuration ==========
+# IP address of the parameter server (needs 1 GPU process)
+ps_ip: localhost
+
+exp_path: $FEDSCALE_HOME/fedscale/cloud
+
+aggregator_entry: aggregation/aggregator_tflite.py
+
+auth:
+    ssh_user: ""
+    ssh_private_key: ~/.ssh/id_rsa
+
+# Commands to run (in order) before FedScale starts
+setup_commands:
+    - source $HOME/anaconda3/bin/activate fedscale
+
+# ========== Additional job configuration ==========
+# Default parameters are specified in config_parser.py, where each parameter is described in more detail
+
+job_conf:
+    - job_name: android-tflite             # Generate logs under this folder: log_path/job_name/time_stamp
+    - log_path: $FEDSCALE_HOME/benchmark   # Path of log files
+    - experiment_mode: mobile
+    - num_participants: 1                  # Number of participants per round; we use K=100 in our paper, and larger K will be slower
+    - model: linear                        # The model must be defined in tf_aggregator.py
+    - learning_rate: 0.01
+    - batch_size: 32
+    - input_shape: 32 32 3
+    - num_classes: 10
+    - test_bsz: 16
+    - engine: 'tensorflow'
diff --git a/docs/fedscale-deploy.png b/docs/fedscale-deploy.png
new file mode 100644
index 00000000..736e9037
Binary files /dev/null and b/docs/fedscale-deploy.png differ
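Both configuration files express `job_conf` as a YAML list of single-key mappings. As a rough illustration of how such a file might be consumed (the `load_job_conf` helper below is hypothetical; FedScale's actual parsing and defaults live in `config_parser.py`, as the comments above note):

```python
# Hypothetical sketch: flatten the job_conf list-of-single-key-maps into a dict.
# FedScale's real argument handling is in fedscale/cloud/config_parser.py.
import yaml

def load_job_conf(path: str) -> dict:
    with open(path) as f:
        cfg = yaml.safe_load(f)
    flat = {}
    for item in cfg.get("job_conf", []):  # e.g. [{"job_name": ...}, {"model": ...}, ...]
        flat.update(item)
    return flat

if __name__ == "__main__":
    conf = load_job_conf("benchmark/configs/android/mnn.yml")
    print(conf["model"], conf["num_classes"], conf["input_shape"])  # linear 10 "32 32 3"
```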
diff --git a/fedscale/cloud/aggregation/README.md b/fedscale/cloud/aggregation/README.md
deleted file mode 100644
index f86e89f3..00000000
--- a/fedscale/cloud/aggregation/README.md
+++ /dev/null
@@ -1,49 +0,0 @@
-# Aggregation for Mobiles
-
-This document contains explanation and instruction of aggregation for mobiles.
-
-An example android aggregator accompanied by
-- [MNN](https://github.com/SymbioticLab/FedScale/fedscale/edge/mnn/). The android app has [MNN](https://github.com/alibaba/MNN) backend support for training and testing.
-- [TFLite](https://github.com/SymbioticLab/FedScale/fedscale/edge/tflite/). The android app has [TFLite](https://www.tensorflow.org/lite) backend support for training and testing.
-
-## MNN
-
-`fedscale/cloud/aggregation/aggregator_mnn.py` contains an inherited version of aggregator. While keeping all functionalities of the original [aggregator](https://github.com/SymbioticLab/FedScale/blob/master/fedscale/cloud/aggregation/aggregator.py), it adds support to do bijective conversion between PyTorch model and MNN model. It uses JSON to communicate with android client.
-
-**Note**:
-MNN does not support direct conversion from MNN to PyTorch model, so we did a manual conversion from MNN to JSON, then from JSON to PyTorch model. We currently only support Convolution (including Linear) and BatchNorm conversion. We welcome contribution to support more conversion for operators with trainable parameters.
-
-`fedscale/utils/models/mnn_model_provider.py` contains all the code necessary for MNN<->PyTorch model conversion and currently supported PyTorch models that can be converted to MNN without bugs.
-
-In order to run this aggregator with default setting in order to test sample app, please run
-```
-git clone https://github.com/SymbioticLab/FedScale.git
-cd FedScale
-source install.sh
-pip install -e .
-cd fedscale/cloud/aggregation
-python3 aggregator_mnn.py --experiment_mode mobile --num_participants 1 --num_classes 10 --input_shape 32 32 3 --model linear
-```
-and configure your android app according to the [tutorial](https://github.com/SymbioticLab/FedScale/fedscale/edge/mnn/README.md).
-
-## TFLite
-
-`fedscale/cloud/aggregation/aggregator_tflite.py` contains an inherited version of aggregator. While keeping all functionalities of the original [aggregator](https://github.com/SymbioticLab/FedScale/blob/master/fedscale/cloud/aggregation/aggregator.py), it adds support to do bijective conversion between tensorflow model and TFLite model.
-
-`fedscale/utils/models/tflite_model_provider.py` contains a simple linear model with Flatten->Dense->Dense, used for simple test of our sample android app. Please feel free to contribute to it and add more models.
-
-`fedscale/cloud/internal/tflite_model_adapter.py` defer from `fedscale/cloud/internal/tensorflow_model_adapter.py` in the way that TFLite adapter skip layers without weights, such as Flatten.
-
-In order to run this aggregator with default setting in order to test sample app, please run
-```
-git clone https://github.com/SymbioticLab/FedScale.git
-cd FedScale
-source install.sh
-pip install -e .
-cd fedscale/cloud/aggregation
-python3 aggregator_tflite.py --experiment_mode mobile --num_participants 1 --num_classes 10 --input_shape 32 32 3 --engine tensorflow --model [linear|mobilenetv3|resnet50|mobilenetv3_finetune|resnet50_finetune] --learning_rate 1e-2
-```
-and configure your android app according to the [tutorial](https://github.com/SymbioticLab/FedScale/fedscale/edge/tflite/README.md).
-
----
-If you need any other help, feel free to contact FedScale team or the developer [website](https://continue-revolution.github.io) [email](mailto:continuerevolution@gmail.com) of this android aggregator.
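The deleted README above notes that only Convolution (including Linear) and BatchNorm parameters are converted through JSON. A minimal sketch of that idea (illustrative only; the `export_supported_params` helper is not FedScale's actual converter, which lives in `fedscale/utils/models/mnn_model_provider.py`):

```python
# Illustrative only: serialize the parameters of the operator types mentioned
# above (Convolution/Linear and BatchNorm) into a JSON-friendly dict keyed by
# module name. FedScale's real conversion lives in mnn_model_provider.py.
import json
import torch
import torch.nn as nn

SUPPORTED_OPS = (nn.Conv2d, nn.Linear, nn.BatchNorm2d)

def export_supported_params(model: nn.Module) -> str:
    state = {}
    for name, module in model.named_modules():
        if isinstance(module, SUPPORTED_OPS):
            state[name] = {k: v.tolist() for k, v in module.state_dict().items()}
    return json.dumps(state)

if __name__ == "__main__":
    net = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3),
        nn.BatchNorm2d(8),
        nn.Flatten(),
        nn.Linear(8 * 30 * 30, 10),
    )
    exported = json.loads(export_supported_params(net))
    print(list(exported.keys()))  # ['0', '1', '3'] -- Flatten has no parameters
```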
diff --git a/fedscale/edge/android/README.md b/fedscale/edge/android/README.md
index 753e76bb..b1918a00 100644
--- a/fedscale/edge/android/README.md
+++ b/fedscale/edge/android/README.md
@@ -1,12 +1,53 @@
-# Android TFLite Sample App
+# FedScale Deployment

-This directory contains minimum files modified from [MNN Android Demo](https://github.com/alibaba/MNN/tree/master/project/android/demo) and [TFLite Android Demo](https://www.tensorflow.org/lite/examples/on_device_training/overview). The training and testing will be conducted by TFLite backend, while the task execution and communication with server will be managed by Java. The sample has been tested upon image classification with a simple linear model and a small subset of [ImageNet-MINI](https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000). This documentation contains a step-by-step tutorial on how to download, build and config this app on your own device, and modify this app for your own implementation and deployment.
+FedScale provides a cloud-based [aggregation service](https://github.com/SymbioticLab/FedScale/blob/master/fedscale/cloud/aggregation/README.md) and an SDK for smartphones on the edge that currently supports TensorFlow Lite and Alibaba MNN on Android (iOS support coming soon!). In this tutorial, we introduce how to:

-## Download and build sample android app
+
+- Initiate the FedScale cloud service
+- Import the FedScale SDK to fine-tune models locally
+- Connect to the FedScale cloud for federated training
+
+![fedscale deployment](../../../docs/fedscale-deploy.png)
+
+
+## FedScale Cloud Aggregation
+
+Follow these steps to deploy and run the cloud server for the [Alibaba MNN](https://github.com/SymbioticLab/FedScale/fedscale/edge/mnn/) or [TFLite](https://github.com/SymbioticLab/FedScale/fedscale/edge/tflite/) backend.
+
+- Specify the number of executors with `num_participants: 1`. You may increase this to any number of mobile participants, but all participants must currently use the same training backend.
+
+- Specify `model`. We have only tested the `linear` model with the Alibaba MNN backend because Alibaba MNN does not support Dropout. Otherwise, you may choose one of the `linear`|`mobilenetv3`|`resnet50`|`mobilenetv3_finetune`|`resnet50_finetune` models. The `finetune` variants train only the last two linear layers and freeze the backbone layers (see the sketch after this list).
+
+- Set the `use_cuda` flag to `True` if you want to use a GPU for aggregation. However, since aggregation is a sequential addition of small tensors, the GPU speedup is minimal.
+
+- Submit a job:
+
+    ```
+    cd $FEDSCALE_HOME/docker
+    python3 driver.py submit $FEDSCALE_HOME/benchmark/configs/android/mnn.yml    # If you want to run the MNN backend on mobile.
+    python3 driver.py submit $FEDSCALE_HOME/benchmark/configs/android/tflite.yml # If you want to run the TFLite backend on mobile.
+    ```
+
+- Check logs: by default, FedScale generates logs under the `log_path` you provided. Keep in mind that Kubernetes may schedule your job onto any node in the cluster, so make sure you check `log_path` on the correct node.
+
+- Stop a job:
+
+    ```
+    cd $FEDSCALE_HOME/docker
+    python3 driver.py stop $YOUR_JOB
+    ```
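A rough sketch of the `*_finetune` idea from the list above (illustrative only; the `build_finetune_model` helper is an assumption, and the models actually shipped with FedScale live in `fedscale/utils/models/tflite_model_provider.py`): the backbone is frozen and only the two final dense layers remain trainable.

```python
# Illustrative sketch of a "finetune" model: frozen backbone, trainable head.
# The real model definitions are in fedscale/utils/models/tflite_model_provider.py.
import tensorflow as tf

def build_finetune_model(num_classes: int = 10, input_shape=(32, 32, 3)) -> tf.keras.Model:
    backbone = tf.keras.applications.MobileNetV3Small(
        input_shape=input_shape, include_top=False, weights=None, pooling="avg")
    backbone.trainable = False  # backbone layers are frozen
    return tf.keras.Sequential([
        backbone,
        tf.keras.layers.Dense(128, activation="relu"),  # only these two
        tf.keras.layers.Dense(num_classes),             # layers are trained
    ])

model = build_finetune_model()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```

The same freezing pattern applies to the `resnet50_finetune` variant, with the ResNet-50 application model as the backbone.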
+
+## FedScale Mobile Runtime
+
+We provide a sample app with which you can:
+- Train/test models with TFLite or Alibaba MNN.
+- Fine-tune models locally **after** receiving the model from the cloud.
+
+Please follow these steps to download and build the sample Android app.

 1. Download and unzip [sample dataset (TrainTest.zip)](https://drive.google.com/file/d/1nfi3SVzjaE0LPxwj_5DNdqi6rK7BU8kb/view?usp=sharing) to `assets/` directory. Remove `TrainTest.zip` after unzip to save space on your mobile device. After unzip, you should see 3 files and 2 directories under `assets/`:
-    1. `TrainSet`: Training set directory, contains 316 images.
-    2. `TestSet`: Testing set directory, contains 34 images.
+    1. `TrainSet`: Training set directory, contains 320 images.
+    2. `TestSet`: Testing set directory, contains 32 images.
     3. `conf.json`: Configuration file for mobile app.
     4. `train_labels.txt`: Training label file with format `