Merge pull request #237 from SymbioticLab/auxo
[Example] Auxo (SoCC'23)
fanlai0990 authored Sep 23, 2023
2 parents faab283 + dabf26a commit 731aa17
Showing 25 changed files with 2,156 additions and 0 deletions.
52 changes: 52 additions & 0 deletions benchmark/configs/auxo/auxo.yml
@@ -0,0 +1,52 @@
# Configuration file of the Auxo experiment

# ========== Cluster configuration ==========
# ip address of the parameter server (need 1 GPU process)
ps_ip: localhost
ps_port: 12345

# ip address of each worker, followed by the number of processes to launch on each GPU of that node
# Note: if the ps and a worker are collocated on the same GPU, decrease the number of available processes on that GPU by 1
# E.g., if the master node has 4 available processes, 1 goes to the ps, so the worker should be set to: worker:3
worker_ips:
- localhost:[7,7,0,0] # worker_ip: [(# processes on gpu) for gpu in available_gpus] eg. 10.0.0.2:[4,4,4,4] This node has 4 gpus, each gpu has 4 processes.

exp_path: $FEDSCALE_HOME/examples/auxo

# Entry function of executor and aggregator under $exp_path
executor_entry: executor.py

aggregator_entry: aggregator.py

auth:
    ssh_user: ""
    ssh_private_key: ~/.ssh/id_rsa

# Commands to run (in order) before the job can start
setup_commands:
- source $HOME/anaconda3/bin/activate fedscale

# ========== Additional job configuration ==========
# Default parameters are specified in config_parser.py, wherein more description of the parameter can be found

job_conf:
- job_name: auxo_femnist # Generate logs under this folder: log_path/job_name/time_stamp
- log_path: $FEDSCALE_HOME/benchmark # Path of log files
- num_participants: 200 # Number of participants per round, we use K=100 in our paper, large K will be much slower
- data_set: femnist # Dataset, e.g., femnist, openImg, google_speech, stackoverflow
- data_dir: $FEDSCALE_HOME/benchmark/dataset/data/ # Path of the dataset
- data_map_file: $FEDSCALE_HOME/benchmark/dataset/data/femnist/client_data_mapping/train.csv # Allocation of data to each client, turn to iid setting if not provided
- device_conf_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_device_capacity # Path of the client trace
- device_avail_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_behave_trace
- model: resnet18 # NOTE: Please refer to our model zoo README and use models for these small image (e.g., 32x32x3) inputs
# - model_zoo: fedscale-torch-zoo
- eval_interval: 20 # Evaluate on the testing set once every this many rounds
- rounds: 1000 # Number of rounds to run this training. We use 1000 in our paper, while it may converge w/ ~400 rounds
- filter_less: 0 # Remove clients w/ fewer than this many samples
- num_loaders: 2
- local_steps: 10
- learning_rate: 0.05
- batch_size: 20
- test_bsz: 20
- use_cuda: True
- save_checkpoint: False
29 changes: 29 additions & 0 deletions examples/auxo/Dockerfile
@@ -0,0 +1,29 @@
# Use an official CUDA image as a parent image
FROM nvidia/cuda:11.0-base-ubuntu20.04

# Set the working directory inside the container
WORKDIR /app

# Install necessary system packages
RUN apt-get update && apt-get install -y python3.7 python3-pip

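# Create a virtual environment and put it first on PATH so later steps use it
# (running `source venv/bin/activate` in its own RUN step does not persist to later layers)
RUN python3.7 -m pip install virtualenv
RUN python3.7 -m virtualenv venv
ENV PATH="/app/venv/bin:$PATH"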

# Copy the requirements file into the container
COPY requirements.txt .

# Install the Python dependencies
RUN pip install --upgrade pip && pip install -r requirements.txt

# Copy the project files into the container (assuming your project is in the current directory)
COPY . .

# Install your project using pip
RUN pip install -e .

# Command to run when the container starts
CMD ["bash"]

72 changes: 72 additions & 0 deletions examples/auxo/README.md
@@ -0,0 +1,72 @@


<div align="center">
<picture>
<img alt="Auxo logo" width="45%" src="fig/auxo.png">
</picture>
<h1>Auxo: Efficient Federated Learning via Scalable Client Clustering</h1>

</div>

Auxo manages heterogeneity in Federated Learning (FL) through scalable and efficient cohort-based training.
For more details, refer to our SoCC'23 [paper](https://arxiv.org/abs/2210.16656).


## Key Features

- **Scalable Cohort Identification**: Efficiently identifies cohorts even in large-scale FL deployments.

- **Cohort-Based Training**: Optimizes the performance of existing FL algorithms by reducing intra-cohort heterogeneity.

- **Resource Efficiency**: Designed to work in low-availability, resource-constrained settings without additional computational overhead.

- **Privacy Preservation**: Respects user privacy by avoiding the need for traditional clustering methods that require access to client data.
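
To make the cohort idea concrete, here is a minimal sketch. It is not Auxo's actual algorithm: it assumes each client reports only a model-update vector (never raw data) and groups clients with plain k-means on those vectors. The function name `cluster_updates`, the synthetic updates, and the choice of k-means are all illustrative assumptions.

```python
import math
import random


def cluster_updates(updates, k, iters=20):
    """Group client update vectors into k cohorts with plain k-means.

    `updates` maps a client id to its update vector (a list of floats).
    Returns a dict mapping client id -> cohort index.
    Hypothetical helper for illustration, not part of FedScale or Auxo.
    """
    ids = sorted(updates)
    # Farthest-point initialization: deterministic and spreads centroids out.
    centroids = [list(updates[ids[0]])]
    while len(centroids) < k:
        far = max(ids, key=lambda c: min(math.dist(updates[c], m) for m in centroids))
        centroids.append(list(updates[far]))

    assign = {}
    for _ in range(iters):
        # Assignment step: each client joins the nearest centroid.
        new_assign = {
            c: min(range(k), key=lambda j: math.dist(updates[c], centroids[j]))
            for c in ids
        }
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [updates[c] for c in ids if assign[c] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Synthetic data: two groups of clients whose updates point in opposite directions.
rng = random.Random(0)
updates = {}
for i in range(10):
    center = (1.0, 0.0) if i < 5 else (-1.0, 0.0)
    updates[i] = [center[0] + rng.gauss(0, 0.05), center[1] + rng.gauss(0, 0.05)]

cohorts = cluster_updates(updates, k=2)
```

Because the server only ever sees update vectors in this sketch, the grouping needs no access to client data; each resulting cohort could then be trained as its own, more homogeneous population.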


## Getting Started
### Install
Follow the installation steps below if you have not installed FedScale yet.
```commandline
docker build -t fedscale:auxo .
docker run --gpus all -it --name auxo -v $FEDSCALE_HOME:/workspace/FedScale fedscale:auxo /bin/bash
```

```
echo export FEDSCALE_HOME=$(pwd) >> ~/.bashrc
echo alias fedscale=\'bash ${FEDSCALE_HOME}/fedscale.sh\' >> ~/.bashrc
source ~/.bashrc
```

### Prepare dataset
After setting up the FedScale environment, download the dataset and partition each client's data into a training set and a test set.

```commandline
fedscale dataset download femnist
cd $FEDSCALE_HOME/examples/auxo
python -m utils.prepare_test_train ../../benchmark/dataset/data/femnist/client_data_mapping/train.csv
python -m utils.prepare_test_train ../../benchmark/dataset/data/femnist/client_data_mapping/test.csv
python -m utils.prepare_test_train ../../benchmark/dataset/data/femnist/client_data_mapping/val.csv
```
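
For intuition, a per-client split can be sketched as follows. This is not the code in `utils.prepare_test_train`, whose behavior is not shown here; the column names (`client_id`, `sample_path`, `label_id`), the 20% hold-out fraction, and the helper `split_per_client` are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def split_per_client(rows, test_fraction=0.2):
    """Split each client's rows into (train, test) without mixing clients.

    Hypothetical helper; `rows` is a list of dicts with a "client_id" key.
    """
    by_client = defaultdict(list)
    for row in rows:
        by_client[row["client_id"]].append(row)

    train, test = [], []
    for client_id in sorted(by_client):
        samples = by_client[client_id]
        # Hold out the last n_test samples; always keep at least one for training.
        n_test = min(int(len(samples) * test_fraction), len(samples) - 1)
        cut = len(samples) - n_test
        train.extend(samples[:cut])
        test.extend(samples[cut:])
    return train, test


# Tiny in-memory stand-in for a client data mapping CSV (column names are assumed).
mapping_csv = io.StringIO(
    "client_id,sample_path,label_id\n"
    + "".join(f"{c},img_{c}_{i}.png,{i % 3}\n" for c in ("a", "b") for i in range(5))
)
rows = list(csv.DictReader(mapping_csv))
train_rows, test_rows = split_per_client(rows, test_fraction=0.2)
```

Splitting inside each client, rather than across the whole file, keeps every client represented in both sets, which matters when each cohort is evaluated on the data of its own members.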
### Run Auxo
```
cd $FEDSCALE_HOME
fedscale driver start benchmark/configs/auxo/auxo.yml
```
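
Before launching, it can help to sanity-check the `worker_ips` entries in the config, which use the form `ip:[processes per GPU]`. The sketch below is a standalone illustration of that format and is not FedScale's own parser; `parse_worker_entry` and `total_processes` are hypothetical names.

```python
import re


def parse_worker_entry(entry):
    """Parse 'ip:[n0,n1,...]' into (ip, [n0, n1, ...])."""
    match = re.fullmatch(r"\s*([^:\s]+)\s*:\s*\[([\d,\s]*)\]\s*", entry)
    if match is None:
        raise ValueError(f"malformed worker entry: {entry!r}")
    ip, counts = match.groups()
    # One integer per GPU; 0 means no process is placed on that GPU.
    per_gpu = [int(n) for n in counts.split(",") if n.strip()]
    return ip, per_gpu


def total_processes(worker_ips):
    """Total number of worker processes requested across all entries."""
    return sum(sum(parse_worker_entry(entry)[1]) for entry in worker_ips)


ip, per_gpu = parse_worker_entry("localhost:[7,7,0,0]")
total = total_processes(["localhost:[7,7,0,0]", "10.0.0.2:[4,4,4,4]"])
```

With the entry from `auxo.yml`, `localhost:[7,7,0,0]`, this yields seven processes on each of the first two GPUs and none on the other two.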

### Visualize continuous clustering algorithm
```commandline
cd $FEDSCALE_HOME/examples/auxo
python playground.py
```
Visualized clustering results:

<p float="left">
<img src="fig/epoch_14.png" width="150" />
<img src="fig/epoch_100.png" width="150" />
<img src="fig/epoch_224.png" width="150" />
<img src="fig/epoch_300.png" width="150" />
<img src="fig/epoch_500.png" width="150" />
<img src="fig/epoch_700.png" width="150" />
</p>

