title: DepthFL: Depthwise Federated Learning for Heterogeneous Clients
labels: image classification, system heterogeneity, cross-device, knowledge distillation
dataset: CIFAR-100

DepthFL: Depthwise Federated Learning for Heterogeneous Clients

Note: If you use this baseline in your work, please remember to cite the original authors of the paper as well as the Flower paper.

Paper: openreview.net/forum?id=pf8RIZTMU58

Authors: Minjae Kim, Sangyoon Yu, Suhyun Kim, Soo-Mook Moon

Abstract: Federated learning is for training a global model without collecting private local data from clients. As they repeatedly need to upload locally-updated weights or gradients instead, clients require both computation and communication resources enough to participate in learning, but in reality their resources are heterogeneous. To enable resource-constrained clients to train smaller local models, width scaling techniques have been used, which reduces the channels of a global model. Unfortunately, width scaling suffers from heterogeneity of local models when averaging them, leading to a lower accuracy than when simply excluding resource-constrained clients from training. This paper proposes a new approach based on depth scaling called DepthFL. DepthFL defines local models of different depths by pruning the deepest layers off the global model, and allocates them to clients depending on their available resources. Since many clients do not have enough resources to train deep local models, this would make deep layers partially-trained with insufficient data, unlike shallow layers that are fully trained. DepthFL alleviates this problem by mutual self-distillation of knowledge among the classifiers of various depths within a local model. Our experiments show that depth-scaled local models build a global model better than width-scaled ones, and that self-distillation is highly effective in training data-insufficient deep layers.

About this baseline

What’s implemented: The code in this directory replicates the CIFAR100 experiments in DepthFL: Depthwise Federated Learning for Heterogeneous Clients (Kim et al., 2023), the paper that proposed the DepthFL algorithm. Concretely, it replicates the CIFAR100 results in Tables 2, 3, and 4.

Datasets: CIFAR100 from PyTorch's Torchvision

Hardware Setup: These experiments were run on a server with Nvidia 3090 GPUs. Any machine with 1x 8GB GPU or more would be able to run it in a reasonable amount of time. With the default settings, clients make use of 1.3GB of VRAM. Lower num_gpus in client_resources to train more clients in parallel on your GPU(s).
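As a rough illustration of how num_gpus affects parallelism, the sketch below estimates how many clients could train at once. It assumes that each client reserves a fixed fraction of one GPU and needs about 1.3GB of VRAM; the function name and its parameters are hypothetical and are not part of this baseline's code.

```python
import math


def max_parallel_clients(
    num_gpus_per_client: float,
    gpu_count: int = 1,
    vram_gb: float = 8.0,
    client_vram_gb: float = 1.3,
) -> int:
    """Estimate how many clients can train at the same time."""
    # Scheduling bound: each client reserves a fraction of one GPU.
    by_fraction = math.floor(1.0 / num_gpus_per_client + 1e-9)
    # Memory bound: each client needs roughly client_vram_gb of VRAM.
    by_memory = math.floor(vram_gb / client_vram_gb)
    return gpu_count * min(by_fraction, by_memory)


if __name__ == "__main__":
    # Default setting from this README: num_gpus = 0.5 on one 8GB GPU.
    print(max_parallel_clients(0.5))
    # A smaller fraction is eventually limited by memory instead.
    print(max_parallel_clients(0.125))
```

Under these assumptions, lowering num_gpus only helps until the memory bound is reached, so the fraction is best chosen with the GPU's VRAM in mind.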

Contributors: Minjae Kim

Experimental Setup

Task: Image Classification

Model: ResNet18

Dataset: This baseline only includes the CIFAR100 dataset. By default, it is partitioned across 100 clients following an IID distribution. The settings are as follows:

| Dataset | #classes | #partitions | partitioning method |
| --- | --- | --- | --- |
| CIFAR100 | 100 | 100 | IID or Non-IID |
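A minimal sketch of what an IID partition over 100 clients can look like is shown below. It assumes the 50,000-image CIFAR100 training set is shuffled and split into equal shards; the function name and the seed are hypothetical, and the baseline's own partitioning code may differ in detail.

```python
import random


def iid_partition(num_samples: int = 50_000, num_clients: int = 100, seed: int = 0):
    """Shuffle sample indices and split them into equal-sized client shards."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    shard_size = num_samples // num_clients
    return [
        indices[i * shard_size:(i + 1) * shard_size]
        for i in range(num_clients)
    ]


if __name__ == "__main__":
    shards = iid_partition()
    print(len(shards), len(shards[0]))
```

Because every shard is drawn uniformly at random, each client sees approximately the same class distribution, which is what distinguishes the IID setting from the Non-IID one.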

Training Hyperparameters: The following table shows the main hyperparameters for this baseline with their default value (i.e. the value used if you run python -m depthfl.main directly)

| Description | Default Value |
| --- | --- |
| total clients | 100 |
| local epoch | 5 |
| batch size | 50 |
| number of rounds | 1000 |
| participation ratio | 10% |
| learning rate | 0.1 |
| learning rate decay | 0.998 |
| client resources | {'num_cpus': 1.0, 'num_gpus': 0.5 } |
| data partition | IID |
| optimizer | SGD with dynamic regularization |
| alpha | 0.1 |
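To make the defaults concrete, the sketch below computes the number of clients sampled per round and the learning rate at a given round. It assumes the decay factor is applied multiplicatively once per round, which this README does not state explicitly; the function names are hypothetical.

```python
def clients_per_round(total_clients: int = 100, participation_ratio: float = 0.10) -> int:
    """Number of clients sampled in each round."""
    return max(1, round(total_clients * participation_ratio))


def learning_rate(round_idx: int, base_lr: float = 0.1, decay: float = 0.998) -> float:
    """Learning rate at a round, assuming one multiplicative decay step per round."""
    return base_lr * decay ** round_idx


if __name__ == "__main__":
    print(clients_per_round())
    for r in (0, 500, 1000):
        print(r, learning_rate(r))
```

Under this assumption, the learning rate falls from 0.1 to roughly 0.0135 over the default 1000 rounds.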

Environment Setup

To construct the Python environment follow these steps:

# Set python version
pyenv install 3.10.6
pyenv local 3.10.6

# Tell poetry to use python 3.10
poetry env use 3.10.6

# Install the base Poetry environment
poetry install

# Activate the environment
poetry shell

Running the Experiments

To run this DepthFL baseline, first ensure you have activated your Poetry environment (execute poetry shell from this directory), then:

# this will run using the default settings in the `conf/config.yaml`
python -m depthfl.main  # 'accuracy' : accuracy of the ensemble model, 'accuracy_single' : accuracy of each classifier.

# you can override settings directly from the command line
python -m depthfl.main exclusive_learning=true model_size=1 # exclusive learning - 100% (a)
python -m depthfl.main exclusive_learning=true model_size=4 # exclusive learning - 25% (d)
python -m depthfl.main fit_config.feddyn=false fit_config.kd=false # DepthFL (FedAvg)
python -m depthfl.main fit_config.feddyn=false fit_config.kd=false fit_config.extended=false # InclusiveFL

To run using HeteroFL:

# sBN (static batch normalization) takes a long time, so the global model is tested only every 50 rounds.
python -m depthfl.main --config-name="heterofl" # HeteroFL
python -m depthfl.main --config-name="heterofl" exclusive_learning=true model_size=1 # exclusive learning - 100% (a)

Stateful clients comment

Implementing FedDyn requires stateful clients that store prev_grads. Since flwr does not yet officially support stateful clients, this baseline uses a temporary workaround: each client loads its prev_grads from disk when it is created and writes it back to disk after training. The per-client state files are kept in the prev_grads folder. When the strategy is instantiated (for both FedDyn and HeteroFL), the contents of prev_grads are reset.
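The load-on-create, save-after-training pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the baseline's actual code: the helper names, the one-pickle-file-per-client naming scheme, and the list-of-floats state are all hypothetical; only the prev_grads folder name comes from this README.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


def state_path(state_dir: Path, cid: int) -> Path:
    # Hypothetical naming scheme: one pickle file per client id.
    return state_dir / f"client_{cid}.pkl"


def load_state(state_dir: Path, cid: int, default=None):
    """Return the saved state for client `cid`, or `default` if none exists."""
    path = state_path(state_dir, cid)
    if not path.exists():
        return default
    with path.open("rb") as f:
        return pickle.load(f)


def save_state(state_dir: Path, cid: int, state) -> None:
    """Write the client's state back to disk after local training."""
    state_dir.mkdir(parents=True, exist_ok=True)
    with state_path(state_dir, cid).open("wb") as f:
        pickle.dump(state, f)


def reset_states(state_dir: Path) -> None:
    """Clear all client state, as done when the strategy is instantiated."""
    shutil.rmtree(state_dir, ignore_errors=True)
    state_dir.mkdir(parents=True, exist_ok=True)


def demo() -> list:
    """Round trip in a temporary directory; returns the observed states."""
    with tempfile.TemporaryDirectory() as tmp:
        state_dir = Path(tmp) / "prev_grads"
        reset_states(state_dir)
        first = load_state(state_dir, cid=3, default=[0.0, 0.0])
        save_state(state_dir, 3, [g + 0.5 for g in first])
        second = load_state(state_dir, cid=3, default=[0.0, 0.0])
        reset_states(state_dir)
        third = load_state(state_dir, cid=3, default=[0.0, 0.0])
        return [first, second, third]
```

Keeping the state on disk means a client object can be discarded between rounds and recreated later without losing its FedDyn history, at the cost of extra file I/O per round.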

Expected Results

With the following commands, we run DepthFL (FedDyn / FedAvg), InclusiveFL, and HeteroFL to replicate the results of Tables 2, 3, and 4 in the DepthFL paper. Some experiments appear in more than one of these tables, so a few commands are repeated.

# table 2 (HeteroFL row)
python -m depthfl.main --config-name="heterofl" 
python -m depthfl.main --config-name="heterofl" --multirun exclusive_learning=true model.scale=false model_size=1,2,3,4 

# table 2 (DepthFL(FedAvg) row)
python -m depthfl.main fit_config.feddyn=false fit_config.kd=false 
python -m depthfl.main --multirun fit_config.feddyn=false fit_config.kd=false  exclusive_learning=true model_size=1,2,3,4

# table 2 (DepthFL row)
python -m depthfl.main
python -m depthfl.main --multirun exclusive_learning=true model_size=1,2,3,4

Table 2

The 100% (a), 75% (b), 50% (c), and 25% (d) cases are exclusive learning scenarios. In 100% (a) exclusive learning, the global model and every local model are equal to the smallest local model, and 100% of clients participate in learning. Likewise, in 25% (d) exclusive learning, the global model and every local model are equal to the largest local model, and only 25% of clients participate in learning.

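| Scaling Method | Dataset | Global Model | 100% (a) | 75% (b) | 50% (c) | 25% (d) |
| --- | --- | --- | --- | --- | --- | --- |
| HeteroFL | CIFAR100 | 57.61 | 64.39 | 66.08 | 62.03 | 51.99 |
| DepthFL (FedAvg) | CIFAR100 | 72.67 | 67.08 | 70.78 | 68.41 | 59.17 |
| DepthFL | CIFAR100 | 76.06 | 69.68 | 73.21 | 70.29 | 60.32 |
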
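
The relationship between the model_size override and the exclusive learning columns of Table 2 can be sketched as below. The README states the mapping only for model_size=1 (100% (a)) and model_size=4 (25% (d)); the intermediate values assume that clients are split evenly across four resource tiers, and the function name is hypothetical.

```python
def exclusive_participation(model_size: int, num_sizes: int = 4) -> float:
    """Fraction of clients able to train a model of the given size.

    Assumes clients are spread evenly over `num_sizes` resource tiers and
    that a client can train any model up to its own tier.
    """
    if not 1 <= model_size <= num_sizes:
        raise ValueError(f"model_size must be in 1..{num_sizes}")
    return (num_sizes - model_size + 1) / num_sizes


if __name__ == "__main__":
    for size, case in zip(range(1, 5), "abcd"):
        print(f"model_size={size} -> {exclusive_participation(size):.0%} ({case})")
```

This makes the trade-off in exclusive learning explicit: a larger model is trained by fewer clients, and therefore on less data.
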
# table 3 (Width Scaling - Duplicate results from table 2)
python -m depthfl.main --config-name="heterofl" 
python -m depthfl.main --config-name="heterofl" --multirun exclusive_learning=true model.scale=false model_size=1,2,3,4 

# table 3 (Depth Scaling : Exclusive Learning, DepthFL(FedAvg) rows - Duplicate results from table 2)
python -m depthfl.main fit_config.feddyn=false fit_config.kd=false 
python -m depthfl.main --multirun fit_config.feddyn=false fit_config.kd=false  exclusive_learning=true model_size=1,2,3,4

# table 3 (Depth Scaling - InclusiveFL row)
python -m depthfl.main fit_config.feddyn=false fit_config.kd=false fit_config.extended=false

Table 3

Accuracy of global sub-models compared to exclusive learning on CIFAR-100.

| Method | Algorithm | Classifier 1/4 | Classifier 2/4 | Classifier 3/4 | Classifier 4/4 |
| --- | --- | --- | --- | --- | --- |
| Width Scaling | Exclusive Learning | 64.39 | 66.08 | 62.03 | 51.99 |
| Width Scaling | HeteroFL | 51.08 | 55.89 | 58.29 | 57.61 |
| Depth Scaling | Exclusive Learning | 67.08 | 68.00 | 66.19 | 56.78 |
| Depth Scaling | InclusiveFL | 47.61 | 53.88 | 59.48 | 60.46 |
| Depth Scaling | DepthFL (FedAvg) | 66.18 | 67.56 | 67.97 | 68.01 |

# table 4
python -m depthfl.main --multirun fit_config.kd=true,false dataset_config.iid=true,false

Table 4

Accuracy of the global model with/without self distillation on CIFAR-100.

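| Distribution | Dataset | KD | Classifier 1/4 | Classifier 2/4 | Classifier 3/4 | Classifier 4/4 | Ensemble |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IID | CIFAR100 | No | 70.13 | 69.63 | 68.92 | 68.92 | 74.48 |
| IID | CIFAR100 | Yes | 71.74 | 73.35 | 73.57 | 73.55 | 76.06 |
| non-IID | CIFAR100 | No | 67.94 | 68.68 | 68.46 | 67.78 | 73.18 |
| non-IID | CIFAR100 | Yes | 70.33 | 71.88 | 72.43 | 72.34 | 74.92 |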
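
For readers unfamiliar with the self-distillation toggled by fit_config.kd, the sketch below shows one generic way to write a mutual self-distillation loss for a single sample: each classifier gets a cross-entropy term plus the average KL divergence from the other classifiers' predictions. This is an assumption-laden illustration in plain Python; the function names are hypothetical, and the baseline's actual loss (weighting, temperature, framework) may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_self_distillation_loss(classifier_logits, label, use_kd=True):
    """Cross-entropy per classifier plus averaged pairwise KL terms."""
    probs = [softmax(logits) for logits in classifier_logits]
    n = len(probs)
    # Supervised term: every classifier is trained on the true label.
    loss = sum(-math.log(p[label] + 1e-12) for p in probs)
    if use_kd and n > 1:
        # Distillation term: classifier i learns from every other classifier j.
        kd = sum(
            kl_divergence(probs[j], probs[i])
            for i in range(n)
            for j in range(n)
            if j != i
        )
        loss += kd / (n - 1)
    return loss


if __name__ == "__main__":
    logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]  # two classifiers, three classes
    print(mutual_self_distillation_loss(logits, label=0, use_kd=False))
    print(mutual_self_distillation_loss(logits, label=0, use_kd=True))
```

When all classifiers agree, the distillation term vanishes and the loss reduces to the sum of the cross-entropy terms; disagreement between classifiers is what the extra term penalizes.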