AdaBest: Minimizing Client Drift in Federated Learning via Adaptive Bias Estimation

Varno, Farshid; Saghayi, Marzie; Rafiee Sevyeri, Laya; Gupta, Sharut; Matwin, Stan; Havaei, Mohammad

doi:10.1007/978-3-031-20050-2_41

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13683)

Included in the following conference series:

European Conference on Computer Vision

Abstract

In Federated Learning (FL), a number of clients or devices collaborate to train a model without sharing their data. Models are optimized locally at each client and further communicated to a central hub for aggregation. While FL is an appealing decentralized training paradigm, heterogeneity among data from different clients can cause the local optimization to drift away from the global objective. In order to estimate and therefore remove this drift, variance reduction techniques have been incorporated into FL optimization recently. However, these approaches inaccurately estimate the clients’ drift and ultimately fail to remove it properly. In this work, we propose an adaptive algorithm that accurately estimates drift across clients. In comparison to previous works, our approach necessitates less storage and communication bandwidth, as well as lower compute costs. Additionally, our proposed methodology induces stability by constraining the norm of estimates for client drift, making it more practical for large scale FL. Experimental findings demonstrate that the proposed algorithm converges significantly faster and achieves higher accuracy than the baselines across various FL benchmarks.
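
The drift described above can be reproduced on a toy problem. The following is a minimal sketch, not the AdaBest algorithm: it runs plain federated averaging on two hypothetical clients with one-dimensional quadratic objectives f_i(θ) = 0.5 · a_i · (θ − c_i)². The curvatures, minima, learning rate, and function names are assumptions chosen for illustration only. With a single local step per round the aggregated model settles at the minimizer of the global objective, whereas many local steps pull each client towards its own minimum and the average settles elsewhere.

```python
import numpy as np


def local_update(theta, curvature, minimum, lr, steps):
    """Run `steps` gradient steps on f(theta) = 0.5 * curvature * (theta - minimum)**2."""
    for _ in range(steps):
        grad = curvature * (theta - minimum)
        theta = theta - lr * grad
    return theta


def fedavg(curvatures, minima, lr, local_steps, rounds, theta0=0.0):
    """Plain federated averaging with full participation and equal client weights."""
    theta = theta0
    for _ in range(rounds):
        # Every client starts from the current global model and optimizes locally.
        local_models = [
            local_update(theta, a, c, lr, local_steps)
            for a, c in zip(curvatures, minima)
        ]
        # The server aggregates by averaging the returned models.
        theta = float(np.mean(local_models))
    return theta


# Two heterogeneous clients (illustrative values, not taken from the paper).
curvatures = np.array([1.0, 10.0])
minima = np.array([0.0, 1.0])

# Minimizer of the global objective, i.e. the average of the client objectives.
global_optimum = float(np.sum(curvatures * minima) / np.sum(curvatures))

one_step = fedavg(curvatures, minima, lr=0.05, local_steps=1, rounds=200)
many_steps = fedavg(curvatures, minima, lr=0.05, local_steps=50, rounds=200)

print("global optimum:        ", global_optimum)
print("FedAvg, 1 local step:  ", one_step)
print("FedAvg, 50 local steps:", many_steps)
```

The gap between the last two printed values is the effect that drift-estimation methods aim to remove; how AdaBest estimates it is specified in the paper itself, not in this sketch.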

Notes

1. The Oracle dataset is the hypothetical dataset formed by stacking all clients’ data. Oracle gradients are the full-batch gradients computed on the Oracle dataset.
2. In contrast to cross-silo FL, cross-device FL refers to a large-scale setting (in terms of the number of clients) in which the clients are devices such as smartphones.
3. Recall that Federated Learning is a sub-branch of distributed learning with specific characteristics geared towards practicality [17].
4. The FedDyn paper additionally compares with FedProx [13]; however, as its benchmarks show, FedProx performs closer to FedAvg than the other baselines.
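
As a small illustration of the Oracle gradient defined in the first note, the sketch below stacks the data of three hypothetical clients and checks that the full-batch gradient on the stacked dataset equals the size-weighted average of the clients' own full-batch gradients. The least-squares loss, the client sizes, and all names are assumptions made for this example; the paper does not prescribe them.

```python
import numpy as np

rng = np.random.default_rng(0)


def full_batch_grad(X, y, w):
    """Gradient of the loss 0.5 * mean((X @ w - y)**2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)


# Three hypothetical clients holding different amounts of data.
client_sizes = [5, 20, 50]
clients = [(rng.normal(size=(n, 3)), rng.normal(size=n)) for n in client_sizes]
w = rng.normal(size=3)

# Oracle dataset: all client data stacked together.
X_oracle = np.vstack([X for X, _ in clients])
y_oracle = np.concatenate([y for _, y in clients])
oracle_grad = full_batch_grad(X_oracle, y_oracle, w)

# Size-weighted average of the per-client full-batch gradients.
weights = np.array(client_sizes) / sum(client_sizes)
weighted_client_grad = sum(
    p * full_batch_grad(X, y, w) for p, (X, y) in zip(weights, clients)
)

print(np.allclose(oracle_grad, weighted_client_grad))
```

The identity holds only when every client evaluates its gradient at the same parameters w; once clients take several local steps from different points, their gradients no longer average to the Oracle gradient.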

References

1. Acar, D.A.E., Zhao, Y., Matas, R., Mattina, M., Whatmough, P., Saligrama, V.: Federated learning based on dynamic regularization. In: International Conference on Learning Representations (2020)
2. Ajalloeian, A., Stich, S.U.: On the convergence of SGD with biased gradients. arXiv preprint arXiv:2008.00051 (2020)
3. Harikandeh, R.B., Ahmed, M.O., Virani, A., Schmidt, M., Konečný, J., Sallinen, S.: Stop wasting my gradients: practical SVRG. Adv. Neural Inf. Process. Syst. 28 (2015)
4. Bi, J., Gunn, S.R.: A variance controlled stochastic method with biased estimation for faster non-convex optimization. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds.) ECML PKDD 2021. LNCS (LNAI), vol. 12977, pp. 135–150. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86523-8_9
5. Hsu, T.M.H., Qi, H., Brown, M.: Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335 (2019)
6. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst. 26, 315–323 (2013)
7. Karimireddy, S.P., Kale, S., Mohri, M., Reddi, S., Stich, S., Suresh, A.T.: SCAFFOLD: stochastic controlled averaging for federated learning. In: International Conference on Machine Learning, pp. 5132–5143. PMLR (2020)
8. Konečný, J., Liu, J., Richtárik, P., Takáč, M.: Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J. Sel. Top. Signal Process. 10(2), 242–255 (2015)
9. Konečný, J., McMahan, H.B., Ramage, D., Richtárik, P.: Federated optimization: distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527 (2016)
10. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. University of Toronto, Technical report (2009)
11. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
12. Li, D., Wang, J.: FedMD: heterogenous federated learning via model distillation. arXiv preprint arXiv:1910.03581 (2019)
13. Li, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V.: Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2, 429–450 (2020)
14. Li, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V.: FedDANE: a federated Newton-type method. In: 2019 53rd Asilomar Conference on Signals, Systems, and Computers, pp. 1227–1231. IEEE (2019)
15. Liang, X., Shen, S., Liu, J., Pan, Z., Chen, E., Cheng, Y.: Variance reduced local SGD with lower communication complexity. arXiv preprint arXiv:1912.12844 (2019)
16. Lin, T., Kong, L., Stich, S.U., Jaggi, M.: Ensemble distillation for robust model fusion in federated learning. Adv. Neural Inf. Process. Syst. 33, 2351–2363 (2020)
17. McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics, pp. 1273–1282. PMLR (2017)
18. Murata, T., Suzuki, T.: Bias-variance reduced local SGD for less heterogeneous federated learning. In: International Conference on Machine Learning, pp. 7872–7881. PMLR (2021)
19. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621. PMLR (2017)
20. Pathak, R., Wainwright, M.J.: FedSplit: an algorithmic framework for fast federated optimization. Adv. Neural Inf. Process. Syst. 33, 7057–7066 (2020)
21. Reddi, S.J., Konečný, J., Richtárik, P., Póczós, B., Smola, A.: AIDE: fast and communication efficient distributed optimization. arXiv preprint arXiv:1608.06879 (2016)
22. Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. Adv. Neural Inf. Process. Syst. 25 (2012)
23. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14(2) (2013)
24. Shamir, O., Srebro, N., Zhang, T.: Communication-efficient distributed optimization using an approximate Newton-type method. In: International Conference on Machine Learning, pp. 1000–1008. PMLR (2014)
25. Stich, S.U.: Local SGD converges fast and communicates little. In: International Conference on Learning Representations (2018)
26. Wang, J., Liu, Q., Liang, H., Joshi, G., Poor, H.V.: Tackling the objective inconsistency problem in heterogeneous federated optimization. Adv. Neural Inf. Process. Syst. 33, 7611–7623 (2020)
27. Wang, J., Tantia, V., Ballas, N., Rabbat, M.: SlowMo: improving communication-efficient distributed SGD with slow momentum. arXiv preprint arXiv:1910.00643 (2019)
28. Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4), 2057–2075 (2014)
29. Yu, H., Jin, R., Yang, S.: On the linear speedup analysis of communication efficient momentum SGD for distributed non-convex optimization. In: International Conference on Machine Learning, pp. 7184–7193. PMLR (2019)
30. Zhang, X., Hong, M., Dhople, S., Yin, W., Liu, Y.: FedPD: a federated learning framework with optimal rates and adaptivity to Non-IID data. arXiv preprint arXiv:2005.11418 (2020)
31. Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., Chandra, V.: Federated learning with Non-IID data. arXiv preprint arXiv:1806.00582 (2018)
32. Zhu, L., Han, S.: Deep leakage from gradients. In: Federated Learning, pp. 17–31. Springer (2020)
33. Zhu, Z., Hong, J., Zhou, J.: Data-free knowledge distillation for heterogeneous federated learning. In: International Conference on Machine Learning, pp. 12878–12889. PMLR (2021)

Acknowledgments

The first author wishes to express gratitude for the financial support provided by MITACS and Research Nova Scotia. In addition, the fifth author acknowledges the Natural Sciences and Engineering Research Council of Canada, CHIST-ERA grant CHIST-ERA-19-XAI-0, and the Polish NCN Agency (grant No. 2020/02/Y/ST6/00064). We are grateful to Sai Praneeth Karimireddy, the first author of [7], for enlightening us on the proper implementation of SCAFFOLD. William Taylor-Melanson is also acknowledged for reviewing this paper and providing numerous helpful comments.

Author information

Authors and Affiliations

Dalhousie University, Halifax, Canada
Farshid Varno, Marzie Saghayi & Stan Matwin
Imagia Cybernetics Inc., Montreal, Canada
Farshid Varno, Laya Rafiee Sevyeri, Sharut Gupta & Mohammad Havaei
Concordia University, Montreal, Canada
Laya Rafiee Sevyeri
Indian Institute of Technology Delhi, New Delhi, India
Sharut Gupta
Polish Academy of Sciences, Warsaw, Poland
Stan Matwin

Corresponding author

Correspondence to Farshid Varno.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

About this paper

Cite this paper

Varno, F., Saghayi, M., Rafiee Sevyeri, L., Gupta, S., Matwin, S., Havaei, M. (2022). AdaBest: Minimizing Client Drift in Federated Learning via Adaptive Bias Estimation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13683. Springer, Cham. https://doi.org/10.1007/978-3-031-20050-2_41

DOI: https://doi.org/10.1007/978-3-031-20050-2_41
Published: 28 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20049-6
Online ISBN: 978-3-031-20050-2
eBook Packages: Computer Science, Computer Science (R0)
