Fair detection of poisoning attacks in federated learning on non-i.i.d. data

Data Mining and Knowledge Discovery

Abstract

Reconciling machine learning with individual privacy is one of the main motivations behind federated learning (FL), a decentralized machine learning technique that aggregates partial models trained by clients on their own private data to obtain a global deep learning model. Even though FL provides stronger privacy guarantees to the participating clients than centralized learning, which collects the clients’ data in a central server, FL is vulnerable to attacks whereby malicious clients submit bad updates in order to prevent the model from converging or, more subtly, to introduce artificial bias in the classification (poisoning). Poisoning detection techniques compute statistics on the updates to identify malicious clients. A downside of anti-poisoning techniques is that they might discriminate against minority groups whose data are significantly and legitimately different from those of the majority of clients. This would not only be unfair, but would also yield poorer models that fail to capture the knowledge in the training data, especially when data are not independent and identically distributed (non-i.i.d.). In this work, we strive to strike a balance between fighting poisoning and accommodating diversity, to help learn fairer and less discriminatory federated learning models. In this way, we forestall the exclusion of diverse clients while still ensuring detection of poisoning attacks. Empirical work on three data sets shows that employing our approach to tell legitimate from malicious updates produces models that are more accurate than those obtained with state-of-the-art poisoning detection techniques. Additionally, we explore the impact of our proposal on the performance of models trained on non-i.i.d. local training data.
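The aggregation step the abstract refers to can be illustrated with a minimal weighted-average (FedAvg-style) sketch. This is not the paper's method, only a generic illustration; the function name `fedavg` and the use of flat parameter vectors are assumptions for brevity.

```python
import numpy as np

def fedavg(updates, weights):
    """Aggregate client parameter vectors by their weighted average.

    updates: list of 1-D parameter vectors, one per client
    weights: relative client weights, e.g. local data set sizes
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()   # normalize so the weights sum to 1
    stacked = np.stack(updates)         # shape: (n_clients, n_params)
    return weights @ stacked            # weighted average, parameter-wise

# Two clients with equal weight: the global model is the plain mean.
global_model = fedavg([np.array([1.0, 3.0]), np.array([3.0, 5.0])], [1, 1])
```

Poisoning detectors typically inspect the individual `updates` before this averaging step, which is where a legitimately different minority client risks being rejected alongside the attackers.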

Fig. 1

Notes

  1. Choosing K is not trivial, since the number of clusters is not known in advance. One can vary K until useful results are obtained.
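One common way to "vary K until useful results are obtained" is to score each candidate clustering, for instance with the silhouette coefficient. The sketch below is only an illustration of that search, not the paper's procedure; the helper name `pick_k`, the K range, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_k(points, k_range=range(2, 8), seed=0):
    """Vary K and keep the value with the best silhouette score."""
    best_k, best_score = None, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(points)
        score = silhouette_score(points, labels)  # in [-1, 1], higher is better
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Three well-separated 2-D blobs: the silhouette search should recover K = 3.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in (0.0, 5.0, 10.0)])
```

Other model-selection heuristics (elbow method, gap statistic) could be substituted for the silhouette score with the same loop structure.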

  2. Note that the fairness metrics computed in what follows refer to clients, more precisely to the decision made by the model manager to accept or reject a client’s update. This differs from fairness referred to subjects, where the decision is made by the classifier when classifying a subject’s record into the positive or negative category (e.g. > $50K vs. ≤ $50K in the case of Adult). To avoid confusion between the two types of fairness, we did not compute PE or EO in the centralized baseline tables (Tables 2, 3, and 4).
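Client-level fairness of the accept/reject decision can be made concrete by comparing, per group, how often benign clients are wrongly rejected (a predictive-equality-style gap). The sketch below is a generic illustration, not the paper's metric; the function name, the groups "A"/"B", and all the decision data are hypothetical.

```python
def rejection_rate(decisions, benign, group, g):
    """Fraction of benign clients in group g whose updates were rejected."""
    idx = [i for i in range(len(decisions)) if benign[i] and group[i] == g]
    return sum(1 for i in idx if decisions[i] == "reject") / len(idx)

# Hypothetical accept/reject decisions by the model manager for six benign
# clients, split into a majority group "A" and a minority group "B".
decisions = ["accept", "accept", "reject", "accept", "reject", "reject"]
benign    = [True] * 6
group     = ["A", "A", "A", "B", "B", "B"]

# A nonzero gap means benign minority clients are rejected more often than
# benign majority clients, i.e. the detector discriminates between groups.
gap = abs(rejection_rate(decisions, benign, group, "A")
          - rejection_rate(decisions, benign, group, "B"))
```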

  3. https://github.com/yjlee22/FedShare.


Acknowledgements

We are indebted to the late David Rebollo-Monedero for his helpful comments and contributions on an earlier version of this paper. The following funding sources are gratefully acknowledged: European Commission (projects H2020-871042 “SoBigData++” and H2020-101006879 “MobiDataLab”), the Government of Catalonia (ICREA Acadèmia Prize to J. Domingo-Ferrer), and the Spanish MCIN/AEI/10.13039/501100011033/FEDER, UE under project PID2021-123637NB-I00 “CURLING”. The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are their own and are not necessarily shared by UNESCO.

Author information

Corresponding author: Josep Domingo-Ferrer.

Additional information

Responsible editor: Toon Calders, Salvatore Ruggieri, Bodo Rosenhahn, Mykola Pechenizkiy and Eirini Ntoutsi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Singh, A.K., Blanco-Justicia, A. & Domingo-Ferrer, J. Fair detection of poisoning attacks in federated learning on non-i.i.d. data. Data Min Knowl Disc 37, 1998–2023 (2023). https://doi.org/10.1007/s10618-022-00912-6
