Fair detection of poisoning attacks in federated learning on non-i.i.d. data

Data Mining and Knowledge Discovery

Abstract

Reconciling machine learning with individual privacy is one of the main motivations behind federated learning (FL), a decentralized machine learning technique that aggregates partial models trained by clients on their own private data to obtain a global deep learning model. Even though FL provides stronger privacy guarantees to the participating clients than centralized learning, which collects the clients’ data in a central server, FL is vulnerable to attacks whereby malicious clients submit bad updates in order to prevent the model from converging or, more subtly, to introduce artificial bias in the classification (poisoning). Poisoning detection techniques compute statistics on the updates to identify malicious clients. A downside of anti-poisoning techniques is that they might discriminate against minority groups whose data are significantly and legitimately different from those of the majority of clients. This would not only be unfair, but would also yield poorer models that fail to capture the knowledge in the training data, especially when data are not independent and identically distributed (non-i.i.d.). In this work, we strive to strike a balance between fighting poisoning and accommodating diversity, to help learn fairer and less discriminatory federated learning models. In this way, we forestall the exclusion of diverse clients while still ensuring detection of poisoning attacks. Empirical work on three data sets shows that employing our approach to tell legitimate from malicious updates produces models that are more accurate than those obtained with state-of-the-art poisoning detection techniques. Additionally, we explore the impact of our proposal on the performance of models trained on non-i.i.d. local training data.
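The aggregation step the abstract refers to can be illustrated with a minimal weighted-average (FedAvg-style) sketch. This is not the paper's method, only a generic illustration; the function name `fedavg` and the use of flat parameter vectors are assumptions for brevity.

```python
import numpy as np

def fedavg(updates, weights):
    """Aggregate client parameter vectors by their weighted average.

    updates: list of 1-D parameter vectors, one per client
    weights: relative client weights, e.g. local data set sizes
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()   # normalize so the weights sum to 1
    stacked = np.stack(updates)         # shape: (n_clients, n_params)
    return weights @ stacked            # weighted average, parameter-wise

# Two clients with equal weight: the global model is the plain mean.
global_model = fedavg([np.array([1.0, 3.0]), np.array([3.0, 5.0])], [1, 1])
```

Poisoning detectors typically inspect the individual `updates` before this averaging step, which is where a legitimately different minority client risks being rejected alongside the attackers.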

Fig. 1

Notes

  1. Choosing K is not trivial, since the number of clusters is not known in advance. One can vary K until useful results are obtained.
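One common way to "vary K until useful results are obtained" is to score each candidate clustering, for instance with the silhouette coefficient. The sketch below is only an illustration of that search, not the paper's procedure; the helper name `pick_k`, the K range, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_k(points, k_range=range(2, 8), seed=0):
    """Vary K and keep the value with the best silhouette score."""
    best_k, best_score = None, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(points)
        score = silhouette_score(points, labels)  # in [-1, 1], higher is better
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Three well-separated 2-D blobs: the silhouette search should recover K = 3.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in (0.0, 5.0, 10.0)])
```

Other model-selection heuristics (elbow method, gap statistic) could be substituted for the silhouette score with the same loop structure.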

  2. Note that the fairness metrics computed in what follows refer to clients, more precisely to the decision made by the model manager to accept or reject a client’s update. This differs from fairness referred to subjects, where the decision is made by the classifier when classifying a subject’s record into the positive or negative category (e.g. > $50K vs. ≤ $50K in the case of Adult). To avoid confusion between the two types of fairness, we did not compute PE or EO in the centralized baseline tables (Tables 2, 3, and 4).
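Client-level fairness of the accept/reject decision can be made concrete by comparing, per group, how often benign clients are wrongly rejected (a predictive-equality-style gap). The sketch below is a generic illustration, not the paper's metric; the function name, the groups "A"/"B", and all the decision data are hypothetical.

```python
def rejection_rate(decisions, benign, group, g):
    """Fraction of benign clients in group g whose updates were rejected."""
    idx = [i for i in range(len(decisions)) if benign[i] and group[i] == g]
    return sum(1 for i in idx if decisions[i] == "reject") / len(idx)

# Hypothetical accept/reject decisions by the model manager for six benign
# clients, split into a majority group "A" and a minority group "B".
decisions = ["accept", "accept", "reject", "accept", "reject", "reject"]
benign    = [True] * 6
group     = ["A", "A", "A", "B", "B", "B"]

# A nonzero gap means benign minority clients are rejected more often than
# benign majority clients, i.e. the detector discriminates between groups.
gap = abs(rejection_rate(decisions, benign, group, "A")
          - rejection_rate(decisions, benign, group, "B"))
```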

  3. https://github.com/yjlee22/FedShare.


Acknowledgements

We are indebted to the late David Rebollo-Monedero for his helpful comments and contributions on an earlier version of this paper. The following funding sources are gratefully acknowledged: European Commission (projects H2020-871042 “SoBigData++” and H2020-101006879 “MobiDataLab”), the Government of Catalonia (ICREA Acadèmia Prize to J. Domingo-Ferrer), and the Spanish MCIN/AEI/10.13039/501100011033/FEDER, UE under project PID2021-123637NB-I00 “CURLING”. The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are their own and are not necessarily shared by UNESCO.

Author information

Corresponding author: Josep Domingo-Ferrer.

Additional information

Responsible editor: Toon Calders, Salvatore Ruggieri, Bodo Rosenhahn, Mykola Pechenizkiy and Eirini Ntoutsi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Singh, A.K., Blanco-Justicia, A. & Domingo-Ferrer, J. Fair detection of poisoning attacks in federated learning on non-i.i.d. data. Data Min Knowl Disc 37, 1998–2023 (2023). https://doi.org/10.1007/s10618-022-00912-6
