Abstract
We revisit the problem of fair clustering, first introduced by Chierichetti et al. (Fair clustering through fairlets, 2017), which requires each protected attribute to have approximately equal representation in every cluster, i.e., a Balance property. Existing solutions to fair clustering are either not scalable or do not achieve an optimal trade-off between clustering objectives and fairness. In this paper, we propose a new notion of fairness which we call \(\varvec{\tau }\)-ratio fairness, that strictly generalizes the Balance property and enables a fine-grained efficiency vs. fairness trade-off. Furthermore, we show that a simple greedy round-robin-based algorithm achieves this trade-off efficiently. Under a more general setting of multi-valued protected attributes, we rigorously analyze the theoretical properties of the proposed algorithm, the Fair Round-Robin Algorithm for Clustering Over-End (\({\textsc {FRAC}}_{OE}\)). We also propose a heuristic algorithm, Fair Round-Robin Algorithm for Clustering (FRAC), that applies round-robin allocation at each iteration of a vanilla clustering algorithm. Our experimental results suggest that both FRAC and \({\textsc {FRAC}}_{OE}\) outperform all the state-of-the-art algorithms and work exceptionally well even for a large number of clusters.












Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Availability of data and material
All datasets used in the experiments are publicly available on UCI repository.
Notes
Throughout the paper, for simplicity, we call a k-clustering algorithm as a clustering algorithm.
Otherwise, the problem is uninteresting as the balanced clustering may not be feasible.
It is unclear why a certain assignment to a specific cluster helps maintain a fairness guarantee?
For the sake of simplicity, we assume \(\tau _{\ell } n_{\ell } \in {\mathbb {N}}\) and ignore the floor notation.
Note that an optimal fair allocation need not be unique. Our result holds for any optimal fair allocation.
This cycle would have resulted in further reduction of cost with respect to the points in the cycle.
References
Abbasi M, Bhaskara A, Venkatasubramanian S (2021) Fair clustering via equitable group representations. In: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pp 504–514
Abraham SS, Padmanabhan D, Sundaram SS (2020) Fairness in clustering with multiple sensitive attributes. In: EDBT/ICDT 2020 joint conference, pp 287–298
Ahmadian S, Epasto A, Kumar R, Mahdian M (2019) Clustering without over-representation. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, pp 267–275
Ahmadian S, Epasto A, Kumar R, Mahdian M (2020) Fair correlation clustering. In: Chiappa S, Calandra R (eds) Proceedings of the twenty third international conference on artificial intelligence and statistics, volume 108 of proceedings of machine learning research, pp 4195–4205
Anderson N, Bera SK, Das S, Liu Y (2020) Distributional individual fairness in clustering. arXiv:2006.12589
Anegg G, Angelidakis H, Kurpisz A, Zenklusen R (2020) A technique for obtaining true approximations for k-center with covering constraints. In: International conference on integer programming and combinatorial optimization, pp 52–65. Springer
Backurs A, Indyk P, Onak K, Schieber B, Vakilian A, Wagner T (2019) Scalable fair clustering. In: International conference on machine learning, pp 405–413
Bandyapadhyay S, Fomin FV, Simonov K (2020) On coresets for fair clustering in metric and euclidean spaces and their applications. arXiv:2007.10137
Bandyapadhyay S, Inamdar T, Pai S, Varadarajan K (2019) A constant approximation for colorful k-center. arXiv:1907.08906
Banerjee A, Ghosh J (2006) Scalable clustering algorithms with balancing constraints. Data Min Knowl Discov 13(3):365–395
Barocas S, Selbst AD (2016) Big data’s disparate impact. California Law Rev, 671–732
Baumann E, Rumberger JL (2018) State of the art in fair ML: from moral philosophy and legislation to fair classifiers. CoRR arXiv:1811.09539
Bera S, Chakrabarty D, Flores N, Negahbani M (2019) Fair algorithms for clustering. Adv Neural Inf Process Syst 32:4954–4965
Bercea IO, Groß M, Khuller S, Kumar A, Rösner C, Schmidt DR, Schmidt M (2018) On the cost of essentially fair clusterings. arXiv:1811.10319
Böhm M, Fazzone A, Leonardi S, Schwiegelshohn C (2020) Fair clustering with multiple colors. arXiv:2002.07892
Bose A, Hamilton W (2019) Compositional fairness constraints for graph embeddings. In: International conference on machine learning, pp 715–724
Bottou L, Bengio Y (1994) Convergence properties of the k-means algorithms. Advances in neural information processing systems 7
Brubach B, Chakrabarti D, Dickerson J, Khuller S, Srinivasan A, Tsepenekas L (2020) A pairwise fair and community-preserving approach to k-center clustering. In: International conference on machine learning, pp 1178–1189
Brubach B, Chakrabarti D, Dickerson JP, Srinivasan A, Tsepenekas L (2021) Fairness, semi-supervised learning, and more: A general framework for clustering with stochastic pairwise constraints. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, pp 6822–6830
Byrka J, Pensyl T, Rybicki B, Srinivasan A, Trinh K (2014) An improved approximation for k-median, and positive correlation in budgeted optimization. In: Proceedings of the twenty-sixth annual ACM-SIAM symposium on discrete algorithms, pp 737–756. SIAM
Carey AN, Wu X (2022) The fairness field guide: perspectives from social and formal sciences. arXiv:2201.05216
Chakrabarti D, Dickerson JP, Esmaeili SA, Srinivasan A, Tsepenekas L (2022) A new notion of individually fair clustering: \(\alpha \)-equitable \(k\)-center. In: International conference on artificial intelligence and statistics, pp 6387–6408
Chan TH, Guerqin A, Sozio M (2018) Fully dynamic k-center clustering. In: Proceedings of the 2018 World Wide Web conference, pp 579–587
Chen X, Fain B, Lyu L, Munagala K (2019) Proportionally fair clustering. In: International conference on machine learning, pp 1032–1041
Chhabra A, Masalkovaitė K, Mohapatra P (2021) An overview of fairness in clustering. IEEE Access 9:130698–130720. https://doi.org/10.1109/ACCESS.2021.3114099
Chhabra A, Singla A, Mohapatra P (2021) Fair clustering using antidote data. arXiv:2106.00600
Chierichetti F, Kumar R, Lattanzi S, Vassilvitskii S (2017) Fair clustering through fairlets. In: Proceedings of the 31st international conference on neural information processing systems, pp 5036–5044
Chikahara Y, Sakaue S, Fujino A, Kashima H (2021) Learning individually fair classifier with path-specific causal-effect constraint. In: International conference on artificial intelligence and statistics, pp 145–153
Chlamtáč E, Makarychev Y, Vakilian A (2022) Approximating fair clustering with cascaded norm objectives. In: Proceedings of the 2022 annual ACM-SIAM symposium on discrete algorithms (SODA), pp 2664–2683. SIAM
Cho J, Hwang G, Suh C (2020) A fair classifier using mutual information. In: 2020 IEEE international symposium on information theory (ISIT), pp 2521–2526
Correa J, Cristi A, Duetting P, Norouzi-Fard A (2021) Fairness and bias in online selection. In: International conference on machine learning, pp 2112–2121
Dastin J (2018) Amazon scraps secret ai recruiting tool that showed bias against women. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G. [Online; accessed 15-August-2021]
Davidson I, Ravi S (2020) Making existing clusterings fairer: algorithms, complexity results and insights. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 3733–3740
Deepak, Abraham SS (2020). Representativity fairness in clustering. In: 12th ACM conference on web science
Deepak JM, Jose SV (2020) Fairness in unsupervised learning. In: Proceedings of the 29th ACM international conference on information & knowledge management, CIKM ’20, New York, NY, USA, pp 3511–3512. Association for Computing Machinery
Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2012) Fairness through awareness. In: Proceedings of the 3rd innovations in theoretical computer science conference, pp 214–226
Elzayn H, Jabbari S, Jung C, Kearns M, Neel S, Roth A, Schutzman Z (2019) Fair algorithms for learning in allocation problems. In: Proceedings of the conference on fairness, accountability, and transparency, pp 170–179
Esmaeili S, Brubach B, Srinivasan A, Dickerson J (2021) Fair clustering under a bounded cost. Adv Neural Inf Process Syst 34:14345–14357
Esmaeili S, Brubach B, Tsepenekas L, Dickerson J (2020) Probabilistic fair clustering. Adv Neural Inf Process Syst 33:12743–12755
Feng Z, Kacham P, Woodruff D (2021) Dimensionality reduction for the sum-of-distances metric. In: International conference on machine learning, pp 3220–3229
Ghadiri M, Samadi S, Vempala S (2021) Socially fair k-means clustering. In: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pp 438–448
Ghadiri M, Singh M, Vempala SS (2022) Constant-factor approximation algorithms for socially fair \( k \)-clustering. arXiv preprint arXiv:2206.11210
Gong S, Liu X, Jain AK (2021) Mitigating face recognition bias via group adaptive classifier. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3414–3424
Goyal D, Jaiswal R (2021) Tight fpt approximation for socially fair clustering. arXiv:2106.06755
Han L, Xu D, Xu Y, Yang P (2022) Approximation algorithms for the individually fair k-center with outliers. J Global Optim, 1–16
Harb E, Lam HS (2020) KFC: a scalable approximation algorithm for \( k \)- center fair clustering. Adv Neural Inf Process Syst 33:14509–14519
Harris DG, Pensyl T, Srinivasan A, Trinh K (2019) A lottery model for center-type problems with outliers. ACM Trans Algorithms 15(3):1–25
Huang L, Jiang S, Vishnoi N (2019) Coresets for clustering with fairness constraints. Adv Neural Inf Process Syst 32:7589–7600
Jia X, Sheth K, Svensson O (2020) Fair colorful k-center clustering. In: International conference on integer programming and combinatorial optimization, pp 209–222. Springer
Jones M, Nguyen H, Nguyen T (2020) Fair k-centers via maximum matching. In: International conference on machine learning, pp 4940–4949
Julia A, Larson J, Mattu S, Kirchner L (2016) Propublica–machine bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. [Online; accessed 13-August-2021]
Jung C, Kannan S, Lutz N (2020) Service in your neighborhood: fairness in center location. Foundations of Responsible Computing (FORC)
Kalyanakrishnan S (2016) \(k\)-means clustering. https://www.cse.iitb.ac.in/~shivaram/teaching/old/cs344+386-s2017/resources/classnote-2.pdf. [Online; accessed 29-May-2022]
Kar D, Medya S, Mandal D, Silva A, Dey P, Sanyal S (2021) Feature-based individual fairness in k-clustering. arXiv:2109.04554
Kleindessner M, Awasthi P, Morgenstern J (2019) Fair k-center clustering for data summarization. In: International conference on machine learning, pp 3448–3457
Kleindessne M, Awasthi P, Morgenstern J (2020) A notion of individual fairness for clustering. arXiv:2006.04960
Kleindessner M, Samadi S, Awasthi P, Morgenstern J (2019) Guarantees for spectral clustering with fairness constraints. In: International conference on machine learning, pp 3458–3467
Krause A (2016) Clustering and \(k\)-means. https://las.inf.ethz.ch/courses/lis-s16/hw/hw4_sol.pdf. [Online; accessed 29-May-2022]
Kriegel HP, Schubert E, Zimek A (2017) The (black) art of runtime evaluation: are we comparing algorithms or implementations? Knowl Inf Syst 52(2):341–378. https://doi.org/10.1007/s10115-016-1004-2
Le Quy T, Ntoutsi E (2021) Towards fair, explainable and actionable clustering for learning analytics. In: EDM
Le Quy T, Roy A, Iosifidis V, Zhang W, Ntoutsi E (2022) A survey on datasets for fairness-aware machine learning. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1452
Lee JK, Bu Y, Rajan D, Sattigeri P, Panda R, Das S, Wornell GW (2021) Fair selective classification via sufficiency. In: International conference on machine learning, pp 6076–6086
Li B, Li L, Sun A, Wang C, Wang Y (2021) Approximate group fairness for clustering. In: International conference on machine learning, pp 6381–6391
Li P, Zhao H, Liu H (2020) Deep fair clustering for visual learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Liu S, Vicente LN (2021) A stochastic alternating balance \( k \)-means algorithm for fair clustering. arXiv:2105.14172
Lohaus M, Perrot M, Luxburg UV (2020) Too relaxed to be fair. In: III HD, Singh A (eds) Proceedings of the 37th international conference on machine learning, vol. 119 of proceedings of machine learning research, pp 6360–6369
Mahabadi S, Vakilian A (2020) Individual fairness for k-clustering. In: International conference on machine learning, pp 6586–6596
Makarychev Y, Vakilian A (2021) Approximation algorithms for socially fair clustering. arXiv:2103.02512
Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A (2021) A survey on bias and fairness in machine learning. ACM Comput Surv 54(6). https://doi.org/10.1145/3457607
Micha E, Shah N (2020) Proportionally fair clustering revisited. In: 47th international colloquium on automata, languages, and pogramming (ICALP 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik
Negahbani M, Chakrabarty D (2021) Better algorithms for individually fair \( k \)-clustering. Adv Neural Inf Process Syst 34:13340–13351
Ntoutsi E, Fafalios P, Gadiraju U, Iosifidis V, Nejdl W, Vidal ME, Ruggieri S, Turini F, Papadopoulos S, Krasanakis E et al (2020) Bias in data-driven artificial intelligence systems-an introductory survey. Wiley Interdiscip Rev Data Min Knowl Discov 10(3):e1356
Padmanabhan D (2020) Whither fair clustering? In: AI for social good: Harvard CRCS Workshop
Quy TL, Roy A, Friege G, Ntoutsi E (2021) Fair-capacitated clustering. arXiv:2104.12116
Ranzato F, Urban C, Zanella M (2021) Fair training of decision tree classifiers. arXiv:2101.00909
Rösner C, Schmidt M (2018) Privacy preserving clustering with constraints. arXiv:1802.02497
Schmidt M, Schwiegelshohn C, Sohler C (2019) Fair coresets and streaming algorithms for fair k-means. In: International Workshop on Approximation and Online Algorithms, pp 232–251. Springer
Schmidt M, Wargalla J (2021) Coresets for constrained k-median and k-means clustering in low dimensional euclidean space. arXiv:2106.07319
Song H, Li P, Liu H (2021) Deep clustering based fair outlier detection. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &; Data Mining, KDD ’21, New York, NY, USA, pp 1481-1489. Association for Computing Machinery
Thejaswi S, Ordozgoiti B, Gionis A (2021) Diversity-aware k-median: Clustering with fair center representation. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 765–780. Springer
Vakilian A, Yalciner M (2022) Improved approximation algorithms for individually fair clustering. In: International Conference on Artificial Intelligence and Statistics, pp 8758–8779. PMLR
Wang B, Davidson I (2019) Towards fair deep clustering with multi-state protected variables. arXiv preprint arXiv:1901.10053
Zhang H, Davidson I (2021) Deep fair discriminative clustering. arXiv preprint arXiv:2105.14146
Zhang W, Bifet A, Zhang X, Weiss JC, Nejdl W (2021) Farf: A fair and adaptive random forests classifier, Advances in Knowledge Discovery and Data Mining, 245–256. Springer International Publishing
Ziko IM, Yuan J, Granger E, Ayed IB (2021) Variational fair clustering. In: Proceedings of the AAAI Conference on Artificial Intelligence, Volume 35, pp 11202–11209
Funding
The research is funded by Department of Science & Technology, India under grant number SRG/2020/001138 (Recipient name- Dr. Shweta Jain). The authors would like to thank the Prime Minister Research Fellowship, the Ministry of Education, Government of India for generously funding the primary author.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
No potential competing interest was reported by the authors.
Code availability
The code has been made publicly available at https://github.com/shivi98g/Fair-k-means-Clustering-via-Algorithmic-Fairness
Ethics approval
Not applicable
Consent for publication
The paper is the authors’ own original work, which has not been previously published elsewhere. The paper is not currently being considered for publication elsewhere. The paper reflects the authors’ own research and analysis in a truthful and complete manner. The paper properly credits the meaningful contributions of co-authors and co-researchers.
Additional information
Responsible editor: T. Calders, E. Ntoutsi, M. Pechenizkiy, B. Rosenhahn, S. Ruggieri.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gupta, S., Ghalme, G., Krishnan, N.C. et al. Efficient algorithms for fair clustering with a new notion of fairness. Data Min Knowl Disc 37, 1959–1997 (2023). https://doi.org/10.1007/s10618-023-00928-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-023-00928-6