Abstract
Subspace clustering is an effective technique for finding clusters in different subsets of dimensions. Ideally, such a technique should discover every non-redundant cluster in which a data point participates. Unfortunately, existing hard subspace clustering algorithms fail to satisfy this property, and classical subspace algorithms become inefficient as the dimensionality of the data grows. This work presents a new density-based subspace clustering algorithm (S_FAD) that overcomes these drawbacks. S_FAD follows a bottom-up approach and finds subspace clusters of varied density using different parameter settings of the DBSCAN algorithm. It optimizes the DBSCAN parameters through a hybrid meta-heuristic algorithm and uses hashing to discover all non-redundant subspace clusters. The efficacy of S_FAD is evaluated against several existing subspace clustering algorithms on artificial and real datasets in terms of F_Score and rand_index. Performance is assessed on three criteria: average ranking, SRR ranking, and scalability across varied dimensions. Statistical significance is established through the Wilcoxon signed-rank test. Results reveal that S_FAD performs considerably better on the majority of the datasets and scales well up to 6400 dimensions on a real dataset.
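The hashing idea mentioned in the abstract can be illustrated with a minimal sketch: each cluster found by DBSCAN in some subspace is reduced to an order-independent signature of its member points, and a hash set over those signatures discards clusters that recur across subspaces. The function and data below are illustrative assumptions, not the authors' actual implementation.

```python
def deduplicate_clusters(candidates):
    """Keep one copy of each distinct cluster across subspaces.

    candidates: iterable of (subspace_dims, member_point_ids) pairs,
    e.g. as produced by running DBSCAN on each candidate subspace.
    """
    seen = set()
    kept = []
    for dims, members in candidates:
        signature = frozenset(members)  # hashable, order-independent key
        if signature not in seen:
            seen.add(signature)
            kept.append((tuple(sorted(dims)), signature))
    return kept


# Toy run: the same point set discovered in two different subspaces
# is reported only once.
candidates = [
    ((0, 1), [3, 4, 5]),
    ((1, 2), [5, 4, 3]),   # redundant: same members, different subspace
    ((0, 2), [7, 8]),
]
print(deduplicate_clusters(candidates))
```

Using a set of frozensets makes the redundancy check O(1) per candidate cluster, which matters when many subspaces are explored in a bottom-up fashion.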













Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Agarwal, P., Mehta, S. & Abraham, A. A meta-heuristic density-based subspace clustering algorithm for high-dimensional data. Soft Comput 25, 10237–10256 (2021). https://doi.org/10.1007/s00500-021-05973-1