Abstract
Subspace clustering is an effective technique for finding clusters in different subsets of dimensions. Ideally, such a technique should discover every non-redundant cluster in which a data point participates. Unfortunately, existing hard subspace clustering algorithms fail to satisfy this property, and classical subspace algorithms become inefficient as the dimensionality of the data grows. This work presents a new density-based subspace clustering algorithm (S_FAD) that overcomes these drawbacks. S_FAD follows a bottom-up approach and finds subspace clusters of varied density using different parameter settings of the DBSCAN algorithm. It optimizes the DBSCAN parameters through a hybrid meta-heuristic algorithm and uses hashing to discover all non-redundant subspace clusters. The efficacy of S_FAD is evaluated against several existing subspace clustering algorithms on artificial and real datasets in terms of F_Score and rand_index. Performance is assessed on three criteria: average ranking, SRR ranking, and scalability across varied dimensions. Statistical significance is established through the Wilcoxon signed-rank test. Results reveal that S_FAD performs considerably better on the majority of the datasets and scales well up to 6400 dimensions on a real dataset.
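The hashing idea mentioned in the abstract can be illustrated with a minimal sketch: each cluster found by DBSCAN in some subspace is reduced to an order-independent signature of its member points, and a hash set over those signatures discards clusters that recur across subspaces. The function and data below are illustrative assumptions, not the authors' actual implementation.

```python
def deduplicate_clusters(candidates):
    """Keep one copy of each distinct cluster across subspaces.

    candidates: iterable of (subspace_dims, member_point_ids) pairs,
    e.g. as produced by running DBSCAN on each candidate subspace.
    """
    seen = set()
    kept = []
    for dims, members in candidates:
        signature = frozenset(members)  # hashable, order-independent key
        if signature not in seen:
            seen.add(signature)
            kept.append((tuple(sorted(dims)), signature))
    return kept


# Toy run: the same point set discovered in two different subspaces
# is reported only once.
candidates = [
    ((0, 1), [3, 4, 5]),
    ((1, 2), [5, 4, 3]),   # redundant: same members, different subspace
    ((0, 2), [7, 8]),
]
print(deduplicate_clusters(candidates))
```

Using a set of frozensets makes the redundancy check O(1) per candidate cluster, which matters when many subspaces are explored in a bottom-up fashion.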













Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Agarwal, P., Mehta, S. & Abraham, A. A meta-heuristic density-based subspace clustering algorithm for high-dimensional data. Soft Comput 25, 10237–10256 (2021). https://doi.org/10.1007/s00500-021-05973-1