Estimating the Number of Clusters in High-Dimensional Large Datasets | IGI Global Scientific Publishing

Xutong Zhu, Lingli Li
Copyright: © 2023 | Volume: 19 | Issue: 2 | Pages: 14
ISSN: 1548-3924 | EISSN: 1548-3932 | EISBN13: 9781668488157 | DOI: 10.4018/IJDWM.316142
Cite Article

MLA

Zhu, Xutong, and Lingli Li. "Estimating the Number of Clusters in High-Dimensional Large Datasets." IJDWM, vol. 19, no. 2, 2023, pp. 1-14. https://doi.org/10.4018/IJDWM.316142

APA

Zhu, X. & Li, L. (2023). Estimating the Number of Clusters in High-Dimensional Large Datasets. International Journal of Data Warehousing and Mining (IJDWM), 19(2), 1-14. https://doi.org/10.4018/IJDWM.316142

Chicago

Zhu, Xutong, and Lingli Li. "Estimating the Number of Clusters in High-Dimensional Large Datasets," International Journal of Data Warehousing and Mining (IJDWM) 19, no.2: 1-14. https://doi.org/10.4018/IJDWM.316142

Abstract

Clustering is a basic primitive for exploratory tasks. To obtain valuable results, the parameters of the clustering algorithm, in particular the number of clusters, must be set appropriately. Existing methods for determining the number of clusters perform well on small, low-dimensional datasets, but effectively determining the optimal number of clusters on large, high-dimensional datasets remains a challenging problem. In this paper, the authors design a method that overcomes the shortcomings of existing estimation methods and estimates the optimal number of clusters on large-scale, high-dimensional datasets both accurately and quickly. Extensive experiments show that it (1) outperforms existing estimation methods in accuracy and efficiency, (2) generalizes across different datasets, and (3) is suitable for large, high-dimensional datasets.
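For readers unfamiliar with the problem, the kind of existing estimator the abstract refers to can be sketched as follows. This is not the authors' method, which the abstract does not describe. It is a minimal illustration of one common baseline: run k-means for each candidate k and keep the k with the highest mean silhouette score. The function names (`kmeans`, `mean_silhouette`, `estimate_k`, `make_blobs`), the toy two-dimensional data, and the candidate range 2 to 6 are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette over all points; needs all pairwise distances, O(n^2)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(points, k_min=2, k_max=6):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    scores = {
        k: mean_silhouette(points, kmeans(points, k))
        for k in range(k_min, k_max + 1)
    }
    return max(scores, key=scores.get), scores


def make_blobs(centers, n_per_cluster=30, sigma=0.5, seed=0):
    """Toy data (an assumption of this sketch): Gaussian blobs around centers."""
    rng = random.Random(seed)
    return [
        (rng.gauss(cx, sigma), rng.gauss(cy, sigma))
        for cx, cy in centers
        for _ in range(n_per_cluster)
    ]


points = make_blobs([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)])
best_k, scores = estimate_k(points)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On small, well-separated, low-dimensional data like this toy set, such a baseline is cheap and reliable. The silhouette computation needs every pairwise distance, so its cost grows quadratically with the number of points, and the full k-means run is repeated for each candidate k. That is one reason baselines of this kind become hard to apply to the large datasets the abstract targets.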