Estimating the Number of Clusters in High-Dimensional Large Datasets

Xutong Zhu (Heilongjiang University, China) and Lingli Li (Heilongjiang University, China)

Source Title: International Journal of Data Warehousing and Mining (IJDWM) 19(2)

ISSN: 1548-3924 | EISSN: 1548-3932 | EISBN13: 9781668488157 | DOI: 10.4018/IJDWM.316142

MLA

Zhu, Xutong, and Lingli Li. "Estimating the Number of Clusters in High-Dimensional Large Datasets." IJDWM, vol. 19, no. 2, 2023, pp. 1-14. https://doi.org/10.4018/IJDWM.316142

APA

Zhu, X., & Li, L. (2023). Estimating the Number of Clusters in High-Dimensional Large Datasets. International Journal of Data Warehousing and Mining (IJDWM), 19(2), 1-14. https://doi.org/10.4018/IJDWM.316142

Chicago

Zhu, Xutong, and Lingli Li. "Estimating the Number of Clusters in High-Dimensional Large Datasets," International Journal of Data Warehousing and Mining (IJDWM) 19, no. 2 (2023): 1-14. https://doi.org/10.4018/IJDWM.316142

Abstract

Clustering is a basic primitive of exploratory data analysis. To obtain valuable results, the parameters of a clustering algorithm, in particular the number of clusters, must be set appropriately. Existing methods for determining the number of clusters perform well on small, low-dimensional datasets, but effectively determining the optimal number of clusters on large, high-dimensional datasets remains a challenging problem. In this paper, the authors design a method that overcomes the shortcomings of existing estimation methods and estimates the optimal number of clusters on large-scale, high-dimensional datasets both accurately and quickly. Extensive experiments show that it (1) outperforms existing estimation methods in accuracy and efficiency, (2) generalizes across different datasets, and (3) is suitable for high-dimensional large datasets.
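The abstract does not describe the authors' estimator itself. To make the problem concrete, one widely used baseline of the kind the abstract calls "existing methods" runs k-means for each candidate number of clusters k and keeps the k with the highest mean silhouette coefficient. The sketch below is that generic baseline, not the paper's method. The function names, the farthest-first seeding, the candidate range `k_max=6`, and the synthetic three-blob data are all illustrative assumptions. It uses only the Python standard library.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at points[0], then
    repeatedly add the point farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster keeps its old center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; rank it last
    total = 0.0
    for i, p in enumerate(points):
        # a: mean distance to the other members of p's own cluster.
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(dist(p, q) for q in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(points, k_max=6):
    """Return the k in [2, k_max] whose k-means labeling has the best mean silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get)


# Hypothetical synthetic data: three well-separated Gaussian blobs, 20 points each.
rng = random.Random(42)
true_centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
points = [tuple(rng.gauss(m, 0.5) for m in c) for c in true_centers for _ in range(20)]
print(estimate_k(points))
```

On these three well-separated blobs the baseline should select k = 3. Note its cost: the exact silhouette needs all pairwise distances, roughly O(n²·d) work for each candidate k, and k-means is rerun for every candidate. This quadratic growth is one concrete reason such baselines become expensive on large, high-dimensional datasets, which is the setting the paper targets.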