Abstract
Fast centroid-based clustering algorithms such as k-means usually converge to a local optimum. In this work, we propose a method for constructing a better clustering from two suboptimal clustering solutions, exploiting the fact that each suboptimal clustering typically contains some of the correct clusters. The proposed method, COTCLUS, selects two centroids from one clustering and replaces them with two centroids from the other clustering so that the maximum decrease in the mean square error of the first clustering is achieved. After the centroids are modified, a few iterations of k-means are applied for fine-tuning. In an iterative algorithm, the resulting clustering is further improved using a newly generated clustering. The proposed method can find the optimal clustering in a very small number of iterations for datasets with well-separated clusters. Experiments demonstrate that the proposed method outperforms the selected competing methods.
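To make the idea concrete, the following is a minimal sketch of one centroid-exchange step as the abstract describes it: given two suboptimal centroid sets, try replacing a pair of centroids in the first set with a pair from the second, keep the exchange that most reduces the mean square error, and fine-tune with a few k-means iterations. This is an illustration of the general idea only; the function names (`mse`, `kmeans_refine`, `cotclus_step`) and the exhaustive pair search are assumptions, not the authors' actual implementation.

```python
import numpy as np
from itertools import combinations

def mse(X, centroids):
    # Mean squared distance from each point to its nearest centroid.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def kmeans_refine(X, centroids, iters=3):
    # A few Lloyd iterations for fine-tuning the modified centroids.
    c = centroids.copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(len(c)):
            pts = X[labels == j]
            if len(pts):
                c[j] = pts.mean(axis=0)
    return c

def cotclus_step(X, A, B, refine_iters=3):
    # Replace the pair of centroids in A by the pair from B that yields
    # the largest MSE decrease (exhaustive search over pairs, for clarity),
    # then fine-tune the result with a few k-means iterations.
    best, best_err = A, mse(X, A)
    for i, j in combinations(range(len(A)), 2):      # pair removed from A
        for p, q in combinations(range(len(B)), 2):  # pair taken from B
            cand = A.copy()
            cand[i], cand[j] = B[p], B[q]
            err = mse(X, cand)
            if err < best_err:
                best, best_err = cand, err
    return kmeans_refine(X, best, refine_iters)
```

For well-separated clusters, a single such exchange can recover a cluster that the first solution missed entirely (e.g., when two of its centroids sit inside the same true cluster), which is why only a few iterations are needed.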
Rezaei, M. Improving a Centroid-Based Clustering by Using Suitable Centroids from Another Clustering. J Classif 37, 352–365 (2020). https://doi.org/10.1007/s00357-018-9296-4