Authors:
Libero Nigro 1 and Franco Cicirelli 2
Affiliations:
1 Engineering Department of Informatics Modelling Electronics and Systems Science, University of Calabria, Rende, Italy; 2 CNR-National Research Council of Italy–Inst. for High Performance Computing and Networking (ICAR), Rende, Italy
Keyword(s):
K-Means Clustering, Seeding Procedure, Greedy K-Means++, Clustering Accuracy Indexes, Java Parallel Streams, Benchmark and Real-World Datasets, Execution Performance.
Abstract:
This paper proposes a variation of the K-Means clustering algorithm, named Population-Based K-Means (PB-K-MEANS), which bases its behaviour on careful seeding. The new algorithm rests on a greedy version of the K-Means++ seeding procedure (g_kmeans++), which proves effective in the search for an accurate clustering solution. PB-K-MEANS first builds a population of candidate solutions through independent runs of K-Means seeded with g_kmeans++. The solutions stored in this population are then recombined by Repeated K-Means to reach a final solution that minimizes the distortion index. PB-K-MEANS is currently implemented in Java using parallel streams and lambda expressions. The paper first recalls the basic concepts of clustering and of K-Means, together with the role of the seeding procedure, and then describes the basic design and implementation issues of PB-K-MEANS. Finally, simulation experiments carried out on both synthetic and real-world datasets are reported, confirming good execution performance and accurate clustering.
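
The abstract does not spell out the details of g_kmeans++, so the following is only a minimal sketch of a greedy K-Means++ seeding step as it is commonly described: each new centroid is chosen among a few candidates sampled with probability proportional to the squared distance from the nearest centroid already selected, keeping the candidate that yields the lowest total cost. The class name `GreedySeedingSketch`, the methods `seed`, `sse` and `sample`, the `trials` parameter, and the use of the sum of squared distances as a stand-in for the distortion index are all assumptions made for illustration, not the authors' implementation. Parallel streams are used for the per-point distance computations, in the spirit of the Java implementation mentioned above.

```java
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical sketch of greedy K-Means++ seeding; not the paper's actual code.
class GreedySeedingSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return s;
    }

    // Sum of squared distances of every point to its nearest centroid
    // (assumed here as a simple proxy for the distortion index).
    static double sse(double[][] data, double[][] centroids) {
        return IntStream.range(0, data.length).parallel()
                .mapToDouble(i -> {
                    double best = Double.MAX_VALUE;
                    for (double[] c : centroids) best = Math.min(best, sqDist(data[i], c));
                    return best;
                }).sum();
    }

    // Draws an index with probability proportional to w[i]; zero-weight entries are never drawn
    // unless all weights are zero, in which case a uniform index is returned.
    static int sample(double[] w, double total, Random rnd) {
        if (total <= 0.0) return rnd.nextInt(w.length);
        double r = rnd.nextDouble() * total;
        double cum = 0.0;
        int lastPositive = 0;
        for (int i = 0; i < w.length; i++) {
            if (w[i] > 0.0) lastPositive = i;
            cum += w[i];
            if (cum > r) return i;
        }
        return lastPositive; // guard against floating-point rounding
    }

    // Greedy seeding: 'trials' (>= 1) candidates are sampled per centroid, the cheapest one is kept.
    static double[][] seed(double[][] data, int k, int trials, Random rnd) {
        int n = data.length;
        double[][] centroids = new double[k][];
        centroids[0] = data[rnd.nextInt(n)]; // first centroid: uniform random choice
        // d2[i] = squared distance of point i to its nearest centroid chosen so far
        double[] d2 = IntStream.range(0, n).parallel()
                .mapToDouble(i -> sqDist(data[i], centroids[0])).toArray();
        for (int c = 1; c < k; c++) {
            double total = 0.0;
            for (double v : d2) total += v;
            int bestIdx = -1;
            double bestCost = Double.MAX_VALUE;
            final double[] cur = d2;
            for (int t = 0; t < trials; t++) {
                int cand = sample(cur, total, rnd);
                final double[] p = data[cand];
                // Cost that would result if this candidate became the next centroid.
                double cost = IntStream.range(0, n).parallel()
                        .mapToDouble(i -> Math.min(cur[i], sqDist(data[i], p))).sum();
                if (cost < bestCost) { bestCost = cost; bestIdx = cand; }
            }
            final double[] chosen = data[bestIdx];
            centroids[c] = chosen;
            d2 = IntStream.range(0, n).parallel()
                    .mapToDouble(i -> Math.min(cur[i], sqDist(data[i], chosen))).toArray();
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10} };
        double[][] seeds = seed(data, 2, 3, new Random(42));
        System.out.println("seeds chosen: " + seeds.length + ", cost: " + sse(data, seeds));
    }
}
```

In a population-based scheme like the one outlined in the abstract, a routine of this kind would be called once per independent K-Means run; how the resulting solutions are stored and recombined is left to the full paper.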