Wide Gaps and Kleinberg’s Clustering Axioms for k–Means

Cite

The widely applied k-means algorithm produces clusterings that violate our expectations with respect to high/low similarity/density within/between clusters and is in conflict with Kleinberg’s axiomatic system for distance based clustering algorithms that formalizes those expectations. In particular, k-means violates the consistency axiom. We hypothesise that this clash is due to the unexplained expectation that the data themselves should have the property of being clusterable in order to expect the algorithm clustering them to fit a clustering axiomatic system. To demonstrate this, we introduce two new clusterability properties, i.e., variational k-separability and residual k-separability, and show that then Kleinberg’s consistency axiom holds for k-means operating in the Euclidean or non-Euclidean space. Furthermore, we propose extensions of the k-means algorithm that fit approximately Kleinberg’s richness axiom, which does not hold for k-means. In this way, we reconcile k-means with Kleinberg’s axiomatic framework in Euclidean and non-Euclidean settings. Besides contribution to the theory of axiomatic frameworks of clustering and to clusterability theory, the practical benefit is the possibility to construct datasets for testing purposes of algorithms optimizing the k-means cost function. This includes a method of construction of clusterable data with a global optimum known in advance.

eISSN:
2083-8492
Language:
English
Publication timeframe:
4 times per year
Journal Subjects:
Mathematics, Applied Mathematics